We are all familiar with talking about atoms being chemically different depending on the functional group in which they exist - e.g. ether, carbonyl, and alcoholic oxygens - and this categorisation of atoms forms the basis of forcefield writing. That is, a large number of different molecules and types of molecule should be described by a small set of different atoms, i.e. atom types. At the simplest level, the connectivity of an atom is enough to uniquely identify its specific type.
Some methods of using this information to assign types to atomic centres derive a unique integer from the local connectivity of the atom (e.g. the SATIS method REF XXX), but including information beyond second neighbours is rather impractical. Others use a typing 'language' to describe individual elements of the topology of atoms in molecules, and are flexible enough to describe complex situations more satisfactorily (e.g. that employed in Vega ref XXX). Aten uses the latter style, providing a clear, powerful, and chemically intuitive way of describing atom types that is, most importantly, readable and easily comprehended.
Type descriptions are used primarily for assigning forcefield types, but they are also an extremely useful way to select specific atoms.
Type descriptions in Aten use connectivity to other atoms as a basis, extending easily to rings (and their constituent atoms), lists of allowable elements in certain connections, atom hybridisations, and local atom geometries. Descriptions can be nested to arbitrary depth since the algorithm is recursive, and may be re-used in other atoms' type descriptions to simplify their identification.
Time to jump straight in with some examples. Note that these examples only serve to illustrate the concepts of describing chemical environment at different levels. They may not provide the most elegant descriptions of the problem at hand, don't take advantage of reusing types, and certainly aren't the only ways of writing the descriptions. They're just plain ol' examples of the language!
Consider a water molecule. If you were describing it in terms of its structure to someone who understands the concept of atoms and bonds, but has no idea what the water molecule looks like, you might say:
A water molecule contains an oxygen that is connected to two hydrogen atoms by single bonds
...or even...
It's an oxygen atom with two hydrogens on it
Given this degree-level knowledge, to describe the individual oxygen and hydrogen atoms in the grand scheme of the water molecule exactly, you might say:
A 'water oxygen' is an oxygen atom that is connected to two hydrogen atoms via single bonds
...and...
A 'water hydrogen' is a hydrogen that is connected via a single bond to an oxygen atom that itself is connected by a single bond to another (different) hydrogen atom
The extra information regarding the second hydrogen is necessary because otherwise we could apply the description of the 'water hydrogen' to the hydrogen in any alcohol group as well. Similarly, we might mistake the oxygen in the hydroxonium ion (H3O+) as being a 'water oxygen', when in fact it is quite different. In this case, we could extend the description to:
A 'water oxygen' is an oxygen atom that is connected to two hydrogen atoms via single bonds, and nothing else
An atom description in Aten is a string of comma-separated commands that describe this kind of information. So, to tell the program how to recognise a water oxygen and a water hydrogen, we could use the following type descriptions (written in the proper forcefield input style for the types block):
1 OW O "nbonds=2,-H,-H" # Water oxygen
2 HW H "-O(nbonds=2,-H,-H)" # Water hydrogen
Aten now recognises that a water oxygen (OW) is 'an oxygen atom that has exactly two bonds AND is bound to a hydrogen AND is bound to another hydrogen'. Similarly, a water hydrogen (HW) is 'a hydrogen bound to an oxygen atom that: has two bonds to it, AND is bound to a hydrogen, AND is bound to another hydrogen'. The dash '-' is short-hand for saying 'is bound to', while the bracketed part after '-O' in the water hydrogen description describes the required local environment of the attached oxygen. Using brackets to describe the attached atoms more fully is a crucial part of atom typing, and may be done to arbitrary depth (so, for example, we could add a bracketed description to the hydrogen atoms as well, if there was anything left to describe). If necessary, descriptions can be written that uniquely describe every single atom in a complex molecule by specifying completely all other connections within the molecule. This should not be needed for normal use, however, and short descriptions of atom environment up to first or second neighbours will usually suffice.
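As a rough illustration of the recursion described above, the following Python sketch parses a small subset of the syntax (only `nbonds=` and `-X(...)` terms) and tests it against hand-built molecules. It is not Aten's implementation: the function names, the dictionary layout, and the treatment of `nbonds` as a simple count of bonded neighbours are all assumptions made for this example.

```python
def split_top_level(text):
    """Split on commas that are not enclosed in brackets."""
    parts, depth, current = [], 0, ""
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    if current:
        parts.append(current)
    return parts


def parse(description):
    """Turn 'nbonds=2,-H,-H' into {'nbonds': 2, 'bound': [('H', {...}), ...]}."""
    pattern = {"nbonds": None, "bound": []}
    for term in split_top_level(description):
        term = term.strip()
        if term.startswith("nbonds="):
            pattern["nbonds"] = int(term[len("nbonds="):])
        elif term.startswith("-"):
            element, _, inner = term[1:].partition("(")
            # 'inner' still ends with this term's closing bracket, if it had one
            pattern["bound"].append((element, parse(inner[:-1] if inner else "")))
        else:
            raise ValueError(f"unsupported term in this sketch: {term!r}")
    return pattern


def matches(mol, atom, pattern):
    """Recursively test whether 'atom' satisfies 'pattern'."""
    elements, bonds = mol
    neighbours = bonds[atom]
    if pattern["nbonds"] is not None and len(neighbours) != pattern["nbonds"]:
        return False
    used = set()  # each '-X' term must be satisfied by a different neighbour
    for element, sub in pattern["bound"]:
        for n in neighbours:
            if n not in used and elements[n] == element and matches(mol, n, sub):
                used.add(n)
                break
        else:
            return False
    return True


def is_type(mol, atom, element, description):
    """Check the element of the atom itself, then its description."""
    return mol[0][atom] == element and matches(mol, atom, parse(description))


# Molecules as (elements, bonds): atom index -> element, atom index -> neighbours
water = ({0: "O", 1: "H", 2: "H"}, {0: [1, 2], 1: [0], 2: [0]})
hydroxonium = ({0: "O", 1: "H", 2: "H", 3: "H"},
               {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]})
methanol = ({0: "C", 1: "O", 2: "H", 3: "H", 4: "H", 5: "H"},
            {0: [1, 3, 4, 5], 1: [0, 2], 2: [1], 3: [0], 4: [0], 5: [0]})

OW = "nbonds=2,-H,-H"
HW = "-O(nbonds=2,-H,-H)"

print(is_type(water, 0, "O", OW))        # True
print(is_type(water, 1, "H", HW))        # True
print(is_type(methanol, 2, "H", HW))     # False: the oxygen has only one hydrogen
print(is_type(hydroxonium, 0, "O", OW))  # False: three bonds, not two
```

Dropping `nbonds=2` from the oxygen description makes the hydroxonium oxygen match as well, which is the 'and nothing else' problem noted earlier.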
Now consider a molecule with a carboxylic acid group at one end and a primary alcohol at the other. Assuming that the OH group in the carboxylic acid will have different forcefield parameters to the primary alcohol, we must describe the first and second neighbours of the oxygen atoms to differentiate them.
To begin, we can describe the carbon atoms as either two or three different types: either methylene and carboxylic acid carbons, or carboxylic acid, adjacent-to-carboxylic-acid, and adjacent-to-alcohol carbons. For both, we need only describe the first neighbours of the atoms. For the first:
3 C(H2) C "nbonds=4,-H,-H,-C" # Methylene Carbon
4 C_cbx C "nbonds=3,-O(bond=double),-O,-C" # Carboxylic Acid C
Note the ordering of the oxygen connections for the carboxylic acid carbon, where the most qualified oxygen is listed first. This is to stop the doubly-bound oxygen being used to match the plain '-O', which would leave nothing to satisfy '-O(bond=double)' and so prevent a successful match. This is a general lesson - bound atoms with the most descriptive terms should appear at the beginning of the type description (as it is read left-to-right) and those with the least should be left until the end.
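The ordering rule can be seen in a toy matcher that, like the behaviour described above, reads terms left to right and lets each one claim the first unused neighbour that fits, with no backtracking. This is a minimal Python sketch under that assumption, not Aten's algorithm; `greedy_match` and the tuple layout are invented for the example.

```python
def greedy_match(neighbours, terms):
    """Match terms left to right; each claims the first unused neighbour that fits.

    neighbours: list of (element, bond_type) for the atoms bound to the centre
    terms:      list of (element, required_bond_type or None for 'any bond')
    """
    used = set()
    for element, bond in terms:
        for i, (n_element, n_bond) in enumerate(neighbours):
            if i in used or n_element != element:
                continue
            if bond is not None and n_bond != bond:
                continue
            used.add(i)
            break
        else:
            return False  # no unused neighbour left for this term
    return True


# Neighbours of the carboxylic acid carbon, with the =O stored first
carboxyl_c = [("O", "double"), ("O", "single"), ("C", "single")]

specific_first = [("O", "double"), ("O", None), ("C", None)]  # -O(bond=double),-O,-C
generic_first = [("O", None), ("O", "double"), ("C", None)]   # -O,-O(bond=double),-C

print(greedy_match(carboxyl_c, specific_first))  # True
print(greedy_match(carboxyl_c, generic_first))   # False: plain -O consumed the =O
```

With the generic term first, success depends on the order in which the neighbours happen to be stored, whereas the specific-first form matches regardless.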
Where all three carbons need to be identified separately, we may write:
5 C(OH) C "nbonds=4,-H,-H,-C,-O" # CH2 adjacent to OH
6 C(COOH) C "nbonds=4,-H,-H,-C,-C" # CH2 adjacent to COOH
7 C_cbx C "nbonds=3,-O(bond=double),-O,-C" # Carboxylic Acid C
Let us now assume that the hydrogens within the alcohol and carboxylic acid groups must also be seen as different types. In this case, the second neighbours of the atoms must be considered:
8 HO H "-O(-C(-H,-H))" # Alcoholic H
9 H_cbx H "-O(-C(-O(bond=double)))" # Carboxylic acid H
The assignment is thus based entirely on the nature of the carbon atom to which the OH group is bound since this is the next available source of connectivity information. The determination of the three different oxygen atoms is similar:
10 OH O "-H,-C(-H,-H)" # Alcoholic O
11 O_cbx O "-C(-O(-H))" # Carboxylic acid =O
12 OH_cbx O "-H,-C(-O(bond=double))" # Carboxylic acid O(H)
Of course, we could just have specified 'nbonds=1' for the doubly-bound oxygen of the carboxylic acid group, but that wouldn't be very instructive, would it?
At last, a proper problem - an asymmetric substituted pyridine. Let's assume that we need to distinguish between every non-hydrogen atom - we'll skip describing the hydrogen atoms for now, but note that this is most easily achieved by specifying directly the atomtype that the H is bound to (see later on). Let's start with the pyridine nitrogen. We basically need to say that it's in a 6-membered aromatic ring:
13 N_py N "ring(size=6,aromatic)" # Pyridine N
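To give a feel for what a `size=6` ring test involves, here is a small Python sketch that checks whether an atom lies on a closed path of exactly six atoms. It is only an illustration: the function name and the example bond table are hypothetical, it accepts any simple cycle rather than a chemically chosen ring set, and it makes no attempt to test aromaticity.

```python
def in_ring_of_size(bonds, atom, size):
    """True if 'atom' lies on a simple cycle containing exactly 'size' atoms."""
    def walk(current, visited):
        if len(visited) == size:
            # The path holds 'size' atoms; it is a ring if the last bonds to the first
            return atom in bonds[current]
        return any(
            walk(nxt, visited | {nxt})
            for nxt in bonds[current]
            if nxt not in visited
        )
    return size >= 3 and walk(atom, {atom})


# Ring atoms 0-5 (atom 0 standing in for the pyridine N), substituent atom 6 on atom 2
pyridine_like = {
    0: [1, 5], 1: [0, 2], 2: [1, 3, 6], 3: [2, 4],
    4: [3, 5], 5: [4, 0], 6: [2],
}

print(in_ring_of_size(pyridine_like, 0, 6))  # True
print(in_ring_of_size(pyridine_like, 0, 5))  # False
print(in_ring_of_size(pyridine_like, 6, 6))  # False: the substituent is not in the ring
```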
TODO




