Chapter 9. Methods

Some things in Aten are implemented from existing routines and algorithms. Others have been written from scratch, even when existing algorithms were available, either as an attempt to improve on those algorithms or simply to learn more by working out how best to go about solving a given problem. A selection of algorithms is detailed in the following pages, grouped into those that were re-used from the literature, and those that were written specifically for Aten.

It should be pointed out that, in the eventuality that somebody notices that one of Aten's 'custom' algorithms is actually a reproduction of an existing method, then fair enough - send me the reference and I'll be happy to move it to the 'literature' section.

Custom Algorithms

NETA

NETA stands for the Nested English Typing Algorithm - a fairly meaningless acronym, all said and done, but with the advantage that it is 'Aten' backwards. NETA is an attempt to provide a descriptive atom typing language that is:

  • Easily readable

  • Easily written from a small subset of keywords

  • Recursive and able to describe complex molecules

Its closest relative that I'm aware of in the literature is the ATDL as implemented in Vega-ZZ,[1] but NETA was (genuinely) conceived without prior knowledge of that system. NETA tries to keep the language simple enough that it can almost be read aloud and make sense, given one or two special syntactic tokens, rather than needlessly use numerical codes and spurious symbols to signify certain quantities or create an ultra-compact language. The former destroys readability and the latter promotes convolution, neither of which helps when trying to interpret old rules or write new ones. So, for the most part NETA is keyword-based, with a limited number of fairly 'natural' symbols employed to denote common terms.

Typing begins from a provided set of atoms and bonds (i.e. the chemical graph). The connectivity between atoms must be 'set' prior to typing, either by automatic calculation of bonds based on distance criteria, manually adding them by hand, or reading them from the input model file. The typing algorithm itself makes no additions or changes to the connectivity of the input structure.

NETA requires knowledge of the species/molecule types in the model. For single molecule systems there is 1 distinct molecule (species) and 1 occurrence of it. For condensed phases, e.g. liquids, there are 1 or more species, each with many copies of the molecule. In the interests of efficiency for the following routines, Aten attempts to generate a valid pattern description of the system if one is not present already. This essentially picks out the individual species and the number of molecules of each, and permits the typing routines to consider only one molecule of each species when determining atom types etc. The assumption here is that, since all molecules in a given species will have the same chemical graph, atom types can be worked out for a single molecule and then duplicated on all of the others.
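As an illustration of what a pattern description captures, the following is a minimal Python sketch and not Aten's own pattern detection code. It splits the chemical graph into connected molecules, then groups molecules that share a crude fingerprint (each atom's element plus the elements of its neighbours). The function names and the fingerprint are assumptions made for this example; the fingerprint is not a true graph-isomorphism test.

```python
from collections import Counter, deque

def find_molecules(n_atoms, bonds):
    """Return (molecules, neighbours); each molecule is a sorted list of atom indices."""
    neighbours = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    seen = [False] * n_atoms
    molecules = []
    for start in range(n_atoms):
        if seen[start]:
            continue
        # Breadth-first search collects every atom reachable from 'start'.
        seen[start] = True
        queue = deque([start])
        members = []
        while queue:
            atom = queue.popleft()
            members.append(atom)
            for nb in neighbours[atom]:
                if not seen[nb]:
                    seen[nb] = True
                    queue.append(nb)
        molecules.append(sorted(members))
    return molecules, neighbours

def species_signature(molecule, elements, neighbours):
    # Hypothetical fingerprint: element of each atom plus sorted neighbour elements.
    return tuple(sorted(
        (elements[a], tuple(sorted(elements[n] for n in neighbours[a])))
        for a in molecule))

def detect_pattern(elements, bonds):
    """Return a list of (representative molecule, number of copies), one per species."""
    molecules, neighbours = find_molecules(len(elements), bonds)
    signatures = [species_signature(m, elements, neighbours) for m in molecules]
    counts = Counter(signatures)
    representative = {}
    for molecule, signature in zip(molecules, signatures):
        representative.setdefault(signature, molecule)
    return [(representative[s], counts[s]) for s in representative]

# Two water molecules and one isolated argon atom.
elements = ["O", "H", "H", "O", "H", "H", "Ar"]
bonds = [(0, 1), (0, 2), (3, 4), (3, 5)]
pattern = detect_pattern(elements, bonds)
```

Here `pattern` holds one entry per species: the first water molecule with a count of 2, and the argon atom with a count of 1. Only the representative molecule of each entry would then need to be typed.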

Following detection of a suitable pattern description, several tasks are then performed:

  1. Cycle Detection

    Firstly, any cyclic structures within a single molecule are detected up to a maximum (adjustable) ring size. This is achieved from a series of simple recursive searches beginning from each atom with more than one bond. A sequence of 'walks' along bonds is made in order to form a path of some specified length (i.e. ring size). If the final atom in this path shares a bond with the starting atom, a cycle has been found. If not, the final atom is removed and replaced with another. If there are no more atoms to try in this final position, the preceding atom in the path is removed and replaced with another, and so on. Each unique ring (its size and pointers to the sequence of constituent atoms) is stored in the pattern.

  2. Assignment of Atom Environment

    From the list of bound neighbours, each atom is assigned a simple hybridicity based on the character of the bonds it is involved in, mainly used for the determination of aromatic cycles in the next step.

  3. Ring Types

    Once atom hybridicities have been assigned, ring types can be determined. Rings are classed as either aliphatic, aromatic, or non-aromatic (i.e. a mix of resonant and aliphatic bonds that is not itself aromatic).
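The cycle detection step described above can be sketched in Python roughly as follows. This is a simplified illustration under stated assumptions, not the code used in Aten: the function name `find_rings` and the neighbour-list representation are invented for the example, and rings are treated as unique by their set of atoms, which is a simplification.

```python
def find_rings(neighbours, max_ring_size=6):
    """Find rings of 3..max_ring_size atoms by recursive path walking.

    neighbours[i] is the list of atoms bonded to atom i.
    Returns a list of rings, each an ordered list of atom indices.
    """
    rings = {}  # frozenset of ring atoms -> ordered path, to keep rings unique

    def walk(path, size):
        last = path[-1]
        if len(path) == size:
            # Path has the requested length: a ring exists if it closes on the start atom.
            if path[0] in neighbours[last]:
                rings.setdefault(frozenset(path), list(path))
            return
        for nb in neighbours[last]:
            # Atoms with a single bond can never be part of a ring.
            if nb not in path and len(neighbours[nb]) > 1:
                path.append(nb)
                walk(path, size)
                path.pop()  # remove the atom and try the next candidate

    for size in range(3, max_ring_size + 1):
        for start, bonded in enumerate(neighbours):
            if len(bonded) > 1:
                walk([start], size)
    return list(rings.values())

# A six-membered ring (atoms 0-5) with one substituent (atom 6) on atom 0.
neighbours = [[1, 5, 6], [0, 2], [1, 3], [2, 4], [3, 5], [4, 0], [0]]
rings = find_rings(neighbours)
```

In this example a single six-membered ring is found, and the substituent atom is never visited because it has only one bond.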

Now, working only with the representative molecule of each pattern, associated (or current) forcefield(s) are searched for types that match the contained atoms. Each NETA description whose character element is the same as a given atom is tested, and a score is obtained. If this score is non-zero and positive then the atomtype is a match and is assigned to the atom if it has no previous type, or if the score is higher than the previous one. See atom type scoring for more information.
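The 'keep the highest-scoring type' logic can be illustrated with a short Python sketch. The scoring function here is a hypothetical stand-in (it only checks an optional bond count) and bears no relation to how NETA descriptions are actually scored; the type names are likewise invented for the example.

```python
def toy_score(n_bonds, description):
    # Hypothetical scoring: 1 point for the element match alone, 2 if a required
    # bond count is also satisfied, and -1 (no match) if that requirement fails.
    required = description.get("nbonds")
    if required is None:
        return 1
    return 2 if n_bonds == required else -1

def assign_types(atoms, forcefield_types, score=toy_score):
    """atoms: list of (element, n_bonds); forcefield_types: list of (name, element, description)."""
    assigned = []
    for element, n_bonds in atoms:
        best_name, best_score = None, 0
        for name, type_element, description in forcefield_types:
            if type_element != element:
                continue  # only test descriptions for the same element
            s = score(n_bonds, description)
            # Accept only positive scores that beat the best score so far.
            if s > best_score:
                best_name, best_score = name, s
        assigned.append(best_name)
    return assigned

atoms = [("C", 4), ("C", 3), ("H", 1), ("N", 3)]
forcefield_types = [("C_any", "C", {}), ("CT", "C", {"nbonds": 4}), ("HC", "H", {"nbonds": 1})]
assigned = assign_types(atoms, forcefield_types)
```

The four-coordinate carbon receives the more specific type because it scores higher, the three-coordinate carbon falls back to the generic one, and the nitrogen is left untyped because no description exists for its element.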

[1] Pedretti, A.; Villa, L.; Vistoli, G. "Theoretical Chemistry Accounts", 109, 229-232 (2003).

Augment

Augmentation of bonds, as far as Aten is concerned, means to take a collection of atoms with 'basic' connectivity (i.e. all single bonds, as per the result of rebonding) and assign multiple bonds where necessary. The method is based loosely on previously described algorithms.[1]

The basis of the method involves modifying the bond order of a particular connection to best satisfy the bonding requirements of the two involved atoms, for example making sure all carbon atoms possess an optimal total bond order of 4. However, many atoms (in particular S and P) happily exist with more than one total bond order (e.g. P) - the methodology borrowed from [1] solves this problem by scoring the total bond order for each particular element ranging from zero (meaning 'natural' or 'no penalty') to some positive number. The higher the positive number, the more 'unhappy' the element is with this number of bonds. For example, hydrogen atoms score 0 for a total bond order of 1, a small positive number (2) for no bonds (hydrogen ion) and a very large positive value (here, 32) for any other bond order. In this way we penalise the total bond orders that an atom does not naturally take on, and always tend towards the lowest score (i.e. the natural total bond order) wherever possible. When modifying the bond order of a particular connection, the total bond order scores of both atoms are calculated once for the current connection and again for the potential new bond order of the connection. If the new score is lower, the change of bond order is accepted.
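A minimal Python sketch of this accept/reject test is given below. The hydrogen penalties are those quoted above; the values for carbon and oxygen are hypothetical placeholders chosen only so that the example runs (the text states only that carbon prefers a total bond order of 4). The function names are also assumptions.

```python
# Penalty for each total bond order; anything not listed gets the LARGE penalty.
PENALTY = {
    "H": {1: 0, 0: 2},         # values quoted in the text
    "C": {4: 0, 3: 8, 2: 16},  # hypothetical values; only 'optimal = 4' is from the text
    "O": {2: 0, 1: 4},         # hypothetical values
}
LARGE = 32

def penalty(element, total_bond_order):
    return PENALTY[element].get(total_bond_order, LARGE)

def total_bond_order(atom, orders):
    # Sum the orders of every bond that involves this atom.
    return sum(order for (a, b), order in orders.items() if atom in (a, b))

def try_change(elements, orders, bond, new_order):
    """Set orders[bond] to new_order if that lowers the summed penalty of its two atoms."""
    i, j = bond
    ti, tj = total_bond_order(i, orders), total_bond_order(j, orders)
    delta = new_order - orders[bond]
    old = penalty(elements[i], ti) + penalty(elements[j], tj)
    new = penalty(elements[i], ti + delta) + penalty(elements[j], tj + delta)
    if new < old:
        orders[bond] = new_order
        return True
    return False

# Formaldehyde after rebonding: all single bonds.
elements = ["C", "O", "H", "H"]
orders = {(0, 1): 1, (0, 2): 1, (0, 3): 1}
accepted_co = try_change(elements, orders, (0, 1), 2)  # C-O -> C=O lowers the score
accepted_ch = try_change(elements, orders, (0, 2), 2)  # C-H -> C=H would raise it
```

The first change is accepted because both carbon and oxygen reach their preferred total bond orders; the second is rejected because it would over-bond both carbon and hydrogen.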

  1. Pattern Detection

    As with many other routines in Aten, a suitable pattern description is first detected for the system in order to isolate individual molecular species and make the algorithm as efficient as possible.

  2. Augmentation of Terminal Bonds

    Bonds that involve a heavy (i.e. non-hydrogen) atom connected to no other atoms (e.g. C=O in a ketone) are treated before all others. The bond order is modified such that the total bond order score for both atoms is as low as possible.

  3. Augmentation of Other Bonds

    Following optimisation of terminal bonds, all other bonds are modified using exactly the same procedure.

  4. Second Stage Augmentation

    The above two steps are enough to correctly determine multiple bonds in a chemically-correct molecule, provided no cyclic moieties are present in the system. The second stage is designed to correct improper augmentations within cycles, or shift existing augmentations around cycles such that other (missing) multiple bonds may be created.

    For each existing multiple bond in each cyclic structure in each pattern's molecule, a simple re-augmentation of the constituent bonds is first attempted in order to try and lower the total bond order score for the whole ring (i.e. the sum of the individual bond order scores of every atom present in the cycle). Then, each bond in the ring is considered in sequence. If the bond is a double bond, then we attempt to convert this into a single bond and make the two adjacent bonds in the ring double bonds in an attempt to 'aromaticise' the ring. The total bond order score is checked and, if lower than the previous score, the change is accepted. If not, the change is reversed and the next bond is considered. By performing these secondary adjustments the double-bond pattern of many complex (poly)aromatics can be correctly (and fully automatically) detected.

[1] "Automatic atom type and bond type perception in molecular mechanical calculations", J. Wang, W. Wang, P. A. Kollman, and D. A. Case, Journal of Molecular Graphics and Modelling, 25 (2), 247-260 (2006).

Autoellipsoids

TODO

Autopolyhedra

TODO

Rebond

The most common means of determining connectivity between a collection of atoms is based on a simple comparison of the actual distance between two atoms with the sum of their assigned radii:

[Equation image: bonding criterion relating the distance between atoms i and j to the sum of their radii and the tolerance alpha]

The two sigmas represent the radii of atoms i and j, which have coordinates xi, yi, zi and xj, yj, zj. The parameter alpha is an adjustable tolerance value to enable fine-tuning and, when using Aten's set of built-in radii,[1] usually lies between 1.0 and 2.0. For molecules or periodic systems of modest size the method can be used as is, but for large systems of many atoms the use of a double loop over atoms results in a very slow algorithm.
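The straightforward double loop can be sketched in Python as below. Since the exact form of the criterion is not reproduced in the text, the sketch assumes that two atoms are bonded when their separation is no greater than alpha times the sum of their radii; that form, the function name, the default tolerance, and the approximate radii in the example are all assumptions.

```python
import math

def rebond_naive(positions, radii, alpha=1.1):
    """Return bonded pairs (i, j), i < j, using a double loop over all atoms.

    Assumed criterion: distance(i, j) <= alpha * (radii[i] + radii[j]).
    """
    bonds = []
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= alpha * (radii[i] + radii[j]):
                bonds.append((i, j))
    return bonds

# A water molecule: O, H, H (coordinates in Angstroms, illustrative radii).
positions = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
radii = [0.66, 0.31, 0.31]
bonds = rebond_naive(positions, radii)
```

Both O-H pairs pass the test while the H-H pair does not. The cost grows with the square of the number of atoms, which is the slowdown referred to above.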

Aten overcomes this slowdown for larger systems by partitioning the system up into a series of overlapping cuboids. For a system of N particles in a periodic box (or an isolated system with an orthorhombic pseudo-box determined by the extreme positions of atoms), the volume is partitioned into a number of subvolumes of some minimum size in each direction. The minimum size of any one of the subvolume's dimensions is chosen relative to the maximum bond length possible given the largest elemental radius and the current bond tolerance alpha. A single loop over atoms is then performed to associate them with these subvolumes. Each atom belongs to at least one cuboid, determined by its absolute position in the system, and commonly belongs to one other cuboid, determined by adding half of the cuboid's dimensions on to the atom's position. While a little counterintuitive, potentially adding atoms to a neighbouring cuboid along this diagonal vector allows the final calculation of distances between pairs of atoms to consider only eight 'neighbouring' (more correctly 'overlapping') subvolumes rather than the 26 needed if each atom belongs exclusively to only one cuboid. Atoms that exist in subvolumes along the edges of the whole volume are also added to the subvolume(s) on the opposite side(s) to account for minimum image effects in periodic systems.

Once the effort has been made to assign atoms to cuboids, the final loops to calculate distances run over a much reduced subset of atom pairs owing to the partitioning. A loop over cuboids is performed, first considering all atom pairs within the same cuboid, and then extending this to consider distances between a particular atom of this central cuboid and its eight 'overlapping' neighbours.

There is some redundancy of atom pairs since the same pair may be considered twice when taking into account the overlapping cuboids. However, in the interests of facile book-keeping this is not checked for explicitly during the running of the algorithm.
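To show the general idea of spatial partitioning, here is a Python sketch of a conventional cell list for an isolated (non-periodic) system. Note that this is deliberately not Aten's overlapping-cuboid scheme: each atom is placed in exactly one cell, and all 26 surrounding cells plus the cell itself are searched, which is the simpler variant the text contrasts against. The bonding criterion (separation no greater than alpha times the sum of the radii), the function name, and the default tolerance are assumptions.

```python
import math
from collections import defaultdict
from itertools import product

def rebond_cells(positions, radii, alpha=1.1):
    """Return bonded pairs (i, j), i < j, using a simple cell list (no periodicity)."""
    if not positions:
        return []
    # Cell edge is at least the longest possible bond, so bonded atoms are
    # always in the same cell or in directly adjacent cells.
    cell_size = 2.0 * alpha * max(radii)
    origin = [min(p[k] for p in positions) for k in range(3)]

    # Single loop over atoms to assign each one to a cell.
    cells = defaultdict(list)
    for index, p in enumerate(positions):
        key = tuple(int((p[k] - origin[k]) // cell_size) for k in range(3))
        cells[key].append(index)

    bonds = []
    for key, members in cells.items():
        for offset in product((-1, 0, 1), repeat=3):
            other = tuple(key[k] + offset[k] for k in range(3))
            # .get() avoids creating empty cells while iterating.
            for j in cells.get(other, ()):
                for i in members:
                    # i < j means each pair is tested exactly once.
                    if i < j and math.dist(positions[i], positions[j]) <= alpha * (radii[i] + radii[j]):
                        bonds.append((i, j))
    return sorted(bonds)

# A straight chain of five carbon-like atoms, 1.5 Angstroms apart.
positions = [(1.5 * i, 0.0, 0.0) for i in range(5)]
radii = [0.76] * 5
bonds = rebond_cells(positions, radii)
```

Only atoms in the same or adjacent cells are ever compared, so the number of distance calculations grows roughly linearly with system size for a fairly uniform density.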

[1] "Covalent radii revisited", B. Cordero, V. Gómez, A. E. Platero-Prats, M. Revés, J. Echeverría, E. Cremades, F. Barragán and S. Alvarez, Dalton Trans., (2008) (DOI: http://dx.doi.org/10.1039/b801115j)