API

amsr.encode.FromMol(mol, useGroups=True, stringent=True, randomize=False, canonical=False, useStereo=True)

Convert RDKit Mol to AMSR

Parameters:
  • mol (Mol) – RDKit Mol

  • useGroups (Optional[bool]) – use group symbols/abbreviations (default: True)

  • stringent (Optional[bool]) – try to exclude unstable or synthetically inaccessible molecules (default: True)

  • randomize (Optional[bool]) – randomize order of graph traversal (default: False)

  • canonical (Optional[bool]) – canonical order of graph traversal (default: False)

  • useStereo (Optional[bool]) – encode stereochemistry (default: True)

Return type:

str

Returns:

AMSR

amsr.encode.FromMolToTokens(mol, useGroups=True, stringent=True, randomize=False, canonical=False, useStereo=True)

Convert RDKit Mol to list of AMSR tokens

Parameters:
  • mol (Mol) – RDKit Mol

  • useGroups (Optional[bool]) – use group symbols/abbreviations (default: True)

  • stringent (Optional[bool]) – try to exclude unstable or synthetically inaccessible molecules (default: True)

  • randomize (Optional[bool]) – randomize order of graph traversal (default: False)

  • canonical (Optional[bool]) – canonical order of graph traversal (default: False)

  • useStereo (Optional[bool]) – encode stereochemistry (default: True)

Return type:

list[str]

Returns:

list of AMSR tokens

amsr.encode.FromSmiles(s, useGroups=True, stringent=True, randomize=False, canonical=False, useStereo=True)

Convert SMILES to AMSR

Parameters:
  • s (str) – SMILES

  • useGroups (Optional[bool]) – use group symbols/abbreviations (default: True)

  • stringent (Optional[bool]) – try to exclude unstable or synthetically inaccessible molecules (default: True)

  • randomize (Optional[bool]) – randomize order of graph traversal (default: False)

  • canonical (Optional[bool]) – canonical order of graph traversal (default: False)

  • useStereo (Optional[bool]) – encode stereochemistry (default: True)

Return type:

str

Returns:

AMSR

amsr.encode.FromSmilesToTokens(s, useGroups=True, stringent=True, randomize=False, canonical=False, useStereo=True)

Convert SMILES to list of AMSR tokens

Parameters:
  • s (str) – SMILES

  • useGroups (Optional[bool]) – use group symbols/abbreviations (default: True)

  • stringent (Optional[bool]) – try to exclude unstable or synthetically inaccessible molecules (default: True)

  • randomize (Optional[bool]) – randomize order of graph traversal (default: False)

  • canonical (Optional[bool]) – canonical order of graph traversal (default: False)

  • useStereo (Optional[bool]) – encode stereochemistry (default: True)

Return type:

list[str]

Returns:

list of AMSR tokens

amsr.decode.ToMol(s, stringent=True, dihedral=None)

Convert AMSR to an RDKit Mol

Parameters:
  • s (str) – AMSR

  • stringent (Optional[bool]) – try to exclude unstable or synthetically inaccessible molecules

  • dihedral (Optional[dict[tuple[int, int, int, int], int]]) – return dictionary of dihedral angles, where keys are indices and values are angles in degrees

Return type:

Mol

Returns:

RDKit Mol

amsr.decode.ToSmiles(s, stringent=True)

Convert AMSR to SMILES

Parameters:
  • s (str) – AMSR

  • stringent (Optional[bool]) – try to exclude unstable or synthetically inaccessible molecules

Return type:

str

Returns:

SMILES
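
The encode and decode functions above can be combined into a simple round trip. The following is a minimal sketch, assuming the `amsr` package is installed and that the functions are importable from the `amsr.encode` and `amsr.decode` modules as documented here. The helper name `smiles_round_trip` is hypothetical and not part of the library.

```python
def smiles_round_trip(smiles: str) -> tuple[str, list[str], str]:
    """Encode a SMILES string to AMSR, tokenize it, and decode it back.

    Hypothetical helper for illustration; only the amsr functions it
    calls are taken from the API documented above.
    """
    # Imported inside the function so that defining the helper does not
    # itself require amsr to be installed.
    from amsr.encode import FromSmiles, FromSmilesToTokens
    from amsr.decode import ToSmiles

    amsr_string = FromSmiles(smiles)      # AMSR as a single string
    tokens = FromSmilesToTokens(smiles)   # the same molecule as a token list
    decoded = ToSmiles(amsr_string)       # back to SMILES
    return amsr_string, tokens, decoded
```

Calling `smiles_round_trip("CCO")` returns the AMSR string, the token list, and the decoded SMILES. The decoded SMILES may be written differently from the input even when it describes the same molecule, so comparing the strings directly is not a reliable equality test.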

amsr.check.CheckAMSR(s, stringent=True)

Decode AMSR and check for valid molecule

Parameters:
  • s (str) – AMSR

  • stringent (Optional[bool]) – try to exclude unstable or synthetically inaccessible molecules

Return type:

bool

Returns:

valid molecule?

amsr.check.CheckMol(m1, stringent=True)

Do round trip from RDKit Mol m1 to AMSR and back. Compare InChI strings (with -FixedH) before and after the round trip.

Parameters:
  • m1 (Mol) – RDKit Mol

  • stringent (Optional[bool]) – try to exclude unstable or synthetically inaccessible molecules

Return type:

bool

Returns:

do InChI strings match?

amsr.check.CheckSmiles(s, stringent=True)

Convert SMILES s to RDKit Mol, then do round trip to AMSR. Compare InChI strings (with -FixedH) before and after round trip.

Parameters:
  • s (str) – SMILES

  • stringent (Optional[bool]) – try to exclude unstable or synthetically inaccessible molecules

Return type:

bool

Returns:

do InChI strings match?
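
The round-trip check described for CheckMol and CheckSmiles can be approximated by hand. The sketch below assumes RDKit was built with InChI support and that `amsr` is installed. The helper name `inchi_round_trip_matches` is hypothetical, and the library's own checks may differ in detail (for example in how `stringent` is applied or how the decoded Mol is prepared).

```python
def inchi_round_trip_matches(smiles: str) -> bool:
    """Compare fixed-H InChI strings before and after a Mol -> AMSR -> Mol trip.

    Hypothetical helper; a sketch of the idea, not the library's CheckSmiles.
    """
    # Imported inside the function so that defining the helper does not
    # itself require rdkit or amsr to be installed.
    from rdkit import Chem
    from amsr.encode import FromMol
    from amsr.decode import ToMol

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    mol_back = ToMol(FromMol(mol))  # encode to AMSR, then decode again

    # The -FixedH option asks for the fixed-hydrogen InChI layer.
    before = Chem.MolToInchi(mol, options="-FixedH")
    after = Chem.MolToInchi(mol_back, options="-FixedH")
    return before == after
```

In practice `CheckSmiles(s)` is the shorter route; the sketch only spells out what "compare InChI strings before and after" means.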

amsr.groups.DecodeGroups(s)
Return type:

str

Parameters:

s (str)

amsr.groups.EncodeGroups(s)
Return type:

list[str]

Parameters:

s (list[str])

amsr.groups.Groups()

Keys are functional group abbreviations, values are lists of one or more AMSR strings consisting only of atom/bond tokens. May be modified, but InitializeGroups() must be called after modification.

Return type:

dict[str, list[str]]

Returns:

Groups dictionary

amsr.groups.InitializeGroups()

Initialize tree and compile regular expression for converting between group abbreviations and tokens. Must be called after modification of Groups() dictionary.

Return type:

None

amsr.tokens.ToTokens(s)

Convert AMSR string to a list of tokens

Parameters:

s (str) – AMSR

Return type:

list[str]

Returns:

list of tokens

class amsr.morph.Morph(s, t)

morph between two molecules, by taking the minimum-edit pathway between their string representations

Parameters:
  • s (list[str]) – list of AMSR tokens

  • t (list[str]) – list of AMSR tokens

classmethod fromSmiles(s, t)

create morph from two SMILES strings

Parameters:
  • s (str) – SMILES

  • t (str) – SMILES

Returns:

Morph object

showAsSmiles()

display each mol in the morph as SMILES
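
To illustrate what a minimum-edit pathway between two token lists looks like, here is a self-contained sketch based on the standard Levenshtein dynamic program. It is not the library's implementation: the function name `edit_path` is hypothetical, the tokens are placeholders rather than real AMSR tokens, and Morph may choose a different pathway among equally short ones.

```python
def edit_path(s: list[str], t: list[str]) -> list[list[str]]:
    """Return token lists leading from s to t, one edit per step."""
    n, m = len(s), len(t)
    # d[i][j] = edit distance between the suffixes s[i:] and t[j:]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n:
                d[i][j] = m - j  # only insertions remain
            elif j == m:
                d[i][j] = n - i  # only deletions remain
            elif s[i] == t[j]:
                d[i][j] = d[i + 1][j + 1]
            else:
                d[i][j] = 1 + min(d[i + 1][j + 1], d[i + 1][j], d[i][j + 1])

    # Walk the table from the start, recording the sequence after each edit.
    # At every point the current sequence is t[:j] + s[i:].
    path = [list(s)]
    i = j = 0
    while i < n or j < m:
        if i < n and j < m and s[i] == t[j]:
            i, j = i + 1, j + 1  # tokens agree, no edit needed
            continue
        if i < n and j < m and d[i][j] == d[i + 1][j + 1] + 1:
            i, j = i + 1, j + 1  # substitution
        elif i < n and d[i][j] == d[i + 1][j] + 1:
            i += 1               # deletion
        else:
            j += 1               # insertion
        path.append(t[:j] + s[i:])
    return path


# Placeholder tokens: one substitution followed by one insertion.
steps = edit_path(["a", "b", "c"], ["a", "x", "c", "d"])
```

Each consecutive pair in `steps` differs by exactly one token edit, so the number of intermediate sequences equals the edit distance between the two inputs.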

class amsr.markov.Markov(mols)

generate molecules using a simple Markov model

Parameters:

mols (list[Mol]) – RDKit Mols, from which to draw token frequencies

generate(nmax=-1)

generate an AMSR string

Parameters:

nmax (Optional[int]) – maximum length of string

Returns:

AMSR string

generateTokens(nmax=-1)

generate sequence of tokens

Parameters:

nmax (Optional[int]) – maximum number of tokens to generate

Returns:

sequence of tokens

class amsr.modifier.Modifier(model_path, nDeleteAvg=2, nAddMax=10, nReplaceAvg=3)

modify a molecule by shuffling atom order, deleting tokens, then adding tokens

Parameters:
  • model_path (str)

  • nDeleteAvg (Optional[int]) – average number of tokens to delete

  • nAddMax (Optional[int]) – maximum number of tokens to add

  • nReplaceAvg (Optional[int]) – average number of token replacements

modify(mol)

modify given molecule

Parameters:

mol (Mol) – molecule to modify

Return type:

Mol

Returns:

modified molecule

modifySmiles(s)

modify given SMILES

Parameters:

s (str) – SMILES to modify

Return type:

str

Returns:

modified SMILES