Chemistry

Molecule API

class tflon.chem.molecule.Molecule(graph, node_properties=[], edge_properties=[])

Defines a networkx-based representation of a molecule for dynamic ML computations

classmethod from_json(serialized)

Generate a networkx representation of a molecule from its serialized JSON form

Parameters: serialized (str) – The serialized JSON representation
heavy_atoms

Get a list of heavy atom indices

heavy_bonds

Get a list of heavy bonds

Returns: A list of tuples of atom indices
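As a rough illustration of what these two properties describe, the sketch below works on plain Python lists rather than a Molecule object. It assumes that "heavy" means any non-hydrogen atom and that a heavy bond is one whose two endpoints are both heavy atoms; neither definition is stated in the documentation above. The helper names and input shapes are hypothetical.

```python
def heavy_atom_indices(symbols):
    """Return the indices of all non-hydrogen atoms (assumed meaning of 'heavy')."""
    return [i for i, symbol in enumerate(symbols) if symbol != "H"]


def heavy_bond_pairs(symbols, bonds):
    """Return bonds, as tuples of atom indices, that join two heavy atoms."""
    heavy = set(heavy_atom_indices(symbols))
    return [(i, j) for i, j in bonds if i in heavy and j in heavy]


# Toy methanol: one carbon, one oxygen, four hydrogens.
symbols = ["C", "O", "H", "H", "H", "H"]
bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)]

atoms = heavy_atom_indices(symbols)
pairs = heavy_bond_pairs(symbols, bonds)
```

Here only the C–O bond survives, since every other bond involves a hydrogen.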
pymol

Convert this Molecule to a pybel.Molecule

topological_groups

Compute topological groups by exhaustive path enumeration. Warning: this is still very slow; use pymol_topological_groups instead.

Returns: A dictionary mapping atom indices to topological group (int)
Return type: atom_topology (dict)
tflon.chem.molecule.atom_to_parquet(molecules, data, output_fn=None, numeric_ids=False, topological_index=False)

Generate a parquet file containing atom-level data from a pandas.DataFrame.

The input atom-level data has indexes of the form molecule_id.atom_index, with one atom per row

Each column of the data is condensed to a list-of-lists format: all atoms for a given molecule appear on one row, with the molecule ID as the row ID.

Atoms are sorted by their index. Values in each list appear in this order. Atoms with missing data appear as NaN. This may occur with data files generated without hydrogens.

Parameters:
  • molecules (DataFrame) – A dataframe containing tflon.chem.Molecule objects
  • data (DataFrame) – Pandas dataframe containing atom-level data
  • output_fn (str) – Path to an output parquet file
Keyword Arguments:
  • numeric_ids (bool) – Expect molecule_id to be the index of the molecule instead of the title (default = False)
  • topological_index (bool) – Expect the atom index of the input data frame to contain the topological index (T), in the format molID.T.atomID (default = False)
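The condensing step described for atom_to_parquet can be pictured with a small pure-Python sketch. It is not the library's implementation: the function name, the dictionary inputs, and the use of 0-based atom indices are all assumptions made for illustration, and no parquet file is written.

```python
import math
from collections import defaultdict


def condense_atom_rows(rows, atom_counts):
    """Group per-atom values into one list per molecule.

    rows: dict mapping 'molecule_id.atom_index' to a value (hypothetical shape).
    atom_counts: dict mapping molecule_id to its total number of atoms.
    Assumes atom indices are 0-based, which the documentation does not state.
    """
    per_molecule = defaultdict(dict)
    for key, value in rows.items():
        # Split on the last dot so molecule IDs may themselves contain dots.
        molecule_id, atom_index = key.rsplit(".", 1)
        per_molecule[molecule_id][int(atom_index)] = value

    # Emit values in atom-index order; atoms without data become NaN.
    return {
        molecule_id: [per_molecule[molecule_id].get(i, math.nan) for i in range(n)]
        for molecule_id, n in atom_counts.items()
    }


# The third ethanol atom has no row, as with data generated without hydrogens.
rows = {"ethanol.1": 0.7, "ethanol.0": 0.2, "water.0": 0.9}
atom_counts = {"ethanol": 3, "water": 1}
condensed = condense_atom_rows(rows, atom_counts)
```

Each molecule ends up with one list whose positions line up with atom indices, which is the row layout the description above calls for.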
tflon.chem.molecule.bond_to_parquet(molecules, data, output_fn, numeric_ids=False, lonepair=False, topological_index=False)

Generate a parquet file containing bond-level data from a pandas.DataFrame.

The input bond-level data has indexes of the form molecule_id.atom1_index.atom2_index, with one bond per row

Each column of the data is condensed to a list of lists format, with all bonds for a given molecule appearing on one row, with the molecule ID as row ID.

Bonds are sorted by the natural sort on tuples of atom pair indices (i, j), with i < j. Values in each list appear in this order. Bonds with missing data appear as NaN. This may occur with data files generated without hydrogens.

Parameters:
  • molecules (DataFrame) – A dataframe containing tflon.chem.Molecule objects
  • data (DataFrame) – Pandas dataframe containing bond-level data
  • output_fn (str) – Path to an output parquet file
Keyword Arguments:
  • numeric_ids (bool) – Expect molecule_id to be the index of the molecule instead of the title (default = False)
  • topological_index (bool) – Expect the atom index of the input data frame to contain the topological index (T), in the format molID.T.atomID (default = False)
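The bond ordering described for bond_to_parquet (tuples (i, j) with i < j, in natural tuple order) can be sketched in plain Python as follows. The function name and dictionary input are hypothetical, and the sketch does not fill in NaN for bonds that are missing from the data.

```python
from collections import defaultdict


def condense_bond_rows(rows):
    """Group per-bond values into one ordered list per molecule.

    rows: dict mapping 'molecule_id.atom1_index.atom2_index' to a value
    (hypothetical input shape, not the library's DataFrame interface).
    """
    per_molecule = defaultdict(dict)
    for key, value in rows.items():
        # Split on the last two dots so molecule IDs may contain dots.
        molecule_id, a, b = key.rsplit(".", 2)
        # Normalize each pair so the smaller atom index comes first (i < j).
        i, j = sorted((int(a), int(b)))
        per_molecule[molecule_id][(i, j)] = value

    # Sorting tuples compares the first index, then the second.
    return {
        molecule_id: [bonds[pair] for pair in sorted(bonds)]
        for molecule_id, bonds in per_molecule.items()
    }


rows = {"m1.2.1": "c", "m1.0.3": "b", "m1.0.1": "a"}
ordered = condense_bond_rows(rows)
```

The input pair 2.1 is normalized to (1, 2), so it sorts after (0, 1) and (0, 3).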
tflon.chem.molecule.pymol_to_json(mol, addhs=True, kekulize=True, lonepair=False, coordinates=False)

Convert a pybel Molecule to JSON format for storage and import into the Molecule class.

If hydrogens are interspersed, pybel preserves the correct atom ordering.

Molecules must be properly kekulized, as only bond types 1, 2, and 3 are accepted.

Parameters:

mol (pybel.Molecule) – A molecule object obtained from pybel.read*

Keyword Arguments:
  • addhs (bool) – Add explicit hydrogens (default=True)
  • kekulize (bool) – Kekulize the molecule (default=True)
  • lonepair (bool) – Add dummy atoms and bonds for lone pairs (default=False)
  • coordinates (bool) – Include coordinates in the JSON representation; this overrides addhs (default=False)
tflon.chem.molecule.pymol_topological_groups(mol)

A faster implementation of topological indexes than Molecule.topological_groups

tflon.chem.molecule.rdmol_to_json(mol, addhs=True, kekulize=True, coordinates=False)

Convert an RDKit molecule (rdmol) to JSON format for storage and import into the Molecule class.

Please note: If hydrogens are interspersed, rdkit breaks any expected ordering constraints. If this is a problem, use pybel with pymol_to_json.

Molecules must be properly kekulized, as only bond types 1, 2, and 3 are accepted.

Parameters:

mol (rdmol) – A molecule object obtained from an rdkit reader

Keyword Arguments:
  • addhs (bool) – Add explicit hydrogens (default=True)
  • kekulize (bool) – Kekulize the molecule (default=True)
tflon.chem.molecule.sdf_to_parquet(sdf_fn, output_fn=None, numeric_ids=False, removehs=True, addhs=True, lonepair=False, coordinates=False)

Generate a parquet file containing tflon.chem.Molecule json blobs from an SDF.

The resulting file contains a single column ‘Structures’, indexed by pybel.Molecule.title

Parameters:

sdf_fn (str) – Path to the sdf file

Keyword Arguments:
  • output_fn (str) – Output parquet file path (default=None)
  • numeric_ids (bool) – Typecast pybel.Molecule.title to int before writing the index (default=False)
  • removehs (bool) – Remove hydrogens (default=True)
  • addhs (bool) – Add missing hydrogens (default=True)
  • lonepair (bool) – Add dummy atoms and bonds for lone pairs (default=False)
  • coordinates (bool) – Add coordinates to the JSON molecule representations; this overrides removehs (default=False)
class tflon.chem.data.MoleculeTable(data)

Convert a DataFrame containing serialized json format molecule structures to tflon.chem.Molecule objects.

See documentation of tflon.graph.GraphTable for instantiation

Toolkit API

class tflon.chem.toolkit.AtomProperties(*args, **kwargs)
build(molecule_table, atom_types=['H', 'B', 'C', 'N', 'O', 'F', 'P', 'S', 'Cl', 'As', 'Se', 'Br', 'I'], symbols=True)

Instantiate variables required for this module. This is called by __init__, with *args and **kwargs passed through from the __init__ argument list.

call()

Instantiate ops executing the module. This is the implementation of __call__, invoked by the base class.

Returns: A tensor or sequence of tensors
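The build/call split described above follows a common pattern, sketched here in plain Python. The Module base class and the Scale subclass are hypothetical stand-ins written for this example; they are not tflon's classes, and the real base class may do more (for example, variable scoping) around these hooks.

```python
class Module:
    """Hypothetical stand-in for a base class with build/call hooks."""

    def __init__(self, *args, **kwargs):
        # Constructor arguments are forwarded unchanged to build().
        self.build(*args, **kwargs)

    def __call__(self, *args, **kwargs):
        # Calling the instance dispatches to the subclass's call().
        return self.call(*args, **kwargs)

    def build(self, *args, **kwargs):
        raise NotImplementedError

    def call(self, *args, **kwargs):
        raise NotImplementedError


class Scale(Module):
    """Toy module: stores a factor in build(), applies it in call()."""

    def build(self, factor=2.0):
        self.factor = factor

    def call(self, values):
        return [self.factor * v for v in values]


module = Scale(factor=3.0)   # runs build(factor=3.0)
scaled = module([1.0, 2.0])  # runs call([1.0, 2.0])
```

Under this reading, arguments listed on build (such as molecule_table or atom_types) are supplied when constructing the toolkit class, and arguments listed on call are supplied when invoking the instance.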
class tflon.chem.toolkit.BondLength(*args, **kwargs)
build(graph_table)

Instantiate variables required for this module. This is called by __init__, with *args and **kwargs passed through from the __init__ argument list.

call()

Instantiate ops executing the module. This is the implementation of __call__, invoked by the base class.

Returns: A tensor or sequence of tensors
class tflon.chem.toolkit.GraphConverter(*args, **kwargs)
build(molecule_table)

Instantiate variables required for this module. This is called by __init__, with *args and **kwargs passed through from the __init__ argument list.

call(bond_properties=None)

Instantiate ops executing the module. This is the implementation of __call__, invoked by the base class.

Returns: A tensor or sequence of tensors