Chemistry
Molecule API
-
class tflon.chem.molecule.Molecule(graph, node_properties=[], edge_properties=[])
Defines a networkx-based representation of a molecule for dynamic ML computations.
-
classmethod from_json(serialized)
Generate a networkx representation of a molecule from its serialized JSON format.
Parameters: serialized (str) – The serialized JSON representation
-
heavy_atoms
Get a list of heavy atom indices.
-
heavy_bonds
Get a list of heavy bonds.
Returns: A list of tuples of atom indices
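As a rough illustration of what heavy_atoms and heavy_bonds return, the sketch below filters a toy molecule held in plain Python lists. It assumes that "heavy" means any non-hydrogen atom and that a heavy bond joins two heavy atoms; the element list, the zero-based indices, and both helper functions are hypothetical and do not use the tflon API.

```python
def heavy_atoms(elements):
    """Return the indices of all non-hydrogen atoms (assumed meaning of 'heavy')."""
    return [i for i, element in enumerate(elements) if element != "H"]


def heavy_bonds(elements, bonds):
    """Return the bonds whose two endpoints are both heavy atoms."""
    heavy = set(heavy_atoms(elements))
    return [(i, j) for i, j in bonds if i in heavy and j in heavy]


# Toy methanol: one carbon, one oxygen, four hydrogens (hypothetical input).
elements = ["C", "O", "H", "H", "H", "H"]
bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)]

atoms = heavy_atoms(elements)
pairs = heavy_bonds(elements, bonds)
```

Here only the C-O bond survives the filter, because every other bond has a hydrogen at one end.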
-
pymol
Convert this Molecule to a pybel.Molecule.
-
topological_groups
Compute topological groups by exhaustive path enumeration. Warning: this is still very slow; use pymol_topological_groups instead.
Returns: A dictionary mapping each atom index to its topological group (int)
Return type: atom_topology (dict)
-
tflon.chem.molecule.atom_to_parquet(molecules, data, output_fn=None, numeric_ids=False, topological_index=False)
Generate a parquet file containing atom-level data from a pandas.DataFrame.
The input atom-level data has indexes of the form molecule_id.atom_index, with one atom per row.
Each column of the data is condensed to a list-of-lists format: all atoms for a given molecule appear on one row, with the molecule ID as the row ID.
Atoms are sorted by their index, and the values in each list appear in this order. Atoms with missing data appear as NaN. This may occur with data files generated without hydrogens.
Parameters:
- molecules (DataFrame) – A dataframe containing tflon.chem.Molecule objects
- data (DataFrame) – Pandas dataframe containing atom-level data
- output_fn (str) – Path to an output parquet file
Keyword Arguments:
- numeric_ids (bool) – Expect molecule_id to be the index of the molecule instead of the title (default=False)
- topological_index (bool) – Expect the atom index of the input data frame to contain the topological index (T) in the format molID.T.atomID (default=False)
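The condensing step described above can be pictured with a small pure-Python sketch. It is not the tflon implementation: the helper name condense_atom_rows, the atom_counts argument, and the zero-based atom indices are assumptions made for illustration, and plain dictionaries stand in for the pandas DataFrame.

```python
import math
from collections import defaultdict


def condense_atom_rows(rows, atom_counts):
    """Group per-atom values by molecule, ordered by atom index.

    rows: dict mapping 'molecule_id.atom_index' to a value
    atom_counts: dict mapping molecule_id to its total number of atoms
    """
    by_molecule = defaultdict(dict)
    for key, value in rows.items():
        # Split on the last '.' so a molecule ID may itself contain dots.
        mol_id, atom_index = key.rsplit(".", 1)
        by_molecule[mol_id][int(atom_index)] = value

    condensed = {}
    for mol_id, n_atoms in atom_counts.items():
        values = by_molecule.get(mol_id, {})
        # Atoms with no input row (for example hydrogens) become NaN.
        condensed[mol_id] = [values.get(i, math.nan) for i in range(n_atoms)]
    return condensed


rows = {"mol1.2": 0.3, "mol1.0": 0.1, "mol2.0": 0.7, "mol2.1": 0.9}
atom_counts = {"mol1": 4, "mol2": 2}
condensed = condense_atom_rows(rows, atom_counts)
```

Each molecule ends up with one list whose length equals its atom count, so downstream code can align the values with atom indices without consulting the original row keys.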
-
tflon.chem.molecule.bond_to_parquet(molecules, data, output_fn, numeric_ids=False, lonepair=False, topological_index=False)
Generate a parquet file containing bond-level data from a pandas.DataFrame.
The input bond-level data has indexes of the form molecule_id.atom1_index.atom2_index, with one bond per row.
Each column of the data is condensed to a list-of-lists format: all bonds for a given molecule appear on one row, with the molecule ID as the row ID.
Bonds are sorted by the natural sort on tuples of atom pair indices (i, j), with i < j, and the values in each list appear in this order. Bonds with missing data appear as NaN. This may occur with data files generated without hydrogens.
Parameters:
- molecules (DataFrame) – A dataframe containing tflon.chem.Molecule objects
- data (DataFrame) – Pandas dataframe containing bond-level data
- output_fn (str) – Path to an output parquet file
Keyword Arguments:
- numeric_ids (bool) – Expect molecule_id to be the index of the molecule instead of the title (default=False)
- topological_index (bool) – Expect the atom index of the input data frame to contain the topological index (T) in the format molID.T.atomID (default=False)
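The bond ordering described above, tuples (i, j) with i < j in natural sort order, can be sketched in plain Python. This is only an illustration under stated assumptions: condense_bond_rows and its bonds argument are hypothetical names, indices are taken to be zero-based, and dictionaries replace the pandas DataFrame.

```python
import math


def condense_bond_rows(rows, bonds):
    """Group per-bond values by molecule, ordered by sorted atom pairs.

    rows: dict mapping 'molecule_id.atom1_index.atom2_index' to a value
    bonds: dict mapping molecule_id to an iterable of (atom1, atom2) pairs
    """
    values = {}
    for key, value in rows.items():
        # Split on the last two dots so a molecule ID may contain dots.
        mol_id, a1, a2 = key.rsplit(".", 2)
        # Normalise each pair so that i < j, regardless of input order.
        pair = tuple(sorted((int(a1), int(a2))))
        values[(mol_id, pair)] = value

    condensed = {}
    for mol_id, pairs in bonds.items():
        ordered = sorted(tuple(sorted(p)) for p in pairs)
        # Bonds with no input row become NaN.
        condensed[mol_id] = [values.get((mol_id, p), math.nan) for p in ordered]
    return condensed


rows = {"mol1.2.1": 1.5, "mol1.0.1": 1.0}
bonds = {"mol1": [(1, 0), (1, 2), (2, 3)]}
bond_values = condense_bond_rows(rows, bonds)
```

Normalising each pair before sorting means an input row keyed as mol1.2.1 lands in the same slot as the bond (1, 2).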
-
tflon.chem.molecule.pymol_to_json(mol, addhs=True, kekulize=True, lonepair=False, coordinates=False)
Convert a pybel Molecule to JSON format for storage and import into the Molecule class.
If hydrogens are interspersed, pybel preserves the correct atom ordering.
Molecules must be properly kekulized, as only bond types 1, 2, and 3 are accepted.
Parameters: mol (pybel.Molecule) – A molecule object obtained from pybel.read*
Keyword Arguments:
- addhs (bool) – Add explicit hydrogens (default=True)
- kekulize (bool) – Kekulize the molecule (default=True)
- lonepair (bool) – Add dummy atoms and bonds for lone pairs (default=False)
- coordinates (bool) – Include coordinates in the JSON representation; this overrides addhs (default=False)
-
tflon.chem.molecule.pymol_topological_groups(mol)
A faster implementation of topological indexes than Molecule.topological_groups.
-
tflon.chem.molecule.rdmol_to_json(mol, addhs=True, kekulize=True, coordinates=False)
Convert an rdmol to JSON format for storage and import into the Molecule class.
Note: if hydrogens are interspersed, rdkit breaks any expected ordering constraints. If this is a problem, use pybel with pymol_to_json.
Molecules must be properly kekulized, as only bond types 1, 2, and 3 are accepted.
Parameters: mol (rdmol) – A molecule object obtained from an rdkit reader
Keyword Arguments:
- addhs (bool) – Add explicit hydrogens (default=True)
- kekulize (bool) – Kekulize the molecule (default=True)
-
tflon.chem.molecule.sdf_to_parquet(sdf_fn, output_fn=None, numeric_ids=False, removehs=True, addhs=True, lonepair=False, coordinates=False)
Generate a parquet file containing tflon.chem.Molecule JSON blobs from an SDF.
The resulting file contains a single column, ‘Structures’, indexed by pybel.Molecule.title.
Parameters: sdf_fn (str) – Path to the SDF file
Keyword Arguments:
- output_fn (str) – Output parquet file path (default=None)
- numeric_ids (bool) – Typecast pybel.Molecule.title to int before writing the index
- removehs (bool) – Remove hydrogens (default=True)
- addhs (bool) – Add missing hydrogens (default=True)
- lonepair (bool) – Add dummy atoms and bonds for lone pairs (default=False)
- coordinates (bool) – Add coordinates to the JSON molecule representations; this overrides removehs (default=False)
-
class tflon.chem.data.MoleculeTable(data)
Convert a DataFrame containing serialized JSON format molecule structures to tflon.chem.Molecule objects.
See the documentation of tflon.graph.GraphTable for instantiation.
Toolkit API
-
class tflon.chem.toolkit.AtomProperties(*args, **kwargs)
-
build(molecule_table, atom_types=['H', 'B', 'C', 'N', 'O', 'F', 'P', 'S', 'Cl', 'As', 'Se', 'Br', 'I'], symbols=True)
Instantiate variables required for this module. This is called by __init__, with *args and **kwargs passed from the __init__ argument list.
-
call()
Instantiate ops executing the module. This is the implementation of __call__, invoked by the base class.
Returns: A tensor or sequence of tensors
-
-
class tflon.chem.toolkit.BondLength(*args, **kwargs)
-
build(graph_table)
Instantiate variables required for this module. This is called by __init__, with *args and **kwargs passed from the __init__ argument list.
-
call()
Instantiate ops executing the module. This is the implementation of __call__, invoked by the base class.
Returns: A tensor or sequence of tensors
-
-
class tflon.chem.toolkit.GraphConverter(*args, **kwargs)
-
build(molecule_table)
Instantiate variables required for this module. This is called by __init__, with *args and **kwargs passed from the __init__ argument list.
-
call(bond_properties=None)
Instantiate ops executing the module. This is the implementation of __call__, invoked by the base class.
Returns: A tensor or sequence of tensors
-