Welcome to tflon’s documentation!

What is tflon?

Tflon is an up-and-coming deep learning toolkit designed to streamline tensorflow model and data pipeline construction while enabling complex, dynamically structured operations on data to foster innovative deep learning research.

Currently, tflon’s most developed use case is executing computations on graph-structured data, such as chemical data, using the wave architecture.

Ultimately, tflon aims to provide three key features:

  1. A flexible data input model, which uses python multiprocessing to achieve high throughput without requiring custom tensorflow extensions for handling complex data.
  2. A modular model construction framework, which supports complex, dynamically structured operations on data, and can be mixed with pure tensorflow, or alternative APIs such as keras or sonnet.
  3. Seamless, intuitive integration with other tools including: data storage (pyarrow, parquet), distributed execution (horovod, MPI), profiling (pyflame, tfprof), resource monitoring (psutil, GPUtil), and others (see the short parquet example after this list).
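
The data-storage integration in item 3 can be previewed with plain pyarrow. The sketch below is a generic round trip through a parquet file; the file and column names are made up for illustration and tflon's own Table objects are not involved.

# Generic pandas <-> parquet round trip via pyarrow; illustrative names only.
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({'f0': [0.1, 0.2], 'f1': [1.0, 0.0]})
pq.write_table(pa.Table.from_pandas(df), 'table_A.pq')     # columnar on-disk format
roundtrip = pq.read_table('table_A.pq').to_pandas()        # back to a DataFrame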

Currently, we are still a long way off from achieving these goals. If you are interested in contributing, please let us know!

How does tflon work?

tflon is designed to simplify book-keeping in tensorflow, so that you can focus on producing high quality models. It provides five key functionalities:

  1. A data API which combines high-speed, column-based data formats with convenient transformation functions to convert between disk storage and tensorflow compatible forms (see Data API)
  2. A simple model API, which featurizes input data, tracks variables/parameters, coordinates training/inference, and provides model serialization (see Model API)
  3. A toolkit API, which provides custom built tensorflow ops, and higher level components (see Toolkit API)
  4. A distributed API, which enables running tensorflow in parallel on certain types of clusters (see Distributed API)
  5. Domain-specific APIs, which provide tools for working with data from specific application domains (e.g. chemistry, see Domain-specific APIs)

Data API

The tflon data pipeline slightly prioritizes flexibility over speed.

Data flow:

Main process |   Rows --+                               +--> Push tf.Queue
             |          |                               |
             |          |                               |
Subprocesses |          +--> featurize + to feed dict --+

For stochastic minibatch gradient descent models, there are three stages (a generic sketch of the worker pattern follows this list):

  1. An iterator provides batch dictionaries mapping input name (str) -> data (Table)
  2. One or more worker processes receive batches for pre-processing, which has four steps:
    1. Model objects pre-process the batch dictionary, changing or adding Table objects
    2. Module objects augment the batch dictionary with additional Table objects
    3. Table objects are converted to a tensor dictionary of name (str) -> tf.Tensor or tf.SparseTensorValue
    4. Tensor dictionaries are fed into a TensorQueue for transfer to device memory
  3. Data is loaded to a tower from the TensorQueue assigned to the tower
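
The worker pattern above can be pictured with plain python multiprocessing. The sketch below is a generic illustration of stages 1 and 2, not tflon's actual classes: a parent process pushes raw batch dictionaries to a worker, the worker featurizes them into feed dictionaries, and a queue hands the results back for feeding to the device.

# Generic illustration of the multiprocess featurization pattern described
# above (not tflon's actual implementation).
import multiprocessing as mp

def featurize(batch):
    # Stand-in for Model/Module pre-processing: convert a batch dictionary
    # of raw columns into a name -> array feed dictionary.
    return {name: values for name, values in batch.items()}

def worker(in_queue, out_queue):
    for batch in iter(in_queue.get, None):      # None is the shutdown sentinel
        out_queue.put(featurize(batch))
    out_queue.put(None)

if __name__ == '__main__':
    in_queue, out_queue = mp.Queue(), mp.Queue()
    proc = mp.Process(target=worker, args=(in_queue, out_queue))
    proc.start()

    # Stage 1: the iterator pushes raw batch dictionaries to the worker.
    for batch in [{'desc': [[0.1, 0.2]], 'targ': [[1.0]]}]:
        in_queue.put(batch)
    in_queue.put(None)

    # Consumer: pull ready feed dictionaries; in tflon these would be pushed
    # into a TensorQueue for transfer to device memory.
    for feed_dict in iter(out_queue.get, None):
        print(feed_dict)

    proc.join()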

Global gradient descent models (e.g. using OpenOptTrainer) use a single batch dictionary input (e.g. a TableFeed object), but store the resulting tensors in (GPU) memory for the entire training run using a TensorLoader.
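
For comparison, the full-batch idea can be imitated in plain tensorflow 1.x: the whole dataset sits on the device as constants and a scipy-backed optimizer drives the entire run. This is only a rough analogue of what OpenOptTrainer and TensorLoader do, using the tf.contrib.opt scipy interface rather than tflon's own trainer.

# Rough plain-TF1 analogue of full-batch training driven by a scipy optimizer
# (illustrative only; not tflon's TensorLoader or OpenOptTrainer internals).
import numpy as np
import tensorflow as tf

features = tf.constant(np.random.rand(100, 4), dtype=tf.float32)   # entire dataset resident on device
targets  = tf.constant(np.random.rand(100, 1), dtype=tf.float32)

weights = tf.Variable(tf.zeros([4, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(features, weights) - targets))

optimizer = tf.contrib.opt.ScipyOptimizerInterface(loss, method='L-BFGS-B', options={'maxiter': 100})

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    optimizer.minimize(sess)        # whole training run over the resident batch
    print(sess.run(loss))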

Model API

The model API is designed to be as lightweight as possible. User models extend the Model base class and override the _model(self, tower) method, where tower is an instance of Tower, which provides convenient methods for creating named input and output tensors and for creating, storing, and tracking variables. Unlike keras, models do not have to conform to a particular structure; they can take flexible forms and use pure tensorflow ops. A minimal sketch follows the constraints listed below.

There are two important constraints on model creation:

  1. Variables should be created with Tower.get_weight, Tower.get_bias and Tower.get_variable
  2. Inputs and outputs should be defined by Tower.add_input, Tower.add_target and Tower.add_output
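
The minimal sketch below follows these two constraints. The Tower method names come from the list above; the exact keyword arguments (shape=, the loss registration) are assumptions made for illustration and may differ from the real signatures.

import tensorflow as tf
import tflon

class TinyLinear(tflon.model.Model):
    def _model(self, tower):
        # Constraint 2: named inputs/targets/outputs go through the Tower.
        X = tower.add_input('x', shape=[None, 8])
        Y = tower.add_target('y', shape=[None, 1])

        # Constraint 1: variables are created via the Tower (argument names
        # here are assumptions).
        W = tower.get_weight('W', shape=[8, 1])
        b = tower.get_bias('b', shape=[1])

        # The body itself may use arbitrary pure-tensorflow ops.
        pred = tf.matmul(X, W) + b
        tower.add_output('pred', pred)
        self.add_loss('mse', tf.reduce_mean(tf.square(pred - Y)))  # as in the usage example below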

Toolkit API

The toolkit API extends tensorflow with additional ops and with higher level components that require extra machinery, such as the following (a generic illustration of the first item appears after the list):

  1. Data-based initialization (e.g. input data min-max windowing)
  2. Weight reuse (e.g. recursive network nodes)
  3. Special input featurizations (e.g. indexing atom pairs)
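
As a concrete picture of the first item, the snippet below shows data-based min-max windowing in plain tensorflow 1.x; it is a generic illustration, not the implementation behind tflon.toolkit.WindowInput.

# Generic data-based initialization: statistics computed from sample data are
# frozen into non-trainable variables that rescale inputs to roughly [0, 1].
import numpy as np
import tensorflow as tf

sample = np.random.rand(1000, 4).astype(np.float32)      # a sample of the input data

lo = tf.Variable(sample.min(axis=0), trainable=False)    # initialized from data, not trained
hi = tf.Variable(sample.max(axis=0), trainable=False)

inputs = tf.placeholder(tf.float32, shape=[None, 4])
windowed = (inputs - lo) / (hi - lo + 1e-8)              # min-max windowed inputs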

Distributed API

The distributed API leverages horovod and mpi4py to enable simple scaling of single-GPU tensorflow models to multi-GPU environments, possibly spread over many individual computing nodes. This framework will run in any environment supporting mpirun.

To convert a single-GPU model to distributed form, only a few steps are required:

  1. Data must be split into multiple shards (subsets of examples). Ideally, shards are split evenly among all the compute units. Shards are stored as subdirectories of a top-level directory, for example:
dataset/
    shard_X/
        table_A.pq
        table_B.pq
        table_C.pq
    shard_Y/
        table_A.pq
        table_B.pq
        table_C.pq
  2. The distributed environment is initialized by a call to tflon.distributed.init_distributed_resources at the beginning of your script.
  3. Data is loaded by tflon.distributed.make_distributed_table_feed(), which divides shards among nodes using a modulus strategy and wraps tables in a tflon.distributed.DistributedTable wrapper, which supports distributed data-initialization ops (e.g. min/max windowing). A sketch of the modulus shard-assignment idea follows this list.
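
The modulus strategy in step 3 amounts to assigning shard i to rank i % world_size. The sketch below illustrates that assignment with mpi4py (whose calls are real); the commented tflon.distributed calls only repeat the names from the steps above, and their argument lists are not shown in this document.

# Sketch of the modulus shard-assignment idea; launch with something like
#   mpirun -np 4 python this_script.py
import os
from mpi4py import MPI

rank = MPI.COMM_WORLD.Get_rank()
size = MPI.COMM_WORLD.Get_size()

shards = sorted(os.listdir('dataset'))                          # shard_X, shard_Y, ...
mine = [s for i, s in enumerate(shards) if i % size == rank]    # this rank's shards
print("rank %d handles shards: %s" % (rank, mine))

# In a tflon script, the equivalent work is handled for you:
#   tflon.distributed.init_distributed_resources(...)           # step 2
#   feed = tflon.distributed.make_distributed_table_feed(...)   # step 3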

An example for running distributed models can be found at tflon_core/examples/distributed.py

Domain-specific APIs

Tools for working with chemical and graph data are provided by tflon.chem and tflon.graph.

Usage example

This basic neural network example has five main parts: imports, model definition, data loading, training and evaluation.

  1. Imports
import tensorflow as tf
import pandas as pd
import tflon
from pkg_resources import resource_filename
  2. We create a simple neural network with two hidden layers, sigmoid output, cross-entropy loss and l2 regularization.
class NeuralNet(tflon.model.Model):
    def _model(self):
        I = self.add_input('desc', shape=[None, 210])
        T = self.add_target('targ', shape=[None, 1])

        net = tflon.toolkit.WindowInput() |\
              tflon.toolkit.Dense(20, activation=tf.tanh) |\
              tflon.toolkit.Dense(5, activation=tf.tanh) |\
              tflon.toolkit.Dense(1)
        L = net(I)

        self.add_output( "pred", tf.nn.sigmoid(L) )
        self.add_loss( "xent", tflon.toolkit.xent(T, L) )
        self.add_loss( "l2", tflon.toolkit.l2_penalty(self.weights) )
        self.add_metric( 'auc', tflon.toolkit.auc(T, L) )
  3. We load data from a tab-delimited file and create a feed consisting of two tables, which supply the two tensor inputs (descriptors and targets).
# Import data and create a feed
df = pd.read_csv( resource_filename('tflon_test.data', 'cyp.tsv'), sep='\t', index_col='ID' )
feed = tflon.data.TableFeed({
    'desc': tflon.data.Table(df[df.columns[:-1]]),
    'targ': tflon.data.Table(df[[df.columns[-1]]])
})
  4. We then instantiate and train the model with the OpenOptTrainer, an interface between tensorflow and the scipy optimizers.
  5. Finally, we perform inference on the training data and check the fit of the model using the AUC metric.
# Create a neural network tower
NN = NeuralNet()

# Create an L-BFGS trainer
trainer = tflon.train.OpenOptTrainer( iterations=100 )

with tf.Session():
    # Run the trainer
    NN.fit( feed, trainer, restarts=2 )

    # Perform inference
    auc = NN.evaluate( feed, query='auc' )
    print "AUC:", auc

More Examples

More examples can be found in tflon_core/python/examples, and at Usage Examples.
