Distributed API

class tflon.distributed.distributed.DistributedTable(table)

Table wrapper implementing MPI Reduce operations for table data.

class tflon.distributed.distributed.DistributedTrainer(optimizer, iterations, **kwargs)

This trainer wraps a TensorFlow optimizer in Horovod's DistributedOptimizer and handles broadcasting the initialized model state.


Returns the rank (the index of the current process among the distributed processes).


Initialize MPI and Horovod for distributed training. Call this before any other tflon or TensorFlow calls.


Check whether the current process is the rank 0 MPI process.

tflon.distributed.distributed.make_distributed_table_feed(root, schema, master_table=None, partition_strategy='mod')

Load data shards for distributed training.

Parameters:

  • root (str) – The root directory containing shards. Each shard should be a subdirectory of this directory.
  • schema (tflon.data.Schema) – Schema mapping files to named tables, as returned by tflon.model.Model.schema

Keyword Arguments:

  • partition_strategy (str) – Strategy for dividing shards among nodes. Supported values include:
      * 'mod': For process rank r and process count n, divide shards evenly by giving each process every n-th shard, starting at offset r (shards r, r+n, r+2n, and so on)
      * 'all': Replicate all shards on all processes
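
The two partition strategies can be illustrated with a small standalone sketch. This is not tflon's implementation: the helper name partition_shards, the shard names, and the choice to sort shards before assignment are assumptions made for illustration only. It shows how a 'mod'-style rule assigns shard i to the process whose rank equals i mod n, while 'all' gives every process the full list.

```python
def partition_shards(shards, rank, size, strategy="mod"):
    """Return the shards assigned to process `rank` out of `size` processes.

    Hypothetical helper for illustration; not part of the tflon API.
    """
    if size <= 0 or not 0 <= rank < size:
        raise ValueError("rank must satisfy 0 <= rank < size")
    # Sort so that every process sees the shards in the same order
    # (an assumption; the real library may order shards differently).
    ordered = sorted(shards)
    if strategy == "mod":
        # Shard i goes to the process whose rank equals i mod size.
        return [s for i, s in enumerate(ordered) if i % size == rank]
    if strategy == "all":
        # Every process receives every shard.
        return list(ordered)
    raise ValueError("unknown partition strategy: %r" % (strategy,))


# Five hypothetical shard directories split across two processes.
shards = ["shard_%d" % i for i in range(5)]
rank0 = partition_shards(shards, rank=0, size=2)
rank1 = partition_shards(shards, rank=1, size=2)
everything = partition_shards(shards, rank=1, size=2, strategy="all")
```

With 'mod', the per-process lists are disjoint and together cover every shard, so each shard is read by exactly one process. With 'all', each process holds the complete dataset, which costs more memory and I/O per process.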