Distributed API

class tflon.distributed.distributed.DistributedTable(table)

Table wrapper implementing MPI reduce operations for table data.
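
Example (a minimal sketch; `local_table` is a hypothetical tflon.data.Table holding this rank's shard of the data, and only the constructor signature above is taken from this page):

    from tflon.distributed.distributed import DistributedTable

    # Wrap a per-rank table so reductions over its data can be carried out
    # across MPI processes. `local_table` is a placeholder tflon.data.Table.
    dist_table = DistributedTable(local_table)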

class tflon.distributed.distributed.DistributedTrainer(optimizer, iterations, **kwargs)

This trainer wraps a TensorFlow optimizer in a Horovod DistributedOptimizer and handles broadcasting the initialized model state.
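
Example (a minimal sketch, assuming a TensorFlow 1.x-style optimizer; the learning rate and iteration count are placeholders):

    import tensorflow as tf
    from tflon.distributed.distributed import DistributedTrainer

    # Wrap a standard TensorFlow optimizer; the trainer applies Horovod's
    # DistributedOptimizer and broadcasts the initialized model state.
    trainer = DistributedTrainer(tf.train.AdamOptimizer(learning_rate=1e-3),
                                 iterations=1000)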

tflon.distributed.distributed.get_rank()

Returns the rank (the index of the current distributed process)
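
Example (a minimal sketch tagging per-process output with the rank):

    from tflon.distributed.distributed import get_rank

    # Each process reports its own rank; output order is not deterministic.
    print("running as rank %d" % get_rank())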

tflon.distributed.distributed.init_distributed_resources()

Initialize MPI and Horovod for distributed training. This should be called before any other tflon or TensorFlow calls.
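
Example (a minimal sketch of the required call order; everything after the initialization call is a placeholder):

    from tflon.distributed.distributed import init_distributed_resources

    # Initialize MPI and Horovod first, before any other tflon or TensorFlow
    # calls (model construction, data loading, graph building, ...).
    init_distributed_resources()

    import tensorflow as tf  # TensorFlow may be used from this point on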

tflon.distributed.distributed.is_master()

Check whether the current process is the rank-0 MPI process.
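
Example (a minimal sketch gating work that should run on only one process):

    from tflon.distributed.distributed import is_master

    # Only the rank-0 process performs one-time I/O such as checkpointing
    # or logging; all other ranks skip this block.
    if is_master():
        print("rank-0 only: write checkpoints and logs here")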

tflon.distributed.distributed.make_distributed_table_feed(root, schema, master_table=None, partition_strategy='mod')

Load data shards for distributed training.

Parameters:
  • root (str) – The root directory containing shards. Each shard should be a subdirectory of this directory.
  • schema (tflon.data.Schema) – Schema mapping files to named tables as returned by tflon.model.Model.schema
Keyword Arguments:
  • partition_strategy (str) – Strategy for dividing shards among nodes. Supported values include:
    • ‘mod’: For process rank r and n total processes, divide shards evenly by assigning every n-th shard starting at shard r (shards r, r+n, r+2n, …)
    • ‘all’: Replicate all shards on all processes
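
Example (a minimal sketch; `MyModel` is a hypothetical tflon.model.Model subclass and the shard directory path is a placeholder, while the arguments follow the signature above):

    from tflon.distributed.distributed import (
        init_distributed_resources, make_distributed_table_feed)

    init_distributed_resources()

    model = MyModel()  # hypothetical tflon.model.Model subclass
    # Each shard is a subdirectory of /data/shards; shards are divided among
    # ranks using the 'mod' strategy described above.
    feed = make_distributed_table_feed('/data/shards', model.schema,
                                       partition_strategy='mod')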