Distributed API
- class tflon.distributed.distributed.DistributedTable(table)
  Table wrapper implementing MPI Reduce ops for table data.
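The Reduce semantics can be illustrated without MPI: conceptually, a Reduce over table data combines each rank's copy of the table elementwise with some operation (summation here). The sketch below is purely illustrative; the dict-of-lists table layout and the `reduce_tables` helper are assumptions for this example, not part of tflon.

```python
def reduce_tables(tables, op=sum):
    """Elementwise Reduce across per-rank copies of a table.

    `tables` is a list (one entry per rank) of dicts mapping
    column name -> list of numeric values. Hypothetical helper,
    shown only to illustrate what an MPI Reduce over table data
    computes.
    """
    columns = tables[0].keys()
    reduced = {}
    for col in columns:
        # zip aligns row i of every rank's copy; op combines them
        reduced[col] = [op(vals) for vals in zip(*(t[col] for t in tables))]
    return reduced

# Two "ranks" holding the same table layout with different values
rank0 = {'loss': [1.0, 2.0]}
rank1 = {'loss': [3.0, 4.0]}
print(reduce_tables([rank0, rank1]))  # {'loss': [4.0, 6.0]}
```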
- class tflon.distributed.distributed.DistributedTrainer(optimizer, iterations, **kwargs)
  Trainer that wraps a TensorFlow optimizer in a Horovod DistributedOptimizer and handles broadcasting of initialized model state.
- tflon.distributed.distributed.get_rank()
  Returns the rank (the index of the current distributed process).
- tflon.distributed.distributed.init_distributed_resources()
  Initialize MPI and Horovod for distributed training. This should be called before any tflon or TensorFlow calls.
- tflon.distributed.distributed.is_master()
  Check whether the current process is the rank-0 MPI process.
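A common pattern is to gate logging and checkpointing on the master process. The standalone sketch below mimics the documented semantics (`is_master()` is true exactly when the rank is 0) without requiring Horovod; reading the Open MPI `OMPI_COMM_WORLD_RANK` environment variable, with a single-process fallback of 0, is an assumption for illustration only and is not how tflon obtains the rank.

```python
import os

def get_rank():
    # Stand-in for tflon.distributed.distributed.get_rank(); the real
    # function queries the initialized Horovod context. Here we read
    # the Open MPI rank environment variable, defaulting to 0 for a
    # single-process run (an assumption for this sketch).
    return int(os.environ.get('OMPI_COMM_WORLD_RANK', 0))

def is_master():
    # Rank 0 is the master process, matching the documented semantics.
    return get_rank() == 0

if is_master():
    # Only one process should write logs or checkpoints.
    print("master process: handle logging/checkpointing here")
```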
- tflon.distributed.distributed.make_distributed_table_feed(root, schema, master_table=None, partition_strategy='mod')
  Load data shards for distributed training.

  Parameters:
  - root (str) – The root directory containing shards. Each shard should be a subdirectory of this directory.
  - schema (tflon.data.Schema) – Schema mapping files to named tables, as returned by tflon.model.Model.schema

  Keyword Arguments:
  - partition_strategy (str) – Strategy for dividing shards among processes. Supported values:
    - 'mod': for process rank r out of n processes, assign every n-th shard starting at shard r (shards r, r+n, r+2n, …)
    - 'all': replicate all shards on every process
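The two partition strategies can be sketched in plain Python. The `partition_shards` helper below is an illustrative reimplementation of the documented behavior, not tflon's actual code: under 'mod', rank r with n total processes takes shards r, r+n, r+2n, …; under 'all', every process sees every shard.

```python
def partition_shards(shards, rank, size, strategy='mod'):
    """Assign a subset of `shards` to one process.

    Illustrative reimplementation of the documented partition
    strategies (hypothetical helper, not part of tflon).
    """
    if strategy == 'mod':
        # Rank r takes shards r, r+size, r+2*size, ...
        return shards[rank::size]
    elif strategy == 'all':
        # Replicate every shard on every process
        return list(shards)
    raise ValueError("unknown partition_strategy: %s" % strategy)

shards = ['shard%d' % i for i in range(7)]
print(partition_shards(shards, rank=1, size=3))  # ['shard1', 'shard4']
```

With 7 shards and 3 processes, 'mod' gives ranks 0, 1, and 2 three, two, and two shards respectively, so the division is as even as the counts allow.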