mlbench_core.aggregation

pytorch

Aggregation

class mlbench_core.aggregation.pytorch.aggregation.Aggregation(use_cuda=False)[source]

Aggregate updates / models from different processes.

Parameters

use_cuda (bool) – Whether to use CUDA tensors for communication

abstract _agg(self, data, op, denom=None)[source]

Aggregate data using op operation.

Parameters
  • data (torch.Tensor) – A Tensor to be aggregated.

  • op (str) – Aggregation method, e.g. avg, sum, min, max.

  • denom (torch.Tensor, optional) – Custom denominator to average by. Use with op == custom_avg. (default: None)

Returns

An aggregated tensor.

Return type

torch.Tensor
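Since _agg is abstract, a concrete subclass supplies the actual communication. Below is a minimal sketch of such a subclass, assuming a torch.distributed process group has already been initialized; SimpleAllReduce and its op handling are illustrative and not part of mlbench_core.

    import torch.distributed as dist

    from mlbench_core.aggregation.pytorch.aggregation import Aggregation


    class SimpleAllReduce(Aggregation):
        """Illustrative aggregator: sum via all-reduce, then divide as requested."""

        def _agg(self, data, op, denom=None):
            # Sum the tensor across all processes in-place.
            dist.all_reduce(data, op=dist.ReduceOp.SUM)
            if op == "avg":
                data /= dist.get_world_size()
            elif op == "custom_avg" and denom is not None:
                data /= denom
            return data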

_agg_gradients_by_layer(self, model, op, denom=None)[source]

Aggregate the model's gradients, each layer individually.

Parameters
  • model (torch.nn.Module) – Model whose gradients are to be aggregated.

  • op (str) – Aggregation method. Should be in ALLREDUCE_AGGREGATION_OPS.

  • denom (torch.Tensor, optional) – Custom denominator to average by. Use with op == custom_avg. (default: None)

_agg_gradients_by_model(self, model, op, denom=None)[source]

Aggregate the model's gradients, all layers at once.

Parameters
  • model (torch.nn.Module) – Model whose gradients are to be aggregated.

  • op (str) – Aggregation method. Should be in ALLREDUCE_AGGREGATION_OPS.

  • denom (torch.Tensor, optional) – Custom denominator to average by. Use with op == custom_avg. (default: None)

_agg_weights_by_layer(self, model, op, denom=None)[source]

Aggregate the model's weights, each layer individually.

Parameters
  • model (torch.nn.Module) – Model whose weights are to be aggregated.

  • op (str) – Aggregation method. Should be in ALLREDUCE_AGGREGATION_OPS.

  • denom (torch.Tensor, optional) – Custom denominator to average by. Use with op == custom_avg. (default: None)

_agg_weights_by_model(self, model, op, denom=None)[source]

Aggregate the model's weights, all layers at once.

Parameters
  • model (torch.nn.Module) – Model whose weights are to be aggregated.

  • op (str) – Aggregation method. Should be in ALLREDUCE_AGGREGATION_OPS.

  • denom (torch.Tensor, optional) – Custom denominator to average by. Use with op == custom_avg. (default: None)

agg_grad(self, by_layer=False)[source]
agg_model(self, by_layer=False)[source]
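agg_grad and agg_model presumably select between the by-layer and whole-model helpers above and return the chosen callable; the sketch below assumes that behaviour and uses AllReduceAggregation (documented in the next section). op="avg" follows the _agg docstring; the values actually accepted are those in ALLREDUCE_AGGREGATION_OPS. A torch.distributed process group is assumed to be initialized.

    import torch
    import torch.nn as nn
    import torch.distributed as dist

    from mlbench_core.aggregation.pytorch.centralized import AllReduceAggregation

    model = nn.Linear(10, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = nn.CrossEntropyLoss()
    loader = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(3)]

    agg = AllReduceAggregation(world_size=dist.get_world_size())

    for inputs, targets in loader:
        optimizer.zero_grad()
        criterion(model(inputs), targets).backward()
        # by_layer=False: aggregate all gradients at once (assumed to dispatch
        # to _agg_gradients_by_model); by_layer=True would go layer by layer.
        agg.agg_grad(by_layer=False)(model, op="avg")
        optimizer.step()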

Centralized (Synchronous) aggregation

All-Reduce

class mlbench_core.aggregation.pytorch.centralized.AllReduceAggregation(world_size, divide_before=False, use_cuda=False)[source]

Bases: mlbench_core.aggregation.pytorch.aggregation.Aggregation

Aggregate weights / models from different processes using all-reduce aggregation.

Parameters
  • world_size (int) – Current distributed world size

  • divide_before (bool) – Perform division before reduction (to avoid overflow)

  • use_cuda (bool) – Whether to use CUDA tensors for reduction
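The divide_before flag changes the usual sum-then-divide averaging into divide-then-sum, which avoids overflow when gradients are kept in low precision. The snippet below only illustrates the numerical effect (the cross-worker sum is emulated by a scalar multiply; this is not library code):

    import torch

    world_size = 64
    g = torch.full((1,), 1024.0, dtype=torch.float16)

    # Sum first, divide last: the intermediate sum exceeds the float16 range.
    print((g * world_size) / world_size)   # tensor([inf], dtype=torch.float16)

    # Divide first, sum last: every intermediate value stays representable.
    print((g / world_size) * world_size)   # tensor([1024.], dtype=torch.float16)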

All-Reduce Horovod

class mlbench_core.aggregation.pytorch.centralized.AllReduceAggregationHVD(world_size, divide_before=False, use_cuda=False)[source]

Bases: AllReduceAggregation

Implements AllReduceAggregation using Horovod for communication.
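A minimal construction sketch for the Horovod-backed variant. It assumes Horovod is installed and initialized before the aggregator is created (an assumption; the entry above does not state it).

    import torch
    import horovod.torch as hvd

    from mlbench_core.aggregation.pytorch.centralized import AllReduceAggregationHVD

    hvd.init()
    agg = AllReduceAggregationHVD(
        world_size=hvd.size(),
        use_cuda=torch.cuda.is_available(),
    )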

Sparsified Aggregation

class mlbench_core.aggregation.pytorch.centralized.SparsifiedAggregation(model, use_cuda=False)[source]

Bases: mlbench_core.aggregation.pytorch.aggregation.Aggregation

Aggregate sparsified updates.

Power Aggregation

class mlbench_core.aggregation.pytorch.centralized.PowerAggregation(model, use_cuda=False, reuse_query=False, world_size=1, rank=1)[source]

Bases: mlbench_core.aggregation.pytorch.aggregation.Aggregation

Aggregate updates using power iteration and error feedback.

Parameters
  • model (nn.Module) – Model which contains parameters for SGD

  • use_cuda (bool) – Whether to use CUDA tensors for aggregation

  • reuse_query (bool) – Whether to use a warm start to initialize the power iteration

  • world_size (int) – Current distributed world size

  • rank (int) – The rank of the gradient approximation
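A minimal construction sketch, assuming a torch.distributed process group is already initialized; the model and hyperparameter values are illustrative. Note that rank here is the rank of the low-rank gradient approximation, not the process rank.

    import torch
    import torch.nn as nn
    import torch.distributed as dist

    from mlbench_core.aggregation.pytorch.centralized import PowerAggregation

    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
    agg = PowerAggregation(
        model,
        use_cuda=torch.cuda.is_available(),
        reuse_query=True,                   # warm-start the power iteration
        world_size=dist.get_world_size(),
        rank=2,                             # rank of the gradient approximation
    )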

Decentralized (Asynchronous) aggregation

Decentralized Aggregation

class mlbench_core.aggregation.pytorch.decentralized.DecentralizedAggregation(rank, neighbors, use_cuda=False)[source]

Bases: mlbench_core.aggregation.pytorch.aggregation.Aggregation

Aggregate updates in a decentralized manner.

Parameters
  • rank (int) – Rank of the current process

  • neighbors (list) – Ranks of the neighboring processes to aggregate with

  • use_cuda (bool) – Whether to use CUDA tensors for communication
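A minimal construction sketch, assuming torch.distributed is initialized; the ring topology (each process aggregates with its two neighbours) is illustrative, not prescribed by the class.

    import torch.distributed as dist

    from mlbench_core.aggregation.pytorch.decentralized import DecentralizedAggregation

    rank = dist.get_rank()
    world_size = dist.get_world_size()
    # Illustrative ring topology: aggregate with the previous and next process.
    neighbors = [(rank - 1) % world_size, (rank + 1) % world_size]

    agg = DecentralizedAggregation(rank=rank, neighbors=neighbors, use_cuda=False)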