mlbench_core.aggregation¶
pytorch¶
Aggregation¶
- class mlbench_core.aggregation.pytorch.aggregation.Aggregation(use_cuda=False)[source]¶
Aggregate updates / models from different processes.
- Parameters
use_cuda (bool) – Whether to use CUDA tensors for communication
- abstract _agg(self, data, op, denom=None)[source]¶
Aggregate data using the op operation.
- Parameters
data (torch.Tensor) – A tensor to be aggregated.
op (str) – Aggregation method, e.g. avg, sum, min, max.
denom (torch.Tensor, optional) – Custom denominator to average by. Use with op == custom_avg. (default: None)
- Returns
An aggregated tensor.
- Return type
torch.Tensor
- _agg_gradients_by_layer(self, model, op, denom=None)[source]¶
Aggregate model gradients, each layer individually.
- Parameters
model (torch.nn.Module) – Model to be averaged.
op (str) – Aggregation method. Should be in ALLREDUCE_AGGREGATION_OPS.
denom (torch.Tensor, optional) – Custom denominator to average by. Use with op == custom_avg. (default: None)
- _agg_gradients_by_model(self, model, op, denom=None)[source]¶
Aggregate model gradients, all layers at once.
- Parameters
model (torch.nn.Module) – Model to be averaged.
op (str) – Aggregation method. Should be in ALLREDUCE_AGGREGATION_OPS.
denom (torch.Tensor, optional) – Custom denominator to average by. Use with op == custom_avg. (default: None)
- _agg_weights_by_layer(self, model, op, denom=None)[source]¶
Aggregate model weights, each layer individually.
- Parameters
model (torch.nn.Module) – Model to be averaged.
op (str) – Aggregation method. Should be in ALLREDUCE_AGGREGATION_OPS.
denom (torch.Tensor, optional) – Custom denominator to average by. Use with op == custom_avg. (default: None)
- _agg_weights_by_model(self, model, op, denom=None)[source]¶
Aggregate model weights, all layers at once.
- Parameters
model (torch.nn.Module) – Model to be averaged.
op (str) – Aggregation method. Should be in ALLREDUCE_AGGREGATION_OPS.
denom (torch.Tensor, optional) – Custom denominator to average by. Use with op == custom_avg. (default: None)
Centralized (Synchronous) aggregation¶
All-Reduce¶
- class mlbench_core.aggregation.pytorch.centralized.AllReduceAggregation(world_size, divide_before=False, use_cuda=False)[source]¶
Bases:
mlbench_core.aggregation.pytorch.aggregation.Aggregation
Aggregate weights / models from different processes using all-reduce aggregation
- Parameters
world_size (int) – Current distributed world size
divide_before (bool) – Perform division before reduction (avoid overflow)
use_cuda (bool) – Use cuda tensors for reduction
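The divide_before flag trades a little precision for safety: dividing each worker's contribution by world_size before the reduction keeps the summed magnitudes bounded, which avoids overflow when values are large. A hypothetical sketch with plain Python floats (not the library's tensor code):

```python
# Why divide_before matters: summing first can overflow, dividing first cannot.
world_size = 4
vals = [1e308] * world_size          # near the float64 maximum

sum_then_divide = sum(vals) / world_size             # overflows to inf
divide_then_sum = sum(v / world_size for v in vals)  # stays finite (~1e308)

print(sum_then_divide)   # inf
```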
All-Reduce Horovod¶
- class mlbench_core.aggregation.pytorch.centralized.AllReduceAggregationHVD(world_size, divide_before=False, use_cuda=False)[source]¶
Bases:
AllReduceAggregation
Implements AllReduceAggregation using Horovod for communication.
Sparsified Aggregation¶
- class mlbench_core.aggregation.pytorch.centralized.SparsifiedAggregation(model, use_cuda=False)[source]¶
Bases:
mlbench_core.aggregation.pytorch.aggregation.Aggregation
Aggregate sparsified updates.
Power Aggregation¶
- class mlbench_core.aggregation.pytorch.centralized.PowerAggregation(model, use_cuda=False, reuse_query=False, world_size=1, rank=1)[source]¶
Bases:
mlbench_core.aggregation.pytorch.aggregation.Aggregation
Aggregate updates using power iteration and error feedback.
- Parameters
model (nn.Module) – Model which contains the parameters for SGD.
use_cuda (bool) – Whether to use CUDA tensors for aggregation.
reuse_query (bool) – Whether to use warm start to initialize the power iteration.
world_size (int) – Current distributed world size.
rank (int) – The rank of the gradient approximation.
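PowerAggregation follows the PowerSGD idea: each gradient matrix M is compressed into a low-rank outer product obtained via a power-iteration step, and the residual between M and its approximation is carried over through error feedback. A hypothetical rank-1 sketch on a 2x2 matrix (rank1_approx is illustrative, not the library's code):

```python
# Hypothetical rank-1 power-iteration sketch (illustrative only).
# Approximates a 2x2 gradient matrix M by the outer product p * q^T.
def rank1_approx(M, q):
    # One power-iteration step: p = M q (normalized), then q' = M^T p.
    p = [sum(M[i][j] * q[j] for j in range(2)) for i in range(2)]
    norm = sum(x * x for x in p) ** 0.5
    p = [x / norm for x in p]                     # assumes M q != 0
    q_new = [sum(M[i][j] * p[i] for i in range(2)) for j in range(2)]
    return [[p[i] * q_new[j] for j in range(2)] for i in range(2)]

M = [[2.0, 0.0], [0.0, 0.5]]           # dominant direction along axis 0
approx = rank1_approx(M, q=[1.0, 0.0])
print(approx)                          # [[2.0, 0.0], [0.0, 0.0]]
# The 0.5 entry is lost here; error feedback would add it back next round.
```

A warm start (reuse_query=True) initializes q from the previous round instead of afresh, so the iteration tracks the dominant direction across steps.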
Decentralized (Asynchronous) aggregation¶
Decentralized Aggregation¶
- class mlbench_core.aggregation.pytorch.decentralized.DecentralizedAggregation(rank, neighbors, use_cuda=False)[source]¶
Bases:
mlbench_core.aggregation.pytorch.aggregation.Aggregation
Aggregate updates in a decentralized manner.
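In decentralized aggregation there is no global all-reduce: each rank exchanges updates only with the neighbors given at construction time and averages over that local group. A hypothetical sketch of one such step on a ring topology (decentralized_step is illustrative, not the library's code):

```python
# Hypothetical sketch of one decentralized averaging step (illustrative only).
# Each rank replaces its value with the mean over itself and its neighbors.
def decentralized_step(values, neighbors):
    new = []
    for rank in range(len(values)):
        group = [rank] + neighbors[rank]
        new.append(sum(values[r] for r in group) / len(group))
    return new

values = [0.0, 3.0, 6.0, 9.0]                        # one value per rank
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # ring neighbors
print(decentralized_step(values, ring))              # [4.0, 3.0, 6.0, 5.0]
```

Repeated steps drive all ranks toward consensus without any rank ever seeing the full set of values.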