mlbench_core.aggregation¶
pytorch¶
Aggregation¶
- class mlbench_core.aggregation.pytorch.aggregation.Aggregation(use_cuda=False)[source]¶
  Aggregate updates / models from different processes.
  - Parameters
    use_cuda (bool) – Whether to use CUDA tensors for communication.
- abstract _agg(self, data, op, denom=None)[source]¶
  Aggregate data using the op operation (a minimal override sketch follows this class listing).
  - Parameters
    data (torch.Tensor) – A tensor to be aggregated.
    op (str) – Aggregation method, e.g. avg, sum, min, max.
    denom (torch.Tensor, optional) – Custom denominator to average by; use with op == custom_avg. (default: None)
  - Returns
    An aggregated tensor.
  - Return type
    torch.Tensor
- _agg_gradients_by_layer(self, model, op, denom=None)[source]¶
  Aggregate model gradients, each layer individually.
  - Parameters
    model (nn.Module) – Model to be averaged.
    op (str) – Aggregation method. Should be in ALLREDUCE_AGGREGATION_OPS.
    denom (torch.Tensor, optional) – Custom denominator to average by; use with op == custom_avg. (default: None)
- _agg_gradients_by_model(self, model, op, denom=None)[source]¶
  Aggregate model gradients, all layers at once.
  - Parameters
    model (nn.Module) – Model to be averaged.
    op (str) – Aggregation method. Should be in ALLREDUCE_AGGREGATION_OPS.
    denom (torch.Tensor, optional) – Custom denominator to average by; use with op == custom_avg. (default: None)
- _agg_weights_by_layer(self, model, op, denom=None)[source]¶
  Aggregate model weights, each layer individually.
  - Parameters
    model (nn.Module) – Model to be averaged.
    op (str) – Aggregation method. Should be in ALLREDUCE_AGGREGATION_OPS.
    denom (torch.Tensor, optional) – Custom denominator to average by; use with op == custom_avg. (default: None)
- _agg_weights_by_model(self, model, op, denom=None)[source]¶
  Aggregate model weights, all layers at once.
  - Parameters
    model (nn.Module) – Model to be averaged.
    op (str) – Aggregation method. Should be in ALLREDUCE_AGGREGATION_OPS.
    denom (torch.Tensor, optional) – Custom denominator to average by; use with op == custom_avg. (default: None)
Centralized (Synchronous) aggregation¶
All-Reduce¶
- class mlbench_core.aggregation.pytorch.centralized.AllReduceAggregation(world_size, divide_before=False, use_cuda=False)[source]¶
  Bases: mlbench_core.aggregation.pytorch.aggregation.Aggregation
  Aggregate weights / models from different processes using all-reduce aggregation.
  - Parameters
    world_size (int) – Current distributed world size.
    divide_before (bool) – Perform division before reduction (to avoid overflow).
    use_cuda (bool) – Whether to use CUDA tensors for reduction.
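A minimal usage sketch, assuming torch.distributed has already been initialized; the direct call to the inherited _agg_gradients_by_model helper is for illustration only:

    import torch
    import torch.distributed as dist

    from mlbench_core.aggregation.pytorch.centralized import AllReduceAggregation

    # Process group set up elsewhere, e.g.
    # dist.init_process_group("gloo", rank=rank, world_size=world_size)
    aggregator = AllReduceAggregation(
        world_size=dist.get_world_size(),
        divide_before=False,
        use_cuda=False,
    )

    model = torch.nn.Linear(10, 1)
    model(torch.randn(4, 10)).sum().backward()

    # Average local gradients across all workers before the optimizer step.
    aggregator._agg_gradients_by_model(model, op="avg")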
All-Reduce Horovod¶
- class mlbench_core.aggregation.pytorch.centralized.AllReduceAggregationHVD(world_size, divide_before=False, use_cuda=False)[source]¶
  Bases: mlbench_core.aggregation.pytorch.centralized.AllReduceAggregation
  Implements AllReduceAggregation using Horovod for communication.
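Construction mirrors AllReduceAggregation; a short sketch, assuming Horovod is installed and used for the process bookkeeping:

    import horovod.torch as hvd

    from mlbench_core.aggregation.pytorch.centralized import AllReduceAggregationHVD

    hvd.init()  # start Horovod's communication layer

    # Same constructor signature as AllReduceAggregation; the reductions
    # themselves are carried out through Horovod.
    aggregator = AllReduceAggregationHVD(
        world_size=hvd.size(),
        divide_before=False,
        use_cuda=False,
    )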
Sparsified Aggregation¶
- class mlbench_core.aggregation.pytorch.centralized.SparsifiedAggregation(model, use_cuda=False)[source]¶
  Bases: mlbench_core.aggregation.pytorch.aggregation.Aggregation
  Aggregate sparsified updates.
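To illustrate what a sparsified update is, here is a generic top-k sketch (not the SparsifiedAggregation implementation; the helper name and the choice of top-k selection are assumptions):

    import torch


    def topk_sparsify(grad, k):
        """Keep the k largest-magnitude entries of a gradient, zero the rest.

        Only the surviving entries (and their indices) need to be communicated,
        which is the idea behind aggregating sparsified updates.
        """
        flat = grad.flatten()
        _, idx = torch.topk(flat.abs(), k)
        sparse = torch.zeros_like(flat)
        sparse[idx] = flat[idx]
        return sparse.view_as(grad)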
Power Aggregation¶
- class mlbench_core.aggregation.pytorch.centralized.PowerAggregation(model, use_cuda=False, reuse_query=False, rank=1)[source]¶
  Bases: mlbench_core.aggregation.pytorch.aggregation.Aggregation
  Aggregate updates using power iteration and error feedback.
  - Parameters
    model (nn.Module) – Model which contains parameters for SGD.
    use_cuda (bool) – Whether to use CUDA tensors for aggregation.
    reuse_query (bool) – Whether to use warm start to initialize the power iteration.
    rank (int) – The rank of the gradient approximation.
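The underlying idea (in the spirit of PowerSGD) is to compress each 2D gradient block into a low-rank factorization found by power iteration, carrying the compression error over to the next step. A rank-1, single-worker sketch of one such step (all names here are assumptions, not PowerAggregation internals; in the distributed setting the factors p and q would each be all-reduced across workers):

    import torch


    def rank1_power_step(grad_matrix, query, error):
        """One illustrative power-iteration compression step with error feedback."""
        corrected = grad_matrix + error        # add back what was lost last round
        p = corrected @ query                  # power iteration: left factor
        p = p / (p.norm() + 1e-12)             # normalize the left factor
        q = corrected.t() @ p                  # right factor
        approx = torch.outer(p, q)             # rank-1 approximation of the block
        new_error = corrected - approx         # error fed back at the next step
        return approx, q, new_error            # q can warm-start the next query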
Decentralized (Asynchronous) aggregation¶
Decentralized Aggregation¶
- class mlbench_core.aggregation.pytorch.decentralized.DecentralizedAggregation(rank, neighbors, use_cuda=False)[source]¶
  Bases: mlbench_core.aggregation.pytorch.aggregation.Aggregation
  Aggregate updates in a decentralized manner.
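Instead of a global all-reduce, each worker exchanges tensors only with its neighbors and averages what it receives. A gossip-style sketch of that pattern (illustrative only, not the DecentralizedAggregation implementation; the uniform weighting and the point-to-point isend/recv exchange are assumptions), assuming a torch.distributed process group is initialized:

    import torch
    import torch.distributed as dist


    def neighbor_average(tensor, neighbors):
        """Average a local tensor with the tensors held by the given neighbor ranks."""
        gathered = [tensor]
        for nb in neighbors:
            buf = torch.empty_like(tensor)
            send_req = dist.isend(tensor, dst=nb)   # non-blocking send to neighbor
            dist.recv(buf, src=nb)                  # blocking receive from neighbor
            send_req.wait()
            gathered.append(buf)
        # Uniform mixing weights over self and all neighbors.
        return torch.stack(gathered).mean(dim=0)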