mlbench_core.aggregation

pytorch

Aggregation

class mlbench_core.aggregation.pytorch.aggregation.Aggregation(use_cuda=False)[source]

Aggregate updates / models from different processes.

Parameters

use_cuda (bool) – Whether to use CUDA tensors for communication

abstract _agg(self, data, op, denom=None)[source]

Aggregate data using op operation.

Parameters
  • data (torch.Tensor) – A Tensor to be aggregated.

  • op (str) – Aggregation method, e.g. avg, sum, min, max.

  • denom (torch.Tensor, optional) – Custom denominator to average by. Use with op == custom_avg. (default: None)

Returns

An aggregated tensor.

Return type

torch.Tensor
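Since _agg is abstract, a concrete subclass supplies the actual communication. Below is a minimal sketch of such a subclass, assuming a torch.distributed process group has already been initialized; SimpleAllReduce and its op handling are illustrative and not part of mlbench_core.

    import torch.distributed as dist

    from mlbench_core.aggregation.pytorch.aggregation import Aggregation


    class SimpleAllReduce(Aggregation):
        """Illustrative aggregator: sum via all-reduce, then divide as requested."""

        def _agg(self, data, op, denom=None):
            # Sum the tensor across all processes in-place.
            dist.all_reduce(data, op=dist.ReduceOp.SUM)
            if op == "avg":
                data /= dist.get_world_size()
            elif op == "custom_avg" and denom is not None:
                data /= denom
            return data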

_agg_gradients_by_layer(self, model, op, denom=None)[source]

Aggregate the model's gradients, each layer individually.

Parameters
  • model (torch.nn.Module) – Model whose gradients are to be aggregated.

  • op (str) – Aggregation method. Should be in ALLREDUCE_AGGREGATION_OPS.

  • denom (torch.Tensor, optional) – Custom denominator to average by. Use with op == custom_avg. (default: None)

_agg_gradients_by_model(self, model, op, denom=None)[source]

Aggregate the model's gradients, all layers at once.

Parameters
  • model (torch.nn.Module) – Model whose gradients are to be aggregated.

  • op (str) – Aggregation method. Should be in ALLREDUCE_AGGREGATION_OPS.

  • denom (torch.Tensor, optional) – Custom denominator to average by. Use with op == custom_avg. (default: None)

_agg_weights_by_layer(self, model, op, denom=None)[source]

Aggregate the model's weights, each layer individually.

Parameters
  • model (torch.nn.Module) – Model whose weights are to be aggregated.

  • op (str) – Aggregation method. Should be in ALLREDUCE_AGGREGATION_OPS.

  • denom (torch.Tensor, optional) – Custom denominator to average by. Use with op == custom_avg. (default: None)

_agg_weights_by_model(self, model, op, denom=None)[source]

Aggregate the model's weights, all layers at once.

Parameters
  • model (torch.nn.Module) – Model whose weights are to be aggregated.

  • op (str) – Aggregation method. Should be in ALLREDUCE_AGGREGATION_OPS.

  • denom (torch.Tensor, optional) – Custom denominator to average by. Use with op == custom_avg. (default: None)

agg_grad(self, by_layer=False)[source]
agg_model(self, by_layer=False)[source]
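agg_grad and agg_model presumably select between the by-layer and whole-model helpers above and return the chosen callable; the sketch below assumes that behaviour and uses AllReduceAggregation (documented in the next section). op="avg" follows the _agg docstring; the values actually accepted are those in ALLREDUCE_AGGREGATION_OPS. A torch.distributed process group is assumed to be initialized.

    import torch
    import torch.nn as nn
    import torch.distributed as dist

    from mlbench_core.aggregation.pytorch.centralized import AllReduceAggregation

    model = nn.Linear(10, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = nn.CrossEntropyLoss()
    loader = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(3)]

    agg = AllReduceAggregation(world_size=dist.get_world_size())

    for inputs, targets in loader:
        optimizer.zero_grad()
        criterion(model(inputs), targets).backward()
        # by_layer=False: aggregate all gradients at once (assumed to dispatch
        # to _agg_gradients_by_model); by_layer=True would go layer by layer.
        agg.agg_grad(by_layer=False)(model, op="avg")
        optimizer.step()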

Centralized (Synchronous) aggregation

All-Reduce

class mlbench_core.aggregation.pytorch.centralized.AllReduceAggregation(world_size, divide_before=False, use_cuda=False)[source]

Bases: mlbench_core.aggregation.pytorch.aggregation.Aggregation

Aggregate weights / models from different processes using all-reduce aggregation.

Parameters
  • world_size (int) – Current distributed world size

  • divide_before (bool) – Perform division before reduction (to avoid overflow)

  • use_cuda (bool) – Whether to use CUDA tensors for reduction
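The divide_before flag changes the usual sum-then-divide averaging into divide-then-sum, which avoids overflow when gradients are kept in low precision. The snippet below only illustrates the numerical effect (the cross-worker sum is emulated by a scalar multiply; this is not library code):

    import torch

    world_size = 64
    g = torch.full((1,), 1024.0, dtype=torch.float16)

    # Sum first, divide last: the intermediate sum exceeds the float16 range.
    print((g * world_size) / world_size)   # tensor([inf], dtype=torch.float16)

    # Divide first, sum last: every intermediate value stays representable.
    print((g / world_size) * world_size)   # tensor([1024.], dtype=torch.float16)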

All-Reduce Horovod

class mlbench_core.aggregation.pytorch.centralized.AllReduceAggregationHVD(world_size, divide_before=False, use_cuda=False)[source]

Bases: AllReduceAggregation

Implements AllReduceAggregation using Horovod for communication.
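A minimal construction sketch for the Horovod-backed variant. It assumes Horovod is installed and initialized before the aggregator is created (an assumption; the entry above does not state it).

    import torch
    import horovod.torch as hvd

    from mlbench_core.aggregation.pytorch.centralized import AllReduceAggregationHVD

    hvd.init()
    agg = AllReduceAggregationHVD(
        world_size=hvd.size(),
        use_cuda=torch.cuda.is_available(),
    )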

Sparsified Aggregation

class mlbench_core.aggregation.pytorch.centralized.SparsifiedAggregation(model, use_cuda=False)[source]

Bases: mlbench_core.aggregation.pytorch.aggregation.Aggregation

Aggregate sparsified updates.

Power Aggregation

class mlbench_core.aggregation.pytorch.centralized.PowerAggregation(model, use_cuda=False, reuse_query=False, world_size=1, rank=1)[source]

Bases: mlbench_core.aggregation.pytorch.aggregation.Aggregation

Aggregate updates using power iteration and error feedback.

Parameters
  • model (nn.Module) – Model which contains parameters for SGD

  • use_cuda (bool) – Whether to use CUDA tensors for aggregation

  • reuse_query (bool) – Whether to use a warm start to initialize the power iteration

  • world_size (int) – Current distributed world size

  • rank (int) – The rank of the gradient approximation
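A minimal construction sketch, assuming a torch.distributed process group is already initialized; the model and hyperparameter values are illustrative. Note that rank here is the rank of the low-rank gradient approximation, not the process rank.

    import torch
    import torch.nn as nn
    import torch.distributed as dist

    from mlbench_core.aggregation.pytorch.centralized import PowerAggregation

    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
    agg = PowerAggregation(
        model,
        use_cuda=torch.cuda.is_available(),
        reuse_query=True,                   # warm-start the power iteration
        world_size=dist.get_world_size(),
        rank=2,                             # rank of the gradient approximation
    )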

Decentralized (Asynchronous) aggregation

Decentralized Aggregation

class mlbench_core.aggregation.pytorch.decentralized.DecentralizedAggregation(rank, neighbors, use_cuda=False)[source]

Bases: mlbench_core.aggregation.pytorch.aggregation.Aggregation

Aggregate updates in a decentralized manner.

Parameters
  • rank (int) – Rank of the current process

  • neighbors (list) – Ranks of the neighboring processes to aggregate with

  • use_cuda (bool) – Whether to use CUDA tensors for communication
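A minimal construction sketch, assuming torch.distributed is initialized; the ring topology (each process aggregates with its two neighbours) is illustrative, not prescribed by the class.

    import torch.distributed as dist

    from mlbench_core.aggregation.pytorch.decentralized import DecentralizedAggregation

    rank = dist.get_rank()
    world_size = dist.get_world_size()
    # Illustrative ring topology: aggregate with the previous and next process.
    neighbors = [(rank - 1) % world_size, (rank + 1) % world_size]

    agg = DecentralizedAggregation(rank=rank, neighbors=neighbors, use_cuda=False)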