mlbench_core.models
pytorch
Since Kuang Liu (https://github.com/kuangliu/pytorch-cifar) has already implemented many classical neural network models, we use their implementations directly for
VGG
resnet
Contains definitions for Residual Networks.
Residual networks were originally proposed in [HZRS16a] and later improved in [HZRS16b]. Here we refer to the settings in [HZRS16a] as v1 and those in [HZRS16b] as v2.
Since torchvision already implements
ResNet-18
ResNet-34
ResNet-50
ResNet-101
ResNet-152
for ImageNet, here we only implement the remaining models
ResNet-20
ResNet-32
ResNet-44
ResNet-56
for the CIFAR-10 dataset. Note that the torchvision implementation uses projection shortcuts by default.
ResNetCIFAR
- class mlbench_core.models.pytorch.resnet.ResNetCIFAR(resnet_size, bottleneck, num_classes, version=_DEFAULT_RESNETCIFAR_VERSION)
Basic ResNet implementation.
- Parameters
resnet_size (int) – Number of layers
bottleneck (bool) – Whether to use a bottleneck layer (not implemented)
num_classes (int) – Number of output classes
version (int) – ResNet version (1 or 2). Default: 1
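The version parameter selects between the two block designs. As a rough sketch based on the two papers (not on the mlbench source), the order of operations inside one basic block differs as follows; residual_block_ops is a hypothetical helper that only returns operation names.

```python
def residual_block_ops(version: int) -> list[str]:
    """Order of operations inside one basic residual block (illustrative only)."""
    if version == 1:
        # v1 [HZRS16a]: post-activation. BN and ReLU follow each convolution,
        # and the final ReLU is applied after adding the shortcut.
        return ["conv3x3", "bn", "relu", "conv3x3", "bn", "add_shortcut", "relu"]
    if version == 2:
        # v2 [HZRS16b]: full pre-activation. BN and ReLU come before each
        # convolution, and nothing is applied after adding the shortcut.
        return ["bn", "relu", "conv3x3", "bn", "relu", "conv3x3", "add_shortcut"]
    raise ValueError("version must be 1 or 2")


print(residual_block_ops(1))
print(residual_block_ops(2))
```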
RNN
Google Neural Machine Translation
Model
- class mlbench_core.models.pytorch.gnmt.GNMT(vocab_size, hidden_size=1024, num_layers=4, dropout=0.2, share_embedding=True, fusion=True)
GNMT v2 model
- Parameters
vocab_size (int) – size of vocabulary (number of tokens)
hidden_size (int) – internal hidden size of the model
num_layers (int) – number of layers, applies to both encoder and decoder
dropout (float) – probability of dropout (in encoder and decoder)
share_embedding (bool) – if True embeddings are shared between encoder and decoder
- decode(self, inputs, context, inference=False)
Applies the decoder to inputs, given the context from the encoder.
- Parameters
inputs (torch.Tensor) – tensor with inputs (seq_len, batch)
context – context from the encoder
inference – if True inference mode, if False training mode
- Returns
torch.Tensor
- encode(self, inputs, lengths)
Applies the encoder to inputs with the given input sequence lengths.
- Parameters
inputs (torch.Tensor) – tensor with inputs (seq_len, batch)
lengths – vector with sequence lengths (excluding padding)
- Returns
torch.Tensor
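encode expects time-major inputs of shape (seq_len, batch) together with the unpadded lengths. The following pure-Python sketch shows how a batch of token-id lists could be padded and transposed into that layout. The padding index PAD = 0 and the longest-first sorting (a common requirement of packed-sequence APIs) are assumptions, not taken from mlbench; to_time_major is a hypothetical helper.

```python
PAD = 0  # Assumption: hypothetical padding index; the real one comes from the vocabulary


def to_time_major(sentences: list[list[int]]) -> tuple[list[list[int]], list[int]]:
    """Pad token-id sequences and lay them out as (seq_len, batch).

    Returns the padded time-major batch and the unpadded lengths,
    with sentences sorted longest-first.
    """
    ordered = sorted(sentences, key=len, reverse=True)
    lengths = [len(s) for s in ordered]
    max_len = lengths[0] if lengths else 0
    padded = [s + [PAD] * (max_len - len(s)) for s in ordered]
    # Transpose batch-major (batch, seq_len) into time-major (seq_len, batch).
    time_major = [list(step) for step in zip(*padded)]
    return time_major, lengths


batch, lengths = to_time_major([[5, 6], [7, 8, 9], [4]])
print(batch)    # [[7, 5, 4], [8, 6, 0], [9, 0, 0]]
print(lengths)  # [3, 2, 1]
```

Each row of batch is one time step, and lengths tells the encoder where the padding starts in each column.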
- generate(self, inputs, context, beam_size)
Autoregressive generator that works with the SequenceGenerator class. Executes the decoder (in inference mode), applies log_softmax and topK for inference with beam search decoding.
- Parameters
inputs – tensor with inputs to the decoder
context – context from the encoder
beam_size – beam size for the generator
- Returns
(words, logprobs, scores, new_context), where
words: indices of topK tokens
logprobs: log probabilities of topK tokens
scores: scores from the attention module (for coverage penalty)
new_context: new decoder context, includes new hidden states for decoder RNN cells
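The log_softmax and topK steps of generate can be illustrated on a single toy distribution. This sketch uses plain Python lists instead of tensors; log_softmax and top_k here are hypothetical helpers, not the torch functions of the same purpose.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def top_k(logprobs: list[float], k: int) -> tuple[list[int], list[float]]:
    """Return (indices, values) of the k largest log-probabilities, best first."""
    words = sorted(range(len(logprobs)), key=lambda i: logprobs[i], reverse=True)[:k]
    return words, [logprobs[i] for i in words]


logits = [2.0, 0.5, 1.0, -1.0]  # toy decoder scores over a 4-token vocabulary
words, logprobs = top_k(log_softmax(logits), k=2)
print(words)  # [0, 2]
```

With beam_size = k, the returned indices and log-probabilities correspond to the words and logprobs entries of the tuple above.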
BahdanauAttention
Encoder
- class mlbench_core.models.pytorch.gnmt.encoder.ResidualRecurrentEncoder(vocab_size, hidden_size=1024, num_layers=4, dropout=0.2, embedder=None, init_weight=0.1)
Encoder with Embedding, LSTM layers, residual connections and optional dropout.
The first LSTM layer is bidirectional and uses the variable sequence length API; the remaining (num_layers-1) layers are unidirectional. Residual connections are enabled from the third LSTM layer onwards, and dropout is applied to the inputs of the LSTM layers.
- Parameters
vocab_size – size of vocabulary
hidden_size – hidden size for LSTM layers
num_layers – number of LSTM layers, 1st layer is bidirectional
dropout – probability of dropout (on input to LSTM layers)
embedder – instance of nn.Embedding; if None, the constructor will create a new embedding layer
init_weight – range for the uniform initializer
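The layer layout described above can be summarized as a small plan. This sketch assumes that the embedding size equals hidden_size, that the second layer reads the concatenated forward and backward outputs (2 * hidden_size features), and that residual connections wrap the third and every later layer; encoder_layer_plan is a hypothetical helper.

```python
def encoder_layer_plan(hidden_size: int = 1024, num_layers: int = 4) -> list[dict]:
    """Describe the encoder's LSTM stack layer by layer (illustrative sketch).

    Assumptions: embeddings have hidden_size features, and residual
    connections wrap the third and every later layer.
    """
    if num_layers < 2:
        raise ValueError("this sketch assumes num_layers >= 2")
    plan = [
        # Layer 1: bidirectional, reads the embeddings.
        {"input_size": hidden_size, "bidirectional": True, "residual": False},
        # Layer 2: unidirectional, reads forward and backward outputs concatenated.
        {"input_size": 2 * hidden_size, "bidirectional": False, "residual": False},
    ]
    # Layers 3..num_layers: unidirectional, each wrapped by a residual connection.
    for _ in range(num_layers - 2):
        plan.append({"input_size": hidden_size, "bidirectional": False, "residual": True})
    return plan


for i, layer in enumerate(encoder_layer_plan(), start=1):
    print(i, layer)
```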
Decoder
- class mlbench_core.models.pytorch.gnmt.decoder.RecurrentAttention(input_size=1024, context_size=1024, hidden_size=1024, num_layers=1, dropout=0.2, init_weight=0.1, fusion=True)
LSTM wrapped with an attention module.
- Parameters
input_size (int) – number of features in input tensor
context_size (int) – number of features in output from encoder
hidden_size (int) – internal hidden size
num_layers (int) – number of layers in LSTM
dropout (float) – probability of dropout (on input to LSTM layer)
init_weight (float) – range for the uniform initializer
- forward(self, inputs, hidden, context, context_len)
Execute RecurrentAttention.
- Parameters
inputs (torch.Tensor) – tensor with inputs
hidden – hidden state for LSTM layer
context – context tensor from encoder
context_len – vector of encoder sequence lengths
- Returns
(rnn_outputs, hidden, attn_output, attn_scores)
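RecurrentAttention uses context_len to ignore padded encoder positions. The sketch below shows additive (Bahdanau-style) attention for one query with such length masking: score_j = v · tanh(W_q q + W_k k_j), a softmax over the valid positions, then a weighted sum of the encoder outputs. The weight names w_q, w_k and v are hypothetical, and the mlbench attention module may add details (such as normalization) that are not shown here.

```python
import math


def matvec(matrix: list[list[float]], x: list[float]) -> list[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def additive_attention(query, keys, context_len, w_q, w_k, v):
    """Additive attention for one query over one source sentence.

    keys holds one encoder output per source position; positions at or
    beyond context_len are padding and receive zero weight.
    """
    if not 1 <= context_len <= len(keys):
        raise ValueError("context_len must be between 1 and len(keys)")
    q_proj = matvec(w_q, query)
    scores = []
    for j, key in enumerate(keys):
        if j >= context_len:
            scores.append(float("-inf"))  # mask padded positions
        else:
            k_proj = matvec(w_k, key)
            scores.append(sum(vi * math.tanh(a + b) for vi, a, b in zip(v, q_proj, k_proj)))
    # Softmax over the valid positions; masked positions get weight 0.
    m = max(scores[:context_len])
    exps = [0.0 if s == float("-inf") else math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the encoder outputs.
    dim = len(keys[0])
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
    return context, weights


identity = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]  # the third position is padding
context, weights = additive_attention([1.0, 0.0], keys, 2, identity, identity, [1.0, 1.0])
print(weights)  # the last weight is 0.0
```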
- class mlbench_core.models.pytorch.gnmt.decoder.Classifier(in_features, out_features, init_weight=0.1)
Fully-connected classifier
- Parameters
in_features (int) – number of input features
out_features (int) – number of output features (size of vocabulary)
init_weight (float) – range for the uniform initializer
- class mlbench_core.models.pytorch.gnmt.decoder.ResidualRecurrentDecoder(vocab_size, hidden_size=1024, num_layers=4, dropout=0.2, embedder=None, init_weight=0.1, fusion=True)
Decoder with Embedding, LSTM layers, attention, residual connections and optional dropout.
The attention implemented in this module is different from the attention discussed in the GNMT arXiv paper. In this model the output from the first LSTM layer of the decoder goes into the attention module, then the re-weighted context is concatenated with the inputs to all subsequent LSTM layers in the decoder at the current timestep.
Residual connections are enabled from the 3rd LSTM layer onwards, and dropout is applied to the inputs of the LSTM layers.
- Parameters
vocab_size (int) – size of vocabulary
hidden_size (int) – hidden size for LSTM layers
num_layers (int) – number of LSTM layers
dropout (float) – probability of dropout (on input to LSTM layers)
embedder (nn.Embedding) – if None, the constructor will create a new embedding layer
init_weight (float) – range for the uniform initializer
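Concatenating the attention context to the inputs of the later layers changes their input size. Assuming the context has hidden_size features (as with the defaults context_size=1024 and hidden_size=1024), that embeddings also have hidden_size features, and that residuals wrap the third and later layers, the per-layer input sizes can be sketched as follows; decoder_layer_plan is a hypothetical helper.

```python
def decoder_layer_plan(hidden_size: int = 1024, num_layers: int = 4) -> list[dict]:
    """Describe the decoder's LSTM stack layer by layer (illustrative sketch).

    Assumption: the attention context has hidden_size features.
    """
    if num_layers < 2:
        raise ValueError("this sketch assumes num_layers >= 2")
    plan = [
        # Layer 1 (inside the attention wrapper): reads the target embeddings,
        # and its output is used as the attention query.
        {"input_size": hidden_size, "concat_attention": False, "residual": False},
        # Layer 2: previous output concatenated with the attention context.
        {"input_size": 2 * hidden_size, "concat_attention": True, "residual": False},
    ]
    # Layers 3..num_layers: same concatenated input, plus a residual connection.
    for _ in range(num_layers - 2):
        plan.append({"input_size": 2 * hidden_size, "concat_attention": True, "residual": True})
    return plan


for i, layer in enumerate(decoder_layer_plan(), start=1):
    print(i, layer)
```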
Appends the hidden vector h to the list of internal hidden states.
- Parameters
h – hidden vector
- forward(self, inputs, context, inference=False)
Execute the decoder.
- Parameters
inputs – tensor with inputs to the decoder
context – state of encoder, encoder sequence lengths and hidden state of decoder’s LSTM layers
inference – if True stores and repackages hidden state
Returns:
Converts flattened hidden state (from sequence generator) into a tuple of hidden states.
- Parameters
hidden – None or flattened hidden state for decoder RNN layers
Flattens the hidden state from all LSTM layers into one tensor (for the sequence generator).
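These helpers convert between the per-layer LSTM states and the single flat object the sequence generator carries. A minimal sketch with plain lists, assuming each layer contributes an (h, c) pair and the flat order is h0, c0, h1, c1, ...; flatten_hidden and unflatten_hidden are hypothetical names (the mlbench method names are not shown in this section).

```python
from itertools import chain
from typing import Optional

Vector = list[float]


def flatten_hidden(hidden: list[tuple[Vector, Vector]]) -> list[Vector]:
    """[(h0, c0), (h1, c1), ...] -> [h0, c0, h1, c1, ...]"""
    return list(chain.from_iterable(hidden))


def unflatten_hidden(flat: Optional[list[Vector]], num_layers: int) -> list:
    """Inverse of flatten_hidden; None means 'no state yet' for every layer."""
    if flat is None:
        return [None] * num_layers
    if len(flat) != 2 * num_layers:
        raise ValueError("expected one (h, c) pair per layer")
    # Pair even positions (h) with odd positions (c).
    return list(zip(flat[0::2], flat[1::2]))


state = [([1.0], [2.0]), ([3.0], [4.0])]
flat = flatten_hidden(state)
print(flat)                       # [[1.0], [2.0], [3.0], [4.0]]
print(unflatten_hidden(flat, 2))  # [([1.0], [2.0]), ([3.0], [4.0])]
```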
Transformer Model for Translation
Model
- class mlbench_core.models.pytorch.transformer.TransformerModel(args, src_dict, trg_dict)
Transformer model
This model uses MultiHeadAttention as described in [VSP+17]
- Parameters
args – Arguments of the model. All arguments should be accessible via the __getattribute__ method
src_dict (mlbench_core.dataset.nlp.pytorch.wmt17.Dictionary) – Source dictionary
trg_dict (mlbench_core.dataset.nlp.pytorch.wmt17.Dictionary) – Target dictionary
- forward(self, src_tokens, src_lengths, prev_output_tokens)
Run the forward pass of the transformer model.
- Parameters
src_tokens (torch.Tensor) – Source tokens
src_lengths (torch.Tensor) – Source sentence lengths
prev_output_tokens (torch.Tensor) – Previous output tokens
- Returns
The model output, and attention weights if needed
- Return type
(torch.Tensor, Optional[torch.Tensor])
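prev_output_tokens is the decoder input used for teacher forcing. One common convention, used by fairseq and assumed here (this section does not state mlbench's convention), is to shift the target one position to the right and start it with the end-of-sentence index. EOS = 2 and make_prev_output_tokens are hypothetical.

```python
EOS = 2  # Assumption: hypothetical end-of-sentence index; the real one comes from trg_dict


def make_prev_output_tokens(target: list[int]) -> list[int]:
    """Shift a target sentence one step to the right for teacher forcing.

    The target ends with EOS; the decoder input starts with EOS and drops the
    last token, so at step t the decoder sees target[t-1] and predicts target[t].
    """
    if not target or target[-1] != EOS:
        raise ValueError("target must end with EOS")
    return [EOS] + target[:-1]


print(make_prev_output_tokens([11, 12, 13, EOS]))  # [2, 11, 12, 13]
```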
Encoder
- class mlbench_core.models.pytorch.transformer.encoder.TransformerEncoder(args, dictionary, embed_tokens, left_pad=True)
Transformer encoder consisting of args.encoder_layers layers. Each layer is a TransformerEncoderLayer.
- Parameters
args – Arguments of the model. All arguments should be accessible via the __getattribute__ method
dictionary (mlbench_core.dataset.nlp.pytorch.wmt17.Dictionary) – encoding dictionary
embed_tokens (torch.nn.Embedding) – input embedding
left_pad (bool) – Pad sources to the left (True) or right (False). Default: True
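left_pad controls on which side shorter source sentences are filled with padding. This sketch uses a batch-major (batch, seq_len) list layout chosen for illustration and a hypothetical padding index PAD = 1; pad_batch is a hypothetical helper.

```python
PAD = 1  # Assumption: hypothetical padding index; the real one comes from the dictionary


def pad_batch(sentences: list[list[int]], left_pad: bool) -> list[list[int]]:
    """Pad every sentence to the length of the longest one."""
    max_len = max(len(s) for s in sentences)
    padded = []
    for s in sentences:
        fill = [PAD] * (max_len - len(s))
        # Left padding puts the last real token of every row in the final column.
        padded.append(fill + s if left_pad else s + fill)
    return padded


print(pad_batch([[4, 5, 6], [7]], left_pad=True))   # [[4, 5, 6], [1, 1, 7]]
print(pad_batch([[4, 5, 6], [7]], left_pad=False))  # [[4, 5, 6], [7, 1, 1]]
```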
Decoder
- class mlbench_core.models.pytorch.transformer.decoder.TransformerDecoder(args, dictionary, embed_tokens, no_encoder_attn=False, left_pad=False)
Transformer decoder consisting of args.decoder_layers layers. Each layer is a TransformerDecoderLayer.
- Parameters
args – Arguments of the model. All arguments should be accessible via the __getattribute__ method
dictionary (mlbench_core.dataset.nlp.pytorch.wmt17.Dictionary) – decoding dictionary
embed_tokens (torch.nn.Embedding) – output embedding
no_encoder_attn (bool, optional) – if True, the decoder does not attend to encoder outputs (default: False).
left_pad (bool) – Pad targets to the left (True) or right (False). Default: False
Layers
- class mlbench_core.models.pytorch.transformer.modules.TransformerEncoderLayer(args)
Encoder layer block.
In the original paper each operation (multi-head attention or FFN) is postprocessed with: dropout -> add residual -> layernorm. In the tensor2tensor code they suggest that learning is more robust when preprocessing each layer with layernorm and postprocessing with: dropout -> add residual. We default to the approach in the paper, but the tensor2tensor approach can be enabled by setting args.encoder_normalize_before to True.
- Parameters
args (argparse.Namespace) – parsed command-line arguments
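The difference between the two orderings can be sketched on a single vector. This pure-Python sketch omits dropout (as at evaluation time) and the learnable gain and bias of layernorm; residual_sublayer and layer_norm are hypothetical helpers, with normalize_before standing in for args.encoder_normalize_before.

```python
import math
from typing import Callable

Vector = list[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """LayerNorm without the learnable gain and bias, for illustration."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def residual_sublayer(x: Vector, sublayer: Callable[[Vector], Vector],
                      normalize_before: bool) -> Vector:
    """Wrap one sublayer (attention or FFN) with a residual add and layernorm."""
    residual = x
    if normalize_before:        # tensor2tensor style: normalize the input first
        x = layer_norm(x)
    x = sublayer(x)             # dropout would be applied here during training
    x = [r + y for r, y in zip(residual, x)]
    if not normalize_before:    # paper style (the default): normalize after the add
        x = layer_norm(x)
    return x


def double(v: Vector) -> Vector:
    return [2.0 * e for e in v]


x = [1.0, 2.0, 3.0]
print(residual_sublayer(x, double, normalize_before=False))  # zero-mean output
print(residual_sublayer(x, double, normalize_before=True))   # not normalized
```

In the pre-norm variant the residual path carries x unchanged, which is the property the tensor2tensor authors associate with more robust training.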
- class mlbench_core.models.pytorch.transformer.modules.TransformerDecoderLayer(args, no_encoder_attn=False)
Decoder layer block.
In the original paper each operation (multi-head attention, encoder attention or FFN) is postprocessed with: dropout -> add residual -> layernorm. In the tensor2tensor code they suggest that learning is more robust when preprocessing each layer with layernorm and postprocessing with: dropout -> add residual. We default to the approach in the paper, but the tensor2tensor approach can be enabled by setting args.decoder_normalize_before to True.
- Parameters
args (argparse.Namespace) – parsed command-line arguments
no_encoder_attn (bool, optional) – if True, the layer does not attend to encoder outputs (default: False).
SequenceGenerator
- class mlbench_core.models.pytorch.transformer.sequence_generator.SequenceGenerator(model, src_dict, trg_dict, beam_size=1, minlen=1, maxlen=None, stop_early=True, normalize_scores=True, len_penalty=1, retain_dropout=False, sampling=False, sampling_topk=-1, sampling_temperature=1)
Generates translations of a given source sentence.
- Parameters
model (torch.nn.Module) – The model to predict on. Should be an instance of TransformerModel
src_dict (mlbench_core.dataset.nlp.pytorch.wmt17.Dictionary) – Source dictionary
trg_dict (mlbench_core.dataset.nlp.pytorch.wmt17.Dictionary) – Target dictionary
beam_size (int) – Size of the beam. Default 1
minlen (int) – Minimum generation length. Default 1
maxlen (int) – Maximum generation length. If None, takes value of model.max_decoder_positions(). Default None
stop_early (bool) – Stop generation immediately after we finalize beam_size hypotheses, even though longer hypotheses might have better normalized scores. Default True
normalize_scores (bool) – Normalize scores by the length of the output. Default True
len_penalty (float) – length penalty: <1.0 favors shorter, >1.0 favors longer sentences. Default 1
retain_dropout (bool) – Keep dropout layers. Default False
sampling (bool) – sample hypotheses instead of using beam search. Default False
sampling_topk (int) – sample from top K likely next words instead of all words. Default -1
sampling_temperature (int) – temperature for random sampling. Default 1
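normalize_scores and len_penalty interact when ranking finished hypotheses. A minimal sketch, assuming (as in fairseq) that the summed log-probability is divided by length ** len_penalty; hypothesis_score is a hypothetical helper. Because the summed log-probabilities are negative, a larger len_penalty grows the denominator faster for long hypotheses and makes their scores less negative.

```python
def hypothesis_score(token_logprobs: list[float], normalize_scores: bool = True,
                     len_penalty: float = 1.0) -> float:
    """Score of a finished hypothesis from its per-token log-probabilities."""
    total = sum(token_logprobs)
    if not normalize_scores:
        return total
    # Assumption: divide by length ** len_penalty (fairseq-style normalization).
    return total / len(token_logprobs) ** len_penalty


short = [-1.0, -1.0]              # sum -2.0, length 2
long = [-1.0, -1.0, -1.2, -1.2]   # sum -4.4, length 4
for lp in (1.0, 2.0):
    winner = "short" if hypothesis_score(short, len_penalty=lp) > hypothesis_score(long, len_penalty=lp) else "long"
    print(f"len_penalty={lp}: {winner} hypothesis wins")
```

With len_penalty=1.0 the short hypothesis scores -1.0 against -1.1; with len_penalty=2.0 it scores -0.5 against about -0.275, so the longer one wins.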
- generate(self, src_tokens, src_lengths, maxlen=None, prefix_tokens=None)
Generate a batch of translations.
- generate_batch_translations(self, batch, maxlen_a=0.0, maxlen_b=None, prefix_size=0)
Yield individual translations of a batch.
- Parameters
batch (dict) – The model input batch. Must have keys net_input, target and ntokens
maxlen_a (float) – Multiplier of the input sentence length x in the maximum generation length (see maxlen_b)
maxlen_b (Optional[int]) – Generate sequences of max lengths maxlen_a*x + maxlen_b where x = input sentence length
prefix_size (int) – Prefix size
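The maximum length formula is simple arithmetic. In this sketch, truncation to an integer with int() is an assumption, and maxlen_b must be an integer because the behaviour for maxlen_b=None is not described here; max_generation_length is a hypothetical helper.

```python
def max_generation_length(src_len: int, maxlen_a: float, maxlen_b: int) -> int:
    """maxlen = maxlen_a * x + maxlen_b, where x is the input sentence length."""
    # Assumption: the result is truncated to an integer.
    return int(maxlen_a * src_len + maxlen_b)


print(max_generation_length(20, 1.0, 50))  # 70, using the translate_batch defaults
print(max_generation_length(20, 1.5, 10))  # 40
```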
- translate_batch(self, batch, maxlen_a=1.0, maxlen_b=50, prefix_size=0, remove_bpe=None, nbest=1, ignore_case=True)
- Parameters
batch (dict) – The model input batch. Must have keys net_input, target and ntokens
maxlen_a (float) – Default 1.0
maxlen_b (Optional[int]) – Generate sequences of max lengths maxlen_a*x + maxlen_b where x = input sentence length. Default 50
prefix_size (int) – Prefix size. Default 0
remove_bpe (Optional[str]) – BPE token. Default None
nbest (int) – Number of hypotheses to output. Default 1
ignore_case (bool) – Ignore case during online eval. Default True
- Returns
The translations and their targets for the given batch
- Return type
(list[str], list[str])
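remove_bpe and ignore_case post-process the sentences before evaluation. A minimal sketch, assuming the subword-nmt continuation marker "@@ " as the BPE token and that ignore_case simply lowercases the text; postprocess is a hypothetical helper.

```python
from typing import Optional


def postprocess(sentence: str, remove_bpe: Optional[str] = None,
                ignore_case: bool = True) -> str:
    """Undo BPE segmentation and optionally lowercase a sentence."""
    if remove_bpe is not None:
        # Appending a space lets a marker at the very end be removed too;
        # e.g. "new@@ s" becomes "news" when remove_bpe is "@@ ".
        sentence = (sentence + " ").replace(remove_bpe, "").rstrip()
    if ignore_case:
        sentence = sentence.lower()
    return sentence


print(postprocess("The new@@ s is good", remove_bpe="@@ "))  # the news is good
```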
References
- HZRS16a
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 770–778. 2016.
- HZRS16b
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European conference on computer vision, 630–645. Springer, 2016.
- VSP+17
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 5998–6008. Curran Associates, Inc., 2017. URL: http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf.
tensorflow
resnet
Contains definitions for Residual Networks. Residual networks (‘v1’ ResNets) were originally proposed in [HZRS16a]. The full preactivation ‘v2’ ResNet variant was introduced by [HZRS16b]. The key difference of the full preactivation ‘v2’ variant compared to the ‘v1’ variant in [HZRS16a] is the use of batch normalization before every weight layer rather than after.
- mlbench_core.models.tensorflow.resnet_model.fixed_padding(inputs, kernel_size, data_format)
Pads the input along the spatial dimensions independently of input size.
- Parameters
inputs (tf.Tensor) – A tensor of size [batch, channels, height_in, width_in] or [batch, height_in, width_in, channels] depending on data_format.
kernel_size (int) – The kernel to be used in the conv2d or max_pool2d operation. Should be a positive integer.
data_format (str) – The input format (‘channels_last’ or ‘channels_first’).
- Returns
A tensor with the same format as the input with the data either intact (if kernel_size == 1) or padded (if kernel_size > 1).
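The padding amounts depend only on kernel_size. The sketch below assumes the split used in the TensorFlow official ResNet code (pad_total = kernel_size - 1, with the extra pixel at the end for even kernels) and pads a single-channel 2-D list rather than a tf.Tensor; both helpers are hypothetical.

```python
def fixed_padding_amounts(kernel_size: int) -> tuple[int, int]:
    """(pad_beg, pad_end) applied to both height and width."""
    if kernel_size < 1:
        raise ValueError("kernel_size must be a positive integer")
    pad_total = kernel_size - 1
    pad_beg = pad_total // 2
    pad_end = pad_total - pad_beg
    return pad_beg, pad_end


def pad_2d(image: list[list[float]], kernel_size: int) -> list[list[float]]:
    """Zero-pad a single-channel image; unchanged when kernel_size == 1."""
    beg, end = fixed_padding_amounts(kernel_size)
    width = len(image[0]) + beg + end
    rows = [[0.0] * beg + row + [0.0] * end for row in image]
    return [[0.0] * width for _ in range(beg)] + rows + [[0.0] * width for _ in range(end)]


for k in (1, 2, 3, 7):
    print(k, fixed_padding_amounts(k))  # (0, 0), (0, 1), (1, 1), (3, 3)
```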
- mlbench_core.models.tensorflow.resnet_model.conv2d_fixed_padding(inputs, filters, kernel_size, strides, data_format)
Strided 2-D convolution with explicit padding.
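Explicit padding matters for strided convolutions because TensorFlow's 'SAME' padding splits its padding according to the input size. This sketch compares the two along one spatial dimension, assuming that conv2d_fixed_padding applies fixed_padding followed by a 'VALID' convolution when strides > 1; the helper names are hypothetical.

```python
import math


def same_padding_split(in_size: int, kernel_size: int, stride: int) -> tuple[int, int]:
    """(pad_beg, pad_end) that TensorFlow's 'SAME' padding uses along one dimension."""
    out_size = math.ceil(in_size / stride)
    pad_total = max((out_size - 1) * stride + kernel_size - in_size, 0)
    return pad_total // 2, pad_total - pad_total // 2


def fixed_padding_split(kernel_size: int) -> tuple[int, int]:
    """(pad_beg, pad_end) of fixed padding: depends on kernel_size only."""
    pad_total = kernel_size - 1
    return pad_total // 2, pad_total - pad_total // 2


def valid_output_size(in_size: int, kernel_size: int, stride: int) -> int:
    """Output length of an unpadded ('VALID') convolution."""
    return (in_size - kernel_size) // stride + 1


for in_size in (224, 225):
    beg, end = fixed_padding_split(3)
    print(in_size,
          same_padding_split(in_size, 3, 2),             # (0, 1), then (1, 1)
          (beg, end),                                    # always (1, 1)
          valid_output_size(in_size + beg + end, 3, 2))  # 112, then 113
```

Both schemes give the same output size here, but only the fixed scheme pads the same way for even and odd inputs.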
- mlbench_core.models.tensorflow.resnet_model.block_layer(inputs, filters, bottleneck, block_fn, blocks, strides, training, name, data_format)
Creates one layer of blocks for the ResNet model.
- Parameters
inputs (tf.Tensor) – A tensor of size [batch, channels, height_in, width_in] or [batch, height_in, width_in, channels] depending on data_format.
filters (int) – The number of filters for the first convolution of the layer.
bottleneck (bool) – Whether the blocks created are bottleneck blocks.
block_fn (callable) – The block to use within the model, either building_block or bottleneck_block.
blocks (int) – The number of blocks contained in the layer.
strides (int) – The stride to use for the first convolution of the layer. If greater than 1, this layer will ultimately downsample the input.
training (bool) – Either True or False, whether we are currently training the model. Needed for batch norm.
name (str) – A string name for the tensor output of the block layer.
data_format (str) – The input format (‘channels_last’ or ‘channels_first’).
- Returns
The output tensor of the block layer.
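Within one block layer, only the first block changes the spatial resolution. A sketch of the resulting plan, assuming the conventions of [HZRS16a] and the TensorFlow official ResNet: bottleneck blocks output 4 * filters channels, the first block uses strides and a projection shortcut, and all later blocks use stride 1 with identity shortcuts. block_layer_plan is hypothetical.

```python
def block_layer_plan(filters: int, bottleneck: bool, blocks: int, strides: int) -> list[dict]:
    """Per-block settings for one layer of blocks (illustrative sketch)."""
    if blocks < 1:
        raise ValueError("blocks must be >= 1")
    # Assumption: bottleneck blocks expand the output channels by 4.
    filters_out = filters * 4 if bottleneck else filters
    # Only the first block may downsample, so it gets the strides and a projection shortcut.
    plan = [{"strides": strides, "projection_shortcut": True, "filters_out": filters_out}]
    for _ in range(1, blocks):
        plan.append({"strides": 1, "projection_shortcut": False, "filters_out": filters_out})
    return plan


for block in block_layer_plan(filters=128, bottleneck=True, blocks=4, strides=2):
    print(block)
```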
- mlbench_core.models.tensorflow.resnet_model.batch_norm(inputs, training, data_format)
Performs a batch normalization using a standard set of parameters.
Model
- class mlbench_core.models.tensorflow.resnet_model.Model(resnet_size, bottleneck, num_classes, num_filters, kernel_size, conv_stride, first_pool_size, first_pool_stride, block_sizes, block_strides, resnet_version=DEFAULT_VERSION, data_format=None, dtype=DEFAULT_DTYPE)
Base class for building the ResNet model.
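block_sizes and bottleneck together determine the depth of the network. Using the ImageNet stage sizes from Table 1 of [HZRS16a], each basic block contributes two weight layers and each bottleneck block three, plus the initial convolution and the final fully connected layer. The dictionary and helper below are illustrative and not part of mlbench_core.

```python
# Stage sizes of the ImageNet ResNets, from Table 1 of [HZRS16a].
IMAGENET_BLOCK_SIZES = {
    18: [2, 2, 2, 2],
    34: [3, 4, 6, 3],
    50: [3, 4, 6, 3],
    101: [3, 4, 23, 3],
    152: [3, 8, 36, 3],
}


def depth_from_blocks(block_sizes: list[int], bottleneck: bool) -> int:
    """Weight layers = layers per block * number of blocks + first conv + final FC."""
    layers_per_block = 3 if bottleneck else 2
    return layers_per_block * sum(block_sizes) + 2


for size, block_sizes in IMAGENET_BLOCK_SIZES.items():
    bottleneck = size >= 50  # bottleneck blocks are used from ResNet-50 upward
    print(size, depth_from_blocks(block_sizes, bottleneck))
```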