Layers

Layers are the building blocks of Pipelines: each Layer is a transformation applied to your data.

Image Layers

These Layers are for transformations geared towards image data.
class megatron.layers.image.Downsample(new_shape)

Bases: megatron.layers.core.StatelessLayer

Shrink an image to a given size proportionally.

Parameters:new_shape (tuple of int) – the target shape for the new image.
transform(X)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
class megatron.layers.image.RGBtoBinary(keep_dim=True)

Bases: megatron.layers.core.StatelessLayer

Convert image to binary mask where a 1 indicates a non-black cell.

Parameters:keep_dim (bool) – if True, the resulting image stays 3D with a single color channel; otherwise it is 2D.
transform(X)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
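
The masking rule described for RGBtoBinary can be sketched in plain NumPy. This is an illustration of the documented behaviour, not the library's implementation; the function name rgb_to_binary and the assumption that color channels sit on the last axis are hypothetical.

```python
import numpy as np

def rgb_to_binary(image, keep_dim=True):
    """Return 1 where a pixel is non-black, 0 where it is black."""
    # A pixel counts as black only when every color channel is zero.
    mask = (np.asarray(image) != 0).any(axis=-1).astype(int)
    # keep_dim=True keeps a trailing single-channel axis; otherwise drop it.
    return mask[..., np.newaxis] if keep_dim else mask

image = np.array([[[0, 0, 0], [255, 0, 0]],
                  [[0, 10, 0], [0, 0, 0]]])
binary_3d = rgb_to_binary(image)                  # one color channel kept
binary_2d = rgb_to_binary(image, keep_dim=False)  # channel axis dropped
```
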
class megatron.layers.image.RGBtoGrey(method='luminosity', keep_dim=False)

Bases: megatron.layers.core.StatelessLayer

Convert an RGB array representation of an image to greyscale.

Parameters:method ({'luminosity', 'lightness', 'average'}) – the method used to convert RGB values to greyscale.
transform(X)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
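
The three method names correspond to well-known greyscale conversions. The sketch below shows one common definition of each in NumPy; the luminosity weights are an assumption (several weightings are in use), and rgb_to_grey is a hypothetical helper rather than the library's code.

```python
import numpy as np

def rgb_to_grey(image, method="luminosity"):
    image = np.asarray(image, dtype=float)
    if method == "luminosity":
        # Weighted sum of R, G, B; these weights are assumed, not confirmed.
        return image @ np.array([0.21, 0.72, 0.07])
    if method == "lightness":
        # Midpoint of the brightest and darkest channel of each pixel.
        return (image.max(axis=-1) + image.min(axis=-1)) / 2
    if method == "average":
        # Plain mean of the three channels.
        return image.mean(axis=-1)
    raise ValueError(f"unknown method: {method}")

pixel = np.array([[[100, 200, 50]]])  # a 1x1 RGB image
grey = rgb_to_grey(pixel, "lightness")
```
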
class megatron.layers.image.Upsample(new_shape)

Bases: megatron.layers.core.StatelessLayer

Expand an image to a given size proportionally.

Parameters:new_shape (tuple of int) – the target shape for the new image.
transform(X)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.

Missing Data Layers

These Layers are for dealing with missing data.
class megatron.layers.missing.Impute(imputation_dict)

Bases: megatron.layers.core.StatelessLayer

Replace instances of one data item with another, such as missing or NaN with zero.

Parameters:imputation_dict (dict) – keys of the dictionary are targets to be replaced; values are corresponding replacements.
transform(X)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
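
A minimal NumPy sketch of dictionary-driven imputation, assuming numeric data. The helper name impute is hypothetical; the special handling of NaN is needed in any implementation because NaN never compares equal to itself.

```python
import numpy as np

def impute(X, imputation_dict):
    X = np.array(X, dtype=float)  # copy, so the caller's data is untouched
    for target, replacement in imputation_dict.items():
        if isinstance(target, float) and np.isnan(target):
            mask = np.isnan(X)    # NaN must be detected with isnan
        else:
            mask = X == target
        X[mask] = replacement
    return X

raw = [1.0, np.nan, -999.0, 4.0]
result = impute(raw, {np.nan: 0.0, -999.0: 0.0})
```
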

Numeric Layers

These Layers are for mathematical operations on your data, such as arithmetic.
class megatron.layers.numeric.Add

Bases: megatron.layers.core.StatelessLayer

Add up arrays element-wise.

transform(*arrays)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
class megatron.layers.numeric.Divide(impute=0)

Bases: megatron.layers.core.StatelessLayer

Divide given array by another given array element-wise.

Parameters:impute (int/float or None) – the value to impute when encountering a divide by zero.
transform(X1, X2)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
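
Element-wise division with a fallback for zero denominators might look like the following in NumPy. This is a sketch of the documented behaviour only; the function name divide is hypothetical, and the impute=None case is not covered because the source does not say what it does.

```python
import numpy as np

def divide(X1, X2, impute=0):
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    # Pre-fill the output with the imputed value, then divide only where
    # the denominator is non-zero.
    out = np.full(np.broadcast(X1, X2).shape, float(impute))
    np.divide(X1, X2, out=out, where=(X2 != 0))
    return out

quotient = divide([1, 2, 3], [2, 0, 4])
```
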
class megatron.layers.numeric.Dot(n_outputs=1, **kwargs)

Bases: megatron.layers.core.StatelessLayer

Multiply multiple arrays together as matrix multiplication.

transform(*arrays)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
class megatron.layers.numeric.ElementWiseMultiply(n_outputs=1, **kwargs)

Bases: megatron.layers.core.StatelessLayer

Multiply two same-sized arrays element-by-element.

transform(X, Y)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
class megatron.layers.numeric.Normalize(n_outputs=1, **kwargs)

Bases: megatron.layers.core.StatelessLayer

Divide an array by its total so that it sums to one. If the array is all zeros, make it uniform.

transform(X)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
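
The normalization rule, including the all-zeros case, can be sketched as below. It assumes the total is taken over the whole array; the library may instead normalize per observation, which the source does not specify. The name normalize is hypothetical.

```python
import numpy as np

def normalize(X):
    X = np.asarray(X, dtype=float)
    total = X.sum()
    if total == 0:
        # Nothing to scale by, so spread the mass evenly.
        return np.full(X.shape, 1.0 / X.size)
    return X / total

scaled = normalize([1, 1, 2])
uniform = normalize(np.zeros(4))
```
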
class megatron.layers.numeric.ScalarMultiply(factor)

Bases: megatron.layers.core.StatelessLayer

Multiply array by a given scalar.

Parameters:factor (float) – multiplier.
transform(X)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
class megatron.layers.numeric.StaticDot(W)

Bases: megatron.layers.core.StatelessLayer

Multiply array by a given matrix, as matrix multiplication.

Parameters:W (np.array) – matrix by which to multiply.
transform(X)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
class megatron.layers.numeric.Subtract(n_outputs=1, **kwargs)

Bases: megatron.layers.core.StatelessLayer

Subtract one array from another.

transform(X1, X2)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.

Shaping Layers

These Layers are for manipulating the shape of your data, from adding axes to creating time series windows.
class megatron.layers.shaping.AddDim(axis=-1)

Bases: megatron.layers.core.StatelessLayer

Add a dimension to an array.

Parameters:axis (int) – the axis along which to place the new dimension.
transform(X)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
class megatron.layers.shaping.Cast(new_type)

Bases: megatron.layers.core.StatelessLayer

Redefine the data type of a NumPy array’s contents.

Parameters:new_type (type) – the new type for the array to be cast to.
transform(X)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
class megatron.layers.shaping.Concatenate(axis=-1)

Bases: megatron.layers.core.StatelessLayer

Combine arrays along a given axis. Does not create a new axis unless all inputs are 1D.

Parameters:axis (int (default: -1)) – axis along which to concatenate arrays. -1 means the last axis.
transform(*arrays)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
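
One reading of the 1D rule is that 1D inputs are stacked along a new axis, while higher-dimensional inputs are joined along an existing one. The sketch below follows that reading, which is an assumption; concatenate here is a hypothetical helper built on np.stack and np.concatenate.

```python
import numpy as np

def concatenate(*arrays, axis=-1):
    arrays = [np.asarray(a) for a in arrays]
    if all(a.ndim == 1 for a in arrays):
        # All inputs are 1D: a new axis is created to hold them side by side.
        return np.stack(arrays, axis=axis)
    # Otherwise join along an axis that already exists.
    return np.concatenate(arrays, axis=axis)

columns = concatenate([1, 2], [3, 4])                 # two 1D arrays
joined = concatenate(np.ones((2, 2)), np.zeros((2, 1)))  # two 2D arrays
```
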
class megatron.layers.shaping.Filter(n_outputs=1, **kwargs)

Bases: megatron.layers.core.StatelessLayer

Apply given mask to given array along the first axis to filter out observations.

transform(X, mask)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
class megatron.layers.shaping.Flatten(n_outputs=1, **kwargs)

Bases: megatron.layers.core.StatelessLayer

Reshape an array to be 1D.

transform(X)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
class megatron.layers.shaping.OneHotLabels(strict=False)

Bases: megatron.layers.core.StatefulLayer

One-hot encode an array of categorical values, or non-consecutive numeric values.

partial_fit(X)

Update metadata based on given batch of data or full dataset.

Contains the main logic of fitting. This is what should be overwritten by all child classes.

Parameters:inputs (numpy.ndarray(s)) – the input data to be fit to; could be one array or a list of arrays.
transform(X)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
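
The split between partial_fit and transform can be illustrated with a small stand-in class. This is a sketch under assumptions: OneHotLabelsSketch is hypothetical, column order follows first appearance, the strict argument is ignored because the source does not describe it, and unseen values raise a KeyError.

```python
import numpy as np

class OneHotLabelsSketch:
    def __init__(self):
        self.categories = []  # grows as new batches are seen

    def partial_fit(self, X):
        # Record every category not seen in earlier batches.
        for value in np.asarray(X).ravel().tolist():
            if value not in self.categories:
                self.categories.append(value)

    def transform(self, X):
        values = np.asarray(X).ravel().tolist()
        index = {c: i for i, c in enumerate(self.categories)}
        out = np.zeros((len(values), len(self.categories)), dtype=int)
        for row, value in enumerate(values):
            out[row, index[value]] = 1
        return out

encoder = OneHotLabelsSketch()
encoder.partial_fit(["cat", "dog"])   # first batch
encoder.partial_fit(["bird", "cat"])  # second batch adds one category
encoded = encoder.transform(["dog", "bird", "cat"])
```
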
class megatron.layers.shaping.OneHotRange(strict=False)

Bases: megatron.layers.core.StatefulLayer

One-hot encode a numeric array where the values are a sequence.

partial_fit(X)

Update metadata based on given batch of data or full dataset.

Contains the main logic of fitting. This is what should be overwritten by all child classes.

Parameters:inputs (numpy.ndarray(s)) – the input data to be fit to; could be one array or a list of arrays.
transform(X)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
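
When the values form a consecutive integer sequence, only the lowest and highest values need to be tracked during fitting. The following sketch assumes exactly that; OneHotRangeSketch is hypothetical and the strict argument is left out because the source does not describe it.

```python
import numpy as np

class OneHotRangeSketch:
    def __init__(self):
        self.low = None
        self.high = None

    def partial_fit(self, X):
        X = np.asarray(X)
        low, high = int(X.min()), int(X.max())
        # Widen the known range with each batch.
        self.low = low if self.low is None else min(self.low, low)
        self.high = high if self.high is None else max(self.high, high)

    def transform(self, X):
        X = np.asarray(X, dtype=int).ravel()
        width = self.high - self.low + 1
        # Row i of the identity matrix is the one-hot vector for low + i.
        return np.eye(width, dtype=int)[X - self.low]

range_encoder = OneHotRangeSketch()
range_encoder.partial_fit([3, 4])
range_encoder.partial_fit([5, 6])
range_encoded = range_encoder.transform([4, 6])
```
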
class megatron.layers.shaping.Reshape(new_shape)

Bases: megatron.layers.core.StatelessLayer

Reshape an array to a given new shape.

Parameters:new_shape (tuple of int) – desired new shape for array.
transform(X)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
class megatron.layers.shaping.Slice(*slices)

Bases: megatron.layers.core.StatelessLayer

Apply NumPy array slicing. Each slice corresponds to a dimension.

Slices (passed as hyperparameters) are constructed by the following procedure:
  • To get just N: provide the integer N as the slice.
  • To slice from N to the end: provide a 1-tuple of the integer N, e.g. (5,).
  • To slice from M to N exclusive: provide a 2-tuple of the integers M and N, e.g. (3, 6).
  • To slice from M to N with skip P: provide a 3-tuple of the integers M, N, and P.

Parameters:*slices (int(s) or tuple(s)) – the slices to be applied. Must not overlap. Formatting discussed above.
transform(X)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
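
The slice formats described above map directly onto Python's built-in slice objects. The sketch below shows that mapping; to_index and apply_slices are hypothetical helpers, and it assumes the first slice applies to the first axis, which the source does not confirm.

```python
import numpy as np

def to_index(spec):
    if isinstance(spec, int):
        return spec                  # just N
    if len(spec) == 1:
        return slice(spec[0], None)  # (N,) means N to the end
    return slice(*spec)              # (M, N) or (M, N, P)

def apply_slices(X, *slices):
    # One index per dimension, in order.
    return X[tuple(to_index(s) for s in slices)]

grid = np.arange(24).reshape(4, 6)
picked = apply_slices(grid, (1,), (0, 6, 2))  # same as grid[1:, 0:6:2]
row = apply_slices(grid, 2)                   # same as grid[2]
```
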
class megatron.layers.shaping.SplitDict(fields)

Bases: megatron.layers.core.StatelessLayer

Split dictionary data into separate nodes, with one node per key in the dictionary.

Parameters:fields (list of str) – list of fields, dictionary keys, to be pulled out into their own nodes.
transform(dicts)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
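
Pulling fields out of dictionary data can be sketched as follows. The input is assumed to be a list of dictionaries, one per observation; the source does not state the exact input format, and split_dict is a hypothetical helper.

```python
import numpy as np

def split_dict(dicts, fields):
    # One output array per requested field, in the order given.
    return [np.array([d[field] for d in dicts]) for field in fields]

records = [{"age": 31, "city": "Oslo"}, {"age": 45, "city": "Lima"}]
ages, cities = split_dict(records, ["age", "city"])
```
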
class megatron.layers.shaping.TimeSeries(window_size, time_axis=1, reverse=False)

Bases: megatron.layers.core.StatefulLayer

Adds a time dimension to a dataset by rolling a window over the data.

Parameters:
  • window_size (int) – length of the window; number of timesteps in the time series.
  • time_axis (int) – on which axis in the array to place the time dimension.
  • reverse (bool (default: False)) – if True, oldest data is first; if False, newest data is first.
partial_fit(X)

Update metadata based on given batch of data or full dataset.

Contains the main logic of fitting. This is what should be overwritten by all child classes.

Parameters:inputs (numpy.ndarray(s)) – the input data to be fit to; could be one array or a list of arrays.
transform(X)

Apply transformation to given input data.

Parameters:inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
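
The rolling-window idea can be sketched for a single batch in NumPy. This version drops incomplete windows and keeps no state between batches; how the library handles the first window_size - 1 rows or carries data across batches is not described in the source. The function name rolling_windows is hypothetical.

```python
import numpy as np

def rolling_windows(X, window_size, time_axis=1, reverse=False):
    X = np.asarray(X)
    n_windows = X.shape[0] - window_size + 1
    # Each window holds window_size consecutive rows, oldest first.
    windows = np.stack([X[i:i + window_size] for i in range(n_windows)])
    if not reverse:
        windows = windows[:, ::-1]  # reverse=False puts the newest row first
    # The time dimension is created on axis 1; move it if requested.
    return np.moveaxis(windows, 1, time_axis)

series = np.arange(10).reshape(5, 2)  # 5 timesteps, 2 features
newest_first = rolling_windows(series, window_size=3)
oldest_first = rolling_windows(series, window_size=3, reverse=True)
```
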

Text Layers

These Layers are for manipulating text data.
class megatron.layers.text.RemoveStopwords(language='english')

Bases: megatron.layers.core.StatelessLayer

Remove common, low-information words from every element of a text array.

Parameters:language (str (default: english)) – the language in which the text is written.
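
Stopword removal amounts to filtering each text against a word list. The sketch below uses a tiny hand-written English list purely for illustration; the library presumably draws on a fuller per-language list, and both STOPWORDS and remove_stopwords are hypothetical names.

```python
import numpy as np

# Illustrative subset only; a real stopword list is much longer.
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

def remove_stopwords(texts):
    return np.array([
        " ".join(w for w in text.split() if w.lower() not in STOPWORDS)
        for text in texts
    ])

texts = np.array(["The cat is in the garden", "A tale of two cities"])
cleaned = remove_stopwords(texts)
```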