Layers

Layers are the building blocks of Pipelines: each Layer is a transformation applied to your data.

Image Layers

These Layers are for transformations geared towards image data.

class megatron.layers.image.Downsample(new_shape)

Bases: megatron.layers.core.StatelessLayer

Shrink an image to a given size proportionally.

Parameters:	new_shape (tuple of int) – the target shape for the new image.

transform(X)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.

class megatron.layers.image.RGBtoBinary(keep_dim=True)

Bases: megatron.layers.core.StatelessLayer

Convert an image to a binary mask in which a 1 indicates a non-black pixel.

Parameters:	keep_dim (bool) – if True, the resulting image stays 3D with a single color channel; otherwise it is 2D.

transform(X)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
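A plain NumPy sketch of the masking rule described above, not Megatron's source. It assumes the image is a (height, width, 3) array and that "non-black" means at least one channel is non-zero; `rgb_to_binary` is a hypothetical helper.

```python
import numpy as np


# Hypothetical sketch of RGBtoBinary's behavior; not Megatron's implementation.
def rgb_to_binary(image: np.ndarray, keep_dim: bool = True) -> np.ndarray:
    """Return 1 where any color channel is non-zero (a non-black pixel), else 0."""
    # A pixel is black only if all of its channels are zero.
    mask = np.any(image != 0, axis=-1).astype(np.uint8)
    if keep_dim:
        # Keep a trailing channel axis of length 1, so the result stays 3D.
        mask = mask[..., np.newaxis]
    return mask


img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 1] = [255, 0, 0]  # one red pixel
print(rgb_to_binary(img).shape)                     # (2, 2, 1)
print(rgb_to_binary(img, keep_dim=False).tolist())  # [[0, 1], [0, 0]]
```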

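class megatron.layers.image.RGBtoGrey(method='luminosity', keep_dim=False)

Bases: megatron.layers.core.StatelessLayer

Convert an RGB array representation of an image to greyscale.

Parameters:	method ({'luminosity', 'lightness', 'average'}) – the formula used to combine the three color channels into a single grey value.
	keep_dim (bool (default: False)) – if True, the resulting image stays 3D with a single color channel; otherwise it is 2D.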

transform(X)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
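The three method names correspond to common greyscale formulas. The sketch below uses one widely cited set of luminosity weights (0.21, 0.72, 0.07); whether Megatron uses these exact weights is an assumption, and `rgb_to_grey` is a hypothetical helper.

```python
import numpy as np

# Assumed luminosity weights for R, G, B; Megatron's exact values may differ.
LUMINOSITY_WEIGHTS = np.array([0.21, 0.72, 0.07])


# Hypothetical sketch of RGBtoGrey's behavior; not Megatron's implementation.
def rgb_to_grey(image, method="luminosity", keep_dim=False):
    image = np.asarray(image, dtype=float)
    if method == "luminosity":
        # Weighted sum over the channel axis that favors green.
        grey = image @ LUMINOSITY_WEIGHTS
    elif method == "lightness":
        # Midpoint of the brightest and darkest channel.
        grey = (image.max(axis=-1) + image.min(axis=-1)) / 2
    elif method == "average":
        # Plain mean of the three channels.
        grey = image.mean(axis=-1)
    else:
        raise ValueError(f"unknown method: {method!r}")
    return grey[..., np.newaxis] if keep_dim else grey


pixel = np.array([[[200, 100, 0]]])  # a single orange pixel, shape (1, 1, 3)
for m in ("luminosity", "lightness", "average"):
    print(m, rgb_to_grey(pixel, m).item())
```

Note that "lightness" and "average" agree on this pixel but differ in general, since only "average" depends on the middle channel.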

class megatron.layers.image.Upsample(new_shape)

Bases: megatron.layers.core.StatelessLayer

Expand an image to a given size proportionally.

Parameters:	new_shape (tuple of int) – the target shape for the new image.

transform(X)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.

Missing Data Layers

These Layers are for dealing with missing data.

class megatron.layers.missing.Impute(imputation_dict)

Bases: megatron.layers.core.StatelessLayer

Replace instances of one data item with another, such as replacing missing values or NaN with zero.

Parameters:	imputation_dict (dict) – keys of the dictionary are targets to be replaced; values are corresponding replacements.

transform(X)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
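A minimal sketch of dictionary-driven imputation for numeric arrays, assuming float data; Megatron may also support other item types (such as None or strings), which this hypothetical `impute` helper does not handle. The key subtlety is that NaN never compares equal to itself, so it needs `np.isnan` rather than `==`.

```python
import numpy as np


# Hypothetical sketch of Impute's behavior for float data; not Megatron's implementation.
def impute(X, imputation_dict):
    X = np.asarray(X, dtype=float)
    out = X.copy()
    for target, replacement in imputation_dict.items():
        # NaN != NaN, so a NaN target must be matched with np.isnan.
        hits = np.isnan(X) if np.isnan(target) else (X == target)
        # Matches are computed on the original X, so replacements never chain.
        out[hits] = replacement
    return out


X = np.array([1.0, np.nan, -999.0, 4.0])
print(impute(X, {np.nan: 0.0, -999.0: 0.0}))  # [1. 0. 0. 4.]
```

Matching against the original array rather than the partially updated copy means a mapping such as `{1: 2, 2: 3}` does not turn a 1 into a 3.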

Numeric Layers

These Layers are for mathematical operations on your data, such as arithmetic.

class megatron.layers.numeric.Add

Bases: megatron.layers.core.StatelessLayer

Add up arrays element-wise.

transform(*arrays)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.

class megatron.layers.numeric.Divide(impute=0)

Bases: megatron.layers.core.StatelessLayer

Divide one array by another, element-wise.

Parameters:	impute (int/float or None) – the value to impute wherever a division by zero occurs.

transform(X1, X2)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
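A sketch of element-wise division with imputation, using NumPy's `where=` and `out=` arguments so that division by zero is never actually performed. Treating `impute=None` as "leave NumPy's inf/nan in place" is an assumption about Megatron's semantics, and `divide` is a hypothetical helper.

```python
import numpy as np


# Hypothetical sketch of Divide's behavior; not Megatron's implementation.
def divide(X1, X2, impute=0):
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    if impute is None:
        # Assumed meaning of None: no imputation, keep inf/nan (warnings silenced).
        with np.errstate(divide="ignore", invalid="ignore"):
            return X1 / X2
    # Start from the impute value everywhere, then overwrite the safe divisions.
    out = np.full(np.broadcast(X1, X2).shape, float(impute))
    np.divide(X1, X2, out=out, where=X2 != 0)
    return out


print(divide([1, 2, 3], [2, 0, 4]).tolist())  # [0.5, 0.0, 0.75]
```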

class megatron.layers.numeric.Dot(n_outputs=1, **kwargs)

Bases: megatron.layers.core.StatelessLayer

Multiply multiple arrays together using matrix multiplication.

transform(*arrays)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
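Chaining matrix products over several arrays can be sketched with `numpy.linalg.multi_dot`, which picks the cheapest parenthesization but returns the same result as multiplying left to right. Whether Megatron uses `multi_dot` internally is an assumption; `dot` is a hypothetical helper.

```python
import numpy as np


# Hypothetical sketch of Dot's behavior; not Megatron's implementation.
def dot(*arrays):
    if len(arrays) < 2:
        raise ValueError("need at least two arrays")
    # multi_dot chooses an efficient evaluation order; the product is unchanged.
    return np.linalg.multi_dot(arrays)


A = np.arange(6).reshape(2, 3)
B = np.ones((3, 2))
C = np.eye(2)
print(dot(A, B, C).tolist())  # [[3.0, 3.0], [12.0, 12.0]]
```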

class megatron.layers.numeric.ElementWiseMultiply(n_outputs=1, **kwargs)

Bases: megatron.layers.core.StatelessLayer

Multiply two same-sized arrays element-by-element.

transform(X, Y)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.

class megatron.layers.numeric.Normalize(n_outputs=1, **kwargs)

Bases: megatron.layers.core.StatelessLayer

Divide an array by its total so that it sums to one. If the array is all zeros, return a uniform array instead.

transform(X)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
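A minimal sketch of the normalization rule above, assuming the total is taken over the whole array (not per row); `normalize` is a hypothetical helper.

```python
import numpy as np


# Hypothetical sketch of Normalize's behavior; not Megatron's implementation.
def normalize(X):
    X = np.asarray(X, dtype=float)
    total = X.sum()
    if total == 0:
        # A zero total (such as an all-zero array) cannot be divided by,
        # so fall back to a uniform array that still sums to one.
        return np.full(X.shape, 1.0 / X.size)
    return X / total


print(normalize([1, 3]).tolist())     # [0.25, 0.75]
print(normalize([0, 0, 0, 0]).tolist())  # [0.25, 0.25, 0.25, 0.25]
```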

class megatron.layers.numeric.ScalarMultiply(factor)

Bases: megatron.layers.core.StatelessLayer

Multiply array by a given scalar.

Parameters:	factor (float) – multiplier.

transform(X)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.

class megatron.layers.numeric.StaticDot(W)

Bases: megatron.layers.core.StatelessLayer

Multiply an array by a given matrix, using matrix multiplication.

Parameters:	W (np.ndarray) – matrix by which to multiply.

transform(X)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.

class megatron.layers.numeric.Subtract(n_outputs=1, **kwargs)

Bases: megatron.layers.core.StatelessLayer

Subtract one array from another.

transform(X1, X2)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.

Shaping Layers

These Layers are for manipulating the shape of your data, from adding axes to creating time series windows.

class megatron.layers.shaping.AddDim(axis=-1)

Bases: megatron.layers.core.StatelessLayer

Add a dimension to an array.

Parameters:	axis (int) – the axis along which to place the new dimension.

transform(X)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.

class megatron.layers.shaping.Cast(new_type)

Bases: megatron.layers.core.StatelessLayer

Cast the contents of a NumPy array to a new data type.

Parameters:	new_type (type) – the new type for the array to be cast to.

transform(X)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.

class megatron.layers.shaping.Concatenate(axis=-1)

Bases: megatron.layers.core.StatelessLayer

Combine arrays along a given axis. No new axis is created unless all inputs are 1D.

Parameters:	axis (int (default: -1)) – axis along which to concatenate arrays. -1 means the last axis.

transform(*arrays)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.

class megatron.layers.shaping.Filter(n_outputs=1, **kwargs)

Bases: megatron.layers.core.StatelessLayer

Apply a given mask to an array along its first axis to filter out observations.

transform(X, mask)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.

class megatron.layers.shaping.Flatten(n_outputs=1, **kwargs)

Bases: megatron.layers.core.StatelessLayer

Reshape an array to be 1D.

transform(X)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.

class megatron.layers.shaping.OneHotLabels(strict=False)

Bases: megatron.layers.core.StatefulLayer

One-hot encode an array of categorical values, or non-consecutive numeric values.

partial_fit(X)

Update metadata based on a given batch of data or the full dataset.

Contains the main logic of fitting. This is the method that all child classes should override.

Parameters:	inputs (numpy.ndarray(s)) – the input data to be fit to; could be one array or a list of arrays.

transform(X)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
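A sketch of the stateful pattern behind label-based one-hot encoding: `partial_fit` accumulates the sorted set of distinct labels across batches, and `transform` compares each value against every known label. The class name is hypothetical, the `strict` option is not modeled, and mapping unseen labels to an all-zero row is an assumption.

```python
import numpy as np


# Hypothetical stand-in for OneHotLabels; not Megatron's implementation.
class OneHotLabelsSketch:
    def __init__(self):
        self.labels = None

    def partial_fit(self, X):
        # Merge this batch's distinct labels into the sorted label set.
        batch = np.unique(np.asarray(X))
        self.labels = batch if self.labels is None else np.union1d(self.labels, batch)
        return self

    def transform(self, X):
        X = np.asarray(X)
        # Broadcast (n, 1) against (1, n_labels); unseen labels give all-zero rows.
        return (X[:, None] == self.labels[None, :]).astype(np.uint8)


enc = OneHotLabelsSketch().partial_fit(["cat", "dog"]).partial_fit(["bird"])
print(enc.labels.tolist())                            # ['bird', 'cat', 'dog']
print(enc.transform(["dog", "bird", "fish"]).tolist())  # [[0, 0, 1], [1, 0, 0], [0, 0, 0]]
```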

class megatron.layers.shaping.OneHotRange(strict=False)

Bases: megatron.layers.core.StatefulLayer

One-hot encode a numeric array whose values form a consecutive sequence.

partial_fit(X)

Update metadata based on a given batch of data or the full dataset.

Contains the main logic of fitting. This is the method that all child classes should override.

Parameters:	inputs (numpy.ndarray(s)) – the input data to be fit to; could be one array or a list of arrays.

transform(X)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
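When the values form a consecutive integer range, the fit only needs the minimum and maximum seen so far, and each value maps to a row of an identity matrix. This is a hypothetical sketch (the `strict` option is not modeled); it assumes transformed values lie within the fitted range and does not handle values outside it.

```python
import numpy as np


# Hypothetical stand-in for OneHotRange; not Megatron's implementation.
class OneHotRangeSketch:
    def __init__(self):
        self.min_val = None
        self.max_val = None

    def partial_fit(self, X):
        X = np.asarray(X, dtype=int)
        lo, hi = int(X.min()), int(X.max())
        # Widen the stored range to cover this batch.
        self.min_val = lo if self.min_val is None else min(self.min_val, lo)
        self.max_val = hi if self.max_val is None else max(self.max_val, hi)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=int)
        width = self.max_val - self.min_val + 1
        # Row i of the identity matrix is the one-hot vector for min_val + i.
        return np.eye(width, dtype=np.uint8)[X - self.min_val]


enc = OneHotRangeSketch().partial_fit([3, 5])
print(enc.transform([3, 4, 5]).tolist())  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Unlike the label-based encoder, the value 4 gets its own column here even though it never appeared during fitting.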

class megatron.layers.shaping.Reshape(new_shape)

Bases: megatron.layers.core.StatelessLayer

Reshape an array to a given new shape.

Parameters:	new_shape (tuple of int) – desired new shape for array.

transform(X)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.

class megatron.layers.shaping.Slice(*slices)

Bases: megatron.layers.core.StatelessLayer

Apply NumPy array slicing. Each slice corresponds to one dimension.

Slices (passed as hyperparameters) are constructed as follows:

- To get just index N: provide the integer N as the slice.
- To slice from N to the end: provide a 1-tuple of the integer N, e.g. (5,).
- To slice from M to N exclusive: provide a 2-tuple of the integers M and N, e.g. (3, 6).
- To slice from M to N with step P: provide a 3-tuple of the integers M, N, and P.

Parameters:	slices (int(s) or tuple(s)) – the slices to be applied, formatted as described above. Must not overlap.

transform(X)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
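The slice-spec rules listed above translate directly into Python index objects. The sketch below, using hypothetical helpers `to_slice` and `apply_slices`, shows one way to do that conversion; note that a 1-tuple needs special handling because `slice(N)` alone would mean "up to N", not "from N".

```python
import numpy as np


# Hypothetical sketch of Slice's spec handling; not Megatron's implementation.
def to_slice(spec):
    """Convert one slice spec into an integer index or a slice object."""
    if isinstance(spec, int):
        return spec                      # N -> just index N
    if isinstance(spec, tuple) and 1 <= len(spec) <= 3:
        if len(spec) == 1:
            return slice(spec[0], None)  # (N,) -> from N to the end
        return slice(*spec)              # (M, N) or (M, N, P)
    raise ValueError(f"invalid slice spec: {spec!r}")


def apply_slices(X, *slices):
    # One spec per leading dimension, applied together as a NumPy index tuple.
    return np.asarray(X)[tuple(to_slice(s) for s in slices)]


X = np.arange(24).reshape(4, 6)
print(apply_slices(X, (1, 3), (0, 6, 2)).tolist())  # [[6, 8, 10], [12, 14, 16]]
```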

class megatron.layers.shaping.SplitDict(fields)

Bases: megatron.layers.core.StatelessLayer

Split dictionary data into separate nodes, with one node per key in the dictionary.

Parameters:	fields (list of str) – the dictionary keys to be pulled out into their own nodes.

transform(dicts)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
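A sketch of splitting a collection of dictionaries into one array per requested key. Returning a plain list of arrays, in the order of `fields`, is an assumption standing in for Megatron's "one node per key"; `split_dict` is a hypothetical helper.

```python
import numpy as np


# Hypothetical sketch of SplitDict's behavior; not Megatron's implementation.
def split_dict(dicts, fields):
    """Return one array per requested key, in the order given by fields."""
    return [np.array([d[field] for d in dicts]) for field in fields]


records = [{"age": 31, "city": "Oslo"}, {"age": 45, "city": "Lima"}]
ages, cities = split_dict(records, ["age", "city"])
print(ages.tolist(), cities.tolist())  # [31, 45] ['Oslo', 'Lima']
```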

class megatron.layers.shaping.TimeSeries(window_size, time_axis=1, reverse=False)

Bases: megatron.layers.core.StatefulLayer

Add a time dimension to a dataset by rolling a window over the data.

Parameters:	window_size (int) – length of the window; the number of timesteps in the time series.
	time_axis (int (default: 1)) – the axis of the array on which to place the time dimension.
	reverse (bool (default: False)) – if True, the oldest data comes first; if False, the newest data comes first.

partial_fit(X)

Update metadata based on a given batch of data or the full dataset.

Contains the main logic of fitting. This is the method that all child classes should override.

Parameters:	inputs (numpy.ndarray(s)) – the input data to be fit to; could be one array or a list of arrays.

transform(X)

Apply transformation to given input data.

Parameters:	inputs (np.ndarray(s)) – input data to be transformed; could be one array or a list of arrays.
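A stateless sketch of the rolling window, built on `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20 or later). It fixes the time axis at position 1 and simply drops the first `window_size - 1` rows; the real layer is stateful, so how it carries history across batches in `partial_fit` is not modeled here. `rolling_windows` is a hypothetical helper.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


# Hypothetical, stateless sketch of TimeSeries; not Megatron's implementation.
def rolling_windows(X, window_size, reverse=False):
    X = np.asarray(X)
    # The window axis is appended last: (n - w + 1, *features, w).
    windows = sliding_window_view(X, window_size, axis=0)
    # Move the window axis to position 1: (n - w + 1, w, *features), oldest first.
    windows = np.moveaxis(windows, -1, 1)
    if not reverse:
        # reverse=False means newest data first, so flip the time axis.
        windows = windows[:, ::-1]
    return windows


X = np.arange(5).reshape(5, 1)  # 5 timesteps, 1 feature
print(rolling_windows(X, 3).shape)             # (3, 3, 1)
print(rolling_windows(X, 3)[..., 0].tolist())  # [[2, 1, 0], [3, 2, 1], [4, 3, 2]]
```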

Text Layers

These Layers are for manipulating text data.

class megatron.layers.text.RemoveStopwords(language='english')

Bases: megatron.layers.core.StatelessLayer

Remove common, low-information words (stopwords) from every element of a text array.

Parameters:	language (str (default: 'english')) – the language in which the text is written.
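A sketch of per-element stopword removal with whitespace tokenization. The tiny `STOPWORDS` set is purely illustrative and hypothetical; a real implementation would use a full per-language list (for example, one shipped with NLTK), and which list Megatron uses is not stated here.

```python
import numpy as np

# Illustrative, hypothetical stopword list; far smaller than any real one.
STOPWORDS = {"english": {"a", "an", "the", "is", "of", "and", "to", "in"}}


# Hypothetical sketch of RemoveStopwords; not Megatron's implementation.
def remove_stopwords(texts, language="english"):
    stop = STOPWORDS[language]
    # Lowercase only for the membership test, so kept words keep their case.
    return np.array([" ".join(w for w in t.split() if w.lower() not in stop) for t in texts])


docs = ["The cat is on the mat", "A tale of two cities"]
print(remove_stopwords(docs).tolist())  # ['cat on mat', 'tale two cities']
```

Splitting on whitespace keeps punctuation attached to words, so "the," would not match "the"; a proper tokenizer avoids this.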