Nodes

Core Nodes

Nodes are the internal building blocks of Pipelines. You usually won’t use them directly, but it’s helpful to understand how they work.

class megatron.nodes.core.InputNode(name, shape=())

Bases: megatron.nodes.core.Node

A pipeline node holding input data as a NumPy array.

It is always an initial node in a Pipeline (has no inbound nodes) and, when run, stores its given data (either from a feed dict or a function call) in its output.

Parameters:	name (str) – a name to associate with the data; the keys of the Pipeline feed dict will be these names.
	shape (tuple of int) – the shape of the NumPy arrays to be input, excluding the observation (first) dimension.

name

a name to associate with the data; the keys of the Pipeline feed dict will be these names.

Type:	str

shape

the shape of the NumPy arrays to be input, excluding the observation (first) dimension.

Type:	tuple of int

load(observations)

Validate and store the data passed in.

Parameters:	observations (np.ndarray) – data from either the feed dict or the function call, to be validated.
Raises:	`megatron.utils.ShapeError` – error indicating that the shape of the data does not match the shape of the node.

validate_input(observations)

Ensure the shape of the data passed in matches the shape of the node.

Parameters:	observations (np.ndarray) – data from either the feed dict or the function call, to be validated.
Raises:	`megatron.utils.ShapeError` – error indicating that the shape of the data does not match the shape of the node.
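A minimal sketch of the shape check described by `load` and `validate_input`, written without importing Megatron. `SketchInputNode` and the local `ShapeError` are hypothetical stand-ins for `InputNode` and `megatron.utils.ShapeError`, and the sketch assumes validation compares every dimension after the first (observation) dimension against `shape`.

```python
import numpy as np


class ShapeError(Exception):
    """Local stand-in for megatron.utils.ShapeError (hypothetical)."""


class SketchInputNode:
    """Hypothetical minimal input node; not Megatron's actual InputNode."""

    def __init__(self, name, shape=()):
        self.name = name
        self.shape = tuple(shape)
        self.output = None

    def validate_input(self, observations):
        # Assumption: axis 0 counts observations; the remaining axes must equal self.shape.
        if observations.shape[1:] != self.shape:
            raise ShapeError(
                f"{self.name}: expected per-observation shape {self.shape}, "
                f"got {observations.shape[1:]}"
            )

    def load(self, observations):
        # Validate first, then store the data as this node's output.
        observations = np.asarray(observations)
        self.validate_input(observations)
        self.output = observations


# The feed dict is keyed by node name.
feed = {"features": np.zeros((5, 3)), "age": np.array([31, 47, 22])}
features = SketchInputNode("features", shape=(3,))  # accepts arrays of shape (n, 3)
age = SketchInputNode("age")                          # shape=(): one scalar per observation
for node in (features, age):
    node.load(feed[node.name])

try:
    features.load(np.zeros((5, 4)))  # wrong per-observation shape
except ShapeError as err:
    print(err)
```

Because `shape` excludes the observation dimension, the same node accepts a batch of any length; only the per-observation shape is checked.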

class megatron.nodes.core.Node(inbound_nodes)

Bases: object

Base class of pipeline nodes.

Parameters:	inbound_nodes (list of megatron.Node) – nodes that are to be connected as inputs to this node.

inbound_nodes

nodes that are to be connected as inputs to this node.

Type:	list of megatron.Node

outbound_nodes

nodes to which this node is connected as an input.

Type:	list of megatron.Node

output

holds the data the node output when it was run on its inputs.

Type:	np.ndarray

outbounds_run

number of outbound nodes that have been executed. This is a helper for efficiently removing data that is no longer needed.

Type:	int
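The `outbounds_run` counter supports a reference-counting style of cleanup: once every outbound node has run, a node's output is no longer needed and can be dropped. The pure-Python sketch below illustrates that idea under the assumption that pruning happens as soon as the count reaches the number of outbound nodes; `SketchNode`, `compute` and `run` are hypothetical names, not Megatron's implementation.

```python
class SketchNode:
    """Hypothetical node illustrating inbound/outbound links and outbounds_run."""

    def __init__(self, inbound_nodes, compute):
        self.inbound_nodes = list(inbound_nodes)
        self.outbound_nodes = []
        self.output = None
        self.outbounds_run = 0
        self.compute = compute  # maps the inbound nodes' outputs to this node's output
        # Connecting to a parent also registers this node as one of its outbound nodes.
        for parent in self.inbound_nodes:
            parent.outbound_nodes.append(self)

    def run(self, prune=True):
        self.output = self.compute(*[p.output for p in self.inbound_nodes])
        for parent in self.inbound_nodes:
            parent.outbounds_run += 1
            # Every child has now consumed the parent's output, so free it.
            if prune and parent.outbounds_run == len(parent.outbound_nodes):
                parent.output = None


source = SketchNode([], compute=lambda: [1, 2, 3])
doubled = SketchNode([source], compute=lambda xs: [2 * x for x in xs])
summed = SketchNode([source], compute=lambda xs: sum(xs))

source.run()
doubled.run()
print(source.output)  # [1, 2, 3]: one outbound node has not run yet
summed.run()
print(source.output)  # None: both outbound nodes have run
```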

traverse(*path)

Return a Node from elsewhere in the graph by navigating to it from this Node.

A negative number indicates moving up to a parent, a positive number moving down to a child. The absolute value of the number is a 1-based index into the parents or children, ordered from left to right. For example, a step of -2 goes to the second parent, while a step of 3 goes to the third child.

Parameters:	path (*ints) – Arbitrary number of integers indicating the steps in the path.
Returns:	the node at the end of the provided path.
Return type:	Node
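A hypothetical sketch of the path rules `traverse` follows, using a stand-in `SketchNode` class rather than Megatron's `Node`. It assumes children are ordered by the order in which they were connected, and it rejects a step of 0 because the indices are 1-based; neither detail is confirmed by this page.

```python
class SketchNode:
    """Hypothetical stand-in used only to illustrate traverse's path rules."""

    def __init__(self, name, inbound_nodes=()):
        self.name = name
        self.inbound_nodes = list(inbound_nodes)
        self.outbound_nodes = []
        for parent in self.inbound_nodes:
            parent.outbound_nodes.append(self)

    def traverse(self, *path):
        node = self
        for step in path:
            if step == 0:
                raise ValueError("steps are 1-based; 0 is not a valid step")
            # Negative steps go up to a parent, positive steps down to a child;
            # the absolute value is a 1-based, left-to-right index.
            neighbours = node.inbound_nodes if step < 0 else node.outbound_nodes
            node = neighbours[abs(step) - 1]
        return node


a = SketchNode("a")
b = SketchNode("b")
c = SketchNode("c", [a, b])  # c has parents a (1st) and b (2nd)
d = SketchNode("d", [c])
e = SketchNode("e", [c])     # c has children d (1st) and e (2nd)

print(c.traverse(-2).name)     # b: second parent of c
print(d.traverse(-1, 2).name)  # e: up to c, then down to c's second child
```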

class megatron.nodes.core.TransformationNode(layer, inbound_nodes, layer_out_index=0)

Bases: megatron.nodes.core.Node

A pipeline node holding a Transformation.

It connects to a set of input Nodes (of class Node or Input) and, when run, applies its given Transformation, storing the result in its output attribute.

Parameters:	layer (megatron.Layer) – the Layer to be applied to the data from its inbound Nodes.
	inbound_nodes (list of megatron.Node / megatron.Input) – the Nodes to be connected to this node as input.
	layer_out_index (int (default: 0)) – when a Layer has multiple return values, indicates which one corresponds to this node.
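To illustrate `layer_out_index`, the sketch below uses a hypothetical two-output layer and two nodes that share it, each keeping one of its return values. `MinMaxLayer`, `SketchTransformationNode` and `Holder` are illustrative names, and treating a tuple return value as "multiple return values" is an assumption of the sketch.

```python
class MinMaxLayer:
    """Hypothetical layer whose transform returns two lists: row minima and row maxima."""

    def transform(self, X):
        return [min(row) for row in X], [max(row) for row in X]


class SketchTransformationNode:
    """Hypothetical node that keeps only one of its layer's return values."""

    def __init__(self, layer, inbound_nodes, layer_out_index=0):
        self.layer = layer
        self.inbound_nodes = inbound_nodes
        self.layer_out_index = layer_out_index
        self.output = None

    def transform(self):
        result = self.layer.transform(*[n.output for n in self.inbound_nodes])
        # A multi-output layer returns a tuple; keep the element this node represents.
        if isinstance(result, tuple):
            result = result[self.layer_out_index]
        self.output = result


class Holder:
    """Stands in for an input node whose data is already loaded."""

    def __init__(self, output):
        self.output = output


rows = Holder([[3, 1, 2], [7, 9, 8]])
layer = MinMaxLayer()
min_node = SketchTransformationNode(layer, [rows], layer_out_index=0)
max_node = SketchTransformationNode(layer, [rows], layer_out_index=1)
min_node.transform()
max_node.transform()
print(min_node.output, max_node.output)  # [1, 7] [3, 9]
```

For simplicity each node calls the layer itself, so the layer runs twice; a production implementation might compute it once and share the result.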

transformation

the transformation to be applied to the data from its input Nodes.

Type:	megatron.Transformation

output

None until the Node is run; once run, holds the NumPy array it produced.

Type:	None or np.ndarray

is_fitted

indicates whether the Transformation inside the Node has, if necessary, been fit to data.

Type:	bool

fit()

Apply the Layer’s fit method to the inbound Nodes’ data.

partial_fit()

Apply the Layer’s partial fit method to the inbound Nodes’ data.

transform(prune=True)

Apply the Layer’s transform method to the inbound Nodes’ data and store the result.

Parameters:	prune (bool (default: True)) – whether to erase data from intermediate nodes once it has been fully used.
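A minimal sketch of how `fit`, `partial_fit` and `transform` differ, using a hypothetical mean-centering layer. `MeanCenterLayer`, `SketchTransformationNode` and `Holder` are illustrative names; the sketch assumes the layer's methods take the inbound nodes' outputs as positional arguments, and it omits pruning.

```python
import numpy as np


class MeanCenterLayer:
    """Hypothetical layer: learns a mean while fitting, subtracts it in transform."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def fit(self, X):
        # Full fit: discard earlier statistics and learn from X alone.
        self.total, self.count = 0.0, 0
        self.partial_fit(X)

    def partial_fit(self, X):
        # Incremental fit: fold one more batch into the running statistics.
        self.total += float(np.sum(X))
        self.count += len(X)

    def transform(self, X):
        return X - self.total / self.count


class SketchTransformationNode:
    """Hypothetical node wiring a layer to its inbound nodes (pruning omitted)."""

    def __init__(self, layer, inbound_nodes):
        self.layer = layer
        self.inbound_nodes = inbound_nodes
        self.is_fitted = False
        self.output = None

    def _inbound_data(self):
        return [node.output for node in self.inbound_nodes]

    def fit(self):
        self.layer.fit(*self._inbound_data())
        self.is_fitted = True

    def partial_fit(self):
        self.layer.partial_fit(*self._inbound_data())
        self.is_fitted = True

    def transform(self):
        self.output = self.layer.transform(*self._inbound_data())


class Holder:
    """Stands in for an input node whose data is already loaded."""

    def __init__(self, output):
        self.output = output


source = Holder(np.array([1.0, 2.0, 3.0]))
node = SketchTransformationNode(MeanCenterLayer(), [source])

node.partial_fit()                  # running mean: 6 / 3 = 2.0
source.output = np.array([10.0])
node.partial_fit()                  # running mean: 16 / 4 = 4.0
node.transform()
print(node.output)                  # [6.]

node.fit()                          # refit on the current batch only: mean 10.0
node.transform()
print(node.output)                  # [0.]
```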

Loading Nodes From Files

A set of Nodes can be defined according to the schema of a given data source. Here’s how.

megatron.nodes.fromfile.from_csv(filepath, exclude_cols=[], eager=False, nrows=None)

Load Input nodes from columns of a CSV file.

Parameters:	filepath (str) – path of the CSV file to be loaded.
	exclude_cols (list of str (default: [])) – columns that should not be loaded as Input nodes.
	eager (bool) – whether to also load the data, making for eager execution.
	nrows (int) – number of rows to load when eager is True. By default, all rows are loaded.
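The following hypothetical sketch shows the idea behind loading nodes from a file’s schema: one input per column, minus `exclude_cols`, with data attached only in eager mode. It uses the standard-library `csv` module and reads from an open text stream rather than a path to stay self-contained; `sketch_from_csv`, `SketchInput` and the list return type are assumptions, not Megatron’s API.

```python
import csv
import io


class SketchInput:
    """Hypothetical stand-in for an Input node: a column name plus optional data."""

    def __init__(self, name, data=None):
        self.name = name
        self.data = data


def sketch_from_csv(stream, exclude_cols=(), eager=False, nrows=None):
    """Create one input per CSV column, skipping excluded columns (illustration only)."""
    reader = csv.DictReader(stream)
    names = [c for c in reader.fieldnames if c not in exclude_cols]
    if not eager:
        # Lazy mode: only the schema (the header row) is used.
        return [SketchInput(name) for name in names]
    columns = {name: [] for name in names}
    for i, row in enumerate(reader):
        if nrows is not None and i >= nrows:
            break
        for name in names:
            columns[name].append(row[name])
    return [SketchInput(name, columns[name]) for name in names]


text = "id,age,income\n1,31,50000\n2,47,62000\n3,22,38000\n"

lazy = sketch_from_csv(io.StringIO(text), exclude_cols=["id"])
print([node.name for node in lazy])  # ['age', 'income']

eager = sketch_from_csv(io.StringIO(text), exclude_cols=["id"], eager=True, nrows=2)
print([(node.name, node.data) for node in eager])
```

The `csv` module yields strings, so a real loader would also need to convert each column to a numeric NumPy array.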

megatron.nodes.fromfile.from_dataframe(df, exclude_cols=[], eager=False, nrows=None)

Load Input nodes from columns of a pandas DataFrame.

Parameters:	df (pandas.DataFrame) – dataframe from which to load columns.
	exclude_cols (list of str (default: [])) – columns that should not be loaded as Input nodes.
	eager (bool) – whether to also load the data, making for eager execution.
	nrows (int) – number of rows to load when eager is True. By default, all rows are loaded.

megatron.nodes.fromfile.from_sql(connection, query, eager=False, nrows=None)

Load Input nodes from the columns returned by a SQL query.

Parameters:	connection (Connection) – database connection to load from.
	query (str) – query to execute on the connection to load columns.
	eager (bool) – whether to also load the data, making for eager execution.
	nrows (int) – number of rows to load when eager is True. By default, all rows are loaded.
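A hypothetical sketch of the same idea for a SQL source, using the standard-library `sqlite3` module with an in-memory database. Column names come from the DB-API `cursor.description`; `sketch_from_sql` and `SketchInput` are illustrative names, and whether Megatron reads the schema the same way is not stated here.

```python
import sqlite3


class SketchInput:
    """Hypothetical stand-in for an Input node: a column name plus optional data."""

    def __init__(self, name, data=None):
        self.name = name
        self.data = data


def sketch_from_sql(connection, query, eager=False, nrows=None):
    """Create one input per column returned by `query` (illustration only)."""
    cursor = connection.execute(query)
    # cursor.description holds one 7-tuple per result column; item 0 is the name.
    names = [col[0] for col in cursor.description]
    if not eager:
        return [SketchInput(name) for name in names]
    rows = cursor.fetchall() if nrows is None else cursor.fetchmany(nrows)
    return [SketchInput(name, [row[i] for row in rows]) for i, name in enumerate(names)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (age INTEGER, income REAL)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(31, 50000.0), (47, 62000.0), (22, 38000.0)],
)

lazy = sketch_from_sql(conn, "SELECT age, income FROM people")
print([node.name for node in lazy])  # ['age', 'income']

eager = sketch_from_sql(
    conn, "SELECT age, income FROM people ORDER BY age", eager=True, nrows=2
)
print(eager[0].data)  # [22, 31]
```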