Nodes

Core Nodes

Nodes are the internal building blocks of Pipelines. You will rarely use them directly, but it helps to understand how they work.
class megatron.nodes.core.InputNode(name, shape=())

Bases: megatron.nodes.core.Node

A pipeline node holding input data as a Numpy array.

It is always an initial node in a Pipeline (has no inbound nodes) and, when run, stores its given data (either from a feed dict or a function call) in its output.

Parameters:
  • name (str) – a name to associate with the data; the keys of the Pipeline feed dict will be these names.
  • shape (tuple of int) – the shape of the Numpy arrays to be input, not including the first (observation) dimension.
name

a name to associate with the data; the keys of the Pipeline feed dict will be these names.

Type: str
shape

the shape of the Numpy arrays to be input, not including the first (observation) dimension.

Type: tuple of int
load(observations)

Validate and store the data passed in.

Parameters: observations (np.ndarray) – data from either the feed dict or the function call, to be validated.
Raises: megatron.utils.ShapeError – error indicating that the shape of the data does not match the shape of the node.
validate_input(observations)

Ensure shape of data passed in aligns with shape of the node.

Parameters: observations (np.ndarray) – data from either the feed dict or the function call, to be validated.
Raises: megatron.utils.ShapeError – error indicating that the shape of the data does not match the shape of the node.
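
The shape check described above can be sketched as follows. This is only an illustration, not megatron's implementation: `ShapeError` is a local stand-in for `megatron.utils.ShapeError`, `check_shape` is a hypothetical helper, and the sketch assumes the check compares every dimension after the first (observation) dimension with the node's declared shape.

```python
import numpy as np


class ShapeError(Exception):
    """Local stand-in for megatron.utils.ShapeError (illustration only)."""


def check_shape(name, expected_shape, observations):
    """Hypothetical helper: compare the per-observation shape with expected_shape."""
    # The first dimension indexes observations, so it is excluded from the check.
    actual = tuple(observations.shape[1:])
    if actual != tuple(expected_shape):
        raise ShapeError(
            "node '{}' expected shape {}, got {}".format(
                name, tuple(expected_shape), actual
            )
        )
    return observations


# 100 observations of 3 features each match a node declared with shape=(3,).
features = check_shape("features", (3,), np.zeros((100, 3)))

# A node with the default shape=() expects one scalar per observation.
labels = check_shape("labels", (), np.zeros(100))

# A mismatch in the per-observation shape is reported through the error.
try:
    check_shape("features", (3,), np.zeros((100, 4)))
    mismatch_detected = False
except ShapeError:
    mismatch_detected = True
```

Under this assumption the number of observations can vary freely between calls, since only the trailing dimensions are compared.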
class megatron.nodes.core.Node(inbound_nodes)

Bases: object

Base class of pipeline nodes.

Parameters: inbound_nodes (list of megatron.Node) – nodes that are to be connected as inputs to this node.
inbound_nodes

nodes that are to be connected as inputs to this node.

Type: list of megatron.Node
outbound_nodes

nodes to which this node is connected as an input.

Type: list of megatron.Node
output

holds the data produced when the node is run on its inputs.

Type: np.ndarray
outbounds_run

number of outbound nodes that have been executed. This is a helper for efficiently removing unneeded data.

Type: int
traverse(*path)

Return a Node from elsewhere in the graph by navigating to it from this Node.

A negative number indicates moving up to a parent, a positive number down to a child. The number itself is a 1-based index into the parents/children, from left to right. For example, a step of -2 will go to the second parent, while a step of 3 will go to the third child.

Parameters: path (*ints) – arbitrary number of integers indicating the steps in the path.
Returns: the node at the end of the provided path.
Return type: Node
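
The path convention is easier to see on a small graph. The sketch below uses a hypothetical `PathNode` class, not megatron's `Node`, and assumes that parents and children are ordered by the order in which the connections were made.

```python
class PathNode:
    """Hypothetical stand-in for a pipeline node; not megatron's Node class."""

    def __init__(self, name, inbound_nodes=()):
        self.name = name
        self.inbound_nodes = list(inbound_nodes)
        self.outbound_nodes = []
        # Register this node as a child of each of its parents.
        for parent in self.inbound_nodes:
            parent.outbound_nodes.append(self)

    def traverse(self, *path):
        node = self
        for step in path:
            if step == 0:
                raise ValueError("steps are 1-based; 0 is not a valid step")
            # Negative steps pick a parent, positive steps pick a child.
            candidates = node.inbound_nodes if step < 0 else node.outbound_nodes
            node = candidates[abs(step) - 1]
        return node


a = PathNode("a")
b = PathNode("b")
total = PathNode("total", [a, b])      # parents, left to right: a, b
scaled = PathNode("scaled", [total])   # first child of total
logged = PathNode("logged", [total])   # second child of total

second_parent = total.traverse(-2)     # the second parent of total
sibling = scaled.traverse(-1, 2)       # up to total, then down to its second child
grandchild = a.traverse(1, 1)          # down to total, then to its first child
```

Here `second_parent` is `b`, `sibling` is `logged`, and `grandchild` is `scaled`.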
class megatron.nodes.core.TransformationNode(layer, inbound_nodes, layer_out_index=0)

Bases: megatron.nodes.core.Node

A pipeline node holding a Transformation.

It connects to a set of input Nodes (of class Node or Input) and, when run, applies its given Transformation, storing the result in its output variable.

Parameters:
  • layer (megatron.Layer) – the Layer to be applied to the data from its inbound Nodes.
  • inbound_nodes (list of megatron.Node / megatron.Input) – the Nodes to be connected to this node as input.
  • layer_out_index (int (default: 0)) – when a Layer has multiple return values, indicates which one corresponds to this node.
transformation

the transformation to be applied to the data from its input Nodes.

Type: megatron.Transformation
output

is None until Node is run; when run, the Numpy array produced is stored here.

Type: None or np.ndarray
is_fitted

indicates whether the Transformation inside the Node has, if necessary, been fit to data.

Type: bool
fit()

Apply fit method from Layer to inbound Nodes’ data.

partial_fit()

Apply partial fit method from Layer to inbound Nodes’ data.

transform(prune=True)

Apply and store result of transform method from Layer on inbound Nodes’ data.

Parameters: prune (bool (default: True)) – whether to erase data from intermediate nodes after they are fully used.
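
One way pruning could work together with the `outbounds_run` counter is sketched below. This is an assumption about the mechanism rather than megatron's actual code: `PruningNode` is a hypothetical class, and the sketch assumes a node's output is erased as soon as every one of its outbound nodes has been run.

```python
class PruningNode:
    """Hypothetical node used only to illustrate pruning; not megatron's code."""

    def __init__(self, func, inbound_nodes=()):
        self.func = func
        self.inbound_nodes = list(inbound_nodes)
        self.outbound_nodes = []
        self.output = None
        self.outbounds_run = 0
        for parent in self.inbound_nodes:
            parent.outbound_nodes.append(self)

    def transform(self, prune=True):
        inputs = [parent.output for parent in self.inbound_nodes]
        self.output = self.func(*inputs)
        for parent in self.inbound_nodes:
            parent.outbounds_run += 1
            # Once every child has consumed the parent's data, it can be dropped.
            if prune and parent.outbounds_run == len(parent.outbound_nodes):
                parent.output = None


source = PruningNode(lambda: [1, 2, 3])
doubled = PruningNode(lambda xs: [2 * x for x in xs], [source])
negated = PruningNode(lambda xs: [-x for x in xs], [source])

source.transform()
doubled.transform()
kept_after_first_child = source.output is not None   # negated still needs it
negated.transform()
freed_after_last_child = source.output is None       # all children have run
```

Counting executed children keeps the check cheap: deciding whether a node's data is still needed takes one integer comparison instead of a walk over the graph.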

Loading Nodes From Files

A set of Nodes can be defined according to the schema of a given data source, using the functions below.
megatron.nodes.fromfile.from_csv(filepath, exclude_cols=[], eager=False, nrows=None)

Load Input nodes from columns of a CSV file.

Parameters:
  • filepath (str) – path of CSV file to be loaded.
  • exclude_cols (list of str (default: [])) – any columns that should not be loaded as Input.
  • eager (bool) – whether to load data as well, making for eager execution.
  • nrows (int) – number of rows to load when eager is True. Default is for all rows to load.
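
The effect of `exclude_cols`, `eager`, and `nrows` can be sketched with the standard `csv` module. The helper below is hypothetical and is not part of megatron: it takes an open file handle instead of a path, and it returns a plain dict from column name to data (or `None` when not eager) rather than Input nodes.

```python
import csv
import io


def columns_from_csv(handle, exclude_cols=(), eager=False, nrows=None):
    """Hypothetical helper: map each kept column name to its data (or None)."""
    reader = csv.reader(handle)
    header = next(reader)
    kept = [name for name in header if name not in exclude_cols]
    columns = {name: ([] if eager else None) for name in kept}
    if eager:
        for row_number, row in enumerate(reader):
            # Stop once the requested number of rows has been read.
            if nrows is not None and row_number >= nrows:
                break
            for name, value in zip(header, row):
                if name in columns:
                    columns[name].append(value)
    return columns


sample = "id,age,income\n1,34,52000\n2,45,61000\n3,29,48000\n"

# Lazy: only the schema (column names) is read.
lazy = columns_from_csv(io.StringIO(sample), exclude_cols=["id"])

# Eager: the first two rows are loaded as well (values stay strings here).
eager = columns_from_csv(
    io.StringIO(sample), exclude_cols=["id"], eager=True, nrows=2
)
```

In the lazy case only the header row is consumed, which is the sense in which nodes are defined from the schema of the data source.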
megatron.nodes.fromfile.from_dataframe(df, exclude_cols=[], eager=False, nrows=None)

Load Input nodes from columns of a Pandas dataframe.

Parameters:
  • df (Pandas.DataFrame) – dataframe from which to load columns.
  • exclude_cols (list of str (default: [])) – any columns that should not be loaded as Input.
  • eager (bool) – whether to load data as well, making for eager execution.
  • nrows (int) – number of rows to load when eager is True. Default is for all rows to load.
megatron.nodes.fromfile.from_sql(connection, query, eager=False, nrows=None)

Load Input nodes from the columns returned by a SQL query.

Parameters:
  • connection (Connection) – database connection to load from.
  • query (str) – query to execute in connection to load columns.
  • eager (bool) – whether to load data as well, making for eager execution.
  • nrows (int) – number of rows to load when eager is True. Default is for all rows to load.
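
The same idea applies to a query result. The sketch below uses the standard `sqlite3` module with an in-memory database; `columns_from_query` is a hypothetical helper, not megatron's `from_sql`, and it returns a plain dict rather than Input nodes.

```python
import sqlite3


def columns_from_query(connection, query, eager=False, nrows=None):
    """Hypothetical helper: map each result column name to its data (or None)."""
    cursor = connection.execute(query)
    # cursor.description holds one entry per result column; the name comes first.
    names = [description[0] for description in cursor.description]
    if not eager:
        return {name: None for name in names}
    rows = cursor.fetchall() if nrows is None else cursor.fetchmany(nrows)
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE people (age INTEGER, income INTEGER)")
connection.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(34, 52000), (45, 61000), (29, 48000)],
)

query = "SELECT age, income FROM people ORDER BY age"
lazy = columns_from_query(connection, query)
eager = columns_from_query(connection, query, eager=True, nrows=2)
```

Because the column names come from the query result, no `exclude_cols` argument is needed: unwanted columns are simply left out of the `SELECT` list.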