Nodes
Core Nodes
Nodes are the internal building blocks of Pipelines. Although you will rarely use them directly, it helps to understand how they work.
-
class megatron.nodes.core.InputNode(name, shape=())
Bases: megatron.nodes.core.Node
A pipeline node holding input data as a NumPy array.
It is always an initial node in a Pipeline (has no inbound nodes) and, when run, stores its given data (either from a feed dict or a function call) in its output.
Parameters:
- name (str) – a name to associate with the data; the keys of the Pipeline feed dict will be these names.
- shape (tuple of int) – the shape, not including the observation dimension (1st), of the NumPy arrays to be input.
-
name
A name to associate with the data; the keys of the Pipeline feed dict will be these names.
Type: str
-
shape
The shape, not including the observation dimension (1st), of the NumPy arrays to be input.
Type: tuple of int
-
load(observations)
Validate and store the data passed in.
Parameters: observations (np.ndarray) – data from either the feed dict or the function call, to be validated.
Raises: megatron.utils.ShapeError – error indicating that the shape of the data does not match the shape of the node.
-
validate_input(observations)
Ensure the shape of the data passed in aligns with the shape of the node.
Parameters: observations (np.ndarray) – data from either the feed dict or the function call, to be validated.
Raises: megatron.utils.ShapeError – error indicating that the shape of the data does not match the shape of the node.
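The shape rule described for InputNode can be sketched in isolation. The snippet below is a minimal illustration rather than the library's code: the ShapeError class is a hypothetical stand-in for megatron.utils.ShapeError, and validate_input is a free function written only for this example. It assumes the check compares everything after the first (observation) axis against the node's declared shape.

```python
import numpy as np


class ShapeError(Exception):
    """Hypothetical stand-in for megatron.utils.ShapeError."""


def validate_input(observations, shape):
    """Check that every observation has the declared per-observation shape.

    The first axis indexes observations, so it is skipped in the comparison.
    """
    observed = tuple(observations.shape[1:])
    if observed != tuple(shape):
        raise ShapeError(
            "expected per-observation shape {}, got {}".format(tuple(shape), observed)
        )


# Five scalar observations match a node declared with shape=().
validate_input(np.zeros(5), shape=())

# Five 2x3 observations match a node declared with shape=(2, 3).
validate_input(np.zeros((5, 2, 3)), shape=(2, 3))

# A mismatch raises the error.
try:
    validate_input(np.zeros((5, 4)), shape=(2, 3))
    mismatch_detected = False
except ShapeError:
    mismatch_detected = True
```

Under this reading, the default shape=() describes one scalar per observation, so a one-dimensional array of any length passes.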
-
class megatron.nodes.core.Node(inbound_nodes)
Bases: object
Base class of pipeline nodes.
Parameters: inbound_nodes (list of megatron.Node) – nodes that are to be connected as inputs to this node.
-
inbound_nodes
Nodes that are to be connected as inputs to this node.
Type: list of megatron.Node
-
outbound_nodes
Nodes that take this node as an input.
Type: list of megatron.Node
-
output
The data produced when the node is run on its inputs.
Type: np.ndarray
-
outbounds_run
Number of outbound nodes that have been executed. This is a helper for efficiently removing unneeded data.
Type: int
-
traverse(*path)
Return a Node from elsewhere in the graph by navigating to it from this Node.
A negative number indicates moving up to a parent, a positive number down to a child. The number itself is a 1-based index into the parents/children, from left to right. For example, a step of -2 will go to the second parent, while a step of 3 will go to the third child.
Parameters: path (*ints) – arbitrary number of integers indicating the steps in the path.
Returns: the node at the end of the provided path.
Return type: Node
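The stepping rule for traverse can be made concrete with a small self-contained sketch. PathNode below is a hypothetical class written only for this illustration, not the library's Node. It assumes that "left to right" means the order in which nodes were connected, and that a step of 0 is invalid, which the description above does not specify.

```python
class PathNode:
    """Hypothetical minimal node with parent and child links only."""

    def __init__(self, label, inbound_nodes=()):
        self.label = label
        self.inbound_nodes = list(inbound_nodes)
        self.outbound_nodes = []
        # Register this node as a child of each of its parents.
        for parent in self.inbound_nodes:
            parent.outbound_nodes.append(self)

    def traverse(self, *path):
        node = self
        for step in path:
            if step == 0:
                # Assumption: steps are 1-based, so 0 has no meaning.
                raise ValueError("steps are 1-based; 0 is not a valid step")
            if step < 0:
                node = node.inbound_nodes[-step - 1]   # -2 -> second parent
            else:
                node = node.outbound_nodes[step - 1]   # 3 -> third child
        return node


a = PathNode("a")
b = PathNode("b")
c = PathNode("c", [a, b])   # parents of c, in order: a, b
d = PathNode("d", [c])
e = PathNode("e", [c])      # children of c, in order: d, e

second_parent = c.traverse(-2)     # the node labelled "b"
second_child = c.traverse(2)       # the node labelled "e"
grandparent = d.traverse(-1, -1)   # d -> c -> a
```

Each step is applied to the node reached by the previous one, so a path such as (-1, -1) climbs two levels.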
-
-
class megatron.nodes.core.TransformationNode(layer, inbound_nodes, layer_out_index=0)
Bases: megatron.nodes.core.Node
A pipeline node holding a Transformation.
It connects to a set of input Nodes (of class Node or Input) and, when run, applies its given Transformation, storing the result in its output variable.
Parameters:
- layer (megatron.Layer) – the Layer to be applied to the data from its inbound Nodes.
- inbound_nodes (list of megatron.Node / megatron.Input) – the Nodes to be connected to this node as input.
- layer_out_index (int (default: 0)) – when a Layer has multiple return values, indicates which one corresponds to this node.
-
transformation
The transformation to be applied to the data from its input Nodes.
Type: megatron.Transformation
-
output
None until the Node is run; once run, holds the NumPy array produced.
Type: None or np.ndarray
-
is_fitted
Whether the Transformation inside the Node has, if necessary, been fit to data.
Type: bool
-
fit()
Apply the fit method from the Layer to the inbound Nodes’ data.
-
partial_fit()
Apply the partial_fit method from the Layer to the inbound Nodes’ data.
-
transform(prune=True)
Apply the transform method from the Layer to the inbound Nodes’ data and store the result.
Parameters: prune (bool (default: True)) – whether to erase data from intermediate nodes after they are fully used.
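One way to read the interaction between prune and the outbounds_run counter is sketched below. This is an assumption about the mechanism, not the library's implementation: PruningNode is a hypothetical class that wraps a plain function, and it assumes a node counts as "fully used" once every one of its outbound nodes has run.

```python
class PruningNode:
    """Hypothetical node that frees its output once all children have run."""

    def __init__(self, func, inbound_nodes=()):
        self.func = func
        self.inbound_nodes = list(inbound_nodes)
        self.outbound_nodes = []
        self.output = None
        self.outbounds_run = 0
        for parent in self.inbound_nodes:
            parent.outbound_nodes.append(self)

    def transform(self, prune=True):
        # Compute this node's output from its parents' stored outputs.
        inputs = [parent.output for parent in self.inbound_nodes]
        self.output = self.func(*inputs)
        # Tell each parent that one more child has consumed its data.
        for parent in self.inbound_nodes:
            parent.outbounds_run += 1
            if prune and parent.outbounds_run == len(parent.outbound_nodes):
                parent.output = None   # no remaining consumers


source = PruningNode(lambda: [1, 2, 3])
doubled = PruningNode(lambda xs: [2 * x for x in xs], [source])
total = PruningNode(lambda xs: sum(xs), [source])

source.transform()
doubled.transform()
kept_after_first_child = source.output is not None   # one child still pending
total.transform()
freed_after_last_child = source.output is None       # every child has run
```

With prune=False, the parent's output would stay in place after both children run, which is useful when intermediate values need to be inspected.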
Loading Nodes From Files
A set of Nodes can be defined according to the schema of a given data source, with one Input node per column. The functions below do this for CSV files, Pandas dataframes, and SQL queries.
-
megatron.nodes.fromfile.from_csv(filepath, exclude_cols=[], eager=False, nrows=None)
Load Input nodes from columns of a CSV file.
Parameters:
- filepath (str) – path of CSV file to be loaded.
- exclude_cols (list of str (default: [])) – any columns that should not be loaded as Input.
- eager (bool) – whether to load data as well, making for eager execution.
- nrows (int) – number of rows to load when eager is True. Default is for all rows to load.
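To illustrate what these parameters mean, here is a rough standard-library sketch of the same idea. It does not call megatron: columns_from_csv is a hypothetical helper that returns a plain dict instead of Input nodes, and it assumes that a non-eager load only needs the header row, while an eager load also reads up to nrows rows of data.

```python
import csv
import os
import tempfile


def columns_from_csv(filepath, exclude_cols=(), eager=False, nrows=None):
    """Hypothetical helper: map each kept column name to its data (or None)."""
    with open(filepath, newline="") as handle:
        reader = csv.DictReader(handle)
        names = [name for name in reader.fieldnames if name not in exclude_cols]
        if not eager:
            # Lazy: only the schema (column names) is needed.
            return {name: None for name in names}
        columns = {name: [] for name in names}
        for i, row in enumerate(reader):
            if nrows is not None and i >= nrows:
                break
            for name in names:
                columns[name].append(row[name])
        return columns


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "people.csv")
    with open(path, "w", newline="") as handle:
        handle.write("id,age,height\n1,34,170\n2,51,182\n3,29,165\n")

    lazy = columns_from_csv(path, exclude_cols=["id"])
    eager = columns_from_csv(path, exclude_cols=["id"], eager=True, nrows=2)
```

The csv module yields strings, so a real loader would also need to convert each column to a numeric type.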
-
megatron.nodes.fromfile.from_dataframe(df, exclude_cols=[], eager=False, nrows=None)
Load Input nodes from columns of a Pandas dataframe.
Parameters:
- df (pandas.DataFrame) – dataframe from which to load columns.
- exclude_cols (list of str (default: [])) – any columns that should not be loaded as Input.
- eager (bool) – whether to load data as well, making for eager execution.
- nrows (int) – number of rows to load when eager is True. Default is for all rows to load.
-
megatron.nodes.fromfile.from_sql(connection, query, eager=False, nrows=None)
Load Input nodes from the columns returned by a SQL query.
Parameters:
- connection (Connection) – database connection to load from.
- query (str) – query to execute in connection to load columns.
- eager (bool) – whether to load data as well, making for eager execution.
- nrows (int) – number of rows to load when eager is True. Default is for all rows to load.
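The same pattern applies to a query result, where the column names come from the query rather than from a file header. The sketch below uses only the standard sqlite3 module and does not call megatron. columns_from_query is a hypothetical helper, and the in-memory table is made up for the example.

```python
import sqlite3


def columns_from_query(connection, query, eager=False, nrows=None):
    """Hypothetical helper: map each result column name to its data (or None)."""
    cursor = connection.execute(query)
    # cursor.description holds one tuple per result column; the name comes first.
    names = [column[0] for column in cursor.description]
    if not eager:
        return {name: None for name in names}
    rows = cursor.fetchall() if nrows is None else cursor.fetchmany(nrows)
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE people (id INTEGER, age INTEGER, height REAL)")
connection.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, 34, 170.0), (2, 51, 182.0), (3, 29, 165.0)],
)

lazy = columns_from_query(connection, "SELECT age, height FROM people")
eager = columns_from_query(
    connection, "SELECT age, height FROM people ORDER BY id", eager=True, nrows=2
)
connection.close()
```

Because the query picks the columns, this variant has no need for an exclude_cols argument, which matches the from_sql signature above.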