Data loading

Every NEExT workflow starts by loading a collection of graphs into a GraphCollection. There are three entry points: read_from_csv and load_from_networkx on the top-level NEExT object, and load_from_dfs on GraphIO.

All three share the same options:

  • graph_type: "networkx" (default, flexible) or "igraph" (fast). See Graphs & collections.
  • reindex_nodes: reindex each graph’s nodes to start at 0 (default True).
  • filter_largest_component: keep only each graph’s largest connected component (default True).
  • node_sample_rate: fraction of nodes to sample per graph, in (0, 1] (default 1.0).
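
To make two of these options concrete, here is a minimal pure-Python sketch of what keeping the largest connected component and reindexing nodes from 0 mean for a single undirected graph. It illustrates the semantics only and is not NEExT's implementation; the helper names largest_component and reindex_from_zero are hypothetical.

```python
from collections import defaultdict, deque


def largest_component(edges):
    """Return the node set of the largest connected component (undirected)."""
    adjacency = defaultdict(set)
    for src, dst in edges:
        adjacency[src].add(dst)
        adjacency[dst].add(src)

    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects one component at a time.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def reindex_from_zero(edges):
    """Relabel nodes as 0..n-1, preserving the sorted order of the old IDs."""
    nodes = sorted({node for edge in edges for node in edge})
    mapping = {old: new for new, old in enumerate(nodes)}
    return [(mapping[s], mapping[d]) for s, d in edges]


# Toy graph: a triangle (10, 11, 12) plus a separate edge (20, 21).
edges = [(10, 11), (11, 12), (12, 10), (20, 21)]

keep = largest_component(edges)
filtered = [(s, d) for s, d in edges if s in keep and d in keep]
reindexed = reindex_from_zero(filtered)

print(sorted(keep))  # [10, 11, 12]
print(reindexed)     # [(0, 1), (1, 2), (2, 0)]
```

With both options left at their defaults, a loader following these semantics would drop the stray edge (20, 21) and hand back a triangle on nodes 0, 1, 2.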

From CSV files

read_from_csv is the primary loader. Paths may be local files or URLs (NEExT reads them with pandas).

from_csv.py
from NEExT import NEExT

nxt = NEExT()

graphs = nxt.read_from_csv(
  edges_path="edges.csv",
  node_graph_mapping_path="node_graph_mapping.csv",
  graph_label_path="graph_labels.csv",       # optional
  node_features_path="node_features.csv",     # optional
  edge_features_path="edge_features.csv",     # optional
  graph_type="networkx",
)

CSV column contracts

NEExT validates these columns and raises a clear ValueError if they’re missing:

File                   | Required columns                               | Notes
edges.csv              | src_node_id, dest_node_id                      | Required. One row per edge.
node_graph_mapping.csv | node_id, graph_id                              | Required. Assigns each node to a graph.
graph_labels.csv       | graph_id, graph_label                          | Optional. Targets for supervised models.
node_features.csv      | node_id (+ feature columns)                    | Optional. Pre-existing node attributes.
edge_features.csv      | src_node_id, dest_node_id (+ feature columns)  | Optional.
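
A pre-flight check along these lines can catch a missing column before the loader is called. This is a minimal sketch using only the standard library; the REQUIRED_COLUMNS mapping restates the column contracts above, and missing_columns is a hypothetical helper, not part of NEExT.

```python
import csv
import io

# Required columns per file, restating the contracts listed above.
REQUIRED_COLUMNS = {
    "edges.csv": {"src_node_id", "dest_node_id"},
    "node_graph_mapping.csv": {"node_id", "graph_id"},
    "graph_labels.csv": {"graph_id", "graph_label"},
    "node_features.csv": {"node_id"},
    "edge_features.csv": {"src_node_id", "dest_node_id"},
}


def missing_columns(file_name, handle):
    """Return the required columns absent from the CSV header, sorted."""
    header = next(csv.reader(handle), [])
    return sorted(REQUIRED_COLUMNS[file_name] - set(header))


# In-memory stand-ins for files on disk (illustrative data).
good_edges = io.StringIO("src_node_id,dest_node_id\n0,1\n1,2\n")
bad_edges = io.StringIO("source,dest_node_id\n0,1\n")

ok = missing_columns("edges.csv", good_edges)
problems = missing_columns("edges.csv", bad_edges)

print(ok)        # []
print(problems)  # ['src_node_id']
```

For real files, the same helper could be handed a handle from open(path, newline="").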

Node IDs must be integer-compatible — NEExT graph objects use integer node IDs. If your edge/feature files include a graph_id column, NEExT scopes rows per graph by it (correct even when node IDs are only unique within a graph).
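
The scoping rule matters when node IDs restart in every graph. The sketch below groups edge rows by graph_id in plain Python to show how the same (src, dest) pair can belong to two different graphs. It uses int() as one possible reading of "integer-compatible"; NEExT's own check may differ. The sample rows are made up, and the code illustrates the idea only, not NEExT's internals.

```python
import csv
import io
from collections import defaultdict

# Illustrative edge file where node IDs restart at 0 in every graph.
edges_csv = io.StringIO(
    "src_node_id,dest_node_id,graph_id\n"
    "0,1,0\n"
    "1,2,0\n"
    "0,1,1\n"
)

edges_by_graph = defaultdict(list)
for row in csv.DictReader(edges_csv):
    # int() raises ValueError for IDs such as "a7", which are not
    # integer-compatible under this reading.
    edge = (int(row["src_node_id"]), int(row["dest_node_id"]))
    edges_by_graph[int(row["graph_id"])].append(edge)

for graph_id, graph_edges in sorted(edges_by_graph.items()):
    print(graph_id, graph_edges)
# 0 [(0, 1), (1, 2)]
# 1 [(0, 1)]
```

Without the graph_id column, the two (0, 1) rows would be indistinguishable, which is why per-graph scoping is needed for IDs that are only locally unique.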

From NetworkX graphs

Pass a list of networkx.Graph objects. A graph’s label is read from its graph-level label attribute (G.graph["label"]).

from_networkx.py
import networkx as nx
from NEExT import NEExT

G1 = nx.karate_club_graph()
G1.graph["label"] = 0

G2 = nx.complete_graph(10)
G2.graph["label"] = 1

nxt = NEExT()
graphs = nxt.load_from_networkx([G1, G2])

From pandas DataFrames

When your data already lives in memory, use GraphIO.load_from_dfs with the same column contracts as the CSV loader.

from_dfs.py
from NEExT.io import GraphIO

graph_io = GraphIO()
graphs = graph_io.load_from_dfs(
  edges_df=edges_df,                  # src_node_id, dest_node_id
  node_graph_df=node_graph_df,        # node_id, graph_id
  graph_labels_df=graph_labels_df,    # optional: graph_id, graph_label
  node_features_df=node_features_df,  # optional
  edge_features_df=edge_features_df,  # optional
)
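
The snippet above assumes the DataFrames already exist. As a minimal sketch, the two required frames and the optional label frame could be built like this for two toy graphs. The data is made up; only the column names come from the column contracts.

```python
import pandas as pd

# Two toy graphs: a triangle (nodes 0-2) and a single edge (nodes 3-4).
edges_df = pd.DataFrame(
    {"src_node_id": [0, 1, 2, 3], "dest_node_id": [1, 2, 0, 4]}
)
node_graph_df = pd.DataFrame(
    {"node_id": [0, 1, 2, 3, 4], "graph_id": [0, 0, 0, 1, 1]}
)
graph_labels_df = pd.DataFrame({"graph_id": [0, 1], "graph_label": [0, 1]})

# Sanity check: every edge endpoint is assigned to a graph.
known_nodes = set(node_graph_df["node_id"])
endpoints = set(edges_df["src_node_id"]) | set(edges_df["dest_node_id"])
unmapped = endpoints - known_nodes
print(unmapped)  # set()
```

These frames can then be passed as edges_df, node_graph_df, and graph_labels_df in the load_from_dfs call.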

Inspecting a collection

A loaded GraphCollection exposes describe() for a quick summary:

inspect.py
info = graphs.describe()
print(info)
print(len(graphs.graphs), "graphs loaded")

No labeled data yet? Generate a labeled collection with the built-in synthetic generators.