Data loading

Every NEExT workflow starts by loading a collection of graphs into a GraphCollection. There are three entry points: read_from_csv and load_from_networkx on the top-level NEExT object, and load_from_dfs on GraphIO.

All three share the same options:

graph_type — "networkx" (default, flexible) or "igraph" (fast). See Graphs & collections.
reindex_nodes — reindex each graph’s nodes to start at 0 (default True).
filter_largest_component — keep only each graph’s largest connected component (default True).
node_sample_rate — fraction of nodes to sample per graph, in (0, 1] (default 1.0).
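
To pin down what filter_largest_component and reindex_nodes mean, here is a minimal plain-Python sketch of the two steps on a toy edge list. It is not NEExT's implementation. The helper names (largest_component, filter_and_reindex), the order of the steps (filter first, then reindex), and the choice to assign new IDs in sorted order of the old ones are all assumptions made for illustration.

```python
from collections import deque


def largest_component(nodes, edges):
    """Return the node set of the largest connected component (undirected)."""
    adjacency = {node: set() for node in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen, best = set(), set()
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component, queue = {start}, deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def filter_and_reindex(nodes, edges):
    """Keep the largest component, then relabel its nodes 0..n-1."""
    keep = largest_component(nodes, edges)
    new_id = {old: new for new, old in enumerate(sorted(keep))}
    new_edges = [(new_id[u], new_id[v]) for u, v in edges if u in keep and v in keep]
    return sorted(new_id.values()), new_edges


# Toy graph: a 3-node path (10-11-12) plus a separate 2-node component (20-21).
nodes = [10, 11, 12, 20, 21]
edges = [(10, 11), (11, 12), (20, 21)]

new_nodes, new_edges = filter_and_reindex(nodes, edges)
print(new_nodes)  # [0, 1, 2]
print(new_edges)  # [(0, 1), (1, 2)]
```

With both options left at their default of True, the smaller component is dropped and the surviving nodes no longer carry their original IDs, so keep your own mapping if you need to trace results back to the source data.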

From CSV files

read_from_csv is the primary loader. Because NEExT reads the files with pandas, each path may be a local file or a URL.

from_csv.py

from NEExT import NEExT

nxt = NEExT()

graphs = nxt.read_from_csv(
  edges_path="edges.csv",
  node_graph_mapping_path="node_graph_mapping.csv",
  graph_label_path="graph_labels.csv",       # optional
  node_features_path="node_features.csv",     # optional
  edge_features_path="edge_features.csv",     # optional
  graph_type="networkx",
)

CSV column contracts

NEExT validates these columns and raises a clear ValueError if they’re missing:

File	Required columns	Notes
`edges.csv`	`src_node_id`, `dest_node_id`	Required. One row per edge.
`node_graph_mapping.csv`	`node_id`, `graph_id`	Required. Assigns each node to a graph.
`graph_labels.csv`	`graph_id`, `graph_label`	Optional. Targets for supervised models.
`node_features.csv`	`node_id` (+ feature columns)	Optional. Pre-existing node attributes.
`edge_features.csv`	`src_node_id`, `dest_node_id` (+ feature columns)	Optional.
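
As a concrete illustration of the table above, the sketch below writes three tiny CSV files that follow the column contracts and then checks their headers using only the standard library. The file contents, the temporary directory, and the missing_columns helper are hypothetical. The check mirrors the contract but is not NEExT's own validation code.

```python
import csv
import tempfile
from pathlib import Path

# Required columns per file, copied from the table above.
REQUIRED = {
    "edges.csv": {"src_node_id", "dest_node_id"},
    "node_graph_mapping.csv": {"node_id", "graph_id"},
    "graph_labels.csv": {"graph_id", "graph_label"},
}

# Toy data: graph 0 is a 3-node path, graph 1 is a single edge.
ROWS = {
    "edges.csv": [["src_node_id", "dest_node_id"], [0, 1], [1, 2], [3, 4]],
    "node_graph_mapping.csv": [
        ["node_id", "graph_id"], [0, 0], [1, 0], [2, 0], [3, 1], [4, 1],
    ],
    "graph_labels.csv": [["graph_id", "graph_label"], [0, 0], [1, 1]],
}


def missing_columns(path, required):
    """Return the required column names absent from the CSV header."""
    with open(path, newline="") as handle:
        header = next(csv.reader(handle), [])
    return required - set(header)


workdir = Path(tempfile.mkdtemp())
for name, rows in ROWS.items():
    with open(workdir / name, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)

# Collect only the files that are missing at least one required column.
problems = {}
for name, columns in REQUIRED.items():
    missing = missing_columns(workdir / name, columns)
    if missing:
        problems[name] = missing

print(problems)  # {}
```

Running a check like this before calling read_from_csv lets you catch a misnamed column without loading any graph data.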

Node IDs must be integer-compatible, because NEExT graph objects use integer node IDs. If your edge or feature files include a graph_id column, NEExT uses it to scope rows per graph, so loading stays correct even when node IDs are unique only within a graph.
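
If your raw data uses non-integer identifiers such as names or hashes, one way to meet the integer requirement is to remap them before writing the CSV files. The sketch below is a minimal example of that preprocessing step. build_id_map is a hypothetical helper, not part of NEExT, and the first-seen ordering is an arbitrary choice.

```python
def build_id_map(raw_ids):
    """Assign consecutive integers to raw IDs in first-seen order."""
    id_map = {}
    for raw in raw_ids:
        if raw not in id_map:
            id_map[raw] = len(id_map)
    return id_map


# Hypothetical edge list keyed by string names.
raw_edges = [("alice", "bob"), ("bob", "carol"), ("alice", "carol")]

# Walk every endpoint once so each distinct name receives one integer.
id_map = build_id_map(node for edge in raw_edges for node in edge)
int_edges = [(id_map[src], id_map[dest]) for src, dest in raw_edges]

print(id_map)     # {'alice': 0, 'bob': 1, 'carol': 2}
print(int_edges)  # [(0, 1), (1, 2), (0, 2)]
```

Apply the same id_map to the node-to-graph mapping and to any feature files so that every file refers to the same integer IDs. Nodes that have no edges will not appear in the edge list, so seed the map from the full node list if your data contains isolated nodes.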

From NetworkX graphs

Pass a list of networkx.Graph objects. A graph’s label is read from its graph-level label attribute (G.graph["label"]).

from_networkx.py

import networkx as nx
from NEExT import NEExT

G1 = nx.karate_club_graph()
G1.graph["label"] = 0

G2 = nx.complete_graph(10)
G2.graph["label"] = 1

nxt = NEExT()
graphs = nxt.load_from_networkx([G1, G2])

From pandas DataFrames

When your data already lives in memory, use GraphIO.load_from_dfs with the same column contracts as the CSV loader.

from_dfs.py

from NEExT.io import GraphIO

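# edges_df, node_graph_df, and the optional frames are pandas DataFrames
# that already exist in memory.
graph_io = GraphIO()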
graphs = graph_io.load_from_dfs(
  edges_df=edges_df,                  # src_node_id, dest_node_id
  node_graph_df=node_graph_df,        # node_id, graph_id
  graph_labels_df=graph_labels_df,    # optional: graph_id, graph_label
  node_features_df=node_features_df,  # optional
  edge_features_df=edge_features_df,  # optional
)
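
The snippet above assumes the DataFrames already exist. As a minimal sketch of what they could look like, the example below builds toy frames with the column names from the CSV contracts and runs one sanity check before loading. The data is made up, and the check that every edge endpoint appears in the node-to-graph mapping is a precaution chosen here, not a documented NEExT requirement.

```python
import pandas as pd

# Toy data: graph 0 is a 3-node path, graph 1 is a single edge.
edges_df = pd.DataFrame(
    {"src_node_id": [0, 1, 3], "dest_node_id": [1, 2, 4]}
)
node_graph_df = pd.DataFrame(
    {"node_id": [0, 1, 2, 3, 4], "graph_id": [0, 0, 0, 1, 1]}
)
graph_labels_df = pd.DataFrame(
    {"graph_id": [0, 1], "graph_label": [0, 1]}
)

# Sanity check: find edge endpoints that have no row in the mapping.
endpoints = set(edges_df["src_node_id"]) | set(edges_df["dest_node_id"])
unmapped = endpoints - set(node_graph_df["node_id"])

print(sorted(unmapped))  # []
```

Frames shaped like these can be passed as edges_df, node_graph_df, and graph_labels_df in the call above.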

Inspecting a collection

A loaded GraphCollection exposes describe() for a quick summary:

inspect.py

info = graphs.describe()
print(info)
print(len(graphs.graphs), "graphs loaded")

No labeled data yet? Generate a labeled collection with the built-in synthetic generators.