Data loading
Every NEExT workflow starts by loading a collection of graphs into a
GraphCollection. There are three entry points, all reachable from the top-level NEExT
object, although DataFrame loading itself lives on GraphIO.
All three loaders share the same options:

- graph_type: "networkx" (default, flexible) or "igraph" (fast). See Graphs & collections.
- reindex_nodes: reindex each graph's nodes to start at 0 (default True).
- filter_largest_component: keep only each graph's largest connected component (default True).
- node_sample_rate: fraction of nodes to sample per graph, in (0, 1] (default 1.0).
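The two graph-shaping defaults are easiest to picture on a toy edge list. The plain-Python sketch below shows what filter_largest_component and reindex_nodes describe for a single graph: keep the largest connected component (treating edges as undirected), then relabel the surviving nodes as 0..n-1. It illustrates the concepts only and is not NEExT's implementation; largest_component and reindex are hypothetical names, and relabelling in sorted order is our own assumption.

```python
from collections import defaultdict, deque


def largest_component(edges):
    """Return the node set of the largest connected component (undirected)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects one whole component.
        comp = {start}
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    comp.add(nbr)
                    queue.append(nbr)
        # Strict '>' keeps the first component found when sizes tie.
        if len(comp) > len(best):
            best = comp
    return best


def reindex(edges, keep):
    """Keep edges inside `keep`; relabel nodes 0..n-1 in sorted order.

    Returns (new_edges, mapping) where mapping is old ID -> new ID.
    """
    mapping = {old: new for new, old in enumerate(sorted(keep))}
    new_edges = [(mapping[u], mapping[v]) for u, v in edges if u in keep and v in keep]
    return new_edges, mapping


# Two components: a triangle {10, 11, 12} and a single edge {20, 21}.
edges = [(10, 11), (11, 12), (12, 10), (20, 21)]
keep = largest_component(edges)
new_edges, mapping = reindex(edges, keep)
print(new_edges)  # [(0, 1), (1, 2), (2, 0)]
print(mapping)    # {10: 0, 11: 1, 12: 2}
```

An edge list cannot represent isolated nodes, so a real loader working from the node-to-graph mapping would also have to account for nodes that appear in no edge.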
From CSV files
read_from_csv is the primary loader. Paths may be local files or URLs (NEExT reads
them with pandas).
```python
from NEExT import NEExT

nxt = NEExT()
graphs = nxt.read_from_csv(
    edges_path="edges.csv",
    node_graph_mapping_path="node_graph_mapping.csv",
    graph_label_path="graph_labels.csv",      # optional
    node_features_path="node_features.csv",   # optional
    edge_features_path="edge_features.csv",   # optional
    graph_type="networkx",
)
```

CSV column contracts
NEExT validates these columns and raises a clear ValueError if they’re missing:
| File | Required columns | Notes |
|---|---|---|
| edges.csv | src_node_id, dest_node_id | Required. One row per edge. |
| node_graph_mapping.csv | node_id, graph_id | Required. Assigns each node to a graph. |
| graph_labels.csv | graph_id, graph_label | Optional. Targets for supervised models. |
| node_features.csv | node_id (+ feature columns) | Optional. Pre-existing node attributes. |
| edge_features.csv | src_node_id, dest_node_id (+ feature columns) | Optional. |
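To catch contract problems before calling the loader, a pre-flight check takes only the standard library. The sketch below reads just the header row of a CSV and reports which required columns are absent. REQUIRED_COLUMNS and missing_columns are hypothetical helpers that mirror the table above; they are not part of NEExT, and the in-memory strings stand in for real files.

```python
import csv
import io

# Required columns per file, copied from the contract table.
REQUIRED_COLUMNS = {
    "edges.csv": {"src_node_id", "dest_node_id"},
    "node_graph_mapping.csv": {"node_id", "graph_id"},
    "graph_labels.csv": {"graph_id", "graph_label"},
    "node_features.csv": {"node_id"},
    "edge_features.csv": {"src_node_id", "dest_node_id"},
}


def missing_columns(name, text_stream):
    """Return the sorted required columns absent from the CSV header row."""
    header = next(csv.reader(text_stream), [])  # empty file -> no columns
    return sorted(REQUIRED_COLUMNS[name] - {col.strip() for col in header})


good = io.StringIO("src_node_id,dest_node_id\n0,1\n1,2\n")
bad = io.StringIO("node,graph_id\n0,1\n")
print(missing_columns("edges.csv", good))              # []
print(missing_columns("node_graph_mapping.csv", bad))  # ['node_id']
```

For a file on disk, pass open(path, newline="") as the stream.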
Node IDs must be integer-compatible, because NEExT graph objects use integer node IDs. If your
edge or feature files include a graph_id column, NEExT scopes their rows to each graph by that
column, so loading stays correct even when node IDs are unique only within a graph.
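To see why scoping matters: when node IDs restart in every graph, node 0 of graph 1 and node 0 of graph 2 are different nodes, so rows have to be grouped by graph_id before their node IDs mean anything. The sketch below does that grouping for an edge table carrying a graph_id column, using int() as a simple stand-in for the integer-compatibility check (it raises ValueError on values such as "a"). group_edges_by_graph is a hypothetical helper, not NEExT's code.

```python
from collections import defaultdict


def group_edges_by_graph(rows):
    """Group edge rows into per-graph edge lists keyed by graph_id.

    Each row is a dict with graph_id, src_node_id and dest_node_id.
    IDs are converted with int(), which raises ValueError for
    values that are not integer-like.
    """
    per_graph = defaultdict(list)
    for row in rows:
        gid = int(row["graph_id"])
        per_graph[gid].append((int(row["src_node_id"]), int(row["dest_node_id"])))
    return dict(per_graph)


rows = [
    {"graph_id": "1", "src_node_id": "0", "dest_node_id": "1"},
    {"graph_id": "1", "src_node_id": "1", "dest_node_id": "2"},
    # Node IDs restart at 0 in graph 2; these are not graph 1's nodes.
    {"graph_id": "2", "src_node_id": "0", "dest_node_id": "1"},
]
print(group_edges_by_graph(rows))  # {1: [(0, 1), (1, 2)], 2: [(0, 1)]}
```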
From NetworkX graphs
Pass a list of networkx.Graph objects. A graph’s label is read from its graph-level
label attribute (G.graph["label"]).
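Labeled NetworkX graphs carry the same information as the tabular inputs. To see how the two line up, the sketch below flattens a list of graphs into rows that follow the CSV column contracts, offsetting node IDs so they stay unique across the collection. graphs_to_tables is a hypothetical helper, not part of NEExT; it relies only on standard NetworkX attributes (G.nodes, G.edges, G.graph).

```python
import networkx as nx


def graphs_to_tables(graphs):
    """Flatten labeled graphs into edge, node-to-graph, and label rows.

    Node IDs are offset so they stay unique across the whole collection.
    """
    edges, node_graph, labels = [], [], []
    offset = 0
    for graph_id, G in enumerate(graphs):
        # Map this graph's nodes (any hashable) to globally unique ints.
        ids = {node: offset + i for i, node in enumerate(G.nodes)}
        offset += len(ids)
        node_graph += [{"node_id": nid, "graph_id": graph_id} for nid in ids.values()]
        edges += [{"src_node_id": ids[u], "dest_node_id": ids[v]} for u, v in G.edges]
        # The same graph-level attribute that load_from_networkx reads.
        labels.append({"graph_id": graph_id, "graph_label": G.graph.get("label")})
    return edges, node_graph, labels


path = nx.path_graph(3)    # nodes 0-2, 2 edges
path.graph["label"] = 0
cycle = nx.cycle_graph(4)  # nodes 0-3, 4 edges
cycle.graph["label"] = 1

edges, node_graph, labels = graphs_to_tables([path, cycle])
print(len(edges), len(node_graph))  # 6 7
```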
```python
import networkx as nx
from NEExT import NEExT

G1 = nx.karate_club_graph()
G1.graph["label"] = 0
G2 = nx.complete_graph(10)
G2.graph["label"] = 1

nxt = NEExT()
graphs = nxt.load_from_networkx([G1, G2])
```

From pandas DataFrames
When your data already lives in memory, use GraphIO.load_from_dfs with the same column
contracts as the CSV loader.
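If you are assembling the frames by hand, a few lines of pandas are enough. The sketch below builds the two required frames and a label frame for two toy graphs (a triangle and a three-node path). The values are made up for illustration, and the endpoint check at the end is our own sanity test, not something NEExT documents. Node IDs are unique across graphs here, so the edge frame needs no graph_id column.

```python
import pandas as pd

# Graph 0 is a triangle (nodes 0-2); graph 1 is a path (nodes 3-5).
edges_df = pd.DataFrame(
    {"src_node_id": [0, 1, 2, 3, 4], "dest_node_id": [1, 2, 0, 4, 5]}
)
node_graph_df = pd.DataFrame(
    {"node_id": [0, 1, 2, 3, 4, 5], "graph_id": [0, 0, 0, 1, 1, 1]}
)
graph_labels_df = pd.DataFrame({"graph_id": [0, 1], "graph_label": [0, 1]})

# Sanity check: every edge endpoint is assigned to some graph.
endpoints = set(edges_df["src_node_id"]) | set(edges_df["dest_node_id"])
assert endpoints <= set(node_graph_df["node_id"])
print(edges_df.shape, node_graph_df.shape, graph_labels_df.shape)
```

These frames can be passed to load_from_dfs as edges_df, node_graph_df, and graph_labels_df; the feature frames are optional.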
```python
from NEExT.io import GraphIO

# edges_df, node_graph_df, etc. must already exist as pandas DataFrames.
graph_io = GraphIO()
graphs = graph_io.load_from_dfs(
    edges_df=edges_df,                  # src_node_id, dest_node_id
    node_graph_df=node_graph_df,        # node_id, graph_id
    graph_labels_df=graph_labels_df,    # optional: graph_id, graph_label
    node_features_df=node_features_df,  # optional
    edge_features_df=edge_features_df,  # optional
)
```

Inspecting a collection
A loaded GraphCollection exposes describe() for a quick summary:
```python
info = graphs.describe()
print(info)
print(len(graphs.graphs), "graphs loaded")
```

No labeled data yet? Generate a labeled collection with the built-in synthetic generators.