Graphs & collections

NEExT operates on collections of graphs. The two core data structures are `Graph` (a single graph) and `GraphCollection` (an ordered collection of them). You rarely construct these by hand, since the loaders build them for you, but understanding their shape makes the rest of the pipeline clearer.

The `Graph` object

Each Graph wraps an underlying NetworkX or iGraph object and carries:

| Attribute | Meaning |
| --- | --- |
| `graph_id` | Unique identifier for the graph |
| `graph_label` | Optional label/target (from `graph_labels.csv` or `G.graph["label"]`) |
| `nodes` | List of integer node IDs |
| `edges` | List of `(src, dest)` tuples |
| `node_attributes` | Per-node attribute dictionaries |
| `edge_attributes` | Per-edge attribute dictionaries |
| `graph_type` | `"networkx"` or `"igraph"` |
| `G` | The underlying backend graph object |
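
As a rough mental model of the attributes above, the sketch below mirrors them in a plain dataclass. `GraphSketch` is a hypothetical stand-in, not NEExT's actual class, and keying the attribute dictionaries by node ID and by edge tuple is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class GraphSketch:
    """Hypothetical stand-in mirroring the documented attributes; not NEExT's class."""

    graph_id: Any
    nodes: List[int]
    edges: List[Tuple[int, int]]
    graph_label: Optional[Any] = None
    # Assumed layout: node ID -> attributes, (src, dest) -> attributes.
    node_attributes: Dict[int, Dict[str, Any]] = field(default_factory=dict)
    edge_attributes: Dict[Tuple[int, int], Dict[str, Any]] = field(default_factory=dict)
    graph_type: str = "networkx"
    G: Any = None  # the backend graph object would be stored here


# A three-node path graph: 0 - 1 - 2
example = GraphSketch(
    graph_id=0,
    nodes=[0, 1, 2],
    edges=[(0, 1), (1, 2)],
    graph_label=1,
    node_attributes={0: {"color": "red"}},
)
print(example.graph_id, len(example.nodes), len(example.edges))
```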

Backends: NetworkX vs iGraph

Set `graph_type` at load time. The whole collection uses one backend.

`"networkx"` (default): maximum flexibility and compatibility.
`"igraph"`: faster for large graphs. Required by some operations, notably Leiden community egonets.

backend.py

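# Assumed entry point: `nxt` is a NEExT instance; adjust if your setup differs
from NEExT import NEExT

nxt = NEExT()

# Fast iGraph backend
graphs = nxt.read_from_csv(
    edges_path="edges.csv",
    node_graph_mapping_path="node_graph_mapping.csv",
    graph_type="igraph",
)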
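
Because some operations need a specific backend, it can help to fail early with a clear message. The sketch below relies only on the `graph_type` attribute described here. `require_backend` is a hypothetical helper, not part of NEExT, and a `SimpleNamespace` stands in for a loaded collection.

```python
from types import SimpleNamespace


def require_backend(collection, expected: str) -> None:
    """Raise early if the collection was loaded with a different backend."""
    actual = collection.graph_type
    if actual != expected:
        raise ValueError(
            f"This step needs graph_type={expected!r}, but the collection "
            f"uses {actual!r}; reload with the required backend."
        )


# Stand-in for a collection loaded with the default backend.
fake_collection = SimpleNamespace(graph_type="networkx")

try:
    require_backend(fake_collection, "igraph")
    backend_ok = True
except ValueError as err:
    backend_ok = False
    print(err)
```

Since the backend is fixed at load time, a failed check means reloading the data with the other `graph_type`.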

Load-time transforms

The loaders apply three normalizations, all controllable:

`reindex_nodes` (default `True`): relabels each graph’s nodes to a contiguous range starting at 0. NEExT graph objects require integer node IDs.
`filter_largest_component` (default `True`): drops all but the largest connected component of each graph, so structural features are computed on a connected graph.
`node_sample_rate` (default `1.0`): the fraction of each graph's nodes to keep, chosen at random. The rate must lie in (0, 1]. Lower it to control memory on very large graphs.

transforms.py

graphs = nxt.read_from_csv(
  edges_path="edges.csv",
  node_graph_mapping_path="node_graph_mapping.csv",
  reindex_nodes=True,
  filter_largest_component=True,
  node_sample_rate=0.5,   # keep ~half the nodes per graph
)
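
To make the three transforms concrete, here is a plain-Python sketch of what each one does conceptually. It is not NEExT's implementation. The function names, the fixed-size sampling strategy, and the order in which the steps are applied are all assumptions made for illustration.

```python
import random
from collections import deque


def largest_component(nodes, edges):
    """Return the node set of the largest connected component (undirected BFS)."""
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    best, seen = set(), set()
    for start in nodes:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency[current] - component:
                component.add(neighbor)
                queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def sample_nodes(nodes, rate, seed=0):
    """Keep roughly `rate` of the nodes, preserving their original order."""
    if not 0 < rate <= 1:
        raise ValueError("rate must lie in (0, 1]")
    if not nodes:
        return []
    keep = max(1, round(len(nodes) * rate))
    chosen = set(random.Random(seed).sample(nodes, keep))
    return [n for n in nodes if n in chosen]


def reindex(nodes, edges):
    """Relabel nodes to 0..n-1 and rewrite the edges to match."""
    mapping = {old: new for new, old in enumerate(nodes)}
    return list(range(len(nodes))), [(mapping[u], mapping[v]) for u, v in edges]


# Two components: a-b-c and x-y. Only the larger one survives.
nodes = ["a", "b", "c", "x", "y"]
edges = [("a", "b"), ("b", "c"), ("x", "y")]

component = largest_component(nodes, edges)
kept_nodes = [n for n in nodes if n in component]
kept_edges = [(u, v) for u, v in edges if u in component and v in component]
new_nodes, new_edges = reindex(kept_nodes, kept_edges)
print(new_nodes, new_edges)  # [0, 1, 2] [(0, 1), (1, 2)]
```

Sampling removes nodes, so a sampled graph may no longer be connected. Whether NEExT samples before or after filtering is not stated here, so check the resulting graphs if connectivity matters for your features.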

Working with a collection

A GraphCollection holds its graphs in .graphs and summarizes itself with .describe():

collection.py

print(len(graphs.graphs))   # number of graphs
print(graphs.graph_type)    # "networkx" or "igraph"
print(graphs.describe())    # summary statistics
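
Since `.graphs` holds individual `Graph` objects, per-graph attributes can be read in a loop. The sketch below uses only the documented `graph_id`, `nodes`, and `edges` attributes. `size_table` is a hypothetical helper, and `SimpleNamespace` objects stand in for a loaded collection so the example runs without any data files.

```python
from types import SimpleNamespace


def size_table(collection):
    """Return (graph_id, node_count, edge_count) for every graph in the collection."""
    return [(g.graph_id, len(g.nodes), len(g.edges)) for g in collection.graphs]


# Stand-ins for two loaded graphs; real objects come from the loaders.
fake = SimpleNamespace(
    graph_type="networkx",
    graphs=[
        SimpleNamespace(graph_id=0, nodes=[0, 1, 2], edges=[(0, 1), (1, 2)]),
        SimpleNamespace(graph_id=1, nodes=[0, 1], edges=[(0, 1)]),
    ],
)

for graph_id, n_nodes, n_edges in size_table(fake):
    print(graph_id, n_nodes, n_edges)
```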

For node-level tasks, a GraphCollection can be decomposed into per-node subgraphs — an EgonetCollection. See Egonets.
