Graphs & collections
NEExT operates on collections of graphs. The two core data structures are Graph (a
single graph) and GraphCollection (an ordered set of them). You rarely construct these
by hand — the loaders build them for you — but understanding their
shape makes the rest of the pipeline clearer.
The Graph object
Each Graph wraps an underlying NetworkX or iGraph object and carries:
| Attribute | Meaning |
|---|---|
| graph_id | Unique identifier for the graph |
| graph_label | Optional label/target (from graph_labels.csv or G.graph["label"]) |
| nodes | List of integer node IDs |
| edges | List of (src, dest) tuples |
| node_attributes | Per-node attribute dictionaries |
| edge_attributes | Per-edge attribute dictionaries |
| graph_type | "networkx" or "igraph" |
| G | The underlying backend graph object |
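
As a rough mental model, the attributes in the table can be mirrored by a plain Python dataclass. The sketch below is illustrative only: `GraphSketch` is a hypothetical stand-in, not NEExT's actual Graph class. It omits the backend object G, and keying the attribute dictionaries by node ID and by edge tuple is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class GraphSketch:
    """Hypothetical stand-in mirroring the attribute table; not NEExT's Graph."""

    graph_id: int
    nodes: List[int]
    edges: List[Tuple[int, int]]
    graph_label: Optional[Any] = None
    # Assumption: attributes are keyed by node ID and by (src, dest) tuple.
    node_attributes: Dict[int, Dict[str, Any]] = field(default_factory=dict)
    edge_attributes: Dict[Tuple[int, int], Dict[str, Any]] = field(default_factory=dict)
    graph_type: str = "networkx"


# A labeled triangle with one node attribute and one edge attribute.
triangle = GraphSketch(
    graph_id=0,
    graph_label=1,
    nodes=[0, 1, 2],
    edges=[(0, 1), (1, 2), (0, 2)],
    node_attributes={0: {"color": "red"}},
    edge_attributes={(0, 1): {"weight": 2.0}},
)

print(triangle.graph_id, len(triangle.nodes), len(triangle.edges))
```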
Backends: NetworkX vs iGraph
Set graph_type at load time. The whole collection uses one backend.
- "networkx" (default): maximum flexibility and compatibility.
- "igraph": faster for large graphs. Required by some operations, notably Leiden community egonets.
# Fast iGraph backend
graphs = nxt.read_from_csv(
    edges_path="edges.csv",
    node_graph_mapping_path="node_graph_mapping.csv",
    graph_type="igraph",
)
Load-time transforms
The loaders apply three normalizations, all controllable:
- reindex_nodes (default True): relabels each graph's nodes to a contiguous range starting at 0. NEExT graph objects require integer node IDs.
- filter_largest_component (default True): drops all but the largest connected component of each graph, so structural features are computed on a connected graph.
- node_sample_rate (default 1.0): keeps a random fraction of nodes per graph, in (0, 1]. Lower it to control memory on very large graphs.
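
To make the three normalizations concrete, here is a minimal standard-library sketch of what each one does to a node list and an edge list. It is not NEExT's implementation: the helper names (`normalize`, `largest_component`, `induced`) are hypothetical, and the order of application (sample, then filter, then reindex) is an assumption.

```python
import random
from collections import defaultdict, deque
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def induced(keep: Set[int], edges: List[Edge]) -> List[Edge]:
    """Keep only the edges whose two endpoints both survive."""
    return [(u, v) for u, v in edges if u in keep and v in keep]


def largest_component(nodes: List[int], edges: List[Edge]) -> Set[int]:
    """Return the node set of the largest connected component (BFS)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen: Set[int] = set()
    best: Set[int] = set()
    for start in nodes:
        if start in seen:
            continue
        comp = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in comp:
                    comp.add(w)
                    queue.append(w)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best


def normalize(nodes, edges, reindex_nodes=True, filter_largest_component=True,
              node_sample_rate=1.0, seed=0):
    """Hypothetical pipeline; the step order is an assumption, not NEExT's."""
    keep = set(nodes)
    if node_sample_rate < 1.0:
        # Keep a random fraction of the nodes (at least one).
        k = max(1, round(node_sample_rate * len(keep)))
        keep = set(random.Random(seed).sample(sorted(keep), k))
    edges = induced(keep, edges)
    if filter_largest_component:
        keep = largest_component(sorted(keep), edges)
        edges = induced(keep, edges)
    nodes = sorted(keep)
    if reindex_nodes:
        # Relabel surviving nodes to 0..n-1 in sorted order.
        mapping = {old: new for new, old in enumerate(nodes)}
        nodes = list(range(len(nodes)))
        edges = [(mapping[u], mapping[v]) for u, v in edges]
    return nodes, edges


# Two components: {10, 20, 30} and {40, 50}. Only the larger one survives.
print(normalize([10, 20, 30, 40, 50], [(10, 20), (20, 30), (40, 50)]))
# → ([0, 1, 2], [(0, 1), (1, 2)])
```

In NEExT itself, the same steps are switched on or off through the loader arguments: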
graphs = nxt.read_from_csv(
    edges_path="edges.csv",
    node_graph_mapping_path="node_graph_mapping.csv",
    reindex_nodes=True,
    filter_largest_component=True,
    node_sample_rate=0.5,  # keep ~half the nodes per graph
)
Working with a collection
A GraphCollection holds its graphs in .graphs and summarizes itself with
.describe():
print(len(graphs.graphs)) # number of graphs
print(graphs.graph_type) # "networkx" or "igraph"
print(graphs.describe())    # summary statistics
For node-level tasks, a GraphCollection can be decomposed into per-node subgraphs
(an EgonetCollection). See Egonets.
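
As a rough illustration of what a per-node subgraph looks like, the sketch below builds the 1-hop egonet of every node: the node itself, its direct neighbors, and the edges among them. The function name `one_hop_egonets` is hypothetical and the 1-hop radius is an assumption; NEExT's own egonet construction may differ (for example, community-based egonets).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def one_hop_egonets(nodes: List[int], edges: List[Edge]) -> Dict[int, Tuple[List[int], List[Edge]]]:
    """Map each node to (egonet nodes, egonet edges) for its 1-hop neighborhood."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    egonets = {}
    for center in nodes:
        members = {center} | adj[center]
        # Induced subgraph: keep edges with both endpoints in the egonet.
        ego_edges = [(u, v) for u, v in edges if u in members and v in members]
        egonets[center] = (sorted(members), ego_edges)
    return egonets


# Path graph 0 - 1 - 2 - 3
egonets = one_hop_egonets([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)])
print(egonets[1])  # → ([0, 1, 2], [(0, 1), (1, 2)])
```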