Graphs & collections

NEExT operates on collections of graphs. The two core data structures are Graph (a single graph) and GraphCollection (an ordered collection of Graph objects). You rarely construct these by hand, since the loaders build them for you, but understanding their shape makes the rest of the pipeline easier to follow.

The Graph object

Each Graph wraps an underlying NetworkX or iGraph object and carries:

Attribute          Meaning
graph_id           Unique identifier for the graph
graph_label        Optional label/target (from graph_labels.csv or G.graph["label"])
nodes              List of integer node IDs
edges              List of (src, dest) tuples
node_attributes    Per-node attribute dictionaries
edge_attributes    Per-edge attribute dictionaries
graph_type         "networkx" or "igraph"
G                  The underlying backend graph object
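To make the attribute table concrete, here is a minimal stand-in written with only the standard library. The class SimpleGraph and the helper from_edge_list are hypothetical illustrations of the attribute layout, not NEExT's actual Graph class; the sketch also leaves out the backend object G.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class SimpleGraph:
    """Hypothetical stand-in mirroring the Graph attributes (not NEExT's class)."""
    graph_id: int
    nodes: List[int]
    edges: List[Tuple[int, int]]
    graph_label: Optional[Any] = None
    node_attributes: Dict[int, Dict[str, Any]] = field(default_factory=dict)
    edge_attributes: Dict[Tuple[int, int], Dict[str, Any]] = field(default_factory=dict)
    graph_type: str = "networkx"


def from_edge_list(graph_id: int, edges: List[Tuple[int, int]], label: Any = None) -> SimpleGraph:
    """Build a SimpleGraph whose node list is every endpoint seen in the edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    return SimpleGraph(
        graph_id=graph_id,
        nodes=nodes,
        edges=list(edges),
        graph_label=label,
        # Start every node and edge with an empty attribute dictionary.
        node_attributes={n: {} for n in nodes},
        edge_attributes={e: {} for e in edges},
    )


g = from_edge_list(0, [(0, 1), (1, 2), (2, 0)], label=1)
print(g.nodes)       # [0, 1, 2]
print(len(g.edges))  # 3
```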

Backends: NetworkX vs iGraph

Set graph_type at load time. The whole collection uses one backend.

  • "networkx" (default) — maximum flexibility and compatibility.
  • "igraph" — faster for large graphs. Required by some operations, notably Leiden community egonets.
backend.py
from NEExT import NEExT

nxt = NEExT()

# Fast iGraph backend
graphs = nxt.read_from_csv(
  edges_path="edges.csv",
  node_graph_mapping_path="node_graph_mapping.csv",
  graph_type="igraph",
)
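Because a collection uses a single backend, code that combines graphs from different loads may want to check that their backends agree. The helper common_backend below is a hypothetical sketch of that invariant, not part of NEExT; it only assumes each graph exposes a graph_type string as listed in the Graph table, and the Record class exists purely for the example.

```python
from dataclasses import dataclass
from typing import Iterable

VALID_BACKENDS = {"networkx", "igraph"}


def common_backend(graphs: Iterable) -> str:
    """Return the single backend shared by every graph, or raise ValueError.

    Hypothetical helper: it relies only on each graph having a graph_type attribute.
    """
    backends = {g.graph_type for g in graphs}
    if not backends:
        raise ValueError("empty collection")
    if len(backends) > 1:
        raise ValueError(f"mixed backends: {sorted(backends)}")
    backend = backends.pop()
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    return backend


@dataclass
class Record:
    # Minimal placeholder object for the example; real graphs carry much more.
    graph_type: str


print(common_backend([Record("igraph"), Record("igraph")]))  # igraph
```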

Load-time transforms

The loaders apply three load-time transforms, each controlled by a keyword argument:

  • reindex_nodes (default True): relabels each graph’s nodes to the contiguous range 0, 1, ..., n-1. NEExT graph objects require integer node IDs.
  • filter_largest_component (default True): drops all but the largest connected component of each graph, so structural features are computed on a connected graph.
  • node_sample_rate (default 1.0): the fraction of each graph’s nodes to keep, chosen at random; must be in (0, 1]. Lower it to limit memory use on very large graphs.
transforms.py
graphs = nxt.read_from_csv(
  edges_path="edges.csv",
  node_graph_mapping_path="node_graph_mapping.csv",
  reindex_nodes=True,
  filter_largest_component=True,
  node_sample_rate=0.5,   # keep ~half the nodes per graph
)
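The three transforms can be sketched on a plain edge list with the standard library. This illustrates the ideas rather than NEExT's implementation: the function names and the seed parameter are hypothetical, edges are treated as undirected, and the order used here (sample, then keep the largest component, then reindex) is an assumption that may not match the order NEExT applies.

```python
import random
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def sample_nodes(nodes: List[int], rate: float, seed: int = 0) -> Set[int]:
    """Keep a random fraction `rate` in (0, 1] of the nodes (at least one)."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    k = max(1, round(len(nodes) * rate))
    return set(random.Random(seed).sample(nodes, k))


def largest_component(nodes: Set[int], edges: List[Edge]) -> Set[int]:
    """Return the node set of the largest connected component (edges treated as undirected)."""
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen: Set[int] = set()
    best: Set[int] = set()
    for start in sorted(nodes):  # sorted so ties resolve deterministically
        if start in seen:
            continue
        # Breadth-first search collects one component.
        comp = {start}
        queue = deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        if len(comp) > len(best):
            best = comp
    return best


def reindex(nodes: Set[int], edges: List[Edge]) -> Tuple[List[int], List[Edge]]:
    """Relabel nodes to 0..n-1 in sorted order of the old IDs."""
    mapping = {old: new for new, old in enumerate(sorted(nodes))}
    return list(range(len(nodes))), [(mapping[u], mapping[v]) for u, v in edges]


def load_transforms(nodes: List[int], edges: List[Edge], rate: float = 1.0, seed: int = 0):
    kept = sample_nodes(nodes, rate, seed)
    # Drop edges with an endpoint that was sampled away.
    edges = [(u, v) for u, v in edges if u in kept and v in kept]
    kept = largest_component(kept, edges)
    edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return reindex(kept, edges)


nodes, edges = load_transforms([10, 11, 12, 20, 21], [(10, 11), (11, 12), (20, 21)])
print(nodes)  # [0, 1, 2]
print(edges)  # [(0, 1), (1, 2)]
```

With rate=1.0 every node survives sampling, so the result shows only component filtering (the {20, 21} component is dropped) and reindexing (10, 11, 12 become 0, 1, 2).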

Working with a collection

A GraphCollection holds its graphs in .graphs and summarizes itself with .describe():

collection.py
print(len(graphs.graphs))   # number of graphs
print(graphs.graph_type)    # "networkx" or "igraph"
print(graphs.describe())    # summary statistics

For node-level tasks, a GraphCollection can be decomposed into per-node subgraphs (egonets), which are held in an EgonetCollection. See Egonets.
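A common definition of a node's egonet is the subgraph induced by all nodes within k hops of it. The sketch below computes one with breadth-first search over an undirected edge list. The function egonet and its k parameter are illustrative assumptions, and NEExT's own egonet construction (including the Leiden-community variant mentioned under Backends) may differ.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple


def egonet(center: int, edges: List[Tuple[int, int]], k: int = 1) -> Tuple[Set[int], List[Tuple[int, int]]]:
    """Return the nodes within k hops of `center` and the edges induced among them."""
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond k hops
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    nodes = set(dist)
    # Keep only edges whose endpoints both lie inside the egonet.
    induced = [(u, v) for u, v in edges if u in nodes and v in nodes]
    return nodes, induced


path = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(egonet(2, path, k=1))  # ({1, 2, 3}, [(1, 2), (2, 3)])
```

Building one egonet per node, for example {v: egonet(v, path) for v in range(5)}, gives the per-node decomposition that node-level tasks work on.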