Quickstart

This walkthrough runs the full NEExT pipeline on a labeled dataset. No external files are required, because the built-in synthetic graph generator creates the data. To use your own data, swap in read_from_csv.

The complete pipeline

quickstart.py
import numpy as np
from NEExT import NEExT

nxt = NEExT()

# 1. Get a labeled collection of graphs.
#    Here: 50 Erdos-Renyi vs 50 Barabasi-Albert graphs (a binary classification task).
graphs = nxt.generate_synthetic_graphs(preset="er_vs_ba", seed=42)

# 2. Compute structural node features (all 11 built-ins), with k-hop aggregation.
features = nxt.compute_node_features(graphs, feature_list=["all"], feature_vector_length=3)

# 3. Aggregate node features into one vector per graph.
embeddings = nxt.compute_graph_embeddings(
  graphs,
  features,
  embedding_algorithm="approx_wasserstein",
  embedding_dimension=16,
)

# 4. Train and cross-evaluate a classifier.
results = nxt.train_ml_model(graphs, embeddings, model_type="classifier")
print("mean accuracy:", np.mean(results["accuracy"]))

# 5. Rank which structural features drive the task.
importance = nxt.compute_feature_importance(
  graphs,
  features,
  feature_importance_algorithm="supervised_fast",
)
print(importance.head())
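The k-hop aggregation in step 2 can be pictured with a small standard-library sketch. It assumes one plausible scheme, not necessarily the one NEExT uses: entry 0 of the vector is the node's own feature value, and entry k is the mean of that feature over nodes exactly k hops away. The helper names hop_distances and k_hop_vector are hypothetical, and node degree stands in for any built-in structural feature.

khop_sketch.py
```python
from collections import deque

def hop_distances(adj, source, max_hop):
    """Breadth-first search from source; returns {node: hop distance}
    for every node within max_hop hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hop:
            continue  # don't expand beyond the last hop we need
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def k_hop_vector(adj, node, feature, length):
    """Expand one scalar feature into a vector of `length` entries.
    Entry k is the mean feature value over nodes exactly k hops from `node`
    (entry 0 is the node itself); an empty ring contributes 0.0."""
    dist = hop_distances(adj, node, length - 1)
    vector = []
    for k in range(length):
        ring = [feature[n] for n, d in dist.items() if d == k]
        vector.append(sum(ring) / len(ring) if ring else 0.0)
    return vector

# Path graph 0-1-2-3, with node degree standing in for a structural feature.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
degree = {n: len(nbrs) for n, nbrs in adj.items()}
print(k_hop_vector(adj, 0, degree, 3))  # [1.0, 2.0, 2.0]
print(k_hop_vector(adj, 1, degree, 3))  # [2.0, 1.5, 1.0]
```

Averaging over each ring keeps every entry on the same scale no matter how many nodes sit at that distance; summing, or aggregating over the whole k-hop ball instead of the ring, would be equally reasonable variants.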

Understanding the output

Each call hands its result to the next stage:

  1. generate_synthetic_graphs returns a GraphCollection. The er_vs_ba preset builds two topologically distinct classes, so there is a real signal to learn. To use your own data instead, call read_from_csv(...); see Data loading.
  2. compute_node_features returns a Features object wrapping a DataFrame with one row per node. feature_list=["all"] computes every built-in structural feature, and feature_vector_length=3 expands each feature into a length-3 vector describing the node and its surrounding k-hop neighborhood. Features are normalized by default. See Structural features.
  3. compute_graph_embeddings rolls those per-node features up into a single vector of length embedding_dimension (here 16) per graph, using an approximate Wasserstein-distance embedding. See Embeddings.
  4. train_ml_model trains an XGBoost classifier over several train/test splits and returns a dict of metric lists with one value per split: results["accuracy"] for classifiers, results["rmse"] for regressors. See ML models.
  5. compute_feature_importance returns a DataFrame ranking the structural features by how much they contribute to the task. See Feature importance.
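Step 3 has to turn a variable number of node rows into one fixed-length vector per graph. One standard way to do that, shown here as an illustration rather than as NEExT's approx_wasserstein implementation, is a quantile embedding: for a single feature, sample the inverse CDF of the graph's node-level values at m fixed levels. The mean absolute difference between two such vectors approximates the 1-D Wasserstein-1 distance between the two graphs' feature distributions, so plain vector distances on the embeddings track Wasserstein distances between graphs. The function names are hypothetical and the degree samples are made up.

wasserstein_sketch.py
```python
def quantile(sorted_vals, t):
    """Empirical inverse CDF of a sorted sample, evaluated at level t in (0, 1)."""
    n = len(sorted_vals)
    return sorted_vals[min(int(t * n), n - 1)]

def quantile_embedding(values, m):
    """Fixed-length vector for a sample of any size: its quantiles at the
    m midpoint levels (i + 0.5) / m."""
    s = sorted(values)
    return [quantile(s, (i + 0.5) / m) for i in range(m)]

def approx_w1(emb_a, emb_b):
    """Mean absolute difference of two quantile embeddings, which approximates
    the 1-D Wasserstein-1 distance between the underlying samples."""
    return sum(abs(a - b) for a, b in zip(emb_a, emb_b)) / len(emb_a)

# Degree values of two graphs with different node counts (made-up numbers).
graph_a = [1, 2, 2, 3]
graph_b = [1, 1, 1, 2, 2, 5]
emb_a = quantile_embedding(graph_a, 4)
emb_b = quantile_embedding(graph_b, 4)
print(emb_a, emb_b, approx_w1(emb_a, emb_b))  # [1, 2, 2, 3] [1, 1, 2, 5] 0.75
```

With m = 4 the estimate is 0.75, while the exact Wasserstein-1 distance between these two samples is 2/3; the estimate converges to it as m grows. A multi-feature version could concatenate one quantile vector per feature, but how NEExT spends embedding_dimension is not something this sketch claims to reproduce.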

Swapping in the GNN

Every stage is independent, so you can change the embedding algorithm without touching the rest. To train a graph neural network instead of the Wasserstein embedding (requires the gnn extra):

gnn.py
embeddings = nxt.compute_graph_embeddings(
  graphs,
  features,
  embedding_algorithm="gnn",
  embedding_dimension=16,
  architecture="GraphSAGE",   # or "GCN", "GIN"
)
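For intuition about what architecture="GraphSAGE" computes, here is a pure-Python sketch of one GraphSAGE layer with mean aggregation, following the published formulation: each node concatenates its own embedding with the mean of its neighbors' embeddings, multiplies by a weight matrix, and applies a ReLU. An element-wise mean over the node embeddings then yields one vector per graph. This is not NEExT's training code; the weights are fixed by hand instead of learned, and all function names are hypothetical.

sage_sketch.py
```python
def matvec(weights, vec):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def relu(vec):
    return [max(0.0, x) for x in vec]

def sage_mean_layer(adj, h, weights):
    """One GraphSAGE layer with mean aggregation:
    h_v' = ReLU(W @ concat(h_v, mean of h_u over neighbors u of v))."""
    out = {}
    for v, nbrs in adj.items():
        dim = len(h[v])
        if nbrs:
            mean_nbr = [sum(h[u][i] for u in nbrs) / len(nbrs) for i in range(dim)]
        else:
            mean_nbr = [0.0] * dim  # isolated node: no neighbor signal
        out[v] = relu(matvec(weights, h[v] + mean_nbr))
    return out

def mean_readout(h):
    """Graph-level embedding: element-wise mean of the node embeddings."""
    rows = list(h.values())
    return [sum(col) / len(rows) for col in zip(*rows)]

# Path graph 0-1-2 with one input feature (degree) per node.
adj = {0: [1], 1: [0, 2], 2: [1]}
h0 = {0: [1.0], 1: [2.0], 2: [1.0]}
# W maps the 2-dim concatenation to 2 outputs. In a real model W is learned;
# here it is the identity, so each output is simply [self, neighbor mean].
W = [[1.0, 0.0], [0.0, 1.0]]
h1 = sage_mean_layer(adj, h0, W)
print(h1)  # {0: [1.0, 2.0], 1: [2.0, 1.0], 2: [1.0, 2.0]}
print(mean_readout(h1))
```

GCN and GIN differ mainly in the aggregation step: GCN takes a degree-normalized sum over the node and its neighbors, while GIN sums the node's embedding (with a learnable self-weight) and its neighbors' embeddings, then passes the result through a multilayer perceptron.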

Loading your own graphs

NEExT reads a simple CSV format consisting of an edge list and a node-to-graph mapping, with optional labels and features:

load_csv.py
graphs = nxt.read_from_csv(
  edges_path="edges.csv",                       # src_node_id, dest_node_id
  node_graph_mapping_path="node_graph_mapping.csv",  # node_id, graph_id
  graph_label_path="graph_labels.csv",          # graph_id, graph_label
)

See Data loading for every accepted source and the exact column contracts.
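If your graphs already live in Python (for example, as edge lists), a short csv-module sketch can produce the three files the read_from_csv call above points at. The column names come from the comments in that call; the header row, listing each undirected edge once, and node IDs being unique across all graphs are assumptions to check against Data loading. write_graph_csvs is a hypothetical helper, not part of NEExT.

write_csvs.py
```python
import csv
import os
import tempfile

def write_graph_csvs(graphs, labels, out_dir):
    """Hypothetical helper: write edges.csv, node_graph_mapping.csv and
    graph_labels.csv into out_dir.

    graphs maps graph_id -> (node_ids, edge_list); node IDs are assumed to be
    unique across the whole collection. labels maps graph_id -> graph_label.
    """
    def write(name, header, rows):
        with open(os.path.join(out_dir, name), "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)  # a header row is assumed to be expected
            writer.writerows(rows)

    write("edges.csv", ["src_node_id", "dest_node_id"],
          [edge for _, edges in graphs.values() for edge in edges])
    write("node_graph_mapping.csv", ["node_id", "graph_id"],
          [(node, gid) for gid, (nodes, _) in graphs.items() for node in nodes])
    write("graph_labels.csv", ["graph_id", "graph_label"],
          sorted(labels.items()))

# Two tiny graphs: a triangle (label 0) and a 3-node path (label 1).
graphs = {
    0: ([0, 1, 2], [(0, 1), (1, 2), (2, 0)]),
    1: ([3, 4, 5], [(3, 4), (4, 5)]),
}
out_dir = tempfile.mkdtemp()
write_graph_csvs(graphs, {0: 0, 1: 1}, out_dir)
```

Listing nodes explicitly, rather than deriving them from the edges, keeps isolated nodes in the mapping file. Pass the resulting paths (for example os.path.join(out_dir, "edges.csv")) as edges_path, node_graph_mapping_path, and graph_label_path.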