Quickstart
This walkthrough runs the full NEExT pipeline on a labeled dataset — no external files
required, thanks to the built-in synthetic graph generator. Swap in
read_from_csv to use your own data.
The complete pipeline
import numpy as np
from NEExT import NEExT
nxt = NEExT()
# 1. Get a labeled collection of graphs.
# Here: 50 Erdos-Renyi vs 50 Barabasi-Albert graphs (a binary classification task).
graphs = nxt.generate_synthetic_graphs(preset="er_vs_ba", seed=42)
# 2. Compute structural node features (all 11 built-ins), with k-hop aggregation.
features = nxt.compute_node_features(graphs, feature_list=["all"], feature_vector_length=3)
# 3. Aggregate node features into one vector per graph.
embeddings = nxt.compute_graph_embeddings(
    graphs,
    features,
    embedding_algorithm="approx_wasserstein",
    embedding_dimension=16,
)
# 4. Train and cross-evaluate a classifier.
results = nxt.train_ml_model(graphs, embeddings, model_type="classifier")
print("mean accuracy:", np.mean(results["accuracy"]))
# 5. Rank which structural features drive the task.
importance = nxt.compute_feature_importance(
    graphs,
    features,
    feature_importance_algorithm="supervised_fast",
)
print(importance.head())

Understanding the output
Each call hands its result to the next stage:
generate_synthetic_graphs returns a GraphCollection. The er_vs_ba preset builds two topologically distinct classes, so there is a real signal to learn. To use your own data instead, call read_from_csv(...); see Data loading.
compute_node_features returns a Features object wrapping a DataFrame with one row per node. feature_list=["all"] computes every built-in structural feature, and feature_vector_length=3 captures each feature over the node plus its k-hop neighborhood. Features are normalized by default. See Structural features.
compute_graph_embeddings rolls those per-node features up into a single embedding_dimension-length vector per graph using a Wasserstein-distance embedding. See Embeddings.
train_ml_model trains an XGBoost classifier over several train/test splits and returns a dict of metric lists: results["accuracy"] for classifiers, results["rmse"] for regressors. See ML models.
compute_feature_importance returns a DataFrame ranking the structural features by how much they contribute to the task. See Feature importance.
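Because train_ml_model returns one metric value per train/test split, the mean alone hides how much the splits disagree. Below is a minimal sketch of reporting mean and spread together. It assumes only the shape described above (a dict mapping metric names to lists of numbers). The values in `results` are made-up placeholders, not NEExT output, and `summarize_metric` is a hypothetical helper, not part of the library.

```python
from statistics import mean, stdev


def summarize_metric(results, metric):
    """Return (mean, sample standard deviation) of one per-split metric list."""
    values = [float(v) for v in results[metric]]
    # stdev needs at least two values; report 0.0 spread for a single split.
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread


# Placeholder values standing in for the dict returned by train_ml_model.
results = {"accuracy": [0.90, 0.95, 1.00, 0.95]}

avg, spread = summarize_metric(results, "accuracy")
print(f"accuracy: {avg:.3f} +/- {spread:.3f}")
```

A large spread relative to the mean suggests the score depends heavily on which graphs land in the test split.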
Swapping in the GNN
Every stage is independent, so you can change the embedding algorithm without touching the
rest. To train a graph neural network instead of the Wasserstein embedding (requires the
gnn extra):
embeddings = nxt.compute_graph_embeddings(
    graphs,
    features,
    embedding_algorithm="gnn",
    embedding_dimension=16,
    architecture="GraphSAGE",  # or "GCN", "GIN"
)

Loading your own graphs
NEExT reads a simple CSV contract — an edge list plus a node-to-graph mapping, with optional labels and features:
graphs = nxt.read_from_csv(
    edges_path="edges.csv",  # src_node_id, dest_node_id
    node_graph_mapping_path="node_graph_mapping.csv",  # node_id, graph_id
    graph_label_path="graph_labels.csv",  # graph_id, graph_label
)

See Data loading for every accepted source and the exact column contracts.
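As an illustration of the CSV contract described above, here is a minimal sketch that writes the three files for two tiny graphs using only the standard library. The column names come from the comments in the read_from_csv call. The header row in each file, the integer IDs, the label values, and the `write_csv` helper are illustrative assumptions, not confirmed requirements, so check Data loading for the exact contract.

```python
import csv
import tempfile
from pathlib import Path


def write_csv(path, header, rows):
    """Write a header row followed by data rows (hypothetical helper)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


out_dir = Path(tempfile.mkdtemp())

# Graph 0 is a triangle on nodes 0-2; graph 1 is a path on nodes 3-5.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5)]
node_to_graph = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1), (5, 1)]
graph_labels = [(0, 0), (1, 1)]  # (graph_id, graph_label); labels are made up

# Sanity check: both endpoints of every edge belong to the same graph.
graph_of = dict(node_to_graph)
assert all(graph_of[src] == graph_of[dest] for src, dest in edges)

write_csv(out_dir / "edges.csv", ["src_node_id", "dest_node_id"], edges)
write_csv(out_dir / "node_graph_mapping.csv", ["node_id", "graph_id"], node_to_graph)
write_csv(out_dir / "graph_labels.csv", ["graph_id", "graph_label"], graph_labels)

print("wrote files to", out_dir)
```

The three resulting paths can then be passed as edges_path, node_graph_mapping_path, and graph_label_path.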