Graph embeddings
A graph embedding rolls a graph’s per-node features up into a single fixed-length vector, so each graph becomes one row you can cluster, compare, or feed to a model. NEExT offers four algorithms behind one call.

    embeddings = nxt.compute_graph_embeddings(
        graph_collection=graphs,
        features=features,
        embedding_algorithm="approx_wasserstein",
        embedding_dimension=16,
    )

embedding_dimension is required and sets the output vector length.
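
To make "one row per graph" concrete, here is a minimal sketch of the simplest possible roll-up: summarising each node feature by its mean and standard deviation. It is not one of NEExT's four algorithms, and the graph IDs, feature names, and the helper `naive_graph_vector` are hypothetical. The only point is that graphs with different node counts end up as vectors of the same length.

```python
from statistics import mean, pstdev

# Hypothetical per-node features for two graphs of different sizes.
# Each inner dict is one node; keys are feature names.
node_features = {
    "g0": [
        {"degree": 1.0, "clustering": 0.0},
        {"degree": 2.0, "clustering": 0.5},
        {"degree": 3.0, "clustering": 1.0},
    ],
    "g1": [
        {"degree": 4.0, "clustering": 0.2},
        {"degree": 6.0, "clustering": 0.4},
    ],
}


def naive_graph_vector(nodes, feature_names):
    """Summarise each feature by its mean and population std across nodes."""
    vector = []
    for name in feature_names:
        values = [node[name] for node in nodes]
        vector.extend([mean(values), pstdev(values)])
    return vector


feature_names = ["degree", "clustering"]
rows = {
    graph_id: naive_graph_vector(nodes, feature_names)
    for graph_id, nodes in node_features.items()
}

# Both graphs yield 4 numbers (2 features x 2 statistics),
# even though g0 has three nodes and g1 has two.
for graph_id, vector in rows.items():
    print(graph_id, len(vector), vector)
```

The algorithms below replace this crude summary with richer representations, but the output has the same shape: one fixed-length vector per graph.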
The four algorithms
| embedding_algorithm | Family | Notes |
|---|---|---|
| approx_wasserstein | Distribution-based | Fast approximate Wasserstein embedding (a good default) |
| wasserstein | Distribution-based | Exact Wasserstein embedding |
| sinkhornvectorizer | Distribution-based | Sinkhorn-based vectorizer |
| gnn | Graph neural network | Pure-PyTorch GCN / GraphSAGE / GIN |
The three distribution-based algorithms come from the vectorizers library (included in
the core install) and accept feature_columns, random_state, and memory_size.
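
For intuition about what "distribution-based" means, the sketch below computes the 1-D Wasserstein (earth mover's) distance between the values of a single node feature in two graphs. It is an illustration only, not how the vectorizers library computes its embeddings; the function name `wasserstein_1d` and the sample degree lists are hypothetical.

```python
from bisect import bisect_right


def wasserstein_1d(xs, ys):
    """1-D Wasserstein-1 distance between two empirical samples.

    Computed as the area between the two empirical CDFs, so the
    samples may have different sizes (graphs with different node counts).
    """
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both CDFs are constant on [left, right), so evaluate them at `left`.
        cdf_x = bisect_right(xs, left) / len(xs)
        cdf_y = bisect_right(ys, left) / len(ys)
        total += abs(cdf_x - cdf_y) * (right - left)
    return total


# Hypothetical node degrees for two graphs; every value in b is a's plus one.
degrees_a = [1, 1, 2, 4]
degrees_b = [2, 2, 3, 5]
distance = wasserstein_1d(degrees_a, degrees_b)
print(distance)
```

Because the second sample is the first shifted by one, the distance here is 1.0. Two graphs whose node features are distributed similarly get a small distance, regardless of how many nodes each has.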
GNN embeddings
The gnn algorithm trains a graph neural network without labels, using node-feature
reconstruction as the training objective, and pools node representations to the graph
level. It is pure PyTorch, with no dependency on DGL or PyTorch Geometric, and requires
the gnn extra (pip install "NEExT[gnn]").

    embeddings = nxt.compute_graph_embeddings(
        graph_collection=graphs,
        features=features,
        embedding_algorithm="gnn",
        embedding_dimension=16,
        architecture="GraphSAGE",  # "GCN", "GraphSAGE", or "GIN"
        hidden_dims=[64, 32],
        epochs=100,
        learning_rate=0.01,
        weight_decay=5e-4,
        dropout=0.0,
        pooling="mean",  # "mean", "sum", or "max"
        early_stopping_patience=10,
    )

These GNN-only parameters are ignored by the other algorithms:
| Parameter | Default | Meaning |
|---|---|---|
| architecture | "GCN" | "GCN", "GraphSAGE", or "GIN" |
| hidden_dims | [64, 32] | Hidden layer sizes |
| epochs | 100 | Training epochs |
| learning_rate | 0.01 | Adam learning rate |
| weight_decay | 5e-4 | Adam weight decay |
| dropout | 0.0 | Dropout between layers (0–1) |
| pooling | "mean" | Node-to-graph pooling ("mean", "sum", "max") |
| early_stopping_patience | 10 | Epochs without validation improvement before stopping |
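
The pooling option decides how per-node representations collapse into one graph vector. The sketch below shows the three modes on plain Python lists. It is a simplified illustration, not NEExT's PyTorch code, and the `pool` helper and the sample vectors are hypothetical.

```python
def pool(node_vectors, mode="mean"):
    """Collapse a list of equal-length node vectors into one graph vector."""
    columns = list(zip(*node_vectors))  # transpose: one tuple per dimension
    if mode == "mean":
        return [sum(col) / len(col) for col in columns]
    if mode == "sum":
        return [sum(col) for col in columns]
    if mode == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown pooling mode: {mode!r}")


# Three nodes, each with a hypothetical 2-dimensional representation.
node_vectors = [[1.0, 4.0], [2.0, 0.0], [3.0, 2.0]]

for mode in ("mean", "sum", "max"):
    print(mode, pool(node_vectors, mode))
```

Sum pooling grows with the number of nodes, while mean and max do not, so the choice matters when graph sizes in the collection vary widely.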
Each graph is processed with a dense adjacency matrix, so memory grows with the square of the node count. That suits NEExT’s typically small graphs; NEExT warns when a graph exceeds ~5,000 nodes.
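
A quick back-of-the-envelope calculation shows why a dense matrix becomes costly at that scale. The sketch assumes 4 bytes per entry (float32), which is an assumption rather than a documented detail of NEExT; the helper name is hypothetical.

```python
def dense_adjacency_bytes(num_nodes, bytes_per_entry=4):
    """Memory for one dense num_nodes x num_nodes matrix (float32 assumed)."""
    return num_nodes * num_nodes * bytes_per_entry


for n in (100, 1_000, 5_000):
    megabytes = dense_adjacency_bytes(n) / 1e6
    print(f"{n:>5} nodes -> {megabytes:,.2f} MB")
```

Under that assumption, a 5,000-node graph needs about 100 MB for the adjacency matrix alone, and doubling the node count quadruples the cost.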
The Embeddings container
compute_graph_embeddings returns an Embeddings object:
- embeddings.embeddings_df: DataFrame with graph_id plus emb_0 … emb_{D-1}, where D is the embedding dimension.
- embeddings.embedding_name: the algorithm used.
- embeddings.embedding_columns: the embedding column names.
- emb_a + emb_b: merges two embeddings on graph_id, prefixing columns with each algorithm name so you can stack representations.
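
The column layout and the merge behaviour can be pictured with plain dictionaries. This is a sketch only: the helper names are hypothetical, the `<name>_<column>` prefix format is an assumption (the exact prefix NEExT uses is not shown here), and keeping only graph IDs present in both tables is likewise an assumption.

```python
def embedding_columns(dimension):
    """Column names emb_0 ... emb_{D-1} for a D-dimensional embedding."""
    return [f"emb_{i}" for i in range(dimension)]


def merge_embeddings(name_a, rows_a, name_b, rows_b):
    """Join two {graph_id: {column: value}} tables on graph_id.

    Columns are prefixed with the algorithm name so they do not collide.
    The prefix format and the inner-join behaviour are assumptions.
    """
    merged = {}
    for graph_id in sorted(rows_a.keys() & rows_b.keys()):
        row = {f"{name_a}_{col}": val for col, val in rows_a[graph_id].items()}
        row.update({f"{name_b}_{col}": val for col, val in rows_b[graph_id].items()})
        merged[graph_id] = row
    return merged


cols = embedding_columns(2)  # ["emb_0", "emb_1"]

# Hypothetical 2-dimensional embeddings of the same two graphs.
wasserstein_rows = {0: dict(zip(cols, [0.1, 0.2])), 1: dict(zip(cols, [0.3, 0.4]))}
gnn_rows = {0: dict(zip(cols, [1.0, 2.0])), 1: dict(zip(cols, [3.0, 4.0]))}

stacked = merge_embeddings("approx_wasserstein", wasserstein_rows, "gnn", gnn_rows)
print(stacked[0])
```

Each merged row carries both representations side by side, which is what makes stacking embeddings from different algorithms useful as model input.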
Next: train a model on these embeddings.