Graph embeddings

A graph embedding rolls a graph’s per-node features up into a single fixed-length vector, so each graph becomes one row you can cluster, compare, or feed to a model. NEExT offers four algorithms behind one call.

embeddings.py

embeddings = nxt.compute_graph_embeddings(
  graph_collection=graphs,
  features=features,
  embedding_algorithm="approx_wasserstein",
  embedding_dimension=16,
)

embedding_dimension is required and sets the output vector length.
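Graphs with different node counts end up with the same number of columns because the embedding step summarizes the node-feature distribution rather than listing nodes. The sketch below shows that idea with a deliberately naive baseline: it concatenates per-feature means and standard deviations. It is not one of NEExT's four algorithms, and `naive_graph_vector` is a hypothetical helper used only for illustration.

```python
import numpy as np

def naive_graph_vector(node_features: np.ndarray) -> np.ndarray:
    """Summarize an (n_nodes, n_features) matrix as one fixed-length row.

    Concatenates the per-feature mean and standard deviation over nodes,
    so the output length is 2 * n_features no matter how many nodes exist.
    """
    return np.concatenate([node_features.mean(axis=0), node_features.std(axis=0)])

rng = np.random.default_rng(0)
# Three toy graphs with different node counts but the same 3 node features.
graphs = [rng.normal(size=(n, 3)) for n in (5, 12, 40)]
rows = np.vstack([naive_graph_vector(g) for g in graphs])
print(rows.shape)  # (3, 6): one row per graph
```

In NEExT the output width is controlled by `embedding_dimension` instead of being tied to the number of features as it is here.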

The four algorithms

`embedding_algorithm`	Family	Notes
`approx_wasserstein`	Distribution-based	Fast approximate Wasserstein embedding (a good default)
`wasserstein`	Distribution-based	Exact Wasserstein embedding
`sinkhornvectorizer`	Distribution-based	Sinkhorn-based vectorizer
`gnn`	Graph neural network	Pure-PyTorch GCN / GraphSAGE / GIN

The three distribution-based algorithms come from the vectorizers library (included in the core install) and accept feature_columns, random_state, and memory_size.
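The distribution-based family treats each graph as a cloud of node-feature vectors and embeds it so that Euclidean distance between embeddings tracks a Wasserstein distance between the clouds. As a minimal sketch of that idea, the code below uses a different, simpler technique, a sliced-Wasserstein quantile embedding. It is not the vectorizers implementation, and `random_directions` and `sliced_quantile_embedding` are hypothetical names. It relies on the fact that in one dimension the 2-Wasserstein distance equals the L2 distance between quantile functions.

```python
import numpy as np

def random_directions(n_dirs, n_features, seed=0):
    """Unit vectors used to slice feature space into 1-D projections."""
    rng = np.random.default_rng(seed)
    d = rng.normal(size=(n_dirs, n_features))
    return d / np.linalg.norm(d, axis=1, keepdims=True)

def sliced_quantile_embedding(node_features, directions, n_quantiles=8):
    """Fixed-length vector for one graph's (n_nodes, n_features) matrix.

    Each node is projected onto every direction, and each resulting 1-D
    distribution is summarized by its quantile function sampled at evenly
    spaced levels. Euclidean distance between two such vectors approximates
    the sliced 2-Wasserstein distance between the node-feature distributions.
    """
    projections = node_features @ directions.T              # (n_nodes, n_dirs)
    levels = (np.arange(n_quantiles) + 0.5) / n_quantiles   # quantile midpoints
    q = np.quantile(projections, levels, axis=0)            # (n_quantiles, n_dirs)
    return q.T.ravel() / np.sqrt(q.size)                    # length n_dirs * n_quantiles

rng = np.random.default_rng(1)
dirs = random_directions(n_dirs=4, n_features=3)
small = sliced_quantile_embedding(rng.normal(size=(10, 3)), dirs)
large = sliced_quantile_embedding(rng.normal(size=(200, 3)), dirs)
print(small.shape, large.shape)       # (32,) (32,)
print(np.linalg.norm(small - large))  # approximate sliced W2 distance
```

Dividing by the square root of the number of entries turns the squared Euclidean distance into an average over directions and quantile levels, which is what makes it an estimate of the sliced distance rather than a sum that grows with the embedding size.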

GNN embeddings

The gnn algorithm trains a graph neural network without labels, using node-feature reconstruction as the training objective, and then pools the learned node representations into one vector per graph. It is implemented in pure PyTorch (no DGL or PyTorch Geometric) and requires the gnn extra: pip install "NEExT[gnn]".
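To make "dense adjacency plus pooling" concrete, here is a minimal NumPy sketch of the forward pass of a two-layer GCN using the standard symmetric normalization D^-1/2 (A + I) D^-1/2, followed by the three readouts the `pooling` parameter names. The weights are random and untrained: the reconstruction objective and the Adam training loop are omitted. `gcn_layer` and `pool` are hypothetical helpers, not NEExT's internal code.

```python
import numpy as np

def gcn_layer(adj, h, weight):
    """Dense GCN propagation: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # degrees include the self-loop
    norm_adj = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return np.maximum(norm_adj @ h @ weight, 0.0)   # ReLU

def pool(h, how="mean"):
    """Collapse (n_nodes, dim) node representations into one dim-length vector."""
    readouts = {"mean": np.mean, "sum": np.sum, "max": np.max}
    return readouts[how](h, axis=0)

# Toy 4-node path graph 0-1-2-3 with 2 input features per node.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
rng = np.random.default_rng(0)
w1 = rng.normal(size=(2, 8))   # untrained weights; training would fit these
w2 = rng.normal(size=(8, 4))
h = gcn_layer(adj, gcn_layer(adj, x, w1), w2)
for how in ("mean", "sum", "max"):
    print(how, pool(h, how).shape)  # each (4,)
```

Whatever the node count, the pooled vector has the width of the last layer, which is why pooling is what turns node representations into a graph embedding.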

gnn.py

embeddings = nxt.compute_graph_embeddings(
  graph_collection=graphs,
  features=features,
  embedding_algorithm="gnn",
  embedding_dimension=16,
  architecture="GraphSAGE",     # "GCN", "GraphSAGE", or "GIN"
  hidden_dims=[64, 32],
  epochs=100,
  learning_rate=0.01,
  weight_decay=5e-4,
  dropout=0.0,
  pooling="mean",               # "mean", "sum", or "max"
  early_stopping_patience=10,
)

These GNN-only parameters are ignored by the other algorithms:

Parameter	Default	Meaning
`architecture`	`"GCN"`	`"GCN"`, `"GraphSAGE"`, or `"GIN"`
`hidden_dims`	`[64, 32]`	Hidden layer sizes
`epochs`	`100`	Training epochs
`learning_rate`	`0.01`	Adam learning rate
`weight_decay`	`5e-4`	Adam weight decay
`dropout`	`0.0`	Dropout between layers (0–1)
`pooling`	`"mean"`	Node-to-graph pooling (`"mean"`, `"sum"`, `"max"`)
`early_stopping_patience`	`10`	Epochs without validation improvement before stopping
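The `early_stopping_patience` rule can be sketched as a simple counter over per-epoch validation losses. This assumes "improvement" means a strictly lower loss than the best seen so far, with no minimum-delta threshold; how NEExT splits off its validation data is not covered here. `early_stopping_epoch` is a hypothetical helper.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once `patience` consecutive epochs pass without the
    loss beating the best value seen so far; otherwise it runs to the end.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
print(early_stopping_epoch(losses, patience=3))  # (5, 2)
```

Here the best loss occurs at epoch 2, and three epochs without improvement follow, so training stops at epoch 5 even though `epochs` would allow more.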

Each graph is processed as a dense adjacency matrix, whose memory grows with the square of the node count. That cost is acceptable for the small graphs NEExT typically handles, and NEExT warns when a graph exceeds roughly 5,000 nodes.
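A quick calculation shows why large graphs trigger the warning. The sketch assumes 4-byte float32 entries (PyTorch's default floating-point type); the actual dtype and any extra copies made during training are not specified here. `dense_adjacency_bytes` is a hypothetical helper.

```python
def dense_adjacency_bytes(n_nodes, bytes_per_entry=4):
    """Memory for one n x n dense adjacency matrix (4 bytes per float32 entry)."""
    return n_nodes * n_nodes * bytes_per_entry

for n in (100, 1_000, 5_000, 20_000):
    print(f"{n:>6} nodes: {dense_adjacency_bytes(n) / 1e6:,.2f} MB")
```

At 5,000 nodes a single float32 adjacency matrix takes 100 MB; at 20,000 nodes it takes 1.6 GB, before counting any intermediate tensors.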

The `Embeddings` container

compute_graph_embeddings returns an Embeddings object:

embeddings.embeddings_df: a DataFrame with a graph_id column plus emb_0 … emb_{D-1}, where D is embedding_dimension.
embeddings.embedding_name — the algorithm used.
embeddings.embedding_columns — the embedding column names.
emb_a + emb_b — merge two embeddings on graph_id, prefixing columns with each algorithm name so you can stack representations.
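The merge behind `emb_a + emb_b` can be pictured with plain pandas. This is a minimal sketch, not NEExT's implementation: the `{algorithm}_{column}` prefix format and the inner join are assumptions, and `merge_embeddings` is a hypothetical helper that works on raw DataFrames rather than `Embeddings` objects.

```python
import pandas as pd

def merge_embeddings(df_a, name_a, df_b, name_b):
    """Join two embedding tables on graph_id, prefixing columns by algorithm."""
    def prefixed(df, name):
        return df.rename(columns={c: f"{name}_{c}" for c in df.columns if c != "graph_id"})
    return prefixed(df_a, name_a).merge(prefixed(df_b, name_b), on="graph_id", how="inner")

a = pd.DataFrame({"graph_id": [0, 1], "emb_0": [0.1, 0.2], "emb_1": [0.3, 0.4]})
b = pd.DataFrame({"graph_id": [1, 0], "emb_0": [9.0, 8.0]})
merged = merge_embeddings(a, "approx_wasserstein", b, "gnn")
print(list(merged.columns))
# ['graph_id', 'approx_wasserstein_emb_0', 'approx_wasserstein_emb_1', 'gnn_emb_0']
```

Joining on graph_id rather than on row position keeps rows aligned even when the two tables list graphs in different orders, as `b` does here.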

Next: train a model on these embeddings.
