Structural features

NEExT describes every node by a vector of structural features — graph-theoretic metrics computed from the graph’s topology. These per-node features are the raw material for graph embeddings.

features.py
features = nxt.compute_node_features(
  graph_collection=graphs,
  feature_list=["all"],       # or a subset of the names below
  feature_vector_length=3,    # k-hop aggregation depth
  normalize_features=True,
  n_jobs=1,
)

The 11 built-in features

Pass any subset by name, or ["all"] for every one:

Name                    Description
page_rank               PageRank: influence derived from the link structure
degree_centrality       Fraction of the other nodes that a node connects to
closeness_centrality    Inverse of the mean shortest-path distance to all other nodes
betweenness_centrality  Fraction of shortest paths that pass through the node
eigenvector_centrality  Influence weighted by the neighbors’ own influence
clustering_coefficient  How tightly a node’s neighbors interconnect
local_efficiency        Efficiency of information flow within the node’s neighborhood
lsme                    Local Spectral Method Embedding: a local connectivity signature
load_centrality         Shortest-path load carried through the node
basic_expansion         Neighborhood expansion structure
betastar                Community-aware node metric (βstar)

["all"] expands to all eleven features. You can also mix "all" with custom feature names in the same feature_list.
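To make the mixing rule concrete, here is a minimal sketch of how "all" could be expanded alongside custom names. The helper expand_feature_list and the constant BUILTIN_FEATURES are hypothetical illustrations, not NEExT functions. The built-in order is simply copied from the table above, and the de-duplication behavior is an assumption.

```python
# Hypothetical helper (not part of NEExT): one plausible way to expand "all"
# when it is mixed with custom feature names in feature_list.
BUILTIN_FEATURES = [  # order copied from the table above (assumption)
    "page_rank",
    "degree_centrality",
    "closeness_centrality",
    "betweenness_centrality",
    "eigenvector_centrality",
    "clustering_coefficient",
    "local_efficiency",
    "lsme",
    "load_centrality",
    "basic_expansion",
    "betastar",
]


def expand_feature_list(feature_list):
    """Replace "all" with the built-in names, keeping order and dropping duplicates."""
    expanded = []
    for name in feature_list:
        names = BUILTIN_FEATURES if name == "all" else [name]
        for n in names:
            if n not in expanded:
                expanded.append(n)
    return expanded


print(expand_feature_list(["all", "degree_squared"])[-2:])
# ['betastar', 'degree_squared']
```

Here "degree_squared" stands for a custom feature registered through my_feature_methods.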

The community-aware betastar feature is based on Kamiński, Prałat, Théberge, and Zając, “Predicting Properties of Nodes via Community-Aware Features” (Social Network Analysis and Mining 14(1), 2024) — arXiv:2311.04730, doi:10.1007/s13278-024-01281-2.

k-hop neighborhood aggregation

feature_vector_length controls how far each feature reaches. With the default of 3, every feature is computed for the node itself and then aggregated over its k-hop neighborhood, so each feature becomes a short vector (columns <feature_name>_0, <feature_name>_1, …) that captures structure at several scales. Larger values capture wider context at a higher computational cost.
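The aggregation idea can be sketched in plain Python. This page does not specify which statistic NEExT uses. This sketch assumes that entry 0 is the node’s own value and entry i is the mean of the base feature over nodes exactly i hops away, with 0.0 when no node lies at that distance. The names hop_distances and khop_vector are hypothetical, and NEExT’s real aggregation may differ.

```python
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: shortest hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def khop_vector(adj, values, node, feature_vector_length=3):
    """Entry 0 is the node's own value; entry k is the mean over nodes exactly k hops away."""
    dist = hop_distances(adj, node)
    vector = []
    for k in range(feature_vector_length):
        ring = [values[v] for v, d in dist.items() if d == k]
        vector.append(sum(ring) / len(ring) if ring else 0.0)  # 0.0 if no nodes at distance k
    return vector


# Path graph 0-1-2-3, using degree as the base feature.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
degree = {n: len(nbrs) for n, nbrs in adj.items()}
print(khop_vector(adj, degree, 0))  # [1.0, 2.0, 2.0]
```

Under this assumption, the three entries would fill columns such as degree_0, degree_1 and degree_2 for node 0. A larger feature_vector_length requires a deeper search per node, which is why wider context costs more.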

Normalization

With normalize_features=True (default), features are scaled across all nodes. You can also normalize a Features object directly with a chosen scaler:

normalize.py
features.normalize(type="StandardScaler")   # or "MinMaxScaler", "RobustScaler"
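The three scaler names match scikit-learn’s StandardScaler, MinMaxScaler and RobustScaler. Whether NEExT delegates to scikit-learn is an assumption. The sketch below reproduces what those scalers do to each feature column using pandas only. The function scale_columns is hypothetical. Each column is rescaled across all rows (nodes), and a zero spread is replaced by 1 so that constant columns do not divide by zero.

```python
import pandas as pd


def scale_columns(df, columns, scaler="StandardScaler"):
    """Return a copy of df with each listed feature column rescaled across all rows."""
    out = df.copy()
    for col in columns:
        x = out[col].astype(float)
        if scaler == "StandardScaler":      # zero mean, unit (population) variance
            center, spread = x.mean(), x.std(ddof=0)
        elif scaler == "MinMaxScaler":      # map onto [0, 1]
            center, spread = x.min(), x.max() - x.min()
        elif scaler == "RobustScaler":      # subtract the median, divide by the IQR
            center, spread = x.median(), x.quantile(0.75) - x.quantile(0.25)
        else:
            raise ValueError(f"unknown scaler: {scaler}")
        out[col] = (x - center) / (spread if spread != 0 else 1.0)
    return out


df = pd.DataFrame({"node_id": [0, 1, 2], "graph_id": [0, 0, 0], "page_rank_0": [1.0, 2.0, 3.0]})
print(scale_columns(df, ["page_rank_0"], "MinMaxScaler")["page_rank_0"].tolist())  # [0.0, 0.5, 1.0]
```

RobustScaler is the usual choice when a few hub nodes have extreme values, because the median and IQR are not pulled around by outliers.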

Parallelism

Feature computation parallelizes across graphs with joblib:

  • n_jobs: number of parallel workers (default 1).
  • parallel_backend: "loky" (the default; process-based, and able to serialize functions defined in a notebook) or "threading".
  • joblib_kwargs: advanced options forwarded to joblib.Parallel. Keys that NEExT sets itself, such as n_jobs or backend, are not accepted here.
  • profile_features: log the timing of each graph/feature computation at INFO level.
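The per-graph parallelism follows the standard joblib pattern: one delayed task per graph, run by Parallel with the chosen backend. The sketch below is a hypothetical wrapper, compute_per_graph, and is not NEExT’s internal code. It mirrors the joblib_kwargs rule from the list above by rejecting the keys the wrapper sets itself. Graphs are plain adjacency dicts here so the example has no further dependencies.

```python
from joblib import Parallel, delayed


def compute_per_graph(graphs, feature_fn, n_jobs=1, parallel_backend="loky", joblib_kwargs=None):
    """Run feature_fn once per graph in parallel; results come back in input order."""
    joblib_kwargs = dict(joblib_kwargs or {})
    # The wrapper owns these keys, so callers may not override them via joblib_kwargs.
    reserved = set(joblib_kwargs) & {"n_jobs", "backend"}
    if reserved:
        raise ValueError(f"reserved joblib keys: {sorted(reserved)}")
    parallel = Parallel(n_jobs=n_jobs, backend=parallel_backend, **joblib_kwargs)
    return parallel(delayed(feature_fn)(g) for g in graphs)


def edge_count(adj):
    """Toy per-graph computation: number of undirected edges in an adjacency dict."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


graphs = [{0: [1], 1: [0]}, {0: [1, 2], 1: [0, 2], 2: [0, 1]}]
print(compute_per_graph(graphs, edge_count, n_jobs=2, parallel_backend="threading"))  # [1, 3]
```

Parallelizing across graphs rather than across nodes keeps each task independent. The "threading" backend avoids process start-up and pickling, but it mainly helps when the work releases the GIL.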

Custom features

Register your own metric with my_feature_methods. Your function receives a graph and must return a DataFrame with node_id, graph_id, then one column per feature dimension (named <feature_name>_0, <feature_name>_1, …).

custom_feature.py
import pandas as pd

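def degree_squared(graph):
  # graph.G is the graph's underlying graph object; graph.nodes and graph.graph_id label the rows
  G = graph.G
  # One row per node; this feature has a single dimension, so a single column degree_squared_0
  return pd.DataFrame({
      "node_id": graph.nodes,
      "graph_id": graph.graph_id,
      "degree_squared_0": [G.degree(n) ** 2 for n in graph.nodes],
  })[["node_id", "graph_id", "degree_squared_0"]]  # enforce the required column order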

features = nxt.compute_node_features(
  graph_collection=graphs,
  feature_list=["page_rank", "degree_squared"],
  my_feature_methods=[
      {"feature_name": "degree_squared", "feature_function": degree_squared},
  ],
)
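A custom function must return node_id, graph_id, and then <feature_name>_0, <feature_name>_1, … in that order. It can help to check the returned frame before registering the function. The validator below, check_feature_frame, is a hypothetical helper written against the layout described in this section. It is not part of NEExT, and NEExT’s own checks may be stricter or looser.

```python
import pandas as pd


def check_feature_frame(df, feature_name):
    """Raise ValueError unless df is laid out as node_id, graph_id, <name>_0, <name>_1, ..."""
    cols = list(df.columns)
    if cols[:2] != ["node_id", "graph_id"]:
        raise ValueError("the first two columns must be node_id and graph_id")
    dims = cols[2:]
    if not dims:
        raise ValueError(f"expected at least one column named {feature_name}_0")
    expected = [f"{feature_name}_{i}" for i in range(len(dims))]
    if dims != expected:
        raise ValueError(f"feature columns should be {expected}, got {dims}")
    return len(dims)  # number of feature dimensions


df = pd.DataFrame({"node_id": [0, 1], "graph_id": [7, 7], "degree_squared_0": [1, 4]})
print(check_feature_frame(df, "degree_squared"))  # 1
```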

The Features container

compute_node_features returns a Features object:

  • features.features_df: the underlying DataFrame (node_id, graph_id, then the feature columns).
  • features.feature_columns: the names of the feature columns.
  • features.normalize(type=...): rescale the features in place.
  • features_a + features_b: merge two feature sets on (node_id, graph_id).
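At the DataFrame level, merging two feature sets on (node_id, graph_id) amounts to a pandas merge on those two keys. The sketch below, merge_features, is a hypothetical stand-in for what features_a + features_b does to the underlying features_df tables. The inner join and the one-to-one key check are assumptions, since this page does not say which join NEExT uses.

```python
import pandas as pd


def merge_features(a, b):
    """Join two feature tables on (node_id, graph_id), keeping nodes present in both."""
    # validate="one_to_one" raises if either side repeats a (node_id, graph_id) pair.
    return a.merge(b, on=["node_id", "graph_id"], how="inner", validate="one_to_one")


a = pd.DataFrame({"node_id": [0, 1], "graph_id": [0, 0], "page_rank_0": [0.4, 0.6]})
b = pd.DataFrame({"node_id": [1, 0], "graph_id": [0, 0], "degree_squared_0": [4, 1]})
merged = merge_features(a, b)
print(merged.columns.tolist())
# ['node_id', 'graph_id', 'page_rank_0', 'degree_squared_0']
```

Because rows are matched by key rather than by position, the two inputs do not need to list nodes in the same order.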

Next: turn these node features into one vector per graph with Embeddings.