Structural features

NEExT describes every node by a vector of structural features — graph-theoretic metrics computed from the graph’s topology. These per-node features are the raw material for graph embeddings.

features.py

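# Assumes `nxt` is an initialized NEExT instance and `graphs` is a graph collection loaded earlier.
features = nxt.compute_node_features(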
  graph_collection=graphs,
  feature_list=["all"],       # or a subset of the names below
  feature_vector_length=3,    # k-hop aggregation depth
  normalize_features=True,
  n_jobs=1,
)

The 11 built-in features

Pass any subset by name, or ["all"] for every one:

Name	Description
`page_rank`	PageRank — influence from the link structure
`degree_centrality`	Fraction of nodes a node connects to
`closeness_centrality`	Inverse mean shortest-path distance to all nodes
`betweenness_centrality`	Fraction of shortest paths passing through the node
`eigenvector_centrality`	Influence weighted by neighbors’ influence
`clustering_coefficient`	How tightly a node’s neighbors interconnect
`local_efficiency`	Efficiency of information flow in the node’s neighborhood
`lsme`	Local Spectral Method Embedding — local connectivity signature
`load_centrality`	Shortest-path load through the node
`basic_expansion`	Neighborhood expansion structure
`betastar`	Community-aware node metric (βstar)

["all"] expands to all eleven features. You can also mix "all" with custom feature names in the same feature_list.

The community-aware betastar feature is based on Kamiński, Prałat, Théberge, and Zając, “Predicting Properties of Nodes via Community-Aware Features” (Social Network Analysis and Mining 14(1), 2024) — arXiv:2311.04730, doi:10.1007/s13278-024-01281-2.

k-hop neighborhood aggregation

feature_vector_length controls how far each feature reaches. With the default of 3, each feature is computed for the node itself and then aggregated over its k-hop neighborhoods, so a feature becomes a short vector rather than a single number and captures structure at several scales. Larger values capture wider context but take longer to compute.
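
To make the idea concrete, here is a minimal sketch of one plausible aggregation scheme. It is not NEExT's actual implementation: it assumes that entry 0 is the node's own value and that entry k is the mean of the feature over the nodes exactly k hops away. The helper names (nodes_by_hop, khop_feature_vector) and the toy graph are hypothetical.

```python
from collections import deque

def nodes_by_hop(adj, source, max_hop):
    """Group nodes by their exact BFS distance (0..max_hop) from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hop:
            continue  # do not expand beyond the requested depth
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    layers = [[] for _ in range(max_hop + 1)]
    for node, d in dist.items():
        layers[d].append(node)
    return layers

def khop_feature_vector(adj, values, node, length):
    """Assumed scheme: entry k is the mean of `values` over nodes k hops away."""
    layers = nodes_by_hop(adj, node, length - 1)
    return [
        sum(values[n] for n in layer) / len(layer) if layer else 0.0
        for layer in layers
    ]

# Toy path graph 0 - 1 - 2 - 3, using node degree as the base feature.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
degree = {n: float(len(nbrs)) for n, nbrs in adj.items()}
vector = khop_feature_vector(adj, degree, node=0, length=3)
print(vector)  # [1.0, 2.0, 2.0]
```

Under this assumption, a vector length of 3 yields three numbers per feature and node: the node's own value, the 1-hop mean, and the 2-hop mean.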

Normalization

With normalize_features=True (default), features are scaled across all nodes. You can also normalize a Features object directly with a chosen scaler:

normalize.py

features.normalize(type="StandardScaler")   # or "MinMaxScaler", "RobustScaler"
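
The three scaler names match classes in scikit-learn's preprocessing module. Assuming NEExT follows those definitions (the page does not say so explicitly), the sketch below shows what the first two do to a single feature column, using only the standard library. The function names and the toy column are hypothetical.

```python
from statistics import fmean, pstdev

def standard_scale(column):
    """z-score: subtract the mean, divide by the population standard deviation."""
    mean, std = fmean(column), pstdev(column)
    if std == 0:
        return [0.0 for _ in column]  # constant column: avoid division by zero
    return [(x - mean) / std for x in column]

def min_max_scale(column):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column: avoid division by zero
    return [(x - lo) / (hi - lo) for x in column]

# Toy values of one feature column across four nodes.
column = [1.0, 2.0, 3.0, 4.0]
z_scores = standard_scale(column)   # zero mean, unit variance
unit_range = min_max_scale(column)  # values in [0, 1]
```

A robust scaler works the same way but centers on the median and divides by the interquartile range, which makes it less sensitive to outlier nodes such as hubs.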

Parallelism

Feature computation parallelizes across graphs with joblib:

n_jobs — number of parallel workers (default 1).
parallel_backend — "loky" (default, process-based; serializes notebook-defined functions) or "threading".
joblib_kwargs — advanced options forwarded to joblib.Parallel (you may not pass NEExT-owned keys like n_jobs or backend here).
profile_features — log per graph-feature timing at INFO level.
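
Parallelizing across graphs amounts to a parallel map over the collection, with n_jobs setting the number of workers. The sketch below uses the standard library's ThreadPoolExecutor as a stand-in for joblib's "threading" backend so that it runs without extra dependencies. It is not NEExT's internal code, and the toy graphs and the count_edges function are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def count_edges(graph):
    """Stand-in for an expensive per-graph feature computation."""
    graph_id, adj = graph
    # Each undirected edge appears twice in an adjacency list.
    return graph_id, sum(len(nbrs) for nbrs in adj.values()) // 2

# Three toy graphs as (graph_id, adjacency list) pairs.
toy_graphs = [
    (0, {0: [1], 1: [0]}),
    (1, {0: [1, 2], 1: [0, 2], 2: [0, 1]}),
    (2, {0: [1], 1: [0, 2], 2: [1]}),
]

n_jobs = 2  # number of parallel workers
with ThreadPoolExecutor(max_workers=n_jobs) as pool:
    # map() hands one graph to each free worker and returns results in input order.
    results = list(pool.map(count_edges, toy_graphs))
```

A process-based backend additionally has to serialize the function and its arguments to send them to the workers, which is why the backend choice matters for functions defined in a notebook.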

Custom features

Register your own metric with my_feature_methods. Your function receives a graph and must return a DataFrame with node_id, graph_id, then one column per feature dimension (named <feature_name>_0, <feature_name>_1, …).

custom_feature.py

import pandas as pd

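def degree_squared(graph):
  """Return each node's squared degree as a one-dimensional feature."""
  G = graph.G  # underlying graph object
  return pd.DataFrame({
      "node_id": graph.nodes,
      "graph_id": graph.graph_id,  # scalar, repeated for every row
      "degree_squared_0": [G.degree(n) ** 2 for n in graph.nodes],
  })[["node_id", "graph_id", "degree_squared_0"]]  # enforce the required column order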

features = nxt.compute_node_features(
  graph_collection=graphs,
  feature_list=["page_rank", "degree_squared"],
  my_feature_methods=[
      {"feature_name": "degree_squared", "feature_function": degree_squared},
  ],
)
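
Before registering a custom function, it can help to check its output against the column contract described above. The helper below is a hypothetical convenience, not part of NEExT. It only inspects column names and assumes the order node_id, graph_id, then <feature_name>_0, <feature_name>_1, and so on.

```python
def check_feature_columns(columns, feature_name):
    """Return True if columns are node_id, graph_id, <feature_name>_0, _1, ..."""
    columns = list(columns)
    if columns[:2] != ["node_id", "graph_id"]:
        return False
    dims = columns[2:]
    # Feature dimensions must be numbered consecutively from 0.
    expected = [f"{feature_name}_{i}" for i in range(len(dims))]
    return len(dims) > 0 and dims == expected

ok = check_feature_columns(
    ["node_id", "graph_id", "degree_squared_0"], "degree_squared"
)
missing_suffix = check_feature_columns(
    ["node_id", "graph_id", "degree_squared"], "degree_squared"
)
```

With a real DataFrame you would pass its columns attribute, for example check_feature_columns(df.columns, "degree_squared").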

The `Features` container

compute_node_features returns a Features object:

features.features_df — the underlying DataFrame (node_id, graph_id, feature cols).
features.feature_columns — the feature column names.
features.normalize(type=...) — re-scale in place.
features_a + features_b — merge two feature sets on (node_id, graph_id).
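
Merging on (node_id, graph_id) means that rows are matched by that pair of keys rather than by position. The sketch below illustrates this with plain dictionaries. It assumes inner-join behavior, which this page does not specify, and merge_features is a hypothetical helper rather than NEExT code.

```python
def merge_features(rows_a, rows_b):
    """Join two lists of row dicts on (node_id, graph_id); inner join assumed."""
    index_b = {(r["node_id"], r["graph_id"]): r for r in rows_b}
    merged = []
    for row in rows_a:
        key = (row["node_id"], row["graph_id"])
        if key in index_b:
            # Combine the feature columns of both rows under the shared key.
            merged.append({**row, **index_b[key]})
    return merged

rows_a = [
    {"node_id": 0, "graph_id": 0, "page_rank_0": 0.25},
    {"node_id": 1, "graph_id": 0, "page_rank_0": 0.75},
]
# Same nodes in a different order: matching is by key, not by position.
rows_b = [
    {"node_id": 1, "graph_id": 0, "degree_squared_0": 4},
    {"node_id": 0, "graph_id": 0, "degree_squared_0": 1},
]
merged = merge_features(rows_a, rows_b)
```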

Next: turn these node features into one vector per graph with Embeddings.