ML models
Once you have embeddings, train_ml_model trains a supervised model
that maps each graph’s vector to its graph_label, and evaluates it over several
train/test splits.
import numpy as np

results = nxt.train_ml_model(
    graph_collection=graphs,
    embeddings=embeddings,
    model_type="classifier",  # or "regressor"
    balance_dataset=False,
    sample_size=5,  # number of train/test iterations
)
print("mean accuracy:", np.mean(results["accuracy"]))

Parameters
| Parameter | Default | Meaning |
|---|---|---|
| model_type | — | "classifier" or "regressor" (required) |
| balance_dataset | False | Balance classes with SMOTE before training (classification) |
| sample_size | 5 | Number of train/test iterations to average over |
| n_jobs | -1 | Parallel workers (-1 = all CPUs) |
| parallel_backend | "process" | "process" or "thread" |
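To make sample_size concrete, here is a minimal, standard-library sketch of repeated train/test evaluation. It is not NEExT's implementation: the nearest-centroid classifier is a toy stand-in for the real model, and the names repeated_split_accuracy and test_fraction and the 2-D data are hypothetical. It only shows why each metric comes back as a list with one entry per iteration.

```python
import random
import statistics


def nearest_centroid_fit(X, y):
    """Toy stand-in model: one mean vector (centroid) per class."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {
        label: [statistics.fmean(col) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def nearest_centroid_predict(centroids, row):
    """Predict the class whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(row, centroids[label])),
    )


def repeated_split_accuracy(X, y, sample_size=5, test_fraction=0.3, seed=0):
    """Evaluate on `sample_size` random train/test splits; one accuracy per split."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    n_test = max(1, int(len(X) * test_fraction))
    accuracies = []
    for _ in range(sample_size):
        rng.shuffle(indices)  # a fresh random split on every iteration
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        model = nearest_centroid_fit(
            [X[i] for i in train_idx], [y[i] for i in train_idx]
        )
        hits = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(hits / n_test)
    return {"accuracy": accuracies}


# Hypothetical 2-D "embeddings" for two well-separated classes.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)] for _ in range(20)]
X += [[data_rng.gauss(5, 0.5), data_rng.gauss(5, 0.5)] for _ in range(20)]
y = [0] * 20 + [1] * 20

results = repeated_split_accuracy(X, y, sample_size=5)
print("per-split accuracy:", results["accuracy"])
print("mean:", statistics.fmean(results["accuracy"]))
print("std: ", statistics.pstdev(results["accuracy"]))
```

Reporting the spread alongside the mean shows how much the score depends on the particular split. A large spread suggests that a higher sample_size may be worthwhile.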
The result
train_ml_model returns a dictionary of evaluation metrics, each a list with one
entry per iteration. Classifiers report results["accuracy"]; regressors report
results["rmse"]. Averaging over the iterations gives a more stable estimate than any single split:
# Classification
print("accuracy:", np.mean(results["accuracy"]))
# Regression
# results = nxt.train_ml_model(graphs, embeddings, model_type="regressor")
# print("rmse:", np.mean(results["rmse"]))

Models used
NEExT trains XGBoost by default (XGBClassifier / XGBRegressor). If XGBoost isn’t
installed, it falls back to scikit-learn’s random forest. Class balancing uses SMOTE from
imbalanced-learn when balance_dataset=True and the package is available.
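Because both XGBoost and imbalanced-learn are optional, it can help to check which of them your environment provides before training. The sketch below is not NEExT's internal logic. It uses only the standard library to test whether the relevant top-level modules (xgboost, sklearn, imblearn) can be found, and the function names and returned labels are illustrative assumptions.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return find_spec(module_name) is not None


def pick_model_backend():
    """Prefer XGBoost; otherwise fall back to scikit-learn's random forest."""
    if is_installed("xgboost"):
        return "xgboost"
    if is_installed("sklearn"):
        return "random_forest"
    return None  # neither library is available


def can_balance_with_smote():
    """SMOTE is provided by imbalanced-learn, whose import name is `imblearn`."""
    return is_installed("imblearn")


backend = pick_model_backend()
print("model backend:", backend)
print("SMOTE available:", can_balance_with_smote())
```

If the second line prints False, installing imbalanced-learn is presumably needed for balance_dataset=True to have any effect.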
For finer control (e.g. test_size, or choosing random_forest explicitly), use the underlying MLModels class from NEExT.ml_models directly; see the API reference.
To understand why a model works, rank the structural features that feed it with Feature importance.