ML models

Once you have embeddings, train_ml_model trains a supervised model that maps each graph’s embedding vector to its graph_label and evaluates it over several train/test splits.

train.py
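import numpy as np

# `nxt`, `graphs`, and `embeddings` come from the earlier steps
# (loading the graph collection and computing its embeddings).
results = nxt.train_ml_model(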
  graph_collection=graphs,
  embeddings=embeddings,
  model_type="classifier",   # or "regressor"
  balance_dataset=False,
  sample_size=5,             # number of train/test iterations
)

print("mean accuracy:", np.mean(results["accuracy"]))

Parameters

| Parameter | Default | Meaning |
| --- | --- | --- |
| model_type | | "classifier" or "regressor" (required) |
| balance_dataset | False | Balance classes with SMOTE before training (classification) |
| sample_size | 5 | Number of train/test iterations to average over |
| n_jobs | -1 | Parallel workers (-1 = all CPUs) |
| parallel_backend | "process" | "process" or "thread" |
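To make the role of sample_size concrete, here is a minimal standard-library sketch of evaluating a model over several shuffled train/test splits. It is not NEExT's implementation: the nearest-centroid classifier, the repeated_split_accuracy helper, the test_size and seed arguments, and the synthetic data are all assumptions chosen to keep the example self-contained.

```python
import random


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(centroids[label], x)),
    )


def repeated_split_accuracy(X, y, sample_size=5, test_size=0.2, seed=0):
    """Hypothetical helper: one accuracy score per shuffled train/test split."""
    rng = random.Random(seed)
    n_test = max(1, round(len(y) * test_size))
    accuracies = []
    for _ in range(sample_size):
        order = list(range(len(y)))
        rng.shuffle(order)  # a fresh split on every iteration
        test_idx, train_idx = order[:n_test], order[n_test:]
        # "Training" here is just one centroid per label seen in the train part.
        centroids = {
            label: centroid([X[i] for i in train_idx if y[i] == label])
            for label in {y[i] for i in train_idx}
        }
        hits = sum(predict(centroids, X[i]) == y[i] for i in test_idx)
        accuracies.append(hits / n_test)
    return {"accuracy": accuracies}


# Synthetic stand-ins for graph embeddings and graph labels (illustrative only).
data_rng = random.Random(1)
y = [0] * 35 + [1] * 15
X = [[data_rng.gauss(3.0 * label, 1.0) for _ in range(4)] for label in y]

results = repeated_split_accuracy(X, y, sample_size=5)
print(len(results["accuracy"]), "scores:", results["accuracy"])
```

A larger sample_size yields more scores to average, at the cost of proportionally more training runs.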

The result

train_ml_model returns a dictionary of evaluation metrics, each a list with one entry per iteration. Classifiers report results["accuracy"]; regressors report results["rmse"]. Averaging over the iterations gives a more stable estimate than any single split:

metrics.py
# Classification
print("accuracy:", np.mean(results["accuracy"]))

# Regression
# results = nxt.train_ml_model(graphs, embeddings, model_type="regressor")
# print("rmse:", np.mean(results["rmse"]))
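Since each metric is a list with one value per iteration, it can help to report the spread alongside the mean. The sketch below assumes only that list-per-metric shape; the summarize helper is hypothetical and the numbers are placeholders, not real NEExT output. It uses the standard library's statistics module; np.mean and np.std(..., ddof=1) would give the same figures.

```python
from statistics import mean, stdev


def summarize(results, metrics):
    """Hypothetical helper: {metric: (mean, sample std)} for the named metrics."""
    summary = {}
    for metric in metrics:
        values = results[metric]
        # stdev needs at least two values; report 0.0 for a single iteration.
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[metric] = (mean(values), spread)
    return summary


# Placeholder scores for illustration only.
example_results = {"accuracy": [0.80, 0.85, 0.75, 0.90, 0.80]}

summary = summarize(example_results, metrics=["accuracy"])
for metric, (avg, spread) in summary.items():
    print(f"{metric}: {avg:.3f} +/- {spread:.3f}")
```

A wide spread relative to the mean suggests the score depends heavily on which graphs land in the test split, which is a hint to raise sample_size.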

Models used

NEExT trains XGBoost by default (XGBClassifier / XGBRegressor). If XGBoost isn’t installed, it falls back to scikit-learn’s random forest. Class balancing uses SMOTE from imbalanced-learn when balance_dataset=True and the package is available.
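The fallback described above can be pictured as a check for which optional packages are importable. This is a sketch of the general pattern, not NEExT's actual code: pick_backend and can_balance are hypothetical helpers, and they only report what is installed in the current environment without importing or training anything.

```python
from importlib.util import find_spec


def pick_backend(model_type="classifier"):
    """Hypothetical helper: name the estimator class, preferring XGBoost."""
    if model_type not in ("classifier", "regressor"):
        raise ValueError('model_type must be "classifier" or "regressor"')
    suffix = "Classifier" if model_type == "classifier" else "Regressor"
    # find_spec returns None when a top-level package is not installed.
    if find_spec("xgboost") is not None:
        return f"xgboost.XGB{suffix}"
    return f"sklearn.ensemble.RandomForest{suffix}"


def can_balance():
    """True if imbalanced-learn (import name: imblearn) is importable."""
    return find_spec("imblearn") is not None


print("classifier backend:", pick_backend("classifier"))
print("regressor backend: ", pick_backend("regressor"))
print("SMOTE available:   ", can_balance())
```

Running a check like this before training shows whether results will come from XGBoost or the random forest fallback, which matters when comparing scores across machines.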

For finer control (e.g. setting test_size or choosing random_forest explicitly), use the underlying MLModels class from NEExT.ml_models directly; see the API reference.

To understand why a model works, rank the structural features that feed it with Feature importance.