
Modeling Nodes

This page documents modeling configuration for SkyulfPipeline.

Common config shape

SkyulfPipeline expects a modeling block like:

{
  "type": "logistic_regression",
  "node_id": "model_node",       # optional
  "execution_mode": "merge",     # optional; "merge" (default) or "parallel"
  "params": { ... }              # optional; estimator hyperparameters
}

execution_mode (v0.4.0+)

When a training node has 2+ incoming connections:

| Value | Behavior |
| --- | --- |
| "merge" (default) | Combine all upstream DataFrames into one before training |
| "parallel" | Each incoming branch runs as a separate job |

This field is set via the Merge/Parallel toggle on training nodes in the canvas UI.
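
As a sketch, a training node that should run once per incoming branch could be configured like this (the node_id and params values are illustrative):

{
  "type": "random_forest_classifier",
  "node_id": "rf_per_branch",
  "execution_mode": "parallel",
  "params": {"n_estimators": 50}
}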

The sklearn wrapper supports both:

  • Nested params (preferred): { "params": {"C": 1.0} }
  • Flat params (legacy): { "C": 1.0, "type": "..." }

Example (RandomForestClassifier):

{
  "type": "random_forest_classifier",
  "params": {"n_estimators": 50, "random_state": 42}
}

Classification

logistic_regression

Backed by sklearn.linear_model.LogisticRegression.

Defaults:

  • max_iter=1000
  • solver=lbfgs
  • random_state=42

Learned params:

  • fitted sklearn estimator (stored in memory and pickled when the pipeline is saved)

random_forest_classifier

Backed by sklearn.ensemble.RandomForestClassifier.

Defaults include:

  • n_estimators=50, max_depth=10
  • min_samples_split=5, min_samples_leaf=2
  • n_jobs=-1, random_state=42

Learned params:

  • fitted sklearn estimator

svc

Backed by sklearn.svm.SVC.

Defaults:

  • C=1.0, kernel=rbf, gamma=scale
  • probability=True, random_state=42

k_neighbors_classifier

Backed by sklearn.neighbors.KNeighborsClassifier.

Defaults:

  • n_neighbors=5, weights=uniform
  • algorithm=auto, n_jobs=-1

decision_tree_classifier

Backed by sklearn.tree.DecisionTreeClassifier.

Defaults:

  • max_depth=None, min_samples_split=2
  • criterion=gini, random_state=42

gradient_boosting_classifier

Backed by sklearn.ensemble.GradientBoostingClassifier.

Defaults:

  • n_estimators=100, learning_rate=0.1
  • max_depth=3, random_state=42

adaboost_classifier

Backed by sklearn.ensemble.AdaBoostClassifier.

Defaults:

  • n_estimators=50, learning_rate=1.0
  • random_state=42

xgboost_classifier

Backed by xgboost.XGBClassifier.

Defaults:

  • n_estimators=100, max_depth=6
  • learning_rate=0.3, n_jobs=-1
  • random_state=42

gaussian_nb

Backed by sklearn.naive_bayes.GaussianNB.

Defaults:

  • var_smoothing=1e-9

Regression

ridge_regression

Backed by sklearn.linear_model.Ridge.

Defaults:

  • alpha=1.0, solver=auto, random_state=42
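
Example (Ridge), overriding the default alpha (the 0.5 value is just illustrative):

{
  "type": "ridge_regression",
  "params": {"alpha": 0.5}
}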

lasso_regression

Backed by sklearn.linear_model.Lasso.

Defaults:

  • alpha=1.0, selection=cyclic
  • random_state=42

elasticnet_regression

Backed by sklearn.linear_model.ElasticNet.

Defaults:

  • alpha=1.0, l1_ratio=0.5
  • selection=cyclic, random_state=42

random_forest_regressor

Backed by sklearn.ensemble.RandomForestRegressor.

Defaults include:

  • n_estimators=50, max_depth=10
  • min_samples_split=5, min_samples_leaf=2
  • n_jobs=-1, random_state=42

svr

Backed by sklearn.svm.SVR.

Defaults:

  • C=1.0, kernel=rbf, gamma=scale

k_neighbors_regressor

Backed by sklearn.neighbors.KNeighborsRegressor.

Defaults:

  • n_neighbors=5, weights=uniform
  • algorithm=auto, n_jobs=-1

decision_tree_regressor

Backed by sklearn.tree.DecisionTreeRegressor.

Defaults:

  • max_depth=None, min_samples_split=2
  • criterion=squared_error, random_state=42

gradient_boosting_regressor

Backed by sklearn.ensemble.GradientBoostingRegressor.

Defaults:

  • n_estimators=100, learning_rate=0.1
  • max_depth=3, random_state=42

adaboost_regressor

Backed by sklearn.ensemble.AdaBoostRegressor.

Defaults:

  • n_estimators=50, learning_rate=1.0
  • random_state=42

xgboost_regressor

Backed by xgboost.XGBRegressor.

Defaults:

  • n_estimators=100, max_depth=6
  • learning_rate=0.3, n_jobs=-1
  • random_state=42

Hyperparameter tuning

hyperparameter_tuner

This node type wraps a base model and performs a hyperparameter search over it.

Config:

  • type: hyperparameter_tuner
  • base_model: dict with a supported base model type (e.g., logistic_regression)
  • tuning options such as:
      • strategy: grid | random | halving_grid | halving_random | optuna (availability depends on installed packages)
      • search_space: dict of parameter → list/range
      • metric: e.g., accuracy, f1, roc_auc, rmse, r2
      • cv_enabled, cv_type, cv_folds, random_state

Learned params:

  • a tuple (best_model, tuning_result) where best_model is a fitted estimator.
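
Putting the keys above together, a tuner config might look like the following sketch (the exact placement of the tuning options alongside base_model, and the specific values, are assumptions; adjust to your setup):

{
  "type": "hyperparameter_tuner",
  "base_model": {"type": "logistic_regression"},
  "strategy": "grid",
  "search_space": {"C": [0.1, 1.0, 10.0]},
  "metric": "accuracy",
  "cv_enabled": true,
  "cv_type": "stratified_k_fold",
  "cv_folds": 5,
  "random_state": 42
}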

Cross-validation

StatefulEstimator.cross_validate() can run CV on the train split, returning aggregated fold metrics.

Five CV methods are supported:

| Key | Strategy | Notes |
| --- | --- | --- |
| k_fold | K-Fold | Default. Shuffled. |
| stratified_k_fold | Stratified K-Fold | Preserves class distribution (classification). Falls back to K-Fold for regression. |
| shuffle_split | Shuffle Split | Random 80/20 splits; samples may repeat across folds. |
| time_series_split | Time Series Split | Expanding window. Auto-sorts by datetime column if cv_time_column is set. |
| nested_cv | Nested CV | Outer loop evaluates generalization; inner 3-fold loop checks hyperparameter stability. With advanced tuning, post-tuning evaluation auto-downgrades to stratified_k_fold/k_fold since the inner loop already ran during the search. |

Config keys: cv_enabled, cv_type, cv_folds, cv_time_column.
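
For example, enabling time-based CV might look like this (a sketch; event_date is an illustrative column name, and these keys are assumed to sit alongside the other model config keys):

{
  "cv_enabled": true,
  "cv_type": "time_series_split",
  "cv_folds": 5,
  "cv_time_column": "event_date"
}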

See the Cross-Validation Guide for details.

Note: SkyulfPipeline performs modeling through the same building blocks (a calculator + applier); StatefulEstimator is the lightweight wrapper exposed for low-level usage.