Hyperparameter Tuning

Skyulf wraps scikit-learn's search utilities and Optuna into a single hyperparameter_tuner model type. You configure tuning entirely through the pipeline config — no code changes required.

Supported strategies

Strategy	Key	What it does
Grid Search	`grid`	Exhaustive search over every combination in the search space
Random Search	`random`	Samples `n_trials` random combinations
Optuna (Bayesian)	`optuna`	Uses TPE (Tree-structured Parzen Estimators) to intelligently explore the space
Halving Grid	`halving_grid`	Successive halving — trains on small subsets first, promotes the best
Halving Random	`halving_random`	Random sampling + successive halving

Quick example

config = {
    "preprocessing": [
        {"name": "split", "transformer": "TrainTestSplitter",
         "params": {"test_size": 0.25, "random_state": 42,
                    "stratify": True, "target_column": "target"}},
        {"name": "impute", "transformer": "SimpleImputer",
         "params": {"columns": ["age"], "strategy": "mean"}},
    ],
    "modeling": {
        "type": "hyperparameter_tuner",
        "base_model": {"type": "random_forest_classifier"},
        "strategy": "optuna",
        "search_space": {
            "n_estimators": [50, 100, 200],
            "max_depth": [5, 10, 20, "none"],
        },
        "n_trials": 25,
        "metric": "accuracy",
        "strategy_params": {
            "sampler": "tpe",
            "pruning": True,
        },
    },
}

Note: "none" (string) in the search space is automatically converted to Python None.

Configuration reference

These keys go inside the "modeling" block when "type" is "hyperparameter_tuner":

Key	Type	Default	Description
`base_model`	`dict`	required	The model to tune, e.g. `{"type": "logistic_regression"}`
`strategy`	`str`	`"random"`	One of `grid`, `random`, `optuna`, `halving_grid`, `halving_random`
`search_space`	`dict`	`{}`	Parameter name to list of candidate values
`n_trials`	`int`	`10`	Number of trials (ignored for `grid` which tests all combos)
`metric`	`str`	`"accuracy"`	Scoring metric (`accuracy`, `f1`, `roc_auc`, `mse`, `r2`, etc.)
`timeout`	`int\\|null`	`null`	Max seconds for tuning (Optuna only)
`strategy_params`	`dict`	`{}`	Strategy-specific settings (see below)
`cv_enabled`	`bool`	`true`	Whether to use cross-validation
`cv_folds`	`int`	`5`	Number of CV folds
`cv_type`	`str`	`"k_fold"`	One of `k_fold`, `stratified_k_fold`, `time_series_split`, `shuffle_split`, `nested_cv`
`cv_time_column`	`str\\|null`	`null`	Column name to sort by when using `time_series_split`. Auto-detects datetime column if omitted

Strategy-specific params

Pass these inside "strategy_params":

Optuna

Key	Default	Description
`sampler`	`"tpe"`	Optuna sampler: `tpe`, `random`, or `cmaes`
`pruning`	`false`	Enable Optuna pruning (early stopping of bad trials)

Halving (grid / random)

Key	Default	Description
`factor`	`3`	Successive halving factor (how aggressively to discard candidates)
`min_resources`	`"smallest"`	Minimum resources for the first iteration

Cross-validation types

Key	When to use
`k_fold`	General purpose, default
`stratified_k_fold`	Classification with imbalanced classes
`time_series_split`	Time-ordered data (no future leakage)
`shuffle_split`	When you want random train/test splits per fold
`nested_cv`	Unbiased evaluation — outer loop for generalization, inner loop for hyperparameter stability

Nested CV runs a dual-loop: an outer K-Fold evaluates the model on held-out data, while an inner 3-fold CV (capped at n_folds - 1) trains within each outer training set. This prevents optimistic bias when tuning and evaluating on the same splits.

With Advanced Tuning: When nested_cv is selected, the tuning search uses the inner CV folds to score candidates. After finding the best parameters, the post-tuning evaluation automatically uses stratified_k_fold (classification) or k_fold (regression) instead of re-running the full nested loop — because the inner loop already ran during the search.

Time column for Time Series Split

When using time_series_split, Skyulf auto-sorts your data chronologically:

If cv_time_column is set, data is sorted by that column (and the column is dropped from features to prevent leakage).
If omitted, the first datetime64 column is auto-detected and used.
If no datetime column exists, a warning is logged and row order is assumed correct.

In the ML Canvas UI, selecting Time Series Split reveals a date column picker.

Install requirements

Optuna strategies require the tuning extra:

pip install skyulf-core[tuning]

Grid, random, and halving strategies work out of the box (scikit-learn only).

What happens under the hood

SkyulfPipeline detects "type": "hyperparameter_tuner" and creates a TuningCalculator wrapping the base model.
During fit(), the TuningCalculator.tune() method runs the chosen search strategy.
The best parameters are used to refit the model on the full training set.
The fitted model is stored in the pipeline and used for predict() / save().

Tips

Start with "strategy": "random" and "n_trials": 20 for a quick baseline.
Switch to "optuna" when you want smarter exploration (Bayesian optimization).
Use "halving_random" for large search spaces where grid search is infeasible.
Always run a TrainTestSplitter before tuning to avoid data leakage.