# Hyperparameter Tuning

Skyulf wraps scikit-learn's search utilities and Optuna into a single `hyperparameter_tuner` model type. You configure tuning entirely through the pipeline config; no code changes are required.
## Supported strategies

| Strategy | Key | What it does |
|---|---|---|
| Grid Search | `grid` | Exhaustive search over every combination in the search space |
| Random Search | `random` | Samples `n_trials` random combinations |
| Optuna (Bayesian) | `optuna` | Uses TPE (Tree-structured Parzen Estimators) to explore the space intelligently |
| Halving Grid | `halving_grid` | Successive halving: trains on small subsets first, promotes the best candidates |
| Halving Random | `halving_random` | Random sampling plus successive halving |
## Quick example

```python
config = {
    "preprocessing": [
        {"name": "split", "transformer": "TrainTestSplitter",
         "params": {"test_size": 0.25, "random_state": 42,
                    "stratify": True, "target_column": "target"}},
        {"name": "impute", "transformer": "SimpleImputer",
         "params": {"columns": ["age"], "strategy": "mean"}},
    ],
    "modeling": {
        "type": "hyperparameter_tuner",
        "base_model": {"type": "random_forest_classifier"},
        "strategy": "optuna",
        "search_space": {
            "n_estimators": [50, 100, 200],
            "max_depth": [5, 10, 20, "none"],
        },
        "n_trials": 25,
        "metric": "accuracy",
        "strategy_params": {
            "sampler": "tpe",
            "pruning": True,
        },
    },
}
```
Note: the string `"none"` in the search space is automatically converted to Python `None`.
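For intuition, the conversion described above can be sketched in a few lines. The helper name `normalize_search_space` is hypothetical and not part of Skyulf's API:

```python
def normalize_search_space(space: dict) -> dict:
    """Replace the sentinel string "none" with Python None in every candidate
    list, since JSON-style configs cannot express None directly."""
    return {
        param: [None if value == "none" else value for value in candidates]
        for param, candidates in space.items()
    }

space = normalize_search_space({"max_depth": [5, 10, 20, "none"]})
# "max_depth" now ends with an actual None candidate
```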
## Configuration reference

These keys go inside the `"modeling"` block when `"type"` is `"hyperparameter_tuner"`:
| Key | Type | Default | Description |
|---|---|---|---|
| `base_model` | dict | required | The model to tune, e.g. `{"type": "logistic_regression"}` |
| `strategy` | str | `"random"` | One of `grid`, `random`, `optuna`, `halving_grid`, `halving_random` |
| `search_space` | dict | `{}` | Parameter name to list of candidate values |
| `n_trials` | int | `10` | Number of trials (ignored for `grid`, which tests all combinations) |
| `metric` | str | `"accuracy"` | Scoring metric (`accuracy`, `f1`, `roc_auc`, `mse`, `r2`, etc.) |
| `timeout` | int \| null | `null` | Max seconds for tuning (Optuna only) |
| `strategy_params` | dict | `{}` | Strategy-specific settings (see below) |
| `cv_enabled` | bool | `true` | Whether to use cross-validation |
| `cv_folds` | int | `5` | Number of CV folds |
| `cv_type` | str | `"k_fold"` | One of `k_fold`, `stratified_k_fold`, `time_series_split`, `shuffle_split`, `nested_cv` |
| `cv_time_column` | str \| null | `null` | Column name to sort by when using `time_series_split`; auto-detects a datetime column if omitted |
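As a worked example of the reference above, here is a modeling block that exercises the cross-validation keys. The parameter values are illustrative, not recommendations:

```python
# Random search with stratified 5-fold CV; every key follows the table above.
modeling = {
    "type": "hyperparameter_tuner",
    "base_model": {"type": "logistic_regression"},
    "strategy": "random",
    "search_space": {"C": [0.01, 0.1, 1.0, 10.0]},  # assumed tunable parameter
    "n_trials": 15,
    "metric": "f1",
    "cv_enabled": True,
    "cv_folds": 5,
    "cv_type": "stratified_k_fold",
}
```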
## Strategy-specific params

Pass these inside `"strategy_params"`:

### Optuna

| Key | Default | Description |
|---|---|---|
| `sampler` | `"tpe"` | Optuna sampler: `tpe`, `random`, or `cmaes` |
| `pruning` | `false` | Enable Optuna pruning (early stopping of bad trials) |
### Halving (grid / random)

| Key | Default | Description |
|---|---|---|
| `factor` | `3` | Successive halving factor (how aggressively to discard candidates) |
| `min_resources` | `"smallest"` | Minimum resources for the first iteration |
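For example, a halving random search over a larger space might look like this (model and values are illustrative):

```python
# factor=3 keeps roughly the best third of candidates per halving round;
# min_resources="smallest" starts on the smallest viable training subset.
modeling = {
    "type": "hyperparameter_tuner",
    "base_model": {"type": "random_forest_classifier"},
    "strategy": "halving_random",
    "search_space": {
        "n_estimators": [50, 100, 200, 400],
        "max_depth": [5, 10, 20, "none"],
        "min_samples_split": [2, 5, 10],
    },
    "n_trials": 40,
    "strategy_params": {"factor": 3, "min_resources": "smallest"},
}
```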
## Cross-validation types

| Key | When to use |
|---|---|
| `k_fold` | General purpose, default |
| `stratified_k_fold` | Classification with imbalanced classes |
| `time_series_split` | Time-ordered data (no future leakage) |
| `shuffle_split` | When you want random train/test splits per fold |
| `nested_cv` | Unbiased evaluation: outer loop for generalization, inner loop for hyperparameter stability |
Nested CV runs a dual loop: an outer K-Fold evaluates the model on held-out data, while an inner 3-fold CV (capped at `n_folds - 1`) trains within each outer training set. This prevents optimistic bias when tuning and evaluating on the same splits.

With advanced tuning: when `nested_cv` is selected, the tuning search uses the inner CV folds to score candidates. After finding the best parameters, the post-tuning evaluation automatically uses `stratified_k_fold` (classification) or `k_fold` (regression) instead of re-running the full nested loop, because the inner loop already ran during the search.
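Configuring nested CV is just a matter of setting `cv_type`; a sketch with illustrative values:

```python
modeling = {
    "type": "hyperparameter_tuner",
    "base_model": {"type": "random_forest_classifier"},
    "strategy": "random",
    "search_space": {"n_estimators": [100, 200], "max_depth": [5, 10]},
    "n_trials": 20,
    "cv_type": "nested_cv",
    "cv_folds": 5,  # outer folds; the inner loop uses 3 folds, capped at cv_folds - 1
}
```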
## Time column for Time Series Split

When using `time_series_split`, Skyulf auto-sorts your data chronologically:

- If `cv_time_column` is set, data is sorted by that column (and the column is dropped from features to prevent leakage).
- If omitted, the first `datetime64` column is auto-detected and used.
- If no datetime column exists, a warning is logged and row order is assumed correct.

In the ML Canvas UI, selecting Time Series Split reveals a date column picker.
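In config form, a time-series setup only needs `cv_type` and, optionally, the sort column (the column name `order_date` here is hypothetical):

```python
modeling = {
    "type": "hyperparameter_tuner",
    "base_model": {"type": "random_forest_classifier"},
    "strategy": "random",
    "search_space": {"n_estimators": [100, 200]},
    "cv_type": "time_series_split",
    "cv_folds": 4,
    "cv_time_column": "order_date",  # data is sorted by this column, then it is dropped from features
}
```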
## Install requirements

Optuna strategies require the `tuning` extra:

```bash
pip install skyulf-core[tuning]
```

Grid, random, and halving strategies work out of the box (scikit-learn only).
What happens under the hood
SkyulfPipelinedetects"type": "hyperparameter_tuner"and creates aTuningCalculatorwrapping the base model.- During
fit(), theTuningCalculator.tune()method runs the chosen search strategy. - The best parameters are used to refit the model on the full training set.
- The fitted model is stored in the pipeline and used for
predict()/save().
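`TuningCalculator`'s internals aren't shown here, but the tune-then-refit flow described above can be sketched as a toy search in plain Python (no Skyulf or scikit-learn; the function and score are illustrative only):

```python
import itertools
import random

def tune_and_refit(search_space, score_fn, strategy="random", n_trials=10, seed=42):
    """Toy version of the tune-then-refit flow: enumerate candidate parameter
    sets, score each one, and return the best set for a final refit on the
    full training set. Not Skyulf's actual TuningCalculator."""
    keys = list(search_space)
    combos = [dict(zip(keys, vals))
              for vals in itertools.product(*search_space.values())]
    if strategy == "random" and len(combos) > n_trials:
        # Random search: score only a sample of the combinations.
        combos = random.Random(seed).sample(combos, n_trials)
    # Grid search keeps every combination.
    return max(combos, key=score_fn)

# Pretend score: prefers more trees (up to 200) and shallower depth.
best = tune_and_refit(
    {"n_estimators": [50, 100, 200], "max_depth": [5, 10]},
    score_fn=lambda p: min(p["n_estimators"], 200) - p["max_depth"],
    strategy="grid",
)
# best == {"n_estimators": 200, "max_depth": 5}
```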
## Tips

- Start with `"strategy": "random"` and `"n_trials": 20` for a quick baseline.
- Switch to `"optuna"` when you want smarter exploration (Bayesian optimization).
- Use `"halving_random"` for large search spaces where grid search is infeasible.
- Always run a `TrainTestSplitter` before tuning to avoid data leakage.
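The first tip as a concrete starting point (model and search space illustrative):

```python
# Quick random-search baseline: 20 trials with the default 5-fold CV.
baseline_modeling = {
    "type": "hyperparameter_tuner",
    "base_model": {"type": "random_forest_classifier"},
    "strategy": "random",
    "n_trials": 20,
    "search_space": {"n_estimators": [50, 100, 200], "max_depth": [5, 10, 20]},
    "metric": "accuracy",
}
```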