Multi-Path Pipelines
Skyulf supports building pipelines with multiple branches that merge into a single training node or fan out into separate experiments. This guide covers both patterns.
Merge: Combining Multiple Branches
When a training node has 2+ incoming edges, Skyulf automatically merges the upstream DataFrames before training.
How It Works
Dataset → Scaling  ──┐
                     ├──→ Training Node (⊕ Merge)
Dataset → Encoding ──┘
The training node collects all upstream branch outputs via _resolve_all_inputs() and combines them using _merge_inputs().
Merge Strategy (Auto-Detected)
| Condition | Strategy | Example |
|---|---|---|
| Same row count, different columns | Column-wise concat | Parallel preprocessing branches |
| Same columns, different rows | Row-wise concat | Data augmentation |
| No common columns, different shapes | Error | Incompatible inputs |
- Duplicate columns are automatically deduplicated after merging.
- Inputs are merged in deterministic topological order based on the pipeline graph.
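The auto-detection rules in the table can be sketched in a few lines of Python. This is only an illustration of the decision logic, not Skyulf's `_merge_inputs()` implementation: the `Table` alias (a dict of column lists standing in for a DataFrame), the `row_count` and `merge_tables` names, and the order in which the conditions are checked are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical stand-in for a DataFrame: column name -> list of values.
Table = Dict[str, list]


def row_count(table: Table) -> int:
    """Number of rows, read from the first column (0 if there are no columns)."""
    return len(next(iter(table.values()), []))


def merge_tables(tables: List[Table]) -> Table:
    """Merge upstream tables, picking the strategy from their shapes."""
    if not tables:
        raise ValueError("No upstream inputs to merge")
    for table in tables:
        if row_count(table) == 0:
            raise ValueError("Empty DataFrame from upstream branch")

    first = tables[0]
    same_columns = all(set(t) == set(first) for t in tables)
    same_rows = all(row_count(t) == row_count(first) for t in tables)

    # Assumption: identical column sets are checked before equal row counts.
    if same_columns:
        # Row-wise concat: stack the rows of every input, column by column.
        return {col: [v for t in tables for v in t[col]] for col in first}
    if same_rows:
        # Column-wise concat: a column already seen is kept once (deduplicated).
        merged: Table = {}
        for table in tables:
            for col, values in table.items():
                merged.setdefault(col, list(values))
        return merged
    raise ValueError("Incompatible inputs: row counts and columns both differ")


scaled = {"age": [0.1, 0.9], "income": [0.5, 0.2]}
encoded = {"age": [0.1, 0.9], "city_a": [1, 0]}
merged = merge_tables([scaled, encoded])  # column-wise; "age" appears once
```

Passing the inputs as an ordered list mirrors the deterministic ordering described above: whichever branch comes first decides which copy of a duplicated column is kept.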
Merge Badge
Nodes with 2+ incoming edges display a blue ⊕ Merge badge in the header showing the input count. Hover over it for a tooltip: "Merge: combining data from N upstream sources".
Connection Validation
Model-to-model connections (e.g., training → training) are blocked with an alert. Training nodes accept inputs from preprocessing nodes only.
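As a rough illustration of the rule, the check amounts to rejecting any edge whose source and target are both model nodes. The node type names and the `is_connection_allowed` helper below are hypothetical; the real check may live in the canvas code and use different identifiers.

```python
# Hypothetical node type names; the actual identifiers are not given in this guide.
MODEL_NODE_TYPES = {"training", "tuning"}


def is_connection_allowed(source_type: str, target_type: str) -> bool:
    """Return False for model-to-model edges such as training -> training."""
    return not (source_type in MODEL_NODE_TYPES and target_type in MODEL_NODE_TYPES)


allowed = is_connection_allowed("scaling", "training")    # preprocessing -> model
blocked = is_connection_allowed("training", "training")   # model -> model
```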
Common Errors
| Error | Cause | Fix |
|---|---|---|
| "Empty DataFrame from upstream branch" | A preprocessing branch produced no rows | Check filters/cleaning nodes upstream |
| "No common columns" | Branches have incompatible schemas | Ensure branches produce compatible columns |
Parallel: Running Separate Experiments
When you have 2+ training nodes on the canvas connected to separate branches, each one runs as an independent experiment.
How It Works
Dataset → Scaling → Random Forest (Train)
   │
   └──→ Encoding → XGBoost (Train)
Each training node has its own Train button. Clicking it runs only that branch — the backend uses target_node_id filtering to isolate the sub-pipeline.
Run All Experiments
When 2+ training nodes are connected on separate branches, a "Run All Experiments" button (🚀 Rocket icon) appears in the toolbar. Clicking it queues all branches at once, returning a list of job_ids.
Merge/Parallel Toggle
Training nodes with 2+ incoming connections show a Merge / Parallel toggle:
- Merge (default): Combines upstream data before training.
- Parallel: Treats each incoming branch as a separate experiment and creates independent jobs.
The toggle is user-controlled — you decide based on your intent. The choice is stored as execution_mode on the node and passed to the backend during execution.
Keyboard Shortcuts
| Shortcut | Action |
|---|---|
| Ctrl+C (Cmd+C on Mac) | Copy selected nodes and their internal edges |
| Ctrl+V (Cmd+V on Mac) | Paste copied nodes with a position offset |
Copy and paste both support multi-select. Each paste increments the position offset so that pasted nodes don't stack on top of each other.
Pipeline Partitioning (Backend)
The backend function partition_parallel_pipeline() in graph_utils.py handles splitting:
- Multiple terminals: If the graph has 2+ training/tuning nodes, each gets its own sub-pipeline via BFS ancestor tracing (_collect_ancestors()).
- Single terminal with parallel mode: If one training node has execution_mode=parallel, each incoming branch becomes a separate sub-pipeline.
Shared prefix nodes (e.g., a dataset node used by both branches) are duplicated into each sub-pipeline so they can execute independently.
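The two partitioning cases can be sketched as follows. This is a simplified illustration, not the code in graph_utils.py: the edge-list representation, the `partition` signature, and the `execution_mode` mapping are assumptions, and each sub-pipeline is returned here as a plain set of node ids.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]  # (source node id, target node id)


def collect_ancestors(node_id: str, edges: List[Edge]) -> Set[str]:
    """Breadth-first walk over incoming edges; returns node_id plus its ancestors."""
    parents: Dict[str, List[str]] = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)
    seen = {node_id}
    queue = deque([node_id])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def partition(
    edges: List[Edge],
    terminals: List[str],
    execution_mode: Optional[Dict[str, str]] = None,
) -> List[Set[str]]:
    """Split a graph into one node set per experiment."""
    execution_mode = execution_mode or {}
    if not terminals:
        return []
    if len(terminals) >= 2:
        # Multiple terminals: one sub-pipeline per training/tuning node.
        return [collect_ancestors(t, edges) for t in terminals]
    terminal = terminals[0]
    branches = [src for src, dst in edges if dst == terminal]
    if execution_mode.get(terminal) == "parallel" and len(branches) >= 2:
        # Single terminal in parallel mode: one sub-pipeline per incoming branch.
        return [collect_ancestors(b, edges) | {terminal} for b in branches]
    # Merge mode (the default): the whole graph runs as one pipeline.
    return [collect_ancestors(terminal, edges)]


edges = [("data", "scale"), ("scale", "rf"), ("data", "encode"), ("encode", "xgb")]
subpipelines = partition(edges, ["rf", "xgb"])
```

In this example the shared `data` node ends up in both node sets, which corresponds to the duplication of shared prefix nodes described above.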