# FAQ & Comparison
Frequently asked questions and how Skyulf compares to other ML platforms.
## General

### What is Skyulf?
Skyulf is a self-hosted, privacy-first MLOps platform that combines:
- A Python library (`skyulf-core`) for reproducible ML pipelines.
- A FastAPI backend for data management, pipeline execution, and model serving.
- A React-based visual ML Canvas for building pipelines without writing code.
### Who is Skyulf for?
- Data Scientists who want a visual pipeline builder with proper leakage prevention.
- ML Engineers who need a self-hosted alternative to cloud ML platforms.
- Teams who need reproducible, auditable ML workflows without vendor lock-in.
- Students/Researchers who want to learn ML engineering best practices.
### Is Skyulf free?
Yes. Skyulf is open-source. See the LICENSE for details.
### Can I use skyulf-core without the web platform?
Absolutely. `skyulf-core` is a standalone PyPI package. Install it with `pip install skyulf-core` and use it like any Python library. The web platform is optional.
## How Skyulf differs from other tools

### Skyulf vs. MLflow
| Aspect | Skyulf | MLflow |
|---|---|---|
| Focus | End-to-end pipeline (preprocessing + training + deploy) | Experiment tracking and model registry |
| Pipeline building | Visual canvas + config-driven | Code-only (no visual builder) |
| Preprocessing | 30+ built-in nodes (imputation, encoding, scaling, outliers, feature selection, resampling) | None — you bring your own preprocessing |
| Leakage prevention | Calculator/Applier pattern enforces train-only statistics | Not addressed |
| Deployment | Built into the platform | Separate deployment step |
| Self-hosted | Yes | Yes |
Summary: MLflow is great for experiment tracking. Skyulf covers the full pipeline from raw data to deployed model, including preprocessing.
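The Calculator/Applier split in the leakage-prevention row can be illustrated with a minimal, self-contained sketch. The class names here are hypothetical, not the actual skyulf-core API: the point is that the Calculator learns statistics from the training split only, and the Applier reuses those frozen statistics on any split.

```python
from statistics import mean, stdev

class StandardScalerCalculator:
    """Learns column statistics from the TRAINING split only."""
    def fit(self, train_column):
        # Statistics come exclusively from training data,
        # so test rows can never leak into them.
        return {"mean": mean(train_column), "std": stdev(train_column)}

class StandardScalerApplier:
    """Applies previously learned statistics to any split."""
    def transform(self, column, params):
        return [(x - params["mean"]) / params["std"] for x in column]

train = [1.0, 2.0, 3.0, 4.0]
test = [10.0, 20.0]

params = StandardScalerCalculator().fit(train)                 # learn on train only
scaled_test = StandardScalerApplier().transform(test, params)  # apply frozen stats to test
```

Because the learned `params` dict is plain data, it can be stored alongside the pipeline config and replayed deterministically at inference time.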
### Skyulf vs. Kubeflow / ZenML
| Aspect | Skyulf | Kubeflow / ZenML |
|---|---|---|
| Infrastructure | Single machine, Docker optional | Kubernetes required (Kubeflow) or multi-runtime (ZenML) |
| Setup complexity | `pip install` or `docker-compose up` | Significant infrastructure setup |
| Visual builder | Drag-and-drop React Flow canvas | DAG visualizations (read-only) |
| Target audience | Small-to-medium teams, individuals | Enterprise orchestration at scale |
| Preprocessing | Built-in node library | BYO preprocessing code |
Summary: Kubeflow/ZenML excel at large-scale orchestration. Skyulf is simpler to set up and includes preprocessing out of the box.
### Skyulf vs. scikit-learn Pipelines
| Aspect | Skyulf | scikit-learn Pipeline |
|---|---|---|
| Config format | JSON-compatible dicts (serializable, storable) | Python objects (code-defined) |
| State management | Explicit params dict (inspectable, portable) | Hidden in `self` attributes |
| Leakage safety | Enforced by architecture (Calculator learns on train only) | Manual responsibility (fit on train, transform on test) |
| Visual builder | Yes (ML Canvas) | No |
| Model variety | 20 models + hyperparameter tuning | Full scikit-learn ecosystem |
| EDA/Profiling | Built-in analyzer + visualizer | None |
Summary: scikit-learn is the gold standard for ML in Python. Skyulf wraps scikit-learn models and adds config-driven pipelines, leakage prevention, and a visual interface.
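To make the "JSON-compatible dicts" row concrete, here is a toy interpreter for a serializable pipeline config. The step names and schema are invented for illustration, not the real skyulf-core config format:

```python
import json

# Toy step implementations keyed by name; real preprocessing nodes are richer.
STEPS = {
    "fillna": lambda rows, p: [r if r is not None else p["value"] for r in rows],
    "scale":  lambda rows, p: [r * p["factor"] for r in rows],
}

def run_pipeline(rows, config):
    """Execute each configured step in order."""
    for step in config["steps"]:
        rows = STEPS[step["name"]](rows, step.get("params", {}))
    return rows

# Because the config is plain dicts/lists, it round-trips through JSON
# and can be stored, diffed, or sent over an API.
config = {"steps": [
    {"name": "fillna", "params": {"value": 0.0}},
    {"name": "scale",  "params": {"factor": 2.0}},
]}
assert json.loads(json.dumps(config)) == config
print(run_pipeline([1.0, None, 3.0], config))  # -> [2.0, 0.0, 6.0]
```

A scikit-learn `Pipeline`, by contrast, is a tree of Python objects: reproducing it requires the defining code (or pickling), whereas a dict config is storable and auditable as data.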
### Skyulf vs. AutoML (Auto-sklearn, FLAML, H2O)
| Aspect | Skyulf | AutoML tools |
|---|---|---|
| Approach | Manual or semi-automated pipeline building | Fully automated model selection |
| Control | Full control over every preprocessing and modeling step | Black-box optimization |
| Tuning | Configurable (grid, random, Optuna, halving) | Automatic (built-in) |
| Transparency | Every step inspectable, every parameter visible | Results-focused, less transparent |
| Use case | When you need to understand and control your pipeline | When you want fastest time-to-result |
Summary: AutoML tools optimize for speed. Skyulf optimizes for transparency and control.
## Technical FAQ

### What Python version is required?
Python 3.9 or higher. We recommend 3.10, 3.11, or 3.12.
### Does Skyulf support GPU training?
Not directly. Models are built on scikit-learn (CPU-only) and XGBoost (which can use a GPU if configured). There is no built-in PyTorch/TensorFlow integration.
### Can I add my own preprocessing nodes?
Yes. Implement a Calculator and an Applier, then decorate them with `@node_meta` and `@NodeRegistry.register`. See Extending Skyulf-Core.
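A registry-plus-decorator scheme like the one described can be sketched as follows. The registry internals and the node's `fit` signature are assumptions for illustration; see Extending Skyulf-Core for the real API:

```python
class NodeRegistry:
    """Maps a string key to a node class so configs can reference it by name."""
    _nodes = {}

    @classmethod
    def register(cls, key):
        def decorator(node_cls):
            cls._nodes[key] = node_cls
            return node_cls  # leave the class usable as normal
        return decorator

    @classmethod
    def get(cls, key):
        return cls._nodes[key]

@NodeRegistry.register("clip_outliers")   # key used in pipeline configs
class ClipOutliersCalculator:
    def fit(self, column, lower, upper):
        # Learned params are plain data, consistent with the Calculator pattern.
        return {"lower": lower, "upper": upper}

assert NodeRegistry.get("clip_outliers") is ClipOutliersCalculator
```

Registration by string key is what lets a JSON config refer to custom nodes without importing them explicitly at pipeline-definition time.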
### Can I add my own models?
Yes. Implement a `BaseModelCalculator` and a `BaseModelApplier`, register them with `@NodeRegistry.register`, and use the model key in your config. See Extending Skyulf-Core.
### Does Skyulf handle feature engineering?
Yes. The preprocessing system includes 30+ nodes: imputation (Simple, KNN, Iterative), encoding (OneHot, Ordinal, Label, Target, Hash), scaling (Standard, MinMax, Robust, MaxAbs), outlier detection (IQR, ZScore, Winsorize, EllipticEnvelope), feature generation (Polynomial, Math), feature selection (Variance, Correlation, Univariate, Model-based), and more.
### What data formats are supported?
- skyulf-core library: Pandas DataFrames and Polars DataFrames (auto-detected).
- Web platform: CSV upload via the data ingestion API. Database sources (PostgreSQL, etc.) via the ingestion endpoint.
### How does the hybrid Polars/Pandas engine work?
Skyulf auto-detects whether your data is Polars or Pandas. Simple operations (scaling, imputation) run natively in Polars for speed. Complex operations (feature selection, some sklearn-backed nodes) temporarily bridge to Pandas/NumPy via Apache Arrow (near zero-copy). See Engine Mechanics.
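The auto-detection step can be approximated by inspecting the object's class module; this is an illustrative sketch, not Skyulf's actual dispatch code:

```python
def detect_engine(df):
    """Return 'polars' or 'pandas' based on where the object's class is defined."""
    root = type(df).__module__.split(".")[0]
    if root in ("polars", "pandas"):
        return root
    raise TypeError(f"Unsupported dataframe type: {type(df).__name__}")

def run_node(df, polars_op, pandas_op):
    """Run natively in Polars when possible; otherwise use the pandas path.

    In the real engine, crossing between libraries goes through
    Apache Arrow (near zero-copy) rather than a plain conversion.
    """
    if detect_engine(df) == "polars":
        return polars_op(df)
    return pandas_op(df)
```

Dispatching on the class's module avoids importing either library up front, so the sketch works even when only one of pandas/Polars is installed.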
### Is there an API for programmatic access?
Yes. The backend exposes a REST API with endpoints for data upload, pipeline execution, model deployment, and inference. See Platform Walkthrough.
### How do I run multiple experiments in parallel? (v0.4.0+)
Connect two or more training nodes to your dataset, each with its own preprocessing path. A Run All Experiments button appears in the toolbar; clicking it queues all branches at once and returns a separate `job_id` for each. You can also click Train on an individual node to run just that branch.
### What's the difference between Merge and Parallel?
- Merge: Combines data from multiple upstream branches into a single DataFrame before training. Use when you have parallel preprocessing paths feeding one model.
- Parallel: Each incoming branch becomes a separate experiment job. Use when you want independent experiments.
Training nodes with 2+ inputs show a toggle to switch between modes. See the Multi-Path Pipelines guide.
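The two modes can be sketched as a small dispatch function (a hypothetical helper mirroring the behavior described above, with plain row-lists standing in for DataFrames): Merge concatenates the incoming branches into one training input, while Parallel turns each branch into its own job.

```python
def resolve_inputs(branches, mode):
    """branches: list of row-lists, one per incoming edge into the training node."""
    if mode == "merge":
        # One combined dataset -> a single training job.
        return [[row for branch in branches for row in branch]]
    if mode == "parallel":
        # Each branch -> an independent experiment job.
        return [list(branch) for branch in branches]
    raise ValueError(f"Unknown mode: {mode}")

branches = [[{"x": 1}], [{"x": 2}, {"x": 3}]]
assert resolve_inputs(branches, "merge") == [[{"x": 1}, {"x": 2}, {"x": 3}]]
assert len(resolve_inputs(branches, "parallel")) == 2
```

Each element of the returned list corresponds to one training job, which is why Parallel mode yields multiple `job_id`s while Merge yields one.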
### How do I copy-paste nodes on the canvas? (v0.4.0+)
Select one or more nodes, press Ctrl+C (Cmd+C on Mac) to copy, then Ctrl+V (Cmd+V on Mac) to paste. Nodes are pasted with a position offset. Internal edges between selected nodes are preserved.