
Own Your ML Pipeline. Run everything locally.

Skyulf gives you a FastAPI workspace to build ML pipelines—load data, engineer features, train models—all running on your own infrastructure.

A passion project.

Drag & drop

Private

Fast

Scalable

⚠️ This is a work in progress. Expect bugs, incomplete features, and visual inconsistencies.


Dataset Source

transactions.csv

Train
Val
Test

Train/Val/Test Split

70% / 15% / 15%


Model Trainer

Celery async

What's inside

A web UI for ingesting data, engineering features, training models, and running background jobs—all without uploading to third-party clouds.

Data Ingestion

Available now

Load CSV, Excel, Parquet, or SQL data. Schema gets detected automatically and results are cached so experiments stay reproducible.
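As a rough sketch of what content-keyed caching can look like, here is a minimal loader built on plain pandas and hashlib. The `.skyulf_cache` directory and the `load_with_cache` helper are illustrative assumptions, not Skyulf's actual API.

```python
# Hypothetical sketch: load a CSV, let pandas infer the schema, and cache
# the result keyed by the file's contents so repeat runs are reproducible.
import hashlib
from pathlib import Path

import pandas as pd

CACHE_DIR = Path(".skyulf_cache")  # assumed cache location, not Skyulf's real one


def load_with_cache(path: str) -> pd.DataFrame:
    """Load a CSV once; later calls with identical contents hit the cache."""
    raw = Path(path).read_bytes()
    key = hashlib.sha256(raw).hexdigest()[:16]   # content hash = cache key
    CACHE_DIR.mkdir(exist_ok=True)
    cached = CACHE_DIR / f"{key}.pkl"
    if cached.exists():
        return pd.read_pickle(cached)
    df = pd.read_csv(path)   # pandas detects the schema (dtypes) automatically
    df.to_pickle(cached)     # cache for reproducible repeat runs
    return df
```

Keying the cache on file contents rather than file name means an edited dataset produces a fresh cache entry instead of silently reusing stale results.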

Feature Canvas

Available now

Drag and drop to wire up feature pipelines, or write Python recipes directly. Includes transforms for scaling, binning, feature selection, and geospatial ops (planned).
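To give a feel for the code-first side, here is what a recipe could look like written as a plain scikit-learn pipeline; each step corresponds to the kind of node you would otherwise drag onto the canvas. The step names are illustrative, not Skyulf's recipe syntax.

```python
# A scaling -> binning -> feature-selection recipe, sketched with plain
# scikit-learn. Skyulf's canvas nodes wrap transforms like these.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer, StandardScaler

X, y = load_iris(return_X_y=True)

recipe = Pipeline([
    ("scale", StandardScaler()),                            # scaling node
    ("bin", KBinsDiscretizer(n_bins=4, encode="ordinal")),  # binning node
    ("select", SelectKBest(f_classif, k=2)),                # selection node
])

X_out = recipe.fit_transform(X, y)
print(X_out.shape)  # 150 rows, 2 selected features
```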

Async Training

Available now

Kick off grid searches, random sweeps, or halving trials in the background with Celery workers. Hook in Optuna if you want fancier hyperparameter tuning.
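Skyulf dispatches these jobs through Celery workers; as a dependency-free sketch of the same fire-and-forget pattern, the snippet below runs a scikit-learn grid search in a background process with the standard library. The function name and parameter grid are illustrative.

```python
# Fire-and-forget training: submit a grid search to a worker process and
# keep the main thread (e.g. the web UI) responsive. In Skyulf this role
# is played by a Celery worker; ProcessPoolExecutor stands in here.
from concurrent.futures import ProcessPoolExecutor

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV


def train_grid_search(param_grid: dict) -> float:
    """The task body a background worker would execute."""
    X, y = load_iris(return_X_y=True)
    search = GridSearchCV(LogisticRegression(max_iter=500), param_grid, cv=3)
    search.fit(X, y)
    return search.best_score_


if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        # Enqueue without blocking; .result() waits only when you ask for it.
        future = pool.submit(train_grid_search, {"C": [0.1, 1.0, 10.0]})
        print(f"best CV accuracy: {future.result():.3f}")
```

With Celery the call site looks much the same, except the task is enqueued with `.delay()` and executed by a separate worker process connected to a broker.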

GenAI & LangChain

Planned · Roadmap

Visual LLM Builder to drag-and-drop prompts and chains. RAG workflows to chat with your documents locally.

Run Monitoring

Planned · Roadmap

We're working on live logs and run tracking. Exports will be MLflow-compatible so you can plug into other observability tools.

Self-hosted DevEx

Available now

Configure once via `config.py`, run on SQLite by default, and graduate to PostgreSQL or Docker when you’re ready.
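A plausible shape for that `config.py`, assuming environment-variable overrides: SQLite needs zero setup, and pointing `DATABASE_URL` at PostgreSQL is the upgrade path. The variable names and defaults here are assumptions, not Skyulf's documented settings.

```python
# Hypothetical config.py: sensible local defaults, overridable via env vars.
import os

# SQLite out of the box; set DATABASE_URL to a postgresql:// URL to scale up.
DATABASE_URL = os.environ.get("DATABASE_URL", "sqlite:///skyulf.db")
CELERY_BROKER_URL = os.environ.get("CELERY_BROKER_URL", "redis://localhost:6379/0")
DEBUG = os.environ.get("SKYULF_DEBUG", "0") == "1"
```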

Drag-and-drop ML pipeline builder

Wire up your ML workflows with visual nodes—from data cleaning to model training. Each node is a reusable component running locally on your infrastructure. Skyulf ships with 39 node types across preprocessing, encoding, modeling, and inspection.

Preprocessing

Extensive toolkit

  • Drop missing rows/columns
  • Imputation strategies
  • Train/val/test splits

Highlights

  • Outlier removal & skewness fixes
  • Class imbalance handling (under/oversample)
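The core preprocessing nodes above map onto a few lines of pandas and scikit-learn. The sketch below drops an all-missing column, median-imputes, and produces a 70/15/15 split as two chained `train_test_split` calls; the column names and fractions are illustrative.

```python
# Drop fully-missing columns, impute, then split 70/15/15. The second
# split carves the held-out 30% into equal validation and test halves.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "amount": [float(i) for i in range(1, 21)],  # 20 rows of toy data
    "empty": [np.nan] * 20,                      # an all-missing column
})
df.loc[2, "amount"] = np.nan

df = df.dropna(axis=1, how="all")                          # drop empty columns
df["amount"] = df["amount"].fillna(df["amount"].median())  # median imputation

train, rest = train_test_split(df, test_size=0.30, random_state=42)
val, test = train_test_split(rest, test_size=0.50, random_state=42)
print(len(train), len(val), len(test))  # 14 3 3
```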

Categorical Encoding

Alpha nodes available

  • One-hot encoding
  • Target encoding
  • Ordinal & frequency encoding

Highlights

  • Feature hashing with configurable buckets
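Feature hashing with a configurable bucket count can be sketched with scikit-learn's `FeatureHasher`; the bucket count (`n_features`) is the knob the node would expose. The category values below are illustrative.

```python
# Hash categorical values into a fixed number of buckets: memory stays
# bounded no matter how many distinct categories appear, and identical
# categories always land in the same bucket.
from sklearn.feature_extraction import FeatureHasher

hasher = FeatureHasher(n_features=16, input_type="string")  # 16 buckets
cities = [["berlin"], ["oslo"], ["berlin"], ["lisbon"]]
X = hasher.transform(cities)
print(X.shape)  # (4, 16): one row per record, one column per bucket
```

The trade-off is deliberate: collisions are possible, but you never need to store or ship a category vocabulary.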

Modeling

Alpha nodes available

  • Scikit-learn trainers
  • Grid/random/halving search
  • Async training via Celery

Inspection

Alpha nodes available

  • Dataset profiling
  • Transformer audit trails
  • Distribution previews
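As a tiny stand-in for the profiling node, a useful quick profile is little more than dtypes, missing-value counts, and summary statistics from pandas. The column names and dict layout are illustrative, not Skyulf's report format.

```python
# A minimal dataset profile: column types, missing counts, numeric summary.
import pandas as pd

df = pd.DataFrame({
    "amount": [12.5, 99.0, 7.25, None],
    "country": ["DE", "NO", "DE", "PT"],
})

profile = {
    "dtypes": df.dtypes.astype(str).to_dict(),
    "missing": df.isna().sum().to_dict(),
    "numeric_summary": df.describe().to_dict(),
}
print(profile["dtypes"])
```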

This demo shows our local Iris classification pipeline in action, from feature engineering and data splits to encoding and training.

Product tour

See Skyulf in action

From ingestion to deployment, Skyulf keeps your experiments reproducible and private. Here’s a snapshot of how a local-first team moves from raw data to a deployed model.

01.

Load your data

Point to CSV, Parquet, or SQL sources. Schema gets detected automatically and everything's cached for repeat runs. REST loaders are coming next.

02.

Build your pipeline

Drag nodes around the canvas to wire up transforms. Preview stats at each step before saving to the feature store.

03.

Train and monitor

Launch training runs in the background, track metrics as they come in. MLflow-compatible exports are planned for seamless deployment.

The Vision

Skyulf is evolving from a tool into a complete "App Hub" for AI. Here is where we are going.

Phase 1: Polish & Stability

Current Focus

  • Extract Core Library (skyulf-core): Separate the pure ML logic (transformers, pipeline utils) from the web application.
  • Hybrid Architecture Strategy: Solidify the "Island Architecture" approach.
  • Robustness: Enhanced type checking, test coverage, and error handling.
  • Documentation: The "Skyulf Book" and interactive project templates.

Phase 2 & 3: The Future

Coming Soon

  • Deep Data Science: Ethics checks, Synthetic Data, and Advanced EDA.
  • Deployment: One-click export to standalone API (ZIP) or Docker container.
  • App Hub: Plugin system, Visual LLM Builder, and One-Click Deployments.
  • Collaboration: Real-time multiplayer editing ("Figma for ML").

Who's using this

Teams working with sensitive data who can't upload to cloud services. More sectors are planned: education, finance, IoT, and beyond. Anyone can use it freely.

Healthcare & Research

Train models on patient data or research datasets that can't leave your infrastructure due to GDPR or institutional policies.

Target use case

  • Load CSV/Parquet exports from local databases
  • Engineer features without cloud uploads
  • Train models with hyperparameter search locally

Government & Public Sector

Municipalities and public institutions working with citizen data that must stay on-premises for privacy and compliance.

Target use case

  • GDPR-compliant data processing
  • No cloud vendor lock-in
  • Full audit trails for transparency

SMEs & Startups

Small teams who need ML tooling but want to avoid expensive SaaS subscriptions and keep data under their own control.

Target use case

  • Run on SQLite to start, scale to Postgres later
  • Self-host with Docker or bare metal
  • No per-user licensing or API fees

Help shape Skyulf

We’re building a self-hosted, transparent MLOps stack together. Contribute code, docs, or feedback—every commit counts.

Frequently Asked Questions for Local-first MLOps & Self-hosted Workflows

Everything you need to understand how Skyulf keeps feature engineering, automation, and training entirely on your infrastructure.

FAQ Snapshot

Answers in this section boil down to one promise: everything you build with Skyulf stays on your turf, stays private, and stays under your control.

  • Local-first workflows that keep datasets on-premises and out of third-party clouds.
  • Drag-and-drop nodes plus code-friendly hooks so you can compose ingestion, transformation, and training steps however you like.
  • Background jobs that run once you hit play, leaving you to focus on experiments while the stack saves reproducible outputs.

What is Skyulf?

Skyulf is a self-hosted MLOps platform. It lets you build, train, and monitor machine learning pipelines entirely on your infrastructure. Unlike cloud-based solutions, your data never leaves your servers—perfect for privacy-sensitive or regulated environments.

Why local-first?

Local-first means complete data sovereignty. No vendor lock-in, no surprise cloud bills, no data privacy concerns. You control where your models train, where your data lives, and who can access it. It's MLOps on your terms.

How is Skyulf different from cloud MLOps platforms?

Cloud platforms require uploading your data to third-party servers and charge based on usage. Skyulf runs 100% on-premise or in your private cloud. No data leaves your network, no usage-based pricing, and no vendor dependencies. You own the infrastructure and the workflow.

What is the current status of the project?

Skyulf is in Active Alpha. Core features like the drag-and-drop Feature Canvas, experiment tracking, and model monitoring are functional but actively evolving. Expect breaking changes as we refine the platform. Check our roadmap for what's coming next.

Is Skyulf production-ready?

Skyulf is currently in Active Alpha. It is perfect for internal tools, research, and experimentation. We are working towards a stable v1.0 for critical production workloads. We'll announce production readiness clearly in the docs and releases.

Can I contribute?

Contributions are welcome! Check out our Contributing Guide to get started. Whether it's code, documentation, bug reports, or feature ideas—every contribution helps shape Skyulf. Join us on GitHub.

What is the long-term vision?

To democratize AI development. We are building an "App Hub" where anyone can drag-and-drop to create powerful AI tools—from traditional ML to GenAI agents—without needing a PhD or a cloud budget. Skyulf aims to be the go-to self-hosted MLOps stack for teams who value privacy and control.

Is Skyulf free and open source?

Yes. Skyulf is 100% open source under the Apache-2.0 license. Free to use, modify, embed, and deploy—no copyleft clauses, no hidden costs, no premium tiers. If you find it useful, consider starring the repo or contributing back to the project.

Can I use Skyulf commercially?

Apache-2.0 already covers commercial and closed-source use. If you need SLAs, indemnification, or a partnership agreement, check COMMERCIAL-LICENSE.md. Open a "Commercial Partnership Request" discussion or contact us via the README links to start the conversation.