Platform Walkthrough
This guide walks through the full Skyulf web platform — from uploading a CSV to deploying a trained model. No screenshots yet, but every step maps to a specific page and API endpoint.
Prerequisite: The platform is running (python run_skyulf.py or docker-compose up). Open http://127.0.0.1:8000 in your browser.
Architecture overview
Browser (React SPA)
|
v
FastAPI Backend (REST API)
|
+-- Celery Workers (async jobs: pipeline execution, EDA)
+-- Redis (message broker + result backend)
+-- SQLite / PostgreSQL (metadata, job history)
+-- File Storage (uploads/, exports/)
|
v
skyulf-core (ML engine)
Step 1: Upload data
Page: /data (Data Sources)
- Navigate to the Data Sources page.
- Click Upload and select a CSV file.
- The backend ingests the file asynchronously (POST /api/ingestion/upload).
- Once ingested, the dataset appears in the list with row count, column count, and status.
API equivalent:
curl -X POST http://127.0.0.1:8000/api/ingestion/upload \
-F "file=@my_dataset.csv"
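The same upload can be prepared from Python. The following is a minimal sketch using only the standard library; the helper name build_upload_request is hypothetical, the text/csv content type is an assumption, and the shape of the server's response is not documented here, so the sketch only builds the request.

```python
import urllib.request
import uuid

BASE_URL = "http://127.0.0.1:8000"  # address used throughout this guide


def build_upload_request(filename: str, content: bytes) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST with one 'file' field.

    Mirrors `curl -F "file=@my_dataset.csv"`; the helper name is hypothetical.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: text/csv\r\n\r\n",  # assumed content type for CSV uploads
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{BASE_URL}/api/ingestion/upload",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

To send it against a running platform, pass the result to urllib.request.urlopen and read the response body.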
You can preview sample rows via the sample button or API:
curl http://127.0.0.1:8000/data/api/sources/{source_id}/sample
Step 2: Explore data (EDA)
Page: /eda (Exploratory Data Analysis)
- Select a dataset from the dropdown.
- Click Analyze to trigger a full automated EDA (POST /api/eda/{dataset_id}/analyze).
- The EDA runs as a background Celery task; poll status until complete.
- View the report: data quality summary, column statistics, distributions, correlations, outlier detection, and smart alerts.
What the EDA covers:
- Missing value analysis per column
- Numeric distributions (mean, std, skewness, kurtosis)
- Cardinality analysis for categorical columns
- Correlation matrix (Pearson, Spearman)
- Outlier detection (IQR, ZScore)
- Target variable analysis (classification balance / regression distribution)
- PCA loadings and explained variance
- Smart alerts (high cardinality, constant columns, high correlation pairs)
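To make the two outlier rules in the list above concrete, here is a small standard-library sketch of the IQR and z-score checks. The thresholds (1.5 × IQR and |z| > 3) are the conventional defaults and are an assumption; this guide does not state which values Skyulf's EDA uses.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # constant column: nothing can be an outlier
    return [v for v in values if abs(v - mean) / std > threshold]
```

On very small samples the z-score rule is weak, because a single extreme value inflates the standard deviation it is measured against; the IQR rule is less sensitive to that effect.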
Step 3: Build a pipeline (ML Canvas)
Page: /canvas (ML Canvas)
This is the core of Skyulf. The canvas is a React Flow-based visual editor where you build ML pipelines by connecting nodes.
Available node categories:
| Category | Examples |
|---|---|
| Data | Dataset selector (connects to uploaded data) |
| Splitting | TrainTestSplitter |
| Cleaning | TextCleaning, ValueReplacement, Deduplicate |
| Imputation | SimpleImputer, KNNImputer, IterativeImputer |
| Encoding | OneHotEncoder, OrdinalEncoder, LabelEncoder, TargetEncoder |
| Scaling | StandardScaler, MinMaxScaler, RobustScaler |
| Outliers | IQR, ZScore, Winsorize, EllipticEnvelope |
| Feature Engineering | PolynomialFeatures, FeatureGeneration, FeatureSelection |
| Resampling | SMOTE, ADASYN, RandomUndersampling |
| Modeling | All 20 classifiers and regressors |
| Tuning | Hyperparameter tuner (grid, random, Optuna, halving) |
Building a pipeline:
- Add a data node and select your uploaded dataset.
- Add preprocessing nodes from the sidebar (drag onto the canvas).
- Connect nodes by dragging edges from output ports to input ports.
- Configure each node by clicking it and setting parameters in the side panel.
- Add a modeling node at the end (e.g., random_forest_classifier).
- Optionally, add an Advanced Tuning node and configure the search strategy.
Recommended pipeline order:
Dataset → TrainTestSplitter → Imputer → Encoder → Scaler → Model
Tip: Always place TrainTestSplitter early to prevent data leakage. Skyulf's Calculator/Applier pattern ensures preprocessing statistics are learned only from the training split.
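The idea behind the tip can be illustrated with a toy scaler. This is only a sketch of the fit-on-train, apply-everywhere principle; fit_scaler and apply_scaler are hypothetical names, not Skyulf's actual Calculator/Applier API.

```python
import statistics


def fit_scaler(train_column):
    """'Calculator' role: learn statistics from the training split only."""
    return {
        "mean": statistics.fmean(train_column),
        "std": statistics.pstdev(train_column),
    }


def apply_scaler(column, params):
    """'Applier' role: transform any split using already-learned statistics."""
    std = params["std"] or 1.0  # avoid division by zero on constant columns
    return [(x - params["mean"]) / std for x in column]


train = [1.0, 2.0, 3.0]
test = [4.0]

params = fit_scaler(train)                 # the test split is never seen here
train_scaled = apply_scaler(train, params)
test_scaled = apply_scaler(test, params)   # reuses the training mean and std
```

If the statistics were computed on the full dataset before splitting, information from the test rows would leak into the training features.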
Multi-path pipelines (v0.3.0+)
You can build pipelines with multiple branches:
- Merge branches — Route data through different preprocessing paths (e.g., Scaling + Encoding), then connect both into a single training node. The node displays a ⊕ Merge badge showing how many inputs are being combined.
- Parallel experiments — Connect the dataset to multiple separate training nodes (e.g., RandomForest and XGBoost). Each runs as an independent experiment.
- Copy-paste nodes — Select nodes and press Ctrl+C / Ctrl+V to duplicate them with their internal edges.
Note: Model-to-model connections are blocked. See the Multi-Path Pipelines guide for details.
Step 4: Execute the pipeline
From the canvas: Click the Run Preview button to preview the pipeline, or use Train on an individual training node.
Single training node
The frontend converts your visual graph into a pipeline config and sends it to the backend:
POST /api/pipeline/run
The backend validates the config, queues the job in Celery, and returns a job_id.
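As a rough illustration of the graph-to-config step, the sketch below orders canvas nodes so that every node comes after its inputs. The config shape (id, type, inputs) and the function name graph_to_steps are assumptions for illustration; the real payload accepted by /api/pipeline/run is not specified in this guide.

```python
from graphlib import TopologicalSorter


def graph_to_steps(nodes, edges):
    """Order canvas nodes so each step appears after everything it depends on.

    nodes: {node_id: node_type}
    edges: [(source_id, target_id), ...]
    Raises graphlib.CycleError if the graph contains a cycle.
    """
    deps = {node_id: set() for node_id in nodes}
    for source, target in edges:
        deps[target].add(source)
    order = TopologicalSorter(deps).static_order()
    # Hypothetical step format; not the documented Skyulf schema.
    return [{"id": n, "type": nodes[n], "inputs": sorted(deps[n])} for n in order]


nodes = {
    "n1": "dataset",
    "n2": "TrainTestSplitter",
    "n3": "SimpleImputer",
    "n4": "random_forest_classifier",
}
edges = [("n1", "n2"), ("n2", "n3"), ("n3", "n4")]
steps = graph_to_steps(nodes, edges)
```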
Multiple training nodes (v0.4.0+)
When your canvas has 2+ training nodes on separate branches:
- Individual Train buttons: Each node's Train button runs only that branch (using target_node_id filtering).
- Run All Experiments: A 🚀 Run All Experiments button appears in the toolbar when at least two separate branches are connected to separate training nodes. Clicking it queues all branches at once and returns job_ids, one per branch.
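One way to picture target_node_id filtering is as an upstream walk: keep the target node and everything that feeds into it, and drop the rest. The sketch below shows that idea on a plain edge list; it is an assumption about the behaviour, not the backend's actual implementation, and branch_for_target is a hypothetical name.

```python
def branch_for_target(edges, target_node_id):
    """Return the target node plus every node upstream of it.

    edges: [(source_id, target_id), ...]
    """
    parents = {}
    for source, target in edges:
        parents.setdefault(target, []).append(source)

    keep = {target_node_id}
    stack = [target_node_id]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent not in keep:
                keep.add(parent)
                stack.append(parent)
    return keep


# Two branches sharing a dataset and a splitter (node ids are made up).
edges = [
    ("data", "split"),
    ("split", "scale"), ("scale", "rf"),
    ("split", "encode"), ("encode", "xgb"),
]
rf_branch = branch_for_target(edges, "rf")
```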
Merge/Parallel toggle (v0.4.0+)
Training nodes with 2+ incoming connections show a Merge / Parallel toggle:
- Merge: Combines upstream data before training.
- Parallel: Each incoming branch becomes a separate experiment job.
Monitoring progress:
- The Jobs page (/jobs) shows all running and completed jobs.
- Poll job status: GET /api/pipeline/jobs/{job_id}
- For tuning jobs, real-time progress updates show trial-by-trial results.
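Polling the job status endpoint can be sketched as a small loop with a timeout. The response is assumed to be JSON with a "status" field, and the terminal values "completed" and "failed" are guesses; check the actual response of your installation before relying on them. The fetch function is injectable so the loop can be exercised without a server.

```python
import json
import time
import urllib.request

BASE_URL = "http://127.0.0.1:8000"


def fetch_job(job_id):
    """GET /api/pipeline/jobs/{job_id} and decode the JSON body."""
    with urllib.request.urlopen(f"{BASE_URL}/api/pipeline/jobs/{job_id}") as resp:
        return json.load(resp)


def wait_for_job(job_id, fetch=fetch_job, interval=2.0, timeout=600.0,
                 terminal=("completed", "failed")):
    """Poll until the job reports a terminal status or the timeout expires.

    The "status" key and the terminal values are assumptions.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        if job.get("status") in terminal:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
        time.sleep(interval)
```

A fixed interval keeps the sketch short; for long tuning jobs a growing interval would reduce load on the API.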
Step 5: Review results
Page: /jobs (Jobs)
Once a job completes, view:
- Preprocessing metrics: Per-step artifacts (what was learned, e.g., imputed means, encoder categories).
- Modeling metrics: accuracy, F1, precision, recall, ROC-AUC (classification) or MSE, RMSE, R2, MAE (regression).
- Tuning results: Best parameters, trial history, convergence.
API equivalent:
curl http://127.0.0.1:8000/api/pipeline/jobs/{job_id}/evaluation
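For readers who want to sanity-check the reported numbers, the sketch below computes several of the metrics listed in this step from raw labels and predictions, using the textbook definitions for binary classification and regression. ROC-AUC is omitted because it needs predicted scores rather than labels. The function names are illustrative and this is not Skyulf's evaluation code.

```python
import math


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE, and R2 (coefficient of determination)."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(r * r for r in residuals)
    r2 = 1 - ss_res / ss_tot if ss_tot else 0.0
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}
```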
Step 6: Deploy the model
Page: /deployments (Deployments)
- From a completed job, click Deploy.
- The backend registers the model artifact and activates it (POST /api/deployment/deploy/{job_id}).
- Only one model can be active at a time. Deploying a new model deactivates the previous one.
Check active deployment:
curl http://127.0.0.1:8000/api/deployment/active
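The single-active-model rule can be pictured with a toy in-memory registry. This is purely illustrative: the class and its methods are hypothetical and say nothing about how the backend actually stores deployments.

```python
class DeploymentRegistry:
    """Toy registry enforcing 'at most one active model at a time'."""

    def __init__(self):
        self._active_job_id = None
        self._history = []

    def deploy(self, job_id):
        """Activate job_id and return the job it replaced (or None)."""
        previous = self._active_job_id
        self._active_job_id = job_id  # activating the new one deactivates the old
        self._history.append(job_id)
        return previous

    def active(self):
        """Return the currently active job id, or None if nothing is deployed."""
        return self._active_job_id
```

Because activation simply overwrites the previous entry, rolling back means deploying the earlier job again.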
Step 7: Run predictions (inference)
Page: /deployments (Deployments) — Inference testing panel
Send new data to the active model:
curl -X POST http://127.0.0.1:8000/api/deployment/predict \
-H "Content-Type: application/json" \
-d '{"data": [{"age": 25, "income": 60000, "city": "London"}]}'
The deployed model applies the full preprocessing pipeline (imputation, encoding, scaling) and returns predictions.
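The same prediction call can be prepared from Python with the standard library. The sketch below builds the request shown in the curl example; build_predict_request is a hypothetical helper, and the response format is not documented in this guide, so the sketch stops short of parsing it.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"


def build_predict_request(rows):
    """Build (but do not send) POST /api/deployment/predict.

    rows: list of dicts, one per record, keyed by raw feature name.
    """
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/deployment/predict",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_predict_request([{"age": 25, "income": 60000, "city": "London"}])
```

Rows carry raw, unprocessed values, since the deployed pipeline handles preprocessing. Pass the request to urllib.request.urlopen to send it to a running platform.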
Step 8: Monitor for drift
Page: /drift (Data Drift)
After your model is in production, monitor whether incoming data has shifted:
- Navigate to the Drift page.
- Select the reference dataset (training data) and current dataset (new data).
- Click Calculate Drift (POST /api/monitoring/drift/calculate).
- Review per-column drift metrics (Wasserstein distance, KS test, PSI, KL divergence).
If significant drift is detected, consider retraining the model.
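As an example of one of these metrics, the sketch below computes the Population Stability Index (PSI) for a numeric column using equal-width bins derived from the reference data. The bin count, the epsilon used for empty bins, and the binning strategy are assumptions; Skyulf's own calculation may differ.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Bins are equal-width over the reference range; values outside that
    range are clamped into the first or last bin.
    """
    low, high = min(reference), max(reference)
    width = (high - low) / bins
    if width == 0:
        width = 1.0  # constant reference column: everything lands in one bin

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - low) / width)
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, but suitable thresholds depend on the data and the use case.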
API quick reference
| Action | Method | Endpoint |
|---|---|---|
| Upload data | POST | /api/ingestion/upload |
| List datasets | GET | /data/api/sources |
| Preview data | GET | /data/api/sources/{id}/sample |
| Run EDA | POST | /api/eda/{dataset_id}/analyze |
| Execute pipeline | POST | /api/pipeline/run |
| Job status | GET | /api/pipeline/jobs/{job_id} |
| Evaluation metrics | GET | /api/pipeline/jobs/{job_id}/evaluation |
| Deploy model | POST | /api/deployment/deploy/{job_id} |
| Active deployment | GET | /api/deployment/active |
| Predict | POST | /api/deployment/predict |
| Calculate drift | POST | /api/monitoring/drift/calculate |
What's next?
- Getting Started — Quickest path to a working pipeline (Python library).
- Configuration — All models and config keys.
- FAQ & Comparison — How Skyulf compares to MLflow, Kubeflow, etc.
- Troubleshooting — Common issues and fixes.