Data Validation Expectations
skyulf.profiling.expect is a lightweight data-validation helper: a tiny subset
of what Great Expectations offers, with zero extra dependencies. Each expect_*
function checks a single condition and raises ExpectationError with a precise
message when the condition is violated.
It is engine-agnostic: Pandas frames are used directly; Polars (or any frame
exposing to_pandas()) is converted first.
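The conversion rule above can be sketched as follows. This is a minimal illustration, not the library's code: `to_pandas_frame` and `FakeFrame` are hypothetical names introduced here, and the `TypeError` for unsupported inputs is an assumption.

```python
import pandas as pd


def to_pandas_frame(df):
    """Return df as a pandas DataFrame (hypothetical helper, not skyulf's own code)."""
    if isinstance(df, pd.DataFrame):
        return df  # pandas frames are used directly
    if hasattr(df, "to_pandas"):
        return df.to_pandas()  # Polars-style frames are converted first
    # Assumption: anything else is rejected; the real behavior may differ.
    raise TypeError(f"Unsupported frame type: {type(df).__name__}")


class FakeFrame:
    """Stand-in for a non-pandas frame that exposes to_pandas()."""

    def __init__(self, data):
        self._data = data

    def to_pandas(self):
        return pd.DataFrame(self._data)


native = pd.DataFrame({"age": [21, 35]})
converted = to_pandas_frame(FakeFrame({"age": [21, 35]}))
```

Duck-typing on `to_pandas()` is one way to support other frame types without importing them, which fits the zero-extra-dependencies goal.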
When to use it
These are manual assertions — they are not wired into profiling or CI automatically. You call them yourself in two main places:
- In tests / CI — guard a dataset contract so a bad upstream change fails the build.
- In a pipeline — assert preconditions before an expensive step, so you get a clear error instead of a deep traceback later.
Available expectations
| Function | Checks |
|---|---|
| expect_columns_exist(df, columns) | Every name in columns is present. |
| expect_no_nulls(df, columns=None) | Given columns (default: all) have no nulls. |
| expect_value_range(df, column, *, minimum, maximum, inclusive=True) | All values fall within [minimum, maximum]. |
| expect_unique(df, columns) | The combination of columns has no duplicate rows. |
Example: a dataset contract in CI
import pandas as pd

from skyulf.profiling.expect import (
    expect_columns_exist,
    expect_no_nulls,
    expect_value_range,
    expect_unique,
    ExpectationError,
)


def validate_customers(df: pd.DataFrame) -> None:
    """Raises ExpectationError if the customers frame breaks its contract."""
    expect_columns_exist(df, ["customer_id", "age", "signup_date"])
    expect_unique(df, ["customer_id"])
    expect_no_nulls(df, ["customer_id", "signup_date"])
    expect_value_range(df, "age", minimum=0, maximum=120)
Wire it into a test so CI enforces it:
def test_customers_contract():
    df = pd.read_parquet("data/customers.parquet")
    validate_customers(df)  # raises ExpectationError on violation → test fails
Example: a pipeline guard
from skyulf.profiling.expect import expect_no_nulls


def run(df):
    # Fail fast with a clear message before an expensive fit.
    expect_no_nulls(df, ["target"])
    ...
API reference
skyulf.profiling.expect
Lightweight data-validation expectations (no Great Expectations dependency).
Each expect_* function checks a single condition on a DataFrame and raises
ExpectationError with a precise message when the condition is violated.
Pure-Python and engine-agnostic: Pandas frames are used directly; Polars (or any
frame exposing to_pandas()) is converted first.
Example
import pandas as pd
from skyulf.profiling.expect import expect_no_nulls, expect_value_range

df = pd.DataFrame({"age": [21, 35, 40]})
expect_no_nulls(df)
expect_value_range(df, "age", minimum=0, maximum=120)
ExpectationError
Bases: ValueError
Raised when a data-validation expectation is not met.
Source code in skyulf-core/skyulf/profiling/expect.py
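Because ExpectationError derives from ValueError, existing `except ValueError` handlers will also catch it. The sketch below uses a local stand-in class with the same documented base so that it runs without skyulf installed; `check_positive` is a hypothetical check that exists only to trigger the error.

```python
class ExpectationError(ValueError):
    """Local stand-in mirroring the documented base class."""


def check_positive(value):
    # Hypothetical check, used only to raise the error in this sketch.
    if value <= 0:
        raise ExpectationError(f"expected a positive value, got {value}")


caught = None
try:
    check_positive(-1)
except ValueError as exc:  # a ValueError handler also catches ExpectationError
    caught = exc
```

Catch ExpectationError itself when validation failures should be handled separately from other value errors.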
expect_columns_exist(df, columns)
Assert that every name in columns is present in df.
expect_no_nulls(df, columns=None)
Assert that the given columns (default: all) contain no null values.
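To illustrate the "default: all" behavior of the columns argument, here is a small pandas-only sketch. `null_counts` is a hypothetical helper, not part of skyulf, and it returns counts rather than raising.

```python
import pandas as pd


def null_counts(df, columns=None):
    """Count nulls per checked column, keeping only columns that have any."""
    # With columns=None, every column in the frame is checked.
    checked = list(df.columns) if columns is None else list(columns)
    counts = df[checked].isna().sum()
    return {name: int(n) for name, n in counts.items() if n > 0}


df = pd.DataFrame({"id": [1, 2, 3], "email": ["a@x.io", None, None]})
all_columns = null_counts(df)       # checks both columns
only_id = null_counts(df, ["id"])   # checks the id column only
```

Restricting the check to the columns that must be complete avoids failures on columns where nulls are legitimate.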
expect_unique(df, columns)
Assert that the combination of columns has no duplicate rows.
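Uniqueness is judged on the combination of columns, so each column may repeat on its own while the pair stays unique. The following sketch shows that distinction with plain pandas; `duplicate_rows` is a hypothetical helper, and the library's actual implementation may differ.

```python
import pandas as pd


def duplicate_rows(df, columns):
    """Return rows whose combination of `columns` repeats an earlier row."""
    return df[df.duplicated(subset=columns, keep="first")]


orders = pd.DataFrame(
    {
        "customer_id": [1, 1, 2, 2],
        "order_date": ["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-02"],
    }
)

# customer_id alone repeats, but each (customer_id, order_date) pair is unique.
by_customer = duplicate_rows(orders, ["customer_id"])
by_pair = duplicate_rows(orders, ["customer_id", "order_date"])
```

For this frame, a uniqueness check on the pair would pass, while a check on customer_id alone would fail.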
expect_value_range(df, column, *, minimum=None, maximum=None, inclusive=True)
Assert that all values in column fall within [minimum, maximum].
minimum / maximum are optional (open-ended on the unset side).
Null values are ignored. Set inclusive=False for a strict comparison.
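The documented semantics (optional bounds, nulls ignored, strict comparison with inclusive=False) can be sketched with plain pandas. `out_of_range` is a hypothetical helper written for this illustration; it returns the offending values instead of raising, and it is not the library's code.

```python
import pandas as pd


def out_of_range(series, *, minimum=None, maximum=None, inclusive=True):
    """Return the non-null values that fall outside the requested bounds."""
    values = series.dropna()  # null values are ignored
    bad = pd.Series(False, index=values.index)
    if minimum is not None:
        # Inclusive: the bound itself is allowed. Strict: the bound is a violation.
        bad |= (values < minimum) if inclusive else (values <= minimum)
    if maximum is not None:
        bad |= (values > maximum) if inclusive else (values >= maximum)
    return values[bad]


ages = pd.Series([0, 35, None, 120, 130])

closed = out_of_range(ages, minimum=0, maximum=120)
strict = out_of_range(ages, minimum=0, maximum=120, inclusive=False)
open_ended = out_of_range(ages, minimum=0)  # no upper bound
```

With inclusive bounds only 130 is flagged. With inclusive=False the boundary values 0 and 120 are flagged as well. Leaving maximum unset flags nothing in this series.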