API: preprocessing.base
skyulf.preprocessing.base
BaseApplier
Bases: ABC
Source code in skyulf-core/skyulf/preprocessing/base.py
apply(df, params)
abstractmethod
Applies the transformation using fitted parameters.
The return type is intentionally Any because the concrete shape
depends on the input: passing a DataFrame returns a DataFrame;
passing an (X, y) tuple returns a tuple; splitters return
SplitDataset. Encoding every case as a union forces callers to
defensively narrow on every use, which is worse than Any here.
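The shape-preserving contract described above can be illustrated with a minimal sketch. This is not the library's implementation. The `BaseApplier` class below is a stand-in that mirrors only the abstract signature, `AddOffset` is a hypothetical Applier, and a plain dict of lists stands in for a DataFrame so the example runs without third-party packages.

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseApplier(ABC):
    """Stand-in mirroring the abstract interface described above."""

    @abstractmethod
    def apply(self, df: Any, params: Any) -> Any:
        ...


class AddOffset(BaseApplier):
    """Hypothetical Applier: adds params["offset"] to every value in X."""

    def apply(self, df: Any, params: Any) -> Any:
        # Remember the input shape so the output can mirror it.
        is_tuple = isinstance(df, tuple)
        X, y = df if is_tuple else (df, None)
        X_out = {col: [v + params["offset"] for v in vals] for col, vals in X.items()}
        # Frame in, frame out; (X, y) in, (X, y) out.
        return (X_out, y) if is_tuple else X_out


frame = {"a": [1, 2]}  # dict of lists standing in for a DataFrame
applier = AddOffset()
frame_out = applier.apply(frame, {"offset": 10})
pair_out = applier.apply((frame, [0, 1]), {"offset": 10})
```

Here `frame_out` is a single frame and `pair_out` is an `(X, y)` tuple, which is why a single precise return annotation is hard to write.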
Source code in skyulf-core/skyulf/preprocessing/base.py
BaseCalculator
Bases: ABC
Source code in skyulf-core/skyulf/preprocessing/base.py
fit(df, config)
abstractmethod
Calculates parameters from the training data.
Returns a Mapping of fitted parameters (typically a TypedDict
*Artifact declared in preprocessing._artifacts). The return
type is Mapping rather than Dict so concrete TypedDict
subclasses are valid LSP-substitutable returns.
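To illustrate the point about Mapping versus Dict, here is a minimal sketch. `MeanArtifact` and `MeanCalculator` are hypothetical names, the `BaseCalculator` below is a stand-in with only the `fit` signature, and the `Mapping[str, object]` parameterization is an assumption (the source only says "Mapping"). Under PEP 589 a TypedDict is consistent with `Mapping[str, object]` but not with `Dict[str, Any]`, so the narrowed return type in the subclass passes a type checker.

```python
from abc import ABC, abstractmethod
from typing import Any, Mapping, TypedDict


class MeanArtifact(TypedDict):
    """Hypothetical artifact; the real ones live in preprocessing._artifacts."""

    means: dict[str, float]


class BaseCalculator(ABC):
    """Stand-in showing only the fit signature discussed above."""

    @abstractmethod
    def fit(self, df: Any, config: Mapping[str, Any]) -> Mapping[str, object]:
        ...


class MeanCalculator(BaseCalculator):
    # Narrowing the return type to a TypedDict is a valid override
    # because TypedDict is compatible with Mapping[str, object].
    def fit(self, df: Any, config: Mapping[str, Any]) -> MeanArtifact:
        cols = config["columns"]
        return {"means": {c: sum(df[c]) / len(df[c]) for c in cols}}


# A dict of lists stands in for the training frame.
artifact = MeanCalculator().fit({"a": [1.0, 3.0]}, {"columns": ["a"]})
```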
Source code in skyulf-core/skyulf/preprocessing/base.py
infer_output_schema(input_schema, config)
Best-effort prediction of the output schema from config alone.
Override this in concrete Calculators when the output columns/dtypes
can be derived purely from input_schema and config (i.e.
without seeing data). Examples:
- Scalers: pass through (output == input).
- Drop columns by name: drop the configured names.
- One-hot: adds K columns per categorical (K is data-dependent, so return None).
Default returns None to signal "unknown / data-dependent";
callers should fall back to runtime introspection.
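The three cases listed above might look like the following sketch. The schema representation is an assumption: a plain mapping from column name to dtype name stands in for whatever schema type the library actually uses, and the three Calculator classes are hypothetical.

```python
from typing import Any, Mapping, Optional

# Assumption: column name -> dtype name stands in for the real schema type.
Schema = Mapping[str, str]


class BaseCalculator:
    """Stand-in carrying only the default infer_output_schema behaviour."""

    def infer_output_schema(
        self, input_schema: Schema, config: Mapping[str, Any]
    ) -> Optional[Schema]:
        # Default: unknown / data-dependent.
        return None


class ScalerCalculator(BaseCalculator):
    def infer_output_schema(self, input_schema, config):
        # Pass through: output == input.
        return dict(input_schema)


class DropColumnsCalculator(BaseCalculator):
    def infer_output_schema(self, input_schema, config):
        # Drop the configured names; everything else is unchanged.
        dropped = set(config["columns"])
        return {name: dtype for name, dtype in input_schema.items() if name not in dropped}


class OneHotCalculator(BaseCalculator):
    # The number of output columns depends on the data, so the default
    # None is inherited and callers fall back to runtime introspection.
    pass


schema = {"age": "int64", "city": "object"}
scaled = ScalerCalculator().infer_output_schema(schema, {})
dropped = DropColumnsCalculator().infer_output_schema(schema, {"columns": ["city"]})
one_hot = OneHotCalculator().infer_output_schema(schema, {"columns": ["city"]})
```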
Source code in skyulf-core/skyulf/preprocessing/base.py
apply_method(fn)
Decorator that handles unpack/pack boilerplate around an Applier's apply.
The decorated method is written with signature (self, X, y, params)
instead of (self, df, params). The wrapper:
- Calls unpack_pipeline_input(df) to get (X, y, is_tuple).
- Invokes the user's method with the unpacked X and y.
- If the method returns a 2-tuple (X_out, y_out), that pair is packed; otherwise the result is treated as X_out and the original y is reused.
- Calls pack_pipeline_output to restore the original input shape.
Useful for ~50 Appliers that share the same boilerplate. Skip it for
splitters (which return SplitDataset directly) or analyzers that
don't transform the frame.
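The wrapper steps listed above can be sketched roughly as follows. This is a reconstruction from the description, not the code in base.py. In particular, the bodies of `unpack_pipeline_input` and `pack_pipeline_output` and the argument list of `pack_pipeline_output` are assumptions, and `DropColumnApplier` is a hypothetical Applier operating on a dict of lists in place of a DataFrame.

```python
import functools
from typing import Any, Callable, Tuple


def unpack_pipeline_input(df: Any) -> Tuple[Any, Any, bool]:
    """Stand-in: split an (X, y) tuple, or treat the input as X alone."""
    if isinstance(df, tuple):
        X, y = df
        return X, y, True
    return df, None, False


def pack_pipeline_output(X: Any, y: Any, is_tuple: bool) -> Any:
    """Stand-in (assumed signature): restore the original input shape."""
    return (X, y) if is_tuple else X


def apply_method(fn: Callable[..., Any]) -> Callable[..., Any]:
    @functools.wraps(fn)
    def wrapper(self, df, params):
        X, y, is_tuple = unpack_pipeline_input(df)
        result = fn(self, X, y, params)
        if isinstance(result, tuple) and len(result) == 2:
            # The method produced both X and y.
            X_out, y_out = result
        else:
            # The method produced X only; reuse the original y.
            X_out, y_out = result, y
        return pack_pipeline_output(X_out, y_out, is_tuple)

    return wrapper


class DropColumnApplier:
    """Hypothetical Applier written against (self, X, y, params)."""

    @apply_method
    def apply(self, X, y, params):
        return {c: v for c, v in X.items() if c != params["column"]}


applier = DropColumnApplier()
frame_out = applier.apply({"a": [1], "b": [2]}, {"column": "b"})
pair_out = applier.apply(({"a": [1], "b": [2]}, [0]), {"column": "b"})
```

Callers still use the public `(df, params)` signature; only the method body sees the unpacked `X` and `y`.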
Source code in skyulf-core/skyulf/preprocessing/base.py
fit_method(fn)
Decorator that handles unpack boilerplate around a Calculator's fit.
The decorated method is written as (self, X, y, config) and may
ignore y for X-only fits. No packing is done — fit returns a
params dict, not a frame.
The TypeVar _NodeParams preserves the specific TypedDict return type
(see preprocessing._artifacts) so callers see the concrete shape.
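A rough sketch of how such a decorator could be written, based only on the description above. The `unpack_pipeline_input` body is a stand-in, the unbounded `_NodeParams` TypeVar is an assumption about how the real one is declared, and `MinMaxArtifact` and `MinMaxCalculator` are hypothetical names.

```python
import functools
from typing import Any, Callable, Tuple, TypedDict, TypeVar

# Assumption: the real TypeVar may carry a bound; none is used here.
_NodeParams = TypeVar("_NodeParams")


def unpack_pipeline_input(df: Any) -> Tuple[Any, Any, bool]:
    """Stand-in: split an (X, y) tuple, or treat the input as X alone."""
    if isinstance(df, tuple):
        X, y = df
        return X, y, True
    return df, None, False


def fit_method(fn: Callable[..., _NodeParams]) -> Callable[..., _NodeParams]:
    @functools.wraps(fn)
    def wrapper(self, df, config):
        # Unpack only; fit returns parameters, so nothing is packed back.
        X, y, _ = unpack_pipeline_input(df)
        return fn(self, X, y, config)

    return wrapper


class MinMaxArtifact(TypedDict):
    """Hypothetical artifact shape."""

    min: float
    max: float


class MinMaxCalculator:
    @fit_method
    def fit(self, X, y, config) -> MinMaxArtifact:
        # An X-only fit: y is accepted but ignored.
        values = X[config["column"]]
        return {"min": min(values), "max": max(values)}


calc = MinMaxCalculator()
from_pair = calc.fit(({"a": [1, 5, 3]}, [0, 1, 0]), {"column": "a"})
from_frame = calc.fit({"a": [1, 5, 3]}, {"column": "a"})
```

Because the decorator is typed with the TypeVar on both sides, a type checker reports the return type of the wrapped `fit` as `MinMaxArtifact` rather than a generic mapping.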
Source code in skyulf-core/skyulf/preprocessing/base.py