Description
I am working on making an existing scikit-learn model pipeline produce probabilistic output. To do that, I used `model_builder` to make a PyMC model that could integrate into a scikit-learn `Pipeline`, including standardization of inputs and outputs. However, the current API doesn't seem suitable for this. I made my own modifications to the `ModelBuilder` class and the example `LinearModel` subclass to get it to work. I think the main change was to have the `fit` and `predict` methods take `X` and `y` as separate parameters rather than as members of a data dict with specially named keys. My reference for the scikit-learn estimator API is the scikit-learn documentation and its template for `TemplateEstimator`.
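To illustrate the signature change, here is a minimal sketch of the estimator shape scikit-learn expects: `fit(X, y)` and `predict(X)` with separate arguments. `SketchLinearModel` is a hypothetical stand-in, not the actual `ModelBuilder` or `LinearModel` code, and ordinary least squares via NumPy stands in for the PyMC sampling step, so only the calling convention is shown.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted


class SketchLinearModel(RegressorMixin, BaseEstimator):
    """Hypothetical stand-in for a ModelBuilder subclass (not the real class)."""

    def __init__(self, fit_intercept=True):
        # scikit-learn convention: __init__ only stores hyperparameters.
        self.fit_intercept = fit_intercept

    def _design(self, X):
        # Prepend a column of ones when an intercept is requested.
        if self.fit_intercept:
            return np.column_stack([np.ones(X.shape[0]), X])
        return X

    def fit(self, X, y):
        # X and y arrive as separate arguments, not as keys of a data dict.
        X, y = check_X_y(X, y)
        # Assumption: least squares replaces posterior sampling in this sketch.
        self.coef_, *_ = np.linalg.lstsq(self._design(X), y, rcond=None)
        self.n_features_in_ = X.shape[1]
        return self  # returning self lets Pipeline chain the call

    def predict(self, X):
        check_is_fitted(self, "coef_")
        X = check_array(X)
        # Point predictions only; a probabilistic model would summarize draws here.
        return self._design(X) @ self.coef_


X = np.arange(10.0).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0
model = SketchLinearModel().fit(X, y)
predictions = model.predict(X)
```

Because hyperparameters live in `__init__` and fitted state lives in attributes with a trailing underscore, utilities such as `sklearn.base.clone` and `Pipeline` can treat this class like any other regressor.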
I may well be on the wrong track (or at least on a different one than `model_builder` intends), but what I came up with seems to work for applying `sklearn.preprocessing.StandardScaler` to inputs, and to point outputs using `sklearn.compose.TransformedTargetRegressor`. These seem like reasonable things for `ModelBuilder` subclasses to be able to integrate with, so maybe tests and/or examples of such would be good.
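For concreteness, the composition I have in mind looks roughly like the sketch below. `LinearRegression` is only a placeholder for a `ModelBuilder` subclass that exposes `fit(X, y)` and `predict(X)`, and the synthetic data is made up for illustration.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data far from zero mean / unit variance, so scaling matters.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 100.0 + rng.normal(scale=0.1, size=200)

# Inputs are standardized inside the pipeline.
# Assumption: LinearRegression stands in for the PyMC-backed estimator.
inner = make_pipeline(StandardScaler(), LinearRegression())

# The target is standardized before fitting, and predictions are mapped
# back to the original scale by the transformer's inverse_transform.
model = TransformedTargetRegressor(regressor=inner, transformer=StandardScaler())
model.fit(X, y)

point_predictions = model.predict(X[:5])
```

As far as I can tell, `TransformedTargetRegressor` only inverse-transforms whatever `predict` returns, so this covers point outputs; mapping posterior samples or intervals back to the original scale would need separate handling.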
Any thoughts? I'm happy to contribute what I can.