See the prototype for hierarchical modeling added by PR #192. The idea for hierarchical modeling described in the Phase II proposal is essentially: take a group of already-fit estimators as an argument to a new second-layer estimator whose fit method calls the predict method of each already-fit estimator, concatenates their predictions along the column (feature) dimension, and uses that feature matrix as the input to the fit method of the second-layer estimator.
Initialization with a second-layer estimator and a group of already-fit estimators:
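Something along these lines (a rough sketch of the constructor only; the argument names here are assumptions based on the description above, and PR #192 has the actual prototype):

import numpy as np


class MultiLayer:
    """Second-layer estimator wrapping a group of already-fit estimators."""
    def __init__(self, estimator, estimators):
        # estimator: the not-yet-fit second-layer model
        # estimators: the already-fit first-layer estimators
        self.estimator = estimator
        self.estimators = estimators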
A concatenation utility function that is called at the start of most of the second-layer estimator's methods (fit, predict, decision_function, etc.):
def _concat_features(self, X, y=None, **kw):
    # Convert X / y to numpy arrays (row_idx tracks the original row labels)
    X, y, row_idx = self._as_numpy_arrs(X, y)
    # Call predict on each already-fit first-layer estimator
    preds = [est.predict(X) for est in self.estimators]
    # Stack the per-estimator prediction vectors as columns of the
    # second-layer feature matrix
    X2 = np.column_stack(preds)
    return X2, y
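With that helper, the second-layer fit and predict become thin wrappers (a sketch under the assumptions above, not the actual PR #192 code):

def fit(self, X, y=None, **kw):
    # Build the second-layer feature matrix from first-layer predictions
    X2, y = self._concat_features(X, y, **kw)
    self.estimator.fit(X2, y)
    return self

def predict(self, X, **kw):
    X2, _ = self._concat_features(X, **kw)
    return self.estimator.predict(X2)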
A few TODO items I know of:
- Consider cases where the estimators return:
  - Categorical y
  - Continuous y
  - y that differ in row count from estimator to estimator - raise an exception in the _concat_features function. (The predict method of each already-fit estimator has to return y of the same shape; see the shape-check sketch after this list.)
- Test that it does not matter whether the estimators are heterogeneous in structure/parameters, as long as all of them return the same shape of y.
- If I want to concatenate the predictions of N supervised classifiers or clustering estimators (or pipelines with a classifier or clusterer as the final step), then I may want to run a LabelBinarizer step to encode the second-layer feature matrix as a binary one (with a corresponding expansion of the column dimension and possibly a sparse representation). How do I do a second-layer Pipeline like that? My first thought is that the MultiLayer class from the snippets above would have a subclass (?) whose fit_transform method returns the concatenated estimators' predictions, so that any of the usual Pipeline steps could be used thereafter (see the transformer sketch after this list).
- Any limitations regarding parallelism (e.g. sending the fitted estimators via dask.distributed in EaSearchCV or related code).
- The MultiLayer _concat_features function should use elm.pipeline.predict_many (parallel prediction with dask.distributed).
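On the differing-row-count item above, the guard inside _concat_features could be as simple as the following (a hypothetical sketch; the exception type and message are placeholders):

# Inside _concat_features, after collecting preds from the first layer:
n_rows = set(pred.shape[0] for pred in preds)
if len(n_rows) != 1:
    raise ValueError('Already-fit estimators returned predictions with '
                     'differing row counts: {}'.format(sorted(n_rows)))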
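On the LabelBinarizer question above, one possible shape for the transformer-style subclass (a sketch only; the class and method names are assumptions, not the prototype's API):

class MultiLayerTransformer(MultiLayer):
    """Expose the concatenated first-layer predictions as a transform step."""
    def fit(self, X, y=None, **kw):
        # Nothing to fit - the first-layer estimators are already fit
        return self

    def transform(self, X, **kw):
        X2, _ = self._concat_features(X, **kw)
        return X2

    def fit_transform(self, X, y=None, **kw):
        X2, _ = self._concat_features(X, y, **kw)
        return X2

A scikit-learn Pipeline could then apply a binarization step to X2 before the final estimator; note that LabelBinarizer's fit(y) signature is label-oriented, so a thin wrapper (or something like OneHotEncoder) may be needed for it to act on the feature matrix inside a Pipeline.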
Also note in the documentation that this is part of answering the general question in #198:
"How do I hyperparameterize model structures (not just different model parameters)?"
If hyperparameterization across different structure choices turns out to be infeasible in some cases, then a similar alternative is the MultiLayer idea above: give MultiLayer estimators with different structures and, rather than automatically choosing the best structure(s), predict from all of them and let a second-layer estimator do the inference.
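For example (purely illustrative, assuming the hypothetical MultiLayer sketch above together with the prototype's _as_numpy_arrs helper):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# First layer: structurally different, already-fit models
first_layer = [
    RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train),
    LogisticRegression().fit(X_train, y_train),
    SVC().fit(X_train, y_train),
]

# Second layer: rather than picking a single "best" structure, predict from
# all of them and let the second-layer estimator do the inference
stacked = MultiLayer(estimator=LogisticRegression(), estimators=first_layer)
stacked.fit(X_train, y_train)
y_hat = stacked.predict(X_test)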