.. _slep_005:

=============
Resampler API
=============

:Author: Oliver Rausch ([email protected]),
         Christos Aridas ([email protected]),
         Guillaume Lemaitre ([email protected])
:Status: Draft
:Created: 2019-03-01
:Resolution: <url>

Abstract
--------

We propose the inclusion of a new type of estimator: the resampler. A
resampler changes the samples in ``X`` and ``y`` and returns both
``Xt`` and ``yt``. In short:

* a new verb/method that all resamplers must implement is introduced:
  ``fit_resample``;
* resamplers are able to reduce and/or augment the number of samples in
  ``X`` and ``y`` during ``fit``, but perform no changes during
  ``predict``;
* to facilitate this behavior, a new meta-estimator (``ResampledTrainer``)
  that allows for the composition of resamplers and estimators is proposed.
  Alternatively, we propose changes to ``Pipeline`` that enable similar
  compositions.

Motivation
----------

Sample reduction and augmentation are common parts of machine-learning
pipelines, but the current scikit-learn API offers no support for such
use cases.

Possible Use Cases
..................

* Sample rebalancing to correct the bias toward classes with large
  cardinality.
* Outlier rejection to fit on a clean dataset.
* Sample reduction, e.g. representing a dataset by its k-means centroids.
* Semi-supervised learning is currently not supported by scoring-based
  functions like ``cross_val_score``, ``GridSearchCV`` or
  ``validation_curve``, since the scorers will regard "unlabeled" as a
  separate class. A resampler could add the unlabeled samples to the dataset
  during fit time to solve this (note that this could also be solved by a new
  cv splitter).
* NaNRejector (drop all samples that contain NaN).
* Dataset augmentation (as is commonly done in deep learning).

Implementation
--------------

API and Constraints
...................

* Resamplers implement a method ``fit_resample(X, y, **kwargs)``, a pure
  function which returns ``Xt, yt, kwargs`` corresponding to the resampled
  dataset, where samples may have been added and/or removed.
* An estimator may only implement either ``fit_transform`` or
  ``fit_resample`` if support for resamplers in ``Pipeline`` is enabled
  (see the section "Limitations").
* Resamplers may not change the order, meaning, dtype or format of features
  (this is left to transformers).
* Resamplers should also handle (e.g. resample, generate anew, etc.) any
  sample-aligned kwargs, such as ``sample_weight``.
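
As an illustration of this contract, here is a minimal sketch of the
NaNRejector mentioned in the use cases above. This is a hypothetical
implementation, not code from the proposal; it only shows the shape of a
conforming ``fit_resample``.

```python
import numpy as np


class NaNRejector:
    """Hypothetical resampler: drops all samples containing NaN.

    ``fit_resample`` is a pure function returning the resampled dataset
    and the sample-aligned kwargs, resampled to match.
    """

    def fit_resample(self, X, y, **kwargs):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        keep = ~np.isnan(X).any(axis=1)
        # Sample-aligned kwargs (e.g. sample_weight) are resampled too,
        # so their length stays consistent with Xt and yt.
        kwargs = {k: np.asarray(v)[keep] for k, v in kwargs.items()}
        return X[keep], y[keep], kwargs
```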

Composition
-----------

A key part of the proposal is the introduction of a way of composing
resamplers with predictors. We present two options: ``ResampledTrainer`` and
modifications to ``Pipeline``.

Alternative 1: ResampledTrainer
...............................

This meta-estimator composes a resampler and a predictor. It behaves as
follows:

* ``fit(X, y)``: resample ``X, y`` with the resampler, then fit the predictor
  on the resampled dataset.
* ``predict(X)``: simply predict on ``X`` with the predictor.
* ``score(X)``: simply score on ``X`` with the predictor.

See PR #13269 for an implementation.

One benefit of the ``ResampledTrainer`` is that it does not stop the
resampler from having other methods, such as ``transform``, since it is
clear that the ``ResampledTrainer`` will only call ``fit_resample``.
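
The behavior above can be sketched as follows. This is an illustrative
simplification, not the implementation from PR #13269; in particular,
``copy.deepcopy`` stands in for scikit-learn's ``clone``, and all input
validation is omitted.

```python
import copy


class ResampledTrainer:
    """Illustrative sketch of the proposed meta-estimator."""

    def __init__(self, resampler, predictor):
        self.resampler = resampler
        self.predictor = predictor

    def fit(self, X, y, **kwargs):
        # Resampling happens only at fit time: the predictor is trained
        # on the resampled dataset.
        Xt, yt, kwargs = self.resampler.fit_resample(X, y, **kwargs)
        self.predictor_ = copy.deepcopy(self.predictor)  # stands in for clone()
        self.predictor_.fit(Xt, yt, **kwargs)
        return self

    # Prediction and scoring pass through to the predictor untouched;
    # no resampling occurs at test time.
    def predict(self, X):
        return self.predictor_.predict(X)

    def score(self, X, y):
        return self.predictor_.score(X, y)
```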

There are complications around supporting the ``fit_transform``,
``fit_predict`` and ``fit_resample`` methods in ``ResampledTrainer``.
``fit_transform`` support is only possible by implementing
``fit_transform(X, y)`` as ``fit(X, y).transform(X)``, rather than calling
``fit_transform`` of the predictor: the predictor is fitted on the resampled
data, but must transform the original, unresampled ``X`` so that the output
stays aligned with the input samples. ``fit_predict`` would have to behave
similarly. Thus ``ResampledTrainer`` would not work with non-inductive
estimators (TSNE, AgglomerativeClustering, etc.) as their final step. If the
predictor of a ``ResampledTrainer`` is itself a resampler, it is unclear how
``ResampledTrainer.fit_resample`` should behave. These caveats also apply to
the ``Pipeline`` modification below.

Example Usage:
~~~~~~~~~~~~~~

.. code-block:: python

    est = ResampledTrainer(RandomUnderSampler(), SVC())
    est = make_pipeline(
        StandardScaler(),
        ResampledTrainer(Birch(), make_pipeline(SelectKBest(), SVC()))
    )
    est = ResampledTrainer(
        RandomUnderSampler(),
        make_pipeline(StandardScaler(), SelectKBest(), SVC()),
    )
    clf = ResampledTrainer(
        NaNRejector(),  # removes samples containing NaN
        ResampledTrainer(RandomUnderSampler(),
                         make_pipeline(StandardScaler(), SGDClassifier()))
    )

Alternative 2: Prediction Pipeline
..................................

As an alternative to ``ResampledTrainer``, ``Pipeline`` can be modified to
accommodate resamplers. The essence of the operation is this: one or more
steps of the pipeline may be a resampler. When fitting the ``Pipeline``,
``fit_resample`` will be called on each resampler instead of
``fit_transform``, and the output of ``fit_resample`` will be used in place
of the original ``X``, ``y``, etc., to fit the subsequent step (and so on).
When predicting with the ``Pipeline``, each resampler will act as a
passthrough step.

Limitations
~~~~~~~~~~~

.. rubric:: Prohibiting ``transform`` on resamplers

It may be problematic for a resampler to provide ``transform`` if
``Pipeline`` supports resampling:

1. It is unclear what to do at test time if a resampler has a ``transform``
   method.
2. Adding ``fit_resample`` to the API of an existing transformer may
   drastically change its behaviour in a ``Pipeline``.

For this reason, it may be best to reject resamplers supporting
``transform`` from being used in a ``Pipeline``.

.. rubric:: Prohibiting ``transform`` on resampling Pipelines

Providing a ``transform`` method on a ``Pipeline`` whose last step is a
transformer and which contains a resampler presents several problems:

1. A resampling ``Pipeline`` needs to use a special code path for
   ``fit_transform`` that would call ``fit(X, y, **kw).transform(X)`` on the
   ``Pipeline``. Ordinarily a ``Pipeline`` would pass the transformed data to
   ``fit_transform`` of each remaining step. If the ``Pipeline`` contains a
   resampler, it rather needs to fit the ``Pipeline`` excluding the last
   step, then transform the original training data until the last step, then
   ``fit_transform`` the last step. This means special code paths for
   pipelines containing resamplers; the effect of the resampler is not
   localised in terms of code maintenance.
2. As a result of issue 1, appending a step to the transformation
   ``Pipeline`` means that the transformer which was previously last, and
   previously trained on the full dataset, will now be trained on the
   resampled dataset.
3. As a result of issue 1, the last step cannot be ``'passthrough'`` as in
   other transformer pipelines.

For this reason, it may be best to disable ``fit_transform`` and
``transform`` on a resampling ``Pipeline``. Such a ``Pipeline`` would
therefore not be usable as a transformation within a ``FeatureUnion`` or
``ColumnTransformer``. Thus the ``ResampledTrainer`` would be strictly more
expressive than a resampling ``Pipeline``.
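
The special code path in point 1 amounts to "fit everything on (possibly
resampled) data, then transform the original data, skipping resamplers".
A hypothetical, simplified sketch of that behavior:

```python
def fit_transform_resampling_pipeline(steps, X, y):
    """Sketch of the special fit_transform path for a resampling
    Pipeline (hypothetical helper, not the real implementation)."""
    # Fit every step; resamplers replace the training set as they go.
    Xr, yr = X, y
    for name, step in steps:
        if hasattr(step, "fit_resample"):
            Xr, yr, _ = step.fit_resample(Xr, yr)
        else:
            Xr = step.fit_transform(Xr, yr)
    # Then transform the *original* data, treating resamplers as
    # passthrough, so the output stays aligned with the input samples.
    Xt = X
    for name, step in steps:
        if not hasattr(step, "fit_resample"):
            Xt = step.transform(Xt)
    return Xt
```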

.. rubric:: Handling ``fit`` parameters

Sample props or weights cannot be routed to steps downstream of a resampler
in a ``Pipeline``, unless they too are resampled. To support this, a
resampler would need to be passed all props that are required downstream,
and ``fit_resample`` should return resampled versions of them. Note that
these must be distinct from parameters that affect the resampler's own
fitting. That is, consider the signature
``fit_resample(X, y=None, props=None, sample_weight=None)``: the
``sample_weight`` passed in should affect the resampling, but does not
itself need to be resampled. A ``Pipeline`` would pass ``props`` including
the fit parameters required downstream, which would be resampled and
returned by ``fit_resample``.
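
To make the distinction concrete, here is a hypothetical resampler using
that signature: ``sample_weight`` drives which samples are kept but is
consumed, while ``props`` is resampled and returned so downstream steps
receive aligned fit parameters. All names here are illustrative.

```python
import numpy as np


class WeightedUnderSampler:
    """Hypothetical resampler illustrating the proposed signature."""

    def __init__(self, n_samples, random_state=0):
        self.n_samples = n_samples
        self.random_state = random_state

    def fit_resample(self, X, y=None, props=None, sample_weight=None):
        X = np.asarray(X)
        rng = np.random.default_rng(self.random_state)
        p = None
        if sample_weight is not None:
            # sample_weight affects the resampling itself ...
            w = np.asarray(sample_weight, dtype=float)
            p = w / w.sum()
        idx = rng.choice(len(X), size=self.n_samples, replace=False, p=p)
        # ... while props (downstream fit parameters) are resampled and
        # returned, keeping them row-aligned with Xt and yt.
        props_t = {k: np.asarray(v)[idx] for k, v in (props or {}).items()}
        yt = None if y is None else np.asarray(y)[idx]
        return X[idx], yt, props_t
```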

Example Usage:
~~~~~~~~~~~~~~

.. code-block:: python

    est = make_pipeline(RandomUnderSampler(), SVC())
    est = make_pipeline(StandardScaler(), Birch(), SelectKBest(), SVC())
    est = make_pipeline(
        RandomUnderSampler(), StandardScaler(), SelectKBest(), SVC()
    )
    est = make_pipeline(
        NaNRejector(), RandomUnderSampler(), StandardScaler(), SGDClassifier()
    )
    est.fit(X, y, sgdclassifier__sample_weight=my_weight)

Alternative implementation
..........................

Alternatively, ``sample_weight`` could be used as a placeholder to perform
resampling. However, the current limitations are:

* ``sample_weight`` is not available for all estimators;
* ``sample_weight`` supports only simple resampling (only when the resampling
  uses original samples);
* ``sample_weight`` needs to be passed and modified within a ``Pipeline``,
  which isn't possible without something like resamplers.

Current implementation
......................

An implementation of Alternative 1 (the ``ResampledTrainer``) is available
at https://github.com/scikit-learn/scikit-learn/pull/13269.

Backward compatibility
----------------------

There are no backward incompatibilities with the current API.

Discussion
----------

* https://github.com/scikit-learn/scikit-learn/pull/13269

References and Footnotes
------------------------

.. [1] Each SLEP must either be explicitly labeled as placed in the public
   domain (see this SLEP as an example) or licensed under the `Open
   Publication License`_.

.. _Open Publication License: https://www.opencontent.org/openpub/


Copyright
---------

This document has been placed in the public domain. [1]_