[BUG] Inconsistent output of imblearn's pipeline #904
Comments
Hi, after some investigation, I found that the behavior resulted from here: imbalanced-learn/imblearn/pipeline.py Lines 303 to 311 in 6176807
In imbalanced-learn/imblearn/pipeline.py Lines 181 to 182 in 6176807
I think if we want to skip samplers during [...] I can open a PR to address this if this is indeed a bug.
@haochunchang The PR indeed solves the problem. Let's hope it gets merged soon.
I think this is a case where resampling makes the behaviour ambiguous compared to the usual way. The [...] When requesting [...] This surprising API is one of the reasons why we never adopted samplers in scikit-learn, because it breaks the contract [...] In the end, I would not consider it a bug, but we could improve the documentation to make it obvious.
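For callers who need the output of `fit_transform` to match `transform`, one option that follows from the discussion above is to call `fit` and `transform` separately. The helper below is a minimal sketch, assuming the pipeline's `fit` returns the fitted pipeline (the usual scikit-learn convention) and that `transform` skips samplers, as reported in this issue. `fit_then_transform` and `StubPipeline` are hypothetical names used only for illustration; the stub merely mimics the reported behaviour and is not imblearn code.

```python
def fit_then_transform(pipeline, X, y):
    """Fit on (X, y), then transform the original X so samplers are skipped."""
    return pipeline.fit(X, y).transform(X)


class StubPipeline:
    """Hypothetical stand-in that mimics the behaviour reported in the issue."""

    def fit(self, X, y):
        self.offset_ = min(X)
        return self

    def transform(self, X):
        # No resampling here: one output row per input row.
        return [x - self.offset_ for x in X]

    def fit_transform(self, X, y):
        # Pretend a sampler duplicated the last row while fitting.
        return self.fit(X, y).transform(X + X[-1:])


X = [3, 4, 5]
y = [0, 0, 1]

resampled = StubPipeline().fit_transform(X, y)            # 4 rows
consistent = fit_then_transform(StubPipeline(), X, y)     # 3 rows
print(len(resampled), len(consistent))
```

The cost is the one already noted in the report: the data passes through the transformers twice, once while fitting and once while transforming.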
Describe the bug
The output of imblearn's pipeline is inconsistent for `fit_transform` and `fit().transform()` (see example). This happens because the `transform` method does not apply SMOTE while transforming (as expected), whereas the `fit_transform` method applies SMOTE while fitting and returns that same resampled data.
Is this intended, and if so, why? It seems quite confusing for the user. If it's indeed a bug, I think the fix is quite straightforward, although it will make the `fit_transform` method slower, since you first have to fit the pipeline (which includes all transformations) and then transform everything again, excluding the samplers.
Steps/Code to Reproduce
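A minimal, dependency-free sketch of the mechanism described above, assuming a pipeline that applies samplers only while fitting. `ToyOversampler`, `ToyScaler` and `ToyPipeline` are hypothetical stand-ins written for illustration; they are not imblearn's implementation, and the oversampler simply duplicates rows rather than performing SMOTE.

```python
class ToyOversampler:
    """Hypothetical sampler: duplicates minority rows until classes are balanced."""

    def fit_resample(self, X, y):
        counts = {label: y.count(label) for label in set(y)}
        target = max(counts.values())
        X_out, y_out = list(X), list(y)
        for label, n in counts.items():
            rows = [x for x, lab in zip(X, y) if lab == label]
            for i in range(target - n):
                X_out.append(rows[i % len(rows)])
                y_out.append(label)
        return X_out, y_out


class ToyScaler:
    """Hypothetical transformer: subtracts the mean learned during fit."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


class ToyPipeline:
    """Hypothetical pipeline: samplers run while fitting, never in transform."""

    def __init__(self, steps):
        self.steps = steps

    def _fit(self, X, y):
        for step in self.steps:
            if hasattr(step, "fit_resample"):
                X, y = step.fit_resample(X, y)
            else:
                X = step.fit(X, y).transform(X)
        return X

    def fit(self, X, y):
        self._fit(X, y)
        return self

    def fit_transform(self, X, y):
        # Returns the data as it looked during fitting, i.e. after resampling.
        return self._fit(X, y)

    def transform(self, X):
        for step in self.steps:
            if not hasattr(step, "fit_resample"):
                X = step.transform(X)
        return X


X = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [0, 0, 0, 0, 1]

pipe = ToyPipeline([ToyOversampler(), ToyScaler()])
n_fit_transform = len(pipe.fit_transform(X, y))          # resampled: 8 rows
n_fit_then_transform = len(pipe.fit(X, y).transform(X))  # original: 5 rows
print(n_fit_transform, n_fit_then_transform)
```

With four majority rows and one minority row, the toy sampler adds three duplicates, so the two call paths return 8 and 5 rows respectively.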
Expected Results
I expected the `fit_transform` method to return data without balancing (the same as the `transform` method does).
Actual Results
Versions
System:
python: 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)]
executable: C:\Users\Mavs\Documents\Python\pycaret\venv\Scripts\python.exe
machine: Windows-10-10.0.19044-SP0
Python dependencies:
sklearn: 1.1.1
pip: 22.0.4
setuptools: 57.0.0
numpy: 1.21.5
scipy: 1.7.3
Cython: 0.29.28
pandas: 1.4.1
matplotlib: 3.5.2
joblib: 1.1.0
threadpoolctl: 3.0.0
Built with OpenMP: True
threadpoolctl info:
user_api: blas
internal_api: openblas
prefix: libopenblas
filepath: C:\Users\Mavs\Documents\Python\pycaret\venv\Lib\site-packages\numpy.libs\libopenblas.XWYDX2IKJW2NMTWSFYNGFUWKQU3LYTCZ.gfortran-win_amd64.dll
version: 0.3.17
threading_layer: pthreads
architecture: Zen
num_threads: 16
user_api: openmp
internal_api: openmp
prefix: vcomp
filepath: C:\Users\Mavs\Documents\Python\pycaret\venv\Lib\site-packages\sklearn.libs\vcomp140.dll
version: None
num_threads: 16
user_api: blas
internal_api: openblas
prefix: libopenblas
filepath: C:\Users\Mavs\Documents\Python\pycaret\venv\Lib\site-packages\scipy.libs\libopenblas.XWYDX2IKJW2NMTWSFYNGFUWKQU3LYTCZ.gfortran-win_amd64.dll
version: 0.3.17
threading_layer: pthreads
architecture: Zen
num_threads: 16
Windows-10-10.0.19044-SP0
Python 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)]
NumPy 1.21.5
SciPy 1.7.3
Scikit-Learn 1.1.1
Imbalanced-Learn 0.9.1