Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Bug in sample Parameter in feature_select(): Prevent Unintended Row Removal and Dataset Modification #495

Merged
merged 6 commits into from
Jan 23, 2025

Conversation

axiomcura
Copy link
Member

Description

Thank you for your contribution to pycytominer!
Please succinctly summarize your proposed change.
What motivated you to make this change?

This pull request closes #494, which describes when using the sample parameter unexpectedly removes rows and modifies the original dataset. The changes aim to resolve this issue, ensuring that the sample parameter behaves as intended without altering the original data and removing data.

I have also created new tests to verify that the number of rows remains unchanged when applying the samples parameter, whether through individual operations and a combination of multiple operations.

Please also link to any relevant issues that your code is associated with.

What is the nature of your change?

  • Bug fix (fixes an issue).
  • Enhancement (adds functionality).
  • Breaking change (fix or feature that would cause existing functionality to not work as expected).
  • This change requires a documentation update.

Checklist

Please ensure that all boxes are checked before indicating that a pull request is ready for review.

  • I have read the CONTRIBUTING.md guidelines.
  • My code follows the style guidelines of this project.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • New and existing unit tests pass locally with my changes.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have deleted all non-relevant text in this pull request template.

@codecov-commenter
Copy link

codecov-commenter commented Jan 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.43%. Comparing base (5743c62) to head (45b76fb).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #495      +/-   ##
==========================================
+ Coverage   94.39%   94.43%   +0.03%     
==========================================
  Files          57       57              
  Lines        3159     3181      +22     
==========================================
+ Hits         2982     3004      +22     
  Misses        177      177              
Flag Coverage Δ
unittests 94.43% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@axiomcura axiomcura changed the title Fix Bug in sample Parameter: Prevent Unintended Row Removal and Dataset Modification Fix Bug in sample Parameter in feature_select(): Prevent Unintended Row Removal and Dataset Modification Jan 21, 2025
@gwaybio
Copy link
Member

gwaybio commented Jan 21, 2025

This looks functionally good to me @axiomcura - kudos and thanks! @d33bs could you give a quick, but close look to see if there are any technical things for Erik to consider prior to merging? One thing that I could potentially see is to create a new test (rather than append to an existing one), which would be more explicitly named to test the sample parameter (e.g., test_feature_select_samples_paramater()), but I'm not sure how hard or important this would be.

Copy link
Member

@d33bs d33bs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job @axiomcura! I added some comments with this review and would be interested in getting your response. Please don't hesitate to let me know if you have any questions.

Generally I also wondered:

  • Have we covered the change in functionality with new tests? I noticed there was a test for feature_select but perhaps not the other functions which were modified. Consider adding tests for these as well to ensure the sampling edge cases are covered (to avoid surprises later).

pycytominer/cyto_utils/features.py Outdated Show resolved Hide resolved
pycytominer/operations/correlation_threshold.py Outdated Show resolved Hide resolved
pycytominer/operations/get_na_columns.py Outdated Show resolved Hide resolved
pycytominer/operations/noise_removal.py Outdated Show resolved Hide resolved
pycytominer/operations/variance_threshold.py Outdated Show resolved Hide resolved
tests/test_feature_select.py Outdated Show resolved Hide resolved
pycytominer/operations/noise_removal.py Outdated Show resolved Hide resolved
@axiomcura
Copy link
Member Author

@d33bs and @gwaybio, I have taken all of your feedback into consideration on this pull request. I have written additional documentation into the operations module for feature selection and I have also updated the test module. Just waiting for an approval in order to merge!

Copy link
Member

@d33bs d33bs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @axiomcura for making these changes! LGTM

@d33bs d33bs merged commit 8fdbec9 into cytomining:main Jan 23, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: samples parameter in feature_select function reduces the rows in output
4 participants