Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Spherize – use SVD, simplify calculations #320

Merged
merged 32 commits into from
Jan 26, 2024

Conversation

shntnu
Copy link
Member

@shntnu shntnu commented Sep 9, 2023

Description

Make several updates to Spherize

  • Use SVD for sphering instead of first computing covariance matrix followed by eigenvalue decomposition. This notebook demonstrates the equivalence svd-eig.pdf.
  • Use epsilon to regularize, not clip (the former is more standard; unclear why we were doing the latter)
  • PCA-cor and ZCA-cor necessary center the data, so raise an error if centering is False
  • Handle wide matrices

What is the nature of your change?

  • Bug fix (fixes an issue).
  • Enhancement (adds functionality).

Checklist

Please ensure that all boxes are checked before indicating that a pull request is ready for review.

  • I have read the CONTRIBUTING.md guidelines.
  • My code follows the style guidelines of this project.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • New and existing unit tests pass locally with my changes.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have deleted all non-relevant text in this pull request template.

@codecov-commenter
Copy link

codecov-commenter commented Sep 9, 2023

Codecov Report

Attention: 20 lines in your changes are missing coverage. Please review.

Comparison is base (cf313bf) 95.57% compared to head (98f8d6f) 95.01%.

Files Patch % Lines
pycytominer/operations/transform.py 73.58% 14 Missing ⚠️
pycytominer/normalize.py 45.45% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #320      +/-   ##
==========================================
- Coverage   95.57%   95.01%   -0.56%     
==========================================
  Files          56       56              
  Lines        3138     3187      +49     
==========================================
+ Hits         2999     3028      +29     
- Misses        139      159      +20     
Flag Coverage Δ
unittests 95.01% <76.19%> (-0.56%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@shntnu shntnu marked this pull request as ready for review September 9, 2023 23:10
@shntnu shntnu requested a review from gwaybio September 9, 2023 23:23
@shntnu shntnu changed the title Fix Spherize – use SVD. simplify. calculations Fix Spherize – use SVD, simplify calculations Sep 9, 2023
@shntnu shntnu marked this pull request as draft September 10, 2023 02:20
@shntnu shntnu removed the request for review from gwaybio September 10, 2023 02:20
Copy link
Member

@gwaybio gwaybio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to see this fix, especially since the tests pass!

I made several inline comments, which we should discuss prior to merging. My general comments include:

  1. Please add tests for entry into the assertion that centering must be performed
  2. We really should also add testing conditions for the other assertions
  3. The other assertion statement errors need more user-friendly (read, less potential for frustration) text. Can we prompt a user on what to fix if they run into one?
  4. I'm wondering if an easier fix is to optionally return the numpy array rather than the DataFrame in Spherize() itself.

I made more specific comments for each of these four general comments in line below.

Thanks again!

pycytominer/normalize.py Outdated Show resolved Hide resolved
pycytominer/normalize.py Outdated Show resolved Hide resolved
pycytominer/operations/transform.py Outdated Show resolved Hide resolved
pycytominer/operations/transform.py Outdated Show resolved Hide resolved
pycytominer/operations/transform.py Show resolved Hide resolved
pycytominer/operations/transform.py Outdated Show resolved Hide resolved
pycytominer/operations/transform.py Outdated Show resolved Hide resolved
pycytominer/operations/transform.py Show resolved Hide resolved
tests/test_normalize.py Outdated Show resolved Hide resolved
tests/test_operations/test_transform.py Show resolved Hide resolved
@gwaybio
Copy link
Member

gwaybio commented Sep 14, 2023

Oops! @shntnu - I just saw the "removed the request" and WIP tag. Feel free to consider my comments as preliminary or to ignore them

@johnarevalo
Copy link
Member

This PR fixes #179

@shntnu shntnu marked this pull request as ready for review December 11, 2023 00:51
@shntnu
Copy link
Member Author

shntnu commented Dec 11, 2023

@gwaybio sorry, it took a while to return to this.

Thanks for the excellent code review and suggestions!

  • Please add tests for entry into the assertion that centering must be performed

Done

  • We really should also add testing conditions for the other assertions

First, I changed all assertions to exceptions because that's the preferred way of doing things (outside of tests).

I then added tests to enter these exceptions, wherever it made sense to do so.

In some cases, it isn't possible to do so (easily) because it would involve modifying the current behavior of functions in numpy.

See https://app.codecov.io/gh/cytomining/pycytominer/pull/320?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cytomining

These checks serve to future-proof against changes in numpy behavior (specifically, in np.linalg.svd).
I created a util create_bug_report_message to nudge users to report a bug if ever these exceptions are triggered.

Note that, in general, it is possible to test for these exceptions by mocking upstream dependencies but that's brinking on over-testing IMO.

Let me know what you think, specifically whether using create_bug_report_message is a wise idea.
I'm slightly hesistant to introduce this because I feel there should be some standard engineering pattern I should be following here instead of crafting my own.

  • The other assertion statement errors need more user-friendly (read, less potential for frustration) text. Can we prompt a user on what to fix if they run into one?

See above

  • I'm wondering if an easier fix is to optionally return the numpy array rather than the DataFrame in Spherize() itself.

This was such a great suggestion! Done.

This is now ready for your review.

@shntnu shntnu requested a review from gwaybio December 11, 2023 14:08
Copy link
Member

@gwaybio gwaybio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating this code Shantanu!

I'm going to propose the following to get this merged in ASAP

  • @MattsonCam - can you focus on the math side of things? The current tests pass so we can be confident about accuracy, but it would be great for you to read for understanding and provide suggestions on where the documentation can be upgraded.
  • @kenibrewer - can you focus on the integration aspects? The added bug report function in util.py is neat, but you have your finger on the pulse of the pycytominer roadmap. Can you provide your thoughts here?

I think this can be a relatively quick PR merge, which I will do once we have these two approvals. Sorry about the delays Shantanu, we're right in the middle of me phasing out of PR review in favor of much more knowledgeable/qualified maintainers :)

Copy link
Member

@kenibrewer kenibrewer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to move the new create_bug_report_message function into a separate branch and pull request to be reviewed separately? There are a few aspects about the new pattern that merit discussion and I would prefer to keep this PR focused specifically on the spherize/SVD changes.

EDIT: To clarify. I agree with Greg that the feature has value, but it would be good to have a plan (and some documentation) about when pycytominer wants to "own" the error message as a bug and when we want to consider it data/user error and instead provide instructions on how to fix the problem.

Copy link

@MattsonCam MattsonCam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very detailed pr. I left a few comments to consider. Great job @shntnu !

pycytominer/cyto_utils/util.py Outdated Show resolved Hide resolved
pycytominer/normalize.py Outdated Show resolved Hide resolved
pycytominer/operations/transform.py Show resolved Hide resolved
pycytominer/operations/transform.py Show resolved Hide resolved
pycytominer/operations/transform.py Show resolved Hide resolved
pycytominer/operations/transform.py Show resolved Hide resolved
pycytominer/operations/transform.py Outdated Show resolved Hide resolved
pycytominer/operations/transform.py Outdated Show resolved Hide resolved
pycytominer/operations/transform.py Outdated Show resolved Hide resolved
@shntnu
Copy link
Member Author

shntnu commented Jan 9, 2024

Would it be possible to move the new create_bug_report_message function into a separate branch and pull request to be reviewed separately?

Makes sense, but instead, I'll just copy the code snippet here because I bet you'd want to do this more systematically later

def create_bug_report_message(error_detail, context):
    """
    Constructs an error message with a bug report prompt.

    Parameters
    ----------
    error_detail: str
        Description of the error.
    context: str
        A description of the context for the error.
    Returns
    -------
    str
        Error message with a link to file a bug report.
    """

    bug_report_url = "https://github.com/cytomining/pycytominer/issues"
    return (
        f"{error_detail} This is likely a bug in `{context}`. "
        "Please help us improve our software by filing a bug report at "
        f"{bug_report_url}."
    )


def test_create_bug_report_message():
    error_detail = "Division by zero error occurred."
    context = "calculate_statistics function"
    actual_message = create_bug_report_message(error_detail, context)

    # check that the strings `error_detail` and `context` are in the message
    assert error_detail in actual_message
    assert context in actual_message

    # check that the message contains a link to file a bug report
    assert "https://github.com/cytomining/pycytominer/issues" in actual_message

@shntnu
Copy link
Member Author

shntnu commented Jan 9, 2024

Very detailed pr. I left a few comments to consider. Great job @shntnu !

Thank you for nudging me to clarify the math further @MattsonCam! I understood my code better after explaining it :D I've resolved all your comments but feel free to reopen if necessary.

@gwaybio
Copy link
Member

gwaybio commented Jan 12, 2024

It would be great to get this fix merged in.

I just chatted with @MattsonCam and he will give a 👍 when he gets a chance

@kenibrewer - is this ready from your perspective?

@MattsonCam
Copy link

Very detailed pr. I left a few comments to consider. Great job @shntnu !

Thank you for nudging me to clarify the math further @MattsonCam! I understood my code better after explaining it :D I've resolved all your comments but feel free to reopen if necessary.

Absolutely! Thank you for addressing them @shntnu! I've finished addressing your response comments, but didn't feel it was imperative to reopen any of them.

It would be great to get this fix merged in.

I just chatted with @MattsonCam and he will give a 👍 when he gets a chance

@kenibrewer - is this ready from your perspective?

I give the 👍

@shntnu
Copy link
Member Author

shntnu commented Jan 20, 2024

I give the 👍

Hooray!

Over to one of you to merge once the tests pass (I made a documentation edit)

@gwaybio
Copy link
Member

gwaybio commented Jan 26, 2024

this looks good to me @shntnu - I am going to merge this in now. Thanks for this contribution and for your patience as we reviewed! 🎉

@gwaybio gwaybio self-requested a review January 26, 2024 17:17
@gwaybio
Copy link
Member

gwaybio commented Jan 26, 2024

@kenibrewer - thanks for your initial review. Given @MattsonCam 's approval (and your initial review focused on the bug report document now described in #365 ) - I will merge this in now to avoid additional delays. Thanks!

@gwaybio gwaybio dismissed kenibrewer’s stale review January 26, 2024 17:19

2 weeks without activity and see #320 (comment)

@gwaybio gwaybio merged commit 8541374 into cytomining:main Jan 26, 2024
9 checks passed
@kenibrewer
Copy link
Member

Sorry I missed the pings here. I had modified my github notifications settings incorrectly and didn't get the usual notification emails I rely on.

Everything looks good to me.

@gwaybio
Copy link
Member

gwaybio commented Jan 26, 2024

No worries!

@shntnu
Copy link
Member Author

shntnu commented Jan 26, 2024

Hooray! Thank you all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants