Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve predictability of DataQuery, DataID, and dependency tree #3018

Draft
wants to merge 22 commits into
base: main
Choose a base branch
from

Conversation

djhoese
Copy link
Member

@djhoese djhoese commented Dec 13, 2024

Disclaimer: This PR's implementation is very rough at the time of writing.

This PR includes major changes to key parts of Satpy in order to resolve some inconsistencies noticed by users over the years. The high-level concepts that have been changed or updated are:

  1. DataQuery objects are now only equal to DataID objects that match all of the queries keys. Previously only the shared keys were compared. This old way meant that a DataQuery could match a DataID that didn't contain all the necessary information requested by the query.
  2. The "resolution" DataID key was not not transitive for all ID key sets. So for "default" ID keys it was False and for coordinate and minimal sets it was transitive. It made the most sense to set it to False. That is, a modified dataset is not required to have all dependencies be of the same resolution.
  3. Add and refactor a lot of tests regarding DataQuery and DataID comparisons.
  4. It should be possible to load a composite with two different sets of inputs (ex. DataQuery(name="comp", resolution=500), DataQuery(name="comp", resolution=1000)).

Remaining work

  • There are a lot of edge cases that need to be worked out. The biggest one is what happens when a DataQuery has a list of possible options. That is not handled in a lot of my dependency tree stuff.
  • See "Hindsight" below.
  • Refactoring
  • More explicit tests

Hindsight

For high-level change 1 above, I'm starting to think this was the wrong change or at least that the previous behavior had a good point. That is, if a user creates a query with a lot of keys to apply to many DataIDs from different sources, then not all DataIDs should be required to have all those keys. For example, if I specify a polarization in my query, then I don't think all composites or rather composite dependencies would be able to match that. There are currently no tests to verify this.

  • Closes #xxxx
  • Tests added
  • Fully documented
  • Add your name to AUTHORS.md if not there already

@djhoese djhoese added bug component:compositors component:dep_tree Dependency tree and dataset loading labels Dec 13, 2024
Copy link

codecov bot commented Dec 13, 2024

Codecov Report

Attention: Patch coverage is 96.96312% with 14 lines in your changes missing coverage. Please review.

Project coverage is 96.08%. Comparing base (5984c29) to head (c45ed8d).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
satpy/dataset/id_keys.py 96.15% 5 Missing ⚠️
satpy/dependency_tree.py 83.33% 5 Missing ⚠️
satpy/dataset/dataid.py 97.50% 2 Missing ⚠️
satpy/tests/test_dependency_tree.py 98.19% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3018      +/-   ##
==========================================
- Coverage   96.10%   96.08%   -0.02%     
==========================================
  Files         377      378       +1     
  Lines       55163    55213      +50     
==========================================
+ Hits        53012    53050      +38     
- Misses       2151     2163      +12     
Flag Coverage Δ
behaviourtests 3.98% <24.72%> (+0.04%) ⬆️
unittests 96.17% <96.96%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@coveralls
Copy link

coveralls commented Dec 14, 2024

Pull Request Test Coverage Report for Build 12381834226

Details

  • 447 of 461 (96.96%) changed or added relevant lines in 25 files are covered.
  • 9 unchanged lines in 4 files lost coverage.
  • Overall coverage decreased (-0.02%) to 96.188%

Changes Missing Coverage Covered Lines Changed/Added Lines %
satpy/dataset/dataid.py 78 80 97.5%
satpy/tests/test_dependency_tree.py 109 111 98.2%
satpy/dataset/id_keys.py 125 130 96.15%
satpy/dependency_tree.py 25 30 83.33%
Files with Coverage Reduction New Missed Lines %
satpy/dependency_tree.py 1 95.77%
satpy/tests/utils.py 2 93.16%
satpy/tests/reader_tests/gms/test_gms5_vissr_l1b.py 3 98.67%
satpy/tests/reader_tests/gms/test_gms5_vissr_navigation.py 3 97.18%
Totals Coverage Status
Change from base Build 12299617024: -0.02%
Covered Lines: 53294
Relevant Lines: 55406

💛 - Coveralls

@djhoese djhoese force-pushed the bugfix-greedy-dataid branch from 46eafc6 to de36fb8 Compare December 17, 2024 21:20
new_id_dict = orig_id.to_dict()
orig_id_keys = orig_id.id_keys
for query_key, query_val in query_dict.items():
# XXX: What if the query_val is a list?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my main task remaining. I'm really not sure how I want to treat this. If you ask for a composite as DataQuery(name="some_comp", resolution=[500, 1000]), what do you set in the DataID? This is before we know what dependencies were found. So does the composite DataID become DataID(name="some_comp") with no resolution, resolution 500, or resolution 1000?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug component:compositors component:dep_tree Dependency tree and dataset loading
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants