Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: allele rle normalization + pin pydantic version #234
feat: allele rle normalization + pin pydantic version #234
Changes from 9 commits
c8f5419
1ed754c
8ac1649
2113235
b579734
59424bc
14e5bc9
6736891
9938462
1b7c2ff
daa2195
812f480
af267f0
05deaca
b64aa8b
2e1923a
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not following this. An IRI isn't guaranteed to have the digest in it, so we should always assume a
SequenceReference
, or have a mechanism for resolving the IRI to get aSequenceReference
. I might be missing something here, but why not simply remove this block and then revise line 107 to pull from thedigest
field expected in every object?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was initially done by @theferrit32 . @ahwagner is the digest expected in every VRS object? In the models.py, the
digest
is set as an optional field for all VRS objects.I'm trying to figure out why this was done, but I'm now realizing that this is an issue if
allele.location.sequence
is not defined. ASequenceLocation.sequence
is listed as an optional field. @ahwagner can you explain why this is not a required field?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is something that I think @theferrit32 was working on. It is available to every object, and our digest strategy in VRS 2 will compute this for every object, whether or not it is an identifiable object. I believe we were going to be using this for all objects in VRS-Python.
Yes, it is optional to use this attribute in JSON Schema, because when used in an
Allele
that is part of aHaplotype
, theSequenceLocation.sequence
can be omitted from VRS messages, as they (by definition) will match thesequence
of the parentHaplotype
object. However, it is required from thega4ghDigest.keys
for creating computed digests, because it is still a critical component of the value of aSequenceLocation
. In those cases, it is expected that the system loading the VRSHaplotype
object would refer to / copy over theHaplotype.sequence
for theHaplotype.member[*].location.sequence
properties.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahwagner I think there might be some confusion regarding the
digest
field. At the moment,digest
is optional and does not get computed when a VRS Object is created. Should this field be populated each time a user creates a VRS Object?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahwagner is suggesting to switch this condition to IRI. Some test cases:
#seqrefs/myseq123
andHTTPS://w3id.org/NM_012345
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will need to dereference the
#seqrefs/myseq123
outside this function because this function only receives the allele to be normalized, it won't have access to the full original document the allele came from, where#seqrefs/myseq123
can be resolved from.We can create another function like
ga4gh_inline
(or something) that takes a JSON document, finds the GA4GH objects in it, and for any field whose value is an IRI that is a relative pointer, inline the object it points to in that field, if the document contains it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do that the
type
fields would then need to be required on the input JSON/dict. They are not currently required because the type fields are defined as literals and when you construct a particular class with some input it assumes it is that type and fills in the type field.If we don't want this constraint, we could just traverse the input document and replace all field values that look like JSON pointers (not just those in fields that are defined as IRIs in the VRS models) with the objects they refer to. This would also let people use the JSON pointer thing in non-model fields. Like if someone has a statement and a custom field they added that isn't in the model, but they want to refer to the variant in the same document from there using a pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@theferrit32 updated comments + tests with deferenced IRIs. Let me know if I need to make any more changes!