-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rust: Add support for MaD sources and sinks with access paths #18298
base: main
Are you sure you want to change the base?
Conversation
cb85b26
to
707bebb
Compare
efbbea1
to
67f7387
Compare
67f7387
to
1b31c90
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm looking forward to this PR being merged. I want to use this functionality in several places already, the tests look like exactly what I need, and I'm experimenting with this branch locally.
I don't feel very confident reviewing the code changes in this PR though - if there are any specific parts you'd like me to look at or discuss please say so.
extensible: sourceModel | ||
data: | ||
- ["repo::test", "crate::enum_source", "ReturnValue.Variant[crate::MyFieldEnum::D::field_d]", "test-source", "manual"] | ||
- ["repo::test", "<crate::MyFieldEnum>::source", "ReturnValue.Variant[crate::MyFieldEnum::C::field_c]", "test-source", "manual"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the repo::
prefix mean, and why do we need the crate::
prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The repo::test
bit matches Resolvable.getResolvedCrateOrigin
and crate::enum_source
matches Resolvable.getResolvedPath
, so this is simply what we get from the extractor. As for the repo
prefix, I'm not sure, but for the crate
prefix this appears to be always present in canonical paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would this work if there are items from multiple crates involved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect those to have different getResolvedCrateOrigin
values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect those to have different
getResolvedCrateOrigin
values.
That's right. I was wondering about the case where a function from one crate has arguments or return values of types in other crates. How would the lines above look if MyFieldEnum
was defined in a different crate than enum_source
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right. I decided to not include the crate origins in the Variant
arguments, since that would bloat things. But you are right that this can potentially give rise to some ambiguity, if two crates define variants with the same canonical paths.
@aschackmull : Are you able to review the shared bits in |
TSourceOutputNode(SourceNode source, SummaryNodeState state, string kind, string model) { | ||
state.isSourceOutputState(source, _, kind, model) | ||
} or | ||
TSinkInputNode(SinkNode sink, SummaryNodeState state, string kind, string model) { | ||
state.isSinkInputState(sink, _, kind, model) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think something needs to change here. We worked very hard to untangle and stratify DataFlow::Node
vs. SummaryNode
such that FlowSummaryImpl
wasn't dependent on DataFlow::Node
. But now they've been made mutually recursive. We need to fix that, i.e. we should not embed DataFlow::Node
s into SummaryNode
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SinkNode
is not a subset of DataFlow::Node
, rather it is a subset of the parameterized type SinkBase
, which for Rust is subset of AstNode
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! Right.
So I guess the only new connection to DataFlow::Node
is in sourceLocalStep
via StepsInput::getSourceNode
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe SourceNode
and SinkNode
ought to be called something different, because it's perhaps somewhat unclear what kind of "nodes" they are supposed to be - at least I got confused. They're not DataFlow::Node
s and they are not SummaryNode
s. Maybe simply use SourceElement
/SinkElement
instead? And likely elaborate related qldoc a bit.
9dfebe3
to
868caf9
Compare
Overview
This PR adds support for defining flow sources/sinks for Rust with non-trivial access paths. For example, one may define
which models calls to
crate::my_option_source
as flow sources of kindtest-source
, but where the origin of tainted data is not directly the result of the call, but rather stored insideOption::Some
in the result of the call.Similarly, one may define a sink restricted to data stored inside
Option::Some
:Implementation
The implementation piggy-backs on the existing
FlowSummaryImpl
library, which already has functionality for specifying flow summaries with non-trivial input/output access paths. For a flow source likemy_option_source
above, we synthesize a store step from synthetic source nodes to actual calls tomy_option_source
, and dually for sinks likemy_option_sink
, we synthesize read steps from arguments to synthetic sink nodes.The functionality is currently limited to Rust, but other languages should be able to adopt relatively easily.