Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rust: Add support for MaD sources and sinks with access paths #18298

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

hvitved
Copy link
Contributor

@hvitved hvitved commented Dec 16, 2024

Overview

This PR adds support for defining flow sources/sinks for Rust with non-trivial access paths. For example, one may define

extensions:
  - addsTo:
      pack: codeql/rust-all
      extensible: sourceModel
    data:
      - ["repo::test", "crate::my_option_source", "ReturnValue.Variant[crate::option::Option::Some(0)]", "test-source", "manual"]

which models calls to crate::my_option_source as flow sources of kind test-source, but where the origin of tainted data is not directly the result of the call, but rather stored inside Option::Some in the result of the call.

Similarly, one may define a sink restricted to data stored inside Option::Some:

extensions:
  - addsTo:
      pack: codeql/rust-all
      extensible: sinkModel
    data:
      - ["repo::test", "crate::my_option_sink", "Argument[0].Variant[crate::option::Option::Some(0)]", "test-sink", "manual"]

Implementation

The implementation piggy-backs on the existing FlowSummaryImpl library, which already has functionality for specifying flow summaries with non-trivial input/output access paths. For a flow source like my_option_source above, we synthesize a store step from synthetic source nodes to actual calls to my_option_source, and dually for sinks like my_option_sink, we synthesize read steps from arguments to synthetic sink nodes.

The functionality is currently limited to Rust, but other languages should be able to adopt relatively easily.

@github-actions github-actions bot added Rust Pull requests that update Rust code DataFlow Library labels Dec 16, 2024
@hvitved hvitved force-pushed the rust/mad-source-sink branch 2 times, most recently from efbbea1 to 67f7387 Compare December 18, 2024 09:46
@hvitved hvitved added the no-change-note-required This PR does not need a change note label Dec 18, 2024
@hvitved hvitved marked this pull request as ready for review December 18, 2024 10:09
@hvitved hvitved requested review from a team as code owners December 18, 2024 10:09
@hvitved hvitved force-pushed the rust/mad-source-sink branch from 67f7387 to 1b31c90 Compare January 6, 2025 12:27
Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking forward to this PR being merged. I want to use this functionality in several places already, the tests look like exactly what I need, and I'm experimenting with this branch locally.

I don't feel very confident reviewing the code changes in this PR though - if there are any specific parts you'd like me to look at or discuss please say so.

extensible: sourceModel
data:
- ["repo::test", "crate::enum_source", "ReturnValue.Variant[crate::MyFieldEnum::D::field_d]", "test-source", "manual"]
- ["repo::test", "<crate::MyFieldEnum>::source", "ReturnValue.Variant[crate::MyFieldEnum::C::field_c]", "test-source", "manual"]
Copy link
Contributor

@aibaars aibaars Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the repo:: prefix mean, and why do we need the crate:: prefix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The repo::test bit matches Resolvable.getResolvedCrateOrigin and crate::enum_source matches Resolvable.getResolvedPath, so this is simply what we get from the extractor. As for the repo prefix, I'm not sure, but for the crate prefix this appears to be always present in canonical paths.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would this work if there are items from multiple crates involved?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect those to have different getResolvedCrateOrigin values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect those to have different getResolvedCrateOrigin values.

That's right. I was wondering about the case where a function from one crate has arguments or return values of types in other crates. How would the lines above look if MyFieldEnum was defined in a different crate than enum_source ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. I decided to not include the crate origins in the Variant arguments, since that would bloat things. But you are right that this can potentially give rise to some ambiguity, if two crates define variants with the same canonical paths.

@hvitved
Copy link
Contributor Author

hvitved commented Jan 8, 2025

@aschackmull : Are you able to review the shared bits in FlowSummaryImpl.qll?

Comment on lines 1025 to 1029
TSourceOutputNode(SourceNode source, SummaryNodeState state, string kind, string model) {
state.isSourceOutputState(source, _, kind, model)
} or
TSinkInputNode(SinkNode sink, SummaryNodeState state, string kind, string model) {
state.isSinkInputState(sink, _, kind, model)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think something needs to change here. We worked very hard to untangle and stratify DataFlow::Node vs. SummaryNode such that FlowSummaryImpl wasn't dependent on DataFlow::Node. But now they've been made mutually recursive. We need to fix that, i.e. we should not embed DataFlow::Nodes into SummaryNodes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SinkNode is not a subset of DataFlow::Node, rather it is a subset of the parameterized type SinkBase, which for Rust is subset of AstNode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Right.
So I guess the only new connection to DataFlow::Node is in sourceLocalStep via StepsInput::getSourceNode?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe SourceNode and SinkNode ought to be called something different, because it's perhaps somewhat unclear what kind of "nodes" they are supposed to be - at least I got confused. They're not DataFlow::Nodes and they are not SummaryNodes. Maybe simply use SourceElement/SinkElement instead? And likely elaborate related qldoc a bit.

@hvitved hvitved force-pushed the rust/mad-source-sink branch from 9dfebe3 to 868caf9 Compare January 8, 2025 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C# C++ DataFlow Library Go Java no-change-note-required This PR does not need a change note Python Ruby Rust Pull requests that update Rust code Swift
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants