Python: deprecate AstNode.getAFlowNode() and rewrite internal callers#21919
Python: deprecate AstNode.getAFlowNode() and rewrite internal callers#21919yoff wants to merge 4 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Removes the public Python QL API predicate AstNode.getAFlowNode() and rewrites callers to use the inverse mapping via ControlFlowNode.getNode(), as a preparatory refactor for the Python shared-CFG dataflow migration.
Changes:
- Delete
AstNode.getAFlowNode()frompython/ql/lib/semmle/python/AstExtended.qll. - Mechanically rewrite call sites from
e.getAFlowNode() = nton.getNode() = eacross library, queries, tests, and tools. - Add a breaking change-note documenting the API removal and the migration pattern.
Show a summary per file
| File | Description |
|---|---|
| python/tools/recorded-call-graph-metrics/ql/lib/RecordedCalls.qll | Rewrites recorded call resolution to match AST↔CFG via getNode(). |
| python/ql/test/library-tests/PointsTo/local/LocalPointsTo.ql | Updates PointsTo test to bind CFG nodes using getNode(). |
| python/ql/test/library-tests/PointsTo/global/Global.ql | Updates PointsTo test to bind CFG nodes using getNode(). |
| python/ql/test/library-tests/dataflow/tainttracking/TestTaintLib.qll | Rewrites a tainttracking test to compare through ControlFlowNode.getNode(). |
| python/ql/test/library-tests/ControlFlow/splitting/NodeCount.ql | Replaces flow-node counting with a ControlFlowNode comprehension over getNode(). |
| python/ql/test/library-tests/ControlFlow/PointsToSupport/UseFromDefinition.ql | Rewrites DefinitionNode binding from AST using getNode(). |
| python/ql/test/experimental/import-resolution/importflow.ql | Updates import-resolution experiment to pass CFG nodes without getAFlowNode(). |
| python/ql/test/2/library-tests/comprehensions/ConsistencyCheck.ql | Updates comprehension consistency check to use `exists(ControlFlowNode n |
| python/ql/src/Variables/UnusedModuleVariable.ql | Rewrites reachability use from u.getAFlowNode() to an explicit CFG node bound by getNode(). |
| python/ql/src/Variables/UndefinedPlaceHolder.ql | Rewrites SSA/points-to use binding to CFG nodes via getNode(). |
| python/ql/src/Variables/UndefinedGlobal.ql | Rewrites control-flow containment/dominance checks to bind CFG nodes via getNode(). |
| python/ql/src/Variables/Undefined.qll | Rewrites loop-entry edge logic to bind statement CFG nodes via getNode(). |
| python/ql/src/Variables/SuspiciousUnusedLoopIterationVariable.ql | Updates comprehension CFG predecessor navigation to use getNode(). |
| python/ql/src/Variables/MultiplyDefined.ql | Rewrites assignment-definition comparison to use Definition.getNode() = <ast>. |
| python/ql/src/Variables/Loop.qll | Rewrites loop-to-use reachability to use CFG nodes bound via getNode(). |
| python/ql/src/Variables/LeakingListComprehension.ql | Rewrites reachability/dominance constraints to use CFG nodes bound via getNode(). |
| python/ql/src/Variables/Definition.qll | Rewrites list-comprehension declaration reachability to use CFG nodes bound via getNode(). |
| python/ql/src/Statements/SideEffectInAssert.ql | Rewrites subprocess-call matching to bind the call CFG node via getNode(). |
| python/ql/src/Statements/NonIteratorInForLoop.ql | Rewrites for-iter binding to iter.getNode() = loop.getIter(). |
| python/ql/src/Statements/NestedLoopsSameVariableWithReuse.ql | Rewrites SSA definition matching to compare via getNode(). |
| python/ql/src/Statements/IterableStringOrSequence.ql | Rewrites for-iter binding to use getNode(). |
| python/ql/src/semmle/python/functions/ModificationOfParameterWithDefaultCustomizations.qll | Rewrites augmented-assignment target matching to use asCfgNode().getNode(). |
| python/ql/src/Security/CWE-798/HardcodedCredentials.ql | Rewrites keyword-value matching to use asCfgNode().getNode() = k.getValue(). |
| python/ql/src/Resources/FileOpen.qll | Rewrites return-value/SSA linking to use AST via getNode() rather than getAFlowNode(). |
| python/ql/src/Functions/ReturnValueIgnored.ql | Rewrites the “unused return value” counting logic to bind call CFG nodes via getNode(). |
| python/ql/src/Functions/ExplicitReturnInInit.ql | Rewrites “never returns” call detection to bind return-value CFG via getNode(). |
| python/ql/src/Expressions/TruncatedDivision.ql | Rewrites BinaryExprNode binding to bin.getNode() = div. |
| python/ql/src/Expressions/IsComparisons.qll | Rewrites CompareNode binding to fcomp.getNode() = comp. |
| python/ql/src/Expressions/IncorrectComparisonUsingIs.ql | Rewrites CompareNode binding to fcomp.getNode() = comp. |
| python/ql/src/Expressions/Formatting/AdvancedFormatting.qll | Rewrites format-string points-to checks to pass a CFG node bound via getNode(). |
| python/ql/src/Expressions/DuplicateKeyInDictionaryLiteral.ql | Rewrites block-order and dominance comparisons to use CFG nodes bound via getNode(). |
| python/ql/src/Expressions/CallArgs.qll | Rewrites call matching helpers to bind call CFG nodes via getNode(). |
| python/ql/src/Exceptions/CatchingBaseException.ql | Rewrites except-block “reaches exit” check to use CFG nodes bound via getNode(). |
| python/ql/src/Classes/ClassAttributes.qll | Rewrites SSA and points-to bindings for attributes to use CFG nodes via getNode(). |
| python/ql/src/analysis/PointsToFailure.ql | Rewrites points-to failure detection to bind CFG nodes via getNode(). |
| python/ql/src/analysis/KeyPointsToFailure.ql | Rewrites key points-to failure logic to bind SSA uses via CFG nodes from getNode(). |
| python/ql/src/analysis/ImportFailure.ql | Rewrites import failure checks to bind import CFG nodes via getNode(). |
| python/ql/src/analysis/CrossProjectDefinitions.qll | Rewrites attribute resolution plumbing to pass CFG nodes bound via getNode(). |
| python/ql/lib/semmle/python/types/Object.qll | Rewrites vararg-origin checks to compare via ControlFlowNode.getNode(). |
| python/ql/lib/semmle/python/types/FunctionObject.qll | Rewrites parameter-definition linking to compare via getNode(). |
| python/ql/lib/semmle/python/types/Extensions.qll | Rewrites loop iterable/target bindings to compare via getNode(). |
| python/ql/lib/semmle/python/types/Exceptions.qll | Rewrites tuple-element extraction to bind element CFG node via getNode(). |
| python/ql/lib/semmle/python/types/ClassObject.qll | Rewrites metaclass/origin bindings to use CFG→AST via getNode(). |
| python/ql/lib/semmle/python/SelfAttribute.qll | Rewrites dominance/guard checks to bind self/object CFG nodes via getNode(). |
| python/ql/lib/semmle/python/pointsto/PointsTo.qll | Rewrites import/base/decorator/metaclass SSA bindings to use CFG→AST via getNode(). |
| python/ql/lib/semmle/python/objects/TObject.qll | Rewrites decorator CFG binding to use deco.getNode() = f.getADecorator(). |
| python/ql/lib/semmle/python/objects/Classes.qll | Rewrites class-expression origin binding to use an explicit CFG node via getNode(). |
| python/ql/lib/semmle/python/objects/Callables.qll | Rewrites callable-expression origin binding and return BB lookup via getNode(). |
| python/ql/lib/semmle/python/internal/CachedStages.qll | Rewrites cached-stage trigger from AST→CFG existence to CFG→AST via getNode(). |
| python/ql/lib/semmle/python/Import.qll | Removes now-dangling getAFlowNode() override on ImportMember. |
| python/ql/lib/semmle/python/frameworks/FlaskAdmin.qll | Rewrites decorator matching for route handlers via node.getNode() = decorator. |
| python/ql/lib/semmle/python/frameworks/Flask.qll | Rewrites decorator matching and return-value flow-node binding via getNode(). |
| python/ql/lib/semmle/python/frameworks/FastApi.qll | Rewrites decorator matching for route handlers via getNode(). |
| python/ql/lib/semmle/python/frameworks/Bottle.qll | Rewrites decorator matching for route handlers via getNode(). |
| python/ql/lib/semmle/python/Flow.qll | Rewrites internal legacy-CFG predicates/methods to use ControlFlowNode.getNode() comparisons. |
| python/ql/lib/semmle/python/Exprs.qll | Rewrites side-effect detection to bind an expression CFG node via getNode() and removes dangling overrides. |
| python/ql/lib/semmle/python/essa/SsaDefinitions.qll | Rewrites SSA source-definition predicates to bind defs via defn.getNode() = var. |
| python/ql/lib/semmle/python/dataflow/old/Implementation.qll | Rewrites legacy dataflow plumbing to bind return/yield/assign nodes via getNode(). |
| python/ql/lib/semmle/python/dataflow/new/internal/VariableCapture.qll | Rewrites captured-variable reads/writes to bind CFG nodes via getNode(). |
| python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPublic.qll | Rewrites module-variable local read to bind via asCfgNode().getNode(). |
| python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPrivate.qll | Rewrites yield store steps to bind yield/value via asCfgNode().getNode(). |
| python/ql/lib/semmle/python/dataflow/new/internal/DataFlowDispatch.qll | Rewrites extracted return/yield node selection to bind via node.getNode() = .... |
| python/ql/lib/semmle/python/dataflow/new/internal/Attributes.qll | Rewrites class-definition-as-attr object node to bind via asCfgNode().getNode() = cls. |
| python/ql/lib/semmle/python/dataflow/new/BarrierGuards.qll | Rewrites constant-comparison guard logic to bind constants’ CFG nodes via getNode(). |
| python/ql/lib/semmle/python/AstExtended.qll | Removes the public cached predicate AstNode.getAFlowNode(). |
| python/ql/lib/LegacyPointsTo.qll | Rewrites points-to/refers-to helpers to bind both expression and origin CFG nodes via getNode(). |
| python/ql/lib/change-notes/2026-05-19-remove-getAFlowNode.md | Adds a breaking change-note documenting the API removal and replacement pattern. |
| python/ql/lib/analysis/DefinitionTracking.qll | Rewrites definition tracking entry points to bind use CFG nodes via getNode(). |
Copilot's findings
- Files reviewed: 68/68 changed files
- Comments generated: 0
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll) and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade (semmle.python.controlflow.internal.Cfg) and the new SSA adapter (semmle.python.dataflow.new.internal.SsaImpl), both introduced additively in the preceding PRs in this stack. This is the trunk-flip equivalent of the original draft PR #21894 (kept around as documentation), rebased on top of the four preparatory PRs: P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919). P2: Qualify Flow.qll's AST references with Py:: prefix (#21920). P3: Add new shared-CFG-backed control flow graph (#21921). P4: Add new shared-SSA-backed SSA adapter (#21923). The Python dataflow library (semmle/python/dataflow/new/) now imports the new CFG facade and SSA adapter. All CFG-typed predicates (ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are qualified with the Cfg:: prefix; SSA references switch from EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable. GuardNode is redesigned to use the new CFG's outcome-node model (isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock + flipped indirection. Only BarrierGuard<...> is preserved as public API. Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib, ...) are updated to take CFG nodes from the new facade. A handful of dataflow consistency tweaks for the new CFG: - Augmented-assignment targets are treated as both load and store. - 'from X import *' produces uncertain SSA writes for unknown names. - CFG nodes are canonicalised so dataflow does not see equivalent pre/post-order pairs as distinct nodes. Two AST tweaks for the new CFG: - AstNodeImpl: omit PEP 695 type-parameter names from FunctionDefExpr / ClassDefExpr children. - ImportResolution: drop the legacy essa import. Test churn (~175 files): reblessed library- and query-test .expected files reflect slightly different CFG granularity, different toString output, and a handful of true alert deltas in security queries. Verification: all 367 lib + src + consistency-queries compile clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll) and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade (semmle.python.controlflow.internal.Cfg) and the new SSA adapter (semmle.python.dataflow.new.internal.SsaImpl), both introduced additively in the preceding PRs in this stack. This is the trunk-flip equivalent of the original draft PR #21894 (kept around as documentation), rebased on top of the four preparatory PRs: P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919). P2: Qualify Flow.qll's AST references with Py:: prefix (#21920). P3: Add new shared-CFG-backed control flow graph (#21921). P4: Add new shared-SSA-backed SSA adapter (#21923). The Python dataflow library (semmle/python/dataflow/new/) now imports the new CFG facade and SSA adapter. All CFG-typed predicates (ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are qualified with the Cfg:: prefix; SSA references switch from EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable. GuardNode is redesigned to use the new CFG's outcome-node model (isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock + flipped indirection. Only BarrierGuard<...> is preserved as public API. Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib, ...) are updated to take CFG nodes from the new facade. A handful of dataflow consistency tweaks for the new CFG: - Augmented-assignment targets are treated as both load and store. - 'from X import *' produces uncertain SSA writes for unknown names. - CFG nodes are canonicalised so dataflow does not see equivalent pre/post-order pairs as distinct nodes. Two AST tweaks for the new CFG: - AstNodeImpl: omit PEP 695 type-parameter names from FunctionDefExpr / ClassDefExpr children. - ImportResolution: drop the legacy essa import. Test churn (~175 files): reblessed library- and query-test .expected files reflect slightly different CFG granularity, different toString output, and a handful of true alert deltas in security queries. Verification: all 367 lib + src + consistency-queries compile clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll) and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade (semmle.python.controlflow.internal.Cfg) and the new SSA adapter (semmle.python.dataflow.new.internal.SsaImpl), both introduced additively in the preceding PRs in this stack. This is the trunk-flip equivalent of the original draft PR #21894 (kept around as documentation), rebased on top of the four preparatory PRs: P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919). P2: Qualify Flow.qll's AST references with Py:: prefix (#21920). P3: Add new shared-CFG-backed control flow graph (#21921). P4: Add new shared-SSA-backed SSA adapter (#21923). The Python dataflow library (semmle/python/dataflow/new/) now imports the new CFG facade and SSA adapter. All CFG-typed predicates (ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are qualified with the Cfg:: prefix; SSA references switch from EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable. GuardNode is redesigned to use the new CFG's outcome-node model (isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock + flipped indirection. Only BarrierGuard<...> is preserved as public API. Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib, ...) are updated to take CFG nodes from the new facade. A handful of dataflow consistency tweaks for the new CFG: - Augmented-assignment targets are treated as both load and store. - 'from X import *' produces uncertain SSA writes for unknown names. - CFG nodes are canonicalised so dataflow does not see equivalent pre/post-order pairs as distinct nodes. Two AST tweaks for the new CFG: - AstNodeImpl: omit PEP 695 type-parameter names from FunctionDefExpr / ClassDefExpr children. - ImportResolution: drop the legacy essa import. Test churn (~175 files): reblessed library- and query-test .expected files reflect slightly different CFG granularity, different toString output, and a handful of true alert deltas in security queries. Verification: all 367 lib + src + consistency-queries compile clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll) and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade (semmle.python.controlflow.internal.Cfg) and the new SSA adapter (semmle.python.dataflow.new.internal.SsaImpl), both introduced additively in the preceding PRs in this stack. This is the trunk-flip equivalent of the original draft PR #21894 (kept around as documentation), rebased on top of the four preparatory PRs: P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919). P2: Qualify Flow.qll's AST references with Py:: prefix (#21920). P3: Add new shared-CFG-backed control flow graph (#21921). P4: Add new shared-SSA-backed SSA adapter (#21923). The Python dataflow library (semmle/python/dataflow/new/) now imports the new CFG facade and SSA adapter. All CFG-typed predicates (ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are qualified with the Cfg:: prefix; SSA references switch from EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable. GuardNode is redesigned to use the new CFG's outcome-node model (isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock + flipped indirection. Only BarrierGuard<...> is preserved as public API. Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib, ...) are updated to take CFG nodes from the new facade. A handful of dataflow consistency tweaks for the new CFG: - Augmented-assignment targets are treated as both load and store. - 'from X import *' produces uncertain SSA writes for unknown names. - CFG nodes are canonicalised so dataflow does not see equivalent pre/post-order pairs as distinct nodes. Two AST tweaks for the new CFG: - AstNodeImpl: omit PEP 695 type-parameter names from FunctionDefExpr / ClassDefExpr children. - ImportResolution: drop the legacy essa import. Test churn (~175 files): reblessed library- and query-test .expected files reflect slightly different CFG granularity, different toString output, and a handful of true alert deltas in security queries. Verification: all 367 lib + src + consistency-queries compile clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The internal predicates that identify `@staticmethod`, `@classmethod` and `@property` decorators previously required the decorator's `NameNode` to satisfy `isGlobal()` (i.e. no SSA def reaches the decorator's name use). That filter was correct but unnecessarily indirect: these three names are builtins, and even when a class body redefines one, the class body has not started executing at the decorator position, so Python uses the builtin. Match the decorator's AST `Name` directly instead, dropping the CFG/SSA detour. The slight semantic change — `isGlobal()` would have rejected module-level shadowing of these builtins — is negligible in practice and explicitly documented in the change note. `hasContextmanagerDecorator` and `hasOverloadDecorator` keep the `NameNode.isGlobal()` check because their target names (`contextmanager`, `overload`) are imported, not builtin, and local shadowing is a real concern. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
befb51a to
8e4f127
Compare
tausbn
left a comment
There was a problem hiding this comment.
I'm a bit conflicted about this PR. The changes themselves look reasonable, but it does introduce quite a lot of churn.
Also, I'm not sure I find the "why" (that other languages don't have this) terribly persuasive. The getAFlowNode predicate is a simple inversion of getNode, and it seems to me that it would be just as useful in a world where we use the new CFG library.
Rather, it seems to me that the issue is that -- at present -- getAFlowNode only returns an old CFG node, which ties the dataflow library to that API.
However, this suggests we could have used a different approach here: rename getAFlowNode to getALegacyFlowNode. The latter can be made deprecated, which allows external users to continue to use it until we get rid of it entirely. When adding the new CFG library, we can then re-add getAFlowNode with the old API, and we don't have to make any changes to the dataflow libraries (apart from a slight detour to using getALegacyFlowNode).
What do you think?
Finally, I'll be curious to see if this affects performance in any way. One way to guarantee that it doesn't would be to look at the RA before and after the change, and verify that it is (hopefully) identical.
I actually consider it a clean-up. I would like to untangle our hierarchies as much as possible and I think that the AST not knowing about the CFG is a feature. Are you worried that users will experience a lot of churn from this?
That is a nice suggestion! |
Yeah, that's my main concern, seeing as In that case, how about we add back |
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll) and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade (semmle.python.controlflow.internal.Cfg) and the new SSA adapter (semmle.python.dataflow.new.internal.SsaImpl), both introduced additively in the preceding PRs in this stack. This is the trunk-flip equivalent of the original draft PR #21894 (kept around as documentation), rebased on top of the four preparatory PRs: P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919). P2: Qualify Flow.qll's AST references with Py:: prefix (#21920). P3: Add new shared-CFG-backed control flow graph (#21921). P4: Add new shared-SSA-backed SSA adapter (#21923). The Python dataflow library (semmle/python/dataflow/new/) now imports the new CFG facade and SSA adapter. All CFG-typed predicates (ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are qualified with the Cfg:: prefix; SSA references switch from EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable. GuardNode is redesigned to use the new CFG's outcome-node model (isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock + flipped indirection. Only BarrierGuard<...> is preserved as public API. Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib, ...) are updated to take CFG nodes from the new facade. A handful of dataflow consistency tweaks for the new CFG: - Augmented-assignment targets are treated as both load and store. - 'from X import *' produces uncertain SSA writes for unknown names. - CFG nodes are canonicalised so dataflow does not see equivalent pre/post-order pairs as distinct nodes. Two AST tweaks for the new CFG: - AstNodeImpl: omit PEP 695 type-parameter names from FunctionDefExpr / ClassDefExpr children. - ImportResolution: drop the legacy essa import. Test churn (~175 files): reblessed library- and query-test .expected files reflect slightly different CFG granularity, different toString output, and a handful of true alert deltas in security queries. Verification: all 367 lib + src + consistency-queries compile clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Yes, I supose we could do that. |
The new-dataflow ImportResolution module only used semmle.python.essa.SsaDefinitions for the 5-line helper predicate SsaSource::init_module_submodule_defn. Inline it locally and drop the dependency on legacy SsaDefinitions. This is the only remaining direct import of semmle.python.essa.* in the new dataflow stack, so dropping it makes the layering cleaner. Semantic noop on the current SSA: SsaSourceVariable.getName() and GlobalVariable.getId() both project the same DB column (variable(_,_,result)), and the old call's 'init.getEntryNode() = f' join was just constraining init = package via Scope.getEntryNode()'s functional uniqueness. RA dump of accesses.ql confirms only the expected predicate-rename shuffle; all 70 dataflow + ApiGraphs library tests pass. This factors out commit 8cab5a2 from the larger shared-CFG migration #21925. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
8e4f127 to
956b7a4
Compare
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll) and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade (semmle.python.controlflow.internal.Cfg) and the new SSA adapter (semmle.python.dataflow.new.internal.SsaImpl), both introduced additively in the preceding PRs in this stack. This is the trunk-flip equivalent of the original draft PR #21894 (kept around as documentation), rebased on top of the four preparatory PRs: P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919). P2: Qualify Flow.qll's AST references with Py:: prefix (#21920). P3: Add new shared-CFG-backed control flow graph (#21921). P4: Add new shared-SSA-backed SSA adapter (#21923). The Python dataflow library (semmle/python/dataflow/new/) now imports the new CFG facade and SSA adapter. All CFG-typed predicates (ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are qualified with the Cfg:: prefix; SSA references switch from EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable. GuardNode is redesigned to use the new CFG's outcome-node model (isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock + flipped indirection. Only BarrierGuard<...> is preserved as public API. Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib, ...) are updated to take CFG nodes from the new facade. A handful of dataflow consistency tweaks for the new CFG: - Augmented-assignment targets are treated as both load and store. - 'from X import *' produces uncertain SSA writes for unknown names. - CFG nodes are canonicalised so dataflow does not see equivalent pre/post-order pairs as distinct nodes. Two AST tweaks for the new CFG: - AstNodeImpl: omit PEP 695 type-parameter names from FunctionDefExpr / ClassDefExpr children. - ImportResolution: drop the legacy essa import. Test churn (~175 files): reblessed library- and query-test .expected files reflect slightly different CFG granularity, different toString output, and a handful of true alert deltas in security queries. Verification: all 367 lib + src + consistency-queries compile clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Preparatory refactor for the shared-CFG dataflow migration. Deprecates the AstNode.getAFlowNode() cached predicate on the public Python QL API and rewrites all ~140 internal callers across lib/, src/, test/, and tools/ from `expr.getAFlowNode() = cfgNode` to `cfgNode.getNode() = expr`, using ControlFlowNode.getNode() which already exists in Flow.qll. The predicate itself is preserved (with a deprecation note pointing at the new pattern) so external users do not experience churn — they can migrate at their own pace and the AST/CFG hierarchies still get the intended untangling once the deprecation eventually elapses. Semantic noop verified by: - All 361 lib/ + src/ queries compile clean. - All 122 ControlFlow + PointsTo library-tests pass. - All 64 dataflow library-tests pass. - All 113 Variables/Exceptions/Expressions/Statements/Functions/Imports/ Security/CWE-798/ModificationOfParameterWithDefault query-tests pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rnal callers Follow-up to the getAFlowNode deprecation in the same PR: same AST→legacy-CFG bridge pattern. The 11 internal call sites (across objects/, types/, frameworks/, and TypeTrackingImpl) are rewritten to bind a `Return ret` explicitly, then constrain via `ret.getScope() = f and n.getNode() = ret.getValue()`. The predicate itself is preserved with a deprecation note so external users do not experience churn. Semantic noop. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
956b7a4 to
0b473e3
Compare
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll) and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade (semmle.python.controlflow.internal.Cfg) and the new SSA adapter (semmle.python.dataflow.new.internal.SsaImpl), both introduced additively in the preceding PRs in this stack. This is the trunk-flip equivalent of the original draft PR #21894 (kept around as documentation), rebased on top of the four preparatory PRs: P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919). P2: Qualify Flow.qll's AST references with Py:: prefix (#21920). P3: Add new shared-CFG-backed control flow graph (#21921). P4: Add new shared-SSA-backed SSA adapter (#21923). The Python dataflow library (semmle/python/dataflow/new/) now imports the new CFG facade and SSA adapter. All CFG-typed predicates (ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are qualified with the Cfg:: prefix; SSA references switch from EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable. GuardNode is redesigned to use the new CFG's outcome-node model (isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock + flipped indirection. Only BarrierGuard<...> is preserved as public API. Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib, ...) are updated to take CFG nodes from the new facade. A handful of dataflow consistency tweaks for the new CFG: - Augmented-assignment targets are treated as both load and store. - 'from X import *' produces uncertain SSA writes for unknown names. - CFG nodes are canonicalised so dataflow does not see equivalent pre/post-order pairs as distinct nodes. Two AST tweaks for the new CFG: - AstNodeImpl: omit PEP 695 type-parameter names from FunctionDefExpr / ClassDefExpr children. - ImportResolution: drop the legacy essa import. Test churn (~175 files): reblessed library- and query-test .expected files reflect slightly different CFG granularity, different toString output, and a handful of true alert deltas in security queries. Verification: all 367 lib + src + consistency-queries compile clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll) and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade (semmle.python.controlflow.internal.Cfg) and the new SSA adapter (semmle.python.dataflow.new.internal.SsaImpl), both introduced additively in the preceding PRs in this stack. This is the trunk-flip equivalent of the original draft PR #21894 (kept around as documentation), rebased on top of the four preparatory PRs: P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919). P2: Qualify Flow.qll's AST references with Py:: prefix (#21920). P3: Add new shared-CFG-backed control flow graph (#21921). P4: Add new shared-SSA-backed SSA adapter (#21923). The Python dataflow library (semmle/python/dataflow/new/) now imports the new CFG facade and SSA adapter. All CFG-typed predicates (ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are qualified with the Cfg:: prefix; SSA references switch from EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable. GuardNode is redesigned to use the new CFG's outcome-node model (isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock + flipped indirection. Only BarrierGuard<...> is preserved as public API. Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib, ...) are updated to take CFG nodes from the new facade. A handful of dataflow consistency tweaks for the new CFG: - Augmented-assignment targets are treated as both load and store. - 'from X import *' produces uncertain SSA writes for unknown names. - CFG nodes are canonicalised so dataflow does not see equivalent pre/post-order pairs as distinct nodes. Two AST tweaks for the new CFG: - AstNodeImpl: omit PEP 695 type-parameter names from FunctionDefExpr / ClassDefExpr children. - ImportResolution: drop the legacy essa import. Test churn (~175 files): reblessed library- and query-test .expected files reflect slightly different CFG granularity, different toString output, and a handful of true alert deltas in security queries. Verification: all 367 lib + src + consistency-queries compile clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll) and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade (semmle.python.controlflow.internal.Cfg) and the new SSA adapter (semmle.python.dataflow.new.internal.SsaImpl), both introduced additively in the preceding PRs in this stack. This is the trunk-flip equivalent of the original draft PR #21894 (kept around as documentation), rebased on top of the four preparatory PRs: P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919). P2: Qualify Flow.qll's AST references with Py:: prefix (#21920). P3: Add new shared-CFG-backed control flow graph (#21921). P4: Add new shared-SSA-backed SSA adapter (#21923). The Python dataflow library (semmle/python/dataflow/new/) now imports the new CFG facade and SSA adapter. All CFG-typed predicates (ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are qualified with the Cfg:: prefix; SSA references switch from EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable. GuardNode is redesigned to use the new CFG's outcome-node model (isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock + flipped indirection. Only BarrierGuard<...> is preserved as public API. Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib, ...) are updated to take CFG nodes from the new facade. A handful of dataflow consistency tweaks for the new CFG: - Augmented-assignment targets are treated as both load and store. - 'from X import *' produces uncertain SSA writes for unknown names. - CFG nodes are canonicalised so dataflow does not see equivalent pre/post-order pairs as distinct nodes. Two AST tweaks for the new CFG: - AstNodeImpl: omit PEP 695 type-parameter names from FunctionDefExpr / ClassDefExpr children. - ImportResolution: drop the legacy essa import. Test churn (~175 files): reblessed library- and query-test .expected files reflect slightly different CFG granularity, different toString output, and a handful of true alert deltas in security queries. Verification: all 367 lib + src + consistency-queries compile clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll) and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade (semmle.python.controlflow.internal.Cfg) and the new SSA adapter (semmle.python.dataflow.new.internal.SsaImpl), both introduced additively in the preceding PRs in this stack. This is the trunk-flip equivalent of the original draft PR #21894 (kept around as documentation), rebased on top of the four preparatory PRs: P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919). P2: Qualify Flow.qll's AST references with Py:: prefix (#21920). P3: Add new shared-CFG-backed control flow graph (#21921). P4: Add new shared-SSA-backed SSA adapter (#21923). The Python dataflow library (semmle/python/dataflow/new/) now imports the new CFG facade and SSA adapter. All CFG-typed predicates (ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are qualified with the Cfg:: prefix; SSA references switch from EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable. GuardNode is redesigned to use the new CFG's outcome-node model (isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock + flipped indirection. Only BarrierGuard<...> is preserved as public API. Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib, ...) are updated to take CFG nodes from the new facade. A handful of dataflow consistency tweaks for the new CFG: - Augmented-assignment targets are treated as both load and store. - 'from X import *' produces uncertain SSA writes for unknown names. - CFG nodes are canonicalised so dataflow does not see equivalent pre/post-order pairs as distinct nodes. Two AST tweaks for the new CFG: - AstNodeImpl: omit PEP 695 type-parameter names from FunctionDefExpr / ClassDefExpr children. - ImportResolution: drop the legacy essa import. Test churn (~175 files): reblessed library- and query-test .expected files reflect slightly different CFG granularity, different toString output, and a handful of true alert deltas in security queries. Verification: all 367 lib + src + consistency-queries compile clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Flips the Python dataflow trunk from the legacy CFG (semmle/python/Flow.qll) and legacy ESSA SSA (semmle/python/essa/*) to the new shared CFG facade (semmle.python.controlflow.internal.Cfg) and the new SSA adapter (semmle.python.dataflow.new.internal.SsaImpl), both introduced additively in the preceding PRs in this stack. This is the trunk-flip equivalent of the original draft PR #21894 (kept around as documentation), rebased on top of the four preparatory PRs: P1: Remove AstNode.getAFlowNode() and rewrite callers (#21919). P2: Qualify Flow.qll's AST references with Py:: prefix (#21920). P3: Add new shared-CFG-backed control flow graph (#21921). P4: Add new shared-SSA-backed SSA adapter (#21923). The Python dataflow library (semmle/python/dataflow/new/) now imports the new CFG facade and SSA adapter. All CFG-typed predicates (ControlFlowNode, CallNode, BasicBlock, NameNode, AttrNode, ...) are qualified with the Cfg:: prefix; SSA references switch from EssaVariable/EssaDefinition to SsaImpl::Definition/SourceVariable. GuardNode is redesigned to use the new CFG's outcome-node model (isAfterTrue / isAfterFalse) instead of the legacy ConditionBlock + flipped indirection. Only BarrierGuard<...> is preserved as public API. Framework files (Bottle, FastApi, Django, Tornado, Pyramid, Stdlib, ...) are updated to take CFG nodes from the new facade. A handful of dataflow consistency tweaks for the new CFG: - Augmented-assignment targets are treated as both load and store. - 'from X import *' produces uncertain SSA writes for unknown names. - CFG nodes are canonicalised so dataflow does not see equivalent pre/post-order pairs as distinct nodes. Two AST tweaks for the new CFG: - AstNodeImpl: omit PEP 695 type-parameter names from FunctionDefExpr / ClassDefExpr children. - ImportResolution: drop the legacy essa import. Test churn (~175 files): reblessed library- and query-test .expected files reflect slightly different CFG granularity, different toString output, and a handful of true alert deltas in security queries. Verification: all 367 lib + src + consistency-queries compile clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Preparatory refactor for the shared-CFG dataflow migration (#21894).
Deprecates the
AstNode.getAFlowNode()cached predicate on the public Python QL API and rewrites all ~140 internal callers acrosslib/,src/,test/, andtools/fromexpr.getAFlowNode() = cfgNodetocfgNode.getNode() = expr, usingControlFlowNode.getNode()which already exists inFlow.qll. Follow-up commit does the same forFunction.getAReturnValueFlowNode()(11 internal call sites).(Previously this PR removed both predicates outright; based on review feedback it now keeps them as
deprecatedso external users can migrate at their own pace.)Why
The dataflow library is being migrated to the shared CFG. In the new world there is no AST → CFG predicate (analogous to Java, Ruby, Swift, etc.) — the bridge only goes from CFG → AST via
ControlFlowNode.getNode(). This PR factors out the call-site rewrite so that the bigger migration PR contains only genuinely new logic, while leaving the deprecated wrappers in place to avoid churn for external users.Semantic effect
None. The two forms are exact equivalents in the legacy CFG:
Verified by:
lib/+src/queries compile clean.Notes for reviewers
AstNode.getAFlowNode()inpython/ql/lib/semmle/python/AstExtended.qllandFunction.getAReturnValueFlowNode()inFunction.qllare now markeddeprecatedwith a docstring pointing at the recommended replacement.Exprs.qll/Import.qllclasses hadoverride XxxNode getAFlowNode() { result = super.getAFlowNode() }redeclarations that become dangling under the deprecation; removed to keep the API surface tidy.deprecatedchange-notes are included.Stacked under #21930.