Skip to content

Align all remaining error messages with CPython#7993

Open
JamesClarke7283 wants to merge 7 commits into
RustPython:mainfrom
JamesClarke7283:align-aug-assign-delete-set-dict-errors
Open

Align all remaining error messages with CPython#7993
JamesClarke7283 wants to merge 7 commits into
RustPython:mainfrom
JamesClarke7283:align-aug-assign-delete-set-dict-errors

Conversation

@JamesClarke7283
Copy link
Copy Markdown
Contributor

@JamesClarke7283 JamesClarke7283 commented May 29, 2026

Summary

Continues the parser error-message alignment from #7928 / #7933 / #7988. Translates many more ruff ParseErrorType variants to CPython 3.14.5's exact wording in CompileError::from_ruff_parse_error (using the source slice / parsed AST kind), plus a few codegen/symtable checks.

crates/vm/src/stdlib/_ast.rs now routes its parse errors through from_ruff_parse_error, so ast.parse() / compile(..., PyCF_ONLY_AST) produce the same CPython-aligned messages as the exec path (they previously leaked raw ruff strings — and this fixes a regression where the var-param message diverged on the AST path).

What's aligned

  • aug-assign / delete / set / dict / f-string / t-string assignment targets
  • cannot use {attribute,subscript,tuple,list,literal,function call} as import target; import X from Y → "Did you mean to use 'from ... import ...' instead?"
  • string-prefix incompatibility ('u' and 'b' prefixes are incompatible); invalid character 'X' (U+XXXX)
  • parenthesized def/lambda params; missing default / argument value; dict : and value syntax
  • 'elif' block follows an 'else' block; raise from; comprehension if / unparenthesized target; ternary statement-keyword hints
  • match case … as <target>cannot use {kind} as pattern target, and case … as _cannot use '_' as a target
  • __debug__ as a def/class/type-param/except-handler name → cannot assign to __debug__
  • name 'x' is nonlocal and global; generic type-parameter wording (… cannot be used within the definition of a generic)

This drops the now-passing # TODO: RUSTPYTHON; Wrong error message doctest markers and @expectedFailure decorators across test_syntax.py, test_genexps.py, test_named_expressions.py, and test_patma.py.

Not covered (intentionally left marked)

A few test_syntax.py cases need ruff-parser or deeper codegen changes and remain marked rather than emitting a silently-wrong message: the type-comment bare * case, class C(x for x in L) and f((x)=2) (ruff accepts these without a parse error), and dict(...); x $ y (duplicate-keyword vs lexer ordering). Runtime-error messages in unrelated subsystems (struct-sequence, format-spec, datetime, __import__, ast __replace__) are out of scope for this parser-focused change.

Verification

  • cargo clippy -p rustpython-compiler -p rustpython-codegen -- -D warnings clean; cargo fmt.
  • Every changed message diff-tested against CPython 3.14.5 through both the exec path and ast.parse().
  • test_syntax test_genexps test_named_expressions test_patma test_type_params test_scope test_grammar test_compile test_exceptions all pass (961 tests).

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Reject the reserved identifier debug in more assignment, function/class, keyword-arg and type-parameter contexts
    • Disallow using the wildcard _ as a capture/store target
    • Prevent conflicting global/nonlocal declarations
    • More CPython-aligned and clearer syntax/compile error messages; improved parse/compile error reporting and wording
  • Chores

    • Minor lint/implementation adjustments and a WASM import linkage hint for host imports

Review Change Stack

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 29, 2026

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[ ] test: cpython/Lib/test/test_unicodedata.py (TODO: 22)
[x] test: cpython/Lib/test/test_unicode_file.py
[x] test: cpython/Lib/test/test_unicode_file_functions.py
[x] test: cpython/Lib/test/test_unicode_identifiers.py
[x] test: cpython/Lib/test/test_ucn.py (TODO: 4)

dependencies:

dependent tests: (no tests depend on unicode)

[ ] test: cpython/Lib/test/test_patma.py (TODO: 13)

dependencies:

dependent tests: (no tests depend on patma)

[ ] test: cpython/Lib/test/test_generators.py (TODO: 11)
[ ] test: cpython/Lib/test/test_genexps.py (TODO: 4)
[x] test: cpython/Lib/test/test_generator_stop.py
[x] test: cpython/Lib/test/test_yield_from.py (TODO: 1)

dependencies:

dependent tests: (no tests depend on generator)

[ ] test: cpython/Lib/test/test_syntax.py (TODO: 17)

dependencies:

dependent tests: (no tests depend on syntax)

[x] test: cpython/Lib/test/test_dict.py (TODO: 6)
[x] test: cpython/Lib/test/test_dictcomps.py
[x] test: cpython/Lib/test/test_dictviews.py (TODO: 1)
[x] test: cpython/Lib/test/test_userdict.py
[x] test: cpython/Lib/test/mapping_tests.py

dependencies:

dependent tests: (no tests depend on dict)

[x] test: cpython/Lib/test/test_named_expressions.py (TODO: 1)

dependencies:

dependent tests: (no tests depend on named_expressions)

[x] test: cpython/Lib/test/test_str.py (TODO: 7)
[x] test: cpython/Lib/test/test_fstring.py (TODO: 18)
[x] test: cpython/Lib/test/test_string_literals.py (TODO: 4)

dependencies:

dependent tests: (no tests depend on str)

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Tightens symbol validation (rejecting __debug__ in more contexts, forbidding _ as a pattern target, and enforcing global/nonlocal mutual exclusion), refactors parse-error translation into CPython-style diagnostics with centralized helpers, and routes parser errors through the AST/VM conversion path while preserving select message casing.

Changes

Symbol validation and parse error alignment

Layer / File(s) Summary
Symbol and name validation tightening
crates/codegen/src/compile.rs, crates/codegen/src/symboltable.rs
Function and class definitions now reject __debug__ in names and type parameters; except ... as NAME aliases run name checks before registration; _ is rejected as a pattern capture target; global/nonlocal declarations are mutually exclusive; type-parameter error context phrasing updated.
Parse error helper infrastructure
crates/compiler/src/lib.rs
Introduces adjusted_locations and many predicate/message helpers for indented-block normalization, missing defaults/arguments detection, incompatible string prefix messaging, parenthesized-param detection, starred/keyword/collection LHS detection, comprehension/import/match shape validators, and related helpers used to produce CPython-like diagnostics.
Parse error type translations
crates/compiler/src/lib.rs
Rewrites invalid-assignment-target, invalid-delete/augmented/named-target families and many Ruff-specific parse-error shapes to parse offending text, infer expression kinds, and emit CPython-style OtherError messages with adjusted ranges and targeted wording.
VM/AST error routing and minor runtime tweaks
crates/vm/src/stdlib/_ast.rs, crates/vm/src/vm/vm_new.rs, crates/vm/src/builtins/weakref.rs, crates/vm/src/stdlib/builtins.rs, example_projects/wasm32_without_js/...
parse() and parse_func_type() now route parser failures through CompileError::from_ruff_parse_error; SyntaxErrorInfo preserves select uppercase OtherError messages and drops two prior special-case arms; small changes include using zip for weakref comparisons, a clippy lint attribute tweak in pow, and a wasm import attribute addition.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • youknowone
  • ShaharNaveh

Poem

🐇 I hop through parse trees, nibbling errors small,
I banish naughty names—no debug at all.
Underscore captures? I toss them aside.
Ruff's clatters stitched to CPython, neat and wide.
Hop, hop — diagnostics tidy, now let's glide.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Align all remaining error messages with CPython' directly and clearly describes the main objective of the PR, which is to align parser error messages with CPython across multiple files and contexts.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
crates/codegen/src/compile.rs (1)

4718-4729: ⚡ Quick win

Extract duplicated type parameter validation into a helper method.

The type parameter __debug__ validation logic is duplicated identically in both function and class compilation paths. Extract this into a shared helper to improve maintainability and prevent future divergence.

♻️ Proposed refactor to eliminate duplication

Add a helper method to the compiler struct:

fn validate_type_params_no_debug(&self, type_params: Option<&ast::TypeParams>) -> CompileResult<()> {
    if let Some(params) = type_params {
        for tp in &params.type_params {
            let tp_name = match tp {
                ast::TypeParam::TypeVar(t) => &t.name,
                ast::TypeParam::TypeVarTuple(t) => &t.name,
                ast::TypeParam::ParamSpec(t) => &t.name,
            };
            if tp_name.as_str() == "__debug__" {
                return Err(self.error(CodegenErrorType::Assign("__debug__")));
            }
        }
    }
    Ok(())
}

Then replace the duplicated blocks with a single call in each location:

-        // Reject `def f[__debug__](): ...` type parameter (mirrors class defs).
-        if let Some(params) = type_params {
-            for tp in &params.type_params {
-                let tp_name = match tp {
-                    ast::TypeParam::TypeVar(t) => &t.name,
-                    ast::TypeParam::TypeVarTuple(t) => &t.name,
-                    ast::TypeParam::ParamSpec(t) => &t.name,
-                };
-                if tp_name.as_str() == "__debug__" {
-                    return Err(self.error(CodegenErrorType::Assign("__debug__")));
-                }
-            }
-        }
+        // Reject `def f[__debug__](): ...` type parameter (mirrors class defs).
+        self.validate_type_params_no_debug(type_params)?;

Apply the same simplification in the class compilation path (lines 5307-5318).

Also applies to: 5307-5318

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/codegen/src/compile.rs` around lines 4718 - 4729, Extract the
duplicated "__debug__" type-parameter check into a new helper on the compiler
impl (e.g. fn validate_type_params_no_debug(&self, type_params:
Option<&ast::TypeParams>) -> CompileResult<()>) that iterates
params.type_params, extracts the name from ast::TypeParam variants, and returns
Err(self.error(CodegenErrorType::Assign("__debug__"))) if a name equals
"__debug__", otherwise Ok(()). Replace the duplicated validation blocks in the
function compilation path and the class compilation path with a single call to
this helper, keeping existing types (ast::TypeParams, CompileResult) and error
construction unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/compiler/src/lib.rs`:
- Around line 986-1025: The helper parenthesized_param_message currently only
inspects the current physical line, missing multiline cases; update it to scan
backwards from line_start through previous lines to find a preceding "def " or
"lambda" token and then evaluate parenthesis depth across that full span.
Specifically, for "def " (use def_idx search but on the concatenated slice from
the found def position to the error start) walk characters from that def
position counting '(' and ')' to determine if the error position is inside the
function parameter list (return the CPython message if inside), and for "lambda"
search backwards similarly to find a lambda token without a ':' between it and
the error (consider spanning newlines). Keep the existing return messages and
reuse parenthesized_param_message, but replace single-line checks with these
backward-scanning, depth-aware checks so multiline parameter lists are detected.
- Around line 1397-1419: chunk_has_bare_assignment currently treats any
alphanumeric LHS as an identifier; update it to reject literals/keywords by
first checking the trimmed lhs start and exact content: ensure lhs is non-empty
and its first byte is ASCII alphabetic or b'_' (reject if it starts with a
digit, quote, '(' '[' '{', etc.), then ensure the whole lhs matches the
identifier shape (bytes().all(...) as you already do) and is not one of the
Python literal/keyword tokens like "True", "False", or "None" (use a small
static set and compare lhs.eq_ignore_ascii_case or exact match as appropriate).
Return false for those cases so chunk_has_bare_assignment only returns true for
real identifiers.
- Around line 1682-1698: The current is_in_case_pattern only checks the current
line for a "case " header so it misses multi-line case headers; change the logic
to scan backwards from range.start() to find the nearest preceding non-empty
line whose trimmed start begins with "case " (use the existing start and
line_start variables as anchors) and treat the location as being in a case
pattern if such a "case " header exists with indentation (case_indent) less than
the current line and there is no '=' token between the header start and the
range.start(); update the code around is_in_case_pattern, keeping references to
start, line_start, rest, and case_indent, to iterate previous lines (instead of
only checking source[line_start..]) and ensure multiline headers are recognized
before falling back to the match-search logic.

---

Nitpick comments:
In `@crates/codegen/src/compile.rs`:
- Around line 4718-4729: Extract the duplicated "__debug__" type-parameter check
into a new helper on the compiler impl (e.g. fn
validate_type_params_no_debug(&self, type_params: Option<&ast::TypeParams>) ->
CompileResult<()>) that iterates params.type_params, extracts the name from
ast::TypeParam variants, and returns
Err(self.error(CodegenErrorType::Assign("__debug__"))) if a name equals
"__debug__", otherwise Ok(()). Replace the duplicated validation blocks in the
function compilation path and the class compilation path with a single call to
this helper, keeping existing types (ast::TypeParams, CompileResult) and error
construction unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: c64ba8a6-f09d-45ff-8156-2a8997556f5b

📥 Commits

Reviewing files that changed from the base of the PR and between 1385c4e and 4383c21.

⛔ Files ignored due to path filters (4)
  • Lib/test/test_genexps.py is excluded by !Lib/**
  • Lib/test/test_named_expressions.py is excluded by !Lib/**
  • Lib/test/test_patma.py is excluded by !Lib/**
  • Lib/test/test_syntax.py is excluded by !Lib/**
📒 Files selected for processing (5)
  • crates/codegen/src/compile.rs
  • crates/codegen/src/symboltable.rs
  • crates/compiler/src/lib.rs
  • crates/vm/src/stdlib/_ast.rs
  • crates/vm/src/vm/vm_new.rs

Comment thread crates/compiler/src/lib.rs
Comment thread crates/compiler/src/lib.rs Outdated
Comment thread crates/compiler/src/lib.rs Outdated
@JamesClarke7283 JamesClarke7283 force-pushed the align-aug-assign-delete-set-dict-errors branch from 4383c21 to a84b1a4 Compare May 29, 2026 09:47
@JamesClarke7283
Copy link
Copy Markdown
Contributor Author

@coderabbitai review

The duplicated __debug__ type-parameter validation flagged in the nitpick (crates/codegen/src/compile.rs) has been extracted into a single validate_type_params_no_debug helper used by both the function and class compilation paths, in commit d23ca7e. The earlier reviews were on 4383c2195/a84b1a430, both before that fix.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

@JamesClarke7283 Thanks for the update — good to know the validate_type_params_no_debug refactor has been applied. I'll kick off a fresh review of the current state of the PR.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@ShaharNaveh ShaharNaveh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tysm for working on this!

On one hand I can't argue with the value this brings, and on the other it feels a bit ridiculous because we are just reimplementing the lexer & parser atp.

I'll wait for @youknowone inputs

Comment thread crates/compiler/src/lib.rs Outdated
Comment thread crates/compiler/src/lib.rs Outdated
Comment thread crates/compiler/src/lib.rs
Comment thread crates/vm/src/stdlib/builtins.rs Outdated
Comment thread crates/vm/src/vm/vm_new.rs
@JamesClarke7283 JamesClarke7283 force-pushed the align-aug-assign-delete-set-dict-errors branch 2 times, most recently from 05843ed to 19195ed Compare May 30, 2026 13:41
@JamesClarke7283
Copy link
Copy Markdown
Contributor Author

@ShaharNaveh @youknowone is this okay to merge?

JamesClarke7283 and others added 7 commits June 3, 2026 18:51
Continues RustPython#7928/RustPython#7933/RustPython#7988. Translates many more ruff ParseErrorType
variants to CPython's exact wording in CompileError::from_ruff_parse_error,
and routes ast.parse() / compile(PyCF_ONLY_AST) through the same path so
those messages match too (previously they leaked raw ruff strings). Adds a
few codegen/symtable checks.

Covered: aug-assign/delete/set/dict/f-string/t-string targets; "cannot use
{kind} as import target"; string-prefix incompatibility and "invalid
character 'X' (U+XXXX)"; parenthesized def/lambda params; missing
default/argument value; dict ':' / value syntax; "'elif' block follows an
'else' block"; raise-from; comprehension 'if'; ternary statement keywords;
match "case ... as <target>" -> "cannot use {kind} as pattern target" and
"case ... as _"; __debug__ as def/class/type-param/except name; "name 'x'
is nonlocal and global"; generic type-parameter wording.

Lib/test: drop the now-passing "# TODO: RUSTPYTHON; Wrong error message"
doctest markers and @expectedfailure decorators.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`eval` calls the host `print` function, so rust-lld reported it as an
undefined symbol and the wasm32-unknown-unknown build failed to link
(`kv_get`/`kv_put` are unused, so they were GC'd and did not error).

Annotate the `extern "C"` block with `#[link(wasm_import_module = "env")]`
so the linker emits the host functions as wasm imports from the `env`
module, matching the wasmer host runtime in wasm-runtime/src/main.rs.
Verified: `cargo build` (the CI "check wasm32-unknown without js" step)
now links and produces the .wasm.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
test_dictcomps.test_illegal_assignment, test_fstring.test_invalid_string_prefixes
and test_unicode_identifiers.test_invalid now pass thanks to this PR's
error-message alignment, so their @unittest.expectedFailure markers caused
"unexpected success" failures in CI. Remove the obsolete markers (same cleanup
already applied to test_syntax/test_genexps/test_named_expressions/test_patma).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The identical `__debug__` type-parameter rejection in compile_function_def
and compile_class_def is now a single shared helper, eliminating the
duplicated block (addresses a CodeRabbit review note). Behavior-preserving:
`def f[__debug__]`/`class C[__debug__]` still raise "cannot assign to
__debug__" and valid type params still compile.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Shahar Naveh <50263213+ShaharNaveh@users.noreply.github.com>
- Report "Invalid star expression" for bare leading `*` in set/dict displays and non-call parenthesised groups (`{*}`, `(*,)`)
- Collapse double-comma in dict/set/list displays (`{1:2,, 3}`, `[1,, 2]`) to "invalid syntax"
- Add is_bare_star_first_in_group helper
- Drop stale "Is this intended to be part of the string?" uppercase-message entry
The bare-star helpers only checked that the error-adjacent token was `*`,
so a binary multiply with a missing operand was mislabeled "Invalid star
expression". `{1 *}`, `(1 *)`, `f(a *)` (and `f(g(a *))`, `{(x) *}`, …) now
correctly report "invalid syntax" like CPython. Require the `*` to start
its slot — mirroring `is_invalid_star_in_subscript` — and share the check
via `slot_starts_with_bare_star`.

The dict-value arm emitted "expression expected after dictionary key and
':'" for any unparseable value; CPython reserves that message for a
genuinely empty value and says "invalid syntax" otherwise. Restrict it to
empty values via `dict_value_is_empty`, so `{1: *}`, `{1: **}`, `{1: 2*}`,
`{1: +}`, `{1: not}`, … now match CPython.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@JamesClarke7283 JamesClarke7283 force-pushed the align-aug-assign-delete-set-dict-errors branch from 19195ed to 669fc00 Compare June 3, 2026 17:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants