Align more error messages with CPython 3.14.5#7988
Conversation
📝 WalkthroughWalkthroughThis PR refactors RustPython's parse error handling in the compiler. Error type imports are updated to use ChangesParse Error Diagnostics
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
📦 Library DependenciesThe following Lib/ modules were modified. Here are their dependencies: [x] test: cpython/Lib/test/test_named_expressions.py (TODO: 7) dependencies: dependent tests: (no tests depend on named_expressions) [ ] test: cpython/Lib/test/test_syntax.py (TODO: 210) dependencies: dependent tests: (no tests depend on syntax) [ ] test: cpython/Lib/test/test_generators.py (TODO: 11) dependencies: dependent tests: (no tests depend on generator) Legend:
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
crates/compiler/src/lib.rs (1)
101-114: ⚡ Quick winExtract the repeated range-normalization block.
The
loc/end_locsetup and the "column 1 on the next line" fixup are duplicated across four arms. Pulling that into a small helper would keep the message-specific branches much smaller and reduce drift when these diagnostics change again.As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code".
Also applies to: 118-129, 161-172, 180-191
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/compiler/src/lib.rs` around lines 101 - 114, Extract the duplicated loc/end_loc setup and "column 1 on the next line" adjustment into a small helper (e.g., normalize_parse_range or compute_parse_locations) that takes the Parse Error's error.location and a reference to source_code and returns (loc, end_loc); replace each arm that currently calls source_code.source_location(..., PositionEncoding::Utf8) and performs the prev_line_end fix with a single call to that helper, then only supply the message-specific ParseErrorType (like ParseErrorType::OtherError(msg)) in each branch so the parsing arms no longer duplicate the range-normalization logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/compiler/src/lib.rs`:
- Around line 88-94: The UTF-8 column fallback uses Unicode scalar count which
underestimates byte offsets for PositionEncoding::Utf8; in the fallback inside
the block that builds end_loc (using SourceLocation and loc.character_offset)
compute line_end_col from the line's byte length (use line.len()) instead of
line.chars().count() + 1 so character_offset remains measured in UTF-8 bytes;
keep the OneIndexed::new(...) and the existing unwrap_or(loc.character_offset)
logic but pass the byte-based column value.
- Around line 61-80: The match is currently borrowing error.error (match
&error.error) but some arms return error.error, causing a move-from-borrow
compile error; fix by matching the owned value instead (e.g., destructure error
first and use match error.error { ... }) or by returning a clone of the error
variant in those arms (e.g., error.error.clone()); update the branch that
handles Lexical(LexicalErrorType::Eof) to still set is_unclosed_bracket, compute
loc/end_loc via source_code.source_location and then return either
(ParseErrorType::OtherError(msg), loc, end_loc) or the owned/cloned original
error value for the fallback arm so you no longer move out of a borrowed
reference.
---
Nitpick comments:
In `@crates/compiler/src/lib.rs`:
- Around line 101-114: Extract the duplicated loc/end_loc setup and "column 1 on
the next line" adjustment into a small helper (e.g., normalize_parse_range or
compute_parse_locations) that takes the Parse Error's error.location and a
reference to source_code and returns (loc, end_loc); replace each arm that
currently calls source_code.source_location(..., PositionEncoding::Utf8) and
performs the prev_line_end fix with a single call to that helper, then only
supply the message-specific ParseErrorType (like
ParseErrorType::OtherError(msg)) in each branch so the parsing arms no longer
duplicate the range-normalization logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 8ea2731e-a951-4835-ae3f-a69cfee1284b
⛔ Files ignored due to path filters (4)
Lib/test/test_generators.pyis excluded by!Lib/**Lib/test/test_genexps.pyis excluded by!Lib/**Lib/test/test_named_expressions.pyis excluded by!Lib/**Lib/test/test_syntax.pyis excluded by!Lib/**
📒 Files selected for processing (1)
crates/compiler/src/lib.rs
| let (error_type, location, end_location) = match &error.error { | ||
| ParseErrorType::Lexical(LexicalErrorType::Eof) => { | ||
| if let Some((bracket_char, bracket_offset)) = find_unclosed_bracket(source_text) { | ||
| let bracket_text_size = ruff_text_size::TextSize::new(bracket_offset as u32); | ||
| let loc = | ||
| source_code.source_location(bracket_text_size, PositionEncoding::Utf8); | ||
| let end_loc = SourceLocation { | ||
| line: loc.line, | ||
| character_offset: loc.character_offset.saturating_add(1), | ||
| }; | ||
| let msg = format!("'{bracket_char}' was never closed"); | ||
| is_unclosed_bracket = true; | ||
| (ParseErrorType::OtherError(msg), loc, end_loc) | ||
| } else { | ||
| let loc = | ||
| source_code.source_location(error.location.start(), PositionEncoding::Utf8); | ||
| let end_loc = | ||
| source_code.source_location(error.location.end(), PositionEncoding::Utf8); | ||
| (error.error, loc, end_loc) | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the referenced ranges
sed -n '40,120p' crates/compiler/src/lib.rs
echo '---'
sed -n '160,220p' crates/compiler/src/lib.rs
# Locate the function that contains this match (if needed)
rg -n "from_ruff_parse_error|find_unclosed_bracket|match &error\.error" crates/compiler/src/lib.rs
# Run clippy to surface any compile-time ownership errors
cargo clippy -qRepository: RustPython/RustPython
Length of output: 151
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show the exact match block and the other referenced ranges
nl -ba crates/compiler/src/lib.rs | sed -n '45,120p'
echo '---'
nl -ba crates/compiler/src/lib.rs | sed -n '160,240p'
# Locate every occurrence of the suspected match
rg -n "match\s+&error\.error|error\.error\s*," crates/compiler/src/lib.rs
# Find the ParseErrorType definition (to see if it is Copy/Clone-only, etc.)
rg -n "enum\s+ParseErrorType" -S . | head -n 50
rg -n "struct\s+ParseError|enum\s+ParseError" -S . | head -n 50Repository: RustPython/RustPython
Length of output: 108
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Helper to print with line numbers without nl
print_lines () {
awk 'NR>='$2' && NR<='$3'{printf "%d\t%s\n", NR, $0}' "$1"
}
FILE="crates/compiler/src/lib.rs"
echo "=== match &error.error block (around line ~61-90) ==="
print_lines "$FILE" 55 110
echo
echo "=== additional referenced ranges (83-97, 179-194) ==="
print_lines "$FILE" 80 105
echo "---"
print_lines "$FILE" 170 210
echo
echo "=== occurrences of match &error.error ==="
rg -n "match\s+&error\.error" "$FILE"
echo
echo "=== any returns of error.error inside that match ==="
rg -n "match\s+&error\.error|return\s+\(error\.error|,\s*error\.error\s*," "$FILE"
echo
echo "=== what type is error (search for ParseError construction/fields) ==="
rg -n "ParseError|parse error|error\.error|struct\s+ParseError|enum\s+ParseError" "$FILE"Repository: RustPython/RustPython
Length of output: 8431
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Try to find the ruff_python_parser source in the local cargo registry
RUST_REG_SRC="${CARGO_HOME:-$HOME/.cargo}/registry/src"
if [ ! -d "$RUST_REG_SRC" ]; then
echo "No cargo registry source dir at: $RUST_REG_SRC" >&2
exit 0
fi
# Search for ParseErrorType definition and the OtherError variant
rg -n "pub\s+enum\s+ParseErrorType" "$RUST_REG_SRC" -S --no-mmap | head -n 20 || true
rg -n "OtherError" "$RUST_REG_SRC" -S --no-mmap | rg -n "ParseErrorType" | head -n 50 || true
# Also search for an impl Copy for ParseErrorType (should be absent if it contains String)
rg -n "impl\s+.*Copy\s+for\s+ParseErrorType" "$RUST_REG_SRC" -S --no-mmap | head -n 20 || trueRepository: RustPython/RustPython
Length of output: 10058
🏁 Script executed:
#!/bin/bash
set -euo pipefail
RUST_REG_SRC="${CARGO_HOME:-$HOME/.cargo}/registry/src"
# Find and print the ParseErrorType definition block
def_file="$(rg -l --no-mmap "enum\s+ParseErrorType" "$RUST_REG_SRC" | rg "rustpython-ruff_python_parser-" | head -n 1)"
echo "ParseErrorType definition file: $def_file"
if [ -z "${def_file:-}" ]; then
echo "ParseErrorType definition file not found" >&2
exit 0
fi
# Print around the enum definition
line="$(rg -n --no-mmap "enum\s+ParseErrorType" "$def_file" | head -n 1 | cut -d: -f1)"
start=$((line-10))
end=$((line+60))
echo "--- definition context ($start-$end) ---"
sed -n "${start},${end}p" "$def_file"
echo
# Check for Copy/Clone derives/impls
rg -n --no-mmap "derive\([^\)]*\bCopy\b" "$def_file" || true
rg -n --no-mmap "impl\s+.*Copy\s+for\s+ParseErrorType" "$RUST_REG_SRC" | head -n 20 || true
echo
# Print the ParseError struct definition (to confirm field names/types)
parse_def_file="$(rg -l --no-mmap "struct\s+ParseError" "$RUST_REG_SRC" | rg "rustpython-ruff_python_parser-" | head -n 1)"
echo "ParseError definition file: $parse_def_file"
parse_line="$(rg -n --no-mmap "struct\s+ParseError" "$parse_def_file" | head -n 1 | cut -d: -f1)"
start2=$((parse_line-10))
end2=$((parse_line+60))
sed -n "${start2},${end2}p" "$parse_def_file"Repository: RustPython/RustPython
Length of output: 5621
Fix move-out-of-borrowed error.error in match &error.error
ParseErrorType is not Copy (e.g., OtherError(String)), so the arms that return error.error while matching on &error.error will fail to compile (“cannot move out of borrowed content”). Match on the owned error.error (destructure error first, then match error.error { ... other => (other, loc, end_loc) }) or return error.error.clone() in those arms.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/compiler/src/lib.rs` around lines 61 - 80, The match is currently
borrowing error.error (match &error.error) but some arms return error.error,
causing a move-from-borrow compile error; fix by matching the owned value
instead (e.g., destructure error first and use match error.error { ... }) or by
returning a clone of the error variant in those arms (e.g.,
error.error.clone()); update the branch that handles
Lexical(LexicalErrorType::Eof) to still set is_unclosed_bracket, compute
loc/end_loc via source_code.source_location and then return either
(ParseErrorType::OtherError(msg), loc, end_loc) or the owned/cloned original
error value for the fallback arm so you no longer move out of a borrowed
reference.
| let line_idx = loc.line.to_zero_indexed(); | ||
| let line = source_text.split('\n').nth(line_idx).unwrap_or(""); | ||
| let line_end_col = line.chars().count() + 1; // 1-indexed, past last char | ||
| let end_loc = SourceLocation { | ||
| line: loc.line, | ||
| character_offset: loc.character_offset.saturating_add(1), | ||
| character_offset: ruff_source_file::OneIndexed::new(line_end_col) | ||
| .unwrap_or(loc.character_offset), |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "line_end_col|PositionEncoding::Utf8|SourceLocation|character_offset|OneIndexed" crates/compiler/src/lib.rs
echo "----"
sed -n '60,120p' crates/compiler/src/lib.rs
echo "----"
# Find the definition/usage expectations for these types
rg -n "enum PositionEncoding|struct OneIndexed|OneIndexed<|type OneIndexed|PositionEncoding::Utf8" crates -S
echo "----"
# Search for similar fallback logic elsewhere in the repo
rg -n "chars\\(\\)\\.count\\(\\) \\+ 1|line\\.chars\\(\\)\\.count\\(\\)\\s*\\+\\s*1|PositionEncoding::Utf8.*chars\\(\\)\\.count" crates -SRepository: RustPython/RustPython
Length of output: 11305
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Look for the crate in the workspace/repo
fd -t f "lib.rs" crates -H | rg -n "ruff_source_file" || true
fd -t f "*ruff_source_file*" . || true
# Search for the actual definition of PositionEncoding / OneIndexed in the repo
rg -n "enum PositionEncoding|struct OneIndexed|OneIndexed::new|PositionEncoding::Utf8" . || true
# If definitions aren't in-repo, find the dependency version to use for targeted source lookup
rg -n "ruff_source_file" Cargo.toml Cargo.lock . || trueRepository: RustPython/RustPython
Length of output: 8598
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Try to locate the dependency source in the local cargo registry
echo "Searching local cargo registry for PositionEncoding..."
rg -n "enum PositionEncoding" ~/.cargo/registry/src 2>/dev/null || true
rg -n "PositionEncoding::Utf8" ~/.cargo/registry/src 2>/dev/null || true
# Print candidate crate directories if found
ls -R ~/.cargo/registry/src 2>/dev/null | rg -n "rustpython-ruff_source_file|ruff_source_file" || trueRepository: RustPython/RustPython
Length of output: 120
🏁 Script executed:
#!/bin/bash
set -euo pipefail
ls -la ~
echo "----"
ls -la ~/.cargo || true
echo "----"
ls -la ~/.cargo/registry || true
echo "----"
ls -la ~/.cargo/registry/src || true
# If present, search for the crate directory name
if [ -d ~/.cargo/registry/src ]; then
fd -t d rustpython-ruff_source_file ~/.cargo/registry/src || true
fd -t d ruff_source_file ~/.cargo/registry/src || true
fiRepository: RustPython/RustPython
Length of output: 973
🌐 Web query:
ruff_source_file PositionEncoding Utf8 meaning character_offset units
💡 Result:
In the context of the Ruff source code and its PositionEncoding implementation, the term character_offset represents the position of a character within a specific line, and its numerical unit is determined by the PositionEncoding variant selected [1][2]. Ruff defines three primary encoding variants in its PositionEncoding enum to handle how character offsets are measured, particularly for compatibility with Language Server Protocol (LSP) requirements and non-BMP (Basic Multilingual Plane) characters [1][2]: - Utf8: character_offset is measured in bytes from the start of the line. - Utf16: character_offset is measured in UTF-16 code units (16-bit units), which is common for LSP clients like VS Code or Neovim [3][1][2]. - Utf32: character_offset is measured in Unicode scalar values (essentially counting the number of actual characters/code points) [1][2]. When Ruff performs operations such as mapping between internal TextSize (which are byte-based) and SourceLocation (which uses a character_offset to specify column position), it uses the chosen PositionEncoding to correctly calculate the offset [4][1][5]. This mechanism ensures that diagnostics, quick-fixes, and editor markers remain accurate regardless of whether a file contains multi-byte characters or non-BMP symbols (like emojis) that would otherwise cause misalignment if only simple byte-counting were used [1][5].
Citations:
- 1: astral-sh/ruff@3e1e02e
- 2: https://github.com/astral-sh/ruff/blob/main/crates/ruff_wasm/src/lib.rs
- 3: Can
offset_encodingfor Ruff be configured in Neovim? astral-sh/ruff#14483 - 4: Split
SourceLocationintoLineColumnandSourceLocationastral-sh/ruff#17587 - 5: astral-sh/ruff@b2a8c42
Fix UTF-8 column fallback for IndentationError
PositionEncoding::Utf8 uses character_offset measured in UTF-8 bytes, but this fallback computes the column via line.chars().count() + 1 (Unicode scalars). That under-reports the offset on non-ASCII lines. Use the line’s byte length (line.len()) to keep the offset consistent.
Suggested fix
- let line_end_col = line.chars().count() + 1; // 1-indexed, past last char
+ let line_end_col = line.len() + 1; // UTF-8 bytes, 1-indexed, past last char📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let line_idx = loc.line.to_zero_indexed(); | |
| let line = source_text.split('\n').nth(line_idx).unwrap_or(""); | |
| let line_end_col = line.chars().count() + 1; // 1-indexed, past last char | |
| let end_loc = SourceLocation { | |
| line: loc.line, | |
| character_offset: loc.character_offset.saturating_add(1), | |
| character_offset: ruff_source_file::OneIndexed::new(line_end_col) | |
| .unwrap_or(loc.character_offset), | |
| let line_idx = loc.line.to_zero_indexed(); | |
| let line = source_text.split('\n').nth(line_idx).unwrap_or(""); | |
| let line_end_col = line.len() + 1; // UTF-8 bytes, 1-indexed, past last char | |
| let end_loc = SourceLocation { | |
| line: loc.line, | |
| character_offset: ruff_source_file::OneIndexed::new(line_end_col) | |
| .unwrap_or(loc.character_offset), |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@crates/compiler/src/lib.rs` around lines 88 - 94, The UTF-8 column fallback
uses Unicode scalar count which underestimates byte offsets for
PositionEncoding::Utf8; in the fallback inside the block that builds end_loc
(using SourceLocation and loc.character_offset) compute line_end_col from the
line's byte length (use line.len()) instead of line.chars().count() + 1 so
character_offset remains measured in UTF-8 bytes; keep the OneIndexed::new(...)
and the existing unwrap_or(loc.character_offset) logic but pass the byte-based
column value.
Continues RustPython#7928/RustPython#7933/RustPython#7988. Translates many more ruff ParseErrorType variants to CPython's exact wording in CompileError::from_ruff_parse_error, and routes ast.parse() / compile(PyCF_ONLY_AST) through the same path so those messages match too (previously they leaked raw ruff strings). Adds a few codegen/symtable checks. Covered: aug-assign/delete/set/dict/f-string/t-string targets; "cannot use {kind} as import target"; string-prefix incompatibility and "invalid character 'X' (U+XXXX)"; parenthesized def/lambda params; missing default/argument value; dict ':' / value syntax; "'elif' block follows an 'else' block"; raise-from; comprehension 'if'; ternary statement keywords; match "case ... as <target>" -> "cannot use {kind} as pattern target" and "case ... as _"; __debug__ as def/class/type-param/except name; "name 'x' is nonlocal and global"; generic type-parameter wording. Lib/test: drop the now-passing "# TODO: RUSTPYTHON; Wrong error message" doctest markers and @expectedfailure decorators. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Continues RustPython#7928/RustPython#7933/RustPython#7988. Translates many more ruff ParseErrorType variants to CPython's exact wording in CompileError::from_ruff_parse_error, and routes ast.parse() / compile(PyCF_ONLY_AST) through the same path so those messages match too (previously they leaked raw ruff strings). Adds a few codegen/symtable checks. Covered: aug-assign/delete/set/dict/f-string/t-string targets; "cannot use {kind} as import target"; string-prefix incompatibility and "invalid character 'X' (U+XXXX)"; parenthesized def/lambda params; missing default/argument value; dict ':' / value syntax; "'elif' block follows an 'else' block"; raise-from; comprehension 'if'; ternary statement keywords; match "case ... as <target>" -> "cannot use {kind} as pattern target" and "case ... as _"; __debug__ as def/class/type-param/except name; "name 'x' is nonlocal and global"; generic type-parameter wording. Lib/test: drop the now-passing "# TODO: RUSTPYTHON; Wrong error message" doctest markers and @expectedfailure decorators. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Continues RustPython#7928/RustPython#7933/RustPython#7988. Translates many more ruff ParseErrorType variants to CPython's exact wording in CompileError::from_ruff_parse_error, and routes ast.parse() / compile(PyCF_ONLY_AST) through the same path so those messages match too (previously they leaked raw ruff strings). Adds a few codegen/symtable checks. Covered: aug-assign/delete/set/dict/f-string/t-string targets; "cannot use {kind} as import target"; string-prefix incompatibility and "invalid character 'X' (U+XXXX)"; parenthesized def/lambda params; missing default/argument value; dict ':' / value syntax; "'elif' block follows an 'else' block"; raise-from; comprehension 'if'; ternary statement keywords; match "case ... as <target>" -> "cannot use {kind} as pattern target" and "case ... as _"; __debug__ as def/class/type-param/except name; "name 'x' is nonlocal and global"; generic type-parameter wording. Lib/test: drop the now-passing "# TODO: RUSTPYTHON; Wrong error message" doctest markers and @expectedfailure decorators. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Continues RustPython#7928/RustPython#7933/RustPython#7988. Translates many more ruff ParseErrorType variants to CPython's exact wording in CompileError::from_ruff_parse_error, and routes ast.parse() / compile(PyCF_ONLY_AST) through the same path so those messages match too (previously they leaked raw ruff strings). Adds a few codegen/symtable checks. Covered: aug-assign/delete/set/dict/f-string/t-string targets; "cannot use {kind} as import target"; string-prefix incompatibility and "invalid character 'X' (U+XXXX)"; parenthesized def/lambda params; missing default/argument value; dict ':' / value syntax; "'elif' block follows an 'else' block"; raise-from; comprehension 'if'; ternary statement keywords; match "case ... as <target>" -> "cannot use {kind} as pattern target" and "case ... as _"; __debug__ as def/class/type-param/except name; "name 'x' is nonlocal and global"; generic type-parameter wording. Lib/test: drop the now-passing "# TODO: RUSTPYTHON; Wrong error message" doctest markers and @expectedfailure decorators. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Continues RustPython#7928/RustPython#7933/RustPython#7988. Translates many more ruff ParseErrorType variants to CPython's exact wording in CompileError::from_ruff_parse_error, and routes ast.parse() / compile(PyCF_ONLY_AST) through the same path so those messages match too (previously they leaked raw ruff strings). Adds a few codegen/symtable checks. Covered: aug-assign/delete/set/dict/f-string/t-string targets; "cannot use {kind} as import target"; string-prefix incompatibility and "invalid character 'X' (U+XXXX)"; parenthesized def/lambda params; missing default/argument value; dict ':' / value syntax; "'elif' block follows an 'else' block"; raise-from; comprehension 'if'; ternary statement keywords; match "case ... as <target>" -> "cannot use {kind} as pattern target" and "case ... as _"; __debug__ as def/class/type-param/except name; "name 'x' is nonlocal and global"; generic type-parameter wording. Lib/test: drop the now-passing "# TODO: RUSTPYTHON; Wrong error message" doctest markers and @expectedfailure decorators. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary by CodeRabbit
Release Notes