Use icu4x for UTF-8 float(); remove an alloc#7998
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yml Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (5)
💤 Files with no reviewable changes (1)
🚧 Files skipped from review as they are similar to previous changes (4)
📝 WalkthroughWalkthroughIntegrates icu_decimal for UTF‑8 decimal parsing in float conversion, adds SliceFmtWriter for bounded formatting into a byte buffer, and removes the old char_to_decimal lookup and export. ChangesICU Decimal Parsing Integration
Sequence Diagram(s)sequenceDiagram
participant FloatStr as float_from_string
participant Decimal as icu_decimal::Decimal
participant Writer as SliceFmtWriter
participant FloatParse as std::parse
FloatStr->>Decimal: try_from_str(trimmed_utf8)
alt parse succeeds
Decimal-->>FloatStr: decimal object
FloatStr->>Writer: write_str(formatted)
Writer-->>FloatStr: formatted buffer
FloatStr->>FloatParse: parse buffer
else parse fails
Decimal-->>FloatStr: error
FloatStr->>FloatParse: parse original bytes
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/vm/src/builtins/float.rs`:
- Around line 219-223: The ICU Decimal branch currently ignores
decimal.write_to's Result and always passes the full mapped_str_buf (including
unwritten trailing bytes) into parse_bytes; change the code in the
Decimal::try_from_str branch to capture the result of decimal.write_to(&mut
writer), handle/propagate any Err from write_to, read the actual number of bytes
written from the SliceFmtWriter (e.g., writer.written() or expose a written
field), and pass only &mapped_str_buf[..written] to
crate::literal::float::parse_bytes so trailing zero bytes are not fed into
lexical-core parsing.
In `@crates/vm/src/utils.rs`:
- Around line 113-118: SliceFmtWriter::write_str is using remainder as an index
and never advancing self.written; change it to treat remainder as the remaining
capacity, compute let to_write = remainder.min(s.len()), copy
s.as_bytes()[..to_write] into self.buf[self.written..self.written + to_write],
then increment self.written by to_write; return Err(fmt::Error) if to_write <
s.len(), otherwise Ok(()). This fixes bounds, limits the copied bytes to
available space, and advances the writer state (refer to the write_str method,
self.buf, self.written, and remainder()).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 52267e3c-a8da-4d0e-b64e-0c094ff82b04
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
Cargo.tomlcrates/common/src/str.rscrates/vm/Cargo.tomlcrates/vm/src/builtins/float.rscrates/vm/src/utils.rs
💤 Files with no reviewable changes (1)
- crates/common/src/str.rs
95c9965 to
0c3a9c5
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
crates/vm/src/utils.rs (1)
115-117:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winFix source slice bounds in partial copy logic.
On Line 116,
&s.as_bytes()[to_copy..]is the wrong half of the input and can triggercopy_from_slicelength mismatch panics when truncating. Use the prefix slice instead.Suggested fix
- self.buf[self.written..self.written + to_copy].copy_from_slice(&s.as_bytes()[to_copy..]); + self.buf[self.written..self.written + to_copy].copy_from_slice(&s.as_bytes()[..to_copy]);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/vm/src/utils.rs` around lines 115 - 117, The partial-copy currently takes the wrong suffix slice (&s.as_bytes()[to_copy..]) causing length mismatches; change the source to the prefix slice of length to_copy (use &s.as_bytes()[..to_copy]) when calling copy_from_slice so the copied length equals to_copy, and keep updating self.written as before in the function that computes to_copy via self.remainder().
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@crates/vm/src/utils.rs`:
- Around line 115-117: The partial-copy currently takes the wrong suffix slice
(&s.as_bytes()[to_copy..]) causing length mismatches; change the source to the
prefix slice of length to_copy (use &s.as_bytes()[..to_copy]) when calling
copy_from_slice so the copied length equals to_copy, and keep updating
self.written as before in the function that computes to_copy via
self.remainder().
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: f43ddc22-cecd-46a4-bdbb-7b876f4871d7
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
Cargo.tomlcrates/common/src/str.rscrates/vm/Cargo.tomlcrates/vm/src/builtins/float.rscrates/vm/src/utils.rs
💤 Files with no reviewable changes (1)
- crates/common/src/str.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- crates/vm/Cargo.toml
- Cargo.toml
- crates/vm/src/builtins/float.rs
0c3a9c5 to
a2b5735
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (1)
crates/vm/src/utils.rs (1)
117-127:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winSource slice index is inverted — will panic on every call.
Line 120 uses
&s.as_bytes()[to_copy..]which yields the bytes afterto_copy, with lengths.len() - to_copy. The destination slice has lengthto_copy. This mismatch causescopy_from_sliceto panic in nearly all cases (e.g.,s = "hello",to_copy = 5→ source is empty, destination is 5 bytes).The slice should be
[..to_copy](the firstto_copybytes).Proposed fix
impl fmt::Write for SliceFmtWriter<'_> { fn write_str(&mut self, s: &str) -> fmt::Result { let to_copy = self.remainder().min(s.len()); - self.buf[self.written..self.written + to_copy].copy_from_slice(&s.as_bytes()[to_copy..]); + self.buf[self.written..self.written + to_copy].copy_from_slice(&s.as_bytes()[..to_copy]); self.written += to_copy; if to_copy == s.len() { Ok(()) } else { Err(fmt::Error) } } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/vm/src/utils.rs` around lines 117 - 127, The write_str implementation in impl fmt::Write for SliceFmtWriter<'_> copies the wrong source slice (using &s.as_bytes()[to_copy..]) which produces a length mismatch and panics; change the source slice to the first to_copy bytes (&s.as_bytes()[..to_copy]) so it matches the destination length, keeping the remainder() call and updates to self.buf and self.written intact in the write_str method.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@crates/vm/src/utils.rs`:
- Around line 117-127: The write_str implementation in impl fmt::Write for
SliceFmtWriter<'_> copies the wrong source slice (using
&s.as_bytes()[to_copy..]) which produces a length mismatch and panics; change
the source slice to the first to_copy bytes (&s.as_bytes()[..to_copy]) so it
matches the destination length, keeping the remainder() call and updates to
self.buf and self.written intact in the write_str method.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 66467756-7fae-4975-92bd-e1829f2a3631
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
Cargo.tomlcrates/common/src/str.rscrates/vm/Cargo.tomlcrates/vm/src/builtins/float.rscrates/vm/src/utils.rs
💤 Files with no reviewable changes (1)
- crates/common/src/str.rs
✅ Files skipped from review due to trivial changes (1)
- Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (2)
- crates/vm/Cargo.toml
- crates/vm/src/builtins/float.rs
I removed an embedded table of non-ASCII numbers in favor of using `icu_decimal`. The benefits of using `icu4x` here are consistency plus Unicode updates. As Unicode is updated, we automatically reap the benefits without having to modify the table. `icu_decimal` is also useful beyond `float()`. I'm also using it to clean up `unicodedata` in RustPython#7947.
a2b5735 to
892b897
Compare
|
Okay, it turns out there isn't a nice way to do this without complicating the code. 😵💫 A string like: float("0" * 100000 + "1e50")Would still need to parse. |
I removed an embedded table of non-ASCII numbers in favor of using
icu_decimal. The benefits of usingicu4xhere are consistency plus Unicode updates. As Unicode is updated, we automatically reap the benefits without having to modify the table.icu_decimalis also useful beyondfloat(). I'm also using it to clean upunicodedatain #7947.Summary
icu_decimalSummary by CodeRabbit
Refactor
Chores
Performance
Breaking Changes