Skip to content

Add _zstd stdlib module backing compression.zstd#7962

Open
JamesClarke7283 wants to merge 12 commits into
RustPython:mainfrom
JamesClarke7283:zstd
Open

Add _zstd stdlib module backing compression.zstd#7962
JamesClarke7283 wants to merge 12 commits into
RustPython:mainfrom
JamesClarke7283:zstd

Conversation

@JamesClarke7283
Copy link
Copy Markdown
Contributor

@JamesClarke7283 JamesClarke7283 commented May 23, 2026

Summary

  • Add new _zstd stdlib module (crates/stdlib/src/zstd.rs) implementing ZstdCompressor, ZstdDecompressor, ZstdDict, and the compression/decompression parameter constants that back the pure-Python compression.zstd package.
  • Wire up zstd and zstd-safe (with the zdict_builder feature) as workspace dependencies and register the module in crates/stdlib/src/lib.rs, gated off for android and wasm32 targets.
  • Convert test_zstd_multithread_compress from a skipIf to an expectedFailureIf when libzstd was built without multi-threading support, so the case is tracked rather than silently skipped.

Testing

  • cargo build -p rustpython-stdlib — Not run
  • cargo run -- -m test test_zstd — Not run

Summary by CodeRabbit

  • New Features
    • zstd compression support: streaming compressor and single-frame decompressor (available on non-Android, non‑WASM platforms).
    • Dictionary tooling: train, finalize, and use dictionaries via a ZstdDict wrapper (raw and validated modes).
    • Frame inspection and parameter introspection/configuration utilities.
    • Exposed zstd version info, constants, and a ZstdError for error handling.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 23, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: c14a3250-e586-4a27-b00a-030f2c606288

📥 Commits

Reviewing files that changed from the base of the PR and between 4cc65f6 and d757969.

📒 Files selected for processing (1)
  • crates/stdlib/src/zstd.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/stdlib/src/zstd.rs

📝 Walkthrough

Walkthrough

Adds zstd workspace dependencies, wires a conditional stdlib _zstd module, and implements the _zstd API: constants, ZstdError, parameter validation, ZstdDict, streaming ZstdCompressor/ZstdDecompressor, and utilities for frames and dictionary training/finalization.

Changes

Zstd Extension Module

Layer / File(s) Summary
Dependencies and module wiring
Cargo.toml, crates/stdlib/Cargo.toml, crates/stdlib/src/lib.rs
Workspace declares zstd v0.13 and zstd-safe v7 (workspace deps) with zdict_builder feature and disabled defaults; stdlib conditionally depends on them for non-Android/non-WASM32 and registers the zstd module.
Module initialization, constants, and error handling
crates/stdlib/src/zstd.rs (lines 1–172)
Extension module exports parameter IDs, strategy enums, version info and ZstdError, and includes libzstd error normalization helpers.
Parameter validation and conversion
crates/stdlib/src/zstd.rs (lines 179–443, 1582–1591)
Converts Python ints, enforces compression/decompression enum-kind checks, maps parameter IDs to libzstd enums, validates bounds, and parses zstd_dict arguments; stores parameter-type classes for runtime checks.
ZstdDict type and loaders
crates/stdlib/src/zstd.rs (lines 449–559)
Dictionary type with raw vs non-raw constructor (header validation), computed dict_id, and as_digested_dict/as_undigested_dict/as_prefix helpers.
ZstdCompressor streaming API
crates/stdlib/src/zstd.rs (lines 565–876, 930–1065)
Streaming compressor with compress/flush, set_pledged_input_size gating, last_mode tracking, options/level validation, dictionary attachment, and streaming loop implementation.
ZstdDecompressor streaming API
crates/stdlib/src/zstd.rs (lines 1126–1333)
Single-frame decompressor with streaming loop, max_length output capping, buffering/trailing unused_data, and eof/needs_input state.
Module-level utility functions
crates/stdlib/src/zstd.rs (lines 1340–1591)
get_frame_size, get_frame_info, train_dict, finalize_dict (FFI), and parameter introspection (get_param_bounds, set_parameter_types).
sequenceDiagram
  participant Python as Python
  participant ZstdModule as _zstd
  participant ZstdCrates as zstd / zstd-safe
  Python->>ZstdModule: import compression.zstd / call API
  ZstdModule->>ZstdCrates: FFI calls (compress, decompress, dict ops)
  ZstdCrates-->>ZstdModule: results / errors
  ZstdModule-->>Python: Python objects, constants, exceptions (ZstdError)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • ShaharNaveh

Poem

🐰 I nibble bytes and stitch them tight,
Dictionaries warmed by moonlit byte—
Streams compress and frames take flight,
Errors named to show the light,
Zstd hums soft into the night.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title clearly and accurately summarizes the main change: adding a new _zstd standard-library module to back the compression.zstd namespace.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 23, 2026

📦 Library Dependencies

The following Lib/ modules were modified. Here are their dependencies:

[ ] test: cpython/Lib/test/test_zstd.py (TODO: 2)

dependencies:

dependent tests: (no tests depend on zstd)

Legend:

  • [+] path exists in CPython
  • [x] up-to-date, [ ] outdated

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/stdlib/src/zstd.rs`:
- Around line 58-60: Remove the decorative `//
=========================================================================`
separators in this file and replace them with concise, meaningful comments or
doc comments only where they add value (e.g., replace the separator above
"Module-level constants" with a short `// Module-level constants` or `///` doc
comment if documentation is needed); similarly remove/replace the separators at
the other noted locations (lines around 164-166, 475-477, 628-630, 1136-1138,
1425-1427) with short explanatory `//` comments or `///` doc-comments tied to
the related symbols/sections so the file uses informative comments instead of
decorative dividers.
- Around line 1659-1668: get_param_bounds() is incorrectly validating parameter
IDs by calling cparameter_from_int(args.parameter, 0, vm) /
dparameter_from_int(..., 0, ...) which causes strategy-specific value validation
to reject valid IDs; instead call cparameter_from_int(args.parameter, 0, vm) /
dparameter_from_int(args.parameter, 0, vm) only to retrieve the parameter enum
(rename to c_param_enum/d_param_enum), then pass that enum value (not a
transmuted u32) into the corresponding zstd_sys ZSTD_cParam_getBounds /
ZSTD_dParam_getBounds FFI functions; remove the unsafe transmute logic and also
delete the decorative "//
========================================================================="
separators in crates/stdlib/src/zstd.rs to match style guidelines.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: f23e2155-6dfa-4c29-a3e4-7664092d9a73

📥 Commits

Reviewing files that changed from the base of the PR and between d3272e7 and 5f3acc7.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • Lib/test/test_zstd.py is excluded by !Lib/**
📒 Files selected for processing (4)
  • Cargo.toml
  • crates/stdlib/Cargo.toml
  • crates/stdlib/src/lib.rs
  • crates/stdlib/src/zstd.rs

Comment thread crates/stdlib/src/zstd.rs
Comment on lines +58 to +60
// =========================================================================
// Module-level constants
// =========================================================================
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot May 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Remove decorative section-separator comments.

Please replace the // ========================================================================= separators with regular short comments or doc comments only where needed.

As per coding guidelines, "Do not add decorative section separators (e.g. // -----------, // ===, /* *** */). Use /// doc-comments or short // comments only when they add value".

Also applies to: 164-166, 475-477, 628-630, 1136-1138, 1425-1427

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/stdlib/src/zstd.rs` around lines 58 - 60, Remove the decorative `//
=========================================================================`
separators in this file and replace them with concise, meaningful comments or
doc comments only where they add value (e.g., replace the separator above
"Module-level constants" with a short `// Module-level constants` or `///` doc
comment if documentation is needed); similarly remove/replace the separators at
the other noted locations (lines around 164-166, 475-477, 628-630, 1136-1138,
1425-1427) with short explanatory `//` comments or `///` doc-comments tied to
the related symbols/sections so the file uses informative comments instead of
decorative dividers.

✅ Addressed in commit 0bc36b6

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up: this thread was marked resolved earlier but the decorative // ===... separators were still in the file. They have now been removed (seven section dividers around Module-level constants, ZstdError exception, Parameter helpers, ZstdDict, ZstdCompressor, ZstdDecompressor, Module-level functions). The struct/fn definitions are self-describing without them, and cargo test is still green (119/119 in test_zstd).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JamesClarke7283, thanks for the confirmation and for verifying with cargo test — all seven separators removed and tests still passing sounds great!


If you found this review helpful, would you consider giving us a shout-out on X?

Thank you for using CodeRabbit!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're still there.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment thread crates/stdlib/src/zstd.rs Outdated
Comment thread crates/stdlib/src/zstd.rs Outdated
Comment thread crates/stdlib/src/zstd.rs Outdated
@ShaharNaveh
Copy link
Copy Markdown
Contributor

And tysm for working on this!

@ShaharNaveh
Copy link
Copy Markdown
Contributor

@JamesClarke7283 please pull from main, we've fixed the CI issue

Comment thread crates/stdlib/src/zstd.rs Outdated
Comment thread crates/stdlib/src/zstd.rs Outdated
Comment thread crates/stdlib/src/zstd.rs Outdated
JamesClarke7283 and others added 7 commits May 25, 2026 08:41
- Implement ZstdDict, ZstdCompressor, and ZstdDecompressor on top of the zstd-safe crate
- Wire the module into stdlib_module_defs (gated off Android/wasm32)
- Mark test_zstd_multithread_compress as expected failure when libzstd lacks multi-threading support
`PyMutex` is built on `RawCellMutex` on single-threaded targets (iOS,
Android, wasm32) and is not `Sync`, so a `static PyMutex<...>` fails to
compile. `PyTypeRef` is also not `Send` by default, which compounds the
issue.

Replace the static parameter-type registry with class-name comparison
(`"CompressionParameter"` / `"DecompressionParameter"`). The two names
originate from the `compression.zstd` module we ship, so identity vs.
name comparison is equivalent in practice. `set_parameter_types` now
just validates that its arguments are type objects.

Verified locally:
- `cargo build -p rustpython-stdlib` (default features): clean.
- `cargo check -p rustpython-stdlib --no-default-features --features host_env`
  (simulates iOS by dropping the `threading` feature): clean.
- `cargo run -- -m test test_zstd`: 119 tests pass, 1 expected failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RustPython's clippy lint set rejects `std::` imports for items that also
live in `core::` or `alloc::`. Switch every such reference in `zstd.rs`:

- `std::ffi::{c_int, CStr}`        -> `core::ffi::{c_int, CStr}`
- `std::fmt::*`                    -> `core::fmt::*`
- `std::slice::from_raw_parts`     -> `core::slice::from_raw_parts`
- `std::borrow::Cow`               -> `alloc::borrow::Cow`

Also clear up other clippy findings under `-D warnings`:

- Drop the redundant `obj.clone()` on the last use of `obj` in
  `parse_zstd_dict_arg`.
- Collapse `.map(|n| n.get()).unwrap_or(0)` to `.map_or(0, |n| n.get())`
  in the two dict-id readers.
- Replace `&*work_data` with `&work_data` (auto-deref).
- Factor the `(Option<Digested>, Option<PyRef<ZstdDict>>)` return shape
  into a `DictLoadResult<D>` type alias to satisfy `type_complexity`.
- Replace the remaining `std::mem::transmute<u32, ZSTD_cParameter>` /
  `ZSTD_dParameter` in `get_param_bounds` with the existing safe
  `c_param_enum` / `d_param_enum` helpers; surfaces a clear "invalid
  parameter" `ValueError` instead of relying on UB-adjacent transmutes
  for unknown ints.

Verified locally:
- `cargo clippy -p rustpython-stdlib --no-deps -- -Dwarnings`: clean
  (only the pre-existing `socket.rs::sock_wait` warning remains).
- `cargo run -- -m test test_zstd`: 119 pass, 1 expected failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous run's `Lint` job failed with a `401 Unauthorized` from
reviewdog hitting the GitHub API (a transient CI/permissions issue,
unrelated to this PR's diff). Pushing an empty commit to re-run CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
{"subject": "Format zstd module and extend spell-check ignores", "body": "- Apply rustfmt formatting to zstd.rs\n- Add additional spell-checker ignore entries for zstd identifiers"}
Co-authored-by: Shahar Naveh <50263213+ShaharNaveh@users.noreply.github.com>
{"subject": "Remove rustdoc comments from zstd module", "body": "- Strip doc comments from internal items in _zstd module\n- Convert a few API-doc comments into regular implementation notes where appropriate"}
Comment thread crates/stdlib/src/zstd.rs Outdated
Comment thread crates/stdlib/src/zstd.rs
{"subject": "Mark zstd load_dict as unsafe and document invariant", "body": "- Add # Safety docs requiring PyRef<ZstdDict> to outlive the context\n- Wrap calls in load_compressor_dict and load_decompressor_dict with SAFETY comments"}
{
  "subject": "refactor(zstd): release GIL during (de)compress and tighten dict-load safety",
  "body": "- Wrap compress/decompress loops in vm.allow_threads to release the GIL\n- Replace load_*_dict helpers with build_*_state that assemble the full state, making the load_dict safety invariants structural\n- Switch constructor args from OptionalArg to OptionalOption and use try_to_value instead of the pyobj_to_i32/arg_or_none helpers\n- Add check_sample_sizes_match with overflow-safe summing for train_dict/finalize_dict"
}
@JamesClarke7283
Copy link
Copy Markdown
Contributor Author

@youknowone can this be merged?

Comment thread crates/stdlib/src/zstd.rs
Comment thread crates/stdlib/src/zstd.rs
@JamesClarke7283 JamesClarke7283 mentioned this pull request May 28, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
crates/stdlib/src/zstd.rs (1)

52-54: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Decorative section separators still present.

The // ===... separators flagged in a previous review are still in the file (lines 52-54, 151-153, 176-178, 454-456, 570-572, 1081-1083, 1335-1337). Please remove them.

As per coding guidelines, "Do not add decorative section separators (e.g. // -----------, // ===, /* *** */). Use /// doc-comments or short // comments only when they add value."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/stdlib/src/zstd.rs` around lines 52 - 54, Remove the decorative
section separators (the repeated // ===... lines) found in
crates/stdlib/src/zstd.rs (e.g., around the "Module-level constants" comment and
the other flagged locations)—delete those purely decorative comment blocks and,
where a section boundary is helpful, replace them with a brief single-line
comment or an appropriate /// doc-comment tied to the nearby item (e.g., the
constants block or the functions/structs) so the file no longer contains
decorative separators but still documents sections meaningfully.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@crates/stdlib/src/zstd.rs`:
- Around line 52-54: Remove the decorative section separators (the repeated //
===... lines) found in crates/stdlib/src/zstd.rs (e.g., around the "Module-level
constants" comment and the other flagged locations)—delete those purely
decorative comment blocks and, where a section boundary is helpful, replace them
with a brief single-line comment or an appropriate /// doc-comment tied to the
nearby item (e.g., the constants block or the functions/structs) so the file no
longer contains decorative separators but still documents sections meaningfully.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 8b515ccc-727b-4221-b11d-b43ee2f1028e

📥 Commits

Reviewing files that changed from the base of the PR and between fc607c4 and 1b0b077.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • Lib/test/test_zstd.py is excluded by !Lib/**
📒 Files selected for processing (4)
  • Cargo.toml
  • crates/stdlib/Cargo.toml
  • crates/stdlib/src/lib.rs
  • crates/stdlib/src/zstd.rs
✅ Files skipped from review due to trivial changes (1)
  • Cargo.toml
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/stdlib/Cargo.toml
  • crates/stdlib/src/lib.rs

Comment thread crates/stdlib/src/zstd.rs Outdated
@joshuamegnauth54
Copy link
Copy Markdown
Contributor

Relevant for this patch (or future updates): https://trifectatech.org/blog/announcing-zstandard-in-rust/

RustPython is already using bzip2 and zlib from Trifecta.

JamesClarke7283 and others added 3 commits June 3, 2026 13:53
- Have set_parameter_types stash the CompressionParameter/DecompressionParameter classes as private _zstd module attributes instead of validating and discarding them
- check_wrong_param_kind now compares key classes by identity against the registered type, matching CPython's Py_TYPE check, and skips when unregistered
- Error message names the actual key type as an attribute
- Raise TypeError (not RuntimeError), matching CPython, when both
  `level` and `options` are passed to ZstdCompressor
- Parse set_pledged_input_size's argument before taking the state lock
  so a re-entrant __index__ can't deadlock, and fix the off-by-one in
  its bound message to match CPython

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants