Align marshal and .pyc with CPython 3.14 #7958
read_marshal_bytes, _str, _str_vec, _name_tuple, and _const_tuple now take a shared ref table and resolve TYPE_REF / register FLAG_REF entries. deserialize_code is split into a public wrapper and an inner function that receives the ref table; deserialize_value_depth opens a fresh inner ref space when it hits Type::Code, mirroring CPython's behaviour of putting the code object itself at ref slot 0. Nested code objects inside const tuples reuse the surrounding code's ref space via the new read_const_value helper.
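As a small illustration of the header handling described above, the sketch below splits a raw marshal type byte into its type code and its FLAG_REF bit. The constant values follow CPython's marshal.c (FLAG_REF is 0x80, TYPE_REF is 'r', TYPE_CODE is 'c'); the helper name split_header is hypothetical and is not taken from this PR.

```rust
/// Bit set on a type byte to mean "also register this object in the ref table".
const FLAG_REF: u8 = 0x80;
/// Type byte for a back-reference into the ref table.
const TYPE_REF: u8 = b'r';
/// Type byte for a code object.
const TYPE_CODE: u8 = b'c';

/// Split a raw header byte into (type code, whether FLAG_REF was set).
/// `split_header` is a hypothetical name used only for this sketch.
fn split_header(raw: u8) -> (u8, bool) {
    (raw & !FLAG_REF, (raw & FLAG_REF) != 0)
}

/// True when the header announces a code object, with or without FLAG_REF.
fn is_code_header(raw: u8) -> bool {
    split_header(raw).0 == TYPE_CODE
}

/// True when the header is a back-reference rather than a fresh object.
fn is_ref_header(raw: u8) -> bool {
    split_header(raw).0 == TYPE_REF
}
```

For example, the byte 0xe3 is 'c' with FLAG_REF set, so split_header(0xe3) yields (b'c', true).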
… 3.14 PYC_MAGIC_NUMBER changes from 2994 to 3627, matching CPython 3.14's pyc_magic_number_token (0x0a0d0e2b). marshal FORMAT_VERSION drops from 5 to 4 (the encoder/marshal.version value; the decoder already accepts both). check_pyc_magic_number_bytes now compares all four magic bytes instead of the first two.
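The relationship between the magic number and the token quoted above can be checked with a little arithmetic: the four magic bytes are the 16-bit number in little-endian order followed by "\r\n". The sketch below is only an illustration; pyc_magic_bytes and magic_matches are hypothetical names, not the PR's check_pyc_magic_number_bytes.

```rust
/// Build the 4-byte .pyc magic: the 16-bit magic number in little-endian
/// order, followed by b"\r\n".
fn pyc_magic_bytes(magic_number: u16) -> [u8; 4] {
    let [lo, hi] = magic_number.to_le_bytes();
    [lo, hi, b'\r', b'\n']
}

/// Hypothetical stand-in for a four-byte check: every magic byte must match,
/// not just the first two.
fn magic_matches(header: &[u8], magic_number: u16) -> bool {
    header.starts_with(&pyc_magic_bytes(magic_number))
}

/// Read the four magic bytes as one little-endian 32-bit token.
fn magic_token(magic_number: u16) -> u32 {
    u32::from_le_bytes(pyc_magic_bytes(magic_number))
}
```

With 3627 (0x0e2b) this gives the bytes 2b 0e 0d 0a and the token 0x0a0d0e2b. A header whose first two bytes match but whose trailing "\r\n" is missing is rejected.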
Two fixture-based tests pin the marshal decoder against actual CPython 3.14 marshal.dumps() output: a trivial module that exercises FLAG_REF plus TYPE_REF for qualname, and a module with a nested function that exercises ref sharing between a const tuple and its surrounding code object.
No actionable comments were generated in the recent review.
Walkthrough
This PR upgrades RustPython to support the CPython 3.14 marshal format by updating the PYC magic number constant to match CPython 3.14 (3627), reducing the marshal FORMAT_VERSION to 4, and refactoring code-object deserialization to correctly share and resolve a ref table across nested marshal values using TYPE_REF and FLAG_REF. Version-dependent checks in the import and sys modules are updated accordingly.
Estimated code review effort: 4 (Complex), ~60 minutes
SourceFileLoader.get_code now also looks for .pyc files using
_RP_FALLBACK_CACHE_TAGS (currently ('cpython-314',)) in addition to
sys.implementation.cache_tag. The matched .pyc is only used for
reading; recompilation still writes to the RustPython-tagged path, so
CPython's .pyc is never overwritten. Source-stat / hash / timestamp
validation logic is unchanged.
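The lookup order this commit message describes can be sketched as follows: the interpreter's own cache tag is tried first, then the fallback tags, and only the own-tag path is ever a write target. This is a minimal illustration in Rust, not the Python code that was patched; cache_path, read_candidates, write_target and the example tag are all hypothetical.

```rust
use std::path::{Path, PathBuf};

/// Fallback tags, mirroring the _RP_FALLBACK_CACHE_TAGS value quoted above.
const FALLBACK_CACHE_TAGS: &[&str] = &["cpython-314"];

/// Build `<dir>/__pycache__/<stem>.<tag>.pyc` for a source file.
fn cache_path(source: &Path, tag: &str) -> PathBuf {
    let stem = source.file_stem().and_then(|s| s.to_str()).unwrap_or("");
    let dir = source.parent().unwrap_or_else(|| Path::new(""));
    dir.join("__pycache__").join(format!("{}.{}.pyc", stem, tag))
}

/// Paths to try when reading, own tag first, then the fallbacks.
fn read_candidates(source: &Path, own_tag: &str) -> Vec<PathBuf> {
    std::iter::once(own_tag)
        .chain(FALLBACK_CACHE_TAGS.iter().copied())
        .map(|tag| cache_path(source, tag))
        .collect()
}

/// The only path that recompilation writes to: the own-tag path.
fn write_target(source: &Path, own_tag: &str) -> PathBuf {
    cache_path(source, own_tag)
}
```

Because write_target never consults the fallback list, a .pyc produced by CPython is read but not overwritten in this sketch.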
CPython's marshal supports TYPE_SLICE from format version 4 onwards and that is the default version. Rejecting slice dumps below version 5 made marshal.dumps(slice(...)) fail with the default version and broke test.test_marshal.SliceTestCase.test_slice.
Lib/importlib/_bootstrap_external.py is CPython's own code copied verbatim; local patches here defeat compatibility tracking. The cpython-XX cache_tag fallback needs to live on the RustPython side (Rust code or sys.implementation.cache_tag policy), not as edits to the imported standard library. This reverts commit 1fc426d0fb5fcdb50d35cad13bbb43e8f6ce1c7f.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/compiler-core/src/marshal.rs (1)
825-845: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Dict keys now reject code objects.
This path still decodes keys by calling deserialize_value_typed(...) directly. After Line 857, Type::Code is only handled in deserialize_value_depth, so a dict key that marshals as a code object now fails with BadType even though serialize_value can still emit it.
Route pre-read dict keys through the same raw-header/code-object path as deserialize_value_depth instead of maintaining a second decoder here.
As per coding guidelines, "When branches differ only in a value but share common logic, extract the differing value first, then call the common logic once to avoid duplicate code."
Also applies to: 857-857
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@crates/compiler-core/src/marshal.rs`:
- Around line 472-474: The tuple deserialization resets the recursion budget and
allows nested code->const tuple->code chains to bypass limits; change
read_marshal_const_tuple (and similar helpers used in lines ~487-516) to accept
the current depth parameter and call read_const_value(rdr, bag, depth - 1, refs)
for each element instead of using MAX_MARSHAL_STACK_DEPTH, and ensure
read_const_value passes depth - 1 when it recurses into deserialize_code_inner
so the depth budget is threaded through and decremented across code-object
helper boundaries (references: read_marshal_const_tuple, read_const_value,
deserialize_code_inner, MAX_MARSHAL_STACK_DEPTH).
- Around line 704-715: The current Type::Code branch in deserialize_value_depth
creates a fresh inner_refs Vec (and only reserves slot 0 when FLAG_REF is set)
which resets the TYPE_REF index space and prevents code-object fields from
resolving refs from the outer stream; change this to reuse the caller's ref memo
or maintain an index-aligned mapping so TYPE_REF indices remain consistent:
instead of creating a new inner_refs Vec in the Type::Code arm, pass the
existing ref memo (or construct a mapping that pre-populates entries to preserve
index alignment) into deserialize_code_inner, ensuring deserialize_code_inner
and bag.make_code operate against the same ref index space as the surrounding
marshal stream (referencing deserialize_value_depth, inner_refs,
deserialize_code_inner, Bag::ConstantBag::Constant, TYPE_REF, FLAG_REF, and
bag.make_code).
In `@crates/vm/src/stdlib/sys.rs`:
- Around line 661-662: The cache tag change makes RustPython emit the same
__pycache__ filenames as CPython, risking cross-interpreter .pyc conflicts;
revert the tag to a RustPython-specific prefix (e.g., change the cache_tag
creation from "cpython-{}{}" to "rustpython-{}{}" using version::MAJOR and
version::MINOR) and/or add defensive validation in the .pyc loader to detect and
refuse incompatible marshalled bytecode (so when reading .pyc files you check
the tag/magic and fail-safe rather than attempting to execute another
interpreter's bytecode).
In `@crates/vm/src/version.rs`:
- Around line 72-73: Update the PYC_MAGIC_NUMBER constant from 3627 to 3658 to
match CPython 3.14: change the value of PYC_MAGIC_NUMBER in version.rs and
update the accompanying comment to reflect CPython 3.14; then run and adjust any
unit tests or logic that assert or compute against PYC_MAGIC_NUMBER (search for
references to PYC_MAGIC_NUMBER, pyc magic, or functions using that constant) so
they expect 3658 and any derived behavior remains correct.
---
Outside diff comments:
In `@crates/compiler-core/src/marshal.rs`:
- Around line 825-845: The dict-key branch duplicates decoding logic and calls
deserialize_value_typed(...) directly, which bypasses the raw-header/code-object
handling in deserialize_value_depth and causes Type::Code keys to be rejected;
change the key decoding in this branch to first determine the key's
type_code/header (using the same raw/FLAG_REF/ref-slot handling as shown) then
invoke the common decoder path used by deserialize_value_depth (so code objects
go through the same raw-header/code-object route), i.e., extract the differing
value (the key_type or ref index) and call the shared decode routine rather than
duplicating deserialize_value_typed logic; update use of refs, key_slot, and k
to follow the same flow as deserialize_value_depth so Type::Code is accepted for
dict keys.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: d31894f2-0beb-4988-9b37-3783ed0c14e9
📒 Files selected for processing (5)
crates/compiler-core/src/marshal.rs
crates/vm/src/import.rs
crates/vm/src/stdlib/marshal.rs
crates/vm/src/stdlib/sys.rs
crates/vm/src/version.rs
    // Code-objects keep their own inner ref table because Bag::Value (the
    // outer marshal value) and the constant-bag's Constant type are not
    // in general the same. When the outer header carried FLAG_REF, the
    // code object occupies slot 0 of CPython's single global ref space,
    // so we mirror that by reserving slot 0 of the inner table.
    let value = if matches!(typ, Type::Code) {
        let mut inner_refs: Vec<Option<<Bag::ConstantBag as ConstantBag>::Constant>> = Vec::new();
        if flag {
            inner_refs.push(None);
        }
        let code = deserialize_code_inner(rdr, bag.constant_bag(), &mut inner_refs)?;
        bag.make_code(code)
🧩 Analysis chain
🌐 Web query:
In CPython 3.14's marshal format, do FLAG_REF/TYPE_REF entries share one reference table across the entire stream, including code objects nested inside outer containers?
💡 Result:
Yes, in CPython 3.14, the marshal format uses a single reference table that is shared across the entire serialization stream, including nested code objects [1][2]. When the marshal module serializes an object, it maintains a single hashtable (on the writer side) or a single list (on the reader side) for the duration of the entire dump or load operation [1][2]. This mechanism allows the format to track objects that have already been serialized and assign them an index via the FLAG_REF flag [1][3][2]. Subsequent occurrences of the same object within the stream—regardless of whether they are at the top level or nested deep within other containers or code objects—can then be represented using the TYPE_REF tag followed by that index [1][4][2]. This design enables efficient serialization of shared or recursive structures throughout the entire marshaled data [2].
Citations:
- 1: https://github.com/python/cpython/blob/main/Python/marshal.c
- 2: https://github.com/python/cpython/blob/3.11/Python/marshal.c
- 3: https://gopy.tamnd.com/docs/annotations/python/python_marshal_c_detail
- 4: https://www.cnblogs.com/wyuu101/p/19519704
🏁 Script executed:
#!/bin/bash
set -euo pipefail
ls -la
rg -n "deserialize_code_inner|make_code|resolve_ref|FLAG_REF|TYPE_REF|inner_refs|constant_bag" crates/compiler-core/src/marshal.rs
sed -n '650,770p' crates/compiler-core/src/marshal.rs
Repository: RustPython/RustPython
Length of output: 9757
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "deserialize_code_inner|make_code|resolve_ref|FLAG_REF|TYPE_REF|inner_refs|constant_bag" crates/compiler-core/src/marshal.rs
sed -n '650,770p' crates/compiler-core/src/marshal.rs
Repository: RustPython/RustPython
Length of output: 6401
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "fn resolve_ref|resolve_ref\\(" crates/compiler-core/src/marshal.rs
sed -n '560,650p' crates/compiler-core/src/marshal.rs
Repository: RustPython/RustPython
Length of output: 3050
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '160,260p' crates/compiler-core/src/marshal.rs
sed -n '260,360p' crates/compiler-core/src/marshal.rs
sed -n '360,560p' crates/compiler-core/src/marshal.rs
sed -n '560,740p' crates/compiler-core/src/marshal.rs
Repository: RustPython/RustPython
Length of output: 18515
🌐 Web query:
CPython marshal.c unmarshal code object TYPE_REF memo shared vs new scope
💡 Result:
In CPython’s marshal.c, the “TYPE_REF memo” is an internal mechanism used to preserve object identity during unmarshal by deduplicating repeated objects; it is not the same thing as Python execution “scope”.
1) TYPE_REF and the memo table (reference memoization)
- CPython defines TYPE_REF = 'r' for reference entries in the marshal stream, plus FLAG_REF to indicate “object has been added to the reference table.” [1][2]
- During writing, w_ref uses a hashtable stored in WFILE (p->hashtable). If an object already exists in that table, it writes an index (TYPE_REF + long index). Otherwise it adds the object to the table and sets FLAG_REF in the type’s flag output. [2][1]
2) Unmarshal: references are resolved via a list (“memo”), not a new object graph each time
- On unmarshal, the WFILE structure uses refs as a “dict on marshal, list on unmarshal” (comment in the WFILE struct). [3]
- The TYPE_REF mechanism is therefore “shared vs new” at unmarshal time:
  a) If the stream contains TYPE_REF with an index, unmarshal returns the already-created object from the refs list at that index (so shared identity is preserved).
  b) If the stream contains a non-ref first occurrence, unmarshal creates a new object and appends it to the refs list (so future TYPE_REF entries can point to it).
- This matches the overall marshal approach: it supports reference tracking to keep shared objects shared across the unmarshalled structure. [1][2]
3) Code objects (TYPE_CODE 'c') and how “memo/shared vs new” interacts with them
- marshal has a dedicated TYPE_CODE = 'c' for code objects. [3][1]
- When unmarshalling a code object, marshal reads its constituent parts (e.g., code bytes, consts, names, exceptiontable) and then constructs a new code object via _PyCodeConstructor / _PyCode_New (the snippet shows construction and validation prior to _PyCode_New). [1]
- For any sub-objects inside that code object (tuples, lists, dicts, constants), the TYPE_REF logic applies the same way: repeated sub-objects may be returned from the refs memo (shared) rather than reconstructed. The memo affects object identity in the resulting object graph, including inside code objects. [1][3]
4) “Shared vs new scope” clarification
- “Scope” is not a marshal concept in marshal.c. The only memo here is the unmarshal reference table (WFILE.refs, “list on unmarshal”) used for reference deduplication. [3]
- So “shared vs new scope” should be read as “shared vs newly allocated Python object identity in the unmarshalled result,” driven by whether the marshal stream uses TYPE_REF entries. [1][3]
5) Relationship to marshal documentation
- The marshal module is explicitly intended for .pyc “pseudo-compiled” data, and the code object format is version-dependent; de-serializing code objects in an incorrect version is undefined behavior. [4]
Citations:
- 1: https://github.com/python/cpython/blob/main/Python/marshal.c
- 2: https://github.com/python/cpython/blob/3.11/Python/marshal.c
- 3: https://hg.python.org/cpython/file/dbdb6f7f9a1a/Python/marshal.c
- 4: https://docs.python.org/3/library/marshal.html
🌐 Web query:
site:github.com/python/cpython marshal.c TYPE_CODE unmarshal refs memo code object
💡 Result:
In CPython’s C implementation of the marshal module, TYPE_CODE is the tag used when the next marshalled value represents a Python code object; unmarshalling that tag is handled inside marshal.c’s TYPE_CODE case in r_object, where the code object’s fields (e.g., filename, name, qualname, line table, consts, names, localsplus arrays, args counts, stacksize, exception table/line info depending on version) are read via helper readers like r_object/r_long, assembled into a constructor struct, validated, and then instantiated as a PyCode object (via _PyCode_New / PyCode_NewWithPosOnlyArgs / PyCode_New depending on version). The created code object is then stored in the marshal loader’s reference table using r_ref_insert, enabling TYPE_REF sharing/references. [1][2][3][4] Unmarshal “code object” safety: the Python marshal docs warn that marshal is not intended to be secure against malicious/erroneous data and explicitly caution against unmarshalling untrusted input. [5] Also, CPython’s Python-level API added an allow_code parameter: marshal.load(..., *, allow_code=True) / marshal.loads(..., *, allow_code=True) support code objects only if allow_code is true; otherwise unmarshalling code objects is disallowed (raising ValueError in the C code when p->allow_code is false). [5][1] Finally, the TYPE_CODE path also participates in ref-handling and has had bug reports/patches around corrupt TYPE_CODE streams and cleanup/leak risks; for example, issue #148418 describes a specific reference leak reachable via marshal.loads with crafted input in the TYPE_CODE case when an error path used break instead of goto code_error (skipping cleanup of owned refs). [6]
Citations:
- 1: https://github.com/python/cpython/blob/main/Python/marshal.c
- 2: https://github.com/python/cpython/blob/3.11/Python/marshal.c
- 3: https://github.com/python/cpython/blob/v3.6.2/Python/marshal.c
- 4: https://github.com/python/cpython/blob/3.8/Python/marshal.c
- 5: https://github.com/python/cpython/blob/main/Doc/library/marshal.rst
- 6: marshal.c: 8 refs leaked on corrupt TYPE_CODE marshal stream python/cpython#148418
🌐 Web query:
marshal.c code object unmarshal FLAG_REF TYPE_CODE refs table reset or separate scope
💡 Result:
In CPython’s marshal.c (Python side), a “reference” entry is encoded using a FLAG_REF bit (0x80) together with a type code (TYPE_*). In the marshaler, FLAG_REF means “with a type, add obj to index”, and the reader reserves or looks up an index in the current refs table when it sees TYPE_REF/FLAG_REF.
1) TYPE_CODE + FLAG_REF meaning (marshal writer)
- FLAG_REF is defined as 0x80 with the comment “with a type, add obj to index”. [1][2]
- The marshaling code sets the FLAG_REF bit in the byte when it wants the value to be added to (and later referenced from) the refs table. [1][2]
2) TYPE_CODE 'c' (code objects) and the unmarshal path
- TYPE_CODE is 'c' in the Python marshal implementation. [1]
- When unmarshalling a code object, CPython’s code reader checks allow_code and then constructs the CodeConstructor/Code object from the stream, including fields like filename/name/qualname, code bytes, consts, names, localsplusnames, localspluskinds, and stacksize. [1][2]
- After validating and creating the code object, it inserts the created object into the refs table when FLAG_REF is in effect via r_ref_insert(..., flag, p). [2]
3) How TYPE_REF uses the refs table (unmarshal reader)
- The unmarshal switch explicitly handles case TYPE_REF. [1]
- TYPE_REF reads an integer index n, checks that it is within the current refs table size (PyList_GET_SIZE(p->refs)), and then retrieves the referenced object: v = PyList_GET_ITEM(p->refs, n). [1]
- If the index is invalid (or refers to Py_None), it raises “bad marshal data (invalid reference)”. [1]
4) Does it “reset or separate scope” (refs table lifetime)
- In the shown CPython marshal.c snippets, the refs table used for TYPE_REF lookups is the parser/file state field p->refs (i.e., it is tied to the current unmarshal operation/state). [1]
- The reader validates that any TYPE_REF index is within the bounds of the refs list stored in that same state object (PyList_GET_SIZE(p->refs)). [1]
- That implies references are resolved within the same unmarshal run/stream state, not across unrelated unmarshalling calls; a separate stream or call gets a separate marshal read state and thus a separate refs list. The refs are managed per unmarshal state via p->refs. [1]
5) High-level summary
- FLAG_REF (0x80): “add obj to refs index”. [1][2]
- TYPE_CODE: 'c' (code objects). [1]
- TYPE_REF: 'r' (reference to prior objects in the refs table). [1]
- On unmarshal, TYPE_REF pulls from the current p->refs list. [1]
- The refs table is per unmarshal state (p->refs), so it is separate per unmarshalling operation rather than globally shared. [1]
Note: the query terms (TYPE_CODE/FLAG_REF/TYPE_REF) match CPython’s marshal.c naming; Ruby’s marshal.c (CRuby) has different type tags and mechanisms.
Citations:
- 1: https://android.googlesource.com/platform/external/python/cpython3/+/refs/heads/master/Python/marshal.c
- 2: https://github.com/python/cpython/blob/main/Python/marshal.c
Fix ref-memo handling for Type::Code decoding
- deserialize_value_depth creates a fresh inner_refs (only reserving slot 0 when FLAG_REF is set) and uses that for deserialize_code_inner; this resets the TYPE_REF index space, so TYPE_REF inside code-object fields can’t resolve objects registered earlier in the surrounding marshal stream.
- Update the code to keep TYPE_REF indices aligned with the caller’s ref memo (reuse the existing memo, or maintain an index-aligned mapping into the constant-bag representation).
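To make the index-alignment concern concrete, here is a toy decoder with a single ref table shared by the whole stream. It is a sketch under loud assumptions: the format is invented for illustration (one-byte lengths and indices, only strings and tuples), the tuple stands in for a code object, and none of the names (Reader, Value, decode, demo_stream) come from RustPython. Only the FLAG_REF bit, the 'r' type byte, and the idea of reserving a container's slot before reading its children follow CPython's marshal.c.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Str(String),
    Tuple(Vec<Value>),
}

const FLAG_REF: u8 = 0x80;

/// Toy reader: one ref table for the entire stream.
struct Reader<'a> {
    data: &'a [u8],
    pos: usize,
    refs: Vec<Option<Value>>,
}

impl<'a> Reader<'a> {
    fn byte(&mut self) -> Result<u8, String> {
        let b = *self.data.get(self.pos).ok_or("unexpected end of data")?;
        self.pos += 1;
        Ok(b)
    }

    fn object(&mut self) -> Result<Value, String> {
        let raw = self.byte()?;
        let (typ, flag) = (raw & !FLAG_REF, (raw & FLAG_REF) != 0);
        match typ {
            // Back-reference: look the index up in the shared table.
            b'r' => {
                let idx = self.byte()? as usize;
                self.refs
                    .get(idx)
                    .cloned()
                    .flatten()
                    .ok_or_else(|| format!("invalid reference {}", idx))
            }
            // String: one length byte, then the bytes; registered after reading.
            b's' => {
                let len = self.byte()? as usize;
                let end = self.pos + len;
                let bytes = self.data.get(self.pos..end).ok_or("truncated string")?;
                let s = String::from_utf8(bytes.to_vec()).map_err(|e| e.to_string())?;
                self.pos = end;
                let v = Value::Str(s);
                if flag {
                    self.refs.push(Some(v.clone()));
                }
                Ok(v)
            }
            // Container: reserve its slot first, so children get later indices.
            b'(' => {
                let slot = if flag {
                    self.refs.push(None);
                    Some(self.refs.len() - 1)
                } else {
                    None
                };
                let n = self.byte()?;
                let mut items = Vec::new();
                for _ in 0..n {
                    items.push(self.object()?);
                }
                let v = Value::Tuple(items);
                if let Some(i) = slot {
                    self.refs[i] = Some(v.clone());
                }
                Ok(v)
            }
            other => Err(format!("bad type byte {:#x}", other)),
        }
    }
}

fn decode(data: &[u8]) -> Result<Value, String> {
    Reader { data, pos: 0, refs: Vec::new() }.object()
}

/// ("f", ("f" via ref 0, "g"), "g" via ref 2): the inner container
/// refers to an entry registered outside it, and vice versa.
fn demo_stream() -> Vec<u8> {
    vec![
        b'(', 3,
        b's' | FLAG_REF, 1, b'f',  // index 0
        b'(' | FLAG_REF, 2,        // index 1 (reserved before children)
        b'r', 0,
        b's' | FLAG_REF, 1, b'g',  // index 2
        b'r', 2,
    ]
}
```

In this toy, the inner container's `r 0` resolves only because the table is shared. If the inner container started from a fresh table with just its own reserved slot, index 0 would point at that empty slot and the lookup would fail, which is the situation the comment above describes.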
    // cache tag uses 'cpython' because our compiler is cpython compatible
    let cache_tag = format!("cpython-{}{}", version::MAJOR, version::MINOR);
Risk of .pyc cache conflicts with CPython installations.
The cache tag has been changed from rustpython-{MAJOR_IMPL}_{MINOR_IMPL} to cpython-{MAJOR}{MINOR}, making RustPython use the same .pyc cache tag as CPython itself (e.g., cpython-314.pyc). This means RustPython and CPython will share the same __pycache__ filenames in environments where both interpreters are used.
Since RustPython's marshal format and VM implementation differ from CPython's, using the same cache tag could cause:
- Cache corruption when both interpreters compile the same module
- Runtime errors when one interpreter loads the other's .pyc files
- Subtle bugs if bytecode is partially compatible but differs in edge cases
If this alignment is intentional for testing or migration purposes, please document the risk and consider adding validation to detect incompatible .pyc files at load time. Otherwise, consider retaining a RustPython-specific prefix (e.g., rustpython-314) to avoid conflicts.
    // Matches CPython 3.14 (Include/internal/pycore_magic_number.h).
    pub const PYC_MAGIC_NUMBER: u16 = 3627;
🧩 Analysis chain
🌐 Web query:
What is the PYC magic number for CPython 3.14?
💡 Result:
CPython 3.14’s .pyc “magic number” is 3658 (decimal). [1][2] This value comes from CPython’s internal header file that defines the PYC magic number used in the .pyc header for a given bytecode format. [1][2]
Citations:
- 1: https://github.com/python/cpython/blob/36e4ffc1/Include/internal/pycore_magic_number.h
- 2: https://github.com/python/cpython/blob/main/Include/internal/pycore_magic_number.h
Fix PYC_MAGIC_NUMBER to match CPython 3.14
CPython 3.14’s .pyc magic number is 3658 (from Include/internal/pycore_magic_number.h), so PYC_MAGIC_NUMBER: u16 = 3627 and the “Matches CPython 3.14” comment are inconsistent. Update the constant (and any related logic/tests) to 3658.
Bidirectional
Use the CPython compatibility version (e.g. cpython-314) instead of
the rustpython-{MAJOR_IMPL}_{MINOR_IMPL} interpreter version string.
Py_MARSHAL_VERSION is 5 in CPython 3.14.5 (Include/marshal.h:16) and TYPE_SLICE serialization rejects version < 5 (Python/marshal.c:720). Restore the same threshold and constant so marshal.version and the slice-marshal gate match CPython.
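The gate this commit message restores can be sketched as a single comparison. The constants below take their values from the message above (version 5, slices rejected below 5); the names SLICE_MIN_VERSION and check_slice_version are hypothetical and are not the identifiers used in RustPython.

```rust
/// Default marshal format version, per the commit message above.
const FORMAT_VERSION: u32 = 5;
/// Lowest version at which a slice may be written (same source).
const SLICE_MIN_VERSION: u32 = 5;

/// Hypothetical gate: refuse to serialize a slice below the threshold.
fn check_slice_version(version: u32) -> Result<(), String> {
    if version < SLICE_MIN_VERSION {
        Err(format!(
            "slice objects need marshal version {} or later, got {}",
            SLICE_MIN_VERSION, version
        ))
    } else {
        Ok(())
    }
}
```

With the default version equal to the threshold, dumping a slice at the default succeeds and an explicit version 4 is refused.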
Code objects embedded in const tuples reset the depth budget on each recursion, so a hostile or pathological marshal stream of code-in-tuple-in-code can blow the stack despite MAX_MARSHAL_STACK_DEPTH. Pass the current depth through deserialize_code_inner and read_marshal_const_tuple and decrement at each code-object/tuple boundary. Also route dict keys through deserialize_value_after_header so TYPE_CODE keys decode instead of failing with BadType.
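The depth threading described here can be illustrated with a toy recursive reader in which every code-object and tuple boundary spends one unit of a single budget. Everything in the sketch is an assumption made for illustration: the three-tag format, the Node type, and the small MAX_DEPTH value are invented, and the real MAX_MARSHAL_STACK_DEPTH value is not shown in this thread.

```rust
/// Deliberately small budget so the limit is easy to hit in a demo.
const MAX_DEPTH: usize = 4;

#[derive(Debug, PartialEq)]
enum Node {
    None,
    Tuple(Vec<Node>),
    Code(Box<Node>), // a "code object" holding its const tuple
}

/// Read one node, passing the remaining budget down instead of resetting it.
fn read(data: &[u8], pos: &mut usize, depth: usize) -> Result<Node, String> {
    if depth == 0 {
        return Err("depth limit exceeded".into());
    }
    let tag = *data.get(*pos).ok_or("unexpected end of data")?;
    *pos += 1;
    match tag {
        b'N' => Ok(Node::None),
        // Code object: its payload is read with one less unit of budget.
        b'c' => Ok(Node::Code(Box::new(read(data, pos, depth - 1)?))),
        // Tuple: one count byte, then each element with one less unit.
        b'(' => {
            let n = *data.get(*pos).ok_or("unexpected end of data")?;
            *pos += 1;
            let mut items = Vec::new();
            for _ in 0..n {
                items.push(read(data, pos, depth - 1)?);
            }
            Ok(Node::Tuple(items))
        }
        other => Err(format!("bad tag {:#x}", other)),
    }
}

fn decode(data: &[u8]) -> Result<Node, String> {
    read(data, &mut 0, MAX_DEPTH)
}
```

A code-in-tuple stream three levels deep decodes, while a code-in-tuple-in-code-in-tuple stream exhausts the budget of four and is rejected. Had the tuple branch restarted from MAX_DEPTH, the second stream would have been accepted at any nesting depth.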