Newest updates break DaskOfflineStore with S3 parquets #4753

@bjmccotter7192

Description

Expected Behavior

In version 0.40.1, the Dask offline store could read data_source.path directly from the FileSource and retrieve the data from S3 using a path like s3://<your-bucket>/<file-name>.

Current Behavior

Pulling data now fails because the repo_path is prepended to the S3 URL.

Example:
/tmp/feast:s3//<your-bucket>/<file-name>

I believe this is caused by a recent change, #4624, which no longer treats the S3 URL as an absolute path.
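To illustrate the suspected mechanism, here is a minimal sketch of how a pathlib-based "is this absolute?" check treats an S3 URL. This is not Feast's actual code: `naive_resolve`, the bucket name, and the repo path are hypothetical, and the exact mangled string Feast produces may differ from what this sketch yields.

```python
from pathlib import PurePosixPath

# Hypothetical values, for illustration only.
repo_path = PurePosixPath("/tmp/feast")
s3_uri = "s3://my-bucket/data.parquet"


def naive_resolve(path: str, repo: PurePosixPath) -> str:
    """Join relative paths onto the repo path; keep absolute paths as-is."""
    p = PurePosixPath(path)
    if p.is_absolute():
        return str(p)
    # An S3 URL does not start with "/", so pathlib considers it relative
    # and it ends up joined onto the repo path.
    return str(repo / p)


resolved = naive_resolve(s3_uri, repo_path)
print(PurePosixPath(s3_uri).is_absolute())  # False
print(resolved)  # a local path under /tmp/feast, no longer a valid S3 URL
```

Because the URL is classified as relative, the result points at a local file that does not exist, which matches the "file path does not exist" error.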

Steps to reproduce

  • Rebuilt my environment with the latest tagged version, 0.41.3
  • Reran my get_historical_features call; it hung for a while, then errored because the file path does not exist

Specifications

  • Version: 0.41.3
  • Platform: Linux
  • Subsystem: Debian

Possible Solution

  • Revert that change, or add a flag that bypasses the breaking behavior
  • If storage_options is not None, read the parquet directly
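One way such a bypass could look is sketched below. It is a variant of the second suggestion: instead of checking storage_options, it checks whether the path carries a URI scheme such as s3:// and, if so, returns it untouched. The function name `resolve_source_path` and its signature are assumptions made for this example and are not part of Feast's API.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


def resolve_source_path(path: str, repo_path: Optional[str]) -> str:
    """Hypothetical helper: only join local relative paths onto repo_path."""
    scheme = urlparse(path).scheme
    # Treat anything with a multi-character scheme (s3, gs, https, ...) as
    # remote. Single-letter schemes are skipped so that Windows drive
    # letters like "C:" are not mistaken for URI schemes.
    if len(scheme) > 1:
        return path
    p = PurePosixPath(path)
    if p.is_absolute() or repo_path is None:
        return str(p)
    return str(PurePosixPath(repo_path) / p)


# Remote URI is passed through unchanged; local relative path is joined.
print(resolve_source_path("s3://my-bucket/data.parquet", "/tmp/feast"))
print(resolve_source_path("data/local.parquet", "/tmp/feast"))
```

Checking the scheme keeps the new relative-path behavior for local files while leaving remote URLs alone, so no extra flag would be needed.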
