ENH: native support for universal_pathlib (upath) IO #60618
Comments
pandas already supports fsspec - what does upath offer that isn't covered by that?
Possibly only what upath offers intrinsically, which is the ability to work with cloud paths the same as
Thanks @zkurtz for notifying me in the repository. And hello @WillAyd, I'm the current universal-pathlib maintainer.
There are two reasons for using universal-pathlib instead of fsspec directly, and both are basically convenience features.
The current feature request basically asks for simplifying the following pattern:
from upath import UPath
import pandas as pd
pth = UPath("s3://bucket/file.csv", some_option=True, other_option=123)
pd.DataFrame({"A": [1, 2, 3]}).to_csv(pth, storage_options=pth.storage_options)
pd.read_csv(pth, storage_options=pth.storage_options)
Currently, I would recommend against directly depending on [...] For historic reasons, all [...] This refactor in [...] As soon as the refactor in [...]
Cheers,
Contrary to my original post, I'm observing empirically that the naive approach
actually does work for both s3 and gcs. I encountered an issue only with Azure. Maybe the issue is more about the specific type of authentication being used.
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
universal_pathlib makes it quite a lot easier to read and write data frames directly against cloud paths like, say,
"s3://test_bucket/example.txt"
by absorbing authentication concerns and cloud-specific implementation issues into the construction of the path itself. This then allows IO methods to work as close to normally as possible without regard for the nature of the path being used (local vs GCS vs S3 etc.). So, ideally, this would just work:
But it does not quite work. However, this thin wrapper does seem to work, simply by detecting whether the input path is a UPath and, if so, passing the storage options along to the pandas IO calls.
Proposal: Extend the allowable types of paths in pandas dataframe IO methods to include UPath, and automatically detect storage options in that case.
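For illustration, the kind of thin wrapper described above could be sketched as follows. This is not the original code from this issue: the helper name `call_with_storage_options` is hypothetical, and it assumes only that the path object exposes a `storage_options` mapping (as `UPath` does in the snippet from the maintainer's comment). It detects the path by that attribute, so the helper itself needs neither pandas nor upath to be imported.

```python
from typing import Any, Callable


def call_with_storage_options(io_func: Callable[..., Any], path: Any, **kwargs: Any) -> Any:
    """Call a pandas-style IO function, forwarding path.storage_options if present.

    Hypothetical helper: `io_func` is e.g. `pd.read_csv` or `df.to_csv`, and
    `path` is any object that may carry a `storage_options` mapping.
    """
    options = getattr(path, "storage_options", None)
    # Forward only non-empty options, and never override an explicit argument.
    if options and "storage_options" not in kwargs:
        kwargs["storage_options"] = dict(options)
    return io_func(path, **kwargs)


# Intended usage (assumes pandas and universal_pathlib are installed):
#   import pandas as pd
#   from upath import UPath
#   pth = UPath("s3://bucket/file.csv", some_option=True)
#   call_with_storage_options(pd.DataFrame({"A": [1, 2, 3]}).to_csv, pth)
#   df = call_with_storage_options(pd.read_csv, pth)
```

Empty options are skipped so that plain strings and local paths are passed through unchanged, and an explicitly supplied `storage_options` keyword always wins over the one attached to the path.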
Feature Description
Nothing to add ...
Alternative Solutions
Nothing to add ...
Additional Context
No response