
use s3fs authentication if provided #33639

Closed

Description

@cc-jj

I have a use case where I need to download DataFrames from multiple S3 buckets with different credentials.

By default, s3fs determines credentials from environment variables such as AWS_PROFILE and AWS_ACCESS_KEY_ID. However, this will not work for me, as I need different credentials for different buckets.
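For reference, the default construction with no explicit credentials looks like this (a minimal sketch):

import s3fs

# With no arguments, s3fs falls back to botocore's standard credential
# chain: AWS_PROFILE, AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY, config files, etc.
fs = s3fs.S3FileSystem()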

The s3fs docs show you can alternatively embed the credentials in the URL itself (note this link is for fs-s3fs, a separate PyFilesystem package, but the URL syntax is analogous):
https://fs-s3fs.readthedocs.io/en/latest/#authentication

s3fs = open_fs('s3://<access key>:<secret key>@mybucket')

I attempted to use the same idea with pandas:

df = pd.read_csv("s3://<access key>:<secret key>@mybucket/csv_key")

but this raised an exception deep within s3fs complaining about an invalid bucket name, potentially caused by the stripping logic here:
https://github.com/pandas-dev/pandas/blob/master/pandas/io/s3.py#L29
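
In the meantime, a workaround is to construct the filesystem explicitly and hand pandas an open file object (a minimal sketch; the bucket and key names are placeholders):

import pandas as pd
import s3fs

# Per-bucket credentials, bypassing the environment-variable defaults
fs = s3fs.S3FileSystem(key="<access key>", secret="<secret key>")
with fs.open("mybucket/csv_key", "rb") as f:
    df = pd.read_csv(f)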

I think we could easily support authentication using this syntax:

pd.read_csv("s3://<access key>:<secret key>@mybucket/csv_key")

By modifying the code here:
https://github.com/pandas-dev/pandas/blob/master/pandas/io/s3.py#L27

The idea being we first attempt to match filepath_or_buffer against a pattern that captures the access key and secret key. If it matches, we pass those into s3fs.S3FileSystem (a sketch; the exact pattern below is illustrative):

import re
import s3fs

# Illustrative pattern for "s3://<access key>:<secret key>@<bucket>"
pattern = r"s3://([^:@]+):([^@]+)@([^/]+)"
match = re.match(pattern, filepath_or_buffer)
if match is not None:
    access_key, secret_key, bucket_name = match.groups()
    fs = s3fs.S3FileSystem(key=access_key, secret=secret_key)
...
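
With hypothetical credential values, the illustrative pattern parses the proposed URL form as expected:

>>> re.match(pattern, "s3://AKIAEXAMPLE:abc123@mybucket/csv_key").groups()
('AKIAEXAMPLE', 'abc123', 'mybucket')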

Labels

IO CSV (read_csv, to_csv)
