Description
I have a use case where I need to download dataframes from multiple S3 buckets that require different credentials.
By default, s3fs determines credentials from environment variables such as AWS_PROFILE, AWS_ACCESS_KEY_ID, etc. However, this will not work for me, since I need different credentials for different buckets.
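For context, this is roughly what the default behaviour looks like today; the bucket names and key values here are made up:

# Credentials come from the environment / ~/.aws, e.g.:
#   export AWS_ACCESS_KEY_ID=AKIA...ONE
#   export AWS_SECRET_ACCESS_KEY=...
import pandas as pd

df_a = pd.read_csv("s3://bucket-a/data.csv")  # resolved from the env vars above
df_b = pd.read_csv("s3://bucket-b/data.csv")  # same credentials; no per-bucket override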
The fs-s3fs docs show that you can alternatively pass credentials as part of the URL:
https://fs-s3fs.readthedocs.io/en/latest/#authentication
s3fs = open_fs('s3://<access key>:<secret key>@mybucket')
I attempted to use the same idea with pandas:
df = pd.read_csv("s3://<access key>:<secret key>@mybucket/csv_key")
but this raised an exception deep within s3fs complaining about an invalid bucket name, potentially caused by the stripping logic here:
https://github.com/pandas-dev/pandas/blob/master/pandas/io/s3.py#L29
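If I am reading it right, the stripping leaves the credentials embedded in the string that s3fs then treats as the bucket name. A quick illustration of the URL structure with urllib.parse (this is not the pandas code, just to show where the pieces end up; the key values are made up):

from urllib.parse import urlparse

parsed = urlparse("s3://AKIAEXAMPLE:SECRETEXAMPLE@mybucket/csv_key", allow_fragments=False)
print(parsed.netloc)    # 'AKIAEXAMPLE:SECRETEXAMPLE@mybucket'  <- not a valid bucket name
print(parsed.username)  # 'AKIAEXAMPLE'
print(parsed.password)  # 'SECRETEXAMPLE'
print(parsed.hostname)  # 'mybucket'

So the information needed to build the filesystem is already present in the URL; it just isn't being pulled out before the path reaches s3fs.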
I think we could easily support authentication using this syntax:
pd.read_csv("s3://<access key>:<secret key>@mybucket/csv_key")
by modifying the code here:
https://github.com/pandas-dev/pandas/blob/master/pandas/io/s3.py#L27
The idea is that we first try to match filepath_or_buffer against a pattern that captures the access key and secret key. If it matches, we pass those into s3fs.S3FileSystem, roughly like this (a fuller standalone sketch follows the snippet):
match = re.match(pattern, filepath_or_buffer)  # pattern captures access key, secret key and bucket
if match is not None:
    access_key, secret_key, bucket_name = match.groups()
    # s3fs.S3FileSystem takes explicit credentials via `key` and `secret`;
    # the bucket name stays part of the path we open afterwards
    fs = s3fs.S3FileSystem(key=access_key, secret=secret_key)
    ...
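For completeness, here is a minimal standalone sketch of the behaviour I am proposing, outside of pandas internals. The helper name read_csv_with_url_credentials is hypothetical, and I have used urllib.parse instead of a regex; it assumes the s3fs package that pandas already uses:

from urllib.parse import unquote, urlparse

import pandas as pd
import s3fs


def read_csv_with_url_credentials(url, **kwargs):
    # Hypothetical helper: pull the credentials out of an s3:// URL, build an
    # S3FileSystem from them, and read the object with pandas.
    parsed = urlparse(url, allow_fragments=False)
    fs = s3fs.S3FileSystem(key=unquote(parsed.username or ""),
                           secret=unquote(parsed.password or ""))
    path = "{}{}".format(parsed.hostname, parsed.path)  # "bucket/key"
    with fs.open(path, "rb") as f:
        return pd.read_csv(f, **kwargs)


# Different credentials for different buckets in the same process.
df_a = read_csv_with_url_credentials("s3://KEY_A:SECRET_A@bucket-a/data.csv")
df_b = read_csv_with_url_credentials("s3://KEY_B:SECRET_B@bucket-b/data.csv")

One wrinkle worth flagging: AWS secret keys often contain characters such as / and +, so they would need to be percent-encoded when embedded in the URL (hence the unquote calls above).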