
Unable to use S3 presigned HTTP GET URL while attempting to create a dataframe from URL #23446

Open
@denismakogon

Description


Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

def from_file(path_or_url: str, amplification_peak: np.float64):
    # TIMECODE and AMP_PEAK are module-level column-name constants.
    df = pd.read_csv(
        path_or_url,
        names=[TIMECODE, AMP_PEAK],
        dtype={TIMECODE: np.float64, AMP_PEAK: np.float64}
    )
    return df

where the URL is:

http://docker.for.mac.localhost:9000/ffmpeg/af503ec9-08a8-4aff-a3a9-c9c3b0d8475f.csv?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=admin%2F20181101%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20181101T084722Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=60de48f6828baa2f83f8528ead22b4346044913202d2bc19020a0a6043334a57

Problem description

As far as I know, since 0.18 pandas has been able to read DataFrames directly from URLs, so presigned S3 URLs should be no exception.
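For reference, the URL path works fine in the simple case. A quick self-contained check, using a `file://` URL instead of a network endpoint (the file name here is made up for the demo):

```python
import pathlib

import pandas as pd

# Write a tiny CSV, then read it back through a file:// URL to show
# that read_csv accepts URLs, not just local paths.
path = pathlib.Path("tiny.csv")
path.write_text("a,b\n1,2\n3,4\n")

df = pd.read_csv(path.resolve().as_uri())
print(df.shape)  # → (2, 2)
```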

Unfortunately, with an S3 presigned URL pandas fails to read the data: instead of raising any kind of exception, it just returns an empty DataFrame.
At first I thought the URL was bad or expired, but it is not; using cURL or wget I can still fetch the whole file from it.
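As a workaround, one can fetch the object manually and hand pandas a file-like buffer, bypassing pandas's own URL handling entirely. A minimal sketch under the report's assumptions (`TIMECODE` and `AMP_PEAK` are assumed column-name constants; the presigned URL comes from the caller):

```python
import io
import urllib.request

import numpy as np
import pandas as pd

# Assumed column-name constants, matching the snippet above.
TIMECODE, AMP_PEAK = "timecode", "amplification_peak"

def from_presigned_url(url: str) -> pd.DataFrame:
    # Fetch the raw bytes ourselves so pandas never sees the presigned URL,
    # then parse them from an in-memory buffer.
    with urllib.request.urlopen(url) as resp:
        raw = resp.read()
    return pd.read_csv(
        io.BytesIO(raw),
        names=[TIMECODE, AMP_PEAK],
        dtype={TIMECODE: np.float64, AMP_PEAK: np.float64},
    )
```

Parsing from a buffer this way sidesteps whatever pandas does with the query string, so it should return the full dataset as long as cURL/wget can fetch it.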

Expected Output

Expected output here would be a non-empty dataset.

Output of pd.show_versions()

0.23.4


Labels

Bug, IO Network (Local or Cloud (AWS, GCS, etc.) IO Issues)
