Skip to content

Unable to load excel (xlsx) file in pandas #588

Closed
@do-me

Description

@do-me

Checklist

  • I added a descriptive title
  • I searched for other issues and couldn't find a solution or duplication
  • I already searched in Google and didn't find any good information or help

What happened?

I tried to load an excel file from my server (not locally as in #287) with pandas and pyodide's functions but it doesn't work. I know that for pyodide.open_url(url: str) → _io.StringIO

The download of binary files is not supported.

See here. (I'm using the function with success for loading simple csv files)
Pyodide's docs claim that I should look into other functions for binary files (like xlsx).

I set up a playground to quickly test it here, with test.xlsx served under /downloads/test.xlsx.

I'm coming as far as retrieving the bytes from the response object.

from pyodide.http import pyfetch
import asyncio
import pandas as pd 
import openpyxl
from io import BytesIO

response = await pyfetch(url="https://geo.rocks/downloads/test.xlsx", method="GET")
bytes_response = response.bytes()
BytesIO(bytes_response)

But it gives me

JsException(PythonError: Traceback (most recent call last): File "/lib/python3.10/asyncio/futures.py", line 201, in result raise self._exception File "/lib/python3.10/asyncio/tasks.py", line 232, in __step result = coro.send(None) File "/lib/python3.10/site-packages/_pyodide/_base.py", line 500, in eval_code_async await CodeRunner( File "/lib/python3.10/site-packages/_pyodide/_base.py", line 353, in run_async await coroutine File "", line 10, in TypeError: a bytes-like object is required, not 'coroutine' )

Afterwards I wanted to read it into a pandas df but I think I'm lacking a tiny piece in the puzzle here. Also the error seems strange as it indeed return the bytes if you leave out the BytesIO function.

bytes_response gives

b'PK\x03\x04\x14\x00\x06\x00\x08\x00\x00\x00!\x00A7\x82\xcfn\x01\x00\x00\x04\x05\x00\x00\x13\x00\x08\x02[Content_Types].xml \xa2\x04\x02(\xa0\x00\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\ ...

Any clue how to load the response into a df?

What browsers are you seeing the problem on? (if applicable)

Firefox, Chrome, Safari, Microsoft Edge

Console info

No response

Additional Context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    Status

    Closed

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions