ENH: The XLS_SIGNATURE is too restrictive #41225

Closed
@geoffrey-eisenbarth

Pandas 1.2.4 fails to open XLS files generated by Lotus 1-2-3 because they have a different header than the one expected in XLS_SIGNATURE (io.excel._base line 1001 and then 1050).

I know I'm likely to be the only person in the world with this issue, but my boss still uses Lotus 1-2-3 🙄

I'm willing to submit a pull request with my fix if people are okay with it. The fix basically involves changing XLS_SIGNATURE to a list:

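XLS_SIGNATURES = [
  b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1",  # OLE2 compound file header (regular XLS)
  b"\t\x04\x06",  # first bytes of the Lotus 1-2-3 export
]
ZIP_SIGNATURE = b"PK\x03\x04"
# Read enough bytes to compare against the longest signature.
PEEK_SIZE = max(map(len, XLS_SIGNATURES + [ZIP_SIGNATURE]))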

and then later:

if any(peek.startswith(signature) for signature in XLS_SIGNATURES):
  return "xls"
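
For illustration only, here is a self-contained sketch of the check described above. It is not the actual pandas code: `sniff_format` is a hypothetical helper name, the `"zip"` return value is a placeholder, and the second signature is simply the prefix observed in the 1-2-3 export rather than a documented magic number.

```python
import io
from typing import BinaryIO, Optional

XLS_SIGNATURES = [
    b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1",  # OLE2 compound file header
    b"\t\x04\x06",  # assumption: prefix seen in the Lotus 1-2-3 export
]
ZIP_SIGNATURE = b"PK\x03\x04"
PEEK_SIZE = max(map(len, XLS_SIGNATURES + [ZIP_SIGNATURE]))


def sniff_format(stream: BinaryIO) -> Optional[str]:
    """Hypothetical helper: classify a seekable binary stream by its first bytes."""
    start = stream.tell()
    peek = stream.read(PEEK_SIZE)
    stream.seek(start)  # restore the position so a real reader can start over
    if any(peek.startswith(signature) for signature in XLS_SIGNATURES):
        return "xls"
    if peek.startswith(ZIP_SIGNATURE):
        return "zip"  # placeholder: zip-based formats need a look inside the archive
    return None


# Usage: in-memory streams stand in for real files.
ole_stream = io.BytesIO(b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1" + b"\x00" * 8)
lotus_stream = io.BytesIO(b"\t\x04\x06\x00" + b"\x00" * 8)
print(sniff_format(ole_stream), sniff_format(lotus_stream))
```

Because the signatures differ in length, the sketch reads `PEEK_SIZE` bytes once and relies on `startswith` to compare only as many bytes as each signature has.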

I don't know if the second XLS_SIGNATURES value is "good," but it works. I can attach an XLS file exported by 1-2-3 if that would help others, or if someone could help me find a better second value for XLS_SIGNATURES. I looked through the file's raw bytes for anything similar to the "DOCFILE" XLS signature, but didn't see anything.
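
For anyone who wants to compare headers across several exported files, a minimal sketch for dumping the leading bytes as hex could look like the following. The helper names `peek_header` and `format_header` and the default of 16 bytes are assumptions made for this example.

```python
def peek_header(path, size=16):
    """Hypothetical helper: return the first `size` bytes of the file at `path`."""
    with open(path, "rb") as fh:
        return fh.read(size)


def format_header(data):
    """Render bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{byte:02X}" for byte in data)


# Example with the candidate signature from this issue:
print(format_header(b"\t\x04\x06"))  # prints "09 04 06"
```

Running `format_header(peek_header(path))` on a few 1-2-3 exports would show whether more than the first three bytes stay constant from file to file.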