Trouble recognising tables #1082

Open
@DeanStra

Hi all, this is my first time posting on GitHub, so apologies if I break any conventions. Feedback is appreciated!

I have some code that converts a PDF to .docx using python-pdf2docx and then adds text to the tables using python-docx. When I run the following code on each of the attached .docx files:

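from docx import Document
doc = Document("Output_A.docx")  # likewise for Output_B.docx
print(doc.tables[0].cell(0, 1).text)  # row 0, column 1 of the first table found
print(doc.tables[1].cell(0, 1).text)  # same cell of the second table found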

I find that for Output_A, the first "table" python-docx recognises isn't really a table (not a particular problem), and it then skips over the first two actual tables (a problem).

For Output_B, python-docx does not seem to recognise any tables at all.
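One way to narrow this down might be to look at where the tables sit in the underlying XML. python-docx's `doc.tables` lists only tables that are direct children of the document body, so a table nested inside another table's cell (or inside a text box) does not show up there. The sketch below uses only the standard library and assumes, without having confirmed it for these files, that the missing tables are nested in this way. The names `find_tables` and `W` are made up for illustration and are not part of python-docx.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, in ElementTree's "{uri}" tag-prefix form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def find_tables(docx_path):
    """Return one (depth, is_top_level) tuple per <w:tbl> in document order.

    depth counts how many other <w:tbl> elements enclose the table;
    is_top_level is True only when the table is a direct child of <w:body>.
    """
    # A .docx file is a zip archive; the main text lives in word/document.xml.
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    found = []

    def walk(elem, parent_tag, depth):
        if elem.tag == W + "tbl":
            found.append((depth, parent_tag == W + "body"))
            depth += 1  # anything below this table is nested one level deeper
        for child in elem:
            walk(child, elem.tag, depth)

    walk(root, None, 0)
    return found
```

Calling `find_tables("Output_A.docx")` and comparing the number of entries with `len(doc.tables)` would show whether tables exist in the file that `doc.tables` does not reach. Entries whose second value is `False` are the ones it would skip.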

If I convert the files using Adobe's or SmallPDF's online tools, python-docx interprets the tables correctly. Perhaps this is more of an issue with python-pdf2docx, but I haven't had much luck pursuing that avenue and would appreciate any way to make it work with python-docx. I'd like to turn the code into a small executable so anyone can use it, and the actual python-docx code works great ... if it can detect the tables. Thank you in advance for your help.

Output_A.docx
Output_B.docx
