Split cells in tables are read wrong, who knows a workaround? #939

Open
@diruas

Consider the following two Word tables. The first one only has merged cells, and I can read it correctly.
In the 2nd one I split the top row (after the merge) into two cells, and once I do this, the table is read incorrectly.
image

demo-word.docx

Here is the code to reproduce:

import docx

document = docx.Document('demo-word.docx')

# iter_unique_cells is a helper whose definition is not shown in this post.
for table in document.tables:
    print('TABLE---')
    for row in table.rows:
        print('--new-row--')
        for cell in iter_unique_cells(row):
            print(cell.text)

To make the result easier to read, I print "TABLE---" at the start of each table and "--new-row--" at the start of each row.
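
The reproduction code calls iter_unique_cells without defining it. A minimal sketch of what such a helper might look like, assuming it is meant to skip the repeated entries that row.cells returns for a horizontally merged cell (this is an assumption, not the code actually used for the output below; `_tc` is a private python-docx attribute):

```python
def iter_unique_cells(row):
    """Yield each cell of `row` once, skipping consecutive repeats.

    Assumes `row.cells` hands back the same underlying element
    (`cell._tc`, a private attribute) for every grid column that a
    horizontally merged cell spans.
    """
    prior_tc = None
    for cell in row.cells:
        this_tc = cell._tc
        if this_tc is prior_tc:
            continue  # same merged cell as the previous grid column
        prior_tc = this_tc
        yield cell
```

Note that this only removes duplicates inside one row. It does not help if row.cells itself returns cells that belong to a neighbouring row, which is what the 2nd table's output suggests.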

This results in a "cell defect" that propagates through the table (see below). Does anyone have a workaround for this?
Many thanks

The 1st table reads correctly as:
TABLE---
--new-row--
A
--new-row--

--new-row--
B1
B2
B3
B4

--new-row--
C1
C2
C3
C4

--new-row--

D

--new-row--
E1
E3

--new-row--
F1
F2
F3
F4
F5

The 2nd table reads incorrectly:
TABLE---
--new-row--
A1
A2
--new-row--

B1
--new-row--
B2
B3
B4

C1
C2
--new-row--
C3
C4

D

--new-row--

E1
E3
--new-row--
E3

F1
F2
F3
F4

--new-row--
F5

Looking at the XML structure, does the split create something like a hidden row?
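
One possible workaround, sketched here under assumptions rather than as a confirmed fix: bypass row.cells entirely and read the w:tc elements of each w:tr straight from word/document.xml, so that every row only reports the cells it physically contains. The function name read_tables is hypothetical, the sketch uses only the standard library, and it assumes top-level tables whose cell text sits in ordinary paragraphs (tabs, line breaks and nested tables are ignored).

```python
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def read_tables(docx_file):
    """Return a list of tables, each a list of rows, each a list of cell texts.

    `docx_file` is a path or a binary file-like object. Hypothetical helper:
    it reads the XML directly instead of going through python-docx.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    tables = []
    # Only tables that are direct children of the body are visited.
    for tbl in root.findall("w:body/w:tbl", NS):
        rows = []
        for tr in tbl.findall("w:tr", NS):
            cells = []
            for tc in tr.findall("w:tc", NS):
                # One line per direct paragraph of the cell.
                paragraphs = [
                    "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
                    for p in tc.findall("w:p", NS)
                ]
                cells.append("\n".join(paragraphs))
            rows.append(cells)
        tables.append(rows)
    return tables
```

Calling read_tables('demo-word.docx') and printing each row's list should show whether every row keeps its own cells. Vertically merged continuation cells are still present in the XML, so they would show up as empty strings rather than being skipped.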
image
