detect beginning of new row. if last row is incomplete, add empty cells. #564

0-173 · 2018-10-29T00:33:56Z

I found that some tables in documents created with MS Word have inconsistent grid_span attributes in a row (the spans do not add up to the tblGrid's gridCol element count).

When accessing the ._cells property of such a table, the library does not consider the row/column structure of the XML. It just counts the cells and assumes that the cell count per row and all grid_span attributes are consistent with the tblGrid specification.
In my case there were 7 columns in the table, but one row contained only grid_span attributes of 1 + 1 + 2 + 2 = 6. The next row and all succeeding rows were partly wrapped around and broken.

This merge request fixes the issue by detecting a row's start (see #232) and then filling any incomplete row with empty cells if necessary. A unit test is included.
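To make the idea concrete, here is a minimal sketch of row-aware layout with padding. It is not the code from this pull request and does not use python-docx; the `layout_rows` function and the `(label, grid_span)` tuples are hypothetical stand-ins for table cells, assumed only for illustration.

```python
from typing import List, Optional, Tuple

Cell = Tuple[str, int]  # (label, grid_span); a hypothetical stand-in for a w:tc element


def layout_rows(rows: List[List[Cell]], col_count: int) -> List[List[Optional[str]]]:
    """Lay out each row on its own line of the grid, padding short rows with None."""
    grid = []
    for row in rows:
        line: List[Optional[str]] = []
        for label, span in row:
            # A cell spanning n grid columns occupies n consecutive slots.
            line.extend([label] * span)
        # Pad an incomplete row; a negative count yields no padding.
        line.extend([None] * (col_count - len(line)))
        grid.append(line)
    return grid


rows = [
    [("a1", 1), ("a2", 1), ("a3", 1), ("a4", 1), ("a5", 1), ("a6", 1), ("a7", 1)],
    [("b1", 1), ("b2", 1), ("b3", 2), ("b4", 2)],  # spans sum to 6, not 7
    [("c1", 1), ("c2", 1), ("c3", 2), ("c4", 2)],
]
grid = layout_rows(rows, col_count=7)
```

With the 7-column case described above, the second row ends in one empty slot and the third row still starts in its own first column instead of being pulled up.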

scanny · 2018-10-29T17:22:19Z

Can you say how the documents were created and how one with this characteristic can be reproduced?

0-173 · 2018-10-30T12:48:30Z

Hi Scanny, that's a good question. I currently only have a document that was generated in MS Word by our customer. Multiple people were involved in creating this document, so it's not easy to find out exactly how they produced it, but I will give it a try. I have modified the original document to obtain a minimal version demonstrating the problem. The table has a gridCol definition of 7 columns. In this table the cells of the first row sum up to 7 columns, while the second and third rows each sum up to only 6 columns.
84_minimal_demo.docx

0-173 · 2018-11-13T09:49:34Z

I talked to the customer once again. They're copy-pasting a lot of cells and rows from tables in other documents. But we are still not able to reproduce such a document from scratch.
Do you think it makes sense to integrate this new-row detection? In my opinion, even though the document is kind of corrupted, not seeing the new XML row is not the behaviour people would expect from the library.

aidanfindlater · 2019-01-07T16:54:20Z

I'd just like to mention that I found this patch helpful. I'm having a similar problem with unwieldy tables that have been manually edited over years. The original document is from 2012, so it has acquired some cruft over the years. Some of the tables have different row lengths (for aesthetics, I believe), or cell widths that have been adjusted independently of the column (which has the same effect on the script, from what I can tell).

The expected behaviour is for the library to produce essentially an m x n dataframe that corresponds as closely as possible to the original table. The most straightforward way to do this is the one suggested by @0-173, where the shortened row has empty cells added to the end. Instead, every time the script reaches one of these inconsistencies, it fills any missing cells in the shortened row with cells from the next row (removing them from the next row, and thereby offsetting the data in the rest of the dataframe). Essentially, its current behaviour is to make an m x n table, then fill it in with whichever cells happen to exist (going LTR, TTB), regardless of which location they were originally in.

I've got it working now by monkey-patching my script with a version of this patch, so thank you.
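The offsetting described in this comment can be reproduced in a few lines of plain Python. This is only an illustration of the "fill LTR, TTB" effect under simplified assumptions, not the library's actual code; `flat_fill` and the cell labels are hypothetical.

```python
from typing import List


def flat_fill(cells: List[str], col_count: int) -> List[List[str]]:
    """Chunk a flat cell list into rows of col_count, ignoring real row boundaries."""
    return [cells[i:i + col_count] for i in range(0, len(cells), col_count)]


# Three source rows; the middle one is one cell short.
source_rows = [["a1", "a2", "a3"], ["b1", "b2"], ["c1", "c2", "c3"]]
flat = [cell for row in source_rows for cell in row]

wrapped = flat_fill(flat, col_count=3)
# "c1" is pulled up into the second row, and the last row comes out short.
```

Every cell after the short row ends up one position away from where it sat in the source table, which is the shift in the dataframe mentioned above.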

blackdaemon · 2019-05-09T16:58:42Z

This is happening to me after converting doc to docx using Microsoft Office Migration Planning Manager. Resulting docx looks the same as the source doc visually, but internally it exhibits this issue.

scanny · 2019-05-09T17:10:00Z

Ah, that's a great datapoint @blackdaemon, it makes sense that the document was operated on by something other than Word itself. Thanks for mentioning that :)

fdabek1 · 2019-05-16T17:11:13Z

@scanny Is this going to get merged in and be released? I have come across this same issue with a bunch of files and my only solution at the moment would be to delete the extraneous column manually. Getting this merged would be a huge help!

HiGregSmith · 2020-10-16T22:20:33Z

I wonder if this issue is related to the gridAfter and gridBefore properties. I have noticed in the table I'm working with that a row has fewer tcPr elements when its trPr also has a non-zero gridAfter element defined. If either gridAfter or gridBefore is unspecified, zero is the default.

It seems that, for each row, the table's number of columns equals the sum of:
gridAfter (row property, zero if unspecified), plus
gridBefore (row property, zero if unspecified), plus
each cell's gridSpan property (one for each cell where it is unspecified).

For more details, see #881
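The column-count rule above can be checked against raw WordprocessingML with the standard library alone. The sketch below assumes a heavily simplified table fragment (real cells also contain paragraphs and other properties); `row_width` and `_int_val` are hypothetical helper names, not python-docx functions.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def _int_val(parent, path, default):
    """Return the integer w:val of the element at path, or default if it is absent."""
    el = parent.find(path, NS)
    if el is None:
        return default
    return int(el.get(f"{{{W}}}val", default))


def row_width(tr):
    """gridBefore + gridAfter + the sum of each cell's gridSpan (1 if unspecified)."""
    before = _int_val(tr, "w:trPr/w:gridBefore", 0)
    after = _int_val(tr, "w:trPr/w:gridAfter", 0)
    spans = sum(_int_val(tc, "w:tcPr/w:gridSpan", 1) for tc in tr.findall("w:tc", NS))
    return before + after + spans


# Simplified example: 4 grid columns; the second row has 2 cells plus gridAfter=1.
TABLE_XML = f"""
<w:tbl xmlns:w="{W}">
  <w:tblGrid><w:gridCol/><w:gridCol/><w:gridCol/><w:gridCol/></w:tblGrid>
  <w:tr>
    <w:tc/><w:tc/><w:tc/><w:tc/>
  </w:tr>
  <w:tr>
    <w:trPr><w:gridAfter w:val="1"/></w:trPr>
    <w:tc/><w:tc><w:tcPr><w:gridSpan w:val="2"/></w:tcPr></w:tc>
  </w:tr>
</w:tbl>
"""
tbl = ET.fromstring(TABLE_XML)
grid_cols = len(tbl.findall("w:tblGrid/w:gridCol", NS))
widths = [row_width(tr) for tr in tbl.findall("w:tr", NS)]
```

In this fragment both rows account for all 4 grid columns once gridAfter is included, even though the second row has only two cells. A row whose width still falls short of `grid_cols` after this calculation would be the kind of incomplete row this pull request pads.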

FredEPr · 2020-10-20T13:00:47Z

It's actually really easy to produce a table with a mismatched column count in Word.

On any table you can right click a cell and (I don't have Word in English so I'm guessing the names) insert -> insert cells... -> shift cells to the right. It will insert a single cell and shift the remaining cells of the row to the right. Or you can do the exact opposite and delete a single cell with right click -> delete cells... -> shift cells to the left.

Both will produce a table with an inconsistent number of columns and accessing the table by row and column indexes will be unreliable.

The patch from 0-173 fixes this issue.

By the way, issue #861 is probably the same.

tonal · 2023-02-21T07:34:29Z

Try #881 (comment)

detect beginning of new row. if last row is incomplete, add empty cells.

9e04119

scanny added the table label Oct 29, 2018

abhishekmadhu approved these changes Jun 14, 2019

HiGregSmith mentioned this pull request Oct 16, 2020

Does not correctly parse table rows with w:gridBefore or w:gridAfter elements #881

Closed

unresto mentioned this pull request Jul 8, 2021

No. of columns are wrong in some table objects #852

Open

toxicphreAK mentioned this pull request Dec 2, 2022

Read commits from original repo toxicphreAK/python-docx-ng#22

Open
