Hello Everyone,
I'm trying to extract the text from table cells in a Word document and load it into a pandas DataFrame. That part works, mainly thanks to this code:
from docx import Document

document = Document(path_to_your_docx)
tables = document.tables
for table in tables:
    for row in table.rows:
        for cell in row.cells:
            for paragraph in cell.paragraphs:
                print(paragraph.text)
Thanks to @scanny
However, paragraph.text comes back empty when the content of a cell is a hyperlink.
Alternatively, I'm able to extract all the hyperlinks from the document using this code:
from docx.opc.constants import RELATIONSHIP_TYPE as RT

rels = document.part.rels
for rel in rels:
    if rels[rel].reltype == RT.HYPERLINK:
        print(rels[rel]._target)
But I would much rather extract them through the cell objects, since that would let me place each hyperlink in the row it belongs to.
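To make the goal concrete, here is a rough sketch of the kind of per-cell lookup I have in mind. It assumes the hyperlink lives in a w:hyperlink element inside the paragraph XML and that its r:id attribute is a key into paragraph.part.rels. The helper name cell_hyperlinks is made up for illustration, and it relies on the private paragraph._p attribute, so treat it as an untested idea rather than the supported way of doing this. Newer python-docx releases may also expose hyperlinks on the paragraph directly, which this sketch does not depend on.

```python
# Namespaces used by WordprocessingML documents.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def cell_hyperlinks(cell):
    """Return (display_text, url) pairs for every hyperlink in a table cell.

    Hypothetical helper: `cell` is expected to behave like a python-docx
    table cell, i.e. to offer `.paragraphs`, each with `._p` and `.part.rels`.
    """
    links = []
    for paragraph in cell.paragraphs:
        rels = paragraph.part.rels
        # paragraph._p is the underlying <w:p> element (private attribute).
        for hyperlink in paragraph._p.iter(f"{{{W_NS}}}hyperlink"):
            # The visible link text is spread over <w:t> elements in child runs.
            text = "".join(t.text or "" for t in hyperlink.iter(f"{{{W_NS}}}t"))
            rel_id = hyperlink.get(f"{{{R_NS}}}id")
            # Links to internal bookmarks have no r:id, so url stays None.
            url = None
            if rel_id is not None and rel_id in rels:
                url = rels[rel_id].target_ref
            links.append((text, url))
    return links
```

Inside the existing loop over row.cells, calling cell_hyperlinks(cell) would then give the links for that one cell, so they can be stored in the same DataFrame row as the cell text.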
Any help is appreciated!
Many Thanks,
Divyesh