Description
I'm using python-docx to open a document template and replace placeholders with actual text. Screenshot of template1.docx (the file itself is attached at the bottom):
The following code works for some placeholders but not for others:
from docx import Document
def replace_placeholders(document, data):
    def replace_in_paragraph(paragraph):
        if paragraph is not None:
            for field, value in data.items():
                if f"«{field}»" in paragraph.text:
                    paragraph.text = paragraph.text.replace(f"«{field}»", value)

    def replace_in_bullet_list(bullet_list):
        if bullet_list is not None:
            for paragraph in bullet_list:
                if paragraph.text is not None:
                    replace_in_paragraph(paragraph)

    # Replace placeholders in paragraphs
    for paragraph in document.paragraphs:
        replace_in_paragraph(paragraph)
    # Replace placeholders in tables in main document
    for table in document.tables:
        for row in table.rows:
            for cell in row.cells:
                for paragraph in cell.paragraphs:
                    replace_in_paragraph(paragraph)
    # Replace placeholders in bullet lists in main document
    for bullet_list in document.element.xpath("//w:p[w:pPr/w:numPr]"):
        replace_in_bullet_list(bullet_list)
    # Replace placeholders in paragraphs and tables in headers
    for section in document.sections:
        header = section.header
        if header is not None:
            for paragraph in header.paragraphs:
                for field, value in data.items():
                    if f"«{field}»" in paragraph.text:
                        paragraph.text = paragraph.text.replace(f"«{field}»", value)
            for table in header.tables:
                for row in table.rows:
                    for cell in row.cells:
                        for paragraph in cell.paragraphs:
                            for field, value in data.items():
                                if f"«{field}»" in paragraph.text:
                                    paragraph.text = paragraph.text.replace(f"«{field}»", value)

data = {
    "Name": "John Doe",
    "Age": "30",
    "Occupation": "Software Engineer",
    "Location": "New York",
    "Csharp": "2 years",
    "Java": "6 years",
    "Company": "Ferdy AB",
}
# Load the Word template
template = Document("template1.docx")
# Replace placeholders with actual data
replace_placeholders(template, data)
# Save the modified Word document
template.save("output1.docx")
The issue is caused by the template file (template1.docx, created and saved with Word Version 2401 from Microsoft 365). For reasons I don't understand, the placeholders, which are fields of type MergeField, are sometimes stored in <w:instrText> tags and sometimes in <w:fldSimple> tags in document.xml. In the former case the above code works; in the latter it does not. The two excerpts further down show the 'Name' placeholder in the numbered list. In the first case the template has only the placeholder; in the second case it has the placeholder plus a space character.
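As an aside, to check which of the two forms a given template uses, something like the following sketch could count them. It uses only the standard library, and the function names `count_field_styles` and `count_field_styles_in_docx` are made up for illustration; it assumes the main document part is stored at `word/document.xml` inside the .docx archive.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace (the "w:" prefix in document.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_field_styles(document_xml):
    """Count simple (w:fldSimple) and complex (w:fldChar 'begin') fields."""
    root = ET.fromstring(document_xml)
    simple = sum(1 for _ in root.iter(f"{{{W_NS}}}fldSimple"))
    complex_ = sum(
        1
        for fld_char in root.iter(f"{{{W_NS}}}fldChar")
        if fld_char.get(f"{{{W_NS}}}fldCharType") == "begin"
    )
    return {"simple": simple, "complex": complex_}


def count_field_styles_in_docx(path):
    """Read word/document.xml from a .docx archive and count its fields."""
    with zipfile.ZipFile(path) as archive:
        return count_field_styles(archive.read("word/document.xml"))
```

Calling `count_field_styles_in_docx("template1.docx")` before and after adding the space should show whether Word switched the 'Name' field from one form to the other.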
Placeholder only (excerpt from document.xml):
<w:r>
  <w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
  <w:instrText xml:space="preserve"> MERGEFIELD Name \* MERGEFORMAT </w:instrText>
</w:r>
<w:r>
  <w:fldChar w:fldCharType="separate"/>
</w:r>
<w:r w:rsidR="000C7414">
  <w:rPr>
    <w:noProof/>
  </w:rPr>
  <w:t>«Name»</w:t>
</w:r>
<w:r>
  <w:rPr>
    <w:noProof/>
  </w:rPr>
  <w:fldChar w:fldCharType="end"/>
</w:r>
Placeholder plus subsequent space (excerpt from document.xml):
<w:fldSimple w:instr=" MERGEFIELD Name \* MERGEFORMAT ">
  <w:r w:rsidR="000C7414">
    <w:rPr>
      <w:noProof/>
    </w:rPr>
    <w:t>«Name»</w:t>
  </w:r>
</w:fldSimple>
<w:r w:rsidR="00697A4F">
  <w:rPr>
    <w:noProof/>
  </w:rPr>
  <w:t xml:space="preserve"> </w:t>
</w:r>
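The difference can be reproduced without python-docx. The sketch below wraps the second excerpt in a `w:p` element and compares two ways of collecting text. It rests on an assumption about python-docx internals that may not hold for every version: that `paragraph.text` only looks at runs sitting directly under `w:p`. The helper names `direct_run_text` and `all_run_text` are made up for illustration.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# The fldSimple excerpt, wrapped in a w:p so that it parses on its own.
PARAGRAPH_XML = r"""
<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:fldSimple w:instr=" MERGEFIELD Name \* MERGEFORMAT ">
    <w:r><w:rPr><w:noProof/></w:rPr><w:t>«Name»</w:t></w:r>
  </w:fldSimple>
  <w:r><w:rPr><w:noProof/></w:rPr><w:t xml:space="preserve"> </w:t></w:r>
</w:p>
""".strip()


def direct_run_text(p):
    """Text of runs that are direct children of the paragraph element."""
    return "".join(t.text or "" for t in p.findall("w:r/w:t", NS))


def all_run_text(p):
    """Text of every w:t element below the paragraph, at any depth."""
    return "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))


paragraph = ET.fromstring(PARAGRAPH_XML)
print(repr(direct_run_text(paragraph)))  # only the run outside w:fldSimple
print(repr(all_run_text(paragraph)))     # includes the run inside w:fldSimple
```

With the excerpt above, the direct-children lookup only sees the trailing space, while the descendant lookup also sees «Name». If the assumption about `paragraph.text` is right, that would explain why the `in paragraph.text` check never matches for the fldSimple form.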
I don't know whether this is a bug or a feature. In any case, I have no idea how to work around it.
I'm uploading the template that works with the code. If a space is added after the 'Name' placeholder, it does not work anymore...
template1.docx
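A possible direction for a workaround, as a rough sketch rather than a confirmed fix: replace the placeholder in every `w:t` element at any depth, so that runs nested inside `w:fldSimple` are reached too. The function name `replace_in_text_nodes` is hypothetical, and the sketch works on a plain ElementTree tree rather than on python-docx objects. It assumes each «field» marker sits entirely inside one `w:t` element, as in both excerpts above.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def replace_in_text_nodes(root, data):
    """Replace «field» markers in every w:t element below root, in place.

    Returns the number of w:t elements whose text changed.
    """
    changed = 0
    for t in root.iter(f"{{{W_NS}}}t"):
        if not t.text:
            continue
        new_text = t.text
        for field, value in data.items():
            new_text = new_text.replace(f"«{field}»", value)
        if new_text != t.text:
            t.text = new_text
            changed += 1
    return changed
```

This leaves the surrounding field markup (`w:fldSimple`, `w:fldChar`, `w:instrText`) untouched, so Word may still treat the text as a merge field and overwrite it when fields are updated. The same traversal could presumably be applied to the lxml element behind a python-docx paragraph, but I have not verified that.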