feature: Document.text #72

Open
@deanmalmgren

@mikemaccana's old project had a simple script for extracting text from a document. It took me a few minutes to figure out, but doing the same thing is really simple now:

import docx

def get_text(filename):
    # paragraph.text is already a str on Python 3, so no .encode() is needed
    document = docx.Document(filename)
    return '\n\n'.join(paragraph.text for paragraph in document.paragraphs)

Opening this issue with the snippet may be enough to document the approach, but it might be nice to include it somewhere in the documentation or as a script that is installed with the package. I'm happy to contribute.

Do you have any preferences on a script vs documenting this two-liner? If just documenting is enough, any thoughts on where it should go?
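
For the script option, here is a rough sketch of what it could look like. The names `document_text` and `main` are made up for illustration and are not part of python-docx. The helper only relies on the `paragraphs` attribute and each paragraph's `text`, and `docx` is imported inside `main` so the helper itself has no hard dependency on the package. As far as I can tell, `document.paragraphs` covers body paragraphs only, so text inside tables would not be included.

```python
import argparse
import sys


def document_text(document, separator="\n\n"):
    """Join the text of every paragraph in a Document-like object.

    Hypothetical helper: `document` is anything with a `paragraphs`
    sequence whose items expose a `text` string, which is what
    docx.Document(filename) provides.
    """
    return separator.join(paragraph.text for paragraph in document.paragraphs)


def main(argv=None):
    """Hypothetical command-line entry point: print a .docx file's text."""
    parser = argparse.ArgumentParser(description="Print the text of a .docx file.")
    parser.add_argument("filename", help="path to the .docx file")
    args = parser.parse_args(argv)

    # Imported here so document_text can be used without python-docx installed.
    import docx

    sys.stdout.write(document_text(docx.Document(args.filename)) + "\n")
```

If this shipped with the package, `main` could be wired up as a console script. That wiring is an assumption on my part, not something the package does today.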
