@mikemaccana's old project had a simple script for extracting text from a document. It took me a few minutes to figure out the equivalent with the current API, but it is really simple now:
    import docx

    def get_text(filename):
        # Join the text of every body paragraph, with a blank line between them.
        document = docx.Document(filename)
        return '\n\n'.join(
            paragraph.text for paragraph in document.paragraphs
        )
Opening this issue with the snippet may be enough to document the approach, but it would be nice to include it somewhere in the documentation or as a script that is installed with the package. I'm happy to contribute.
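For the script option, here is a rough sketch of what I have in mind. The names `document_text` and `main` and the `filename` argument are my own placeholders, not anything that exists in the package today, and `main` is assumed to be wired up as a console entry point by the package's setup. The extraction itself only relies on `docx.Document` and `document.paragraphs`.

```python
import argparse
import sys


def document_text(document):
    """Return the text of every body paragraph, separated by blank lines.

    Only `document.paragraphs` is read, so text inside tables, headers,
    and footers is not included.
    """
    return "\n\n".join(paragraph.text for paragraph in document.paragraphs)


def main(argv=None):
    # Hypothetical command-line interface; the argument name is illustrative.
    parser = argparse.ArgumentParser(
        description="Print the plain text of a .docx file."
    )
    parser.add_argument("filename", help="path to the .docx file")
    args = parser.parse_args(argv)

    # python-docx is imported here so document_text() can be used
    # (and tested) without the package being importable.
    import docx

    sys.stdout.write(document_text(docx.Document(args.filename)) + "\n")
    return 0
```

Keeping the join in its own function means the same helper could back both a documented recipe and the script, whichever ends up being preferred.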
Do you have any preferences on a script vs documenting this two-liner? If just documenting is enough, any thoughts on where it should go?