
document.save_as_markdown(page_no=page_number) creates an artefacts folder with all the document's images instead of only the images on that page #179

Open
@mariaapostigo

Description


I am iterating over the pages of a document as follows:
```python
from docling_core.types.doc import ImageRefMode

for page_num, _ in enumerate(pages, start=1):
    # Save each page as a separate Markdown file
    page_md_filename = output_dir / f"{doc_filename}_page_{page_num}.md"
    conv_res.document.save_as_markdown(
        page_md_filename, image_mode=ImageRefMode.REFERENCED, page_no=page_num
    )

    # Read the Markdown content
    with page_md_filename.open("r", encoding="utf-8") as md_file:
        md_content = md_file.read()

    # Collect the PNG images written to this page's artifacts folder
    artifacts_dir = f"{doc_filename}_page_{page_num}_artifacts"
    image_files = list((output_dir / artifacts_dir).glob("*.png"))
    table_counter = 0
```
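The glob on `artifacts_dir` relies on the default artifacts folder being named after the Markdown file: same parent directory, the file's stem plus `_artifacts`. A minimal sketch of that naming convention, assuming it holds for the installed Docling version; `default_artifacts_dir` is a hypothetical helper, not part of Docling:

```python
from pathlib import Path


def default_artifacts_dir(md_path: Path) -> Path:
    """Hypothetical helper mirroring the '<stem>_artifacts' folder the snippet globs.

    Assumes the convention is: same parent directory, Markdown stem + '_artifacts'.
    """
    stem_path = md_path.with_suffix("")  # drop the ".md" suffix
    return stem_path.with_name(stem_path.name + "_artifacts")


for page_num in (1, 2):
    md_path = Path("output") / f"report_page_{page_num}.md"
    print(default_artifacts_dir(md_path))
```

Because each page gets its own Markdown file, this convention yields one folder per page, which matches the per-page directories described in the report.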
From the line `conv_res.document.save_as_markdown(page_md_filename, image_mode=ImageRefMode.REFERENCED, page_no=page_num)`, I would expect a directory containing only the images on page `page_num`. Instead, a directory is created per page, and each one contains all the images from the document.
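For comparison, the expected behavior amounts to keeping only the pictures whose provenance lies on the requested page. The sketch below models that filter with hypothetical stand-in classes (`Provenance`, `Picture`) and a hypothetical `pictures_for_page` function, not Docling's own types. In Docling, each picture item's `prov` list carries a `page_no`, which is what such a filter would inspect. Passing `page_no=None` keeps every picture, matching a whole-document export.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Provenance:
    """Hypothetical stand-in for a provenance record: the page an item appears on."""
    page_no: int


@dataclass
class Picture:
    """Hypothetical stand-in for a picture item with one or more provenance records."""
    name: str
    prov: List[Provenance] = field(default_factory=list)


def pictures_for_page(pictures: List[Picture], page_no: Optional[int]) -> List[Picture]:
    """Return all pictures if page_no is None, otherwise only those
    with at least one provenance record on page_no."""
    if page_no is None:
        return list(pictures)
    return [p for p in pictures if any(pr.page_no == page_no for pr in p.prov)]


doc_pictures = [
    Picture("figure_a", [Provenance(1)]),
    Picture("figure_b", [Provenance(2)]),
    Picture("figure_c", [Provenance(2), Provenance(3)]),
]

for page in (1, 2, 3):
    print(page, [p.name for p in pictures_for_page(doc_pictures, page)])
```

A picture spanning two pages (like `figure_c`) is kept for both, which is one reasonable choice; the report does not say how such pictures should be handled.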
