blank page contained in a document not handled by get_layout_csv_from_trp2 #427

@fdejax90

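from textractprettyprinter.t_pretty_print_layout import get_layout_csv_from_trp2
from trp import trp2 as t2

# job_results: the Textract JSON response for a document that contains a blank page
t_document = t2.TDocumentSchema().load(job_results)

layout_csv = get_layout_csv_from_trp2(t_document)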

>>>
AttributeError                            Traceback (most recent call last)
Cell In[3], line 1
----> 1 layout_csv = get_layout_csv_from_trp2(t_document)

File ~/.pyenv/versions/3.11.9/lib/python3.11/site-packages/textractprettyprinter/t_pretty_print_layout.py:263, in get_layout_csv_from_trp2(trp2_doc)
    261 processed_ids = []
    262 relationships: t2.TRelationship = page.get_relationships_for_type()
--> 263 blocks = [trp2_doc.get_block_by_id(id) for id in relationships.ids if relationships.ids]
    264 layout_blocks = [
    265     block for block in blocks if block.block_type in [
    266         "LAYOUT_TITLE", "LAYOUT_HEADER", "LAYOUT_FOOTER", "LAYOUT_SECTION_HEADER", "LAYOUT_PAGE_NUMBER",
    267         "LAYOUT_LIST", "LAYOUT_FIGURE", "LAYOUT_TABLE", "LAYOUT_KEY_VALUE", "LAYOUT_TEXT"
    268     ]
    269 ]
    270 for idx, layout_block in enumerate(layout_blocks):
    271     # for lists the output is special, because the LAYOUT_TEXTs do have a reference to the LAYOUT_LIST in the text
    272     # so we grab the list and process all children
    273     # probably could make this "easier" by keeping track of the len of CHILD relationships in LAYOUT_LIST
    274     # but wanted to see if I can prepare the lists in lists, which may happen one point in the future...

AttributeError: 'NoneType' object has no attribute 'ids'

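t_document contains a blank page, so page.get_relationships_for_type() returns None for that page. The "if relationships.ids" filter on line 263 does not help, because the comprehension reads relationships.ids before the filter runs and relationships itself is None.

get_layout_csv_from_trp2 is missing a handler for the edge case where relationships is None.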
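
One possible shape for the missing guard is sketched below. It is not the library's code: FakeRelationship, FakePage and child_ids_or_empty are hypothetical stand-ins written for this issue, so the snippet runs without trp or a Textract response. Only the method name get_relationships_for_type and the ids attribute are taken from the traceback above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeRelationship:
    """Hypothetical stand-in for t2.TRelationship; only `ids` is modelled."""
    ids: Optional[List[str]] = field(default_factory=list)


@dataclass
class FakePage:
    """Hypothetical stand-in for a page block; a blank page has no relationships."""
    relationships: Optional[FakeRelationship] = None

    def get_relationships_for_type(self) -> Optional[FakeRelationship]:
        # Mirrors the behaviour reported in this issue: None for a blank page.
        return self.relationships


def child_ids_or_empty(page) -> List[str]:
    """Return the page's child ids, or an empty list when there are none."""
    relationships = page.get_relationships_for_type()
    # Check the relationship object itself before touching `.ids`.
    if relationships is None or not relationships.ids:
        return []
    return list(relationships.ids)


blank_page = FakePage()
text_page = FakePage(FakeRelationship(ids=["a", "b"]))

print(child_ids_or_empty(blank_page))  # []
print(child_ids_or_empty(text_page))   # ['a', 'b']
```

With a guard like this, a blank page would simply contribute no layout blocks instead of raising AttributeError. Whether the function should skip such a page or emit an empty entry for it is a decision for the maintainers.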
