I used the following code to extract information from documents, including text and tables:
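from textractcaller.t_call import call_textract, Textract_Features
from textractprettyprinter.t_pretty_print import get_text_from_layout_json

# byte_img (the image bytes) and textract_client (a boto3 Textract client) are defined elsewhere
textract_json = call_textract(
    input_document=byte_img,
    features=[Textract_Features.TABLES, Textract_Features.LAYOUT, Textract_Features.FORMS],
    boto3_textract_client=textract_client,
)
# The result is indexed by page number; fall back to an empty string if page 1 is absent
layout = get_text_from_layout_json(textract_json, exclude_figure_text=False)
full_text = layout.get(1, '')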
However, when I test it on the attached document (document_anonyme_1.jpg), the resulting text output (document_anonymise_1.txt) is missing the last row of the table: the row that contains "COPYRIGHT EOT ..." does not appear.
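To help narrow this down, here is a minimal sketch of a check that compares the raw LINE blocks in the Textract response against the layout text. It could show whether Textract detected the row and it was dropped only during layout linearization. The helper name find_missing_lines and the sample data are hypothetical and not part of any Textract package. The whitespace-normalized substring match is only a heuristic, since table formatting can split a cell's text.

```python
def find_missing_lines(textract_json, layout_text):
    """Return (page, text) for each LINE block whose text is not found in layout_text."""
    # Collapse whitespace so padding added by table formatting does not hide a match.
    normalized = " ".join(layout_text.split())
    missing = []
    for block in textract_json.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue
        text = " ".join(block.get("Text", "").split())
        if text and text not in normalized:
            # Synchronous responses may omit "Page"; assume page 1 in that case.
            missing.append((block.get("Page", 1), text))
    return missing


# Hypothetical stand-ins for textract_json and full_text, for illustration only.
sample_json = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Header row", "Page": 1},
        {"BlockType": "LINE", "Text": "Footer row"},
    ]
}
sample_text = "| Header   row |"

print(find_missing_lines(sample_json, sample_text))
```

With the real data, the call would be find_missing_lines(textract_json, full_text). If the missing row shows up in the result, the response contains it and the loss happens in the layout step.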
Could you please help me resolve this issue?
For reference, I am using the following versions of the relevant packages:
amazon-textract-caller: 0.2.4
amazon-textract-prettyprinter: 0.1.10
amazon-textract-response-parser: 0.1.48
amazon-textract-textractor: 1.9.2

document_anonymise_1.txt