Textractor returns only one INVOICE_RECEIPT_DATE instead of multiple values #438

@arsher-b

Description

When parsing receipts using the textractor Python library, the output for INVOICE_RECEIPT_DATE does not match what is shown in the AWS Textract console.

Context:
Input: Receipt image (img1.jpg)

AWS Textract Console (Analyze Expense):
Detected two INVOICE_RECEIPT_DATE values:

  • 09/02/2025
  • 08/13/2018

Textractor library output:
Only a single INVOICE_RECEIPT_DATE is returned: 08/13/2018 (confirmed by printing the parsed result in a Python terminal session).

Expected Behavior:
Textractor should return all detected normalized field values for INVOICE_RECEIPT_DATE, not just one.

Actual Behavior:
Only a single date value (08/13/2018) is returned; the other value shown in the console (09/02/2025) is missing.

Steps to Reproduce:

  1. Run Textract AnalyzeExpense on img1.jpg through AWS Console → confirm two values are returned.
  2. Run the same image through the textractor library.
  3. Observe that only one INVOICE_RECEIPT_DATE is returned.

Environment:
textractor version: 1.9.2
Python version: 3.9.6

Additional Notes:
It seems the library might be keeping only one detected value per field type when the same type occurs more than once. Would it be possible to expose all values that Textract detects, consistent with the console output?
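
To illustrate what I mean by "all values", here is a minimal sketch that groups every summary field by its normalized type without dropping duplicates. It assumes the raw AnalyzeExpense response shape (ExpenseDocuments → SummaryFields → Type.Text / ValueDetection.Text). The helper name collect_summary_values and the sample_response dict are hypothetical: the dict is a hand-built stand-in using the two dates from the console, not the actual response for img1.jpg, and this is not how textractor is implemented.

```python
from collections import defaultdict
from typing import Any, Dict, List


def collect_summary_values(response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group every SummaryFields value by its normalized type, keeping duplicates.

    Hypothetical helper for illustration; it works on the raw AnalyzeExpense
    response dict, not on textractor objects.
    """
    values: Dict[str, List[str]] = defaultdict(list)
    for doc in response.get("ExpenseDocuments", []):
        for field in doc.get("SummaryFields", []):
            field_type = field.get("Type", {}).get("Text")
            text = field.get("ValueDetection", {}).get("Text")
            if field_type and text is not None:
                # Append instead of assigning, so repeated types are preserved.
                values[field_type].append(text)
    return dict(values)


# Hand-built stand-in for a response (assumption, trimmed to the relevant keys).
sample_response = {
    "ExpenseDocuments": [
        {
            "SummaryFields": [
                {
                    "Type": {"Text": "INVOICE_RECEIPT_DATE"},
                    "ValueDetection": {"Text": "09/02/2025"},
                },
                {
                    "Type": {"Text": "INVOICE_RECEIPT_DATE"},
                    "ValueDetection": {"Text": "08/13/2018"},
                },
            ]
        }
    ]
}

dates = collect_summary_values(sample_response)["INVOICE_RECEIPT_DATE"]
print(dates)
```

With a list per field type, both dates survive; a plain mapping from type to a single value would keep only one of them, which would match the behavior I am seeing.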

Image used: img1.jpg (attached)
