Description
When parsing receipts using the textractor Python library, the output for INVOICE_RECEIPT_DATE does not match what is shown in the AWS Textract console.
Context:
Input: Receipt image (img1.jpg)
AWS Textract Console (Analyze Expense):
Detected two INVOICE_RECEIPT_DATE values:
- 09/02/2025
- 08/13/2018
Textractor library output:
Only returns a single INVOICE_RECEIPT_DATE: 08/13/2018
From terminal python print:

Expected Behavior:
Textractor should return all detected normalized field values for INVOICE_RECEIPT_DATE, not just one.
Actual Behavior:
Only a single date value (08/13/2018) is returned; the other detected date (09/02/2025) is missing.
Steps to Reproduce:
- Run Textract AnalyzeExpense on img1.jpg through AWS Console → confirm two values are returned.
- Run the same image through the textractor library.
- Observe that only one INVOICE_RECEIPT_DATE is returned.
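To cross-check what the AnalyzeExpense API itself returns, independent of textractor, the raw response can be inspected with boto3. This is only a rough sketch: `collect_summary_values` and `analyze_image` are hypothetical helper names, and it assumes AWS credentials and a default region are already configured.

```python
from collections import defaultdict


def collect_summary_values(response):
    """Group every summary field value in an AnalyzeExpense response by type.

    Hypothetical helper: it keeps all values per type instead of one.
    """
    values = defaultdict(list)
    for doc in response.get("ExpenseDocuments", []):
        for field in doc.get("SummaryFields", []):
            field_type = field.get("Type", {}).get("Text")
            text = field.get("ValueDetection", {}).get("Text")
            if field_type is not None and text is not None:
                values[field_type].append(text)
    return dict(values)


def analyze_image(path):
    """Call Textract AnalyzeExpense on a local image (assumes configured AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3 installed

    client = boto3.client("textract")
    with open(path, "rb") as f:
        return client.analyze_expense(Document={"Bytes": f.read()})
```

With this, `collect_summary_values(analyze_image("img1.jpg")).get("INVOICE_RECEIPT_DATE")` should list every date the API reports, which can then be compared against both the console and the textractor output.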
Environment:
textractor version: 1.9.2
Python version: 3.9.6
Additional Notes:
It seems the library might be returning only one detected value for certain fields when Textract reports several values of the same type. Would it be possible to expose all values that Textract detects, consistent with the console output?
