Performance degradation with wr.catalog.get_partitions on Python 3.12 in AWS Lambda #3245

@javolando

Describe the bug

We’ve encountered a significant performance issue when using wr.catalog.get_partitions(database=database_name, table=table_name) in AWS Lambda with Python version 3.12.
With Python 3.9 and 3.10 (using the corresponding awswrangler layers), the function executes in approximately 3.5 seconds.
With Python 3.12, the first iteration takes around 200 seconds.

How to Reproduce

Change the Lambda runtime from Python 3.9 to Python 3.12.
Remove the Python 3.9 layer (arn:aws:lambda:eu-central-1:336392948345:layer:AWSDataWrangler-Python39:1).
Add the Python 3.12 layer (arn:aws:lambda:eu-central-1:336392948345:layer:AWSSDKPandas-Python312:19).
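
To make the comparison between runtimes easier to reproduce, a minimal timing sketch along these lines could be deployed unchanged to both the 3.9 and 3.12 functions. It is not the code from the original Lambda. The `timed` helper and the `database` / `table` event keys are hypothetical names chosen for illustration. It times the `awswrangler` import separately from two consecutive `wr.catalog.get_partitions` calls, so that a slow first iteration can be told apart from a slow import.

```python
import importlib
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn and return (result, elapsed seconds). Hypothetical helper."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def lambda_handler(event, context):
    # Import inside the handler so the import cost shows up as its own number
    # (this only reflects the real import cost on a cold start).
    wr, import_s = timed(importlib.import_module, "awswrangler")
    timings = {"import_s": import_s}

    # Assumption: the test event carries "database" and "table" keys.
    partitions = {}
    for i in range(2):
        partitions, elapsed = timed(
            wr.catalog.get_partitions,
            database=event["database"],
            table=event["table"],
        )
        timings[f"call_{i + 1}_s"] = elapsed

    timings["partition_count"] = len(partitions)
    return timings
```

Invoking the function with a test event such as `{"database": "...", "table": "..."}` on each runtime returns the timings side by side, which should show whether only the first call is slow on Python 3.12.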

Expected behavior

wr.catalog.get_partitions(database=..., table=...) should execute consistently and efficiently across supported Python versions in AWS Lambda. Specifically, it should return partition data within a few seconds (e.g., ~3.5 s, as observed with Python 3.9 and 3.10), without significant delays or performance degradation.

Your project

No response

Screenshots

No response

OS

Windows

Python version

3.12

AWS SDK for pandas version

arn:aws:lambda:eu-central-1:336392948345:layer:AWSSDKPandas-Python312:19

Additional context

No response

Labels

bug (Something isn't working)
