Skip to content

Conversation

@xingyaoww
Copy link
Collaborator

@xingyaoww xingyaoww commented Dec 4, 2025

Description

This PR fixes issue #1325 where find_dotenv breaks local conversation with an AssertionError.

Problem

The get_env() function in openhands/sdk/observability/utils.py calls dotenv_values() without any arguments. When dotenv_values() is called without arguments, it internally calls find_dotenv() to locate the .env file. The find_dotenv() function tries to walk up the call stack to find the directory of the calling file, but in certain execution contexts (like when running in local conversation or when the call stack is unusual), the assertion assert frame.f_back is not None can fail because the frame stack is exhausted.

Solution

The fix catches AssertionError and OSError from dotenv_values() and returns None gracefully. This allows the code to continue execution without breaking. The implementation:

  1. First checks environment variables (priority)
  2. Only attempts to load from dotenv if environment variable is not set
  3. Gracefully handles errors when dotenv file cannot be found or loaded
  4. Returns None if value cannot be retrieved from either source

Changes

  • Modified: openhands-sdk/openhands/sdk/observability/utils.py

    • Updated get_env() function to handle AssertionError and OSError from dotenv_values()
    • Added explicit check for environment variables before attempting dotenv lookup
  • Added: Tests for the fix

    • tests/sdk/observability/test_utils.py - Tests for get_env() function including error handling
    • tests/sdk/observability/test_laminar.py - Tests for should_enable_observability() with dotenv errors

Testing

All tests pass:

uv run pytest tests/sdk/observability/ -v

The fix has been validated to:

  • Return environment variables when set
  • Return values from dotenv when available and env var not set
  • Return None gracefully when dotenv fails with AssertionError
  • Return None gracefully when dotenv fails with OSError
  • Not break any existing functionality

Fixes #1325

@xingyaoww can click here to continue refining the PR


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.12-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:3d8c0fc-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-3d8c0fc-python \
  ghcr.io/openhands/agent-server:3d8c0fc-python

All tags pushed for this build

ghcr.io/openhands/agent-server:3d8c0fc-golang-amd64
ghcr.io/openhands/agent-server:3d8c0fc-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:3d8c0fc-golang-arm64
ghcr.io/openhands/agent-server:3d8c0fc-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:3d8c0fc-java-amd64
ghcr.io/openhands/agent-server:3d8c0fc-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:3d8c0fc-java-arm64
ghcr.io/openhands/agent-server:3d8c0fc-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:3d8c0fc-python-amd64
ghcr.io/openhands/agent-server:3d8c0fc-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:3d8c0fc-python-arm64
ghcr.io/openhands/agent-server:3d8c0fc-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:3d8c0fc-golang
ghcr.io/openhands/agent-server:3d8c0fc-java
ghcr.io/openhands/agent-server:3d8c0fc-python

About Multi-Architecture Support

  • Each variant tag (e.g., 3d8c0fc-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 3d8c0fc-python-amd64) are also available if needed

Handle AssertionError and OSError from dotenv_values() gracefully when
find_dotenv() fails to walk the call stack. This fixes the issue where
local conversation would crash with AssertionError when dotenv tries
to locate a .env file in contexts with shallow or unusual call stacks.

The fix catches these exceptions and returns None, allowing the code
to continue execution without breaking. Environment variables are still
prioritized and checked first before attempting to load from dotenv.

Fixes #1325

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Copy link
Contributor

github-actions bot commented Dec 4, 2025

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-sdk/openhands/sdk/observability
   utils.py18950%12, 15–16, 20–22, 24–26
TOTAL12408563854% 

return env_value
try:
return dotenv_values().get(key)
except (AssertionError, OSError):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there an AssertionError in the first place, though?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea 🤷 it just happened randomly in the middle of my CLI session

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could load dotenv at the beginning of the execution path (not in observability utils) and then we shouldn't get any such error? @OpenHands WDYT? Look at the original issue #1325 and rethink / reinvestigate why the problem occurred in the first place.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on it! enyst can track my progress at all-hands.dev

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of findings and recommendation

  • Root cause: The AssertionError came from python-dotenv’s find_dotenv() when called indirectly via dotenv_values() without arguments. find_dotenv() walks Python stack frames (using frame.f_back). In our local conversation path, heavy decorator/wrapper usage can exhaust the frame chain, making the assertion frame.f_back is not None fail.

  • Answer to “Should we load dotenv at the beginning of the execution path?”: Yes. Loading dotenv once at process startup (outside observability utils) avoids the fragile stack-walk entirely. After that, get_env can simply read from os.environ, and we don’t rely on dotenv_values()/find_dotenv() mid-execution.

  • Practical approach:

    1. Keep the current defensive handling in get_env() (catch AssertionError/OSError) for robustness and backward compatibility.
    2. At application entry points (CLI, server), call load_dotenv(usecwd=True, override=False) early to populate os.environ deterministically without stack inspection.
    3. Long-term, consider simplifying get_env() to only read os.environ and leave .env loading to the host application. If we keep a dotenv fallback, prefer using usecwd=True or an explicit path rather than relying on call-stack introspection.

Bottom line: Loading dotenv at startup is the right architectural fix and would have prevented the AssertionError; the existing try/except in get_env is a good safety net.

View full conversation

Copy link
Collaborator

@enyst enyst Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xingyaoww How about this? So maybe it shouldn't be necessary to find_dotenv here at all?
OpenHands/OpenHands-CLI#162

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

find_dotenv breaks local conversation

4 participants