@Sameerlite
Collaborator

Sameerlite commented Dec 4, 2025

Title

Add document ingest support for Ragflow

Relevant issues

Fixes #17112

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory. Adding at least 1 test is a hard requirement - see details
  • I have added a screenshot of my new test passing locally
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

🆕 New Feature

Changes

  • Added support for providing a dynamic chat ID / agent ID
  • Added support for ingesting documents into Ragflow via the rag/ingest API

@vercel

vercel bot commented Dec 4, 2025

The latest updates on your projects.

Project: litellm | Deployment: Ready | Updated (UTC): Dec 4, 2025 3:06pm

"file": (filename, file_content, content_type or "application/octet-stream")
}

verbose_logger.debug(f"Uploading document to RAGFlow: {url}")

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs sensitive data (secret) as clear text.

Copilot Autofix

AI 4 days ago

To fix the problem, we should avoid logging the full url variable if it may derive from secret or sensitive sources.

  • General approach: Remove or redact logging of sensitive connection endpoints such as api_base. Instead, log only non-sensitive information (e.g., action description, or redacted endpoint info).
  • Specific fix: In litellm/rag/ingestion/ragflow_ingestion.py, replace
    verbose_logger.debug(f"Uploading document to RAGFlow: {url}")
    with a log that excludes url, or, if needed, only log static or whitelisted info (e.g., dataset ID, filename).
  • Ideally, only log safe, non-secret identifying metadata for debugging (like dataset_id, filename), or replace the log with a generic message ("Uploading document to RAGFlow").

Only the log statement on line 178 in the _upload_document method needs updating; the code's behavior should otherwise remain unchanged.
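
If some endpoint context is still wanted in the debug log, one possible middle ground is to strip the parts of a URL that most often carry credentials before logging it. The sketch below is only an illustration of that idea: `redact_url` and `log_upload` are hypothetical helpers that do not exist in LiteLLM, and a standard-library logger stands in for verbose_logger. It assumes the host and path themselves are not secret; if they are, or if CodeQL still flags the value as tainted, the generic message from the patch is the safer choice.

```python
import logging
from urllib.parse import urlsplit, urlunsplit

# Stand-in for LiteLLM's verbose_logger (assumption for this sketch).
logger = logging.getLogger("ragflow_ingestion_example")


def redact_url(url: str) -> str:
    """Hypothetical helper: keep scheme, host, port and path; drop
    userinfo (user:password@), the query string and the fragment."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literal: restore the brackets that .hostname strips.
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, "", ""))


def log_upload(url: str, dataset_id: str, filename: str) -> None:
    """Log non-secret metadata plus a redacted endpoint."""
    logger.debug(
        "Uploading document to RAGFlow: dataset_id=%s filename=%s endpoint=%s",
        dataset_id,
        filename,
        redact_url(url),
    )
```

For example, `redact_url("https://user:secret@ragflow.example.com:9380/api/v1/x?api_key=abc")` returns `"https://ragflow.example.com:9380/api/v1/x"`.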


Suggested changeset 1
litellm/rag/ingestion/ragflow_ingestion.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/litellm/rag/ingestion/ragflow_ingestion.py b/litellm/rag/ingestion/ragflow_ingestion.py
--- a/litellm/rag/ingestion/ragflow_ingestion.py
+++ b/litellm/rag/ingestion/ragflow_ingestion.py
@@ -175,7 +175,7 @@
             "file": (filename, file_content, content_type or "application/octet-stream")
         }
 
-        verbose_logger.debug(f"Uploading document to RAGFlow: {url}")
+        verbose_logger.debug("Uploading document to RAGFlow")
 
         client = get_async_httpx_client(
             llm_provider=httpxSpecialProvider.RAG,
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
if not request_body:
return # Nothing to update

verbose_logger.debug(f"Updating document configuration: {url}")

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs sensitive data (secret) as clear text.

Copilot Autofix

AI 4 days ago

The fix is to avoid logging secret-derived values. Specifically, do not log the full constructed URL if it may contain secrets or sensitive values.

  • In RAGFlowRAGIngestion._update_document_config, replace the debug log verbose_logger.debug(f"Updating document configuration: {url}") with a message that omits the sensitive/interpolated parts, or logs only safe, non-secret portions (e.g., just the document ID or a generic statement).
  • You may log a generic string such as "Updating document configuration" or other non-sensitive context.
  • No new methods or imports are required beyond possibly changing what is logged at the highlighted location.
  • Ensure this change is applied only to the relevant logger invocation where the taint source flows into the sink.
Suggested changeset 1
litellm/rag/ingestion/ragflow_ingestion.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/litellm/rag/ingestion/ragflow_ingestion.py b/litellm/rag/ingestion/ragflow_ingestion.py
--- a/litellm/rag/ingestion/ragflow_ingestion.py
+++ b/litellm/rag/ingestion/ragflow_ingestion.py
@@ -254,7 +254,7 @@
         if not request_body:
             return  # Nothing to update
 
-        verbose_logger.debug(f"Updating document configuration: {url}")
+        verbose_logger.debug("Updating document configuration.")
 
         client = get_async_httpx_client(
             llm_provider=httpxSpecialProvider.RAG,
EOF
"document_ids": document_ids,
}

verbose_logger.debug(f"Triggering parsing for documents: {url}")

Check failure

Code scanning / CodeQL

Clear-text logging of sensitive information High

This expression logs sensitive data (secret) as clear text.

Copilot Autofix

AI 4 days ago

To address this, we want to prevent accidental logging of potentially sensitive information such as the full URL derived from secrets or private environment configuration. Instead of logging the full URL, we can log only non-sensitive, high-level information (e.g., that parsing is being triggered, optionally with the dataset or document IDs if not sensitive), or redact/sanitize values before logging.

Recommended fix:

  • In litellm/rag/ingestion/ragflow_ingestion.py, within _trigger_parsing, update the log statement to omit or redact the URL, and only state that parsing was triggered, with possibly non-sensitive info (e.g., dataset_id, number of documents).
  • Do not log the value of variables that may contain secrets, such as api_base, url, or other secrets.

Implementation:

  • Replace verbose_logger.debug(f"Triggering parsing for documents: {url}") with a statement like verbose_logger.debug(f"Triggering parsing for documents in dataset '{dataset_id}' (count: {len(document_ids)})").
  • Ensure that no secrets are logged in the message.
  • No new dependencies are required.

Suggested changeset 1
litellm/rag/ingestion/ragflow_ingestion.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/litellm/rag/ingestion/ragflow_ingestion.py b/litellm/rag/ingestion/ragflow_ingestion.py
--- a/litellm/rag/ingestion/ragflow_ingestion.py
+++ b/litellm/rag/ingestion/ragflow_ingestion.py
@@ -313,7 +313,7 @@
             "document_ids": document_ids,
         }
 
-        verbose_logger.debug(f"Triggering parsing for documents: {url}")
+        verbose_logger.debug(f"Triggering parsing for documents in dataset '{dataset_id}' (count: {len(document_ids)})")
 
         client = get_async_httpx_client(
             llm_provider=httpxSpecialProvider.RAG,
EOF
@krrishdholakia
Contributor

@metalshanked is this what you wanted?

Any help QA'ing this would be appreciated.

@metalshanked

Thanks @Sameerlite and @krrishdholakia. This is awesome, with Chat and Agent with dynamic IDs and dataset management.

Would it be possible to add the actual Search/Retrieval as well? I assume many users would need the basic Search/Retrieval capability that Ragflow exposes, like a basic vector store.
Example: I already have a separate non-Ragflow Chat or Agent app and want to query the Ragflow vector store (Ragflow uses Infinity or Elasticsearch behind the scenes) to retrieve the relevant chunks.

I mean this endpoint: https://ragflow.io/docs/dev/http_api_reference#retrieve-chunks

Also, this should show up as a vector store in the litellm UI so that usual vector db permissions can be applied.

Thanks

Development

Successfully merging this pull request may close these issues.

[Feature]: Support Ragflow as a vector store / chat completion

4 participants