Add document ingest support for Ragflow #17483
    "file": (filename, file_content, content_type or "application/octet-stream")
}

verbose_logger.debug(f"Uploading document to RAGFlow: {url}")
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
This expression logs sensitive data (secret).
Copilot Autofix
AI 4 days ago
To fix the problem, we should avoid logging the full `url` variable if it may derive from secret or sensitive sources.
- General approach: Remove or redact logging of sensitive connection endpoints such as `api_base`. Instead, log only non-sensitive information (e.g., action description, or redacted endpoint info).
- Specific fix: In `litellm/rag/ingestion/ragflow_ingestion.py`, replace `verbose_logger.debug(f"Uploading document to RAGFlow: {url}")` with a log that excludes `url`, or, if needed, only log static or whitelisted info (e.g., dataset ID, filename).
- Ideally, only log safe, non-secret identifying metadata for debugging (like `dataset_id`, `filename`), or replace the log with a generic message (`"Uploading document to RAGFlow"`).

Only the log line at line 178 in the `_upload_document` method needs updating; ensure code functionality remains unchanged.
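The "log only safe metadata" option above could look roughly like the following. This is a minimal sketch, not the code in the PR: the logger name and the helper names `describe_upload` and `log_upload` are hypothetical, and it assumes `dataset_id` and `filename` are not themselves sensitive in your deployment.

```python
import logging

# Hypothetical logger for this sketch; the PR itself uses verbose_logger.
logger = logging.getLogger("ragflow_ingestion_sketch")


def describe_upload(dataset_id: str, filename: str) -> str:
    """Build a debug message from non-secret metadata only.

    The request URL (derived from api_base) is deliberately not a parameter,
    so it cannot end up in the message by accident.
    """
    return (
        "Uploading document to RAGFlow "
        f"(dataset_id={dataset_id!r}, filename={filename!r})"
    )


def log_upload(dataset_id: str, filename: str) -> None:
    """Emit the upload debug line without touching the URL."""
    logger.debug(describe_upload(dataset_id, filename))
```

Keeping the URL out of the helper's signature makes the safe path the only path, which is easier to review than remembering to redact at each call site.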
@@ -175,7 +175,7 @@
     "file": (filename, file_content, content_type or "application/octet-stream")
 }

-verbose_logger.debug(f"Uploading document to RAGFlow: {url}")
+verbose_logger.debug("Uploading document to RAGFlow")

 client = get_async_httpx_client(
     llm_provider=httpxSpecialProvider.RAG,
if not request_body:
    return  # Nothing to update

verbose_logger.debug(f"Updating document configuration: {url}")
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
This expression logs sensitive data (secret).
Copilot Autofix
AI 4 days ago
The fix is to avoid logging secret-derived values. Specifically, do not log the full constructed URL if it may contain secrets or sensitive values.
- In `RAGFlowRAGIngestion._update_document_config`, replace the debug log `verbose_logger.debug(f"Updating document configuration: {url}")` with a message that omits the sensitive/interpolated parts, or logs only safe, non-secret portions (e.g., just the document ID or a generic statement).
- You may log a generic string such as `"Updating document configuration"` or other non-sensitive context.
- No new methods or imports are required beyond possibly changing what is logged at the highlighted location.
- Ensure this change is applied only to the relevant logger invocation where the taint source flows into the sink.
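If some endpoint context is still wanted in the log, one way to keep "only safe, non-secret portions" is to strip credentials, query string, and fragment from the URL first. The sketch below uses only the standard library; `redact_url` is a hypothetical helper, not something in LiteLLM, and it assumes the host and path are acceptable to log. A static analyzer may still flag the call, since the value remains derived from `api_base`.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_url(url: str) -> str:
    """Return scheme://host[:port]/path, dropping userinfo, query and fragment."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literal: urlsplit strips the brackets, so restore them.
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, "", ""))
```

For example, `redact_url("https://user:s3cret@ragflow.example.com:9380/api/v1/datasets?token=xyz")` yields `https://ragflow.example.com:9380/api/v1/datasets`. The generic message in the autofix remains the simpler choice when the endpoint adds no debugging value.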
@@ -254,7 +254,7 @@
 if not request_body:
     return  # Nothing to update

-verbose_logger.debug(f"Updating document configuration: {url}")
+verbose_logger.debug("Updating document configuration.")

 client = get_async_httpx_client(
     llm_provider=httpxSpecialProvider.RAG,
    "document_ids": document_ids,
}

verbose_logger.debug(f"Triggering parsing for documents: {url}")
Check failure
Code scanning / CodeQL
Clear-text logging of sensitive information High
This expression logs sensitive data (secret).
Copilot Autofix
AI 4 days ago
To address this, we want to prevent accidental logging of potentially sensitive information, such as the full URL derived from secrets or private environment configuration. Instead of logging the full URL, we can log only non-sensitive, high-level information (e.g., that parsing is being triggered, optionally with the dataset or document IDs if not sensitive), or redact/sanitize values before logging.

Recommended fix:
- In `litellm/rag/ingestion/ragflow_ingestion.py`, within `_trigger_parsing`, update the log statement to omit or redact the URL, and only state that parsing was triggered, possibly with non-sensitive info (e.g., `dataset_id`, number of documents).
- Do not log the value of variables that may contain secrets, such as `api_base`, `url`, or other secrets.

Implementation:
- Replace `verbose_logger.debug(f"Triggering parsing for documents: {url}")` with a statement like `verbose_logger.debug(f"Triggering parsing for documents in dataset '{dataset_id}' (count: {len(document_ids)})")`.
- Ensure that no secrets are logged in the message.
- No new dependencies are required.
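As a defence-in-depth complement to fixing each call site, the "redact/sanitize values before logging" idea can also be applied at the handler level with a standard `logging.Filter`. This is only a sketch under assumptions: `SecretRedactingFilter` is a hypothetical class, not part of LiteLLM or this PR, and it only masks values it is told about up front (for example the configured `api_base` or API key).

```python
import logging
from typing import Iterable


class SecretRedactingFilter(logging.Filter):
    """Replace known secret values in log messages with a placeholder."""

    def __init__(self, secrets: Iterable[str]) -> None:
        super().__init__()
        # Ignore empty strings so str.replace does not mangle the message.
        self._secrets = [s for s in secrets if s]

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so secrets passed via args are covered too.
        message = record.getMessage()
        for secret in self._secrets:
            message = message.replace(secret, "[REDACTED]")
        record.msg = message
        record.args = ()
        return True  # Never drop the record, only rewrite it.
```

Attached with `handler.addFilter(SecretRedactingFilter([api_base]))`, this would mask the base URL even if a later change reintroduces an f-string containing it. It does not replace the call-site fix above, since it cannot catch secrets it was never given.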
@@ -313,7 +313,7 @@
     "document_ids": document_ids,
 }

-verbose_logger.debug(f"Triggering parsing for documents: {url}")
+verbose_logger.debug(f"Triggering parsing for documents in dataset '{dataset_id}' (count: {len(document_ids)})")

 client = get_async_httpx_client(
     llm_provider=httpxSpecialProvider.RAG,
@metalshanked is this what you wanted? Any help QA'ing this would be appreciated.

Thanks @Sameerlite and @krrishdholakia. This is awesome with Chat, Agent w/ Dynamic ids and Dataset management. Would it be possible to add the actual Search/Retrieval as well? I assume many users would need the basic Search/Retrieval capability that Ragflow exposes, like a basic vector store. I mean this endpoint: https://ragflow.io/docs/dev/http_api_reference#retrieve-chunks. Also, this should show up as a vector store in the litellm UI so that the usual vector db permissions can be applied. Thanks
Title
Add document ingest support for Ragflow
Relevant issues
Fixes #17112
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
tests/litellm/ directory. Adding at least 1 test is a hard requirement - see details
make test-unit
Type
🆕 New Feature
Changes