High-accuracy PDF-to-Markdown OCR API using LLMs with vision capabilities. Features parallel processing, batching, and auto-retry logic for scalable extraction.
Updated Nov 29, 2025 (Python)
The ultimate sketch-to-code app, built with GPT-4o and serving 30k+ users. Choose your target framework (React, Next.js, React Native, Flutter) and it instantly generates code and a sandbox preview from a simple hand-drawn sketch on paper, captured by webcam.
Early Alpha Release: Chat with Your Image - Leveraging GPT-4 Vision and Function Calls for AI-Powered Image Analysis and Description
ParkingGPT is a cross-platform app that helps you decide whether or not to park, using multimodal, multilingual Vision AI and LLMs.
Vision-Assisted Camera Orientation
Generate LEGO-like images with gpt4-vision and DALL·E 3.
🤯 Lobe Chat - an open-source, modern-design AI chat framework. Supports multiple AI providers (OpenAI / Claude 3 / Gemini / Ollama / Qwen / DeepSeek), Knowledge Base (file upload / knowledge management / RAG), and Multi-Modals (Vision / TTS / Plugins / Artifacts). One-click FREE deployment of your private ChatGPT / Claude application.
DRIFT.AI V2 - AI-Powered Contract Reconciliation Platform for Healthcare | Next.js 15 + TypeScript + GPT-4 Vision
AI-powered tool to blend 3–5 images into a single composite using GPT-4 Vision and DALL·E 3. Modular, scalable, and built with Flask.