OpenDCAI

Define the future of Data-centric AI together

We are dedicated to advancing research and open-source tools in Data-Centric Artificial Intelligence (DCAI).

Our goal is to develop effective and efficient DCAI systems and algorithms that support and enhance the performance of AI models and applications.

Newly Released Works

🔥 2025/6/29 Our DCAI system DataFlow has been released!

Pinned

  1. DataFlow Public

    Easy data preparation with the latest LLM-based operators and pipelines.

    Python, 1.5k stars, 108 forks

  2. MyScaleDB Public

    Forked from OriginHubAI/MyScaleDB

    AI Database for unified, scalable SQL + vector data management, search and analytics

    C++, 38 stars, 1 fork

  3. DataFlex Public

    DataFlex is a data-centric training framework that enhances model performance by either selecting the most influential samples, optimizing their weights, or adjusting their mixing ratios.

    Python, 33 stars, 9 forks

Repositories

Showing 10 of 22 repositories
  • DataFlow-Agent
    Python 106 Apache-2.0 13 0 1 Updated Dec 1, 2025
  • DataFlow-WebUI
    Python 7 7 0 1 Updated Dec 1, 2025
  • DataFlow-Doc Public

    Documentation for DataFlow, a data-centric AI system for LLMs.

    Python 9 25 4 1 Updated Nov 28, 2025
  • DataFlow Public

    Easy data preparation with the latest LLM-based operators and pipelines.

    Python 1,521 Apache-2.0 108 9 1 Updated Nov 28, 2025
  • DataFlex-Doc Public

    DataFlex is a data-centric training framework that enhances model performance by either selecting the most influential samples, optimizing their weights, or adjusting their mixing ratios.

    Python 2 7 0 0 Updated Nov 27, 2025
  • DataFlex Public

    DataFlex is a data-centric training framework that enhances model performance by either selecting the most influential samples, optimizing their weights, or adjusting their mixing ratios.

    Python 33 9 0 1 Updated Nov 26, 2025
  • Text2VectorSQL Public

    Official implementation of Text2VectorSQL: Towards a Unified Interface for Vector Search and SQL Queries

    Python 48 8 2 0 Updated Nov 25, 2025
  • DataFlow-MM Public

    DataFlow-MM provides multimedia operators for DataFlow, aimed at preparing data for multimodal large language models.

    Python 15 Apache-2.0 13 1 2 Updated Nov 24, 2025
  • DataFlow-MM-Doc Public

    Documentation for DataFlow-MM

    Python 2 5 0 2 Updated Nov 24, 2025
  • SciAgent Public

    SciAgent: A Unified Multi-Agent System for Generalistic Scientific Reasoning

    89 Apache-2.0 11 1 0 Updated Nov 16, 2025