Skip to content

An AI-powered data science team of agents to help you perform common data science tasks 10X faster.

License

Notifications You must be signed in to change notification settings

potevera/ai-data-science-team

 
 

Repository files navigation

AI Data Science Team + AI Pipeline Studio
PyPI versions license GitHub Repo stars

AI Data Science Team

AI Data Science Team is a Python library of specialized agents for common data science workflows, plus a flagship app: AI Pipeline Studio. The Studio turns your work into a visual, reproducible pipeline, while the AI team handles data loading, cleaning, visualization, and modeling.

Status: Beta. Breaking changes may occur until 0.1.0.

Please ⭐ us on GitHub (it takes 2 seconds and means a lot).

AI Pipeline Studio (Flagship App)

AI Pipeline Studio is the main example of the AI Data Science Team in action.

AI Pipeline Studio

Highlights:

  • Pipeline-first workspace: Visual Editor, Table, Chart, EDA, Code, Model, Predictions, MLflow
  • Manual + AI steps with lineage and reproducible scripts
  • Multi-dataset handling and merge workflows
  • Project saves: metadata-only or full-data
  • Storage footprint controls and rehydrate workflows

Run it:

streamlit run apps/ai-pipeline-studio-app/app.py

Full app docs: apps/ai-pipeline-studio-app/README.md

Quickstart

Requirements

  • Python 3.10+
  • OpenAI API key (or Ollama for local models)

Install the app and library

Clone the repo and install in editable mode:

pip install -e .

Run the AI Pipeline Studio app

streamlit run apps/ai-pipeline-studio-app/app.py

Library Overview

The repository includes both the AI Pipeline Studio app and the underlying AI Data Science Team library. The library provides agent building blocks and multi-agent workflows for:

  • Data loading and inspection
  • Cleaning, wrangling, and feature engineering
  • Visualization and EDA
  • Modeling and evaluation (H2O + MLflow tools)
  • SQL database interaction

Agents (Snapshot)

Agent examples live in examples/. Notable agents:

  • Data Loader Tools Agent
  • Data Wrangling Agent
  • Data Cleaning Agent
  • Data Visualization Agent
  • EDA Tools Agent
  • Feature Engineering Agent
  • SQL Database Agent
  • H2O ML Agent
  • MLflow Tools Agent
  • Multi-agent workflows (e.g., Pandas Data Analyst, SQL Data Analyst)
  • Supervisor Agent (oversees other agents)
  • Custom tools for data science tasks

Apps

See all apps in apps/. Notable apps:

  • AI Pipeline Studio: apps/ai-pipeline-studio-app/
  • EDA Explorer App: apps/exploratory-copilot-app/
  • Pandas Data Analyst App: apps/pandas-data-analyst-app/

Use OpenAI

from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
    model_name="gpt-4.1-mini",
)

Use Ollama (Local LLM)

ollama serve
ollama pull llama3.1:8b
from langchain_ollama import ChatOllama

llm = ChatOllama(
    model="llama3.1:8b",
)

Next-Gen AI Agentic Workshop

Want to learn how to build AI agents and AI apps for real data science workflows? Join my next‑gen AI workshop: https://learn.business-science.io/ai-register

About

An AI-powered data science team of agents to help you perform common data science tasks 10X faster.

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%