ibis
Expert guidance for Ibis, the Python dataframe library that provides a pandas-like API but generates SQL for execution on any backend — DuckDB, PostgreSQL, BigQuery, Snowflake, Spark, and more. Helps developers write analytics code once and run it anywhere without rewriting SQL for each database.
Usage
Getting Started
- Install the skill using the command above
- Open your AI coding agent (Claude Code, Codex, Gemini CLI, or Cursor)
- Reference the skill in your prompt
- The AI will use the skill's capabilities automatically
Example Prompts
- "Analyze the sales data in revenue.csv and identify trends"
- "Create a visualization comparing Q1 vs Q2 performance metrics"
Documentation
Overview
Ibis, the Python dataframe library that provides a pandas-like API but generates SQL for execution on any backend — DuckDB, PostgreSQL, BigQuery, Snowflake, Spark, and more. Helps developers write analytics code once and run it anywhere without rewriting SQL for each database.
Instructions
Basic Usage
# src/analytics.py — Portable analytics with Ibis
import ibis
from ibis import _ # Shorthand for column references
# Connect to a backend (DuckDB for local development)
con = ibis.duckdb.connect("analytics.duckdb")
# Or connect to production databases — same code, different backend
# con = ibis.postgres.connect(url="postgresql://...")
# con = ibis.bigquery.connect(project_id="my-project")
# con = ibis.snowflake.connect(...)
# Load data
orders = con.table("orders")
# Build a query — this is lazy (no execution until you call .execute())
monthly_revenue = (
orders
.filter(_.status == "completed")
.filter(_.created_at >= "2026-01-01")
.group_by(month=_.created_at.truncate("M"))
.agg(
revenue=_.amount.sum(),
order_count=_.count(),
unique_customers=_.customer_id.nunique(),
avg_order_value=_.amount.mean(),
)
.order_by(_.month)
)
# Execute and get a pandas DataFrame
df = monthly_revenue.execute()
print(df)
# Or see the generated SQL
print(ibis.to_sql(monthly_revenue))
Complex Transformations
# Window functions, joins, and case expressions
import ibis
from ibis import _
con = ibis.duckdb.connect("analytics.duckdb")
orders = con.table("orders")
customers = con.table("customers")
# Window functions — running totals and rankings
ranked = (
orders
.filter(_.status == "completed")
.group_by(_.customer_id)
.agg(
total_spent=_.amount.sum(),
order_count=_.count(),
first_order=_.created_at.min(),
last_order=_.created_at.max(),
)
.mutate(
# Rank customers by revenue
revenue_rank=ibis.rank().over(
order_by=ibis.desc(_.total_spent)
),
# Percentile
revenue_percentile=ibis.percent_rank().over(
order_by=_.total_spent
),
# Customer segment based on spending
segment=ibis.case()
.when(_.total_spent >= 1000, "whale")
.when(_.total_spent >= 100, "regular")
.else_("casual")
.end(),
)
)
# Joins
customer_analytics = (
ranked
.join(customers, _.customer_id == customers.id)
.select(
_.customer_id,
customers.name,
customers.email,
customers.plan,
_.total_spent,
_.order_count,
_.segment,
_.revenue_rank,
# Days since last order
days_inactive=(ibis.now() - _.last_order).cast("int32") // 86400,
)
)
# Cohort analysis
cohorts = (
orders
.filter(_.status == "completed")
.group_by(_.customer_id)
.mutate(
cohort_month=_.created_at.min().truncate("M"),
)
.mutate(
months_since=((_.created_at.truncate("M") - _.cohort_month)
.cast("int32") // (30 * 86400)),
)
.group_by(_.cohort_month, _.months_since)
.agg(
active_users=_.customer_id.nunique(),
revenue=_.amount.sum(),
)
)
Backend Portability
# The same analytics code runs on any backend
import ibis
def build_revenue_report(con: ibis.BaseBackend):
"""Build a revenue report — works on any Ibis backend.
Args:
con: Any Ibis connection (DuckDB, Postgres, BigQuery, etc.)
"""
orders = con.table("orders")
return (
orders
.filter(_.status == "completed")
.group_by(
month=_.created_at.truncate("M"),
category=_.category,
)
.agg(
revenue=_.amount.sum(),
orders=_.count(),
)
.order_by(_.month.desc())
)
# Development: DuckDB on local Parquet files
dev_con = ibis.duckdb.connect()
dev_con.read_parquet("data/orders.parquet", table_name="orders")
report = build_revenue_report(dev_con).execute()
# Production: BigQuery
prod_con = ibis.bigquery.connect(project_id="prod-project", dataset_id="analytics")
report = build_revenue_report(prod_con).execute()
# Testing: in-memory with DuckDB
test_con = ibis.duckdb.connect()
test_con.create_table("orders", test_data_df)
report = build_revenue_report(test_con).execute()
UDFs and Custom Functions
# Custom scalar and aggregate functions
import ibis
from ibis import udf
@udf.scalar.python
def normalize_email(email: str) -> str:
"""Normalize email addresses for deduplication."""
local, domain = email.lower().split("@")
# Remove dots and plus aliases from Gmail
if domain in ("gmail.com", "googlemail.com"):
local = local.split("+")[0].replace(".", "")
return f"{local}@{domain}"
# Use in queries
customers = con.table("customers")
deduped = (
customers
.mutate(clean_email=normalize_email(_.email))
.group_by(_.clean_email)
.agg(
count=_.count(),
first_seen=_.created_at.min(),
)
.filter(_.count > 1)
)
Installation
# Core library
pip install ibis-framework
# With specific backends
pip install "ibis-framework[duckdb]"
pip install "ibis-framework[postgres]"
pip install "ibis-framework[bigquery]"
pip install "ibis-framework[snowflake]"
pip install "ibis-framework[pyspark]"
# Interactive mode (for notebooks)
ibis.options.interactive = True # Auto-execute and display results
Examples
Example 1: Migrating a pandas pipeline to run on BigQuery
User request:
I have a pandas script that calculates cohort retention from our events table. Rewrite it using Ibis so it runs on BigQuery instead of loading everything into memory.
The agent rewrites the pandas code using Ibis expressions (ibis.bigquery.connect(), t.group_by(), _.mutate(), window functions with ibis.cumulative_window()), keeping the same logic but generating SQL that executes on BigQuery. The script goes from loading 50M rows into memory to pushing all computation to the warehouse.
Example 2: Building a portable analytics module with DuckDB for dev
User request:
Write an analytics module that computes daily active users and revenue per plan from Parquet files locally, but can switch to Snowflake in production.
The agent creates a module using ibis.duckdb.connect() for local development with Parquet files, writes composable Ibis expressions for DAU (t.select('user_id', 'event_date').distinct().group_by('event_date').count()) and revenue by plan, and adds a get_connection() function that switches to ibis.snowflake.connect() based on an environment variable — same analytics code, different backend.
Guidelines
- Write once, run anywhere — Define analytics logic with Ibis; swap backends by changing the connection, not the code
- Lazy by default — Ibis expressions are lazy; they only execute when you call
.execute()or.to_pandas() - DuckDB for development — Use DuckDB locally with Parquet files; switch to BigQuery/Snowflake for production
- Use
_for readability —from ibis import _gives you clean column references:_.amount.sum()vsorders.amount.sum() - Generate SQL for debugging — Use
ibis.to_sql(expr)to see the SQL being generated; helps debug unexpected results - Functions for reuse — Wrap analytics logic in functions that take a connection; test with DuckDB, deploy on any backend
- Interactive mode in notebooks — Set
ibis.options.interactive = Truefor immediate result display during exploration - Type your schemas — Use
ibis.schema()to define expected table schemas; catch type mismatches early
Information
- Version
- 1.0.0
- Author
- terminal-skills
- Category
- Data & AI
- License
- Apache-2.0