Overview
Builds automated data migration pipelines between databases. Handles schema analysis and mapping, type conversions, data transformations, dependency-ordered table loading, batch processing for large datasets, checkpoint/resume for reliability, post-migration validation, and cutover planning. Produces repeatable scripts that can be dry-run against staging before production.
Instructions
1. Schema Analysis
Start every migration by analyzing source and target:
For each table in the source, capture:
- Column names, types, nullability, defaults
- Primary keys and auto-increment sequences
- Foreign key relationships (build dependency graph)
- Indexes and unique constraints
- Row count estimate (for batch sizing)
- Encoding/collation (especially for MySQL → PostgreSQL)
Generate a schema map document listing every column with its source type, target type, and any transformation needed.
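Most of this metadata can be read straight from the source's catalog tables. A minimal sketch for a MySQL source, assuming the pymysql driver and an `app` schema name (both illustrative choices):

```python
import pymysql

# Illustrative connection details; substitute real credentials.
conn = pymysql.connect(host="localhost", user="migrator", password="...", database="app")

COLUMNS_SQL = """
    SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE, COLUMN_DEFAULT, COLUMN_KEY
    FROM information_schema.COLUMNS
    WHERE TABLE_SCHEMA = %s
    ORDER BY TABLE_NAME, ORDINAL_POSITION
"""
FKS_SQL = """
    SELECT TABLE_NAME, COLUMN_NAME, REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME
    FROM information_schema.KEY_COLUMN_USAGE
    WHERE TABLE_SCHEMA = %s AND REFERENCED_TABLE_NAME IS NOT NULL
"""

with conn.cursor() as cur:
    cur.execute(COLUMNS_SQL, ("app",))
    columns = cur.fetchall()        # feeds the schema map document
    cur.execute(FKS_SQL, ("app",))
    foreign_keys = cur.fetchall()   # feeds the dependency graph in step 3
# Row-count estimates for batch sizing live in information_schema.TABLES.TABLE_ROWS.
```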
2. Type Mapping
Common cross-database type conversions:
| MySQL | PostgreSQL | Notes |
|---|---|---|
| TINYINT(1) | BOOLEAN | Map 0/1 to false/true |
| ENUM('a','b') | VARCHAR + CHECK | Or create custom TYPE |
| DATETIME | TIMESTAMPTZ | DATETIME carries no zone; attach one explicitly (usually UTC) during conversion |
| INT AUTO_INCREMENT | SERIAL | Reset sequence after migration |
| DOUBLE | DOUBLE PRECISION | Direct mapping |
| BLOB | BYTEA | Binary data |
| TEXT (latin1) | TEXT (UTF-8) | Re-encode characters |
| JSON | JSONB | Use binary JSON in PG |
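In the migration script these mappings become per-column conversion functions applied during transfer. A minimal sketch, assuming DATETIME values were stored as naive UTC and text as latin1 (both assumptions to verify against the real source); the table and column names are hypothetical:

```python
from datetime import timezone

def tinyint_to_bool(value):
    return None if value is None else bool(value)

def datetime_to_timestamptz(value):
    # Assumes the source stored naive UTC; adjust if the app wrote local time.
    return None if value is None else value.replace(tzinfo=timezone.utc)

def latin1_text_to_utf8(value):
    # Most drivers decode already; this handles raw bytes if they appear.
    return value.decode("latin-1") if isinstance(value, bytes) else value

# One entry per column that needs conversion; unlisted columns copy as-is.
CONVERSIONS = {
    ("users", "is_active"): tinyint_to_bool,
    ("users", "created_at"): datetime_to_timestamptz,
    ("posts", "body"): latin1_text_to_utf8,
}

def transform(table, row_dict):
    return {col: CONVERSIONS.get((table, col), lambda v: v)(val)
            for col, val in row_dict.items()}
```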
3. Dependency Resolution
Build a directed dependency graph from the foreign keys (a topological-sort sketch follows the list):
1. Parse all FK constraints → build adjacency list
2. Topological sort → migration order
3. Circular dependencies: temporarily drop FK, migrate, re-add FK
4. Self-referencing tables: migrate in two passes (data, then self-FK updates)
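A minimal sketch of steps 1 and 2, using graphlib from the Python standard library (3.9+); the table names are hypothetical:

```python
from graphlib import TopologicalSorter, CycleError

# table -> set of tables it references via FK (these must be loaded first)
dependencies = {
    "orders": {"users", "products"},
    "order_items": {"orders", "products"},
    "users": set(),
    "products": set(),
}

try:
    migration_order = list(TopologicalSorter(dependencies).static_order())
except CycleError as exc:
    # The cycle members are reported in exc.args[1]; handle per step 3 above:
    # temporarily drop one FK in the cycle, migrate, then re-add it.
    raise SystemExit(f"Circular FK dependency: {exc.args[1]}")

print(migration_order)  # e.g. ['users', 'products', 'orders', 'order_items']
```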
4. Batch Processing
For tables with more than 10,000 rows, copy in keyed batches. The sketch below assumes DB-API style connections (`src`, `dst`) and helpers `load_checkpoint`, `save_checkpoint`, `insert_batch`, and `transform`:
```python
def migrate_large_table(src, dst, table, transform, batch_size=5000):
    last_id = load_checkpoint(table) or 0   # resume from the last checkpoint, else start at 0
    total = 0
    while True:
        with src.cursor() as cur:
            cur.execute(f"SELECT * FROM {table} WHERE id > %s ORDER BY id LIMIT %s",
                        (last_id, batch_size))
            rows = cur.fetchall()
        if not rows:
            break
        transformed = [transform(row) for row in rows]
        insert_batch(dst, table, transformed)   # bulk INSERT into the target table
        dst.commit()
        last_id = rows[-1][0]                   # assumes column 0 is the keyed id
        total += len(rows)
        save_checkpoint(table, last_id, total)
```
Performance targets:
- 5,000 rows/batch for most tables
- 1,000 rows/batch for tables with BLOB/TEXT columns
- Disable target indexes during bulk load, rebuild after
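One way to implement the last bullet on a PostgreSQL target is to capture each table's plain index definitions before loading and replay them afterwards. A sketch, assuming `conn` is a psycopg2-style connection to the target, everything lives in the public schema, and `bulk_load` is the batch loop from the code above; indexes backing PK/UNIQUE constraints are left in place:

```python
INDEX_SQL = """
    SELECT indexdef FROM pg_indexes
    WHERE schemaname = 'public' AND tablename = %s
      AND indexname NOT IN (
          SELECT conindid::regclass::text FROM pg_constraint WHERE conindid <> 0)
"""

def bulk_load_without_indexes(conn, table, bulk_load):
    with conn.cursor() as cur:
        cur.execute(INDEX_SQL, (table,))
        index_defs = [row[0] for row in cur.fetchall()]   # full CREATE INDEX statements
        for definition in index_defs:
            # indexdef looks like "CREATE INDEX name ON ..."; pull out the name.
            cur.execute(f'DROP INDEX {definition.split(" ON ")[0].rsplit(" ", 1)[-1]}')
    conn.commit()
    bulk_load(table)                      # run the batch loop from step 4
    with conn.cursor() as cur:
        for definition in index_defs:
            cur.execute(definition)       # rebuild each index in one pass after loading
    conn.commit()
```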
5. Validation
Post-migration validation checklist (a sketch of the row-count and aggregate checks follows the list):
1. Row counts: source vs target for every table
2. Random sampling: 100 random rows per table, field-by-field comparison
3. Aggregate checks: SUM, COUNT, MIN, MAX on numeric columns
4. Referential integrity: all FKs resolve (no orphans)
5. Encoding: sample text fields for valid UTF-8
6. Sequences: verify auto-increment/serial values set above max ID
7. Nullability: no unexpected NULLs in NOT NULL target columns
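A minimal sketch of checks 1 and 3, assuming DB-API connections to source and target; `tables` and `numeric_columns` are inputs you would build from the schema map:

```python
def count_rows(conn, table):
    with conn.cursor() as cur:
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        return cur.fetchone()[0]

def numeric_profile(conn, table, column):
    # COUNT/SUM/MIN/MAX should match across source and target after migration.
    with conn.cursor() as cur:
        cur.execute(f"SELECT COUNT({column}), SUM({column}), MIN({column}), MAX({column}) FROM {table}")
        return cur.fetchone()

def validate(src, dst, tables, numeric_columns):
    failures = []
    for table in tables:
        if count_rows(src, table) != count_rows(dst, table):
            failures.append(f"{table}: row count mismatch")
        for column in numeric_columns.get(table, []):
            if numeric_profile(src, table, column) != numeric_profile(dst, table, column):
                failures.append(f"{table}.{column}: aggregate mismatch")
    return failures
```

Exact equality is fine for integer columns; for DOUBLE columns compare with a small tolerance, since the two engines may sum floats in different orders.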
6. Cutover Planning
Three strategies by downtime tolerance:
Full downtime (simplest): Stop app → migrate → validate → start app. For small datasets (< 1M rows) where under an hour of downtime is acceptable.
Minimal downtime (recommended): Pre-migrate bulk data → set up change capture → maintenance mode → apply delta → switch → validate. Downtime: 2-10 minutes.
Zero downtime (complex): Dual-write to both databases → background migration → gradual read traffic shift → drop old writes. Requires application changes.
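For the minimal-downtime path, the simplest change capture is a trustworthy `updated_at` column on every table; the sketch below replays rows modified after the bulk copy began. The column name and the `upsert` helper are assumptions, hard deletes are not captured this way, and trigger-based or binlog-based CDC tools are the more robust alternative:

```python
def apply_delta(src, dst, table, transform, bulk_copy_started_at):
    # Run during the maintenance window, after the bulk copy and before the switch.
    with src.cursor() as cur:
        cur.execute(f"SELECT * FROM {table} WHERE updated_at >= %s",
                    (bulk_copy_started_at,))
        changed = cur.fetchall()
    for row in changed:
        # upsert = INSERT ... ON CONFLICT (id) DO UPDATE on the PostgreSQL target
        upsert(dst, table, transform(row))
    dst.commit()
    return len(changed)
```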
Examples
Example 1: MySQL to PostgreSQL
Prompt: "Migrate our MySQL 5.7 database to PostgreSQL 16. 30 tables, biggest is 5M rows."
Output: Schema mapping JSON, type conversion DDL, migration script with dependency ordering, batch processing for large tables, checkpoint file, validation suite, and cutover runbook.
Example 2: Database Consolidation
Prompt: "Merge two SQLite databases into one PostgreSQL. Some tables overlap with different schemas."
Output: Schema diff report, merge strategy document (which columns win conflicts), deduplication logic using configurable match keys, migration script, and conflict resolution log.
Guidelines
- Always dry-run on staging first — never run migration directly against production
- Keep source untouched — migration should be read-only on source until cutover
- Checkpoint everything — large migrations will fail; resumability is required
- Validate before cutover — automated validation catches what manual spot-checks miss
- Plan rollback — if target validation fails, have a documented path back to source
- Log extensively — rows processed, rows skipped, transformation errors, timing
- Reset sequences — after migration, set serial/auto-increment values above the max migrated ID (see the sketch after these guidelines)
- Test with production volume — a script that works on 1000 rows may OOM on 5M
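A sketch of the sequence reset on a PostgreSQL target, using the built-in setval and pg_get_serial_sequence functions; the DSN and table names are illustrative:

```python
import psycopg2

target_conn = psycopg2.connect("dbname=app_target")   # illustrative DSN

def reset_sequence(conn, table, id_column="id"):
    # Point the serial sequence at the highest migrated id so the next insert gets max+1.
    with conn.cursor() as cur:
        cur.execute(
            "SELECT setval(pg_get_serial_sequence(%s, %s), "
            f"(SELECT COALESCE(MAX({id_column}), 1) FROM {table}))",
            (table, id_column),
        )
    conn.commit()

for table in ("users", "products", "orders"):   # illustrative table list
    reset_sequence(target_conn, table)
```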