The Problem
Sami runs a content agency that produces 50+ short videos per week for clients across Instagram, TikTok, YouTube Shorts, and Twitter. Each video follows a template — background clip, text overlays, branded intro/outro, music, and platform-specific sizing. The team manually edits each one in Premiere Pro, which takes 30-45 minutes per video. That's 25-37 hours of editing per week for content that's structurally identical.
The real pain is platform variations. A single "Python Tip" video needs five exports: 9:16 for Reels, TikTok, and YouTube Shorts; 16:9 for Twitter; and 1:1 for Instagram posts. Each platform has different safe zones where UI elements cover the content — TikTok's bottom bar is 400 pixels tall, and text placed there is unreadable. Multiply 50 content pieces by 5 platforms and you're looking at 250 renders per week.
Sami wants a Python pipeline that takes a JSON template and a CSV of content, then outputs ready-to-post videos in under a minute each.
The Solution
Using the ffmpeg skill with MoviePy, build a template-driven video renderer. JSON defines the layout (text layers, timing, animations), a brand kit provides client-specific assets (logo, intro, outro, fonts), and a CSV feeds content variations. The pipeline renders all combinations across all platform sizes using multiprocessing, then validates each output against platform upload limits.
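One way to lay out the project (the Python modules match the steps below; the JSON and CSV file names are illustrative):
project/
├── templates.py          # Step 1: platform sizes + safe zones
├── effects.py            # Step 2: text animations
├── renderer.py           # Step 3: core compositor
├── qc.py                 # Step 4: platform limit checks
├── pipeline.py           # Step 5: CSV batch driver
├── templates/
│   └── quote_card.json   # layout template (name illustrative)
├── brands/
│   └── acme.json         # brand kit (name illustrative)
└── batches/
    └── facts.csv         # content rows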
Step-by-Step Walkthrough
Step 1: Platform Specs and Template System
Every platform has specific resolution requirements and safe zones — areas covered by UI elements where text shouldn't appear. These go into a central config:
# templates.py
PLATFORM_SIZES = {
    "instagram_reel": (1080, 1920),  # 9:16 vertical
    "tiktok": (1080, 1920),          # 9:16 vertical
    "youtube_short": (1080, 1920),   # 9:16 vertical
    "twitter": (1280, 720),          # 16:9 landscape
    "instagram_post": (1080, 1080),  # 1:1 square
}

# Safe zones: (top, bottom, left, right) in pixels
# Text placed in these areas gets covered by platform UI
SAFE_ZONES = {
    "instagram_reel": (200, 350, 40, 40),
    "tiktok": (150, 400, 40, 40),    # Tallest bottom bar of any platform
    "youtube_short": (150, 300, 40, 40),
    "twitter": (0, 0, 0, 0),         # Landscape — no overlays
    "instagram_post": (0, 100, 0, 0),
}
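Before wiring up the renderer, it's worth sanity-checking the safe-zone math that Step 3 will use: a relative Y (0.0-1.0) maps onto the band between the top and bottom safe zones, not the full frame. A throwaway sketch (the helper name is illustrative, not part of the pipeline):
from templates import PLATFORM_SIZES, SAFE_ZONES

def usable_y(platform, rel_y):
    # Map rel_y in [0, 1] onto the vertical band between the top and
    # bottom safe zones, so text never lands under platform UI
    width, height = PLATFORM_SIZES[platform]
    top, bottom, _, _ = SAFE_ZONES[platform]
    return int(top + (height - top - bottom) * rel_y)

print(usable_y("tiktok", 0.5))   # 835 -- midpoint of the usable band, not the frame
print(usable_y("twitter", 0.5))  # 360 -- no safe zones, so just the frame midpoint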
Video templates are JSON files that define everything about a video's layout. Each text layer maps a field name to a CSV column, so the same template renders different content per row:
{
  "name": "Quote Card",
  "duration": 15,
  "background": { "type": "color", "color": [15, 23, 42] },
  "texts": [
    {
      "field": "headline", "default": "Did You Know?",
      "size": 64, "color": "white", "font": "Arial-Bold",
      "position": ["center", 0.3], "start": 0.5, "duration": 4,
      "effect": "scale_in"
    },
    {
      "field": "body", "size": 42, "color": "#94a3b8",
      "position": ["center", 0.5], "start": 1.5, "duration": 10,
      "effect": "fade_in", "wrap_width": 0.85
    },
    {
      "field": "cta", "default": "Follow for more",
      "size": 36, "color": "#38bdf8", "font": "Arial-Bold",
      "position": ["center", 0.75], "start": 8, "duration": 7,
      "effect": "slide_up"
    }
  ],
  "music": { "volume": 0.2, "duck_on_voiceover": 0.08, "fade_in": 1, "fade_out": 2 },
  "watermark": { "position": "top_right", "opacity": 0.5, "height": 40, "margin": 20 }
}
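The binding rule: each layer's text comes from the CSV column named by "field", falling back to "default" when the column is absent, and empty values skip the layer entirely. A quick sketch of that lookup, which Step 3 implements verbatim (the template path is illustrative):
import json

with open("templates/quote_card.json") as f:  # illustrative path
    template = json.load(f)

# This row has no "cta" column, so that layer falls back to its default
row = {"headline": "Python Tip #1", "body": "Use := to assign and test at once."}
for layer in template["texts"]:
    print(layer["field"], "->", row.get(layer["field"], layer.get("default", "")))
# headline -> Python Tip #1
# body -> Use := to assign and test at once.
# cta -> Follow for more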
Step 2: Text Animation Effects
Three reusable text effects, each taking a MoviePy TextClip and returning a modified clip. Templates reference them by name:
# effects.py
from moviepy import vfx

def fade_in_text(text_clip, duration=0.5):
    # crossfadein() was removed in MoviePy 2.x; CrossFadeIn is the v2 equivalent
    return text_clip.with_effects([vfx.CrossFadeIn(duration)])

def slide_up_text(text_clip, distance=100, duration=0.5):
    # clip.pos is a function of time; evaluate at t=0 to recover the
    # resting position set by with_position() in the renderer
    final_pos = text_clip.pos(0)
    final_y = final_pos[1] if isinstance(final_pos, tuple) else 0
    def position_func(t):
        progress = min(t / duration, 1.0)
        progress = 1 - (1 - progress) ** 3  # Ease-out cubic
        x = final_pos[0] if isinstance(final_pos, tuple) else "center"
        return (x, final_y + distance * (1 - progress))
    return text_clip.with_position(position_func)

def scale_in_text(text_clip, duration=0.5):
    def resize_func(t):
        if t < duration:
            progress = 1 - (1 - t / duration) ** 3  # Ease-out cubic
            return max(progress, 0.01)  # Never return a zero-size frame
        return 1.0
    return text_clip.resized(resize_func)

EFFECTS = {"fade_in": fade_in_text, "slide_up": slide_up_text, "scale_in": scale_in_text}
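Applying one is a dictionary lookup. A quick usage sketch (the text and position values are illustrative):
from moviepy import TextClip
from effects import EFFECTS

txt = TextClip(text="Did You Know?", font_size=64, color="white").with_duration(4)
txt = txt.with_position(("center", 500))  # Set the final position first;
txt = EFFECTS["slide_up"](txt)            # slide_up reads it and animates into place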
Step 3: The Core Rendering Engine
The renderer takes a template, brand kit, CSV data row, and target platform, then composites everything into a finished video. The key trick is converting relative positions (0.0-1.0) to absolute pixels while respecting safe zones:
# renderer.py
from moviepy import *
from effects import EFFECTS
from templates import PLATFORM_SIZES, SAFE_ZONES

def render_video(template, brand, data, platform, output_path, voiceover_path=None):
    size = PLATFORM_SIZES[platform]
    safe = SAFE_ZONES[platform]
    duration = template["duration"]

    # Background: video clip or solid color
    bg_cfg = template["background"]
    if bg_cfg["type"] == "video":
        bg = VideoFileClip(bg_cfg["path"]).subclipped(0, duration).resized(size)
    else:
        bg = ColorClip(size=size, color=tuple(bg_cfg["color"]), duration=duration)
    layers = [bg]

    # Text layers — each mapped to a CSV column via the "field" key
    for text_cfg in template.get("texts", []):
        text = data.get(text_cfg["field"], text_cfg.get("default", ""))
        if not text:
            continue
        # Convert relative Y positions to absolute pixels within safe zones
        pos = text_cfg["position"]
        if isinstance(pos[1], float) and pos[1] <= 1.0:
            usable_top, usable_bottom = safe[0], size[1] - safe[1]
            y = usable_top + (usable_bottom - usable_top) * pos[1]
            pos = ("center", int(y))
        txt = TextClip(text=text, font_size=text_cfg.get("size", 48),
                       color=text_cfg.get("color", "white"),
                       method="caption", size=(int(size[0] * text_cfg.get("wrap_width", 0.9)), None))
        txt = (txt.with_duration(text_cfg.get("duration", duration))
                  .with_start(text_cfg.get("start", 0))
                  .with_position(pos))
        effect_name = text_cfg.get("effect")
        if effect_name and effect_name in EFFECTS:
            txt = EFFECTS[effect_name](txt)
        layers.append(txt)

    # Composite the layers (watermark and music/voiceover mixing, also driven
    # by the template and brand kit, are omitted here for brevity)
    video = CompositeVideoClip(layers, size=size)

    # Prepend branded intro, append outro, export H.264 MP4
    clips = []
    if brand.get("intro"):
        clips.append(VideoFileClip(brand["intro"]).resized(size))
    clips.append(video)
    if brand.get("outro"):
        clips.append(VideoFileClip(brand["outro"]).resized(size))
    final = concatenate_videoclips(clips)
    final.write_videofile(output_path, fps=30, codec="libx264", audio_codec="aac",
                          bitrate="8000k", preset="fast", threads=4, logger=None)
Step 4: Quality Checker
Every rendered video passes through a quality gate that validates duration, resolution, audio track presence, and file size against platform-specific limits:
# qc.py
import os
from moviepy import VideoFileClip

PLATFORM_LIMITS = {
    "instagram_reel": {"max_duration": 90, "max_size_mb": 250, "resolution": (1080, 1920)},
    "tiktok": {"max_duration": 180, "max_size_mb": 287, "resolution": (1080, 1920)},
    "youtube_short": {"max_duration": 60, "max_size_mb": 500, "resolution": (1080, 1920)},
    "twitter": {"max_duration": 140, "max_size_mb": 512, "resolution": (1280, 720)},
    "instagram_post": {"max_duration": 60, "max_size_mb": 250, "resolution": (1080, 1080)},
}

def check_video(path, platform):
    issues = []
    limits = PLATFORM_LIMITS[platform]
    clip = VideoFileClip(path)
    if clip.duration > limits["max_duration"]:
        issues.append(f"Duration {clip.duration:.1f}s exceeds {limits['max_duration']}s")
    if list(clip.size) != list(limits["resolution"]):
        issues.append(f"Resolution {clip.size} != {limits['resolution']}")
    if clip.audio is None:
        issues.append("No audio track — most platforms reject silent videos")
    if os.path.getsize(path) / (1024 * 1024) > limits["max_size_mb"]:
        issues.append(f"File too large for {platform}")
    clip.close()
    return issues if issues else ["PASS"]
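A clean render comes back as ["PASS"]; anything else is a list of human-readable problems the batch report can surface (the path here is illustrative):
from qc import check_video

issues = check_video("out/tiktok/python-tip-1.mp4", "tiktok")
if issues != ["PASS"]:
    for issue in issues:
        print(f"FAIL [tiktok]: {issue}")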
Step 5: Batch Pipeline with Multiprocessing
The entry point reads a CSV where each row is a content variation, then renders every row across every platform using ProcessPoolExecutor for parallel rendering:
# pipeline.py
import csv
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path

from templates import PLATFORM_SIZES

def batch_produce(template_path, brand_name, csv_path, output_dir, platforms=None, workers=4):
    template = load_template(template_path)
    brand = load_brand(brand_name)
    platforms = platforms or list(PLATFORM_SIZES.keys())
    with open(csv_path) as f:
        rows = list(csv.DictReader(f))
    total = len(rows) * len(platforms)
    print(f"Batch: {len(rows)} variations x {len(platforms)} platforms = {total} videos")
    jobs = []
    for i, row in enumerate(rows):
        for platform in platforms:
            name = row.get("name", f"video_{i+1}").replace(" ", "_")
            jobs.append((template, brand, row, platform, f"{output_dir}/{platform}/{name}.mp4"))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(render_and_check, job): job for job in jobs}
        for future in as_completed(futures):
            path, platform, issues = future.result()
            status = "PASS" if issues == ["PASS"] else "FAIL"
            print(f"  [{status}] {platform}: {Path(path).name}")
Feed it a CSV like this:
name,headline,body,cta
python-tip-1,Python Tip #1,"The walrus operator := lets you assign and test in one expression.",Follow for more
python-tip-2,Python Tip #2,"Use match/case for structural pattern matching. Cleaner than if/elif chains.",Save this post
python-tip-3,Python Tip #3,"f-strings support = for self-documenting prints: f'{x=}' outputs 'x=42'.",Share with a dev friend
python pipeline.py batches/facts.csv
# 3 variations x 5 platforms = 15 videos rendered and validated
Real-World Example
Sami runs the pipeline for the first time on a Monday with 12 content pieces for a client's "React Tips" series. The CSV has 12 rows, the template is a dark-background quote card, and the brand kit includes the client's logo, intro bumper, and signature blue accent color.
The pipeline fires up 4 workers and starts chewing through 60 videos (12 tips across 5 platforms). Each render takes about 40 seconds. In 12 minutes, all 60 videos are done and validated — 58 pass QC on the first run. Two TikTok exports fail because the body text is too long and spills into the bottom safe zone. A quick edit to the template's wrap width, a re-render of just those two, and everything passes.
What used to take the team 6-9 hours of manual Premiere editing per client per week now takes 12 minutes of compute time and 5 minutes of CSV preparation. Sami scales from 3 clients to 11 without hiring another editor. The agency's video output goes from 50 to 200+ per week, and the team spends their time on creative work — writing better copy, testing new templates — instead of dragging text boxes around in a timeline.