Build an Automated Content Localization Pipeline with AI

The Problem

A 15-person SaaS startup is expanding into three new markets — Germany, Brazil, and Japan. The app has over 800 translation keys spread across JSON locale files, marketing pages, and email templates. The current process: export strings to a spreadsheet, email it to freelance translators, wait 3-5 days for responses, then copy-paste translations back into the codebase. Keys get missed, context is lost ("button" could mean a UI button or a clothing button), and the Brazilian Portuguese translations read like they were written for Portugal — because the translator was Portuguese.

Every feature release means another round of translation delays. The German translations use informal "du" when the brand voice requires formal "Sie." The Japanese locale file is 60 keys behind the English one because nobody tracked what was added last sprint. And there are 23 hardcoded English strings buried in React components that never went through the i18n pipeline at all — invisible to anyone who only checks the locale files.

A trade show in six weeks needs all three languages shipped. The old process would take three weeks just for the back-and-forth with translators, leaving no buffer for review or fixes.

The Solution

Using the content-writer, batch-processor, and data-extractor skills, the workflow extracts every translatable string from source code and locale files (including the hardcoded ones that slipped through), generates culturally adapted translations for all three markets in a single run, handles pluralization and gender rules per language, and validates completeness — turning three weeks of translator coordination into four hours of focused work.

The pipeline is reusable: once set up, incremental translations for new features take minutes instead of days. Ship a feature on Monday, have it translated by Tuesday.

Step-by-Step Walkthrough

Step 1: Extract All Translatable Strings

The first surprise in any localization project: the number of translatable strings is always higher than expected. Start by finding everything that needs translation, not just what's in the locale files:

Scan ./src and ./emails for all translatable strings. Extract every key from en.json and find any hardcoded English strings in React components that should be externalized.

The scan covers three sources:

Locale file (src/locales/en.json): 847 keys
Hardcoded strings: 23 English strings found across 14 React components that were never externalized
Email templates: 6 files with translatable content blocks

Total: 912 strings to translate. That's 65 more than the 847 keys in the locale file — and those 65 are the ones that would have stayed in English forever in a manual process.

The 23 hardcoded strings are the kind of thing that slips through every localization effort — someone writes <h2>Settings</h2> instead of <h2>{t('settings.title')}</h2>, and it stays in English for every non-English user forever. A German user sees a fully translated app with random English headings scattered throughout. These get externalized into proper i18n keys as part of this step, so the locale files become the single source of truth for all user-facing text.

Step 2: Generate Culturally Adapted Translations

Literal translation isn't localization. Context matters — formality, tone, cultural conventions, and even string length for UI elements:

Translate all 912 strings to de-DE, pt-BR, and ja-JP. This is a project management SaaS app. Use informal tone for pt-BR, formal Sie-form for de-DE. Keep UI strings under 40 characters where the English is under 30. For date and number formats, use each locale's native convention.

The character limit constraint is practical: German words are notoriously long ("Projekteinstellungen" has 22 characters where "Settings" has 8), and Japanese can be more compact than English in some cases but needs more vertical space. Without constraints, translated button labels overflow their containers and break the layout. The formality constraint is equally important — addressing German business users with "du" (informal) when the brand requires "Sie" (formal) is the equivalent of starting a business email with "Hey dude."

Step 3: Review Batch Results

The batch processor handles all 912 strings across three locales with locale-specific adaptations:

Cultural adaptations applied:

Locale	Formality	Decimal Separator	Date Format	Number Format
de-DE	Formal "Sie"	Comma (1.234,56)	DD.MM.YYYY	Dots for thousands
pt-BR	Informal "voce"	Comma (1.234,56)	DD/MM/YYYY	Dots for thousands
ja-JP	Polite desu/masu	Period (1,234.56)	YYYY年MM月DD日	Commas for thousands

Warnings flagged for review:

12 strings exceeded the 40-character limit in de-DE — German compound words push past the boundary. Each flagged string includes a suggested shorter alternative.
3 strings in ja-JP need review: the word "board" is ambiguous without context (掲示板 for bulletin board vs. ボード for project board in a Kanban sense)
2 strings in pt-BR flagged: English idioms ("Get started" and "Jump right in") that don't translate directly and need culturally appropriate alternatives

Output written to src/locales/de-DE.json, src/locales/pt-BR.json, and src/locales/ja-JP.json.

The warnings are the most valuable part. A human translator might silently choose the wrong meaning for "board" and nobody catches it until a Japanese user reports confusion three months later. Flagging ambiguity upfront means the team can make an intentional decision rather than discovering the mistake in production.

The character limit warnings come with alternatives: instead of "Projekteinstellungen" (22 chars), the agent suggests "Einstellungen" (13 chars) which loses the "project" prefix but fits the UI. The team decides which trade-off to make.

Step 4: Handle Pluralization and Gender Rules

Different languages have fundamentally different grammar rules that English doesn't prepare you for:

Check all translated strings for pluralization issues. German has different plural forms than English, Japanese doesn't pluralize nouns, and Portuguese has gendered nouns. Update the locale files to use ICU MessageFormat where needed.

The fixes are substantial:

de-DE: 34 strings updated to handle German plural forms correctly
pt-BR: 18 strings updated for gendered variants — "o projeto" (masculine) vs. "a tarefa" (feminine). When the app says "1 novo projeto" it needs to say "1 nova tarefa" for tasks, because the adjective agrees with the noun's gender.
ja-JP: 12 strings simplified by removing unnecessary plural markers that don't exist in Japanese. "5 projects" becomes "5件のプロジェクト" with a counter word, not a plural suffix.
All count-dependent strings converted to ICU MessageFormat patterns like {count, plural, one {# project} other {# projects}}

A naive translation would show Brazilian users "1 projetos" instead of "1 projeto" and "2 tarefa" instead of "2 tarefas." These aren't cosmetic issues — they make the app feel like it was machine-translated by someone who doesn't speak the language, which undermines the entire reason for localizing in the first place. Users notice grammatical errors faster than they notice feature gaps.

Step 5: Validate Completeness

The final check catches gaps, suspicious translations, and leftover English:

Verify that all three locale files have the same keys as en.json. Flag any translations that are identical to the English original, suspiciously short, or containing untranslated English words embedded in otherwise translated strings.

Validation results:

Locale	Keys Present	Identical to English	Issues Found
de-DE	912/912	0	2 embedded English words
pt-BR	912/912	1 ("OK" button — acceptable)	0
ja-JP	912/912	0	5 technical terms for review

Flagged for human review:

ja-JP: "dashboard" left untranslated — common practice in Japanese tech UIs, but the team should decide whether to keep the English or use the katakana ダッシュボード
ja-JP: "sprint" left untranslated — same question (keep English or use スプリント)
de-DE: "Setup" embedded in a German sentence — should be "Einrichtung"
de-DE: "Feature" used instead of "Funktion" — English loanword that's common in German tech but may confuse non-technical users

Seven strings need a human decision out of 2,736 total translations (912 strings x 3 locales). That's a 99.7% automation rate, with the remaining 0.3% flagged with specific context for a quick human decision — not a vague "please review," but a concrete question like "should 'dashboard' stay in English or become katakana?" that can be answered in seconds.

Real-World Example

Dani, a frontend developer at the 15-person startup, needs to ship German, Brazilian Portuguese, and Japanese versions before a trade show in six weeks. The CEO already committed to demos in all three languages with prospective enterprise customers in each market. The old process of emailing spreadsheets to freelance translators would take three weeks of back-and-forth alone — and that's assuming the translators are available on short notice, which they usually aren't for a 912-string rush job.

The agent extracts 912 translatable strings in under a minute, including 23 hardcoded English strings that would have stayed in English forever in a manual process. All three locale files generate with culturally appropriate translations, and 15 strings get flagged for human review due to ambiguous context or cultural decisions.

Dani reviews the flagged strings with a native-speaking colleague over a 30-minute video call — 10 of the 15 flags are quick "yes, keep the English term" decisions for Japanese tech vocabulary, and 5 need actual translation corrections. The agent externalizes the 23 hardcoded strings, creating proper i18n keys and updating the React components so they use useTranslation() instead of hardcoded text. That refactor alone would have taken a developer half a day manually.

Total time: 4 hours instead of 3 weeks. The trade show deadline is met with weeks to spare, and the demos in all three languages go smoothly — the German enterprise prospect specifically comments that the formal "Sie" form shows attention to their market.

And every future feature release reuses the same pipeline for incremental translations. When the team ships a new feature with 15 new strings, the pipeline translates just those 15 strings into all three locales, validates completeness, and flags any that need human review. What used to be a three-day delay on every release becomes a 10-minute step in the deployment checklist.

LIVETry this use case on your own files

Skills stack · 3 skills

content-writer

batch-processor

data-extractor