Most agency processes live in a deck. This one lives in a folder. Every markdown file Claude reads, every agent that grades work, every audit that runs before a single ad is generated. Open this page once and you can see the whole pipeline.
You're not buying ad output. You're buying the engine: the part that takes your Meta account, reads what worked and what didn't, and ships a graded batch of new ads against that data, on a recurring cadence, without me being the bottleneck on every line of copy or every visual decision.
Every file you'll see has one of three tags. The tag tells you whether the file is a fixed part of the engine, a piece I customize for your brand at install, or an output that gets refreshed every batch from your Meta account. Knowing which is which is the difference between "this thing runs" and "this thing runs for your brand."
Universal pipeline. Pipeline doc, the 9 copy agents, Agent 10, the audit template, the safe-zone block. I don't rewrite these per client. They've already been hardened across every brand the system runs, and yours inherits all of it on day one.
Brand bible, voice agent, brand spec card, visual style card, persona library, banned-word list. Built once during month one, then read by every batch forever. This is the part that makes the system speak your brand, not generic DTC.
The data brief. EXTEND / RETIRE / FILL / NET-NEW buckets. On-screen text from winners and losers. Trending hooks. This is the file that makes round 5 different from round 4, because round 4's results are inputs to round 5.
Two jobs in this phase. First, the one-time-per-brand install: scrape the site, build the brand bible, lock the voice agent, render the spec cards. Run this once in month one, never again. Second, the per-batch data pull: ask Meta what worked and what didn't, get back a structured brief that drives every slot in the next slate.
When Claude reads this file, it enters new-client install mode. It asks for the brand name, shortcode, website URL, category, vertical, and any extra materials (founder interviews, prior briefs, brand guidelines). Then it walks the install in order: scrape the site, build the brand bible, draft the voice agent, render the brand spec card and visual style card, populate the product catalog.
Run once per brand, in week 1 of month one. Everything downstream reads from the artifacts this file produces. Skip it and the rest of the pipeline has nothing brand-specific to anchor to.
Targets Shopify out of the box. Pulls every product (title, price, description, ingredients, variants, all photos) into Product Assets/{handle}/, and every logo + favicon + apple-touch-icon into Brand Kit/logo/. Each product folder has its photos sitting next to a description.md and meta.json, plus a master catalog.md that links the whole thing together.
This is the file that means every future ad uses real product photography, not stock, not invented bottles, not a generic substitute.
Fires up a real headless Chromium so it can read JS-rendered SPAs and resolve CSS variables to actual values. Output is a JSON of every color used on the site (with frequency), every font stack, the type hierarchy, and the spacing tokens. Saved into Reference/ as design tokens.
Why it matters: the brand spec card in Phase B is built from these tokens, not from guesses. The proposal site, the spec ads, the visual style card all reference your real hex codes, not a designer's read of "looks about right."
One round-trip to your Meta account. Returns the top 20 ads by spend, bottom 20 by ROAS, action-plan recommendations (extend / retire / test / fill), per-dimension learnings (format / persona / angle / emotion), the on-screen text from every winner and loser, plus a Voice-of-Customer rollup from Reddit and a calendar-moment lookahead.
Then it writes DATA_BRIEF.md: the markdown brief Phase B reads to allocate the next 14 slots. EXTEND / RETIRE / FILL / NET-NEW buckets, each row with a "Pair with" companion and (when relevant) a "Trending hook" with a ship-by deadline.
DATA_BRIEF.md into your repo. The output stays the same. The source is yours.
This is the brain. Phase A delivers the data brief. Phase B turns that brief into 14 concrete ad slots, each with a format, a persona, an angle, an emotion, and a specific copy hook. Then it audits the slate before a single image is generated, because catching a clustering problem here costs nothing, and catching it after Fal renders 14 images costs real dollars.
The end-to-end playbook for shipping a graded batch of statics. Phase 1 is concept slate (the 14-row table). Phase 2 is variant selection (which style of UGC, which style of headline). Phase 3 is the visual diversity audit. Phase 4 is the 9-agent copy refinement. Phase 5 is generation via Fal. Phase 6 is Agent 10 grading. Phase 7 is iteration on anything below 90.
This file is what Claude reads to know what comes next. Every other file in Phases B-G is referenced by this one. If you only opened one file in the whole system, this is the one.
Every batch of 14 must satisfy: ≥2 platform-native UGC, ≥1 macro close-up, ≥1 outdoor location, ≥1 type-only, ≥1 chaotic tablescape. Without these quotas, every batch comes out as 14 variations of the same warm editorial mood, because the model and the writer both default to whatever the brand's "house style" already is.
The quotas are the antidote. They force at least 6 of the 14 slots into visually distinct registers, even when the brief and the brand voice would naturally cluster. Plus the variant-rotation rule: same archetype can't run two batches in a row at the same Style letter.
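A minimal sketch of how the quota check could run against a slate, assuming each slot carries one register tag. The tag names and the `quota_shortfalls` helper are hypothetical; only the minimum counts come from the quota list above.

```python
from collections import Counter

# Minimum counts per visual register, taken from the quota list above.
# The register labels themselves are hypothetical tag names.
QUOTAS = {
    "platform_native_ugc": 2,
    "macro_close_up": 1,
    "outdoor_location": 1,
    "type_only": 1,
    "chaotic_tablescape": 1,
}


def quota_shortfalls(slot_registers):
    """Return {register: missing_count} for every quota the slate misses."""
    counts = Counter(slot_registers)
    return {
        register: minimum - counts[register]
        for register, minimum in QUOTAS.items()
        if counts[register] < minimum
    }


# A 14-slot slate that clusters on one mood and misses two quotas.
slate = (
    ["warm_editorial"] * 10
    + ["platform_native_ugc", "macro_close_up", "outdoor_location", "type_only"]
)
print(quota_shortfalls(slate))
```

An empty result means the slate clears every quota. Anything else names the register that still needs a slot.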
Copied into every batch as AUDIT.md. Filled in twice. §1 Visual Diversity Audit runs at the end of Phase B: 14-row table mapping each slot's surface bucket, lighting bucket, primary subject, color cast, plus cluster-cap checks (max 3 per surface, max 4 per lighting, max 5 single-can-hero) and an adjacent-pair check. §2 Voice Diversity Audit runs at the end of Phase C: same shape, but for voice register, anchor type, and source citations.
The reason this exists: the per-ad rubrics catch per-ad problems. Slate-level cluster failure is a different bug, where 14 ads each pass their own rubric but render at thumbnail scale as the same warm-editorial mood. The audit template is the file that surfaces that pattern before generation runs.
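The cluster-cap part of §1 could be sketched roughly like this. The caps (3 per surface, 4 per lighting, 5 single-can-hero) come from the audit description; the dictionary keys, bucket names, and the `cluster_violations` helper are assumptions for illustration, and the adjacent-pair check is not modeled here.

```python
from collections import Counter

# Caps from the audit description; field names are hypothetical.
SURFACE_CAP = 3
LIGHTING_CAP = 4
SINGLE_CAN_HERO_CAP = 5


def cluster_violations(slots):
    """slots: list of dicts with 'surface', 'lighting', and 'subject' keys."""
    problems = []
    for bucket, count in Counter(s["surface"] for s in slots).items():
        if count > SURFACE_CAP:
            problems.append(f"surface '{bucket}' used {count}x (cap {SURFACE_CAP})")
    for bucket, count in Counter(s["lighting"] for s in slots).items():
        if count > LIGHTING_CAP:
            problems.append(f"lighting '{bucket}' used {count}x (cap {LIGHTING_CAP})")
    heroes = sum(1 for s in slots if s["subject"] == "single_can_hero")
    if heroes > SINGLE_CAN_HERO_CAP:
        problems.append(f"single-can-hero used {heroes}x (cap {SINGLE_CAN_HERO_CAP})")
    return problems
```

Each ad in a flagged slate can still pass its own rubric. The list only reports where the slate as a whole clusters.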
Each file has a Style Selection Matrix with 4-9 styles inside (Style A, B, C, …) plus rules for when to use which. Phase 2 of the pipeline reads these to pick a specific variant per ad. The 16 archetypes:
Platform-native creator-style. iMessage, Reddit, Notes, Slack composites + retro-photo register.
The transformation arc. Setup, payoff, contrast forced to read at thumbnail scale.
First-person voice with attribution line. Quote marks + dash, never buried in subhead.
Reviews, ratings, customer voices. Real specific numbers, never round invented ones.
One number, big, defended by a citation. Treatment-claim risk gates apply.
Type-as-the-message. The wordmark IS the brand mark. No can required.
Single voice, full quote, attribution. Different rhythm from social-proof aggregations.
The classic. Bullet rhythm, parallel structure, cut weak verbs.
Tight list register. 3-5 items, short, no commentary.
Caveat-style, casual, friend-to-friend. Use sparingly. Big tonal shift.
Editorial article skin. Native-camouflage. CTA pill OFF.
Magazine clipping. Pull quotes, masthead lockup, body copy. Long-copy register.
What we don't do, what we won't sell. Earned negation, not edgy for edgy's sake.
The offer pill carries the deal. CTA + Logo both ON. Different rules from default.
Comparison fulcrum word on its own line. "Vs." stacked, italic, ~50% size.
Retro-photo carve-out for illustrated brands. Late-90s digital-camera character.
Each agent is a markdown file with a role, a rubric, and a scoring loop. Claude reads the file, takes on that role, scores the ad, returns a numeric score plus specific rewrites. Iterate until the ad clears 90. Then hand to the next agent. Nine passes total. This is the cheap end of the pipeline. Catching a fabricated stat or a banned word here costs pennies. Catching it after Fal renders the image costs dollars. Catching it after the ad goes live costs the campaign.
A consumer-psychologist agent. Takes a persona definition (age, role, pain points, language patterns, worldview) and the ad brief as input. Scores 1-100 across 6 dimensions: language fit, pain-point fit, worldview match, evidence type, social context, register. Returns specific rewritten copy where the score lags.
Scores whether the ad's angle (the specific argument it's making) is differentiated, defensible, and matches the awareness level of the target traffic. Catches "natural ingredients" and "made for you" type drift back into category-default messaging.
Maps the ad to a target emotion (frustration, relief, validation, curiosity, FOMO, pride) and grades whether the copy + scene + persona combination actually generates that emotion or just claims it. The "claim vs. earn" distinction is the whole rubric.
Pure craft pass. Active verbs, parallel structure, cut filler, cut hedges, cut weak modifiers, headline rhythm, line-break placement. Independent of brand and persona. Catches the "this technically scans but reads flat" problem.
If Phase 2 picked "Style C of UGC archetype," does the copy actually read as platform-native UGC, or is it editorial copy crammed into a UGC visual? Cross-checks the chosen Style's rules against the actual copy block. Rejects mismatches.
The agent that reads from your brand voice agent file, not a universal rubric. Banned words (yours, not mine). Hard rules ("never claim treatment of a medical condition"). Voice register (warm-clinical vs. casual-friendly vs. founder-podcast). Brand-swap test: would this ad read as your brand or could a competitor have shipped it?
This is the most brand-specific agent in the system. Month one spends real time getting it right, because it gates every ad downstream.
Behavioral-econ pass. Anchoring, loss aversion, social proof weight, framing effects, availability heuristic. Grades whether the ad pulls a System-1 reaction in the first 1-2 seconds (the only window Meta gives you), or whether it relies on System-2 cognitive work the viewer won't do.
The hardest pass. Grades against a checklist of conversion-killers: weak hook, missing offer, unclear CTA, scene-format mismatch, social-proof drop, awareness-mismatch (cold traffic getting Most-Aware copy). Returns specific fixes, not just scores.
Composite pass that holds the ad against the previous 8 scores and asks: would a senior creative director ship this? Catches the "every individual rubric passed but the ad still feels off" problem. Final gate before Phase D burns Fal credits.
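The score-and-rewrite loop the nine agents share can be sketched as follows, assuming each agent is a callable that returns a score and a rewrite. In the real system the agents are markdown files Claude reads, so this callable interface, the `run_copy_gate` name, and the `MAX_ROUNDS` limit are all hypothetical; only the 90 threshold and the one-agent-after-another order come from the description above.

```python
PASS_MARK = 90
MAX_ROUNDS = 5  # hypothetical safety limit, not stated in the source


def run_copy_gate(copy, agents):
    """Pass copy through each agent in order; each must score >= PASS_MARK.

    Each agent is a callable: copy -> (score, rewritten_copy).
    """
    for agent in agents:
        for _ in range(MAX_ROUNDS):
            score, rewrite = agent(copy)
            if score >= PASS_MARK:
                break  # this agent is satisfied, hand to the next one
            copy = rewrite  # adopt the agent's rewrite and re-score
        else:
            name = getattr(agent, "__name__", "agent")
            raise RuntimeError(f"{name} never reached {PASS_MARK}")
    return copy
```

The round limit is there so a stuck ad surfaces as an error instead of looping forever.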
Once the slate is locked and the copy has cleared the 9-agent gate, Phase D actually renders the images. The pipeline picks between GPT Image 2 (default, photographic credibility) and Nano Banana 2 (override, illustrated / halftone / multi-product). Each prompt is assembled from three universal blocks plus the per-ad scene language.
Brief input → prompt assembly → Fal API call → finished PNG. Single source of truth for how Claude builds a generation prompt. Includes the model picker (GPT Image 2 vs. Nano Banana 2 by visual register), the multi-SKU rendering ceiling (both models top out at ~2 distinct SKUs in one frame), the aspect-ratio pre-check (reference images must be ≤3:1 or Fal hard-fails), and the 4-attempt retry pattern.
Why this exists: the same brief assembled two different ways produces wildly different outputs. This file locks the assembly so the variance is in the brief, not in how the brief was prompted.
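Two of the guards named above, the aspect-ratio pre-check and the 4-attempt retry, might look something like this. This sketch assumes the 3:1 limit applies in either orientation and that a retry simply repeats the same call. `render` stands in for whatever function wraps the Fal request and is not a real Fal client method.

```python
MAX_REFERENCE_RATIO = 3.0  # the "≤3:1" limit on reference images
MAX_ATTEMPTS = 4


def reference_ratio_ok(width, height):
    """True when the longer side is at most 3x the shorter side."""
    longer, shorter = max(width, height), min(width, height)
    return shorter > 0 and longer / shorter <= MAX_REFERENCE_RATIO


def generate_with_retry(render, prompt):
    """Call render(prompt) up to MAX_ATTEMPTS times, then give up."""
    last_error = None
    for _ in range(MAX_ATTEMPTS):
        try:
            return render(prompt)
        except Exception as error:  # broad on purpose: failure modes are unknown here
            last_error = error
    raise RuntimeError(f"gave up after {MAX_ATTEMPTS} attempts") from last_error
```

Running the ratio check before the first call means an oversized reference image fails locally instead of costing a request.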
Pasted verbatim at the top of every prompt. Defines the 840×1350 safe rectangle inside a 1080×1920 frame, with explicit pixel coordinates, plus a "rows 1-2 and 9-10 must be empty" simple version for the model to fall back on. Without this block, GPT Image 2 routinely places text in the top 400px Instagram-overlay zone. With it, the headline stays legible across Stories, Reels, and Feed.
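The simple rows rule can be checked after the fact with a few lines. This sketch assumes the 1920px frame is split into ten equal rows of 192px, which is implied by rows 9-10 being the bottom rows but not stated outright. It does not reproduce the block's explicit pixel coordinates, and the function names are hypothetical.

```python
FRAME_HEIGHT = 1920
ROW_COUNT = 10  # assumption: ten equal rows
ROW_HEIGHT = FRAME_HEIGHT // ROW_COUNT  # 192 px per row
BLOCKED_ROWS = {1, 2, 9, 10}  # "rows 1-2 and 9-10 must be empty"


def rows_touched(top, bottom):
    """1-based rows covered by a box spanning y in [top, bottom)."""
    first = top // ROW_HEIGHT + 1
    last = (bottom - 1) // ROW_HEIGHT + 1
    return set(range(first, last + 1))


def text_box_is_safe(top, bottom):
    """True when the box stays out of every blocked row."""
    return not (rows_touched(top, bottom) & BLOCKED_ROWS)


print(text_box_is_safe(400, 900))  # headline sitting in rows 3-5
print(text_box_is_safe(100, 300))  # headline in the top overlay zone
```

Under the ten-row assumption, the allowed band runs from y=384 to y=1536.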
Models default to making 12oz cans look like 16oz tallboys, supplement bottles look chunkier than they are, pouches float weirdly. This block locks proportions to the uploaded reference photo and provides explicit real-world dimensions. Pasted after the safe-zone block, before the per-ad scene language.
Agent 10 reads every rendered PNG, scores it across 11 gates and 58 dimensions, and flags anything below 90 for re-roll. Nothing ships to your Meta account without all 14 ads clearing this gate. This is the file that turns "we generated 14 ads" into "we generated 14 ads that actually perform."
The agent simulates the actual cognitive journey of a person being served the ad: thumb-stop, hook recognition, claim digestion, social-proof weighting, action engineering, brand recall. 11 gates in sequence. Gates 0-9 are universal (native camouflage, hook strength, copy clarity, format compliance, credibility, persuasion, action, brand fit, conversion math, polish). Gate 10 is the brand-performance overlay: it scores the ad's classification (format × persona × angle × emotion) against the action plan from Phase A. An ad built on dimensions the brand's own data has already shown don't work gets capped at 75, no matter how good the craft is.
This is the gate that catches: safe-zone violations, dropped headlines, paper-inset artifacts, fabricated stats that survived Phase 4, banned-word leakage, brand-fit failures the per-ad rubrics missed, AND ads built on dimensions the data has already retired.
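The Gate 10 cap reduces to a small rule. A sketch, assuming a classification is a (format, persona, angle, emotion) tuple and that "retired" means the whole combination appears in the action plan's retire list. The real overlay may match on individual dimensions instead, and `gate_10_score` is a hypothetical name.

```python
RETIRED_CAP = 75


def gate_10_score(craft_score, classification, retired):
    """Cap the score when the ad's classification is already retired."""
    if classification in retired:
        return min(craft_score, RETIRED_CAP)
    return craft_score


retired = {("ugc", "new_parent", "price", "fomo")}
print(gate_10_score(96, ("ugc", "new_parent", "price", "fomo"), retired))
print(gate_10_score(96, ("type_only", "athlete", "taste", "pride"), retired))
```

Because the cap sits below the 90 pass mark, a retired combination can never ship on craft alone.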
The closed loop, in plain English. Every ad that ships generates Meta performance data. This file defines what gets logged per ad (Brief ID, brand, format, style, persona, angle, headline hook, scene, the actual rendered copy), how it gets logged, and where it goes back into the action plan so future briefs are informed by what actually worked, not just what looked good in generation.
This is the self-iteration mechanism. Without this file, the pipeline ships pretty ads. With it, the pipeline gets smarter every batch.
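One way to picture the per-ad log entry is a small record type. The fields mirror the list above (Brief ID, brand, format, style, persona, angle, headline hook, scene, rendered copy). Writing each record as a JSON line is an assumption, as are the class and function names.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AdLogRecord:
    brief_id: str
    brand: str
    format: str
    style: str
    persona: str
    angle: str
    headline_hook: str
    scene: str
    rendered_copy: str


def to_log_line(record):
    """Serialize one record as a single JSON line (storage format assumed)."""
    return json.dumps(asdict(record), ensure_ascii=False)
```

Keeping the classification fields on every record is what lets the next data brief join Meta performance back to format, persona, and angle.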
Run on every generated image before Agent 10 scores it. Hard-fail items: instruction leakage (font names, hex codes, "Hook:" labels visible on the image), safe-zone violations, multi-panel collages, product fidelity errors, label misspellings. Any single fail blocks the ad. Designed to take 30 seconds per image.
Why a human pass when Agent 10 exists: because Agent 10 is comprehensive but slow. The QC checklist is fast and catches the 80% of obvious failures before Agent 10 spends cycles on them.
Once all 14 ads clear Agent 10, Phase F moves them into the final-delivery folder structure. Phase G resizes to 1:1, then re-grades through Agent 10 (because aspect-ratio shifts can drift typography or trim CTAs). What ships to your Meta account is two files per concept: the 9:16 source-of-truth and a 1:1, both ≥90.
No standalone playbook, just file moves into Final Delivery/9x16/ with the canonical naming convention ([BRAND]_[Concept]_V[1-N]_9x16.png). Alongside the PNGs, four artifacts always go into the same folder: the locked generation script, the Agent 10 report, the filled-in audit, the data brief that drove the slate. That four-artifact set is the audit trail. Next batch reads them.
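The naming convention is easy to build and validate in code. A sketch, assuming brand names contain no underscores and versions start at 1. The helper names and the ACME example are hypothetical.

```python
import re

# Mirrors [BRAND]_[Concept]_V[1-N]_9x16.png
FILENAME_PATTERN = re.compile(
    r"^(?P<brand>[^_]+)_(?P<concept>.+)_V(?P<version>[1-9]\d*)_9x16\.png$"
)


def delivery_filename(brand, concept, version):
    """Build a 9:16 delivery filename from its three parts."""
    return f"{brand}_{concept}_V{version}_9x16.png"


def parse_delivery_filename(name):
    """Return the parts of a delivery filename, or None if it doesn't match."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    parts["version"] = int(parts["version"])
    return parts


print(delivery_filename("ACME", "MorningRitual", 2))
```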
Universal resize playbook, single rulebook for getting approved 9:16 ads into 1:1 sizes. Reads from Final Delivery/9x16/ verbatim, runs each PNG through Fal's gpt-image-2/edit with NB2 fallback, drops outputs into Final Delivery/1x1/. Refuses to overwrite without a --force flag, so re-running is safe. Critically: every 1:1 output gets re-graded through Agent 10. Aspect changes re-shape the safe-zone constraints (1:1 has different blocked rows than 9:16), and resize models occasionally drift typography or trim a CTA. Anything below 90 falls back to manual.
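The no-overwrite behavior can be sketched as a planning step that decides which files a run would touch. This version works on filenames only and skips existing outputs quietly, where the real playbook may stop with an error instead. The `_1x1.png` suffix and the `plan_resizes` name are assumptions. Only the `--force` semantics come from the description above.

```python
SOURCE_SUFFIX = "_9x16.png"
TARGET_SUFFIX = "_1x1.png"  # assumed naming for the 1:1 outputs


def plan_resizes(source_names, existing_targets, force=False):
    """Return (source, target) pairs a resize run would write.

    Existing targets are skipped unless force is True, so re-running
    without --force never clobbers an approved 1:1.
    """
    plan = []
    for source in sorted(source_names):
        if not source.endswith(SOURCE_SUFFIX):
            continue  # ignore the audit-trail files in the same folder
        target = source[: -len(SOURCE_SUFFIX)] + TARGET_SUFFIX
        if target in existing_targets and not force:
            continue
        plan.append((source, target))
    return plan
```

Every target in the plan would then go back through Agent 10 before it counts as delivered.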
Without it, every batch is a creative judgment in a vacuum. With it, every slot maps to a specific signal: extend a winner, kill a loser, fill a vertical gap, test a new bet. Your Meta history feeds the brief on day one.
Visual diversity (§1) and voice diversity (§2) get caught at the slate level, not after $50 of Fal credits is burned on a clustered batch. Cheap end of the pipeline.
9-agent copy review (Phase C, ≥90) gates Phase D. Agent 10 creative grader (Phase E, ≥90) gates delivery. If either fails, the batch doesn't ship. This is the rule that makes quality independent of who's running the batch.
3-4 of every 14 slots must be combos the brand has never tested. Skip it and the slate just optimizes what already worked, which is how every brand's creative converges into one mood after 6 batches. The Net-New floor is what stops your creative from going stale at scale.
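The Net-New floor is a counting rule, so it can be checked mechanically. A sketch, assuming each slot is a (format, persona, angle, emotion) tuple and the brand's tested history is a set of the same tuples. The names are hypothetical, and only the lower bound of the 3-4 range is enforced here.

```python
NET_NEW_FLOOR = 3  # lower bound of the "3-4 of every 14" rule


def net_new_count(slate, tested):
    """Count slots whose combination the brand has never run."""
    return sum(1 for combo in slate if combo not in tested)


def meets_net_new_floor(slate, tested):
    return net_new_count(slate, tested) >= NET_NEW_FLOOR


tested = {("ugc", "new_parent", "price", "relief")}
slate = [("ugc", "new_parent", "price", "relief")] * 12 + [
    ("type_only", "athlete", "taste", "pride"),
    ("listicle", "student", "focus", "curiosity"),
]
print(net_new_count(slate, tested), meets_net_new_floor(slate, tested))
```

The example slate has only two untested combinations out of 14, so it would go back for one more net-new slot.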
Data brief + Agent 10 report + audit + locked generation script. Next batch reads them. The system literally cannot ship two identical batches, because round N reads round N-1's results before deciding what's worth making.
Three months. Mine, train, hand off. Replace the performance creative agency. Keep the system inside your brand.