How Douyin Studios Produce AI Short Dramas at Scale
How four-person Chinese teams ship 80-episode AI short dramas using Seedream, Kling, Doubao, and Jianying.
The setup: small teams, factory output
Walk into a co-working space in Hangzhou's Binjiang district or a converted apartment in Chengdu's Gaoxin South, and you'll find what local operators call a "duanju gongchang" (短剧工厂), a short drama factory. The crew is rarely more than four people: one lead operator who handles scripting and prompts, one editor on Jianying, one account manager juggling Douyin and Xiaohongshu posting schedules, and a part-time voice talent who often doubles as the team's gofer. Some teams are just one person.
What they ship is not what most Western creators picture when they hear "AI video." These are 60-to-90-second vertical episodes, typically 80 to 120 per series, built around hooky melodrama: reincarnated empresses, billionaire CEOs with amnesia, rural daughters-in-law outsmarting wealthy in-laws, ancient cultivators waking up in modern Shanghai. A single team will run three to six concurrent series across multiple Douyin accounts, pushing out two to four finished episodes per day per series. The math is staggering by Western standards: at those rates a four-person crew publishes 6 to 24 finished episodes a day, or roughly 40 to 250 minutes of finished AI video per week.
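The output arithmetic is easy to check. The sketch below simply multiplies out the series counts, release rates, and episode lengths quoted above, taking them at face value; the function name is illustrative, not something the teams use.

```python
def weekly_minutes(series: int, episodes_per_day: int,
                   seconds_per_episode: int, days: int = 7) -> float:
    """Minutes of finished video published per week."""
    total_seconds = series * episodes_per_day * days * seconds_per_episode
    return total_seconds / 60


# Low end: 3 series, 2 episodes a day each, 60-second episodes.
low = weekly_minutes(3, 2, 60)
# High end: 6 series, 4 episodes a day each, 90-second episodes.
high = weekly_minutes(6, 4, 90)

print(low, high)  # 42.0 252.0
```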
What's unusual to a Western observer isn't the volume; it's how unfussy the workflow is. Nobody is trying to fool viewers into thinking the footage is real. Audiences know it's AI. The watermark in the corner of the frame is often Doubao's or Jimeng's logo, untouched. Comments under viral episodes openly discuss which model rendered which shot. The aesthetic ceiling everyone is targeting is "good enough to keep thumb-scrolling stopped for 90 seconds," not "indistinguishable from a Netflix production." That single product decision rewires every other choice downstream.
The other surprise is where the money comes from. These teams aren't really chasing ad revenue from Douyin's creator fund: payouts are thin and inconsistent for AIGC content, and the platform has tightened distribution for low-effort AI work. The real revenue lines are paid drama unlocks (users pay roughly 0.50 to 2 USD to binge a finished series on Douyin's mini-program drama platform or on third-party WeChat mini-programs), affiliate commissions on products embedded into the storyline, and, increasingly, B2B arrangements in which the team produces content for an MCN agency or a brand under a flat monthly retainer of around 4,000 to 12,000 USD per series.
The actual workflow, end to end
The pipeline most teams converge on has six stages, and the tools are remarkably consistent across cities.
Stage 1: script generation and outline. The operator starts with a proven hook, usually scraped from a list of top-performing series on Douyin or from one of the script trading groups on WeChat where ghostwriters sell 100-episode outlines for 30 to 80 USD. Doubao (ByteDance's chatbot, equivalent in role to ChatGPT inside the Chinese ecosystem) or DeepSeek is then asked to expand the outline into per-episode beats. A common prompt template asks for the cliffhanger placement at the 50-to-55-second mark, exactly two emotional inversions per episode, and a "face-slap moment" (打脸, dalian), the genre's signature beat where the antagonist gets humiliated. Output is dumped into a shared Feishu document, which functions as the team's project hub the way Notion does for many Western teams.
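A prompt template of the kind described in Stage 1 might look like the sketch below. The three constraints (cliffhanger window, two emotional inversions, one face-slap moment) come from the text; the class name, function name, and exact wording of the prompt are hypothetical, and the resulting string would be pasted into or sent to whichever chatbot the team uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BeatSpec:
    """Per-episode structural constraints (defaults taken from the article)."""
    cliffhanger_window_s: tuple[int, int] = (50, 55)
    emotional_inversions: int = 2
    require_face_slap: bool = True


def build_episode_prompt(outline: str, episode: int,
                         spec: BeatSpec = BeatSpec()) -> str:
    """Render a plain-text prompt asking for the beats of one episode."""
    start, end = spec.cliffhanger_window_s
    lines = [
        f"Expand the outline below into beats for episode {episode}.",
        f"Place the cliffhanger between {start} and {end} seconds.",
        f"Include exactly {spec.emotional_inversions} emotional inversions.",
    ]
    if spec.require_face_slap:
        lines.append(
            "Include one face-slap moment (dalian) where the antagonist is humiliated."
        )
    lines.append("Outline:")
    lines.append(outline.strip())
    return "\n".join(lines)


prompt = build_episode_prompt("A reincarnated empress wakes up in modern Shanghai.", 7)
print(prompt)
```

Keeping the constraints in one frozen object means every series can share the same beat structure and override only what differs.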
Stage 2: character and scene image generation. This is where Seedream (ByteDance's image model, accessed through the Jimeng app) and Kling's image module dominate. Operators generate "character bibles" (a sheet of reference images per character at multiple angles, outfits, and emotional states) and lock them in early so each episode has visual continuity. Midjourney-style prompt engineering is less common here. Instead, operators rely heavily on consistency features: Jimeng's "character lock" (角色一致性), Kling's reference-image conditioning, and increasingly Qwen-Image's edit mode for tweaking a single feature without redrawing the face. A small but growing number of teams use Liblib (a Chinese Civitai analog) to train LoRAs on their own character sheets when consistency really matters, but most don't bother: the per-shot regeneration cost is low enough that brute force works.
Stage 3: image-to-video. This stage is the bottleneck and the largest cost line. The two dominant tools are Kling (by Kuaishou, currently the quality leader for cinematic motion in Chinese workflows) and Jimeng's video module (powered by ByteDance's Seedance and PixelDance models). Hailuo (MiniMax) and Vidu show up as backups when the primary models are queued or producing artifacts. A typical 90-second episode requires 18 to 30 individual clips of 5 to 10 seconds each. Operators run batches overnight: Kling's pro tier and Jimeng's premium membership both offer concurrent generation slots, and serious teams subscribe to two or three accounts to parallelize. An operator we researched in a Hangzhou Feishu community described their nightly routine as queuing 200 generations before bed, waking up to roughly 140 usable clips, and accepting a 30 percent reject rate as the cost of the medium.
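The batch-sizing logic behind that nightly routine can be sketched with integer arithmetic. The 30 percent reject rate and the 18-to-30-clip episode size are the figures quoted above; the function names are illustrative.

```python
def usable_clips(generations: int, reject_pct: int) -> int:
    """Clips expected to survive review, rounded down."""
    return generations * (100 - reject_pct) // 100


def generations_needed(usable_target: int, reject_pct: int) -> int:
    """Smallest batch that still yields the target after rejects (ceiling division)."""
    return -(-usable_target * 100 // (100 - reject_pct))


night = usable_clips(200, 30)
print(night)                       # 140
print(night // 30, night // 18)    # 4 7  -> episodes covered at 30 or 18 clips each
print(generations_needed(30, 30))  # 43  -> generations to queue for one 30-clip episode
```

On these numbers a single 200-generation night covers four to seven episodes, which helps explain why teams running several series hold more than one account.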
Stage 4: voice and audio. Voice cloning is where Western and Chinese workflows diverge most sharply. Teams routinely clone two to four "house voices" using Doubao's voice synthesis, Reecho, or Volcengine's TTS, and use them across every series. The cloned voices are trained on either the operator's own readings (to dodge IP issues) or, less defensibly, on bought voice packs of varying provenance. Background music comes from CapCut/Jianying's licensed library or, for paid distribution, from Tencent Music's commercial license bundle. Sound effects (the punchy "whoosh" on a face-slap, the heartbeat sting on a reveal) are pulled from genre packs sold on Taobao for 5 to 15 USD a pack.
Stage 5: editing and assembly in Jianying. Jianying (the domestic version of CapCut, also by ByteDance) is the universal editor. It's not a creative choice: the platform actively favors content edited in Jianying because of metadata signals it can read on upload to Douyin. Operators use Jianying's auto-captioning, auto-beat-detection on music, and the increasingly capable AI-driven smart cuts. Jianying's "digital human" feature is sometimes used for talking-head segments, but for narrative drama the workflow is to lay AI-generated clips on a vertical 1080×1920 timeline, drop in the cloned voiceover, layer SFX, and burn in stylized subtitles in the genre's signature thick yellow-and-red font.
Stage 6: distribution and account warming. This is where Westerners most underestimate the workload. A serious team operates 5 to 20 Douyin accounts in rotation, each one "warmed," meaning the account has been used for normal browsing for two to three weeks before posting, follows a believable mix of accounts, and posts on a human-like schedule. Episodes are released two or three per day, with the cliffhanger of the third pointing to a paid mini-program where viewers unlock the rest. Cross-posting to Xiaohongshu (the lifestyle-leaning platform that skews female and higher-income, demographically ideal for romance dramas) and WeChat Channels happens for the first three to five episodes of any series as a discovery funnel. Telegram- or Discord-style communities don't exist; the equivalent fan engagement happens in WeChat groups managed by the account, often capped at 500 members, with the operator manually pinning episode links.
What it actually costs
Costs vary, and operators are cagey about exact numbers, but the ranges below reflect what teams in the 2026 market are paying. All figures are in USD per finished 90-second episode, assuming a four-person team running four concurrent series.
- Script (Doubao/DeepSeek API plus occasional purchased outlines): roughly 1 to 3 USD per episode amortized across the series.
- Image generation (Jimeng or Seedream membership): the relevant tier runs about 30 USD per month for unlimited standard generations, working out to around 0.50 to 1.50 USD per episode at typical volume.
- Video generation (Kling Pro plus Jimeng Premium): this is the dominant cost. Kling's professional tier runs roughly 60 to 90 USD per month per account, and serious teams hold two to three accounts. Per-episode cost lands at roughly 8 to 15 USD when batched efficiently. For series shooting at the higher quality tier (Kling 2.0 master mode or Jimeng's 1080p), this can climb to 20 to 30 USD per episode.
- Voice synthesis (Reecho or Volcengine): roughly 0.30 to 1 USD per episode at character-cloned tiers.
- Music and SFX licensing (amortized): roughly 0.50 USD per episode.
- Editing labor (Jianying is free): if you cost the human editor at roughly 6 USD per hour (a realistic Chengdu rate) and they spend 40 to 60 minutes per episode, that's about 4 to 6 USD per episode.
- Account infrastructure (SIM cards, devices, proxies for account warming): amortized to roughly 1 to 2 USD per episode across a series.
All-in, a finished episode costs the team somewhere between 15 and 50 USD in hard costs, depending on tier and reuse efficiency. A 100-episode series therefore costs 1,500 to 5,000 USD to produce. A successful series with 2 to 5 million cumulative views can clear 15,000 to 60,000 USD in revenue at 1.50 USD per unlock, which works out to 10,000 to 40,000 paid unlocks. The 5 percent paid-unlock conversion is therefore best read as a share of the viewers who reach the paywall, not of raw views; 5 percent of cumulative views would imply 150,000 to 375,000 USD. Operators describe a roughly one-in-four or one-in-five hit rate, which is why running multiple series in parallel is structural, not optional.
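The cost and revenue figures can be cross-checked in a few lines. This is a back-of-envelope sketch that uses only the ranges listed above; the dictionary keys and function names are illustrative, and treating every cost line as independent is a simplifying assumption.

```python
# Per-episode cost ranges in USD (low, high), copied from the list above.
COSTS = {
    "script": (1.00, 3.00),
    "images": (0.50, 1.50),
    "video": (8.00, 15.00),
    "voice": (0.30, 1.00),
    "music_sfx": (0.50, 0.50),
    "editing": (4.00, 6.00),
    "accounts": (1.00, 2.00),
}
VIDEO_PREMIUM = (20.00, 30.00)  # higher-quality video tier from the same list


def episode_cost(premium_video: bool = False) -> tuple[float, float]:
    """Sum the low ends and the high ends of every cost line."""
    items = dict(COSTS)
    if premium_video:
        items["video"] = VIDEO_PREMIUM
    low = round(sum(lo for lo, _ in items.values()), 2)
    high = round(sum(hi for _, hi in items.values()), 2)
    return low, high


def implied_unlocks(revenue_usd: float, price_usd: float = 1.50) -> float:
    """Paid unlocks needed to reach a revenue figure at a given price."""
    return revenue_usd / price_usd


def cost_per_hit(series_cost_usd: float, series_per_hit: int) -> float:
    """Total spend across a portfolio for each series that succeeds."""
    return series_cost_usd * series_per_hit


print(episode_cost())                    # (15.3, 29.0)
print(episode_cost(premium_video=True))  # (27.3, 44.0)
print(implied_unlocks(15_000), implied_unlocks(60_000))  # 10000.0 40000.0
print(cost_per_hit(1_500, 4), cost_per_hit(5_000, 5))    # 6000 25000
```

The summed lines land between about 15 and 44 USD, inside the 15-to-50 range quoted above. At a one-in-four or one-in-five hit rate, the spend per successful series comes to 6,000 to 25,000 USD, which is the figure to weigh against the 15,000 to 60,000 USD a hit can clear.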
What Western creators can copy
Several things port over cleanly. The product decision is the most valuable lesson: deciding upfront that "good enough" beats "perfect," and designing the rest of the stack around volume. Western AI video creators routinely overspend on a single piece, then post it once, and wonder why the math doesn't work. The Chinese drama factory model is fundamentally a portfolio play: you accept that 70 to 80 percent of episodes will underperform and engineer for the hit ratio.
The template-and-bible workflow ports directly. Lock characters early using your model's consistency features, such as Runway's references, Midjourney's character reference, or Sora's storyboard mode. Reuse character sheets across multiple series. Build a small library of voice clones (with proper rights) and use them as your house voices. Standardize aspect ratios, subtitle styles, and a beat structure that you can prompt against for every script.
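One way to make those standards concrete is to write them down as data that every series shares. The sketch below is a hypothetical structure, not any tool's format: the 1080 by 1920 frame and 60-to-90-second length come from the workflow described earlier, while the field names, the subtitle style key, and the order of beats are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SeriesTemplate:
    """House standards reused across every episode and series."""
    width: int = 1080
    height: int = 1920
    min_seconds: int = 60
    max_seconds: int = 90
    subtitle_style: str = "house-default"  # hypothetical style key
    beat_structure: tuple[str, ...] = (
        "hook", "inversion", "face-slap", "inversion", "cliffhanger",
    )


@dataclass
class CharacterBible:
    """Reference images for one character, keyed by angle, outfit, or emotion."""
    name: str
    references: dict[str, list[str]] = field(default_factory=dict)

    def missing(self, required: tuple[str, ...]) -> list[str]:
        """Tags that still have no reference image."""
        return [tag for tag in required if not self.references.get(tag)]


def fits_template(clip_seconds: list[float], template: SeriesTemplate) -> bool:
    """True when the clips add up to an episode of acceptable length."""
    total = sum(clip_seconds)
    return template.min_seconds <= total <= template.max_seconds


bible = CharacterBible("Empress", {"front": ["empress_front.png"], "angry": []})
print(bible.missing(("front", "profile", "angry")))  # ['profile', 'angry']
print(fits_template([5.0] * 18, SeriesTemplate()))   # True
```

Checking a bible for gaps before generation starts is cheaper than discovering mid-series that a character has no usable profile shot.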
Jianying/CapCut itself is available globally under the CapCut name, and it remains one of the strongest free editors for short vertical content. The auto-captioning and beat detection are usable straight away. What does not port is the platform-side preference: TikTok does not algorithmically favor CapCut-edited uploads the way Douyin favors Jianying-edited ones, so there's less hidden upside.
The portfolio account strategy is partially copyable. TikTok, YouTube Shorts, and Instagram Reels all reward consistent posting across multiple themed accounts, but the warming-and-rotation tactics carry real risk under Western platform terms of service. A safer adaptation is two or three accounts under different niches operated transparently, rather than the 10-to-20-account farms common in China.
What Western creators can't easily replicate
Three things are genuinely hard to port.
Tool access. Kling has an international tier and Jimeng has begun rolling out an English app, but the Chinese versions remain noticeably ahead in cinematic motion and Asian-character consistency. Seedream is essentially China-only in practice. If your stories require Asian faces and settings, the Chinese stack is a meaningful quality advantage; for Western faces, Runway, Sora, Veo, and Pika now match or beat it.
Distribution economics. The paid-mini-program model, where viewers pay 1 to 2 USD to unlock the rest of a binge series, has no clean Western equivalent. Patreon, OnlyFans, and YouTube memberships are subscription-shaped, not per-series unlock. This single difference changes the unit economics meaningfully and is why Western AI drama creators rely more on ad revenue and brand deals.
Labor cost. A skilled Jianying editor in Chengdu earning 800 to 1,500 USD per month is the structural backbone of the model. The same skill in Los Angeles, London, or Berlin costs 5 to 8 times more. Western operators need to either edit themselves, use AI-driven assembly tools more aggressively, or accept thinner margins.
Cultural and regulatory caveats
The Chinese AIGC content regulation regime is real and active. Algorithm filing is required for generative models offered to the public, and platforms enforce a labelling requirement on AI-generated content โ the unobtrusive "AI็ๆ" tag in the corner of episodes is not a stylistic choice. Romance, business-revenge, and historical-reincarnation themes are tolerated; political content, real public figures, and explicit material are aggressively removed and can result in account bans across the operator's entire account stable. Several teams maintain a private "do-not-prompt" list of recent regulatory triggers, updated weekly.
Voice cloning is the gray-area workhorse. Cloning a celebrity voice will get an account banned and possibly trigger a defamation complaint; cloning a paid voice actor's recordings under a clear contract is standard. The middle ground (bought voice packs of unclear provenance) is where most teams operate, and it's an area Western creators should approach more conservatively given stronger right-of-publicity enforcement in the US and EU.
The cultural texture of the genre also doesn't travel one-to-one. The "face-slap moment" is a load-bearing structural element of Chinese short drama that doesn't have the same emotional currency with Western audiences. The romance archetypes (the dominant CEO, the rural-to-urban underdog, the cultivation hero) are tied to specific class anxieties and gender dynamics that a direct translation tends to miss. Western creators borrowing the production model should expect to redesign the storytelling spine, not just the language.
What's worth carrying across is the operating philosophy: treat short-form AI drama as a manufacturing process, not a craft project. The teams winning in China are the ones who built a small, repeatable factory and let the format earn its way to quality through volume.