Case Study
11 min read · Updated 2026-05-30

The AI Content Stack Behind Top Xiaohongshu (RedNote) KOLs

How small Chinese studios run Xiaohongshu account matrices using Doubao, Seedream, Kling, and Jianying at sub-$2 per post.
xiaohongshu
rednote
ai-workflow
china-tech
creator-economy
content-ops

The Setup: Who These Operators Actually Are

Most Western coverage of Xiaohongshu (the platform now called RedNote in English) treats it as the new Instagram and stops there. That framing misses what actually happens behind successful accounts. The top tier of Xiaohongshu output, particularly in beauty, home, parenting, and finance verticals, is not produced by individual lifestyle creators sharing their week. It is produced by small "content studios" that look closer to a software shop than a creator brand.

A typical setup is two to five people, often working out of a residential apartment in Hangzhou, Chengdu, or a Shenzhen suburb. They run what the industry calls a 矩阵号 (juzhenhao), or account matrix: ten to fifty Xiaohongshu profiles operated in parallel under a single thematic umbrella. One person handles trend mining, one or two handle production, one handles posting and risk control, and the studio lead handles monetization deals with brands and MCN partners. Headcount stays small because almost every step is wrapped around an AI tool.

Scale is the part that surprises Western creators. A single junior operator we corresponded with described pushing thirty to eighty notes per day across her assigned subset of an account matrix. Her studio collectively shipped over a thousand posts per week. None of these are sloppy autoposts. Each note has a curated cover image, hand-edited copy, hashtag stack, and a final human review pass. The only way that math works is by treating production as a pipeline with AI at every stage, not as a creative act with AI as a side helper.

This is the unfamiliar part for most Western audiences. Solo creators in the US tend to optimize for a single channel of personal brand, with AI used for a few thumbnails or scripts. Chinese matrix operators optimize for distribution surface area and treat individual accounts as semi-disposable test units. A note that lands gets cloned, retitled, and reshot for adjacent accounts within hours. A note that misfires gets quietly deleted, and the underperforming account may be retired if its weighted recommendation score drops too low. The unit of work is the matrix, not the post.

The Actual Workflow, Step By Step

1. Trend mining

The first step never involves a blank page. Operators open one of three third-party analytics tools: 千瓜 (Qiangua), 灰豚 (Huitun), or 新红 (Xinhong). These platforms scrape and index Xiaohongshu's public note stream, surface rising keywords by vertical, and rank notes by recent engagement velocity. A studio in the home decor vertical might pull the top thirty rising keywords in the past forty-eight hours, then filter to the ones with under 500 existing posts to find a window before the topic saturates.
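The filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the row fields (`keyword`, `growth_48h`, `existing_posts`) and the sample data are invented here and do not reflect the real export format of Qiangua, Huitun, or Xinhong.

```python
from dataclasses import dataclass


@dataclass
class KeywordRow:
    keyword: str         # hypothetical field: the rising search term
    growth_48h: float    # hypothetical field: engagement growth over the past 48 hours
    existing_posts: int  # hypothetical field: notes already published on the topic


def pick_open_windows(rows, top_n=30, max_posts=500):
    """Take the top_n fastest-rising keywords, then keep those still under max_posts."""
    rising = sorted(rows, key=lambda r: r.growth_48h, reverse=True)[:top_n]
    return [r for r in rising if r.existing_posts < max_posts]


# Invented sample data, for illustration only.
sample = [
    KeywordRow("cream-style bedroom", 3.2, 420),
    KeywordRow("rental balcony makeover", 2.7, 1800),
    KeywordRow("magnetic track lighting", 1.9, 260),
    KeywordRow("sofa cover", 0.4, 90),
]

for row in pick_open_windows(sample, top_n=3):
    print(row.keyword, row.existing_posts)
```

Ranking first and filtering second mirrors the order in the text: a keyword with few posts but no momentum ("sofa cover" in the sample) never reaches the saturation check.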

This step has no clean Western analog. Tools like Exploding Topics or TrendHero exist, but Qiangua-class tools are tightly coupled to one domestic platform and update on a much faster cycle, often hourly, because the underlying recommendation system itself moves on a scale of hours, not days.

2. Script and copy generation

Once a topic is locked, the studio drops it into a Chinese LLM. The dominant choice is Doubao (豆包) from ByteDance, with Kimi from Moonshot and DeepSeek as common alternatives. Western models like Claude or GPT are used by some studios via mirrors or work VPNs, but the typical workhorse is a domestic model for three reasons: the API price is roughly an order of magnitude lower, the Chinese-language output sounds native, and the model has been fine-tuned on the specific cadence of Xiaohongshu notes, with the right emoji density, line break rhythm, and soft promotional voice.

A studio prompt typically asks for fifteen to twenty title variants, then a body with a hook, three to five "value points," and a soft call to action. Operators rarely accept the first output. They iterate, swap nouns, and run titles through a separate scoring prompt that estimates click-through likelihood based on patterns the studio has documented internally.
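A templated version of that two-pass pattern might look like the sketch below. The prompt wording, function names, and parameters are all hypothetical; the studios' actual templates are not public, and no model API is called here. The code only assembles the text that would be sent.

```python
def build_note_prompt(topic, n_titles=20, n_value_points=4):
    """Assemble one prompt asking for titles, a hook, value points, and a soft CTA."""
    if not 3 <= n_value_points <= 5:
        raise ValueError("expected three to five value points")
    return "\n".join([
        f"Topic: {topic}",
        f"1. Write {n_titles} title variants for a Xiaohongshu note.",
        "2. Write a one-sentence hook for the body.",
        f"3. Write {n_value_points} value points, one short paragraph each.",
        "4. End with a soft call to action.",
    ])


def build_scoring_prompt(titles, house_patterns):
    """Assemble a second prompt that rates titles against internally documented patterns."""
    numbered = "\n".join(f"{i}. {t}" for i, t in enumerate(titles, start=1))
    patterns = "\n".join(f"- {p}" for p in house_patterns)
    return (
        "Rate each title from 1 to 10 for likely click-through.\n"
        f"Patterns that have worked before:\n{patterns}\n"
        f"Titles:\n{numbered}"
    )


print(build_note_prompt("small-apartment storage", n_titles=15, n_value_points=3))
```

Keeping generation and scoring as separate prompts matches the workflow in the text, where operators iterate on titles by hand between the two passes.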

3. Image generation

The visual layer is where the production stack diverges most sharply from Western practice. The dominant image generator is Jimeng AI (即梦), ByteDance's consumer-facing surface for the Seedream model family. Seedream is genuinely strong on East Asian faces, indoor scenes, food photography, and the slightly desaturated aesthetic Xiaohongshu rewards. Tongyi Wanxiang (通义万相) from Alibaba and Liblib (a Chinese SD/Flux community platform) are the common backups. Midjourney shows up in luxury and fashion verticals, accessed through bundled mirror services.

For human-model content, virtual KOL workflows are common. A studio will train a LoRA on a synthetic face, lock pose and outfit prompts in a template, and crank out a year of "lifestyle" content from one fictional persona. A single image set for a daily note typically runs three to nine images.

4. Short video

For video-format notes, the stack is Kling (可灵, Kuaishou), Hailuo (海螺, MiniMax), Vidu, and Jimeng Video. Kling is the workhorse for cinematic five-to-ten-second cuts. Hailuo is preferred for character motion and dialogue lip-sync. Operators do not usually generate full videos end-to-end. A typical pattern is to generate three to six short AI clips and intercut them with stock footage, the operator's own phone footage, or licensed clips from material libraries like 包图网 (Bao Tu Wang) or 摄图网 (Shetu Wang).

5. Editing

Almost every studio edits in Jianying (剪映), the domestic version of CapCut. The two share an engine but diverge in features. The Chinese version has aggressive AI helpers: automatic caption generation tuned for Mandarin, AI voice cloning, beat-matched cut suggestions, and a one-click "smart edit" that takes raw footage plus a script and produces a draft timeline. Westerners using CapCut International get a subset of these, but the speed advantage of Jianying for Chinese-language production is real.

6. Voiceover

Synthetic voice has quietly become standard for non-face-on-camera notes. Doubao's TTS, Reecho (睿声), and Volcengine voice synthesis cover the default range. A studio will keep a small library of cloned voices, typically licensed from voice actors on a flat fee, and reuse them across hundreds of notes for brand consistency.

7. Posting and risk control

This is the operationally hardest stage and the one most invisible to outsiders. Xiaohongshu's risk control system flags content that looks bot-posted, IP-clustered, or AI-generated. Studios respond with what they call 真机 (zhenji, "real device") setups, racks of actual phones each tied to a single account, posting on staggered human-like schedules. Some studios use light automation tools that drive these phones via ADB scripts. Others post manually because the labor cost in second-tier cities is low enough to make manual posting cheaper than the cost of a banned account.

A studio operator we corresponded with put it bluntly: "We do not trust any auto-poster that promises Xiaohongshu integration. The platform is too aggressive. Real phones, real fingers, real sim cards, that is the only way to keep an account alive past three months."

8. Analytics and iteration

After posting, operators watch the first two hours of engagement velocity closely. Xiaohongshu's recommendation engine is unforgiving in that window. If a note does not break out, it usually never will. Studios use the native creator dashboard plus Qiangua to attribute performance, then loop the winning angles back into the next day's prompt templates.
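The two-hour check can be expressed as a simple rate calculation. The sketch below is an assumption-heavy illustration: the threshold `min_per_hour` is a made-up number, "engagement event" is left undefined (likes, saves, and comments would all count equally here), and nothing in it reflects how Xiaohongshu's own recommendation engine scores a note.

```python
from datetime import datetime, timedelta


def early_velocity(posted_at, event_times, window=timedelta(hours=2)):
    """Engagement events per hour inside the window that follows posting."""
    cutoff = posted_at + window
    in_window = sum(1 for t in event_times if posted_at <= t < cutoff)
    return in_window / (window.total_seconds() / 3600)


def is_breakout(posted_at, event_times, min_per_hour=50.0):
    """Compare early velocity against a studio-chosen threshold (hypothetical value)."""
    return early_velocity(posted_at, event_times) >= min_per_hour


# Invented example: four events, one of which falls outside the two-hour window.
posted = datetime(2026, 5, 1, 20, 0)
events = [posted + timedelta(minutes=m) for m in (5, 30, 90, 150)]

print(early_velocity(posted, events))  # 1.5
print(is_breakout(posted, events))     # False
```

In practice a studio would tune the threshold per vertical and per account size, since a velocity that counts as a breakout on a new account may be routine on an established one.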

Cost Breakdown In USD

Per-piece costs in this workflow are the part that genuinely shocks Western creators. The numbers below are roughly representative for mid-2026 rates and assume a studio doing real volume rather than a hobby account.

  • Doubao API for copy and titles: roughly $0.05 to $0.15 per million input tokens depending on model tier. A full note of titles, body, and hashtags consumes a few thousand tokens, so the LLM cost per note is a fraction of a cent.
  • Jimeng AI image generation: roughly $15 to $30 per month for a creator subscription that effectively unlocks unlimited standard generations. At ten images per day, that works out to roughly $0.05 to $0.10 per image, and less at higher volume.
  • Kling video: pay-as-you-go runs roughly $0.20 to $1.00 per five-to-ten-second clip depending on resolution and motion settings. Subscription tiers range from $10 to $40 per month.
  • Jianying Pro: roughly $10 to $15 per month per editor seat.
  • Qiangua or Huitun analytics: roughly $200 to $400 per year for a single-vertical seat, more for multi-vertical agency tiers.
  • Account matrix management software: roughly $30 to $100 per month depending on how many accounts and devices are tracked.
  • Real-device posting setup: a one-time hardware cost. A rack of fifteen secondhand Android phones plus SIM cards and a router runs roughly $800 to $1,500. Amortized over a year, this is negligible per note.
  • Labor: a junior operator in a second-tier city earns roughly $700 to $1,200 per month all-in. At thirty notes a day, that is under $2 of labor per note.

The all-in marginal cost per finished image-led note for a working studio lands somewhere around $0.30 to $1.50. For a video-led note with one or two AI clips, the figure is roughly $1.50 to $4.00. The fixed costs (subscriptions, analytics, devices, salary) are absorbed across thousands of notes per month.
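The per-unit figures in the list above follow from simple division, and it can help to see the arithmetic spelled out. The sketch below makes several assumptions that are not in the source: "a few thousand tokens" is taken as 5,000, every token is priced at the quoted input rate (output pricing is not given), and a month is treated as 30 posting days unless stated otherwise.

```python
def llm_cost_per_note(tokens_per_note, usd_per_million_tokens):
    """LLM spend for one note, pricing every token at the quoted input rate."""
    return tokens_per_note / 1_000_000 * usd_per_million_tokens


def cost_per_unit(usd_per_month, units_per_day, days_per_month=30):
    """Spread a monthly cost (subscription or salary) across daily output."""
    return usd_per_month / (units_per_day * days_per_month)


# "A few thousand tokens" taken as 5,000 (assumption), at the top Doubao rate.
llm_high = llm_cost_per_note(5_000, 0.15)

# Jimeng subscription at ten images per day, low and high ends of the range.
image_low = cost_per_unit(15, 10)
image_high = cost_per_unit(30, 10)

# Junior operator salary at thirty notes per day.
labor_low = cost_per_unit(700, 30)
labor_high = cost_per_unit(1200, 30)
labor_high_22 = cost_per_unit(1200, 30, days_per_month=22)  # weekdays only (assumption)

print(f"LLM per note:   ${llm_high:.5f}")
print(f"Image, each:    ${image_low:.2f} to ${image_high:.2f}")
print(f"Labor per note: ${labor_low:.2f} to ${labor_high:.2f}")
print(f"Labor, 22 days: ${labor_high_22:.2f} at the top of the salary range")
```

The labor line dominates everything else by one to three orders of magnitude, which is why the per-note total is so sensitive to how many notes an operator ships per day.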

For comparison, a Western creator producing a single Instagram Reel using Midjourney plus Runway plus ElevenLabs plus a paid scheduling tool will often spend more on tooling per single piece than a Chinese studio spends per ten pieces. The arbitrage is not just labor. It is API pricing, which on the Chinese side is currently subsidized by the platform-provider price war.

What Western Creators Can Copy

Several pieces of this workflow translate cleanly.

The matrix mindset translates. There is nothing China-specific about the idea that your unit of work is a pipeline of accounts and tests rather than a single channel of personal brand. Western creators who treat content as a portfolio of small bets, with templated production and quick deletion of misses, can pull off similar throughput on Instagram, TikTok, or Pinterest.

Trend-first production translates. Building the day's content from a ranked list of rising topics rather than from a personal whim is a habit any creator can adopt. Tools like TrendTok, Glimpse, or even raw TikTok Creative Center fill the role Qiangua plays domestically.

Cheap LLM workflows translate. The Doubao-tier price point is closer than people think to what Anthropic's Claude Haiku, Google's Gemini Flash, and DeepSeek's international API now offer. Building a templated prompt library that produces twenty title variants, three hooks, and a body in one call is well within reach.

CapCut translates directly. As the international version of Jianying, it covers maybe seventy percent of the Chinese version's AI features and is improving fast. A creator who learns the auto-caption, auto-beat, and template features deeply will outpace anyone editing in Premiere on raw skill alone for short-form output.

What does not translate as cleanly is the cost stack. US image and video generation tools are priced for prosumer subscribers, not for studios pushing thousands of generations a day. A Western team trying to mirror Chinese throughput will hit subscription rate limits and per-generation pricing that Chinese tools simply do not impose. The phone farm setup is also harder to replicate. US carrier SIM provisioning, account verification, and platform anti-fraud are all stricter, and the labor cost of manually posting from real devices is much higher.

The cultural translation is the most important caveat. Xiaohongshu rewards a specific tone: knowledgeable, peer-to-peer, lightly self-deprecating, with a clear "I tried this, here is what I learned" framing. Western platforms reward different tones in different verticals. Copying the workflow without adapting the voice produces content that feels mechanical on either side.

Cultural And Regulatory Caveats

Two regulatory shifts matter for anyone studying this stack.

First, since 2025 China has required AI-generated content to be labeled, both visibly to users and via embedded metadata. Xiaohongshu enforces this unevenly, but the platform has been steadily training its own classifier to detect undisclosed AI imagery and downrank or remove it. Studios respond by mixing AI generations with hand-shot footage, by post-processing AI images to break detector heuristics, and by leaning on virtual KOL personas where AI labeling is expected and accepted. Western platforms are moving in the same direction but are roughly a year behind on enforcement.

Second, commercial content on Xiaohongshu sits inside a structure called 种草 (zhongcao), literally "planting grass," where soft seeded recommendations precede the harder sales surface. Brand deals run through Xiaohongshu's official 蒲公英 (Pugongying) platform, and undeclared brand content is increasingly punished. The studio workflow above is built around producing organic-looking notes that can be flipped into declared brand content when a deal lands. Western creator economies have similar disclosure rules but a much looser cultural expectation around the seeding step.

Beyond regulation, the cultural texture of Xiaohongshu is not something AI alone can manufacture. The platform's user base treats notes as semi-private peer recommendations, and the algorithm visibly punishes content that reads as broadcast advertising. A Chinese studio operator we exchanged messages with summarized it this way: "AI does ninety percent of the work. The last ten percent, the part where a note feels like a real person talking to a friend, that is still the operator. If you skip that part the account dies in a week." For Western creators, that may be the most useful single observation in the whole stack. The tools are leverage. The voice is still the job.