Inside China: How E-commerce Sellers Generate 50 Product Shots a Day for $1

How Chinese e-commerce operators ship 50 AI product shots a day using Seedream, Doubao, Kling, and Jianying.

china-ai

ecommerce

product-photography

ai-workflow

seedream

kling

The Operators You Have Never Heard Of

Walk into any co-working space in Hangzhou's Binjiang district at 9 a.m. and you will find a category of worker that barely exists in the West: the one-person product photography studio that ships fifty hero images a day without picking up a camera. They call themselves 电商美工 (e-commerce designers) or 主图设计 (main-image designers), and their job description has quietly mutated. Two years ago they were Photoshop specialists retouching shots taken in a rented studio. Today most of them have not touched a camera since 2024.

The scale is the first thing Western creators tend to disbelieve. A typical operator running a Taobao or Pinduoduo storefront refreshes product imagery every seven to fourteen days because the platforms reward listings whose main image (主图) has been recently updated. A mid-sized seller with 80 SKUs needs roughly 400 to 600 new images per month: hero shots, lifestyle scenes, infographic-style 详情页 (detail pages), and short-video thumbnails. Doing this with a physical studio at Western rates would cost around fifteen to twenty thousand dollars a month. Doing it with the workflow described below costs the operators we have studied somewhere between thirty and ninety dollars a month in API and subscription fees, plus their own labor.

The reason this looks unusual to Westerners is partly tooling and partly culture. The tools the Chinese operators use most heavily are not available, or not promoted, in English. The cultural piece is that platform-driven A/B testing of imagery is treated as a baseline operational discipline rather than a growth hack, so the volume of images required is structurally higher than what a Shopify or Etsy seller would produce. The workflow is built for that volume.

The Actual Workflow, Step by Step

The pipeline most operators converge on has four stages: source, generate, polish, and deploy. Almost nobody runs all of these inside a single tool. The stack is glued together by hand and by WeChat groups where prompts and presets are traded daily.

Stage 1: Source the Reference

The starting point is rarely a creative brief. It is a SKU photo from the supplier, often a flat, badly lit pack-shot taken on a folding table. The operator's job is to turn that into a usable hero image. Most teams begin in 即梦 (Jimeng), ByteDance's consumer-facing image and video studio, which exposes the Seedream family of image models. Seedream 3.0 and the newer 4.0 have become the default for product imagery in China for one specific reason: their handling of Chinese-language text rendered inside an image is markedly better than that of Midjourney or DALL-E. That matters because Chinese e-commerce hero images almost always carry a price tag, a slogan, or a discount sticker rendered into the artwork.

For lifestyle scenes that need a human model holding the product, operators switch to 豆包 (Doubao), also ByteDance, which lets you feed a reference photo of the product plus a prompt describing the model and setting. Doubao's image endpoint does a competent job of preserving the product silhouette while changing the surroundings, which is the single most useful operation in the whole pipeline. Westerners reaching for the equivalent tend to use Flux Kontext or a custom ComfyUI graph; the Chinese operators just hit the API.

A second school uses 可灵 (Kling) from Kuaishou, particularly when the deliverable is a short video rather than a still. Kling 1.6 and 2.0 are now strong enough on physical motion (cloth falling, liquid pouring, a hand picking up the product) that operators routinely generate a five-second clip and pull a still frame out of it for the main image. This is the inverse of how most Western creators approach the problem: they typically generate a still first and animate it afterwards.
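Pulling a still out of a generated clip can be sketched with ffmpeg, assuming it is installed locally. The function names, file names, and the 2.5-second timestamp below are illustrative; the source does not say which tool the operators use for this step.

```python
import shutil
import subprocess


def frame_grab_command(clip_path, at_seconds, still_path):
    """Build an ffmpeg command that seeks to `at_seconds` and writes one frame."""
    return [
        "ffmpeg",
        "-y",                         # overwrite the output file if it exists
        "-ss", f"{at_seconds:.2f}",   # seek position in seconds
        "-i", str(clip_path),         # input clip
        "-frames:v", "1",             # stop after one video frame
        str(still_path),
    ]


def grab_frame(clip_path, at_seconds, still_path):
    """Run the command above; raises if ffmpeg is missing or exits non-zero."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(frame_grab_command(clip_path, at_seconds, still_path), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(frame_grab_command("clip.mp4", 2.5, "hero.png")))
```

In practice an operator would probably grab several timestamps per clip and triage the frames visually, in the same way as the still batches.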

Stage 2: Generate at Volume

Volume is where the Chinese workflow really diverges. An operator we observed in a Hangzhou WeChat group ran what she called a 批量出图 (batch-generation) loop: a Google Sheet with one row per SKU, columns for the reference image URL, the scene prompt, the desired aspect ratio, and the price-tag text. A small Python script (passed around in the group as a 50-line gist) walked the sheet, called the Jimeng API for each row, and dropped the outputs into a dated folder on her desktop. She generated 240 candidate images in about forty minutes and kept roughly one in four.
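A minimal sketch of that kind of loop, under stated assumptions: the sheet is exported as CSV, the column names are invented for illustration, and the image API call is replaced by a stand-in function, because the actual gist and the Jimeng request format are not reproduced here.

```python
import csv
import io
import tempfile
from datetime import date
from pathlib import Path

# Hypothetical column names; the source only describes the sheet loosely.
COLUMNS = ["sku", "reference_url", "scene_prompt", "aspect_ratio", "price_text"]


def run_batch(sheet_lines, generate, out_root, candidates_per_sku=4):
    """Walk the sheet row by row and save every candidate into a dated folder.

    `generate` is any callable that takes one row (a dict) and returns image
    bytes. A real version would wrap the provider's image API here.
    """
    out_dir = Path(out_root) / date.today().isoformat()
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for row in csv.DictReader(sheet_lines):
        for i in range(candidates_per_sku):
            path = out_dir / f"{row['sku']}_{i:02d}.png"
            path.write_bytes(generate(row))
            saved.append(path)
    return saved


def fake_generate(row):
    """Stand-in for the real API call: returns placeholder bytes, not an image."""
    return f"{row['reference_url']}|{row['scene_prompt']}|{row['aspect_ratio']}".encode("utf-8")


if __name__ == "__main__":
    sheet = io.StringIO(
        ",".join(COLUMNS) + "\n"
        "A1,https://example.com/a1.jpg,on a wooden desk,1:1,39\n"
        "B2,https://example.com/b2.jpg,in a bright kitchen,3:4,59\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        print(len(run_batch(sheet, fake_generate, tmp)), "candidates written")
```

Keeping the generator as a plain callable means the same loop can be pointed at a different provider without touching the sheet-walking code.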

The discipline here is worth naming. Western AI image workflows tend to optimize for the single best output per prompt; the Chinese operators optimize for hit rate across a batch. They do not iterate prompts in a chat window. They write the prompt once, run it across every SKU, and triage the results visually. This is closer to how a print shop thinks than how a designer thinks.

Stage 3: Polish

Almost no generated image goes live without a polish pass. The dominant tools here are 美图设计室 (Meitu Design Studio) and 稿定设计 (Gaoding), both of which are best understood as Canva-equivalents that have been retrofitted with strong AI features specifically tuned for e-commerce: one-click background swap, AI-driven 抠图 (cutout/matting) that handles fur and transparent objects better than Photoshop's Select Subject in our testing, and template libraries pre-built for the exact pixel dimensions Taobao, Tmall, JD, and Pinduoduo demand.

The polish pass is where text gets layered on. Even though Seedream can render Chinese characters, operators do not trust it for the price (the number that drives clicks) and overlay that in Meitu or Gaoding instead. A typical polish takes ninety seconds per image once the operator has built up muscle memory.

For video deliverables, the polish tool is universally 剪映 (Jianying), the domestic version of CapCut. Jianying's pro tier (剪映专业版) includes AI-driven auto-captioning in Mandarin, auto-cut-to-music, and a digital-human (数字人) feature that lets an operator paste a script and get a presenter video back in under five minutes. The presenter is not photoreal under scrutiny but is more than acceptable for a six-second product clip on Douyin.

Stage 4: Deploy and Test

The deployment surface is not just the storefront. In priority order, it is Douyin (the domestic TikTok, where short product videos drive most impulse-purchase traffic); 小红书 (Xiaohongshu), a lifestyle-led social commerce platform where the same product needs a softer, more editorial image; the storefront itself on Taobao or Pinduoduo; and finally WeChat broadcast channels and group chats. Each surface wants slightly different crops and tones, which is why the polish stage is template-driven rather than freehand.
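The per-surface cropping can be sketched as a small lookup plus a centered-crop calculation. The aspect ratios below are placeholder assumptions, not the platforms' published specifications, and the surface names are invented for illustration.

```python
from fractions import Fraction

# Assumed width:height ratios, for illustration only; check each platform's
# current image specifications before relying on them.
SURFACE_RATIOS = {
    "storefront_main": Fraction(1, 1),
    "douyin_cover": Fraction(9, 16),
    "xiaohongshu_note": Fraction(3, 4),
}


def center_crop_box(width, height, ratio):
    """Return (left, top, right, bottom) of the largest centered box whose
    width/height matches `ratio` (rounded down to whole pixels)."""
    if Fraction(width, height) > ratio:
        # Source is too wide: keep the full height and trim the sides.
        new_w = int(height * ratio)
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    # Source is too tall (or already matches): keep the full width.
    new_h = int(width / ratio)
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)


if __name__ == "__main__":
    for surface, ratio in SURFACE_RATIOS.items():
        print(surface, center_crop_box(2000, 2000, ratio))
```

The returned tuple uses the same (left, upper, right, lower) order that Pillow's `Image.crop` expects, so it could be passed straight to an image library in a fuller version.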

The A/B testing is platform-native. Taobao's 万相台 (Wanxiangtai) and JD's 京准通 (Jingzhuntong) ad systems will rotate up to four candidate main images automatically and report click-through rate within hours. Operators treat the AI generator as the top of a funnel that ends in a CTR number on a dashboard, not as a creative endpoint.
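Reading a rotation report comes down to comparing click-through rates. The sketch below ranks candidates by CTR and applies a two-proportion z-test between the top two; the test is a swapped-in standard technique, not something the platforms are documented to use, and the click and impression counts are made-up illustration data.

```python
import math


def ctr(clicks, impressions):
    """Click-through rate as a fraction; 0.0 when there are no impressions."""
    return clicks / impressions if impressions else 0.0


def two_proportion_z(c1, n1, c2, n2):
    """z statistic for the difference between two click-through rates."""
    pooled = (c1 + c2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (c1 / n1 - c2 / n2) / se if se else 0.0


def pick_winner(results, z_threshold=1.96):
    """results maps image name -> (clicks, impressions).

    Returns (best_name, z_vs_runner_up, is_clear_winner). The 1.96 threshold
    is the usual 95% two-sided cutoff and ignores multiple comparisons.
    """
    ranked = sorted(results.items(), key=lambda kv: ctr(*kv[1]), reverse=True)
    (best, (bc, bn)), (_, (rc, rn)) = ranked[0], ranked[1]
    z = two_proportion_z(bc, bn, rc, rn)
    return best, z, z >= z_threshold


if __name__ == "__main__":
    # Invented numbers for four rotated main images.
    report = {"A": (120, 4000), "B": (90, 4000), "C": (60, 4000), "D": (75, 4000)}
    print(pick_winner(report))
```

With only a few hours of traffic the difference between the top two images will often fall below the threshold, in which case letting the rotation run longer is the safer reading.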

Cost Breakdown in USD

The numbers below are based on observed pricing from the Chinese AI platforms and operator self-reports gathered in late 2025 and early 2026. Treat them as roughly accurate rather than precise to the cent.

即梦 (Jimeng / Seedream): roughly 0.02 to 0.04 USD per generated still at standard resolution; subscription packages bring this down to about 0.01 USD per image at volume.
豆包 (Doubao) image API: roughly 0.03 USD per image when called directly; bundled inside ByteDance's Volcengine enterprise plan it can drop below 0.02.
可灵 (Kling) video: roughly 0.30 to 0.80 USD per five-second clip depending on model tier, with the standard tier sitting near 0.40.
美图设计室 (Meitu) pro subscription: about 5 to 8 USD per month for a single seat with unlimited cutouts and template access.
剪映 (Jianying) pro tier: about 7 USD per month per seat.

Putting this together for an operator producing fifty stills a day plus a handful of short videos:

Stills: 50 per day x 30 days = 1500 stills. At 0.02 USD per generated candidate and a 4:1 keep ratio (so 4 candidates per kept image), that is 6000 generations, or 120 USD per month. Operators who buy subscription bundles trim this to roughly 60 to 80 USD.
Videos: 30 short clips per month at 0.40 USD = 12 USD.
Polish tools: 5 to 8 USD plus 7 USD = roughly 13 USD.
Total: roughly 85 to 145 USD per month (about 85 with bundled stills at the low end, about 145 at list price), for an output that would cost a Western creator using Midjourney plus Adobe Creative Cloud roughly 80 USD in subscriptions but far more in time.

Per image, the marginal cost of one finished hero shot lands somewhere between 0.05 and 0.10 USD in tooling fees. The headline "fifty shots a day for one dollar" tracks if you only count the API calls for the kept images (50 images at 0.02 USD each is 1 USD a day); it does not count discarded candidates, subscriptions, or labor, but it is not a fabrication.
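The arithmetic in this section can be reproduced with a few lines of Python. The defaults are the figures quoted above; the function and parameter names are illustrative, and the polish range of 12 to 15 USD is simply the Meitu range (5 to 8) plus the Jianying seat (7).

```python
import math


def monthly_stack_cost(
    kept_per_day=50,
    days=30,
    candidates_per_kept=4,
    usd_per_generation=0.02,
    clips_per_month=30,
    usd_per_clip=0.40,
    polish_usd=(12.0, 15.0),
    bundled_stills_usd=None,
):
    """Sum the line items; pass bundled_stills_usd=(low, high) to model bundles."""
    kept = kept_per_day * days
    generations = kept * candidates_per_kept
    list_price = generations * usd_per_generation
    stills = (list_price, list_price) if bundled_stills_usd is None else bundled_stills_usd
    videos = clips_per_month * usd_per_clip
    low = stills[0] + videos + polish_usd[0]
    high = stills[1] + videos + polish_usd[1]
    return {
        "kept": kept,
        "generations": generations,
        "videos_usd": videos,
        "total_usd": (low, high),
        "per_kept_usd": (low / kept, high / kept),
    }


def kept_only_api_usd_per_day(kept_per_day=50, usd_per_generation=0.02):
    """The headline figure: API cost of the kept images alone, per day."""
    return kept_per_day * usd_per_generation


if __name__ == "__main__":
    print(monthly_stack_cost())
    print(monthly_stack_cost(bundled_stills_usd=(60.0, 80.0)))
    print(kept_only_api_usd_per_day())
```

With the figures as listed, the list-price case sums to roughly 144 to 147 USD a month and the bundled case to roughly 84 to 107 USD, which works out to between about 0.06 and 0.10 USD per kept image.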

What Western Creators Can Copy

Several pieces of this workflow port over cleanly. The batch mindset is the most valuable import. Building a Google Sheet of SKUs with one row per prompt and running it through the Replicate, Fal, or Together API against Flux or Stable Diffusion 3.5 is structurally identical to the Jimeng loop, and Fal's pricing on Flux schnell or dev is competitive with Seedream once you account for the keep ratio. A weekend of Python plumbing replaces what Chinese operators get from copy-pasted WeChat scripts.

The triage discipline is also portable. Generating 200 candidates and keeping 50 produces better results than iterating one prompt 50 times, and most Western creators we have watched do the opposite. Adopting a hard rule of "write the prompt once, run it across the batch, decide visually" is a free upgrade.

The polish stack has direct analogues. Canva, Photoroom, and Recraft cover most of what Meitu and Gaoding do, with Photoroom's batch background generator coming closest to the Chinese template-driven approach. CapCut is the same product as Jianying; the digital-human feature is gated by region but HeyGen and Synthesia fill that gap at a higher price point.

The single hardest piece to replicate is the platform-native A/B testing loop. Shopify, Etsy, and Amazon do not expose anything as direct as the image-rotation reporting in Taobao's 万相台 (Wanxiangtai). Western sellers who want this discipline have to build it themselves with a third-party tool such as Intelligems or by manually rotating images on a schedule and reading conversion data out of analytics. It is doable but it is not the default.

What Western Creators Cannot Easily Copy

A few pieces simply do not transfer.

The Chinese-language text rendering advantage in Seedream is irrelevant to a Western seller. More importantly, the assumption that a hero image carries a hard-coded price tag and slogan is a Chinese platform convention. Western marketplaces penalize that style; Amazon explicitly forbids price text on the main image. The visual grammar is different and copying it wholesale produces images that look spammy to a Western audience.

The cost structure is also propped up by domestic Chinese pricing on the foundation models. ByteDance subsidizes Doubao and Jimeng aggressively as part of its broader AI strategy, and Kuaishou does the same with Kling. International access to these models, where it exists at all, is priced two to five times higher and routed through Volcengine or Singapore endpoints with latency penalties. The 0.02 USD per image figure is a domestic-China figure.

There is a regulatory layer Western creators rarely think about. Generated imagery on Chinese platforms is subject to the 生成式人工智能服务管理暂行办法 (Interim Measures for the Management of Generative AI Services), which require the Chinese AI providers to watermark and log AI-generated content. Operators using domestic tools are inside that compliance bubble by default. Anyone trying to feed outputs from Midjourney or Flux into Chinese storefronts is technically operating outside it, which has not been heavily enforced against small sellers but is a real risk vector. The flip side, for Western creators, is that EU AI Act provenance requirements are starting to push in the same direction; the Chinese watermarking infrastructure may end up looking like an early preview rather than an exotic case.

Finally, the labor model is hard to replicate. The operators in question typically earn 8000 to 15000 RMB a month (roughly 1100 to 2100 USD) and treat the AI tooling as a productivity multiplier on top of a wage that is already cost-effective. A Western creator charging 60 USD an hour faces a different math problem; the AI tools save time but not in the same dramatic ratio.

What an Operator Actually Said

An operator we spoke with through a Hangzhou e-commerce WeChat group, who runs imagery for three home-goods storefronts on Taobao and Pinduoduo, described her shift this way: she used to spend her Mondays at a rented studio shooting twenty SKUs and her Tuesdays through Fridays in Photoshop. Now she spends Mondays writing prompts in a spreadsheet, Tuesdays running batches and triaging, Wednesdays in Meitu polishing, and Thursdays and Fridays editing short videos in Jianying. Her output went from roughly 80 finished images a week to roughly 350. Her income, she said, did not triple; what changed is that she now manages three storefronts instead of one, and the marginal storefront takes about a day a week.

That last point is the one Western creators tend to miss. The Chinese AI imagery stack is not primarily a cost-cutter. It is a leverage tool that lets one operator hold the workload that used to require a small team. The dollar-a-day headline is real but it is downstream of the leverage, not the point of it.