After publishing my previous AI image generation experiment, I thought: why not benchmark the text-to-video models available through Vercel AI Gateway as well?
This time I added some action to the puppy prompt:
A golden retriever puppy chasing a butterfly through a sunflower field at golden hour, cinematic shallow depth of field, soft warm light, slow camera dolly forward.
I tried to get every model to produce a comparable clip: 1280x720 (720p), 16:9, 4-5s long, with audio where possible.
Jump straight to the winners if you don't have much time, or check the data below.
Full Results
| Model | Cost | Generation time | Date | Video |
|---|---|---|---|---|
| alibaba/wan-v2.5-t2v-preview | $0.5 | 115.3s | 2026-05-18 | video |
| alibaba/wan-v2.6-t2v | $0.5 | 53.4s | 2026-05-17 | video |
| bytedance/seedance-v1.0-pro | $0.2575 | 53.0s | 2026-05-18 | video |
| bytedance/seedance-v1.0-pro-fast | $0.103 | 44.3s | 2026-05-18 | video |
| bytedance/seedance-v1.5-pro | $0.1295 | 73.9s | 2026-05-18 | video |
| bytedance/seedance-2.0 | $0.7623 | 170.4s | 2026-05-17 | video |
| bytedance/seedance-2.0-fast | $0.60984 | 117.9s | 2026-05-18 | video |
| google/veo-3.0-generate-001 | $1.6 | 85.1s | 2026-05-18 | video |
| google/veo-3.0-fast-generate-001 | $0.6 | 64.1s | 2026-05-18 | video |
| google/veo-3.1-generate-001 | $1.6 | 54.6s | 2026-05-17 | video |
| google/veo-3.1-fast-generate-001 | $0.6 | 65.6s | 2026-05-18 | video |
| klingai/kling-v2.5-turbo-t2v | $0.211722 | 47.8s | 2026-05-18 | video |
| klingai/kling-v2.5-turbo-t2v-pro | $0.35287 | 81.5s | 2026-05-18 | video |
| klingai/kling-v2.6-t2v | $0.211722 | 57.0s | 2026-05-18 | video |
| klingai/kling-v2.6-t2v-pro | $0.70574 | 90.9s | 2026-05-18 | video |
| klingai/kling-v3.0-t2v | $1.270332 | 41.6s | 2026-05-17 | video |
| klingai/kling-v3.0-t2v-pro | $1.693776 | 83.0s | 2026-05-18 | video |
| xai/grok-imagine-video | $0.35 | 49.6s | 2026-05-17 | video |

Total spent: $12.058302
- Generation time is wall time per video, measured by the benchmark script.
- Cost is returned by the gateway, so it should be accurate.
- Date is when each row's video was produced - rows may be from different runs (cache hits aren't re-generated).
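
To make the three notes above concrete, here is a minimal sketch of what such a loop could look like. It is not the actual benchmark script: `run_benchmark`, `generate`, and the cache layout are hypothetical names I made up for illustration, and `generate` stands in for whatever call returns the gateway-reported cost.

```python
import time
from decimal import Decimal
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: takes a model id, blocks until the video is ready,
# and returns the cost reported for that call.
GenerateFn = Callable[[str], Decimal]
Row = Tuple[Decimal, float]  # (cost in USD, wall time in seconds)


def run_benchmark(
    models: List[str], generate: GenerateFn, cache: Dict[str, Row]
) -> Dict[str, Row]:
    """Return {model: (cost, wall_seconds)}, reusing rows already in the cache."""
    for model in models:
        if model in cache:
            # Cache hit: keep the earlier row instead of paying for a new video.
            continue
        start = time.perf_counter()
        cost = generate(model)
        elapsed = time.perf_counter() - start  # wall time for this one video
        cache[model] = (cost, elapsed)
    return {model: cache[model] for model in models}


def total_cost(rows: Dict[str, Row]) -> Decimal:
    """Sum the reported costs exactly (Decimal avoids float rounding noise)."""
    return sum((cost for cost, _ in rows.values()), Decimal("0"))
```

Using `Decimal` for the cost keeps sums like the six-digit total above exact, which plain floats would not guarantee.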
Cost
Generation Time
Model Quirks
A few observations from running this benchmark:
- The `-fast` variants from ByteDance and Google trade a bit of quality for significantly lower cost - Veo 3.x Fast is ~62% cheaper than the standard Veo 3.x at $0.6 vs $1.6.
- Kling Pro variants are consistently slower and pricier than their base counterparts without an obvious quality jump in this scene.
- Some models don't support audio generation.
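
The savings figure for the fast variants is simple arithmetic on the table. A small sketch, with the costs copied from the results above (the helper name `percent_cheaper` is my own):

```python
def percent_cheaper(base: float, variant: float) -> float:
    """How much cheaper `variant` is than `base`, as a percentage of `base`."""
    return (base - variant) / base * 100


# (standard cost, fast cost) in USD per clip, copied from the results table.
pairs = {
    "Veo 3.x Fast vs Veo 3.x": (1.6, 0.6),
    "seedance-v1.0-pro-fast vs seedance-v1.0-pro": (0.2575, 0.103),
    "seedance-2.0-fast vs seedance-2.0": (0.7623, 0.60984),
}

for label, (base, variant) in pairs.items():
    print(f"{label}: {percent_cheaper(base, variant):.1f}% cheaper")
```

For these rows it works out to 62.5%, 60.0%, and 20.0% respectively, so the size of the discount varies quite a bit between model families.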
Winners
If you only optimize for cost, the cheapest models in this benchmark are:
- `bytedance/seedance-v1.0-pro-fast` at $0.103
- `bytedance/seedance-v1.5-pro` at $0.1295
- `klingai/kling-v2.5-turbo-t2v` and `klingai/kling-v2.6-t2v` tied at $0.211722
If you optimize for latency, the fastest models are:
- `klingai/kling-v3.0-t2v` at 41.6s
- `klingai/kling-v2.5-turbo-t2v` at 47.8s, just behind `bytedance/seedance-v1.0-pro-fast` at 44.3s
But to be honest, when we're talking about video, the scene, prompt-following precision, and quality matter more than cost and latency, so I'll let you choose the winner yourself.
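
If you want to re-rank the models by your own criteria, the cost and latency lists above can be reproduced from the results table with a few lines. This is just a sketch over hand-copied data, not part of the benchmark script; `RESULTS` and `top_n` are names I picked for the example.

```python
# (model, cost in USD, generation time in seconds), copied from the results table.
RESULTS = [
    ("alibaba/wan-v2.5-t2v-preview", 0.5, 115.3),
    ("alibaba/wan-v2.6-t2v", 0.5, 53.4),
    ("bytedance/seedance-v1.0-pro", 0.2575, 53.0),
    ("bytedance/seedance-v1.0-pro-fast", 0.103, 44.3),
    ("bytedance/seedance-v1.5-pro", 0.1295, 73.9),
    ("bytedance/seedance-2.0", 0.7623, 170.4),
    ("bytedance/seedance-2.0-fast", 0.60984, 117.9),
    ("google/veo-3.0-generate-001", 1.6, 85.1),
    ("google/veo-3.0-fast-generate-001", 0.6, 64.1),
    ("google/veo-3.1-generate-001", 1.6, 54.6),
    ("google/veo-3.1-fast-generate-001", 0.6, 65.6),
    ("klingai/kling-v2.5-turbo-t2v", 0.211722, 47.8),
    ("klingai/kling-v2.5-turbo-t2v-pro", 0.35287, 81.5),
    ("klingai/kling-v2.6-t2v", 0.211722, 57.0),
    ("klingai/kling-v2.6-t2v-pro", 0.70574, 90.9),
    ("klingai/kling-v3.0-t2v", 1.270332, 41.6),
    ("klingai/kling-v3.0-t2v-pro", 1.693776, 83.0),
    ("xai/grok-imagine-video", 0.35, 49.6),
]


def top_n(rows, key_index, n=3):
    """Return the n rows with the smallest value in column `key_index`."""
    # sorted() is stable, so tied rows keep their table order.
    return sorted(rows, key=lambda row: row[key_index])[:n]


cheapest = top_n(RESULTS, key_index=1, n=4)  # n=4 to include both tied Kling models
fastest = top_n(RESULTS, key_index=2)

for model, cost, seconds in cheapest:
    print(f"{model}: ${cost}")
for model, cost, seconds in fastest:
    print(f"{model}: {seconds}s")
```

Swapping the sort key for something like cost multiplied by time is an easy way to explore a combined score, though as noted above, neither number says anything about how the clip actually looks.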
DIY
If you want to run the benchmark yourself, add more models, or adjust the prompt - the script is easy to configure and run: kometolabs/ai-video-generation-cost-analysis
Support My Work
Preparing such research sometimes takes 2-5 full benchmark runs, which adds up. If it saved you time, consider sponsoring me on GitHub or Buy Me a Coffee.

