Qwen3-Omni-30B-A3B-Captioner
Fine-grained audio analysis model for generating accurate descriptions of complex audio scenes.
Model details
Model code: qwen3-omni-30b-a3b-captioner
Category: Multimodal
Family: Qwen3 Omni
Capability: Audio captioning
Modality: Audio -> Text
Release / status: 2025-09-19
Snapshot: Current model code
Source region: Console
Official detail price
Input (audio): $3.81 / 1M tokens
Output (text, when input contains images/audio/video): $3.06 / 1M tokens
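Because both rates are quoted per million tokens, a rough list-price estimate is input audio tokens × $3.81 / 1M plus output text tokens × $3.06 / 1M. The Python sketch below assumes flat per-token billing with no tiers, free quota, promotions, or taxes; the function name `estimate_cost_usd` and the token counts are hypothetical illustrations, not part of any Model Studio API or billing tool.

```python
from decimal import Decimal

# Rates copied from the price summary (International region, checked 2026-06-08).
# Assumption: flat per-token billing; no tiers, promotions, free quota, or taxes.
INPUT_AUDIO_USD_PER_M = Decimal("3.81")
OUTPUT_TEXT_USD_PER_M = Decimal("3.06")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_audio_tokens: int, output_text_tokens: int) -> Decimal:
    """Hypothetical helper: estimate list-price cost of captioning usage in USD."""
    if input_audio_tokens < 0 or output_text_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Each side is billed separately at its own per-million-token rate.
    input_cost = Decimal(input_audio_tokens) * INPUT_AUDIO_USD_PER_M / TOKENS_PER_M
    output_cost = Decimal(output_text_tokens) * OUTPUT_TEXT_USD_PER_M / TOKENS_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 2M audio tokens in, 500k text tokens out.
    cost = estimate_cost_usd(2_000_000, 500_000)
    print(f"Estimated cost: ${cost:.2f}")  # prints "Estimated cost: $9.15"
```

Using `Decimal` instead of `float` keeps the cents exact. For the hypothetical workload of 2M input audio tokens and 500k output text tokens, the estimate is $7.62 + $1.53 = $9.15; the official console remains the authority for the actual charge.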
Source region: International. These prices are copied from the official Model Studio detail page as checked on June 8, 2026. Final quotes still require confirmation in the official console for region, account route, quota, promotions, taxes, and current availability.
Source note
Catalog taxonomy and model detail price summaries were checked against the Alibaba Cloud Model Studio console on 2026-06-08. Availability, region, account route, quota, taxes, promotions, and official terms must be confirmed before purchase.