Model catalog
Current Model Studio lanes, organized for buying decisions
Start with the primary models customers ask about first, then review the full official console taxonomy: Flagship, Cost-optimized, Visual, Wan, Audio, Multimodal, Embeddings, Third-party, and Older.
This page mirrors the visible Model Studio console categories for buyer review. It is not an Alibaba Cloud official quote, and final availability or price must be confirmed in the official console before purchase.
Primary models
Six main model lanes to review first
These are the main current options for the buyer conversation. They stay above the catalog so older Qwen3/Qwen3.5 entries do not accidentally look like the recommended path.
Qwen3.7
Qwen3.7-Max
The largest Qwen3.7 model for the highest-value text, coding, office, and long-running agent tasks. Use it when answer quality matters more than cost.
Model code
qwen3.7-max
Capability
Flagship text reasoning
Qwen3.6
Qwen3.6-Plus
A practical production lane for vision-language work, agentic coding, front-end programming, OCR-like extraction, and object localization.
Model code
qwen3.6-plus
Capability
Balanced multimodal production
DeepSeek
DeepSeek V4 Pro
A high-end MoE reasoning lane for technical research, sophisticated office workflows, long-form analysis, code, and math-heavy tasks.
Model code
deepseek-v4-pro
Capability
Reasoning, code, math
Kimi
Kimi K2.6
A long-context model for coding, instruction following, visual input, reasoning and non-reasoning dialogue, and agent-based tasks.
Model code
kimi-k2.6
Capability
Long context and visual agent tasks
GLM
GLM 5.1
A long-horizon model for logical reasoning, long-text understanding, code generation, summaries, and developer assistance.
Model code
glm-5.1
Capability
Long-horizon logic
MiniMax
MiniMax M2.5
A MiniMax flagship lane for programming, tool invocation, search, office work, and productivity scenarios.
Model code
MiniMax-M2.5
Capability
Agent and tool-use productivity
Source checked from Alibaba Cloud Model Studio Console on 2026-06-08. Final quotes must confirm region, account route, deployment mode, official availability, quota, ModelSmarter service fee, taxes, and promotion status.Official source
Official taxonomy
The console currently separates the catalog into nine lanes
This map follows the visible Model Studio console tabs. Some official models appear in more than one lane; the site keeps those lane placements while routing each unique model code to one detail page.
Groups
9
Entries
96
Details
90
Jump by category
Official catalog map
Catalog category
Flagship
Versatile high-intelligence models for premium reasoning, coding, agent, and multilingual workloads.
Qwen3.7
Qwen3.7-Plus
Cost-effective Qwen3.7 model with upgraded vision-language ability, coding, tool use, GUI perception, and productivity workflow support.
Model code
qwen3.7-plus
Capability
Multimodal agent foundation
Qwen3.7
Qwen3.7-Max
The largest Qwen3.7 model for the highest-value text, coding, office, and long-running agent tasks. Use it when answer quality matters more than cost.
Model code
qwen3.7-max
Capability
Flagship text reasoning
Qwen3.6
Qwen3.6-Plus
A practical production lane for vision-language work, agentic coding, front-end programming, OCR-like extraction, and object localization.
Model code
qwen3.6-plus
Capability
Balanced multimodal production
Qwen3.6
Qwen3.6-Max
Preview Max lane with stronger vibe coding, coding-agent execution, front-end development, and long-tail knowledge retention.
Model code
qwen3.6-max-preview
Capability
Preview flagship text
Qwen3.6
Qwen3.6-Open-Source
Dense 27B vision-language model with improved agentic coding, STEM reasoning, object localization, detection, and OCR-like work.
Model code
qwen3.6-27b
Capability
Open-source vision-language
Qwen3.5
Qwen3.5-Plus
Prior Qwen3.5 plus lane still listed in the official catalog, but not used as a current primary homepage recommendation.
Model code
qwen3.5-plus
Capability
Prior generation plus
Qwen3.5
Qwen3.5-Open-Source
Open-source Qwen3.5 native vision-language model with hybrid architecture and efficient inference.
Model code
qwen3.5-27b
Capability
Prior open-source lane
Qwen3
Qwen3-Max
Earlier Qwen3 Max catalog item. It remains in the official catalog map but should not replace Qwen3.7 in primary messaging.
Model code
qwen3-max
Capability
Earlier flagship
Qwen
Qwen-Plus
Enhanced general Qwen lane with thinking and non-thinking mode support in current snapshots.
Model code
qwen-plus
Capability
General plus
Qwen3
Qwen3-Coder-Plus
Code generation model with strong coding-agent, tool invocation, and environment interaction ability.
Model code
qwen3-coder-plus
Capability
Coding agent
Qwen
Qwen-Plus-Character
Role-playing model optimized for character instruction following, conversation progression, and empathy.
Model code
qwen-plus-character
Capability
Role-play
Qwen3
Qwen3-Open-Source
Open-source Qwen3 family covering hybrid, thinking, and non-thinking model styles.
Model code
qwen3-coder-next
Capability
Open-source family
Catalog category
Cost-optimized
Fast, cost-effective models for high-volume production, support, translation, and lighter agent work.
Qwen3.6
Qwen3.6-Flash
Cost-optimized Qwen3.6 lane with stronger coding, math, spatial intelligence, object localization, and detection than the prior flash generation.
Model code
qwen3.6-flash
Capability
Fast multimodal
Qwen3.6
Qwen3.6-Open-Source
Open-source Qwen3.6 dense model listed in the cost-optimized section for economical evaluation.
Model code
qwen3.6-27b
Capability
Open-source cost lane
Qwen3.5
Qwen3.5-Flash
Prior generation flash lane still listed by the official console, not a current homepage recommendation.
Model code
qwen3.5-flash
Capability
Prior flash lane
Qwen3.5
Qwen3.5-Open-Source
Open-source Qwen3.5 native vision-language model also visible in the cost-optimized section of the console.
Model code
qwen3.5-27b
Capability
Prior open-source cost lane
Qwen MT
Qwen-MT-Lite
Translation model for 32 languages with terminology, format preservation, and domain adaptation support.
Model code
qwen-mt-lite
Capability
Low-cost translation
Qwen3
Qwen-Flash
Fast general Qwen3-series lane with thinking/non-thinking switching and long context support.
Model code
qwen-flash
Capability
Fast general model
Qwen3
Qwen3-Coder-Flash
Cost-optimized coding model focused on repository-level understanding and stable tool calling.
Model code
qwen3-coder-flash
Capability
Fast coding agent
Qwen MT
Qwen-MT-Flash
Fast translation model for 92 languages with terminology and format preservation.
Model code
qwen-mt-flash
Capability
Fast translation
Qwen MT
Qwen-MT-Plus
Higher-quality translation lane for accurate and natural specialized-domain translation.
Model code
qwen-mt-plus
Capability
Flagship translation
Qwen
Qwen-Flash-Character
Fast character interaction model for multilingual role consistency and scenario-based dialogue.
Model code
qwen-flash-character
Capability
Fast role-play
Qwen3
Qwen3-Open-Source
Open-source Qwen3 family also visible in the cost-optimized console section.
Model code
qwen3-coder-next
Capability
Open-source family
Catalog category
Visual
Visual understanding, image generation, image editing, video generation, and visual-agent models.
HappyHorse
HappyHorse-I2V
Image-to-video generation with source-image consistency and fluid dynamic rendering.
Model code
happyhorse-1.0-i2v
Capability
Image to video
HappyHorse
HappyHorse-T2V
Text-to-video generation for cinematic creative output and realistic motion details.
Model code
happyhorse-1.0-t2v
Capability
Text to video
HappyHorse
HappyHorse-R2V
Reference-to-video generation that preserves subject and scene references across motion.
Model code
happyhorse-1.0-r2v
Capability
Reference to video
HappyHorse
HappyHorse-Video-Edit
Natural-language local or global video editing with reference images and motion preservation.
Model code
happyhorse-1.0-video-edit
Capability
Video editing
Qwen Image
Qwen-Image-2.0
Accelerated image generation and editing model with stronger text rendering and realistic texture control.
Model code
qwen-image-2.0
Capability
Image generation / editing
Qwen Image
Qwen-Image-2.0-Pro
Full-featured Qwen Image 2.0 lane with the strongest text rendering and lifelike texture quality in the series.
Model code
qwen-image-2.0-pro
Capability
High-quality image generation
Qwen Image
Qwen-Image-Max
High-quality Qwen image model for realism, human material texture, and better text rendering.
Model code
qwen-image-max
Capability
Image generation
Qwen Image
Qwen-Image-Edit-Max
Image editing model for industrial design, geometry, character consistency, and broader editing functions.
Model code
qwen-image-edit-max
Capability
Image editing
Z-Image
Z-Image-Turbo
Efficient image-generation model focused on photo-realistic output and bilingual text rendering.
Model code
z-image-turbo
Capability
Open-source image generation
Qwen3 VL
Qwen3-VL-Plus
Visual-language model with visual agent, visual coding, spatial perception, and multimodal reasoning upgrades.
Model code
qwen3-vl-plus
Capability
Visual understanding
Qwen3 VL
Qwen3-VL-Flash
Fast visual understanding model for long videos, documents, spatial awareness, and localization tasks.
Model code
qwen3-vl-flash
Capability
Fast visual understanding
Qwen VL
Qwen-VL-OCR
OCR-oriented visual-language model for image-text recognition, parsing, and processing tasks.
Model code
qwen-vl-ocr
Capability
OCR-like visual extraction
Qwen3
Qwen3-Open-Source
Open-source Qwen3 family listed in the visual section of the console for visual and hybrid model coverage.
Model code
qwen3-coder-next
Capability
Open-source visual family
Qwen Image
Qwen-Image-Plus
Image foundation model with complex text rendering and precise image editing capabilities.
Model code
qwen-image-plus
Capability
Image generation
Qwen Image
Qwen-Image-Edit-Plus
Image editing plus lane with faster inference, stability, and multi-image return support.
Model code
qwen-image-edit-plus
Capability
Image editing
Wan2.7
Wan-I2V
Image-to-video model that preserves subject, style, and text details through dynamic transitions.
Model code
wan2.7-i2v
Capability
Image to video
Wan2.7
Wan-T2V
Text-to-video model for smooth motion generation and cinematic direction control.
Model code
wan2.7-t2v
Capability
Text to video
Catalog category
Wan
Alibaba Wan image and video generation models for text, image, reference, editing, and VACE workflows.
Wan2.7
Wan-R2V
Reference-to-video model that preserves the look and voice of people or objects from reference material.
Model code
wan2.7-r2v
Capability
Reference to video
Wan2.7
Wan-VideoEdit
Prompt-driven video editing for local/global edits, video reshaping, and video transfer.
Model code
wan2.7-videoedit
Capability
Video editing
Wan2.7
Wan-I2V
Image-to-video generation with subject and style consistency.
Model code
wan2.7-i2v
Capability
Image to video
Wan2.7
Wan-T2V
Text-to-video generation with cinematic aesthetics and instruction adherence.
Model code
wan2.7-t2v
Capability
Text to video
Wan2.6
Wan-T2I
Text-to-image generation for photorealistic texture and accurate text rendering.
Model code
wan2.6-t2i
Capability
Text to image
Wan2.7
Wan2.7-Image
Wan2.7 image lane for generation, editing, style transformation, and visual production. Confirm exact price in the official console.
Model code
wan2.7-image
Capability
Image generation / editing
Wan2.7
Wan-Image
Wan image model for local edits, style transformations, object replacement, and contextual consistency.
Model code
wan2.7-image-pro
Capability
Image generation / editing
Wan2.1
Wan2.1-VACE-Plus
Unified video editing and generation model for repainting, expansion, extension, and image referencing.
Model code
wan2.1-vace-plus
Capability
Unified video editing
Catalog category
Audio
Speech recognition, translation, voice design, voice cloning, and real-time speech synthesis models.
Qwen3.5
Qwen3.5-LiveTranslate-Flash-Realtime
Realtime multilingual audio and video interpretation model with broad language coverage.
Model code
qwen3.5-livetranslate-flash-realtime
Capability
Realtime translation
Qwen Speech
Qwen-Voice-Design
Voice design service that creates a suitable voice from a text description.
Model code
qwen-voice-design
Capability
Voice design
Qwen3 TTS
Qwen3-TTS-VD-Realtime
Realtime speech synthesis on voices designed by Qwen voice-design services.
Model code
qwen3-tts-vd-realtime
Capability
Realtime voice-design TTS
Qwen3 TTS
Qwen3-TTS-VC-Realtime
Realtime speech synthesis using replicated voices with consistent timbre across languages.
Model code
qwen3-tts-vc-realtime
Capability
Realtime voice-clone TTS
Qwen3
Qwen3-LiveTranslate-Flash
Multilingual simultaneous audio/video interpretation for offline and realtime workflows.
Model code
qwen3-livetranslate-flash
Capability
Live translation
Qwen Speech
Qwen-Voice-Enrollment
Voice replication service that can clone a similar voice from short audio samples.
Model code
qwen-voice-enrollment
Capability
Voice enrollment
Fun-ASR
Fun-ASR-Realtime
Realtime speech recognition with hotword, punctuation, ITN, dialect, and noise-robust capabilities.
Model code
fun-asr-realtime
Capability
Realtime ASR
CosyVoice
CosyVoice
Generative speech model that converts text into natural human-like speech.
Model code
cosyvoice-v3-plus
Capability
Speech synthesis
Qwen3 TTS
Qwen3-TTS-VC
Offline speech synthesis using replicated voices with adaptive tone control.
Model code
qwen3-tts-vc
Capability
Voice-clone TTS
Qwen3 TTS
Qwen3-TTS-VD
Offline speech synthesis using designed voices and adaptive tone control.
Model code
qwen3-tts-vd
Capability
Voice-design TTS
Qwen3 TTS
Qwen3-TTS-Instruct-Flash
Speech synthesis model that adjusts emotion and expression through natural-language instructions.
Model code
qwen3-tts-instruct-flash
Capability
Instructional TTS
Qwen3 TTS
Qwen3-TTS-Flash
Fast offline TTS model with expressive voices, multilingual support, and stable synthesis.
Model code
qwen3-tts-flash
Capability
Fast TTS
Qwen3 TTS
Qwen3-TTS-Flash-Realtime
Realtime TTS model for low-latency and stable speech generation.
Model code
qwen3-tts-flash-realtime
Capability
Realtime fast TTS
Qwen3 TTS
Qwen3-TTS-Instruct-Flash-Realtime
Realtime instructable speech synthesis with emotion and expression control.
Model code
qwen3-tts-instruct-flash-realtime
Capability
Realtime instructional TTS
Fun-ASR
Fun-ASR
Speech recognition model for Mandarin, English, Japanese, dialects, and noisy environments.
Model code
fun-asr
Capability
Speech recognition
Qwen3
Qwen3-LiveTranslate-Flash-Realtime
Realtime version of Qwen3 live translation for simultaneous interpretation workflows.
Model code
qwen3-livetranslate-flash-realtime
Capability
Realtime live translation
Qwen3 ASR
Qwen3-ASR-Flash-Filetrans
Large-file transcription model based on Qwen3-ASR-Flash.
Model code
qwen3-asr-flash-filetrans
Capability
File transcription
Qwen3 ASR
Qwen3-ASR-Flash
Accurate multilingual speech recognition model based on a large language model.
Model code
qwen3-asr-flash
Capability
Speech recognition
Qwen3 ASR
Qwen3-ASR-Flash-Realtime
Realtime speech recognition model for multilingual, complex audio environments.
Model code
qwen3-asr-flash-realtime
Capability
Realtime speech recognition
CosyVoice
Voice-Enrollment
Voice cloning service that extracts and prepares voices for speech generation.
Model code
voice-enrollment
Capability
Voice cloning
Catalog category
Multimodal
Omni-modal models supporting text, images, audio, and video understanding and interaction.
Qwen3.5 Omni
Qwen3.5-Omni-Flash-Realtime
Realtime omni-modal model for voice assistants, multimedia analysis, interruptions, and tool calls.
Model code
qwen3.5-omni-flash-realtime
Capability
Realtime omni-modal
Qwen3.5 Omni
Qwen3.5-Omni-Flash
Omni-modal flash model for long audio and audio-visual understanding.
Model code
qwen3.5-omni-flash
Capability
Fast omni-modal
Qwen3.5 Omni
Qwen3.5-Omni-Plus
Plus omni-modal lane for audio, video, image, and text understanding and interaction.
Model code
qwen3.5-omni-plus
Capability
Plus omni-modal
Qwen3.5 Omni
Qwen3.5-Omni-Plus-Realtime
Realtime plus omni-modal model for controllable voice dialogue and complex function calls.
Model code
qwen3.5-omni-plus-realtime
Capability
Realtime plus omni-modal
Qwen3 Omni
Qwen3-Omni-Flash-Realtime
Realtime Qwen3 Omni Flash model for efficient multimodal understanding and speech generation.
Model code
qwen3-omni-flash-realtime
Capability
Realtime omni-modal
Qwen3 Omni
Qwen3-Omni-Flash
Qwen3 Omni Flash model supporting text, image, audio, and video interaction.
Model code
qwen3-omni-flash
Capability
Omni-modal flash
Qwen3 Omni
Qwen3-Omni-30B-A3B-Captioner
Fine-grained audio analysis model for generating accurate descriptions of complex audio scenes.
Model code
qwen3-omni-30b-a3b-captioner
Capability
Audio captioning
Catalog category
Embeddings
Embedding and reranking models for search, RAG, clustering, classification, recall, and ranking.
Qwen3
Qwen-Rerank
Text-ranking model for reranking query and document relevance across 100+ languages.
Model code
qwen3-rerank
Capability
Reranking
Qwen3
Qwen-Embedding
Multilingual text vector model for retrieval, clustering, classification, and RAG.
Model code
text-embedding-v4
Capability
Text embeddings
Tongyi embedding
Domain Embedding
Domain-specific multimodal representation model for e-commerce, security, photo albums, and autonomous driving retrieval.
Model code
tongyi-embedding-vision-plus
Capability
Domain embedding
Catalog category
Third-party
Third-party model families available through the official Model Studio route, subject to region and account support.
DeepSeek
DeepSeek V4 Pro
A high-end MoE reasoning lane for technical research, sophisticated office workflows, long-form analysis, code, and math-heavy tasks.
Model code
deepseek-v4-pro
Capability
Reasoning, code, math
Kimi
Kimi K2.6
A long-context model for coding, instruction following, visual input, reasoning and non-reasoning dialogue, and agent-based tasks.
Model code
kimi-k2.6
Capability
Long context and visual agent tasks
GLM
GLM 5.1
A long-horizon model for logical reasoning, long-text understanding, code generation, summaries, and developer assistance.
Model code
glm-5.1
Capability
Long-horizon logic
MiniMax
MiniMax M2.5
A MiniMax flagship lane for programming, tool invocation, search, office work, and productivity scenarios.
Model code
MiniMax-M2.5
Capability
Agent and tool-use productivity
DeepSeek
DeepSeek V4 Flash
Lightweight DeepSeek MoE model for fast, low-latency, cost-effective everyday dialogue, content, RAG, and batch work.
Model code
deepseek-v4-flash
Capability
Fast reasoning
DeepSeek
DeepSeek V3.2
DeepSeek model with sparse attention and reasoning integrated into tool usage.
Model code
deepseek-v3.2
Capability
Stable reasoning and tool use
Kimi
Kimi K2.5
Kimi stable lane for long-context business and coding review, subject to console availability.
Model code
kimi-k2.5
Capability
Long-context stable lane
GLM
GLM 5
GLM stable lane for general enterprise assistant and reasoning workloads, subject to console availability.
Model code
glm-5
Capability
General stable lane
Catalog category
Older
Older models still visible in the official catalog. Use them only after confirming support and deprecation status.
Qwen visual reasoning
QVQ-Max
Older visual reasoning model for math, programming, visual analysis, creation, and general tasks.
Model code
qvq-max
Capability
Older visual reasoning
Qwen2.5
Qwen-Max
Older super-large Qwen model. Check official lifecycle status before use.
Model code
qwen-max
Capability
Older flagship
QwQ
Qwen-QwQ-Plus
Older reasoning model trained from Qwen2.5 and marked as retiring in the official catalog.
Model code
qwq-plus
Capability
Older reasoning
Qwen
Qwen-Turbo
Older fast Qwen lane. Check current snapshot and lifecycle before recommending.
Model code
qwen-turbo
Capability
Older fast text
Qwen VL
Qwen-VL-Max
Older large vision-language model for high-level visual perception and cognition.
Model code
qwen-vl-max
Capability
Older vision-language
Qwen VL
Qwen-VL-Plus
Older enhanced vision-language model for detail and text recognition.
Model code
qwen-vl-plus
Capability
Older vision-language plus
Qwen MT
Qwen-MT-Turbo
Older fast translation model for 92 languages with terminology and format support.
Model code
qwen-mt-turbo
Capability
Older fast translation
Qwen2.5
Qwen2.5-Open-Source
Older Qwen2.5 open-source family covering text, visual, and multimodal models.
Model code
qwen2.5-omni-7b
Capability
Older open-source family
Qwen Omni
Qwen-Omni-Turbo
Older omni-modal model for mixed input understanding and streaming text/speech generation.
Model code
qwen-omni-turbo
Capability
Older omni-modal
Qwen Omni
Qwen-Omni-Turbo-Realtime
Older realtime omni-modal model for audio interaction scenarios.
Model code
qwen-omni-turbo-realtime
Capability
Older realtime omni-modal
Token Plan
Token Plan support is narrower than the whole API catalog
Token Plan Team Edition has its own key, base URL, supported tools, and model restrictions. Do not sell it as generic backend API access until the official terms and use case are confirmed.
Supported plan models
Token Plan support is limited to the models listed in the official plan route.
Text models use OpenAI-compatible calls and are charged by Credits.
Image models require the documented Skill / Agent or multimodal route and should not be treated as normal text calls.
Promotions, discounts, region support, and model availability must be checked in the official console before payment.
Tooling route
Coding tools need plan-specific endpoints and model names. A final customer handoff should record the tool, key source, endpoint, selected model, renewal/reset timing, and quota owner.
Procurement checks
The catalog is broad, but the quote still needs discipline
The page now covers the full checked console catalog. The buyer conversation should still focus on exact model code, route, price unit, region, quota, and whether the model belongs in Token Plan or direct API usage.
Buyer check
Exact model code
Use the model identifier from the official console, including version dots, family suffixes, and realtime/media variants.
Buyer check
Price route
Check whether the model is priced by input/output tokens, cache, images, video seconds, audio, voice enrollment, or tool calls.
Buyer check
Region and account
Confirm whether the buyer can use the model through the intended international or Chinese Mainland account route.
Buyer check
Token Plan fit
Separate Token Plan Team Edition support from the broader API catalog so coding seats are not sold as generic API quota.
Buyer check
Lifecycle status
Older models remain visible only as catalog references and require support/deprecation confirmation before quoting.
Buyer check
Written quote
Final price, taxes, promotions, quota, rate limits, and service scope are confirmed before payment.
Buyer questions
What we clarify before a quote
The buyer does not only need a model name. They need a safe answer for region, endpoint, billing, quota, model family, tooling, monitoring, and support handoff.
Flagship text models
Qwen flagship and high-intelligence text lanes for reasoning, coding, agents, and business workflows.
Vision understanding
Multimodal models for images, charts, documents, screenshots, and visual reasoning tasks.
Image generation
Qwen and Wan image models for text-to-image, image editing, multi-image fusion, and visual production.
Video generation
Wan video models for text-to-video, image-to-video, reference video, and general editing scenarios.
Speech & audio
Speech synthesis, real-time speech recognition, audio file transcription, and voice workflows.
Embeddings & reranking
Search, RAG, classification, clustering, recommendation, and retrieval-quality improvement.