Model Catalog by Capability
The catalog spans text generation, multimodal understanding, image generation and editing, video generation and editing, speech, embeddings, reranking, and domain-specific models.
Who this is for
Sales engineers translating technical model options into buyer choices.
Configuration reference
Values to confirm before setup
- Text: Qwen commercial and open-source families
- Vision and multimodal: Image, document, screenshot, and multimodal reasoning
- Media generation: Image and video generation/editing
- Retrieval: Embeddings and reranking for search and RAG
Setup flow
Practical steps
- 01 Start from the customer's deliverable, not the model name.
- 02 Choose the capability category.
- 03 Check region availability.
- 04 Estimate token or media-generation cost.
- 05 Prepare a fallback option.
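The five steps can be tracked as a simple pre-quote checklist. The sketch below is illustrative only: the `QuoteDraft` fields, the `missing_steps` helper, and the `estimate_token_cost` formula are hypothetical names introduced here, and the per-million-token prices passed in are placeholders, not real list prices.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QuoteDraft:
    """Hypothetical record of what has been confirmed for one quote."""
    deliverable: str = ""                                  # step 01
    categories: List[str] = field(default_factory=list)    # step 02
    region_checked: bool = False                           # step 03
    estimated_cost: Optional[float] = None                 # step 04
    fallback: str = ""                                     # step 05


def estimate_token_cost(input_tokens: int, output_tokens: int,
                        input_price_per_million: float,
                        output_price_per_million: float) -> float:
    """Assumed pricing shape: separate per-million rates for input and output."""
    return ((input_tokens / 1_000_000) * input_price_per_million
            + (output_tokens / 1_000_000) * output_price_per_million)


def missing_steps(draft: QuoteDraft) -> List[str]:
    """Return the setup steps that are still open, in the order of the list above."""
    checks = [
        ("01 customer deliverable", bool(draft.deliverable.strip())),
        ("02 capability category", bool(draft.categories)),
        ("03 region availability", draft.region_checked),
        ("04 cost estimate", draft.estimated_cost is not None),
        ("05 fallback option", bool(draft.fallback.strip())),
    ]
    return [label for label, done in checks if not done]
```

An empty `QuoteDraft()` reports all five steps as open; a draft is ready to send once `missing_steps` returns an empty list. Media-generation cost is usually priced per image or per second of video rather than per token, so it would need its own formula.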
Category map
A customer building support automation may need text plus embeddings. A creative workflow may need image/video generation. A document workflow may need vision and long-context text. The quote should state which categories the solution combines and why each one is needed.
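The three examples above can be written down as a small lookup table. This is a minimal sketch that uses only the pairings named in the text; the `CATEGORY_MAP` keys and the `category_mix` and `quote_line` helpers are hypothetical, and a real map would have more entries.

```python
from typing import Dict, List

# Deliverable -> capability categories, taken from the examples in this section.
CATEGORY_MAP: Dict[str, List[str]] = {
    "support automation": ["text", "embeddings"],
    "creative workflow": ["image/video generation"],
    "document workflow": ["vision", "long-context text"],
}


def category_mix(deliverable: str) -> List[str]:
    """Look up the categories for a deliverable (case-insensitive)."""
    key = deliverable.strip().lower()
    if key not in CATEGORY_MAP:
        raise KeyError(f"No category mapping for deliverable: {deliverable!r}")
    return list(CATEGORY_MAP[key])  # copy so callers cannot mutate the map


def quote_line(deliverable: str) -> str:
    """Build a one-line summary of the category mix for a quote."""
    return f"{deliverable}: " + " + ".join(category_mix(deliverable))
```

Raising an error for an unmapped deliverable is a deliberate choice in this sketch: it forces the category decision to be made explicitly rather than defaulting to text.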
Common mistakes
Check these before escalating
- Do not offer a media model through a text-only tool route.
- Confirm that each feature is available in the customer's region and API family; some features are limited to specific ones.
- Confirm current model names and snapshots, which change over time.
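The first two checks can be automated before an offer goes out. The sketch below assumes a hypothetical `Route` description (which output types and regions a tool route supports) and a hypothetical `Offer`; the model, route, and region names in the usage example are placeholders, and real values should be read from the current catalog because names and snapshots change.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Route:
    """Hypothetical description of a tool route."""
    name: str
    modalities: FrozenSet[str]   # output types the route can carry, e.g. {"text"}
    regions: FrozenSet[str]      # regions where the route is available


@dataclass(frozen=True)
class Offer:
    """Hypothetical description of what is being proposed to the customer."""
    model: str
    output_modality: str         # e.g. "text", "image", "video"
    region: str


def offer_problems(offer: Offer, route: Route) -> List[str]:
    """Return human-readable problems; an empty list means both checks pass."""
    problems = []
    if offer.output_modality not in route.modalities:
        problems.append(
            f"{offer.model} produces {offer.output_modality}, "
            f"which route {route.name} does not support"
        )
    if offer.region not in route.regions:
        problems.append(
            f"route {route.name} is not available in region {offer.region}"
        )
    return problems
```

For example, `offer_problems(Offer("example-video-model", "video", "region-a"), Route("text-tools", frozenset({"text"}), frozenset({"region-a"})))` returns one problem describing the modality mismatch.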
Related guides
Text Generation
Text generation covers chat, reasoning, translation, summarization, coding, extraction, classification, and agent workflows.
Vision Understanding
Vision models help analyze images, screenshots, charts, documents, and other visual inputs. Buyers need to clarify input format and expected answer type.
Image Generation and Editing
Image generation and editing can support creative production, product visuals, image fusion, style transfer, and prompt-driven asset creation.
Video Generation and Editing
Video models support text-to-video, image-to-video, reference-based generation, and editing workflows. Set expectations for these more carefully than for text demos.