Model Catalog by Capability

The catalog spans text generation, multimodal, image generation/editing, video generation/editing, speech, embeddings, reranking, and domain models.

Who this is for

Sales engineers translating technical model options into buyer choices.

Configuration reference

Values to confirm before setup

Text: Qwen commercial and open-source families
Vision and multimodal: Image, document, screenshot, and multimodal reasoning
Media generation: Image and video generation/editing
Retrieval: Embeddings and reranking for search and RAG

Setup flow

Practical steps

01 Start from the customer's deliverable, not the model name.
02 Choose the capability category.
03 Check region availability.
04 Estimate token or media-generation cost.
05 Prepare a fallback option.
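
The steps above can be sketched as a small shortlist routine. This is an illustration under stated assumptions, not a real catalog client: the `ModelOption` fields, the model names, the region labels, and the unit prices are all placeholders, and ranking by lowest unit price is only one possible way to pick a primary and a fallback. It covers token pricing only; media-generation pricing would need its own unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                    # placeholder name, not a real catalog entry
    category: str                # e.g. "text", "vision", "embeddings"
    regions: frozenset           # regions where the option is assumed available
    price_per_1k_tokens: float   # placeholder unit price, not a real rate


def shortlist(options, category, region):
    """Steps 02-03: keep options in the category that are available in the region."""
    return [o for o in options if o.category == category and region in o.regions]


def estimate_cost(option, tokens):
    """Step 04: rough token cost = tokens / 1000 * unit price."""
    return tokens / 1000 * option.price_per_1k_tokens


def primary_and_fallback(candidates):
    """Step 05: cheapest candidate first, next cheapest as fallback (None if absent)."""
    ranked = sorted(candidates, key=lambda o: o.price_per_1k_tokens)
    primary = ranked[0] if ranked else None
    fallback = ranked[1] if len(ranked) > 1 else None
    return primary, fallback


# Placeholder data for illustration only.
options = [
    ModelOption("example-text-a", "text", frozenset({"region-1", "region-2"}), 0.002),
    ModelOption("example-text-b", "text", frozenset({"region-1"}), 0.001),
    ModelOption("example-embed-a", "embeddings", frozenset({"region-2"}), 0.0005),
]

candidates = shortlist(options, "text", "region-1")
primary, fallback = primary_and_fallback(candidates)
monthly_cost = estimate_cost(primary, 5_000_000)  # assumed monthly token volume
```

If the shortlist for a region comes back with a single entry, `fallback` is `None`, which is a useful signal that step 05 still needs an answer before the quote goes out.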

Category map

A customer building support automation may need text plus embeddings. A creative workflow may need image/video generation. A document workflow may need vision and long-context text. The quote should explain the category mix.
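
The mapping above can be written down as a small lookup so the quote states its category mix explicitly. This is a minimal sketch: the workflow keys, the category labels, and the `category_mix` and `quote_line` helpers are illustrative names, not identifiers from any catalog or API.

```python
# Hypothetical workflow-to-category lookup; keys and labels are illustrative.
CATEGORY_MIX = {
    "support_automation": ["text", "embeddings"],
    "creative_workflow": ["image_generation", "video_generation"],
    "document_workflow": ["vision", "long_context_text"],
}


def category_mix(workflow):
    """Return the capability categories a workflow needs, or raise if unknown."""
    try:
        return list(CATEGORY_MIX[workflow])
    except KeyError:
        raise ValueError(f"No category mix defined for workflow: {workflow!r}") from None


def quote_line(workflow):
    """Build the one-line explanation of the category mix for a quote."""
    return f"{workflow}: " + " + ".join(category_mix(workflow))
```

Raising on an unknown workflow, rather than defaulting to text, keeps an unmapped deliverable from silently getting a text-only quote.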

Common mistakes

Check these before escalating

Do not offer a media model through a text-only tool route.
Confirm the region and API family; some features are available only in specific regions or API families.
Verify model names and snapshots before quoting; they change over time.
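
The three checks above could be run as a quick pre-escalation pass. The sketch below assumes hypothetical inputs: the category labels, the route's supported modalities, the region sets, and the list of current model names would all come from whatever source the team treats as authoritative. An API-family check could be added the same way as the region check.

```python
MEDIA_CATEGORIES = {"image_generation", "video_generation"}  # illustrative labels


def pre_escalation_issues(category, route_modalities, region,
                          available_regions, model_name, current_names):
    """Return a list of human-readable issues; an empty list means no issue found."""
    issues = []
    # Check 1: a media model cannot be served through a text-only route.
    if category in MEDIA_CATEGORIES and set(route_modalities) == {"text"}:
        issues.append("media model offered through a text-only route")
    # Check 2: the feature must be available in the customer's region.
    if region not in available_regions:
        issues.append(f"not available in region {region!r}")
    # Check 3: the quoted name must still be current.
    if model_name not in current_names:
        issues.append(f"model name {model_name!r} is not in the current list")
    return issues


# Placeholder values for illustration only.
issues = pre_escalation_issues(
    category="video_generation",
    route_modalities={"text"},
    region="region-1",
    available_regions={"region-2"},
    model_name="example-video-old",
    current_names={"example-video-new"},
)
```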

Related guides

Text Generation

Text generation covers chat, reasoning, translation, summarization, coding, extraction, classification, and agent workflows.

Vision Understanding

Vision models help analyze images, screenshots, charts, documents, and other visual inputs. Buyers need to clarify input format and expected answer type.

Image Generation and Editing

Image generation and editing can support creative production, product visuals, image fusion, style transfer, and prompt-driven asset creation.

Video Generation and Editing

Video models support text-to-video, image-to-video, reference-based generation, and editing workflows. Set buyer expectations more carefully for these than for text demos.
