
Qwen API - Everything you need to build reliably

The Qwen API is how developers integrate Alibaba’s Qwen model family into apps for chat, reasoning, coding, translation, summarization, content generation, and (with Qwen-VL) vision understanding. In practice, “Qwen API” usually refers to two closely related ways to call Qwen models: DashScope-style endpoints and an OpenAI-compatible interface where you update only the API key, base URL, and model name. This page explains both approaches, shows production-grade request patterns, and covers key platform features such as Responses API compatibility, vision compatibility, and the Batch API. (Details are based on Alibaba Cloud Model Studio documentation.)

Compatibility headline: Qwen models in Alibaba Cloud Model Studio support OpenAI-compatible interfaces, so you can migrate existing OpenAI code by updating apiKey, base_url, and model. The OpenAI-compatible base URL for international usage is commonly https://dashscope-intl.aliyuncs.com/compatible-mode/v1.

What is the Qwen API?

“Qwen API” is a shorthand for programmatic access to the Qwen model family. You use it when you want a model to: answer questions, follow instructions, summarize long text, extract structured data, generate code, rewrite content, or operate as part of a larger workflow (RAG, tool use, routing, evaluation, and monitoring).

The platform is designed to be friendly for developers who already know the OpenAI ecosystem. You can either: (1) call Qwen via DashScope endpoints directly, or (2) use OpenAI-compatible endpoints and keep your existing client code patterns with minimal changes. The same idea also extends to vision models (Qwen-VL) with OpenAI-compatibility, and to newer agent-style primitives through a Responses-compatible interface.

Production rule: Treat “Qwen API” as two layers: a model layer (which model and capability tier) and an interface layer (DashScope vs OpenAI-compatible). Most engineering issues come from mixing these layers accidentally (wrong base URL, unsupported endpoint path, or model name mismatch).

Two ways to access Qwen: DashScope vs OpenAI-compatible

Alibaba Cloud Model Studio supports both “native” DashScope-style endpoints and OpenAI-compatible endpoints. The OpenAI-compatible path is the easiest for many teams because it lets you reuse existing OpenAI SDK code—swap the key, point the base URL at the compatible-mode endpoint, and set a Qwen model name.

Option A: DashScope-style endpoints (direct Qwen API reference)

DashScope-style endpoints are explicit “services/aigc/…” endpoints and are commonly used when you want to follow platform-specific docs closely, match parameter names exactly, or use examples that reference DashScope request shapes. The international endpoints documented by Alibaba Cloud include:

  • Qwen LLM text generation: POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/text-generation/generation (typical use: chat, summarization, extraction, translation, coding)
  • Qwen-VL multimodal generation: POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation (typical use: image understanding, captioning, VQA, document visual parsing)

These endpoint patterns and the “DashScope API reference” approach are described in Alibaba Cloud Model Studio’s Qwen API docs. If your organization is standardizing across multiple providers, DashScope also works well behind an internal gateway because the request shapes are explicit and stable.

Option B: OpenAI-compatible mode (migrate existing OpenAI code)

OpenAI-compatible mode is designed to reduce migration friction: you can keep the same overall code shape, use familiar /chat/completions and (where supported) /responses patterns, and adjust only: API key, base URL, and model name.

For Qwen-VL vision models, Alibaba Cloud provides guidance for OpenAI-compatible usage and explicitly calls out the compatible-mode base URL: https://dashscope-intl.aliyuncs.com/compatible-mode/v1. Use this as your base_url when you want “OpenAI-style” calls.

Compatibility note: Some SDKs default to OpenAI “Responses API” first. If your integration assumes only /responses, check whether your target interface supports it, or fall back to /chat/completions. Alibaba Cloud Model Studio documents both a Chat-compatible and a Responses-compatible approach.

Model families & how to choose (practical selection guide)

Qwen is a model family rather than a single model. In real products, you should treat “model selection” as a product decision, not a one-time engineering config. Your users care about speed, accuracy, cost, and whether the model is “good at coding” or “good at reasoning,” and your system needs a clear default plus a way to upgrade when needed.

Common capability buckets (how teams typically map models)

  • Fast / cost-efficient: low latency, high throughput, good “general” quality. Typical uses: chatbots, drafts, bulk summarization, short Q&A.
  • Balanced: better instruction following and reasoning at moderate cost. Typical uses: support bots, knowledge assistants, extraction, tool use.
  • Max / high capability: higher reasoning and coding performance, better reliability on complex prompts. Typical uses: agentic workflows, complex coding tasks, long multi-step reasoning.
  • Vision (Qwen-VL): image understanding and multimodal reasoning. Typical uses: image Q&A, document understanding, UI parsing, multimodal search.

Model choice best practice: two-step “draft → finalize”

A very effective production pattern is to use a “draft model” for speed and a “final model” for correctness. For example:

  • Step 1: Generate an initial response with a fast model.
  • Step 2: If confidence is low, or if the user requests “high accuracy,” re-run on a stronger model.
  • Step 3: Cache the final response (or the extracted facts) so repeated questions are cheap.

This pattern keeps your default experience fast and affordable while still allowing a premium tier or “high accuracy” switch. It also reduces the risk of building everything on a single high-cost model.
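The draft-then-finalize pattern above can be sketched in a few lines. This is a minimal illustration, not official API code: `call_model` stands in for your provider adapter, the model names are placeholders to confirm in your console, and the confidence heuristic is a toy assumption you would replace with your own signal.

```python
# Sketch of "draft -> finalize" model routing with a response cache.
# Model names and the confidence heuristic are placeholder assumptions.

DRAFT_MODEL = "qwen-plus"   # fast default (placeholder name)
FINAL_MODEL = "qwen-max"    # stronger tier (placeholder name)

_cache = {}  # question -> final answer

def looks_low_confidence(text: str) -> bool:
    # Toy heuristic: hedge phrases or a very short answer trigger an upgrade.
    hedges = ("i'm not sure", "i am not sure", "it depends", "cannot determine")
    return len(text) < 40 or any(h in text.lower() for h in hedges)

def answer(question: str, call_model, high_accuracy: bool = False) -> str:
    """call_model(model, question) -> str is your provider adapter."""
    if question in _cache:
        return _cache[question]
    draft = call_model(DRAFT_MODEL, question)
    if high_accuracy or looks_low_confidence(draft):
        draft = call_model(FINAL_MODEL, question)
    _cache[question] = draft
    return draft
```

The cache step matters as much as the routing: repeated questions never hit the strong model twice.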

If you’re migrating from OpenAI, keep your “model routing” logic the same; only change the model names and base URL. Alibaba Cloud explicitly positions Qwen APIs as OpenAI-compatible for this reason.

What you can build with Qwen API

Qwen is most useful when it becomes part of a system, not just a chat box. Here are the high-value patterns that teams commonly ship using Qwen models:

1) Chat assistants with grounded knowledge (RAG)

A “Qwen chatbot” is easy. A trustworthy chatbot is harder. In production, you typically combine Qwen with retrieval: store your documents in a vector database, retrieve relevant chunks, and ask Qwen to answer using only that context. This reduces hallucinations and improves consistency.

  • Customer support bots that cite help-center pages
  • Internal copilots that cite company docs and policies
  • FAQ generators that include sources and update dates

2) Structured extraction (turn text into JSON)

Many apps need structured output: invoices, resumes, tickets, meeting notes, product specs, compliance logs. Qwen can extract entities and return clean JSON if you constrain the output format. You’ll still want schema validation on your side, because even good models can make formatting mistakes.

3) Coding assistants and developer tools

Qwen models are commonly used for code generation, refactoring, documentation, and debugging. A production-grade coding assistant pairs a model with tooling: repository search, test runners, build logs, and a “diff-only” editing mode.

4) Vision-enabled workflows (Qwen-VL)

Qwen-VL models power image understanding: captioning, visual Q&A, document screenshot parsing, and multimodal reasoning. If you’re building a “UI to code” tool, a doc assistant, or an image-based support agent, vision compatibility matters.

5) Agentic workflows (tools + multi-step execution)

Agentic behavior isn’t magic; it’s just the model deciding when to call a tool. When you provide tool schemas (function signatures and JSON Schema parameters), Qwen can produce structured tool calls you can execute in your code. Alibaba Cloud’s OpenAI-compatible Responses API documentation highlights built-in tool patterns and a more concise agent flow.


Authentication & API keys

Qwen API calls require an API key provisioned via Alibaba Cloud Model Studio / DashScope. In practice, you should treat your key like a root credential: store it server-side, never ship it in a public browser bundle, and rotate it on a schedule.

Recommended environment variables

DASHSCOPE_API_KEY="..."
QWEN_BASE_URL_COMPAT="https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
QWEN_BASE_URL_DASHSCOPE="https://dashscope-intl.aliyuncs.com"

# Your app defaults
QWEN_DEFAULT_MODEL="qwen-plus"          # example name; confirm in your console
QWEN_FINAL_MODEL="qwen-max"             # example name; confirm in your console
QWEN_VISION_MODEL="qwen-vl"             # example name; confirm in your console

APP_PUBLIC_BASE_URL="https://yourapp.com"

Never call Qwen directly from the browser with your real key. Put requests behind your backend to enforce rate limits, protect billing, and apply safety checks.

Endpoints (Intl + OpenAI-compatible)

The endpoint you use depends on whether you’re calling DashScope-style endpoints or OpenAI-compatible endpoints. Alibaba Cloud’s documentation provides both.

DashScope-style (Intl) endpoints

  • Text generation: POST /api/v1/services/aigc/text-generation/generation
  • Multimodal generation (Qwen-VL): POST /api/v1/services/aigc/multimodal-generation/generation

OpenAI-compatible base URL

For OpenAI-compatible requests, Alibaba Cloud documentation points to: https://dashscope-intl.aliyuncs.com/compatible-mode/v1. Use this as your base_url in OpenAI SDKs (or equivalent clients), then call OpenAI-style paths like /chat/completions and, where applicable, /responses.

In many migrations, you change only three fields: API key, base URL, and model name—exactly as Alibaba Cloud’s “OpenAI compatible” docs describe.

Quickstart (REST / Python / JavaScript)

Below are practical patterns you can copy into your project. They’re written in an OpenAI-style shape because that’s the fastest way to integrate Qwen if you already have a “chat completions” style client. If you prefer the DashScope-style endpoints, keep the same message structure and adapt the request shape to the DashScope API reference.

REST (OpenAI-compatible) — Chat Completions

curl -sS https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-plus",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant. Answer clearly and briefly."},
      {"role": "user", "content": "Explain what the Qwen API is, in 3 bullet points."}
    ],
    "temperature": 0.3
  }'

Python (OpenAI SDK style) — Minimal migration pattern

from openai import OpenAI
import os

client = OpenAI(
  api_key=os.environ["DASHSCOPE_API_KEY"],
  base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

resp = client.chat.completions.create(
  model="qwen-plus",
  messages=[
    {"role":"system","content":"You are a developer assistant. Provide practical steps."},
    {"role":"user","content":"Give me a robust retry strategy for API calls."}
  ],
  temperature=0.2,
)

print(resp.choices[0].message.content)

JavaScript (Node.js) — OpenAI-compatible client shape

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.DASHSCOPE_API_KEY,
  baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
});

const res = await client.chat.completions.create({
  model: "qwen-plus",
  messages: [
    { role: "system", content: "You write accurate, production-grade explanations." },
    { role: "user", content: "Design a RAG pipeline that cites sources." }
  ],
  temperature: 0.2,
});

console.log(res.choices[0].message.content);

Model names in examples are placeholders. Confirm the exact Qwen model identifiers you have enabled in your Alibaba Cloud Model Studio console and replace them in code. The interface remains the same.

Chat requests (messages, roles, and prompting)

Qwen chat follows the familiar “messages with roles” approach: system sets behavior, user provides instruction, assistant responds. In production, the most important prompt engineering isn’t fancy wording—it’s structure and guardrails.

Recommended system prompt pattern (safe, reliable, debuggable)

  • Role: Define who the assistant is (“support agent”, “coding reviewer”, “research assistant”).
  • Constraints: “If unsure, ask clarifying questions” or “Only answer using provided context.”
  • Output format: “Return JSON only” or “Return Markdown with headings.”
  • Safety: “Refuse disallowed requests; suggest alternatives.”

Prompt template (drop-in)

SYSTEM:
You are a helpful assistant for {product}.
Follow these rules:
1) If the user request is ambiguous, ask 1-2 clarifying questions.
2) If the user asks for facts, cite the provided sources, and do not invent.
3) If the user asks for structured output, return valid JSON that matches the schema.
4) Keep answers concise unless the user requests detail.

USER:
{instruction}

CONTEXT (optional):
{retrieved_documents}

Treat prompts as part of your codebase: version them, test them, and roll them out carefully. If you run A/B tests, compare not just user satisfaction but also cost and latency.


Streaming (SSE) patterns

Streaming is how you get “typing” responses in chat UIs. It’s also important for long outputs: users prefer incremental feedback. Even if the full response takes time, seeing early tokens improves perceived latency.

A robust streaming implementation includes:

  • Server-sent events (SSE) parsing with graceful reconnect handling.
  • Cancellation support (user hits “Stop generating”).
  • Token budget limits so the model cannot generate unbounded output.
  • UI states (“starting”, “generating”, “finalizing”).

Streaming pseudo-logic

// Pseudo-code: streaming with cancellation + timeout
const controller = new AbortController();
setTimeout(() => controller.abort(), 60_000); // hard timeout

const res = await fetch(BASE_URL + "/chat/completions", {
  method: "POST",
  headers: { "Authorization": "Bearer " + KEY, "Content-Type":"application/json" },
  body: JSON.stringify({ model, messages, stream: true }),
  signal: controller.signal
});

for await (const event of parseSSE(res.body)) {
  if (event.type === "delta") appendToUI(event.textDelta);
  if (event.type === "done") break;
}

// User pressed "Stop" => controller.abort()

Streaming formats can vary by client library. The key is to implement streaming as a pure transformation: “event stream → text deltas → UI.” Keep it isolated so you can swap providers easily.
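As one concrete version of that transformation, here is a minimal Python sketch of the `parseSSE` step, assuming the OpenAI-style stream format that compatible mode follows (`data: {json}` lines ending with `data: [DONE]`); verify the exact chunk shape against your client library.

```python
import json

def parse_sse(lines):
    """Yield text deltas from an OpenAI-style chat-completions SSE stream.

    `lines` is any iterable of decoded lines. Events arrive as
    `data: {json}`; the stream ends with `data: [DONE]`.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta
```

Because the parser is a plain generator over lines, it is trivial to unit-test with canned events and to reuse across providers.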

Responses API compatibility (agent-friendly interface)

In addition to Chat Completions compatibility, Alibaba Cloud Model Studio documents compatibility with the OpenAI Responses API. Responses is designed as a newer primitive that can represent complex outputs and built-in tools more concisely than classic chat completions.

If you are building agents—systems that can call tools, perform multi-step tasks, and return structured results—Responses can be cleaner. For teams migrating from OpenAI’s newer SDK patterns, this compatibility can reduce rework and keep your internal architecture consistent.

Integration advice: Decide one “primary interface” for your app (Chat Completions or Responses). Support both only if you truly need it; maintaining two codepaths increases complexity.

Vision (Qwen-VL) OpenAI compatibility

Qwen-VL refers to Qwen’s vision-language models that accept image inputs and produce text outputs (and sometimes multimodal structures). Alibaba Cloud provides documentation for calling Qwen-VL with OpenAI-compatible specifications, focusing on updating base_url to the compatible-mode endpoint and using the appropriate Qwen-VL model name.

Typical vision use cases

  • “What’s in this image?” captioning and alt-text generation
  • UI screenshot understanding (“Which button is the ‘Submit’ button?”)
  • Document understanding (forms, charts, scanned text, receipts)
  • Quality control (identify defects, mismatched labels, missing components)

Vision prompt best practices

  • Ask specific questions (“Extract the invoice total and currency”) instead of general (“Summarize this”).
  • Request structured output when extracting data.
  • For screenshots, specify UI coordinates or element names only if your product supports it; otherwise ask for relative descriptions.
  • When accuracy matters, perform “double pass” verification: extract → validate → re-ask for missing fields.
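For the OpenAI-compatible vision calls described above, the request body mixes text and image parts in one user message. The sketch below builds that message using the OpenAI chat-completions vision content-part shapes that compatible mode mirrors; the model name you pass alongside it (some qwen-vl variant) is account-specific and not shown here.

```python
def build_vision_messages(question: str, image_url: str):
    """Build an OpenAI-style multimodal message list for a Qwen-VL call.

    Content-part shapes follow the OpenAI chat-completions vision format;
    pass the result as `messages` together with a Qwen-VL model name.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]
```

Usage: `client.chat.completions.create(model=<your qwen-vl model>, messages=build_vision_messages(...))` with the compatible-mode base_url from earlier sections.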

Tools & function calling (how to build Qwen agents)

Tool use (function calling) is how you safely connect the model to your systems. Instead of asking the model to “pretend it called an API,” you give the model a set of tools with structured parameter schemas. The model can then return a tool call, your backend executes it, and you feed the result back to the model to produce the final answer.

Why tool use matters

  • Reliability: the model doesn’t have to memorize facts; it can look them up.
  • Safety: you control what actions are allowed (read-only search vs transactions).
  • Auditability: tool calls are logged, replayable, and inspectable.
  • Cost control: expensive reasoning is reserved for decision points; tools do the data work.

Tool schema (OpenAI-style JSON Schema)

{
  "type": "function",
  "function": {
    "name": "search_kb",
    "description": "Search the internal knowledge base for relevant documents.",
    "parameters": {
      "type": "object",
      "properties": {
        "query": { "type": "string", "description": "What to search for" },
        "top_k": { "type": "integer", "minimum": 1, "maximum": 20 }
      },
      "required": ["query"]
    }
  }
}

Agent loop (high-level)

  1. Send user request + tool schemas to Qwen.
  2. If model returns a tool call, execute it in your backend.
  3. Send tool results back as a tool message.
  4. Model returns final answer grounded in tool output.

Many Qwen ecosystem tools and agent frameworks (like Qwen-Agent) are built around the same “tool schema + execution loop” principle. The best results come from tight tool descriptions and clean JSON schemas.
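The four-step loop above can be sketched as plain Python. This is an illustration, not a definitive implementation: `call_model` is your adapter around the chat endpoint, and the `tool_calls` message shape assumed here follows the OpenAI tool-calling convention that compatible mode mirrors.

```python
import json

def run_tool_loop(messages, tools, call_model, registry, max_turns=5):
    """Minimal agent loop over an OpenAI-style chat interface.

    call_model(messages, tools) -> assistant message dict (your adapter);
    registry maps tool names to Python callables you control.
    """
    for _ in range(max_turns):
        msg = call_model(messages, tools)
        messages.append(msg)
        tool_calls = msg.get("tool_calls")
        if not tool_calls:
            return msg.get("content")  # no tool requested: final answer
        for call in tool_calls:
            fn = call["function"]
            # Execute the tool in YOUR backend; the model only proposes it.
            result = registry[fn["name"]](**json.loads(fn["arguments"]))
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })
    raise RuntimeError("tool loop did not converge")
```

Capping `max_turns` is the safety valve: a confused model cannot spin tools forever, and every executed call is appended to `messages`, so the transcript doubles as an audit log.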

Structured outputs & JSON (extraction, classification, forms)

Structured output is a major reason teams use APIs instead of a chat UI. When your model output must be consumed by code, your “success condition” is not “sounds correct”—it’s “valid JSON that matches the schema.”

How to make JSON reliable in production

  • Provide a JSON schema (or at least an explicit field list with types).
  • Use a strict system message: “Return JSON only. No Markdown. No commentary.”
  • Validate outputs server-side. If invalid, retry with a corrective prompt: “Fix JSON to match schema.”
  • Log failures and add them to your test set. Prompt reliability improves through iteration.

Example schema (product extraction)

{
  "type": "object",
  "properties": {
    "product_name": { "type": "string" },
    "price": { "type": "number" },
    "currency": { "type": "string" },
    "availability": { "type": "string", "enum": ["in_stock","out_of_stock","unknown"] },
    "key_features": { "type": "array", "items": { "type": "string" } }
  },
  "required": ["product_name","availability"]
}
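The validate-and-retry loop recommended above can be sketched for this exact example schema. The validator below is a hand-rolled check for these specific fields, not a general JSON Schema validator (a library such as jsonschema would cover the full spec), and `call_model` is a placeholder for your provider call.

```python
import json

SCHEMA = {
    "required": ["product_name", "availability"],
    "types": {"product_name": str, "price": (int, float), "currency": str,
              "availability": str, "key_features": list},
    "enums": {"availability": {"in_stock", "out_of_stock", "unknown"}},
}

def validate(payload: dict):
    """Return a list of problems; an empty list means the payload passes."""
    problems = []
    for field in SCHEMA["required"]:
        if field not in payload:
            problems.append(f"missing required field: {field}")
    for field, expected in SCHEMA["types"].items():
        if field in payload and not isinstance(payload[field], expected):
            problems.append(f"wrong type for {field}")
    for field, allowed in SCHEMA["enums"].items():
        if field in payload and payload[field] not in allowed:
            problems.append(f"invalid enum value for {field}")
    return problems

def extract_with_retry(call_model, text, max_attempts=3):
    """call_model(prompt) -> raw model string; retried with a corrective
    prompt until the output both parses and validates."""
    prompt = f"Return JSON only matching the schema.\n\n{text}"
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError as e:
            prompt = f"Fix this to be valid JSON only:\n{raw}\nError: {e}"
            continue
        problems = validate(payload)
        if not problems:
            return payload
        prompt = f"Fix this JSON to match the schema. Problems: {problems}\n{raw}"
    raise ValueError("could not obtain valid JSON")
```

Note that the corrective prompt feeds the model its own broken output plus the specific problems, which converges much faster than simply re-asking the original question.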

You’ll notice a recurring theme: you treat the model like a component that must pass tests, not like a magical oracle. With JSON outputs, your tests become clear and automatable.


Embeddings & RAG (retrieval-augmented generation)

If you want a Qwen assistant that answers with your company’s knowledge, you need RAG. The base pipeline looks like this:

  1. Chunk documents into smaller passages.
  2. Create embeddings (vectors) for each chunk.
  3. Store vectors in a vector DB (or a search engine with vector support).
  4. For a user query, retrieve top-K relevant chunks.
  5. Send the chunks as context to Qwen with a grounding instruction.
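Steps 4 and 5 of the pipeline can be sketched as pure functions, assuming the chunks and the query have already been embedded elsewhere (with whatever embedding model you have enabled); only the vector search and prompt assembly are shown.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, index, top_k=3):
    """index: list of (chunk_text, vector) pairs; returns top-K chunk texts."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in scored[:top_k]]

def build_grounded_prompt(question, chunks):
    # Number the chunks so the model can cite them as [1], [2], ...
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Use ONLY the provided context. If the answer is not in the "
        "context, say you don't know.\n\n"
        f"Question: {question}\n\nCONTEXT:\n{context}"
    )
```

In production you would replace the linear scan with a vector database, but the contract stays the same: query vector in, ranked chunk texts out.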

RAG prompt pattern (grounded answers)

SYSTEM:
You are a support assistant. Use ONLY the provided context.
If the context does not contain the answer, say you don't know and ask a clarifying question.

USER:
Question: {user_question}

CONTEXT:
{chunk_1}
{chunk_2}
{chunk_3}

ASSISTANT:
Provide a helpful answer. Include short citations by referencing chunk numbers like [1], [2].

Cost control in RAG

  • Keep chunks short and relevant; don’t paste entire documents.
  • Summarize long conversation history into a compact memory blob.
  • Cache retrieval results for repeated queries.
  • Use a cheaper model for retrieval query rewriting, and a stronger model only for final answer.

When users search “Qwen API RAG,” what they often really need is “a trustworthy assistant UI.” RAG is half the solution; the other half is showing sources and communicating uncertainty.

Batch API (offline jobs, lower cost, big scale)

For large offline workloads, like processing a million customer messages, rewriting a catalog, or extracting structured data from logs, real-time API calls can be expensive and slow. Alibaba Cloud Model Studio documents an OpenAI-compatible Batch API that lets you submit batch files for asynchronous execution, returning results later and often at a cost advantage compared with real-time calls.

When batch is the right tool

  • Nightly summarization of support tickets
  • Offline classification and tagging
  • Product catalog rewriting and normalization
  • Large-scale extraction for analytics

Batch design tips

  • Make your jobs idempotent: safe to re-run if something fails.
  • Include a job ID and a per-item ID for traceability.
  • Validate inputs before uploading to avoid wasting batch capacity.
  • Store results in a durable datastore and attach them to the original item IDs.
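The traceability tips above translate directly into the batch input file. The sketch below writes an OpenAI-style JSONL batch file, one request per line with a per-item `custom_id`; the field names follow the OpenAI batch file format that the compatible-mode Batch API mirrors, so confirm the exact shape (and accepted `url` paths) against the Model Studio docs for your account.

```python
import json

def build_batch_file(items, model="qwen-plus", path="batch_input.jsonl"):
    """Write an OpenAI-style batch input file (one JSON request per line).

    `items` is a list of (item_id, prompt) pairs. The per-item custom_id
    is what makes the job idempotent and the results traceable.
    """
    with open(path, "w", encoding="utf-8") as f:
        for item_id, prompt in items:
            f.write(json.dumps({
                "custom_id": item_id,
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": model,
                    "messages": [{"role": "user", "content": prompt}],
                },
            }) + "\n")
    return path
```

When results come back, join them to your source records on `custom_id` before storing, so a re-run simply overwrites the same rows.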

Pricing patterns & cost control (what to do even if prices change)

Pricing can change over time and depends on your region, account, and model tier—so a good “Qwen API pricing strategy” is less about memorizing numbers and more about building a system that keeps spend predictable.

Cost-control levers that actually work

  • Model routing: default to a fast model, upgrade only when needed.
  • Token budgets: set a max output length (and enforce it).
  • Context hygiene: summarize conversation history; avoid sending irrelevant context.
  • Caching: cache tool results and frequent answers.
  • Batch for offline: move large workloads to batch processing when feasible.
  • Abuse controls: per-user quotas and IP throttling stop unexpected spikes.

How to present pricing honestly in your product

Users hate surprise bills. A great UI shows: model tier, whether streaming is on, how much context is included, and a rough “estimated usage” (even if you only show it for internal dashboards).

Best practice: Don’t hardcode public price numbers into static pages unless you also maintain a changelog and update routinely. Instead: link to official pricing pages, and show “estimates” derived from your own usage telemetry.

Rate limits, retries & reliability

Even when everything is configured correctly, production systems see transient failures: timeouts, 429 throttling, network hiccups, and occasional service errors. Reliability is engineered, not hoped for.

Recommended retry policy

  • Retry on: 429, 500, 502, 503, and network timeouts.
  • Do not retry on: schema/validation errors, authentication errors, or safety/policy refusals.
  • Use exponential backoff with jitter (randomness) to avoid synchronized retry storms.
  • Cap retries (e.g., 3–4 attempts) and surface a friendly error to users.

// Pseudo logic for robust retries (provider-agnostic)
for attempt in 1..4:
  resp = callLLM()
  if resp.ok: return resp
  if resp.status in [429, 500, 502, 503] or timeout:
    sleep(exponentialBackoffWithJitter(attempt))
    continue
  else:
    // 4xx validation/auth, or policy block: do not retry blindly
    throw resp.error
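The pseudo logic above becomes concrete in a few lines of Python. This is a provider-agnostic sketch: `do_request` stands in for your HTTP call, and "full jitter" (a random sleep between zero and the exponential cap) is one common backoff variant.

```python
import random
import time

RETRYABLE = {429, 500, 502, 503}

def call_with_retries(do_request, max_attempts=4, base_delay=0.5):
    """do_request() returns (status, body) or raises TimeoutError.

    Retries only retryable statuses and timeouts, with exponential
    backoff plus full jitter; other errors surface immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            status, body = do_request()
        except TimeoutError:
            status = None  # treat a timeout like a retryable failure
        else:
            if status == 200:
                return body
            if status not in RETRYABLE:
                raise RuntimeError(f"non-retryable status {status}")
        if attempt < max_attempts:
            # full jitter: random sleep up to base * 2^attempt seconds
            time.sleep(random.uniform(0, base_delay * (2 ** attempt)))
    raise RuntimeError("exhausted retries")
```

The jitter is the part teams most often skip, and it is exactly what prevents a fleet of clients from retrying in lockstep after a shared outage.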

Latency strategy

  • Use streaming for chat UIs.
  • Queue heavy requests and show progress states.
  • Pre-warm caches for common prompts.
  • Use smaller models for non-critical tasks (classification, routing).

Production architecture (secure, scalable, maintainable)

The architecture that scales is boring—but correct:

Client → Your Backend → Qwen API → Your Backend → Client

Why your backend is essential

  • Key security: keep API keys private.
  • Plan enforcement: quotas per tier, per workspace, per user.
  • Abuse prevention: rate limits, bot detection, prompt filtering.
  • Consistency: unified prompts, routing logic, and safety checks.
  • Observability: central logging, trace IDs, error tracking.

Suggested system components

  • API gateway: auth, quotas, input validation
  • LLM service: provider adapters (OpenAI-compatible, DashScope native)
  • Queue + workers: batch jobs, long tasks, controlled concurrency
  • Storage: prompts, outputs, tool results, embeddings
  • Moderation layer: policy enforcement & review tools

Minimal request lifecycle (state machine)

  • CREATED: request accepted and validated (user sees “Preparing…”)
  • RUNNING: backend calling Qwen (“Generating…”)
  • TOOL_CALL: model requested a tool execution (“Fetching data…”)
  • FINALIZING: formatting, validation, and storage (“Finalizing…”)
  • SUCCEEDED: response delivered (“Ready”)
  • FAILED: error occurred (“Try again / contact support”)
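Encoding this lifecycle as an explicit state machine keeps bugs out of the request tracker. The sketch below is one possible encoding (the state names come from the table; the transition table itself is an assumption to adapt to your flow).

```python
from enum import Enum

class RequestState(Enum):
    CREATED = "created"
    RUNNING = "running"
    TOOL_CALL = "tool_call"
    FINALIZING = "finalizing"
    SUCCEEDED = "succeeded"
    FAILED = "failed"

# Allowed forward transitions; FAILED is reachable from any
# non-terminal state, and terminal states allow no further moves.
TRANSITIONS = {
    RequestState.CREATED: {RequestState.RUNNING, RequestState.FAILED},
    RequestState.RUNNING: {RequestState.TOOL_CALL, RequestState.FINALIZING,
                           RequestState.FAILED},
    RequestState.TOOL_CALL: {RequestState.RUNNING, RequestState.FAILED},
    RequestState.FINALIZING: {RequestState.SUCCEEDED, RequestState.FAILED},
}

def advance(current: RequestState, nxt: RequestState) -> RequestState:
    # Reject any transition not listed above (including out of terminals).
    if nxt not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {nxt}")
    return nxt
```

Storing the current state per request also gives your UI its status string for free: each state maps one-to-one onto a user-facing message.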

Logging, QA & monitoring

If your Qwen integration becomes a core feature, you need observability from day one. The goal is to answer: “Is the system healthy?” and “Why did this request fail?” without guessing.

Metrics you should track

  • Latency: p50/p95 end-to-end and provider call duration
  • Success rate: successful responses vs errors vs policy refusals
  • Token usage: input tokens, output tokens (estimated if needed)
  • Cost anomalies: spikes by user, IP, prompt pattern, or tool path
  • Tool performance: tool call count, tool latency, tool error rate

QA that actually improves output

Build a “golden prompt set” that represents your real product use cases: customer support, extraction, coding tasks, RAG questions, and tool use flows. Run this set regularly and compare results. Prompt changes can be treated as code changes: reviewed, tested, and rolled out with careful monitoring.

Privacy reminder: Prompts can contain sensitive data. Implement access control, redact logs when possible, and set retention policies that match your compliance requirements.

Safety & policy notes (what your product must handle)

Any public-facing LLM product needs safety guardrails. A good safety system is layered: input checks, provider-level safety behaviors, and output review/reporting.

Layered safety checklist

  • Input validation: reject clearly disallowed requests; limit prompt length.
  • Provider safety: handle refusals gracefully (don’t crash your UI).
  • Output controls: protect against prompt injection in RAG systems by isolating instructions from documents.
  • User reporting: add a “Report” button and review queue.

For enterprise use, the most important safety issue is often data leakage: don’t send secrets, keys, or private documents to the model unless you have approval and a clear data handling policy.


FAQ

Is the Qwen API OpenAI-compatible?
Yes. Alibaba Cloud Model Studio provides OpenAI-compatible interfaces for Qwen models. Most migrations require updating only the API key, base URL, and model name.

What base URL should I use for OpenAI-compatible calls?
For international usage, documentation commonly references: https://dashscope-intl.aliyuncs.com/compatible-mode/v1. Use it as your base_url in OpenAI SDK style clients.

Does Qwen support the OpenAI Responses API?
Alibaba Cloud Model Studio documents compatibility with the OpenAI Responses API for Qwen, which can be useful for agent-style workflows and built-in tools. Prefer one primary interface in your app to reduce complexity.

Can Qwen handle images?
Yes. Qwen-VL models support multimodal use cases. Alibaba Cloud documentation describes OpenAI-compatible usage for Qwen-VL and provides a compatible-mode base URL for migration.

How do I control costs?
Use model routing (fast default, strong final), enforce token budgets, summarize history, use caching, and move large offline work to batch processing when possible.

What does a production architecture look like?
Client → your backend → Qwen API → your backend → client. Never expose your API key in the browser. Add quotas, rate limits, and monitoring at your backend gateway.


Changelog

  • Initial publication (DashScope endpoints, OpenAI-compatible base URL, Responses + Vision + Batch coverage).