Description
The Qwen3.5 native vision-language Flash models use a hybrid architecture that combines linear attention with a sparse mixture-of-experts design to improve inference efficiency. Compared with the Qwen3 series, they perform substantially better on both text-only and multimodal tasks while keeping response times low.
API Usage Examples
OpenAI Compatible Endpoint
Use this endpoint with any OpenAI-compatible library.
Model: Qwen: Qwen3.5-Flash (model ID: qwen/qwen3.5-flash-02-23)
curl https://api.ridvay.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "qwen/qwen3.5-flash-02-23",
    "messages": [
      {
        "role": "user",
        "content": "Explain the capabilities of the Qwen: Qwen3.5-Flash model"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 1024
  }'
Supported Modalities
- Text
- Images
- Video
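For image input, OpenAI-compatible endpoints usually accept a list of content parts in place of a plain string. The sketch below builds such a request with the Python standard library only. It assumes this endpoint follows the OpenAI `image_url` content-part convention and response shape, which this listing does not confirm; the helper names and the example image URL are hypothetical. Video input has no equally common convention, so it is not shown.

```python
import json
import urllib.request

API_URL = "https://api.ridvay.com/v1/chat/completions"
MODEL_ID = "qwen/qwen3.5-flash-02-23"


def build_image_request(prompt, image_url, api_key, max_tokens=1024):
    """Build (but do not send) a chat completion request with one image."""
    payload = {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                # Assumption: OpenAI-style content parts are accepted here.
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the text of the first choice.

    Assumption: the response uses the OpenAI chat completion shape.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Passing the result of `build_image_request("Describe this image.", "https://example.com/cat.png", "YOUR_API_KEY")` to `send` would return the reply as a string, provided the conventions assumed above hold for this endpoint.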
API Pricing
- Input: $0.10 / 1M tokens
- Output: $0.40 / 1M tokens
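The per-token rates above translate into a request cost with simple arithmetic. The following is a minimal sketch; the function name is hypothetical, and it assumes billing is strictly linear in token count with no minimum charge or tier discount.

```python
from decimal import Decimal

# Rates taken from the pricing list, in USD per one million tokens.
INPUT_USD_PER_MILLION = Decimal("0.10")
OUTPUT_USD_PER_MILLION = Decimal("0.40")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens, output_tokens):
    """Estimate the cost of one request in USD (assumes linear billing)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        INPUT_USD_PER_MILLION * input_tokens
        + OUTPUT_USD_PER_MILLION * output_tokens
    )
    return total / TOKENS_PER_MILLION
```

For example, a request with 200,000 input tokens and 4,000 output tokens comes to $0.02 for input plus $0.0016 for output, or $0.0216 in total.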
Token Limits
- Max Output: 65,536 tokens
- Max Context: 1,000,000 tokens
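These two limits can be used to pick a safe `max_tokens` value before sending a request. The sketch below is illustrative only: the function name is hypothetical, and it assumes the context limit covers prompt and completion tokens together, which this listing does not state.

```python
MAX_OUTPUT_TOKENS = 65_536
MAX_CONTEXT_TOKENS = 1_000_000


def clamp_max_tokens(input_tokens, requested_max_tokens):
    """Return the largest usable max_tokens for a prompt of the given size.

    Assumption: prompt and completion share the same context window.
    """
    if input_tokens < 0 or requested_max_tokens <= 0:
        raise ValueError("token counts must be positive")
    remaining = MAX_CONTEXT_TOKENS - input_tokens
    if remaining <= 0:
        raise ValueError("prompt alone fills the context window")
    return min(requested_max_tokens, MAX_OUTPUT_TOKENS, remaining)
```

Under that assumption, a 990,000-token prompt leaves room for at most 10,000 output tokens, even though the output limit on its own would allow 65,536.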
Subscription Tiers
- free
- pro
- ultimate
