Description
Qwen2.5-VL-32B is a multimodal vision-language model fine-tuned with reinforcement learning to improve mathematical reasoning, structured outputs, and visual problem-solving. It is suited to visual analysis tasks, including object recognition, reading text within images, and precise event localization in long videos. Qwen2.5-VL-32B demonstrates state-of-the-art performance on multimodal benchmarks such as MMMU, MathVista, and VideoMME, while maintaining strong reasoning and clarity on text-based tasks such as MMLU, mathematical problem-solving, and code generation.
API Usage Examples
OpenAI Compatible Endpoint
Use this endpoint with any OpenAI-compatible library. The model is Qwen: Qwen2.5 VL 32B Instruct; pass its ID, qwen/qwen2.5-vl-32b-instruct, in the "model" field of the request.
curl https://api.ridvay.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "qwen/qwen2.5-vl-32b-instruct",
    "messages": [
      {
        "role": "user",
        "content": "Explain the capabilities of the Qwen: Qwen2.5 VL 32B Instruct model"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 1024
  }'
Supported Modalities
- Text
- Images
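Because the model accepts images as well as text, a request can carry both in a single user message. The sketch below builds such a request in Python using only the standard library. It assumes this endpoint accepts the OpenAI-style "image_url" content part with a base64 data URL; that is not confirmed on this page. The helper names build_image_payload and send are hypothetical and chosen for illustration.

```python
import base64
import json
import urllib.request

API_URL = "https://api.ridvay.com/v1/chat/completions"
MODEL = "qwen/qwen2.5-vl-32b-instruct"


def build_image_payload(question: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a chat-completions body with one text part and one inline image.

    Assumption: the endpoint accepts OpenAI-style content parts, with the
    image passed as a base64 data URL.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:{mime};base64,{encoded}"},
                    },
                ],
            }
        ],
        "temperature": 0.7,
        "max_tokens": 1024,
    }


def send(payload: dict, api_key: str) -> dict:
    """POST the payload to the endpoint and return the decoded JSON response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To try it, read an image file in binary mode, pass its bytes to build_image_payload, and hand the result to send together with your API key. Keeping payload construction separate from the network call makes the request body easy to inspect before anything is sent.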
API Pricing
- Input: $0.05 / 1M tokens
- Output: $0.22 / 1M tokens
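As a rough illustration of how the per-token prices above translate into a per-request cost, the following sketch multiplies token counts by the listed rates. It assumes billing is strictly linear in tokens with no minimum charge or rounding, which this page does not state; estimate_cost_usd is a hypothetical helper name.

```python
from decimal import Decimal

# Listed prices in USD per one million tokens.
INPUT_USD_PER_MILLION = Decimal("0.05")
OUTPUT_USD_PER_MILLION = Decimal("0.22")
MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request, assuming purely linear billing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MILLION + output_tokens * OUTPUT_USD_PER_MILLION
    return total / MILLION
```

Decimal is used instead of float so that very small per-request amounts do not pick up binary rounding noise when summed over many requests.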
Token Limits
- Max Output: 16,384 tokens
- Max Context: 16,384 tokens
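With both limits at 16,384 tokens, a long prompt leaves less room for the reply. The sketch below picks a safe "max_tokens" value for a given prompt size. It assumes the context limit covers prompt and completion together, as is common for chat APIs but not stated on this page, and it expects the caller to supply the prompt's token count. clamp_max_tokens is a hypothetical helper name.

```python
MAX_CONTEXT_TOKENS = 16_384
MAX_OUTPUT_TOKENS = 16_384


def clamp_max_tokens(prompt_tokens: int, requested: int = 1024) -> int:
    """Return the largest usable max_tokens value not exceeding `requested`.

    Assumption: prompt tokens and completion tokens share the context window.
    """
    if prompt_tokens < 0 or requested < 1:
        raise ValueError("prompt_tokens must be >= 0 and requested must be >= 1")
    remaining = MAX_CONTEXT_TOKENS - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt leaves no room in the context window")
    return min(requested, remaining, MAX_OUTPUT_TOKENS)
```

For example, a 16,000-token prompt leaves 384 tokens, so a request for 1,024 output tokens would be reduced to 384 under this assumption.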
Subscription Tiers
- free
- pro
- ultimate
