Description
Qwen2.5 VL 7B is a multimodal LLM from the Qwen Team with the following key enhancements:
- State-of-the-art understanding of images of various resolutions and aspect ratios: Qwen2.5-VL achieves state-of-the-art performance on visual understanding benchmarks, including MathVista, DocVQA, RealWorldQA, MTVQA, etc.
- Understanding videos of 20+ minutes: Qwen2.5-VL can understand videos over 20 minutes long for high-quality video-based question answering, dialog, content creation, etc.
- Agent that can operate mobile phones, robots, etc.: with its complex reasoning and decision-making abilities, Qwen2.5-VL can be integrated with devices such as mobile phones and robots for automatic operation based on the visual environment and text instructions.
- Multilingual support: to serve global users, besides English and Chinese, Qwen2.5-VL supports understanding text in other languages inside images, including most European languages, Japanese, Korean, Arabic, Vietnamese, etc.
For more details, see this [blog post](https://qwenlm.github.io/blog/qwen2-vl/) and [GitHub repo](https://github.com/QwenLM/Qwen2-VL).
Usage of this model is subject to the [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE).
API Usage Examples
OpenAI Compatible Endpoint
Use this endpoint with any OpenAI-compatible library.
Model: Qwen: Qwen2.5-VL 7B Instruct (qwen/qwen-2.5-vl-7b-instruct)
curl https://api.ridvay.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "qwen/qwen-2.5-vl-7b-instruct",
    "messages": [
      {
        "role": "user",
        "content": "Explain the capabilities of the Qwen: Qwen2.5-VL 7B Instruct model"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 1024
  }'
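The curl request above can also be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_request` and `ask` are hypothetical, and the response parsing assumes the endpoint returns the usual OpenAI chat-completions shape (`choices[0].message.content`), which this page does not confirm.

```python
import json
import urllib.request

API_URL = "https://api.ridvay.com/v1/chat/completions"
MODEL = "qwen/qwen-2.5-vl-7b-instruct"


def build_request(api_key, prompt, temperature=0.7, max_tokens=1024):
    """Build the same POST request as the curl example (hypothetical helper)."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def ask(api_key, prompt):
    """Send the request and return the reply text.

    Assumes an OpenAI-style response body; adjust if the actual shape differs.
    """
    with urllib.request.urlopen(build_request(api_key, prompt), timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `ask("YOUR_API_KEY", "Explain the capabilities of the Qwen: Qwen2.5-VL 7B Instruct model")` would issue the same request as the curl command.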
Supported Modalities
- Text
- Images
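This page does not show an image request. As a hedged sketch, the message below follows the OpenAI-style content-parts format, with a text part and an `image_url` part carrying a base64 data URL. It assumes this endpoint accepts that format, and `image_message` is a hypothetical helper name.

```python
import base64


def image_message(prompt, image_bytes, mime_type="image/png"):
    """Build a user message combining text and one inline image.

    Assumption: the endpoint accepts OpenAI-style content parts.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

The returned dictionary would take the place of the plain-text entry in the `messages` array of a chat completion request.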
API Pricing
- Input: $0.20 / 1M tokens
- Output: $0.20 / 1M tokens
- Image: $0 / image
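The listed rates translate into a per-request cost as in the sketch below. `estimate_cost` is a hypothetical helper; it assumes billing is linear in token counts and does not model whether image content is additionally counted as input tokens, which this page does not state.

```python
from decimal import Decimal

# Rates copied from the pricing list above (USD).
INPUT_USD_PER_MILLION = Decimal("0.2")
OUTPUT_USD_PER_MILLION = Decimal("0.2")
IMAGE_USD_EACH = Decimal("0")


def estimate_cost(input_tokens, output_tokens, images=0):
    """Estimate the USD cost of one request under the listed rates."""
    million = Decimal(1_000_000)
    return (
        INPUT_USD_PER_MILLION * input_tokens / million
        + OUTPUT_USD_PER_MILLION * output_tokens / million
        + IMAGE_USD_EACH * images
    )
```

For example, a request with 500,000 input tokens and 250,000 output tokens comes to 0.10 + 0.05 = 0.15 USD under these rates.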
Token Limits
- Max Output: 32,768 tokens
- Max Context: 32,768 tokens
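Because the output limit equals the context limit, a long prompt leaves less room for the reply. The sketch below clamps `max_tokens` accordingly. It assumes prompt and completion tokens share the same 32,768-token context window (common for chat APIs, but not stated here), and `clamp_max_tokens` is a hypothetical helper that expects the prompt token count to be supplied by the caller.

```python
MAX_CONTEXT_TOKENS = 32_768
MAX_OUTPUT_TOKENS = 32_768


def clamp_max_tokens(prompt_tokens, requested):
    """Return the largest usable max_tokens value not exceeding `requested`.

    Assumption: prompt and completion tokens share one context window.
    """
    if prompt_tokens >= MAX_CONTEXT_TOKENS:
        raise ValueError("prompt alone fills or exceeds the context window")
    remaining = MAX_CONTEXT_TOKENS - prompt_tokens
    return min(requested, remaining, MAX_OUTPUT_TOKENS)
```

With a 30,000-token prompt, a requested `max_tokens` of 4,096 would be reduced to 2,768.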
Subscription Tiers
- free
- pro
- ultimate