Description
Molmo2-8B is an open vision-language model developed by the Allen Institute for AI (Ai2) as part of the Molmo2 family. It supports image, video, and multi-image understanding and grounding. It is built on Qwen3-8B with SigLIP 2 as its vision backbone. Among open-weight, open-data models, it leads on short-video understanding, counting, and captioning, and remains competitive on long-video tasks.
API Usage Examples
OpenAI Compatible Endpoint
Use this endpoint with any OpenAI-compatible library.
Model: AllenAI: Molmo2 8B (model ID: allenai/molmo-2-8b)
curl https://api.ridvay.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "allenai/molmo-2-8b",
    "messages": [
      {
        "role": "user",
        "content": "Explain the capabilities of the AllenAI: Molmo2 8B model"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 1024
  }'
Supported Modalities
- Text
- Images
- Video
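Since the model accepts images as well as text, a request can carry both in a single user message. The sketch below builds such a request in Python using only the standard library. It assumes the endpoint accepts OpenAI-style "image_url" content parts, which this page does not confirm; the helper names build_image_payload and build_request, and the example.com image URL, are illustrative placeholders. No video format is shown because none is documented here.

```python
import json
import urllib.request

API_URL = "https://api.ridvay.com/v1/chat/completions"  # endpoint from the curl example
MODEL_ID = "allenai/molmo-2-8b"


def build_image_payload(prompt: str, image_url: str, max_tokens: int = 1024) -> dict:
    """Build a chat request with one text part and one image part.

    Assumption: the provider accepts OpenAI-style "image_url" content parts.
    """
    return {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Wrap the payload in a POST request; nothing is sent until urlopen is called."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    payload = build_image_payload(
        "How many people are in this image?",
        "https://example.com/photo.jpg",  # placeholder image URL
    )
    print(json.dumps(payload, indent=2))
    # To send it for real (requires a valid key):
    # with urllib.request.urlopen(build_request(payload, "YOUR_API_KEY")) as resp:
    #     print(json.load(resp)["choices"][0]["message"]["content"])
```

Building the payload separately from sending it makes it easy to inspect the JSON before spending tokens on a call.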
API Pricing
- Input: $0.20 / 1M tokens
- Output: $0.20 / 1M tokens
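The listed rates of 0.20 USD per million tokens translate into a per-request cost with simple arithmetic. The following is a minimal sketch; the function name estimate_cost_usd is illustrative, and it assumes billing is strictly proportional to token counts with no minimum charge or rounding, which this page does not specify.

```python
INPUT_USD_PER_MILLION = 0.20   # input rate listed above
OUTPUT_USD_PER_MILLION = 0.20  # output rate listed above


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in US dollars.

    Assumption: cost scales linearly with token counts.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # A 12,000-token prompt with a 1,024-token reply.
    print(f"${estimate_cost_usd(12_000, 1_024):.6f}")
```

Under these assumptions, a 12,000-token prompt with a 1,024-token reply costs roughly a quarter of a cent.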
Token Limits
- Max Output: 36,864 tokens
- Max Context: 36,864 tokens
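Because the maximum output equals the maximum context, a long prompt leaves less room for the reply than the output limit alone suggests. The sketch below picks a safe max_tokens value for a request. It assumes the context window is shared between prompt and completion, which is the usual convention but is not stated on this page; clamp_max_tokens is a hypothetical helper, and the prompt token count must come from a tokenizer or from the usage data of an earlier response.

```python
MAX_CONTEXT_TOKENS = 36_864  # listed max context
MAX_OUTPUT_TOKENS = 36_864   # listed max output


def clamp_max_tokens(prompt_tokens: int, requested: int) -> int:
    """Return the largest max_tokens value that fits the listed limits.

    Assumption: prompt and completion tokens share one context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if requested < 1:
        raise ValueError("requested must be at least 1")
    remaining = MAX_CONTEXT_TOKENS - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    return min(requested, MAX_OUTPUT_TOKENS, remaining)


if __name__ == "__main__":
    print(clamp_max_tokens(prompt_tokens=1_000, requested=1_024))
    print(clamp_max_tokens(prompt_tokens=36_000, requested=1_024))
```

In the second call the prompt leaves only 864 tokens of headroom, so the helper returns that instead of the requested 1,024.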
Subscription Tiers
- free
- pro
- ultimate