Description
The Llama 3.2 90B Vision Instruct model is a top-tier, 90-billion-parameter multimodal model designed for challenging visual reasoning and language tasks. It offers strong accuracy in image captioning, visual question answering, and advanced image-text comprehension. Pre-trained on vast multimodal datasets and fine-tuned with human feedback, Llama 3.2 90B Vision is engineered to handle demanding image-based AI tasks. It is well suited to industries requiring cutting-edge multimodal AI capabilities, particularly those dealing with complex, real-time visual and textual analysis. See the [original model card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/MODEL_CARD_VISION.md) for full details. Usage of this model is subject to [Meta's Acceptable Use Policy](https://www.llama.com/llama3/use-policy/).
API Usage Examples
OpenAI Compatible Endpoint
Use this endpoint with any OpenAI-compatible library. Model: Meta: Llama 3.2 90B Vision Instruct (`meta-llama/llama-3.2-90b-vision-instruct`)
curl https://api.ridvay.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "meta-llama/llama-3.2-90b-vision-instruct",
    "messages": [
      {
        "role": "user",
        "content": "Explain the capabilities of the Meta: Llama 3.2 90B Vision Instruct model"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 1024
  }'
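Because this is a vision model, a request can also include one or more images. The sketch below assumes the endpoint accepts the standard OpenAI multimodal content format (a content array mixing text and image_url parts); the image URL is a placeholder, not a real asset.
curl https://api.ridvay.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "meta-llama/llama-3.2-90b-vision-instruct",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe what is shown in this image." },
          { "type": "image_url", "image_url": { "url": "https://example.com/sample-image.jpg" } }
        ]
      }
    ],
    "temperature": 0.7,
    "max_tokens": 1024
  }'
Each image included in a request is billed at the per-image rate listed under API Pricing below.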
Supported Modalities
- Text
- Images
API Pricing
- Input: $0.35 / 1M tokens
- Output: $0.40 / 1M tokens
- Image: $0.001 / image
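As an illustration using the rates above, a request with 10,000 input tokens, 1,000 output tokens, and one image would cost roughly (10,000 × $0.35 + 1,000 × $0.40) / 1,000,000 + $0.001 ≈ $0.0049.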
Token Limits
- Max Output: 16,384 tokens
- Max Context: 32,768 tokens
Subscription Tiers
- free
- pro
- ultimate