Bytedance: UI-TARS 7B

bytedance/ui-tars-1.5-7b

Description

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Developed by ByteDance, it extends the UI-TARS framework with reinforcement-learning-based reasoning, enabling robust action planning and execution across virtual interfaces. The model achieves state-of-the-art results on a range of interactive and grounding benchmarks, including OSWorld, WebVoyager, AndroidWorld, and ScreenSpot. It also demonstrates perfect task completion across diverse Poki games and outperforms prior models on Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and shows strong scaling across variants, with the 1.5 version notably exceeding the performance of earlier 72B and 7B checkpoints.

API Usage Examples

OpenAI Compatible Endpoint

Use this endpoint with any OpenAI-compatible library. Model: Bytedance: UI-TARS 7B (bytedance/ui-tars-1.5-7b)

curl https://api.ridvay.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "bytedance/ui-tars-1.5-7b",
    "messages": [
      {
        "role": "user",
        "content": "Explain the capabilities of the Bytedance: UI-TARS 7B model"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 1024
  }'
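
The same request can be sketched in Python using only the standard library. The URL, model ID, and parameters are taken from the curl example above; the function names `build_request` and `send` are illustrative, and the response is only assumed to be a JSON object.

```python
import json
import urllib.request

API_URL = "https://api.ridvay.com/v1/chat/completions"  # from the curl example
MODEL_ID = "bytedance/ui-tars-1.5-7b"


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build the same POST request as the curl example (nothing is sent here)."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 1024,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body (assumed to be JSON)."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling `send(build_request("YOUR_API_KEY", "..."))` performs the network call. If the endpoint follows the OpenAI response shape, the reply text would be under `["choices"][0]["message"]["content"]`, but this page does not confirm the response format.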

Supported Modalities

  • Text
  • Images
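
Since the model accepts images as well as text, a screenshot can be sent alongside the instruction. The sketch below builds a user message in the OpenAI chat format for image content (a `text` part plus an `image_url` part carrying a base64 data URL). Whether this endpoint accepts data URLs in that form is an assumption, and `image_message` is an illustrative helper name.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a user message with a text part and an inline base64 image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                # Assumption: the endpoint accepts OpenAI-style data URLs.
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

The returned dictionary goes into the `messages` array of a chat completion request in place of a plain text message.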

API Pricing

  • Input: $0.10 / 1M tokens
  • Output: $0.20 / 1M tokens
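
As a rough illustration of the arithmetic, the listed rates can be turned into a per-request estimate. This sketch assumes the prices are in US dollars and that billing is linear in token count with no minimums or extra fees, none of which this page states; `estimate_cost_usd` is an illustrative name.

```python
INPUT_USD_PER_MILLION = 0.1   # listed input price per 1M tokens
OUTPUT_USD_PER_MILLION = 0.2  # listed output price per 1M tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost, assuming simple linear per-token billing."""
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000
```

Under these assumptions, a request with 10,000 input tokens and 1,000 output tokens comes to about $0.0012.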

Token Limits

  • Max Output: 2,048 tokens
  • Max Context: 128,000 tokens
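
A client can keep `max_tokens` within these limits before sending a request. The sketch below assumes the context window covers prompt and output tokens together, which is common but not stated on this page. The prompt token count has to come from a tokenizer of your choice, and `clamp_max_tokens` is an illustrative helper.

```python
MAX_OUTPUT_TOKENS = 2_048     # listed max output
MAX_CONTEXT_TOKENS = 128_000  # listed max context


def clamp_max_tokens(requested: int, prompt_tokens: int) -> int:
    """Return a max_tokens value that respects both listed limits."""
    if prompt_tokens >= MAX_CONTEXT_TOKENS:
        raise ValueError("prompt alone fills or exceeds the context window")
    # Assumption: prompt and output tokens share the same context window.
    remaining = MAX_CONTEXT_TOKENS - prompt_tokens
    return max(1, min(requested, MAX_OUTPUT_TOKENS, remaining))
```

For example, requesting 4,096 output tokens with a 1,000-token prompt is reduced to 2,048, while a 127,500-token prompt leaves room for only 500.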

Subscription Tiers

  • free
  • pro
  • ultimate