LogoTop AI Hubs

Bytedance: UI-TARS 7B

Other
Multimodal
Paid

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement learning-based reasoning, enabling robust action planning and execution across virtual interfaces. This model achieves state-of-the-art results on a range of interactive and grounding benchmarks, including OSworld, WebVoyager, AndroidWorld, and ScreenSpot. It also demonstrates perfect task completion across diverse Poki games and outperforms prior models in Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and shows strong scaling across variants, with the 1.5 version notably exceeding the performance of earlier 72B and 7B checkpoints.

Parameters

7B

Context Window

128,000

tokens

Input Price

$0.1

per 1M tokens

Output Price

$0.2

per 1M tokens

Capabilities

Model capabilities and supported modalities

Performance

Reasoning

Excellent reasoning capabilities with strong logical analysis

Math

-

Coding

-

Knowledge

-

Modalities

Input Modalities

image,text

Output Modalities

text

LLM Price Calculator

Calculate the cost of using this model

$0.000150
$0.000600
Input Cost:$0.000150
Output Cost:$0.000600
Total Cost:$0.000750
Estimated usage: 4,500 tokens

Monthly Cost Estimator

Based on different usage levels

Light Usage
$0.0030
~10 requests
Moderate Usage
$0.0300
~100 requests
Heavy Usage
$0.3000
~1000 requests
Enterprise
$3.0000
~10,000 requests
Note: Estimates based on current token count settings per request.
Last Updated: 1970/01/21