
Yesterday, Xiaomi MiMo officially partnered with TileRT to launch the UltraSpeed inference mode for Xiaomi MiMo-V2.5-Pro, achieving, for the first time on a general-purpose GPU platform, a generation throughput exceeding 1,000 tokens per second for a trillion-parameter large model. The breakthrough stems from full-stack co-optimization across model architecture, system scheduling, and low-level operators, further pushing the boundaries of lightweight deployment and high-performance inference.
In measured results, UltraSpeed generates a complete Snake game in under 10 seconds and faithfully reproduces a macOS-level UI within 60 seconds, nearly ten times faster than the standard version. To help developers integrate quickly, Xiaomi has simultaneously launched a dedicated API service for MiMo-V2.5-Pro-UltraSpeed, offered at a limited-time trial price three times that of the standard edition while delivering up to ten times the output per unit time.
Special note: UltraSpeed is available exclusively via API calls and does not support the Token Plan billing model. For reference, the standard edition charges 0.025 yuan per million input tokens on a cache hit and 3 yuan per million input tokens on a cache miss, with a flat output rate of 6 yuan per million tokens. UltraSpeed, by contrast, is positioned around the core value proposition of “threefold investment for tenfold response performance.” Because high-performance inference resources are scarce, access is granted through a targeted application process, with the application window running from 00:00 on June 9, 2026, to 23:59 on June 23, 2026.
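To make the pricing trade-off concrete, here is a minimal Python sketch that estimates the cost of a workload from the standard-edition rates quoted above. It assumes, purely for illustration, that the "three times" trial price applies uniformly to all three rates; the article does not give a per-rate breakdown for UltraSpeed. The function name `estimate_cost` and the sample token counts are hypothetical.

```python
from decimal import Decimal

# Standard-edition prices quoted in the article, in yuan per million tokens.
STANDARD_PRICES = {
    "input_cache_hit": Decimal("0.025"),
    "input_cache_miss": Decimal("3"),
    "output": Decimal("6"),
}

# Assumption: the 3x trial price applies uniformly to every rate.
ULTRASPEED_MULTIPLIER = Decimal("3")
MILLION = Decimal("1000000")


def estimate_cost(hit_tokens, miss_tokens, output_tokens, multiplier=Decimal("1")):
    """Return the estimated cost in yuan for one workload (hypothetical helper)."""
    cost = (
        hit_tokens * STANDARD_PRICES["input_cache_hit"]
        + miss_tokens * STANDARD_PRICES["input_cache_miss"]
        + output_tokens * STANDARD_PRICES["output"]
    ) / MILLION
    return cost * multiplier


# Illustrative workload: 2M cached input tokens, 1M uncached, 0.5M output tokens.
standard = estimate_cost(2_000_000, 1_000_000, 500_000)
ultraspeed = estimate_cost(2_000_000, 1_000_000, 500_000, ULTRASPEED_MULTIPLIER)
print(f"standard:   {standard:.2f} yuan")    # standard:   6.05 yuan
print(f"ultraspeed: {ultraspeed:.2f} yuan")  # ultraspeed: 18.15 yuan
```

`Decimal` is used instead of floats so that small rates such as 0.025 yuan are summed exactly. Under the stated assumption, the same workload costs three times as much on UltraSpeed, which is the "threefold investment" side of the slogan; the "tenfold" side is generation speed and is not modeled here.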
Review prioritizes real-world business use cases, focusing on enterprise customers and professional developer teams with clear AI integration needs. Approval timelines and pass rates are not guaranteed. Approved users will receive a limited-time free chat experience: each account can queue successfully up to 10 times per day, each session lasts at most 30 minutes, and resources are automatically reclaimed after five minutes of inactivity. Industry observers widely believe this leap in speed will significantly accelerate the large-scale deployment of trillion-parameter models in low-latency, highly interactive applications.
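The trial limits described above (10 successful queue entries per account per day, a 30-minute session cap, and reclamation after five minutes of inactivity) can be sketched as a small policy check. This is only an illustration of the stated rules, not a description of how Xiaomi enforces them; `TrialAccount`, `try_queue`, and `session_expired` are hypothetical names, and treating the limits as inclusive boundaries is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Limits as stated in the article.
MAX_QUEUES_PER_DAY = 10
MAX_SESSION = timedelta(minutes=30)
IDLE_TIMEOUT = timedelta(minutes=5)


@dataclass
class TrialAccount:
    """Tracks successful queue entries per calendar day (hypothetical model)."""
    queues_by_day: dict = field(default_factory=dict)

    def try_queue(self, now: datetime) -> bool:
        """Record a queue entry if today's quota is not yet used up."""
        used = self.queues_by_day.get(now.date(), 0)
        if used >= MAX_QUEUES_PER_DAY:
            return False
        self.queues_by_day[now.date()] = used + 1
        return True


def session_expired(started_at: datetime, last_activity: datetime, now: datetime) -> bool:
    """A session ends at the 30-minute cap or after 5 idle minutes.

    Assumption: both boundaries are inclusive.
    """
    return (now - started_at >= MAX_SESSION) or (now - last_activity >= IDLE_TIMEOUT)


account = TrialAccount()
start = datetime(2026, 6, 10, 9, 0)
granted = [account.try_queue(start) for _ in range(11)]
print(granted.count(True))  # 10
```

A client built against such limits would mainly need to send activity within the five-minute idle window and expect to re-queue once the 30-minute cap is reached.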