Google Introduces Multi-Token Prediction Drafters for Gemma 4, Boosting AI Inference Speed by Up to 3x

AI 05.08.26

According to foreign media reports, Google recently unveiled multi-token prediction (MTP) drafters for the Gemma 4 series of models. The drafters use a speculative decoding architecture to raise inference speed by up to three times without degrading output quality or reasoning ability. Gemma 4 is among the most closely watched open models worldwide and passed 60 million downloads shortly after its release. The update targets the long-standing inference bottleneck in large language models, with the goal of making better use of available compute.

Inference with conventional language models is often limited by GPU memory bandwidth. For each generated token, the processor spends considerable time moving tens of billions of parameters from GPU memory to the compute units, which leaves most of the hardware idle and produces noticeable response latency. Google's speculative decoding setup pairs a heavyweight target model, such as Gemma 4 31B, with a lightweight MTP drafter. The drafter uses the otherwise idle compute to predict several future tokens ahead of time, and the target model then verifies those tokens in parallel. When the predictions match, the target model confirms the whole drafted sequence in a single computation, which sharply reduces text-generation time.
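The draft-and-verify loop described above can be illustrated with a minimal sketch. This is not Google's implementation: it assumes the simplest greedy variant of speculative decoding, and the names `speculative_decode`, `target_next`, and `draft_next` are hypothetical stand-ins for a large target model and a small drafter. The per-position verification is written as a Python loop here, whereas a real system would perform it in one batched forward pass of the target model.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a "model" maps a token context to its next token.
NextTokenFn = Callable[[List[int]], int]


def speculative_decode(
    target_next: NextTokenFn,
    draft_next: NextTokenFn,
    prompt: List[int],
    max_new_tokens: int,
    k: int = 4,
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, target verification passes)."""
    tokens = list(prompt)
    target_passes = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. The cheap drafter proposes k tokens, one after another.
        draft: List[int] = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))

        # 2. The target checks every drafted position. Counted as ONE pass,
        #    because a real target model scores all positions together.
        target_passes += 1
        accepted: List[int] = []
        for i in range(k):
            expected = target_next(tokens + draft[:i])
            if expected == draft[i]:
                accepted.append(draft[i])
            else:
                # First mismatch: keep the target's own token, drop the rest.
                accepted.append(expected)
                break
        else:
            # Every draft token matched; the same pass yields one extra token.
            accepted.append(target_next(tokens + draft))
        tokens.extend(accepted)
    return tokens[: len(prompt) + max_new_tokens], target_passes


def toy_target(ctx: List[int]) -> int:
    # Toy "large model": counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_drafter(ctx: List[int]) -> int:
    # Toy "small model": agrees with the target except after 2, 5, or 8.
    return 0 if ctx[-1] % 3 == 2 else (ctx[-1] + 1) % 10


def greedy_decode(next_fn: NextTokenFn, prompt: List[int], n: int) -> List[int]:
    # Baseline: one target call per generated token.
    out = list(prompt)
    for _ in range(n):
        out.append(next_fn(out))
    return out


if __name__ == "__main__":
    baseline = greedy_decode(toy_target, [0], 12)
    fast, passes = speculative_decode(toy_target, toy_drafter, [0], 12, k=4)
    print("same output:", fast == baseline)
    print("target passes:", passes, "vs", 12, "for plain decoding")
```

In this greedy variant the output is identical to decoding with the target model alone, because every emitted token is either confirmed or supplied by the target. The drafter only changes how many target passes are needed, which is where the speedup comes from when the target is memory-bandwidth bound.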

According to official benchmark data, the speedup is especially pronounced on local devices. On Apple silicon, local inference speed for the Gemma 4 26B model improved by a factor of about 2.2. Developers can therefore run complex offline programming assistants or agent workflows smoothly on personal computers or standard consumer-grade GPUs, and the higher inference efficiency also significantly reduces power consumption on edge devices.

The update primarily targets low-latency use cases such as real-time chatbots and automated programming tools. With the MTP drafter, Google has shown that developers can deploy state-of-the-art language models even on resource-constrained hardware without trading response speed against accuracy. As inference costs and barriers to entry continue to fall, Gemma 4 is bringing AI from the cloud to a much broader range of personal computing devices.
