
Google has officially unveiled DiffusionGemma, a new open-source AI model that marks the first time a text diffusion architecture has been introduced into the large language model space. Unlike traditional autoregressive models, which generate text one token at a time in sequence, DiffusionGemma borrows the well-established denoising mechanism used in image generation to reconstruct all tokens in parallel. The result is a substantial gain in inference efficiency, even on edge devices and in low-resource settings: in measured tests, local inference ran up to four times faster than comparable autoregressive models.
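To make the contrast between sequential and parallel decoding concrete, the toy sketch below counts model calls under each scheme. It is an illustration under stated assumptions, not DiffusionGemma's actual algorithm: the function names, the `<mask>` placeholder, the step count, and the rule for which positions get committed at each step are all hypothetical, and the "model" is a stand-in that simply copies from a known target sentence.

```python
import math

MASK = "<mask>"  # hypothetical placeholder for a not-yet-decoded position

def autoregressive_decode(target):
    """Toy autoregressive loop: one sequential model call per token."""
    out, calls = [], 0
    for tok in target:  # stand-in for next-token prediction
        out.append(tok)
        calls += 1
    return out, calls

def diffusion_decode(target, steps):
    """Toy masked-denoising loop: each call looks at every position at once,
    then commits a fixed share of the positions that are still masked."""
    seq = [MASK] * len(target)
    calls = 0
    per_step = math.ceil(len(target) / steps)
    while MASK in seq:
        calls += 1  # one parallel pass over the whole sequence
        masked = [i for i, t in enumerate(seq) if t == MASK]
        # Assumption: commit the leftmost masked slots; a real model
        # would more plausibly choose positions by prediction confidence.
        for i in masked[:per_step]:
            seq[i] = target[i]
    return seq, calls

tokens = "the quick brown fox jumps over the lazy dog today".split()
ar_out, ar_calls = autoregressive_decode(tokens)
df_out, df_calls = diffusion_decode(tokens, steps=4)
print(f"autoregressive calls: {ar_calls}, diffusion-style calls: {df_calls}")
```

In this toy setup the number of sequential calls depends on the chosen number of denoising steps and not on the sequence length, which is the intuition behind the speedup described above. Real-world gains also depend on how expensive each parallel pass is, which the sketch does not model.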
DiffusionGemma is fully open-sourced under the Apache 2.0 license, and its model weights are available on Hugging Face for free download and deployment. In benchmark tests, sampling throughput reaches up to 1,479 tokens per second, and the model scores 89.6% on the HumanEval coding benchmark, on par with Gemini 2.0 Flash-Lite. Its mathematical reasoning stands out: it achieves 23.3% accuracy on AIME 2025, 3.3 percentage points ahead of competing models. It still lags slightly on GPQA Diamond, a benchmark for advanced scientific reasoning, where it scores 40.4%, leaving room for further optimization.
NVIDIA's engineering team has verified that the model is deeply optimized for GPU Tensor Core architectures, sustaining 1,000 tokens per second on a single H100 GPU and scaling to 2,000 tokens per second in multi-GPU DGX Station configurations. DiffusionGemma also supports dynamic error correction and multi-round iterative refinement during generation, which significantly improves output consistency and logical robustness.
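For a rough sense of what these throughput figures mean in practice, the short calculation below converts tokens per second into the ideal time needed for a single response. The 500-token response length and the helper name `seconds_to_generate` are assumptions chosen for illustration, and the estimate ignores prompt processing, batching, and any other overhead.

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Ideal generation time, ignoring prompt processing and other overhead."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second

# Throughput figures quoted in the article (tokens per second).
REPORTED = {
    "single H100": 1_000,
    "multi-GPU DGX Station": 2_000,
}

RESPONSE_TOKENS = 500  # hypothetical response length, for illustration only

latency = {
    name: seconds_to_generate(RESPONSE_TOKENS, tps)
    for name, tps in REPORTED.items()
}
for name, secs in latency.items():
    print(f"{name}: {secs:.2f} s for {RESPONSE_TOKENS} tokens")
```

Under these assumptions, doubling throughput halves the ideal wait for a response of the same length.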