Google open-sources DiffusionGemma 26B; OpenAI ships scalable memory update

Google DeepMind released DiffusionGemma on June 10, an open-weights 26-billion-parameter model under Apache 2.0 that replaces autoregressive token prediction with a text diffusion process, generating 256-token blocks in parallel rather than one token at a time (Google Blog, MarkTechPost). The Mixture-of-Experts architecture activates only 3.8 billion parameters per inference step and exceeds 1,000 tokens per second on a single NVIDIA H100 and 700 on an RTX 5090, roughly four times the throughput of comparable autoregressive models (NVIDIA Blog). Built on the Gemma 4 backbone, the model supports a 256,000-token context window and more than 140 languages, and fits within 18 GB of VRAM when quantized; Google cautions that output quality falls below that of standard Gemma 4 and positions the model for speed-critical interactive workloads such as inline editing rather than production tasks (Google AI for Developers).
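To put the throughput figures in per-block terms, the short calculation below is a back-of-the-envelope sketch rather than a benchmark. It treats the quoted "over 1,000" and "over 700" tokens per second as lower bounds, and it assumes the "roughly four times" comparison applies to the H100 figure, which the sources do not spell out. The function and constant names are illustrative.

```python
# Figures quoted in the report; "over 1,000" and "over 700" are used as lower bounds.
H100_TOKENS_PER_SEC = 1000
RTX_5090_TOKENS_PER_SEC = 700
BLOCK_TOKENS = 256   # tokens generated per parallel block
SPEEDUP = 4          # "roughly four times" (assumed here to refer to the H100 number)


def seconds_per_block(tokens_per_sec: float, block_tokens: int = BLOCK_TOKENS) -> float:
    """Time to emit one block at a given sustained throughput."""
    return block_tokens / tokens_per_sec


# Throughput an autoregressive model would need to have for the 4x claim to hold.
implied_baseline_tokens_per_sec = H100_TOKENS_PER_SEC / SPEEDUP

if __name__ == "__main__":
    print(f"H100:     {seconds_per_block(H100_TOKENS_PER_SEC):.3f} s per block")
    print(f"RTX 5090: {seconds_per_block(RTX_5090_TOKENS_PER_SEC):.3f} s per block")
    print(f"Implied baseline: {implied_baseline_tokens_per_sec:.0f} tokens/s, "
          f"{seconds_per_block(implied_baseline_tokens_per_sec):.3f} s per block")
```

Under these assumptions a 256-token block takes about a quarter of a second on an H100, against roughly one second at the implied autoregressive baseline.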

OpenAI began rolling out a new memory architecture for ChatGPT, codenamed “Dreaming,” which automatically revises stored facts as time passes, for example recasting an upcoming trip as a past one once the travel date has passed. OpenAI’s internal benchmarks report factual recall improving from 67.9% to 82.8% and temporal accuracy from 52.2% to 75.1% versus the previous system; the rollout starts with Plus and Pro subscribers in the United States (OpenAI). Also on June 11, OpenAI announced that its models and Codex are accessible to Oracle Cloud Infrastructure customers through existing Oracle Universal Credit commitments, giving enterprises a path to OpenAI access under standard Oracle procurement workflows without a separate agreement (OpenAI).
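The time-aware revision described for the memory system can be illustrated with a toy example. This is a minimal sketch of the general idea only, not OpenAI's implementation, whose internals the report does not describe; the `MemoryEntry` class, the `describe` function, and the sample trip are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class MemoryEntry:
    """A stored fact tied to a calendar date (hypothetical structure)."""
    subject: str
    event_date: date


def describe(entry: MemoryEntry, today: date) -> str:
    """Render the fact as upcoming or past, depending on the current date."""
    when = entry.event_date.isoformat()
    if today > entry.event_date:
        return f"User took {entry.subject} on {when}."
    return f"User has {entry.subject} planned for {when}."


if __name__ == "__main__":
    trip = MemoryEntry("a trip to Lisbon", date(2025, 7, 1))
    print(describe(trip, date(2025, 6, 11)))  # before the travel date
    print(describe(trip, date(2025, 7, 15)))  # after the travel date
```

The stored fact itself never changes here; only its rendering depends on the date it is read, which is one simple way to keep a dated entry from going stale.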