OpenAI and Broadcom debut Jalapeño chip; GPT-5.6 previews three-tier model family

OpenAI and Broadcom unveiled Jalapeño on June 24 - OpenAI’s first custom AI chip and the opening step of a multi-generation inference compute platform the two companies are co-developing (CNBC, TechCrunch). The reticle-sized ASIC targets LLM inference in production rather than training and moved from initial design to manufacturing tape-out in nine months - a cycle the teams described as among the fastest recorded for an advanced semiconductor of its class (Tom’s Hardware). Early measurements show performance per watt above current-generation hardware, and OpenAI attributed the compressed development timeline in part to its own AI models assisting chip design (VentureBeat). The chip is designed for gigawatt-scale data center deployments alongside Microsoft and other partners, with production rollout at OpenAI’s own facilities targeted for end of 2026, a transition OpenAI said would lower API costs and improve throughput capacity for developers (CNBC).

OpenAI on June 26 began a limited preview of GPT-5.6, a three-model family - Sol, Terra, and Luna - restricted to trusted API partners and Codex users (OpenAI, MacRumors). Sol, the flagship, is priced at $5 per million input tokens and $30 per million output and delivers OpenAI’s strongest reported benchmarks to date on agentic coding, biology, and cybersecurity tasks; Terra matches GPT-5.5 quality at $2.50 input and $15 output, roughly half the prior generation’s price; Luna is $1 input and $6 output for fast, lower-cost work (OpenAI). The release adds a “max” reasoning effort level, an “ultra” mode that decomposes requests across internal sub-agents, and explicit cache breakpoints with a guaranteed 30-minute minimum cache lifetime - targeting latency variance in agentic pipelines (Axios).

Z.ai released GLM-5.2 this week, a 753-billion-parameter open-weights model published under an MIT license, reporting 62.1 on SWE-bench Pro (versus 58.6 for GPT-5.5) and 74.4% on FrontierSWE (versus 72.6%), at $4.40 per million output tokens compared to GPT-5.5’s $30 - a cost ratio of roughly one-to-seven (VentureBeat). The weights are on Hugging Face and the model ships with a 1-million-token context window integrated across more than 20 third-party coding environments.

Anthropic disclosed a June 10 letter to the U.S. Senate Banking Committee alleging that entities affiliated with Alibaba’s Qwen lab ran the largest known distillation attack on Claude to date: approximately 25,000 fraudulent accounts generated 28.8 million interactions with the model between April 22 and June 5, 2026, systematically extracting its reasoning patterns for use in training a competing system (CNBC, Tom’s Hardware). Anthropic called on Congress to require mandatory screening of high-volume API usage patterns and impose export controls on access to frontier AI models.