● Updated June 25, 2026 · Models · Hardware

OpenAI and Broadcom unveil Jalapeño — a custom chip for LLM inference

OpenAI and Broadcom introduce Jalapeño, a custom accelerator ASIC designed for large‑language‑model inference and slated to run at scale by late 2026.

In detail

  • Jalapeño is OpenAI’s first “Intelligence Processor”, designed from scratch for modern LLM inference.
  • Broadcom provides silicon manufacturing and networking tech (including Tomahawk); Celestica handles boards, racks and system integration; OpenAI leads chip design.
  • Engineering samples already run ML workloads in the lab, including GPT‑5.3‑Codex‑Spark; early self‑reported tests indicate substantially better performance per watt than the current state of the art.
  • Going from initial design to tape‑out reportedly took nine months; Jalapeño is the first generation of a multi‑generation platform developed by OpenAI and Broadcom.

Why it matters

Owning hardware reduces reliance on third parties, can cut operating costs and latency, and gives OpenAI an edge in deploying LLM services at scale — shifting competition toward full‑stack optimization rather than model IP alone.

For you

Check whether your AI vendors plan custom hardware or optimize for inference accelerators; re‑evaluate long‑term cloud commitments if providers roll out cheaper, purpose‑built inference silicon.

Updates

OpenAI and Broadcom have unveiled Jalapeño, a custom ASIC designed for LLM inference in data centers, to be deployed by the end of 2026.

  • Nine months of development; Broadcom drew on insights from OpenAI researchers and their roadmap for future models.
  • OpenAI claims substantially better performance per watt than the current state of the art; a detailed technical report is expected in the coming months.
  • Goal: vertical integration and reduced dependence on Nvidia amid a global compute shortage.

OpenAI and Broadcom announce Jalapeño, an ASIC designed specifically for large language model inference in data centers.

  • Chip designed from scratch for LLM inference based on conversations with OpenAI researchers; development took nine months.
  • OpenAI claims substantially better performance per watt than the current state of the art; a detailed technical report is expected in the coming months.
  • Deployment in data centers is planned by the end of 2026, as part of OpenAI's strategy to control the full stack and reduce its dependence on Nvidia.

Summaries are generated automatically and link to the original source.