June 24, 2026

OpenAI and Broadcom Unveil Jalapeño, OpenAI's First Custom AI Inference Chip

OpenAI and Broadcom unveiled Jalapeño, OpenAI's first custom AI inference chip, designed from scratch in nine months with help from OpenAI's own models and showing roughly 50 percent cost savings over conventional GPUs in early testing.

Key Takeaways

  • OpenAI and Broadcom unveiled Jalapeño, designed from concept to tape-out in nine months using OpenAI's own models.
  • Early testing shows roughly 50 percent cost savings compared to conventional AI GPUs.
  • Jalapeño is the first step in a multi-generation compute platform for gigawatt-scale data centers.

OpenAI unveiled its first custom AI chip on June 24, breaking its total dependence on Nvidia hardware for running its models. Called Jalapeño, the inference processor was designed from scratch in nine months with Broadcom and is already showing roughly 50 percent cost savings compared to conventional AI GPUs in early testing, Broadcom CEO Hock Tan told Bloomberg.

What Jalapeño Does

Jalapeño is built specifically for inference, the process of running a trained AI model to generate responses for users; it is not a training chip. According to OpenAI's announcement, the architecture was designed around the kernels, memory-movement patterns, and serving workloads that matter most for its frontier language models.

OpenAI and Broadcom took the chip from initial design to tape-out, the point at which the finished design is handed off to the foundry for manufacturing, in just nine months. OpenAI's own models were used to accelerate parts of the design and optimization process, creating a feedback loop in which the models that serve users are helping improve the infrastructure that will run future models.

Richard Ho, who leads OpenAI's hardware program, said the chip was "designed from the ground up for LLM inference" and that early testing shows Jalapeño "will efficiently execute our most important workloads close to the hardware's theoretical limits."

Why OpenAI Needed Its Own Chip

OpenAI has been entirely dependent on Nvidia GPUs for both training and inference since it launched ChatGPT in late 2022. That dependency creates a structural cost problem. Google runs inference on its own TPUs. Amazon uses its custom Trainium chips. Both companies control their per-query economics in ways OpenAI cannot when it rents Nvidia hardware at market rates.

Inference is where the daily cost of serving hundreds of millions of ChatGPT and Codex users accumulates. While training runs are more compute-intensive, they happen periodically. Inference costs scale with every question, every code completion, every agent session, every day, according to TechCrunch.
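The scaling argument can be made concrete with a toy cost model. Every figure below (daily request volume, tokens per request, dollars per million tokens) is a hypothetical placeholder, not an OpenAI or Broadcom number, and the names `ServingWorkload`, `daily_inference_cost`, and `with_per_token_saving` are illustrative only. The sketch shows that inference cost grows linearly with traffic, so a per-token saving such as the reported ~50 percent applies to every request served.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ServingWorkload:
    """A hypothetical inference workload; every number fed in is illustrative."""
    daily_requests: float           # requests served per day
    tokens_per_request: float       # average tokens generated per request
    cost_per_million_tokens: float  # serving cost in dollars per million tokens


def daily_inference_cost(w: ServingWorkload) -> float:
    """Daily cost grows linearly with traffic: every request pays for its tokens."""
    tokens_per_day = w.daily_requests * w.tokens_per_request
    return tokens_per_day / 1_000_000 * w.cost_per_million_tokens


def with_per_token_saving(w: ServingWorkload, fraction_saved: float) -> ServingWorkload:
    """Copy of the workload with its per-token cost cut by fraction_saved (0 to 1)."""
    return replace(w, cost_per_million_tokens=w.cost_per_million_tokens * (1 - fraction_saved))


# Hypothetical baseline: 1 billion requests/day, 500 tokens each, $2 per million tokens.
base = ServingWorkload(daily_requests=1e9, tokens_per_request=500, cost_per_million_tokens=2.0)
print(daily_inference_cost(base))                               # 1000000.0
print(daily_inference_cost(replace(base, daily_requests=2e9)))  # 2000000.0 (double the traffic, double the bill)
print(daily_inference_cost(with_per_token_saving(base, 0.5)))   # 500000.0
```

A training run, by contrast, is a large but bounded expense; in this model the inference line keeps growing with `daily_requests`, which is why a lower per-token cost compounds every day.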

OpenAI President Greg Brockman told CNBC that the company "cannot get compute fast enough" and that designing more of the stack internally allows OpenAI to serve more intelligence with greater efficiency. Broadcom CEO Tan reinforced the urgency, saying compute demand from Broadcom's customers is "simply insatiable" and shows no signs of slowing through 2028, CNBC reported.

The Technical Details

Jalapeño is an application-specific integrated circuit (ASIC). It is less flexible than Nvidia's general-purpose GPUs, but it can be cheaper and can be optimized for specific AI workloads. Analysis of the chip's die photograph suggests a compute chiplet measuring approximately 840 square millimeters, very close to the reticle size limit of EUV lithography, Tom's Hardware reported.
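How close is "very close"? A rough check, assuming the standard 26 mm × 33 mm exposure field of current 0.33 NA EUV scanners (about 858 square millimeters, the figure usually cited as the reticle limit). The 840 square millimeter die area is the approximate estimate from the die photograph, not an official specification, and `reticle_fill` is a hypothetical helper name.

```python
# Standard (0.33 NA) EUV scanner exposure field, in millimetres (assumed reticle limit).
RETICLE_WIDTH_MM = 26.0
RETICLE_HEIGHT_MM = 33.0
RETICLE_AREA_MM2 = RETICLE_WIDTH_MM * RETICLE_HEIGHT_MM  # 858 mm^2

# Approximate compute-chiplet area estimated from the die photograph.
DIE_AREA_MM2 = 840.0


def reticle_fill(die_area_mm2: float, reticle_area_mm2: float = RETICLE_AREA_MM2) -> float:
    """Fraction of a single exposure field that the die occupies."""
    return die_area_mm2 / reticle_area_mm2


fill = reticle_fill(DIE_AREA_MM2)
print(f"{RETICLE_AREA_MM2:.0f} mm^2 field, die fills {fill:.1%}")  # 858 mm^2 field, die fills 97.9%
print(f"headroom: {RETICLE_AREA_MM2 - DIE_AREA_MM2:.0f} mm^2")     # headroom: 18 mm^2
```

Under these assumptions the chiplet uses roughly 98 percent of the field, leaving little room to grow a single die further.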

The architecture is designed to reduce data movement and to balance compute, memory, and networking resources so that the chip runs much closer to its theoretical peak performance than general-purpose alternatives do. Broadcom's networking silicon, including its Tomahawk Ethernet switch series, connects the chips at data center scale.
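The article gives no utilization figures. One standard way to reason about why cutting data movement lifts utilization is the roofline model, swapped in here purely as an illustration; it is not a description of Jalapeño's actual design. The peak-throughput and memory-bandwidth values are hypothetical, and the helper names are illustrative. LLM token generation is often limited by memory bandwidth rather than arithmetic, which is the regime this sketch shows.

```python
def attainable_flops(peak_flops: float, mem_bw_bytes: float, intensity: float) -> float:
    """Roofline bound: throughput is capped by compute or by memory traffic.

    intensity is arithmetic intensity, in FLOPs performed per byte moved from memory.
    """
    return min(peak_flops, mem_bw_bytes * intensity)


def utilization(peak_flops: float, mem_bw_bytes: float, intensity: float) -> float:
    """Attainable throughput as a fraction of theoretical peak."""
    return attainable_flops(peak_flops, mem_bw_bytes, intensity) / peak_flops


# Hypothetical accelerator figures, not Jalapeño specifications.
PEAK_FLOPS = 1.0e15  # 1 PFLOP/s
MEM_BW = 4.0e12      # 4 TB/s

# Moving fewer bytes per FLOP raises arithmetic intensity and lifts utilization until
# the kernel becomes compute-bound at the ridge point PEAK_FLOPS / MEM_BW = 250 FLOPs/byte.
for intensity in (50, 125, 250, 500):
    print(intensity, utilization(PEAK_FLOPS, MEM_BW, intensity))
```

With these made-up numbers the loop prints utilizations of 0.2, 0.5, 1.0, and 1.0: below the ridge point, every reduction in data movement translates directly into a higher fraction of peak.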

The Strategic Context

The Jalapeño launch arrives as every major AI company races to control the physical infrastructure behind its models. Google has invested billions in TPU development. Amazon has built multiple generations of Trainium and Inferentia chips. Qualcomm just acquired Modular for $3.9 billion to build a silicon-agnostic software layer.

For OpenAI, the timing is significant. The company is preparing for an IPO and needs to demonstrate a credible path to profitability. Reducing inference cost per token through custom silicon directly addresses the central question investors will ask about the economics of running frontier AI models at scale.

Broadcom CEO Tan described Jalapeño as the beginning of a multi-generation roadmap designed for gigawatt-scale data centers that OpenAI and Microsoft are building together. OpenAI still depends on Nvidia for training runs, and the company emphasized that Jalapeño is meant to complement, not replace, its Nvidia relationship.

What to Watch

The first test will be whether Jalapeño's early cost and efficiency results hold at production scale when deployment begins in late 2026. A detailed technical report on performance is expected in the coming months. If the 50 percent cost savings figure holds in production, it would represent a meaningful structural improvement to OpenAI's unit economics and could accelerate pricing competition across the AI inference market.

What Changed

On June 24, 2026, OpenAI and Broadcom unveiled Jalapeño, OpenAI's first custom AI inference chip, designed specifically for its LLM workloads.

Why It Matters

OpenAI has been entirely dependent on Nvidia GPUs for inference, putting it at a structural cost disadvantage compared with Google and Amazon, which run workloads on their own custom chips.

Suggested Actions

Engineering teams evaluating AI inference infrastructure should track Jalapeño's production deployment timeline and published performance benchmarks when they arrive later in 2026.