OpenAI and Broadcom Unveil Jalapeño Inference Chip

On June 24, 2026, OpenAI and Broadcom unveiled Jalapeño, OpenAI's first custom-built chip and its first real step into making the silicon it has spent years renting.

According to Broadcom, Jalapeño is an accelerator architected around OpenAI's vision for large language model inference, and the first chip in a multi-generation compute platform the two companies are building together.

OpenAI Built a Chip for One Job

Jalapeño is designed for inference, not training. As TechCrunch explained, inference is the process of running a finished model in response to a user command, the step that happens every time someone sends a message to ChatGPT.

That focus is the point. Where a general-purpose GPU handles training, gaming, and inference, Jalapeño targets only the workload on which OpenAI spends the most money every day.

The design is a blank slate rather than a repurposed accelerator. BetaNews reported that OpenAI built it from scratch around the kernels, memory movement, and serving patterns that matter most for frontier models.

The Nine-Month Build Is the Headline Number

The speed of development is the standout claim. Broadcom said Jalapeño went from initial design to manufacturing tape-out in just nine months, which it described as possibly the fastest ASIC development cycle ever achieved in high-performance semiconductors.

OpenAI's own models helped get it there. President Greg Brockman told CNBC that the chip was designed end to end in nine months with help from the company's AI, adding that the degree to which the models accelerated the work was surprising.

That detail doubles as a proof point for the broader pitch that AI can speed up chip design, lowering the cost of compute across the industry over time.

Early Samples Are Already Running Models

Jalapeño is not just a render. OpenAI said engineering samples are running machine-learning workloads in the lab at production target frequency and power, including its GPT-5.3-Codex-Spark model.

Performance numbers are still being measured. Early testing suggests the chip will deliver performance per watt substantially better than the current state of the art, with a detailed technical report promised in the coming months.

The economics behind that figure are simple. Better performance per watt means less energy spent per token and therefore a lower cost per token, a shift the wider market has watched reshape AI compute economics across every frontier lab.
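
As a rough illustration of that arithmetic, the sketch below converts a performance-per-watt figure (expressed here as tokens per joule) into an electricity cost per million tokens. Every number in it is a hypothetical placeholder, not a measurement of Jalapeño or any other chip, and the function name is invented for this example. It also covers energy only and ignores hardware, cooling, and networking costs.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def energy_cost_per_million_tokens(tokens_per_joule: float, usd_per_kwh: float) -> float:
    """Electricity cost in USD to generate one million tokens."""
    if tokens_per_joule <= 0:
        raise ValueError("tokens_per_joule must be positive")
    joules_per_token = 1.0 / tokens_per_joule
    kwh_per_million_tokens = joules_per_token * 1_000_000 / JOULES_PER_KWH
    return kwh_per_million_tokens * usd_per_kwh


# Hypothetical figures, not measurements of any real chip.
baseline = energy_cost_per_million_tokens(tokens_per_joule=2.0, usd_per_kwh=0.10)
improved = energy_cost_per_million_tokens(tokens_per_joule=4.0, usd_per_kwh=0.10)
print(f"baseline: ${baseline:.4f} per million tokens")
print(f"improved: ${improved:.4f} per million tokens")
```

Under these assumptions, doubling tokens per joule halves the energy cost per token, which is the relationship the paragraph above describes.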

Deployment Starts Small and Scales to Gigawatts

The rollout is staged. CNBC reported that the companies are aiming for initial deployment by the end of 2026, with Broadcom CEO Hock Tan describing small prototype work late in the year before a real ramp in 2027 and full volume in the first half of 2028.

The scale target is large. The first production deployment is planned at gigawatt scale, with Microsoft confirmed as a primary partner and reportedly committed to purchase 40 percent of the first production run.

That kind of commitment fits the broader race over AI infrastructure capacity, where access to compute now shapes the business as much as model quality does.

Why This Lands on Nvidia

The move is a direct response to GPU dependence. Since 2022, OpenAI has been one of the biggest buyers of Nvidia's chips, and TechCrunch noted that the company's chip plans have long been seen as a way to reduce that reliance.

The threat to Nvidia is not a single benchmark. It is that Nvidia's largest customers could start pulling inference, the fastest-growing compute segment, onto silicon they own.

For Broadcom, the deal extends an established playbook. The company already helped design Google's TPU line and partners with Meta and ByteDance, and adding OpenAI gives it direct access to frontier LLM engineering insight.

What Operators Should Watch

The practical signal is cost. If Jalapeño delivers on efficiency at scale, OpenAI gains room to cut API prices or improve margins, and competitors will face pressure to match.

The catch is timing. Meaningful deployment does not ramp until 2027, so the near-term effect on pricing is limited, and the self-reported performance figures still need independent benchmarks.

The durable takeaway is structural. The AI build-out is moving down the stack into custom chips, and the labs that control more of that stack will have the most leverage over the economics of running models at scale.

What Changed

OpenAI moved from buying compute to designing its own silicon. Jalapeño is the company's first Intelligence Processor, a blank-slate accelerator architected around the kernels, memory movement, and serving patterns of LLM inference rather than adapted from a general-purpose GPU.

The unveiling turns a 2025 supply agreement into a concrete product. A physical chip was handed to OpenAI leadership, and engineering samples are already running real workloads in the lab.

Why It Matters

Inference is where OpenAI spends the most money every day, so a cheaper, more efficient chip flows straight to the bottom line and to token pricing. Owning the chip, not just renting capacity, gives OpenAI more control over cost and supply.

It also pressures Nvidia where it is most exposed. If frontier labs pull inference onto their own silicon, the fastest-growing slice of AI compute starts to migrate away from off-the-shelf GPUs.

Suggested Actions

Model inference cost as its own line item and watch custom-silicon roadmaps from OpenAI, Google, Amazon, and Meta. Cheaper inference should show up as lower API prices and faster responses over the next two years, so plan procurement around a multi-vendor chip market rather than a single supplier.
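
One way to treat inference as its own line item is a small per-vendor model like the sketch below. It is a minimal illustration only: the vendor names, token volumes, and prices are hypothetical placeholders, and `InferenceLine`, `total_cost`, and `vendor_shares` are names invented for this example rather than part of any real tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceLine:
    """One inference budget line: a vendor, a monthly volume, and a unit price."""
    vendor: str
    million_tokens_per_month: float
    usd_per_million_tokens: float

    def monthly_cost(self) -> float:
        return self.million_tokens_per_month * self.usd_per_million_tokens


def total_cost(lines: list[InferenceLine], price_change: float = 0.0) -> float:
    """Total monthly spend; price_change=-0.25 models a 25% price cut."""
    return sum(line.monthly_cost() * (1.0 + price_change) for line in lines)


def vendor_shares(lines: list[InferenceLine]) -> dict[str, float]:
    """Fraction of monthly inference spend going to each vendor."""
    total = total_cost(lines)
    shares: dict[str, float] = {}
    for line in lines:
        shares[line.vendor] = shares.get(line.vendor, 0.0) + line.monthly_cost() / total
    return shares


# Hypothetical budget, for illustration only.
budget = [
    InferenceLine("vendor_a", million_tokens_per_month=500, usd_per_million_tokens=2.0),
    InferenceLine("vendor_b", million_tokens_per_month=200, usd_per_million_tokens=5.0),
]
print(total_cost(budget))
print(total_cost(budget, price_change=-0.25))
print(vendor_shares(budget))
```

Tracking vendor share alongside total spend makes single-supplier concentration visible, and the `price_change` parameter gives a quick way to test how a price cut would change the budget.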

Tools Mentioned

Horizontal Suites

ChatGPT – General-purpose AI assistant for writing, analysis, coding, and search

ChatGPT is built for teams that need a general-purpose AI assistant for writing, analysis, coding, and search. It helps reduce manual work, improve consistency, and turn a fragmented workflow into something more repeatable for operators and stakeholders.