I compared two fast AI inference providers, and here are the results

4/17/2025

I mean, sure, benchmarks are cool and all. But I care more about what it feels like to actually use them.

What are these?

Groq runs Meta’s Llama models on custom chips called LPUs — Language Processing Units. Think of them as ultra-fast processors built specifically for running large language models efficiently.

Cerebras takes a different route. It uses its own massive chip, the Wafer-Scale Engine (WSE), which is optimized for AI workloads. Instead of processing tasks in a traditional CPU/GPU setup, Cerebras processes the entire model in one place, massively reducing memory bottlenecks.

Both are built for speed, but they do it differently:

Groq focuses on ultra-low latency and high throughput, measured in tokens per second (T/s).

Cerebras aims for end-to-end efficiency by running models like Llama 3 70B on a single chip, resulting in huge speed advantages.

In a recent test I ran:

Groq: ~275 tokens/sec

Cerebras: ~2185 tokens/sec

Same model. Same parameters. Same prompt. That’s roughly an 8× gap, and it makes Cerebras seriously fast.
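For context, here’s a minimal sketch of how a tokens-per-second number like this can be measured: time the streamed response and divide the token count by the elapsed time. The stream below is simulated so the snippet runs standalone; with Groq or Cerebras you’d iterate over the chunks from their OpenAI-compatible streaming chat endpoints instead (that substitution is an assumption, not code from my actual test).

```python
import time

def measure_throughput(stream):
    """Consume a token stream, returning (token_count, tokens_per_second)."""
    start = time.perf_counter()
    n_tokens = sum(1 for _ in stream)  # count streamed tokens as they arrive
    elapsed = time.perf_counter() - start
    return n_tokens, n_tokens / elapsed

# Simulated stream standing in for a real streaming API response;
# each yield represents one generated token arriving over the wire.
def fake_stream(n=500, delay=0.0005):
    for _ in range(n):
        time.sleep(delay)
        yield "tok"

n, tps = measure_throughput(fake_stream())
print(f"{n} tokens, ~{tps:.0f} tokens/sec")
```

The same `measure_throughput` helper works on any iterable of tokens, so you can point it at a real provider’s streaming response object without changing the timing logic.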

About those two companies

Groq

  • Founded: 2016 by Jonathan Ross and Douglas Wightman, both former Google engineers.
  • Headquarters: Mountain View, California.
  • Core Technology: Develops custom AI inference chips known as LPUs (Language Processing Units), optimized for high-speed execution of large language models.
  • Funding:
      • Total Raised: Over $1 billion across five funding rounds.
      • Latest Round: August 2024, secured $640 million in a Series D round led by BlackRock, Cisco Investments, and Samsung Catalyst Fund, achieving a valuation of $2.8 billion.
      • Additional Commitment: In February 2025, received a $1.5 billion commitment from Saudi Arabia to expand AI chip deployment in the region.
  • Recent Developments:
      • Launched GroqCloud, a developer platform providing API access to their LPUs.
      • Acquired Definitive Intelligence in March 2024 to enhance cloud-based AI solutions.

Cerebras Systems

  • Founded: 2015 by Andrew Feldman, Gary Lauterbach, Michael James, Sean Lie, and Jean-Philippe Fricker.
  • Headquarters: Sunnyvale, California.
  • Core Technology: Creator of the Wafer-Scale Engine (WSE), the largest computer chip ever built, designed to accelerate AI workloads by processing entire models on a single chip.
  • Funding:
      • Total Raised: Approximately $720 million over six funding rounds.
      • Latest Round: November 2021, raised $250 million in a Series F round led by Alpha Wave Global, valuing the company at over $4 billion.
  • IPO Status:
      • Filed for an IPO in September 2024, aiming to raise up to $1 billion.
      • The IPO has been delayed due to a national security review by the Committee on Foreign Investment in the United States (CFIUS) concerning a $335 million investment from Abu Dhabi-based G42.
  • Notable Deployments:
      • Collaborated with Mayo Clinic to develop genomic foundation models for personalized healthcare.
      • Partnered with G42 to build AI supercomputers and train large language models, including the Arabic-language model Jais.