MLWhiz Weekly AI/ML Newsletter # 4
The week AI buyers became AI owners — and a lot of people decided they’re done with agents.
🏆 Story of the Week: OpenAI Just Bought 10% of Cerebras
For two years, I’ve been hearing the same question in every infrastructure conversation: how does the Nvidia monopoly end?
This week, we got the answer. And it’s not what anyone predicted.
It wasn’t a DOJ antitrust case. It was a procurement contract — but a procurement contract structured like nothing we’ve ever seen in this industry.
Cerebras filed its S-1 on Friday. But buried inside was a deal between OpenAI and Cerebras that is pretty hard to believe.
Here’s what OpenAI committed to:
$20+ billion in chip spending through 2028
750 MW of capacity, with an option to expand to 2 GW
A $1 billion loan to OpenAI from Cerebras at 6% interest
In exchange, OpenAI got about 10% of Cerebras post-IPO.
Read that again. The customer got equity in the supplier. The customer also got a billion-dollar loan from the supplier.
OpenAI is now Cerebras’s biggest customer, biggest creditor, and one of its biggest shareholders. People are calling it “circular financing.”
But honestly, the logic is pretty obvious. If you’re going to spend $20B on chips, why would you hand all that margin to your supplier? You can take some of it back as equity.
This might be the new template. I’d bet that we will see Anthropic-AMD, Microsoft-Cerebras, and Google-Groq deals structured exactly this way within 12 months.
What this means for you: If you’re negotiating GPU contracts above $50M/year, the Cerebras-OpenAI deal is your new precedent. Stop asking for volume discounts. Ask for shares. And if your 2027 architecture assumes Nvidia for inference, build at least one alternative path.
🤖 Models That Dropped This Week
Nemotron 3 Super (Nvidia, April 15) — Open-weight 120B hybrid Mamba-MoE with only 12B active params and native 1M context. The interesting bit: it was pre-trained directly in FP4, so there’s no quantization tax at deployment.
Qwen 3.6-35B-A3B (Alibaba, April 17) — 35B total / 3B active. Optimized for agentic coding. The community went wild — #1 on Hacker News, dominated r/LocalLLaMA. Users say it beats Gemma 4 26B and runs on an RTX 5060 Ti. After Meta abandoned the open-weight crown, Qwen might just be taking it for itself.
Claude Design (Anthropic, April 17) — First standalone product from an AI lab that isn’t a chat interface. Built by Mike Krieger’s team. You describe what you want, Claude builds it. No layers panel. No timeline. Figma dropped 7% the day it launched. That’s crazy…
🧠 Papers That Matter
Do We Still Need GraphRAG?
If you’ve ever sat in a meeting where someone proposed building a GraphRAG pipeline and felt your eye twitch, this paper is for you.
Researchers benchmarked standard RAG, GraphRAG, and agentic search with iterative retrieval. The result: agentic search with simple vector retrieval matches or beats GraphRAG on most benchmarks. And GraphRAG costs 10-100x more to construct.
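To make “agentic search with iterative retrieval” concrete, here’s a minimal sketch of the loop: plain vector retrieval inside an LLM-driven query/refine cycle. The `vector_search` and `llm` callables are hypothetical stand-ins, not the paper’s actual interfaces.

```python
# A minimal agentic-search loop, assuming hypothetical `vector_search(query, top_k)`
# and `llm(prompt)` callables rather than any specific framework.
def agentic_search(question: str, vector_search, llm, max_steps: int = 4) -> str:
    notes: list[str] = []           # evidence accumulated across iterations
    query = question                # the first query is the question itself
    for _ in range(max_steps):
        notes.extend(vector_search(query, top_k=5))  # plain vector retrieval, no graph
        decision = llm(
            f"Question: {question}\nEvidence so far:\n" + "\n".join(notes)
            + "\nReply 'ANSWER: <answer>' if the evidence suffices, "
              "otherwise 'SEARCH: <a sharper query>'."
        )
        if decision.startswith("ANSWER:"):
            return decision.removeprefix("ANSWER:").strip()
        query = decision.removeprefix("SEARCH:").strip()
    # Budget exhausted: answer from whatever evidence we gathered.
    return llm(f"Question: {question}\nEvidence:\n" + "\n".join(notes))
```

The benchmark’s point is that this loop, with nothing fancier than a vector index underneath, already matches the graph pipelines.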
SOLARIS: Speculative Decoding for Recommendation (Meta)
I love when an idea from one field gets brilliantly stolen for another.
Speculative decoding transformed LLM inference. Meta’s SOLARIS does the same trick for recommendation: predict which user-item pairs will be requested soon, precompute their embeddings asynchronously, and serve from cache when the real request arrives.
Foundation-model quality at cached-lookup latency. With Meta on the byline and 30+ authors, this is almost certainly running in production at Instagram or Facebook scale already. If you’re serving large recommenders and fighting latency, this is your reading for the weekend.
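To make the mechanism concrete, here’s a toy version of the pattern: a predictor nominates likely (user, item) pairs, a background worker precomputes their embeddings, and the serving path checks the cache before paying for the expensive model. Every name below (`heavy_embed`, `speculate`, the queue-based worker) is illustrative, not SOLARIS’s real architecture.

```python
import queue
import threading
import time

# Toy speculative-precompute pipeline. `heavy_embed` stands in for the
# expensive foundation-model forward pass; everything here is illustrative.
cache: dict[tuple[str, str], list[float]] = {}
work: queue.Queue = queue.Queue()

def heavy_embed(user: str, item: str) -> list[float]:
    time.sleep(0.05)                           # simulate model latency
    return [float(hash((user, item)) % 1000)]  # dummy embedding

def precompute_worker() -> None:
    # Background thread: fill the cache ahead of real traffic.
    while True:
        pair = work.get()
        if pair not in cache:
            cache[pair] = heavy_embed(*pair)

def speculate(predicted_pairs: list[tuple[str, str]]) -> None:
    # Predictor output goes on the queue; the worker computes asynchronously.
    for pair in predicted_pairs:
        work.put(pair)

def serve(user: str, item: str) -> list[float]:
    # Real request: cache hit = lookup latency, miss = full model cost.
    hit = cache.get((user, item))
    return hit if hit is not None else heavy_embed(user, item)

threading.Thread(target=precompute_worker, daemon=True).start()
speculate([("alice", "reel_42")])   # the predictor expects this request soon
time.sleep(0.2)                     # give the worker a head start
print(serve("alice", "reel_42"))    # served from cache, no model call
```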
Cycle-Consistent Search (Meta)
Training search agents with RL needs ground-truth labels. Labels are expensive. So most teams either train on incomplete data or skip RL entirely.
Meta’s Superintelligence Labs found a clever workaround: if your retrieved documents contain enough information to reconstruct the original question, the search probably worked. No labels needed.
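In reward terms, the trick is small enough to sketch: reconstruct the question from the retrieved documents and score how close you got. The `llm` and `similarity` callables below are hypothetical placeholders for the idea, not Meta’s implementation.

```python
# Cycle-consistency reward sketch, assuming hypothetical `llm(prompt)` and
# `similarity(a, b) -> float` callables (e.g. embedding cosine similarity).
def cycle_reward(question: str, retrieved_docs: list[str], llm, similarity) -> float:
    reconstructed = llm(
        "Based only on the documents below, reconstruct the question "
        "they were retrieved to answer.\n\n" + "\n---\n".join(retrieved_docs)
    )
    # If the docs carry enough information to recover the question,
    # retrieval probably worked: use the match score as the RL reward.
    return similarity(question, reconstructed)
```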
Corpus2Skill: Don’t Retrieve, Navigate
This is the most interesting RAG paper I’ve read in months.
The argument: RAG’s “chunk and embed” approach fundamentally cannot handle hierarchical, cross-referenced documents.
The solution: stop retrieving. Distill the corpus offline into a hierarchical skill directory. Let the agent navigate it like a librarian who knows the stacks — browsing, drilling down, backtracking when a path dead-ends.
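Roughly, navigation instead of retrieval looks like the sketch below: the corpus is distilled offline into a tree of summarized nodes, and the agent walks it, drilling down or backtracking. The node structure and prompt are my guess at the shape of the idea, not the paper’s actual format.

```python
from dataclasses import dataclass, field

# A guess at a "skill directory": a summarized tree, with a hypothetical
# `llm(prompt)` callable choosing where to descend.
@dataclass
class SkillNode:
    summary: str                                  # one-line description of the branch
    content: str = ""                             # full text lives only at leaves
    children: list["SkillNode"] = field(default_factory=list)

def navigate(node: SkillNode, question: str, llm) -> str | None:
    if not node.children:
        return node.content                       # leaf reached: hand the content back
    tried: set[int] = set()
    while len(tried) < len(node.children):
        menu = "\n".join(f"{i}: {c.summary}"
                         for i, c in enumerate(node.children) if i not in tried)
        choice = llm(f"Question: {question}\nBranches:\n{menu}\n"
                     "Reply with one branch number, or BACK if none fit.")
        if choice.strip().upper() == "BACK":
            return None                           # backtrack: parent tries elsewhere
        idx = int(choice)
        tried.add(idx)
        found = navigate(node.children[idx], question, llm)
        if found is not None:
            return found                          # useful content down this path
    return None                                   # subtree exhausted: dead end
```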
📝 Some Good Reads
Spotify x Anthropic: Agentic Development Deep Dive
Spotify’s chief architect and Anthropic’s MCP co-creator broke down “Honk” — Spotify’s internal agentic dev system built on Claude Code.
The numbers are pretty wild:
1,500+ PRs generated by the system
Top devs haven’t written code since December 2025
30% productivity gain per developer
Engineers now act as “editors-in-chief” — directing agents through Slack, reviewing PRs, making architectural decisions. The new bottleneck isn’t writing code. It’s reading and validating it fast enough to keep up with the agents.
Meta’s FP8 Quantization Playbook
Meta’s approach to FP8: don’t quantize everything. Run micro-benchmarks per layer. Quantize what tolerates it. Keep precision where it matters.
Result on Instagram’s ad ranking: +3% conversions, +5% CTR. At Instagram scale, that’s billions in incremental revenue from a quantization strategy. If you’re serving any large ranking model and treating quantization as all-or-nothing, you’re leaving money on the table.
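The per-layer micro-benchmark idea is simple enough to sketch: quantize one layer at a time, measure the quality delta on an eval set, and keep FP8 only where the drop stays under a tolerance. `evaluate` and `quantize_layer_fp8` below are placeholders for whatever your serving stack provides, not Meta’s tooling.

```python
def select_fp8_layers(model, layer_names, evaluate, quantize_layer_fp8,
                      tolerance: float = 1e-3) -> list[str]:
    """Per-layer sensitivity sweep: keep FP8 only where quality holds.

    `evaluate(model) -> float` (higher is better) and
    `quantize_layer_fp8(model, name) -> model` are stand-ins for your
    own serving stack, not Meta's internal playbook.
    """
    baseline = evaluate(model)
    keep: list[str] = []
    for name in layer_names:
        candidate = quantize_layer_fp8(model, name)  # quantize just this one layer
        drop = baseline - evaluate(candidate)        # the micro-benchmark
        if drop <= tolerance:
            keep.append(name)                        # tolerates FP8: quantize it
        # otherwise the layer is sensitive; leave it in higher precision
    return keep
```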
Karpathy’s “LLM Wiki”
Karpathy keeps shipping ideas that turn into entire categories.
This one: point an LLM at a folder of raw research, let it build and maintain a structured markdown wiki. His own wiki grew to ~100 articles and 400,000 words with almost no manual editing.
Three layers: raw sources (you never edit these) → wiki (the LLM maintains it) → schema rules (CLAUDE.md tells the LLM how to organize things).
The community is calling it a simpler, more traceable alternative to RAG. No embedding infrastructure. No retrieval pipeline. Just markdown an LLM can navigate.
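A minimal sketch of that three-layer loop, assuming a hypothetical `llm(prompt)` callable; nothing here is Karpathy’s actual tooling, just the shape of it:

```python
from pathlib import Path

def update_wiki(sources_dir: str, wiki_dir: str, llm) -> None:
    rules = Path(wiki_dir, "CLAUDE.md").read_text()        # layer 3: schema rules
    for src in sorted(Path(sources_dir).glob("*.md")):     # layer 1: raw, never edited
        pages = {p.name: p.read_text()
                 for p in Path(wiki_dir).glob("*.md") if p.name != "CLAUDE.md"}
        target = llm(                                      # layer 2: the wiki itself
            f"Rules:\n{rules}\nExisting pages: {sorted(pages)}\n"
            f"New source:\n{src.read_text()}\n"
            "Which page should absorb this source? Reply with a filename."
        ).strip()
        merged = llm(f"Rules:\n{rules}\nCurrent page:\n{pages.get(target, '')}\n"
                     f"New source:\n{src.read_text()}\nRewrite the page, merged.")
        Path(wiki_dir, target).write_text(merged)          # the LLM maintains pages
```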
💡 What This Week Was Really About
Three things shifted this week. Together they tell us where we’re heading.
1. AI buyers became AI owners.
The Cerebras-OpenAI deal isn’t just a deal. It’s a new primitive.
The boundary between hyperscaler buyers and critical suppliers is collapsing into equity co-ownership.
2. People started pushing back on agents.
The resistance isn’t theoretical anymore. It’s showing up in code, contracts, and laws.
Legislation:
Maine passed the first statewide data center ban
Sanders/AOC introduced a federal data center moratorium
Open source:
SDL banned AI-generated commits
Jellyfin published a formal anti-AI development policy
The trust gap between executives buying AI and engineers using it is the single biggest adoption problem nobody wants to talk about. It’s going to define the next year.
3. The capability gap is closing fast.
Stanford confirmed the US-China Arena gap is now 39 Elo points. It was 232 two years ago.
If your strategy assumes US/Western models will always be best-in-class, you’re betting on something that’s fragile right now.
⚡ Quick Hits
Stanford AI Index 2026 put the US-China Arena gap at 39 Elo points, down from 232 in 2024. Global AI data centers hit 29.6 GW — 2.5x in two years.
Iran reopened the Strait of Hormuz, and oil crashed to $82.59 (-9.41% in one session, -25% in 10 days). S&P crossed 7,000 for the first time. Nasdaq hit its longest winning streak since 2009.
Cursor is raising at a $50B valuation — the highest ever for a developer tools company.
Perplexity got hit with a class-action privacy lawsuit — a 135-page complaint alleging “Incognito Mode” secretly shared chats with Google and Meta. Even paid Pro subscribers were affected.
Anthropic’s Glasswing found thousands of zero-days, including a 27-year-old OpenBSD flaw. Anthropic says the model is too powerful to release publicly.
Claude Opus 4.7’s new tokenizer uses 35-47% more tokens for the same text. That’s a 20-30% effective price hike despite “unchanged pricing.”
RAM shortage now projected through 2030 — only 60% of DRAM demand will be met by end of 2027. All three majors shifting capacity to HBM for AI. Expect 15-30% RAM price increases.

