MLWhiz Weekly Recsys/ML/GenAI Newsletter # 10 - The week AI infrastructure crossed from a technology story to a financial one
Hey, Rahul here! 👋 Each week, I publish long-form ML+AI posts covering ML, AI, and System design for MLwhiz. Paid subscribers also get how-to guides with full code walkthroughs. I publish occasional extra articles.
🏆 Story of the Week: The $35 Billion Deal That Turned AI Chips Into Toll Roads
Apollo, Blackstone, and a consortium of global banks closed the largest private financing in history to back Broadcom’s new AI XPV Platform. The target: 20+ gigawatts of AI compute capacity through 2028, with Anthropic and OpenAI as anchor customers.
Here’s why this matters more than anything released this week.
Until now, the AI compute buildout was constrained by four balance sheets: Google, Microsoft, Meta, and Amazon. The pace of AI infrastructure was gated by how fast those companies could allocate capital. Apollo just took their moat away.
Pension funds, insurance companies, and sovereign wealth can now finance AI compute the way they finance power plants and highways.
The Broadcom angle is the second story inside this deal. Broadcom CEO Hock Tan is running the VMware playbook: control the infrastructure layer, make it financeable, let private capital scale the deployment.
Money is no longer the binding constraint.
I think this is the week AI infrastructure crossed from a technology story to a financial one. Your cost of compute will probably drop as private capital floods in. Your cost of power and compliance will rise.
🤖 Models That Dropped This Week
Gemma 4 12B (Google) — The first encoder-free open multimodal model that runs on a laptop. No separate vision encoder, no CLIP adapter. Raw image patches flow directly into the transformer alongside text tokens. At 12B parameters with quantization, it fits on consumer GPUs. Download and test it. I got it running with Ollama without a GPU, but could not get it working with Claude Code. Let me know if you manage to run Claude Code with it.
MiniMax M3 (MiniMax) — A Shanghai lab claiming 59.0% on SWE-Bench Pro (edging past GPT-5.5’s 58.6%), a 1M-token context window, and pricing at $0.60/$2.40 per 1M tokens. That’s 15x cheaper than Claude Opus on input. The MiniMax Sparse Attention architecture delivers 9x prefill speedup at 1M tokens. Benchmarks are vendor-reported and need verification.
MAI-Code-1-Flash (Microsoft) — Microsoft’s first in-house coding model, built without OpenAI data. 137B total / 5B active params via sparse MoE, 256K context. Claims +16 points over Claude Haiku 4.5 on SWE-Bench Pro. Priced at $0.75/M input tokens. Already rolling out in GitHub Copilot. The clearest signal yet of Microsoft-OpenAI decoupling.
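To turn the per-token prices above into per-request numbers, here is a minimal sketch. The workload (200K input tokens, 5K output tokens) and the function names are hypothetical, chosen only for illustration. The prices and parameter counts are the vendor-reported figures quoted above.

```python
def request_cost_usd(input_tokens, output_tokens, usd_per_m_in, usd_per_m_out):
    """Cost of one request, given prices quoted per 1M tokens."""
    cost_in = (input_tokens / 1_000_000) * usd_per_m_in
    cost_out = (output_tokens / 1_000_000) * usd_per_m_out
    return cost_in + cost_out


def active_fraction(active_params_b, total_params_b):
    """Share of a sparse MoE model's parameters used per token."""
    return active_params_b / total_params_b


if __name__ == "__main__":
    # Hypothetical long-context coding request: 200K tokens in, 5K tokens out.
    cost = request_cost_usd(200_000, 5_000, usd_per_m_in=0.60, usd_per_m_out=2.40)
    print(f"MiniMax M3, hypothetical request: ${cost:.3f}")

    # MAI-Code-1-Flash: 5B active out of 137B total parameters.
    print(f"MAI-Code-1-Flash active share: {active_fraction(5, 137):.1%}")
```

At the quoted MiniMax M3 prices, that hypothetical request costs about $0.13, and MAI-Code-1-Flash activates under 4% of its weights per token. Swap in your own token counts before drawing conclusions.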
🧠 Papers That Matter
Mult-DPO: Multinomial Direct Preference Optimization for Recommender Systems (Netflix) — I keep telling teams that LLM alignment techniques will reshape RecSys. Netflix just proved it. DPO works on pairwise preferences (“A is better than B”), but recommendation data is set-wise: multiple positive items, multiple negatives, no meaningful ordering among the positives. Forcing set-wise data into pairwise comparisons loses information on every training step.
Mult-DPO extends DPO to a multinomial formulation that handles sets directly. The model learns “all items in set S+ should outrank all items in S−” without imposing artificial order within the positives. If you’re training recommendation models with any flavor of pairwise loss, you’re leaving performance on the table. Benchmark against this.
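To make the pairwise vs. set-wise distinction concrete, here is a minimal sketch. The pairwise loss is the standard DPO objective. The set-wise loss is one natural multinomial generalization (push softmax mass onto the positive set). It is an assumption for illustration, not necessarily the exact objective in the Netflix paper. All function names and the `beta` default are hypothetical.

```python
import math


def implicit_reward(logp_policy, logp_ref, beta=0.1):
    """DPO-style implicit reward: beta * log(pi(item) / pi_ref(item))."""
    return beta * (logp_policy - logp_ref)


def pairwise_dpo_loss(r_pos, r_neg):
    """Standard DPO loss for one (preferred, rejected) pair: -log sigmoid(r_pos - r_neg)."""
    return math.log1p(math.exp(-(r_pos - r_neg)))


def _logsumexp(xs):
    # Numerically stable log(sum(exp(x))) for a non-empty list.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def setwise_multinomial_loss(r_pos_set, r_neg_set):
    """ASSUMED set-wise objective (illustrative, not taken from the paper):
    -log( sum_{i in S+} exp(r_i) / sum_{j in S+ and S-} exp(r_j) ).
    Only sums over the positive set appear, so no order among positives is imposed.
    """
    return _logsumexp(list(r_pos_set) + list(r_neg_set)) - _logsumexp(list(r_pos_set))


if __name__ == "__main__":
    # Toy log-probabilities (policy, reference) for two positives and two negatives.
    pos = [implicit_reward(-1.0, -1.5), implicit_reward(-2.0, -2.2)]
    neg = [implicit_reward(-3.0, -2.5), implicit_reward(-4.0, -3.8)]
    print("set-wise loss:", setwise_multinomial_loss(pos, neg))
    print("one pairwise term:", pairwise_dpo_loss(pos[0], neg[0]))
```

With exactly one positive and one negative, this set-wise loss reduces to the pairwise DPO loss, which makes it a convenient baseline when benchmarking a pairwise setup against a set-wise one.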
📝 Some Good Reads
“When AI Builds Itself” (Anthropic Institute) — Anthropic published detailed research on recursive self-improvement mid-IPO, showing they’re already delegating a growing share of AI development to AI systems.
“Coding Is No Longer the Constraint” (Spotify Engineering) — Niklas Gustavsson at Spotify published the numbers: 99% of engineers use AI tools weekly, 650+ agent-generated PRs merged per month, 90% migration time reduction. The thesis: years of platform investment in CI/CD, testing, and docs now compound with AI agents. Companies that underinvest in dev platforms won’t benefit from AI coding.
⚡ Quick Hits
Google is paying SpaceX $920M/month to rent GPUs — the company that builds TPUs can’t build capacity fast enough. $11B/year to a competitor.
Anthropic’s Glasswing expanded to 200 organizations across 15+ countries. Mythos found 10,000+ high/critical vulnerabilities since April.
SpaceX/xAI prices its IPO Thursday at a $1.75T target valuation. Totally unethical, in my view: it looks designed just to put SpaceX in the Nasdaq 100. I think markets are in “greed mode.”
TurboVec hit #1 on GitHub trending — a Rust vector index fitting 10M documents in 4GB (vs. 31GB float32), beating FAISS by 12-20% on ARM, with no training step and filter-at-search-time. Need to look into this myself.
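A quick back-of-the-envelope check on the TurboVec memory numbers. The embedding dimension (768) and the 4-bits-per-dimension figure are my assumptions, picked because they roughly reproduce the quoted 31GB and 4GB. The source states neither, and the function name is hypothetical.

```python
def index_size_gb(n_vectors, dim, bits_per_dim):
    """Raw vector storage in GB (1 GB = 1e9 bytes), ignoring index overhead."""
    return n_vectors * dim * bits_per_dim / 8 / 1e9


if __name__ == "__main__":
    n = 10_000_000  # 10M documents, as quoted
    dim = 768       # ASSUMPTION: a common embedding size, not stated in the source

    print(f"float32: {index_size_gb(n, dim, 32):.2f} GB")
    print(f"4-bit:   {index_size_gb(n, dim, 4):.2f} GB")  # ASSUMPTION: ~4 bits per dimension
```

Under those assumptions, float32 storage comes to about 30.7GB and 4-bit storage to about 3.8GB, which is in line with the 31GB vs. 4GB claim.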





