MLWhiz Weekly AI/ML Newsletter #2
Here is what happened this week.
🏆 Story of the Week: The Agent Platform War Has Officially Started
For years, the AI industry has been a model quality race. GPT vs Claude vs Gemini, benchmark after benchmark, parameter count after parameter count. This week, that race ended — and a new one began.
On Sunday, OpenAI’s CEO of Applications Fidji Simo announced what insiders are calling “Code Red”: the consolidation of ChatGPT, the Codex coding platform, and the Atlas browser into a single desktop superapp built around agentic task handling. The catalyst? Internal data showing Anthropic’s enterprise market share climbing to 40% while OpenAI’s fell to roughly 27%. Simo told employees they could no longer afford “side quests” — a direct shot at Sora, which briefly hit #1 in the App Store before usage flatlined.
But this isn’t just an OpenAI crisis story — it’s an industry-wide convergence. Within the same week, Meta shipped “My Computer”, a desktop agent from its $2B Manus acquisition, already integrated into Meta Ads Manager and WhatsApp Business. Anthropic’s Claude Dispatch — phone-to-desktop task routing — went live on Pro at $20/month. And OpenClaw crossed 210,000 GitHub stars, spawning ByteDance’s OpenViking context database (17.7K stars in one week) with persistent agent memory that cuts token costs by 95%.
The technical bet is about agentic continuity — maintaining a single context across research, coding, browsing, and execution without losing state. OpenAI’s superapp maintains context across modalities. Anthropic’s Dispatch takes the minimal approach: phone as remote control, confirmation for every action. Meta/Manus goes local-first with OS integration. OpenClaw is the open-source wild card, now with more GitHub stars than React or Linux.
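What "maintaining a single context" means mechanically can be sketched in a few lines. This is a toy illustration of the shared-state pattern the platforms above are converging on, not any vendor's actual API; all names here (`AgentContext`, `record`, `handoff`) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class AgentContext:
    """Toy sketch: every surface (browser, IDE, phone) appends to one
    running event log instead of keeping its own siloed history."""
    events: list = field(default_factory=list)

    def record(self, surface: str, action: str, result: str) -> None:
        # each tool call lands in the same shared context
        self.events.append({"surface": surface, "action": action, "result": result})

    def handoff(self) -> list:
        # what a new surface (phone -> desktop, browser -> IDE) receives,
        # so no state is lost when the task changes modality
        return list(self.events)

ctx = AgentContext()
ctx.record("browser", "search", "found 3 relevant papers")
ctx.record("ide", "edit", "patched retriever.py")
print(len(ctx.handoff()))  # both events survive the surface switch
```

The design question each platform answers differently is who may write to this log and when a human must confirm: Anthropic's Dispatch gates every action, while the local-first approaches let the agent write freely.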
The deeper signal is that GPT-5.4, Claude Opus 4.6, and Gemini 3.1 are all “good enough.” The differentiation now lives above the model layer: who owns the surface where work happens? OpenAI has the consumers, Anthropic has the developers, Meta has the advertisers, and OpenClaw has the open-source community. The agent platform war will determine who keeps all of them.
For practitioners, the message is to design for agentic workflows from day one. The standalone chatbot era is ending. The winners will be those whose agents integrate most seamlessly into existing work contexts — and the next 90 days will determine which platform becomes the default.
🤖 Models That Dropped This Week
GPT-5.4 Mini and Nano (OpenAI, March 17) — OpenAI extended the GPT-5 family downward with Mini and Nano variants targeting cost-sensitive and edge deployment. Intensifies competition with Mistral Small 4 and open-weight alternatives in the “small frontier model” category. (source)
Xiaomi MiMo-V2-Pro (Xiaomi, March 22) — The mystery “Hunter Alpha” model climbing leaderboards since March 11 turned out to be from a phone maker, not an AI lab. A 1T-parameter MoE (42B active) with 1M context window, ranking 3rd on ClawEval behind only Claude Opus 4.6. Beats Claude Sonnet 4.6 at coding at 67% lower cost. Xiaomi committed $8.7B in AI spending over three years. (source)
🧠 Papers That Matter
Deploying Semantic ID-based Generative Retrieval for Large-Scale Podcast Discovery at Spotify (GLIDE) — Spotify shipped the strongest published production result for generative recommendation systems. GLIDE uses semantic IDs — compact learned representations that let an LLM “generate” recommendations by outputting semantic tokens — combining instruction-following with collaborative filtering. The system handles natural language queries while delivering personalized results across a 10M+ podcast catalog.
The production numbers are exceptional: 5.4% increase in non-habitual podcast streaming and 14.3% improvement in new-show discovery in real A/B tests. What makes this architecturally important is that it unifies search and recommendation — users get personalization from collaborative filtering plus the flexibility of arbitrary natural language queries. A companion paper, NEO, extends the approach to consolidate search, recommendation, and reasoning in a single model. (paper)
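Semantic IDs are commonly built by residual quantization of item embeddings; whether GLIDE uses exactly this scheme is not stated in the paper summary above, so treat the following as a minimal illustrative sketch of the general idea: a dense item embedding becomes a short tuple of codebook indices, which an LLM can then emit as ordinary tokens to "generate" a recommendation. All shapes and names here are assumptions for illustration.

```python
import numpy as np

def semantic_id(embedding, codebooks):
    """Residual quantization: map a dense item embedding to a short
    tuple of discrete codes (a 'semantic ID'). Each codebook quantizes
    the residual left over by the previous one."""
    residual = embedding.copy()
    codes = []
    for book in codebooks:                           # book: (K, d) centroids
        dists = np.linalg.norm(book - residual, axis=1)
        idx = int(np.argmin(dists))                  # nearest centroid
        codes.append(idx)
        residual = residual - book[idx]              # quantize what's left
    return tuple(codes)

rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(256, 64)) for _ in range(3)]  # 3 levels, 256 codes each
item_emb = rng.normal(size=64)
print(semantic_id(item_emb, codebooks))  # a 3-tuple of codebook indices
```

The payoff is that 3 tokens from a 256-entry vocabulary address 256³ ≈ 16.7M items, enough to cover a 10M+ catalog, while semantically similar items share code prefixes, which is what lets generation double as retrieval.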
SuperKMeans: A Super Fast K-means for Indexing Vector Embeddings — An unglamorous systems paper that will actually ship. K-means clustering is the backbone of ANN vector-search indexes (IVF, FAISS, ScaNN), and standard k-means on high-dimensional embeddings is slow, bottlenecking index build time for production systems. At each iteration, SuperKMeans prunes dimensions that cannot change a point's cluster assignment, computing only the distances that matter.
The results: up to 7x faster than FAISS and scikit-learn on CPUs, and up to 4x faster than cuVS on GPUs, with no degradation in downstream search accuracy. It’s a drop-in replacement for k-means in existing IVF/FAISS pipelines. If you’ve ever had to choose between fresh embeddings and affordable index rebuilds, this paper directly addresses that tradeoff. A 7x clustering speedup means going from daily to near-real-time index updates without proportional compute cost. (paper)
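The paper's exact pruning rule isn't reproduced here, but the family of tricks it belongs to — skip distance work that provably can't change an assignment — is easy to demonstrate. Below is a classic early-abandonment sketch (illustrative only, not SuperKMeans itself): the squared distance to a candidate centroid is accumulated dimension by dimension and abandoned the moment it exceeds the best distance found so far.

```python
import numpy as np

def assign_with_early_abandon(points, centroids):
    """Nearest-centroid assignment that stops summing squared
    differences once a candidate centroid already exceeds the best
    distance seen so far. Same result as brute force, less arithmetic
    in high dimensions."""
    labels = np.empty(len(points), dtype=int)
    for i, x in enumerate(points):
        best, best_j = np.inf, -1
        for j, c in enumerate(centroids):
            d, k = 0.0, 0
            while k < len(x) and d < best:   # abandon as soon as we lose
                diff = x[k] - c[k]
                d += diff * diff
                k += 1
            if d < best:
                best, best_j = d, j
        labels[i] = best_j
    return labels
```

The assignments are identical to exhaustive search; only wasted multiply-adds are skipped. Vectorized and combined with per-dimension bounds, this style of pruning is where speedups like the paper's 7x come from.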
ERank: Fusing SFT and RL for Effective Text Reranking — LLM rerankers face a fundamental tradeoff: pointwise scoring is efficient but misses global ranking signals; listwise ranking captures order but is expensive at inference. ERank solves this with a two-stage approach: train the model to output fine-grained integer scores (0-10) via SFT, then refine with RL using listwise-derived rewards for global ranking awareness — keeping pointwise efficiency at inference.
A 4B ERank model outperforms many 7B rerankers, and the 32B variant sets SOTA on the BRIGHT benchmark with nDCG@10 of 40.2, surpassing the Rank-R1-32B listwise reranker while being more inference-efficient. Reranking is the highest-leverage stage in production search and recommendation pipelines. ERank delivers listwise quality at pointwise cost — exactly what production systems need. (paper)
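The inference-time shape of a pointwise reranker is worth seeing concretely: each (query, document) pair is scored independently, so scoring parallelizes and cost grows linearly in the candidate count. The sketch below assumes a stand-in scorer (`toy_score`, pure term overlap) in place of the fine-tuned LLM; it illustrates the interface, not ERank's model.

```python
def rerank_pointwise(query, docs, score_fn):
    """Pointwise reranking: every (query, doc) pair gets an independent
    integer score in 0..10, then docs are sorted by score descending.
    No pairwise or listwise comparisons are needed at inference time."""
    scored = [(score_fn(query, d), d) for d in docs]   # embarrassingly parallel
    scored.sort(key=lambda t: -t[0])                   # stable sort keeps input order on ties
    return [d for _, d in scored]

def toy_score(query, doc):
    # stand-in for the SFT+RL-trained LLM scorer: crude term overlap
    q, d = set(query.lower().split()), set(doc.lower().split())
    return min(10, 2 * len(q & d))

docs = ["fast vector search", "k-means clustering", "vector search with rerankers"]
print(rerank_pointwise("vector search", docs, toy_score))
```

ERank's contribution lives entirely in how `score_fn` is trained: SFT teaches calibrated 0-10 scores, and the RL stage injects listwise ranking signal into what remains, at inference time, this cheap pointwise loop.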
📝 Some Good Reads
“A Visual Guide to Attention Variants in Modern LLMs” (Sebastian Raschka) — A comprehensive visual walkthrough of every major attention variant in current open-weight architectures — from multi-head through grouped-query, multi-latent (DeepSeek), sliding window, differential, and native sparse attention. Covers hybrid patterns like Qwen3.5’s Gated DeltaNet + full attention in a 3:1 ratio. Accompanies a new LLM Architecture Gallery with 45+ visual model cards. The single best reference for understanding design choices behind Llama, DeepSeek, Gemma, and Qwen. (read it)
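As a companion to the guide above, the variant most widely deployed today, grouped-query attention, fits in a few lines of numpy: query heads share a smaller set of key/value heads, shrinking the KV cache by the ratio n_q/n_kv without changing the attention math. A minimal sketch (illustrative, not any specific model's implementation):

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """q: (n_q_heads, T, d); k, v: (n_kv_heads, T, d).
    Each group of n_q_heads // n_kv_heads query heads attends
    to the same shared key/value head."""
    n_q_heads, T, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads          # query heads per KV head
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                      # shared KV head for this query head
        scores = q[h] @ k[kv].T / np.sqrt(d)             # (T, T)
        w = np.exp(scores - scores.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)                    # row-wise softmax
        out[h] = w @ v[kv]
    return out
```

With n_kv_heads == n_q_heads this reduces to standard multi-head attention, and with n_kv_heads == 1 to multi-query attention; the guide's other variants (multi-latent, sliding window, sparse) change what `k` and `v` contain rather than this routing.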
How Uber Uses AI for Development (Pragmatic Engineer) — The most detailed look yet at agentic coding inside a major tech company. 84% of Uber devs are agentic coding users, 65-72% of code is AI-generated in IDEs, and 11% of PRs are opened by AI agents. Claude Code usage nearly doubled from 32% to 63% in two months. Uber built five internal tools including Minion (background agent platform) and Autocover (5,000+ auto-generated unit tests per month). The catch: AI costs are up 6x since 2024. (read it)
Meta’s Ranking Engineer Agent (REA) — Meta Engineering unveiled an AI agent that autonomously optimizes advertisement ranking algorithms at scale. Not code completion — an agent iterating on ranking functions, running experiments, and improving Meta’s core revenue engine without human intervention. The shift from “AI assists engineers” to “AI is the engineer” for high-stakes production systems. (read it)
💡 What This Week Was Really About
Three forces collided this week, and the intersection defines where AI goes next.
The model race ended; the platform race began: When Xiaomi — a phone maker — can build a trillion-parameter model that ranks 3rd on ClawEval and beats Claude Sonnet at coding for 67% less, the message is inescapable: frontier model capability is commoditizing. The competition has moved to the layer above — who controls the agentic workflow surface where work actually happens.
Physical constraints are catching up with digital ambition: The Strait of Hormuz crisis represents the most consequential geopolitical moment for AI infrastructure in 2026. The AI infrastructure buildout assumed stable, cheap energy. That assumption is gone.
Regulation and accountability arrived simultaneously on multiple fronts: The era of building AI without regulatory, legal, and geopolitical constraints is definitively over.
The practitioners who thrive in the next 90 days will be those building efficient, hardware-portable, constraint-aware systems. The “just scale it” era ended this week. Efficiency is the new moat.
⚡ Quick Hits
Nvidia’s $1T projection and Vera Rubin launch — Jensen Huang opened GTC 2026 projecting at least $1 trillion in chip orders through 2027 and began shipping Vera Rubin, claiming 3.5x faster training and 5x faster inference over Blackwell. OpenAI, Anthropic, and Meta confirmed as customers. (TechCrunch)
Yann LeCun’s AMI Labs raises $1.03B — The largest AI seed round in European history, betting on world models and JEPA instead of LLM scaling. A strategic contrarian bet that investors are hedging as the LLM scaling regime shows diminishing returns. (TechCrunch)
Musk announces TERAFAB — A $20-25B joint Tesla/SpaceX/xAI chip fab in Austin targeting 2nm. Tesla has zero semiconductor manufacturing experience, and leading-edge fabs typically take 5-7 years. (Bloomberg)
Block lays off 4,000, stock jumps 25% — Jack Dorsey cited AI efficiency; critics call it “AI-washing” of post-overhiring corrections. Goldman estimates AI eliminates only 5K-10K jobs/month across all US sectors, far below the rhetoric. Sets a precedent where “AI” becomes the socially acceptable framing for any layoff. (Bloomberg)
ICML rejects 2% of papers for LLM-written reviews — First major conference to enforce anti-AI review policies at scale. The tension between AI productivity and institutional norms is hitting academic publishing. (ICML)
Karpathy’s AutoResearch goes viral — A 630-line script letting AI agents run hundreds of ML experiments overnight hit 22,983 GitHub stars in 3 days. Shopify’s CEO reported a 19% performance gain after 37 overnight experiments. The automation of AI research itself is becoming tangible. (GitHub)


