MLWhiz Weekly Recsys/ML/GenAI Newsletter # 9 - The week AI started its IPOs
The AI industry is about to stop being a private market story. Quarterly earnings calls ask harder questions than venture capitalists.
Hey, Rahul here! 👋 Each week, I publish long-form ML+AI posts covering ML, AI, and System design for MLwhiz. Paid subscribers also get how-to guides with full code walkthroughs. I publish occasional extra articles.
🏆 Story of the Week: The triple IPO summer begins
Anthropic filed its S-1 with the SEC on Sunday, targeting an October listing. Three days earlier, it had closed a $65 billion Series H at a $965 billion valuation. Run-rate revenue: $47 billion.
The valuation progression is wild: $380B in February, $900B in May, $965B by the end of the month. Three funding rounds in four months, and the number kept climbing.
And the product engine does work. Opus 4.8 shipped on Wednesday with dynamic workflows in Claude Code, a system for coordinating swarms of subagents on large tasks.
But Anthropic isn’t filing in isolation. SpaceX filed its public S-1 on May 20 and is targeting a June 12 Nasdaq listing. OpenAI filed confidentially around May 22, eyeing September at $1T+. Three companies with a combined potential market cap above $3 trillion, all going public within four months.
One analyst told CNN this is “either the most consequential IPO cycle since the dot-com era or the most expensive lesson in narrative-versus-fundamentals that public markets have ever taught.”
I think it’s both.
For practitioners, this filing matters for a specific reason: Anthropic’s public financials will show Claude’s actual cost structure for the first time. If margins are better than expected, token pricing could get more aggressive. If they are worse, expect price increases. Either way, you’ll have the data to make informed build-vs-buy decisions instead of guessing.
The AI industry is about to stop being a private market story. Quarterly earnings calls ask harder questions than venture capitalists.
🤖 Models that dropped this week
Nvidia RTX Spark Superchip (Nvidia, Jun 1). Nvidia entered the PC market with an ARM-based chip pairing a Blackwell GPU with 128GB of unified memory and 1 petaflop of AI compute. It runs a 120-billion-parameter LLM locally with a 1-million-token context window, which means your entire codebase fits without a single API call. If this delivers even half of what’s claimed, it will be the biggest thing to come along in a long time.
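A quick sanity check on the "120B parameters in 128GB" claim. This is only back-of-the-envelope arithmetic: it counts weight memory alone and ignores the KV cache, activations, and runtime overhead. The announcement as summarized here does not say which precision the model runs at, so the bit widths below are my assumptions.

```python
# Back-of-the-envelope weight memory for a model at several precisions.
# Assumption: only weights are counted; KV cache, activations, and
# runtime overhead are ignored, so real requirements are higher.

PARAMS = 120e9            # 120 billion parameters (from the claim above)
UNIFIED_MEMORY_GB = 128   # unified memory (from the claim above)


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


# The bit widths are illustrative assumptions, not published specs.
for bits in (16, 8, 4):
    gb = weight_memory_gb(PARAMS, bits)
    verdict = "fits" if gb <= UNIFIED_MEMORY_GB else "does not fit"
    print(f"{bits}-bit weights: {gb:.0f} GB -> {verdict} in {UNIFIED_MEMORY_GB} GB")
```

At 16 bits the weights alone would exceed 128GB, so under these assumptions the claim only works with some form of quantization, and a million-token KV cache would come on top of whatever the weights use.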
Claude Opus 4.8 (Anthropic, May 29). The upgrade over Opus 4.7 shipped alongside dynamic workflows in Claude Code: coordinating swarms of subagents on large problems. The model is described as more “honest” about its mistakes. This is all great, but users are reporting they’ve torched anywhere from 700K to 1.2 million tokens on a single prompt.
Liquid AI LFM2.5-8B-A1B (Liquid AI, May 31). An 8B-param MoE with only 1B active parameters, trained on 38T tokens, with 128K context. Optimized for tool calling and agentic tasks on consumer hardware.
🧠 Papers that matter
UniPinRec: Unifying Generative Retrieval and Ranking at Pinterest Scale. I am always excited when Pinterest releases something, and they didn’t disappoint me this time either. They collapsed retrieval and ranking into a single shared transformer with three innovations: Masked Action Modeling, blended training that pairs action sequences with impression slates, and cross-stage KV cache sharing that computes user representations once for both stages.
The results: +1% online engagement, 11.1% latency reduction, 63.6% QPS improvement. If you’re a RecSys team lead, put this on your Q3 roadmap. (paper)
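To make the "compute the user representation once for both stages" idea concrete, here is a toy sketch. It is not Pinterest's implementation: the real system shares per-layer KV caches inside a transformer, while this stand-in shares a single averaged vector. All names (`encode_user`, `retrieve`, `rank`, the `bias` table) are hypothetical.

```python
# Toy illustration of sharing one user representation across retrieval
# and ranking. Hypothetical stand-in: a single vector plays the role of
# the shared KV cache described in the paper summary.

calls = {"encode_user": 0}


def encode_user(history):
    """Stand-in for the expensive pass over the user's action history."""
    calls["encode_user"] += 1
    dim = len(history[0])
    return [sum(vec[i] for vec in history) / len(history) for i in range(dim)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_repr, catalog, k):
    """Stage 1: score the whole catalog and keep the top-k item ids."""
    ranked = sorted(catalog, key=lambda i: dot(user_repr, catalog[i]), reverse=True)
    return ranked[:k]


def rank(user_repr, candidates, catalog, bias):
    """Stage 2: rescore candidates, reusing the same user representation."""
    return sorted(
        candidates,
        key=lambda i: dot(user_repr, catalog[i]) + bias.get(i, 0.0),
        reverse=True,
    )


history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
catalog = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [-1.0, 0.0]}
bias = {"a": 1.0}  # hypothetical ranking-only signal

user_repr = encode_user(history)              # computed once
candidates = retrieve(user_repr, catalog, k=3)
final = rank(user_repr, candidates, catalog, bias)
print(final, "encoder calls:", calls["encode_user"])
```

The point of the design is visible in the call counter: the expensive encoder runs once per request instead of once per stage, which is the kind of saving the latency and QPS numbers would come from.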
No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval. SSR replaces K-means clustering in ColBERT indexing with Sparse Autoencoders. The sparse representation converts token embeddings into exactly the format that inverted indexes are designed for. Results: 15x faster indexing, 2x faster retrieval. If you’ve been stuck on single-vector retrieval because ColBERT was too expensive to operate, that excuse just evaporated. (paper)
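Why sparse codes and inverted indexes fit together is easier to see in code. The sketch below is not the SSR method itself: there is no trained sparse autoencoder here, just a hypothetical top-k sparsifier over made-up activations, and for brevity it keeps one code per document rather than one per token as a multi-vector system would.

```python
from collections import defaultdict


def top_k_sparse(activations, k):
    """Keep the k largest positive activations as {feature_id: weight}."""
    order = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)
    return {i: activations[i] for i in order[:k] if activations[i] > 0}


def build_inverted_index(doc_codes):
    """Map each active feature to a posting list of (doc_id, weight)."""
    index = defaultdict(list)
    for doc_id, code in doc_codes.items():
        for feature_id, weight in code.items():
            index[feature_id].append((doc_id, weight))
    return index


def search(index, query_code):
    """Score only documents that share at least one active feature."""
    scores = defaultdict(float)
    for feature_id, q_weight in query_code.items():
        for doc_id, d_weight in index.get(feature_id, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up activations standing in for sparse autoencoder outputs.
raw_docs = {
    "d1": [0.9, 0.0, 0.2, 0.0],
    "d2": [0.0, 0.8, 0.0, 0.1],
    "d3": [0.5, 0.0, 0.0, 0.7],
}
doc_codes = {doc_id: top_k_sparse(acts, k=2) for doc_id, acts in raw_docs.items()}
index = build_inverted_index(doc_codes)

query_code = top_k_sparse([1.0, 0.0, 0.0, 0.5], k=2)
results = search(index, query_code)
print([doc_id for doc_id, _ in results])
```

Because each code has only a handful of non-zero features, a query touches a few posting lists instead of scanning every vector, and no clustering pass is needed to build the index.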
📝 Some good reads
Spotify Engineering: Better Experiments with LLM Evals. Only 12% of Spotify’s A/B tests ship positive outcomes, but 64% produce valid learning. Their framework: LLM evals should be a pre-experiment filter, not a replacement for experiments. Evals verify output quality (relevance, coherence, tone) before consuming experiment bandwidth. Experiments validate business impact. This settles a debate every ML team is having right now.
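The "evals as a pre-experiment filter" idea can be sketched as a simple gate. This is my illustration, not Spotify's code: the 0-to-1 score scale, the thresholds, and every name below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EvalScores:
    relevance: float
    coherence: float
    tone: float


# Hypothetical minimum scores on a 0-1 scale; tune these per product.
THRESHOLDS = EvalScores(relevance=0.8, coherence=0.7, tone=0.7)


def passes_eval_gate(scores: EvalScores, thresholds: EvalScores = THRESHOLDS) -> bool:
    """A variant must clear every quality bar before it gets A/B traffic."""
    return (
        scores.relevance >= thresholds.relevance
        and scores.coherence >= thresholds.coherence
        and scores.tone >= thresholds.tone
    )


def variants_for_experiment(candidates: dict) -> list:
    """Filter candidate variants; survivors still need a real experiment."""
    return [name for name, scores in candidates.items() if passes_eval_gate(scores)]


candidates = {
    "prompt_v1": EvalScores(relevance=0.85, coherence=0.90, tone=0.75),
    "prompt_v2": EvalScores(relevance=0.60, coherence=0.95, tone=0.90),
}
print(variants_for_experiment(candidates))
```

The gate only decides which variants deserve experiment bandwidth. It says nothing about business impact, which is still the experiment's job.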
“Talk Is Cheap”: 22,000 developers, and LLMs may be destroying value. The most rigorous LLM productivity analysis yet, using Faros.ai telemetry across 22,000 developers and 4,000 teams. Individual speed improves modestly. System-level metrics are worse: deployment frequency dropped 11%, and code deletion ratios spiked. Current LLM usage in software development is likely destroying value.
⚡ Quick hits
• DeepSeek makes its 75% V4-Pro price cut permanent. Output pricing at $0.87/M tokens, roughly 34x cheaper than GPT-5.5.
• METR can’t replicate its own study. The 2025 study showed AI actually slowed developers. In 2026, developers refused to participate without AI tools. The control group no longer exists.
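The DeepSeek pricing gap above can be turned into a quick budget check. The per-million price and the 34x ratio come from the item above; the comparison price is only what that ratio implies, not a published rate, and the monthly token volume is a hypothetical workload.

```python
# Output-token cost arithmetic from the DeepSeek item above.
DEEPSEEK_OUTPUT_USD_PER_M = 0.87   # from the item above
PRICE_RATIO = 34                   # "roughly 34x cheaper", from the item above


def implied_comparison_price(cheap_usd_per_m: float, ratio: float) -> float:
    """Price per million tokens implied by the stated ratio (an estimate)."""
    return cheap_usd_per_m * ratio


def output_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of generating `tokens` output tokens at a per-million price."""
    return tokens / 1_000_000 * usd_per_million


MONTHLY_OUTPUT_TOKENS = 50_000_000  # hypothetical workload

comparison_price = implied_comparison_price(DEEPSEEK_OUTPUT_USD_PER_M, PRICE_RATIO)
cheap_bill = output_cost_usd(MONTHLY_OUTPUT_TOKENS, DEEPSEEK_OUTPUT_USD_PER_M)
pricey_bill = output_cost_usd(MONTHLY_OUTPUT_TOKENS, comparison_price)

print(f"Implied comparison price: ${comparison_price:.2f}/M output tokens")
print(f"Monthly bill at $0.87/M: ${cheap_bill:,.2f}")
print(f"Monthly bill at implied price: ${pricey_bill:,.2f}")
```

Swap in your own token volume to see whether the gap matters at your scale. Input-token pricing is left out here because the item above only quotes output pricing.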