MLWhiz Weekly Recsys/ML/GenAI Newsletter # 11 - The week the US government pulled a frontier model offline on a letter
Trump Strikes!!!
Hey, Rahul here! 👋 Each week, I publish long-form ML+AI posts covering ML, AI, and System design for MLwhiz. Paid subscribers also get how-to guides with full code walkthroughs. I publish occasional extra articles.
I love keeping track of everything week to week — here’s what happened this week. Enjoy this free weekly post! For those who want to dive deeper into any of these topics, that’s what my paid posts are for.
🏆 Story of the Week: The government pulled a frontier model offline
I didn’t think I’d watch a national government reach into a private company and switch off its best model this decade.
On Friday, June 13, the Commerce Department sent Anthropic an enforcement letter. By the weekend, Fable 5 and Mythos 5, Anthropic’s two most capable models, were offline for every customer. The order even barred non-Americans, including Anthropic’s own employees, from touching them.
The stated reason was an “unspecified national security concern.” (Rolling eyes) The letter was never made public.
Here’s the part I can’t get past. Amazon is Anthropic’s largest investor, and AWS hosts Anthropic’s models on Bedrock. And according to the Wall Street Journal, it was Amazon CEO Andy Jassy who raised the concerns with US officials that preceded the crackdown.
So the company that bankrolls Anthropic and runs its infrastructure is also the one that called the government about it. Anthropic pushed back publicly, calling the action “disproportionate to the narrow jailbreak finding.” They’re not wrong, and they also can’t ignore who placed the call.
The timing made it even starker. Anthropic was winning the enterprise race the same week its flagship models went dark.
BUT, the fight was helping Anthropic’s enterprise sales, not hurting them. Buyers seem to read “the government tried to pull this model” as a signal of capability.
This actually sets up a dangerous precedent. Export control law was built for physical goods and cryptography, things you can put in a crate. Applying it to a hosted API means the government can treat a frontier model like a controlled item and switch it off on the strength of a letter, with no public technical justification.
This raises the question of whether American AI can be trusted as infrastructure. When a government can recall your model in an afternoon, “self-hosted” stops being an ideological stance and becomes a business continuity requirement.
If your production stack has one hard dependency on a single vendor’s flagship API, you might want to build a fallback to an open-weight model you can run yourself, and test the failover.
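Here is a minimal sketch of what "build a fallback and test the failover" could look like. Everything in it is an assumption for illustration: the `Backend` wrapper, the `generate_with_failover` helper, and the two stand-in functions are hypothetical names, not any vendor's SDK. In a real stack, each `generate` callable would wrap your actual client for the hosted API or your self-hosted model.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class AllBackendsFailed(RuntimeError):
    """Raised when every configured backend errors out."""


@dataclass
class Backend:
    name: str
    generate: Callable[[str], str]  # hypothetical interface: prompt in, completion out


def generate_with_failover(prompt: str, backends: Sequence[Backend]) -> tuple[str, str]:
    """Try each backend in priority order; return (backend_name, completion)."""
    errors = []
    for backend in backends:
        try:
            return backend.name, backend.generate(prompt)
        except Exception as exc:  # in production, catch your client's specific errors
            errors.append(f"{backend.name}: {exc!r}")
    raise AllBackendsFailed("; ".join(errors))


# --- Failover drill with stand-in backends (no network calls) ---
def hosted_flagship(prompt: str) -> str:
    # Simulates the vendor's flagship API going dark.
    raise ConnectionError("model unavailable")


def self_hosted_open_weights(prompt: str) -> str:
    # Stand-in for a call to an open-weight model you run yourself.
    return f"[local] {prompt}"


BACKENDS = [
    Backend("hosted-flagship", hosted_flagship),
    Backend("self-hosted", self_hosted_open_weights),
]

served_by, completion = generate_with_failover("ping", BACKENDS)
print(served_by, completion)
```

The drill at the bottom is the part that matters: force the primary to fail on purpose and confirm traffic lands on the model you control. Returning the backend name alongside the completion lets you log and alert on how often the fallback is actually serving.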
🤖 Models that dropped this week
The open-weight side of the field had the loudest week I can remember, and the timing next to the Anthropic suspension was not lost on anyone.
GLM-5.2 (Z.ai, June 16) — Z.ai (formerly Zhipu) shipped a 753B-parameter MoE model with a 1M-token context window under an MIT license. It scores 62.1 on SWE-bench Pro, edging out GPT-5.5’s 58.6, and near-ties Claude Opus 4.8 on long-horizon agentic suites like FrontierSWE and MCP-Atlas. The best part is the price: $5.80 per million combined tokens, roughly one-sixth of what you’d pay at the closed frontier. Long-horizon coding is exactly where open weights were supposed to fall apart, and they didn’t.
Nemotron 3 Ultra (NVIDIA, June 16) — A 550B MoE that mixes Mamba and Transformer blocks, ships a 1M-token context window, and delivers about 6x higher inference throughput than comparable transformer LLMs. NVIDIA released base, post-trained, and quantized checkpoints plus the training data, which is rare at this scale.
Qwen-RobotSuite (Alibaba, June 17) — Alibaba’s Tongyi Lab open-sourced three robotics foundation models: RobotManip for vision-language-action manipulation (trained on 38,100+ hours of open data), RobotNav for navigation and driving, and RobotWorld, a video world model that predicts future physical states. RobotManip and RobotNav ship with public GitHub repos. Robotics was always going to be the next frontier, and this defines the field.
🧠 Papers that matter
On the Memorization Behavior of LLMs in Generative Recommendation — Generative recommenders keep beating classical baselines, and most teams just took the win. Snap Research went looking for why, and the answer is: most of the lift is “one-hop memorization,” concentrated on users whose target items were directly predictable from their history.
A big chunk of that leaderboard gain is the model building a better lookup table for easy users. They propose IIRG, which injects collaborative and semantic item relations to push the model toward real generalization, and it improves performance specifically on the non-memorizable users. So, before you ship a generative recommender because it beat your two-tower baseline, slice your eval by how memorizable each user is. The win you’re celebrating might vanish on the cold and complex users, which are the ones you actually need to get right.
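A rough sketch of that slicing, under a loud assumption: I'm using a simple proxy for "memorizable" (the target item appeared directly after one of the user's history items somewhere in the training sequences). That is my stand-in, not necessarily the paper's exact definition, and the function names and toy data below are made up for illustration.

```python
from collections import defaultdict


def build_one_hop(train_sequences):
    """Map each item to the set of items that directly followed it in training."""
    successors = defaultdict(set)
    for seq in train_sequences:
        for prev_item, next_item in zip(seq, seq[1:]):
            successors[prev_item].add(next_item)
    return successors


def is_memorizable(history, target, successors):
    """Proxy (assumption): target was seen right after some history item in training."""
    return any(target in successors.get(item, ()) for item in history)


def sliced_hit_rate(eval_rows, successors, k=10):
    """eval_rows: iterable of (history, target, ranked_predictions) per user."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for history, target, ranked in eval_rows:
        memorizable = is_memorizable(history, target, successors)
        slice_name = "memorizable" if memorizable else "non_memorizable"
        counts[slice_name] += 1
        hits[slice_name] += int(target in ranked[:k])
    return {name: hits[name] / counts[name] for name in counts}


# Toy data, purely illustrative
train = [["a", "b", "c"], ["a", "b", "d"], ["x", "y"]]
eval_rows = [
    (["a", "b"], "c", ["c", "d"]),  # b -> c seen in training
    (["a", "b"], "d", ["c", "e"]),  # b -> d seen in training
    (["x"], "c", ["y", "a"]),       # x -> c never seen
    (["y"], "a", ["a", "b"]),       # y has no successors in training
]
report = sliced_hit_rate(eval_rows, build_one_hop(train), k=2)
print(report)
```

Run the same report for your baseline and your generative model. If the gap between them shows up only in the memorizable slice, that's the signal to dig further before shipping.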
Helmsman: Cost-Effective ANNS at Scale — HNSW has been the unquestioned default for production nearest-neighbor search since 2018, and its memory cost at billion-vector scale is the open secret nobody likes to discuss. At Meta, the memory overhead of graph indices was a constant budget fight. Xiaohongshu went back to clustering, the approach HNSW was supposed to have retired, and engineered it for modern hardware.
The result matches HNSW’s recall-latency tradeoff at much lower memory cost, and it’s deployed across their search, recommendation, and ads systems. If you’re running HNSW at scale and your infrastructure bill is uncomfortable, this is the benchmark to run this quarter.
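To get a feel for why graph indices hurt at this scale, here is a back-of-envelope estimate. It uses the common approximation that an HNSW index stores the raw float32 vector plus about 2×M neighbor links per vector on the base layer, and compares that with a plain cluster-based (IVF-style) layout that stores the vector plus one 8-byte id. The dimension, M, and vector count are assumptions I picked for illustration, upper HNSW layers and centroid storage are ignored, and none of this describes how Helmsman itself lays out memory.

```python
def hnsw_bytes_per_vector(dim, m, bytes_per_float=4, bytes_per_link=4):
    """Raw vector plus roughly 2*M neighbor ids on the base layer."""
    return dim * bytes_per_float + 2 * m * bytes_per_link


def ivf_flat_bytes_per_vector(dim, bytes_per_float=4, bytes_per_id=8):
    """Raw vector plus one id stored in its cluster's list."""
    return dim * bytes_per_float + bytes_per_id


# Illustrative assumptions: 1B vectors, 128 dims, M=32
N_VECTORS = 1_000_000_000
DIM = 128
M = 32

hnsw_total_gb = N_VECTORS * hnsw_bytes_per_vector(DIM, M) / 1e9
ivf_total_gb = N_VECTORS * ivf_flat_bytes_per_vector(DIM) / 1e9

print(f"HNSW: ~{hnsw_total_gb:.0f} GB")
print(f"IVF flat: ~{ivf_total_gb:.0f} GB")
```

Under these assumptions, the graph links alone add 256 bytes per vector on top of a 512-byte vector. Plug in your own dimension and M before drawing conclusions about your bill.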
📝 Some good reads
Open Source AI Must Win — It argues that AI is civilizational infrastructure that must stay free to study, build, deploy, and run. It went viral the same week a government switched off a closed frontier model on a letter, and the community connected the dots in real time.
Not everyone is using AI for everything (Gabriel Weinberg) — The DuckDuckGo founder makes the counter-hype case that AI adoption is far more uneven than the industry narrative claims, and most people use it occasionally rather than constantly.
⚡ Quick hits
Salesforce acquired AI customer-service platform Fin for $3.6B, one of the year’s biggest agentic-AI deals, and a sign that standalone agent startups are becoming acquisition targets rather than category winners.
Jeff Bezos’ Prometheus raised $12B at a $41B valuation to build an “artificial general engineer” for the physical world, betting on engineering automation over humanoid robots.
Sarvam became India’s newest AI unicorn with a $234M round led by HCLTech at a $1.5B valuation. After what we saw with Mythos, sovereign AI is now a fundable thesis.







