MLWhiz Weekly Recsys/ML/GenAI Newsletter # 8 - The week of Google I/O 2026
Google I/O opened up a lot of eyes for major AI firms
Hey, Rahul here! 👋 Each week, I publish long-form ML+AI posts covering ML, AI, and System design for MLwhiz. Paid subscribers also get how-to guides with full code walkthroughs. I publish occasional extra articles.
🏆 Story of the Week: Recapping Google I/O 2026
I sat through the Google I/O 2026 keynote on Tuesday and had to pause it multiple times. There were just so many updates. By the time it ended, Google had shipped a new flagship model, a true any-to-any multimodal model, the first redesign of Search in 25 years, a 24/7 personal agent, a cybersecurity agent, native Android app generation from natural language, and a replacement for the Gemini CLI.
Here’s the recap.
Gemini 3.5 Flash. First up, Google’s new frontier model: 76.2% on Terminal-Bench 2.1, 83.6% on MCP Atlas, and 4x faster output than competing frontier models. Gemini 3.5 Pro lands in June.
The 4x faster output is the number to focus on. For agentic workloads, total task completion time scales with per-turn decode time (output tokens divided by decode speed) times the number of tool-calling turns. A faster model that’s 90% as smart will beat a slower one that’s 100% as smart in most multi-turn systems, because the speed gap compounds on every turn. And the thing is, this model is smart and fast.
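Here is a back-of-the-envelope sketch of that scaling argument. Every number in it (20 turns, 500 output tokens per turn, 50 vs. 200 tokens/second, 2 seconds per tool call) is a made-up assumption for illustration, not a measured figure for any model.

```python
def task_seconds(turns, tokens_per_turn, tokens_per_second, tool_seconds=0.0):
    """Rough wall-clock estimate: each turn decodes its tokens, then waits on a tool."""
    decode_seconds = tokens_per_turn / tokens_per_second
    return turns * (decode_seconds + tool_seconds)

# Hypothetical workload: 20 tool-calling turns, 500 output tokens per turn.
slow = task_seconds(turns=20, tokens_per_turn=500, tokens_per_second=50)
fast = task_seconds(turns=20, tokens_per_turn=500, tokens_per_second=200)
print(slow, fast, slow / fast)

# Same workload, but each turn also waits 2 seconds on a tool call.
slow_with_tools = task_seconds(20, 500, 50, tool_seconds=2.0)
fast_with_tools = task_seconds(20, 500, 200, tool_seconds=2.0)
print(slow_with_tools, fast_with_tools)
```

With zero tool latency, the 4x decode speedup shows up as a full 4x on the whole task. Once tool calls take real time, the end-to-end gain shrinks, so it is worth measuring your own tool latency before assuming the full speedup.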
I’d suggest routing high-volume, latency-sensitive workloads (autocomplete, real-time agents, streaming responses) to 3.5 Flash, keeping the hardest reasoning workloads on Opus, and putting a router in front so each query hits the right tier.
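A router like that can start out very simple. The sketch below is one way to do it, under assumptions: the tier labels are placeholders rather than verified API model ids, and the `difficulty` score and `HARD_THRESHOLD` cutoff are hypothetical values that I assume come from a cheap upstream classifier.

```python
from dataclasses import dataclass

# Placeholder tier labels for illustration, not verified API model ids.
FAST_TIER = "gemini-3.5-flash"
DEEP_TIER = "opus"
HARD_THRESHOLD = 0.8  # hypothetical cutoff on a 0-1 difficulty score


@dataclass
class Query:
    text: str
    latency_sensitive: bool
    difficulty: float  # assumed to come from a cheap upstream classifier


def route(query: Query) -> str:
    """Pick a tier: speed wins on real-time paths, depth wins on hard offline work."""
    if query.latency_sensitive:
        return FAST_TIER
    if query.difficulty >= HARD_THRESHOLD:
        return DEEP_TIER
    return FAST_TIER


print(route(Query("complete this line of code", True, 0.2)))
print(route(Query("prove this scheduler is deadlock-free", False, 0.95)))
```

In this sketch latency-sensitive traffic always goes to the fast tier, even when it is hard. That is a design choice you may want to flip if a wrong answer costs more than a slow one.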
Gemini Omni. Google’s first true any-to-any multimodal model across text, image, audio, video, and 3D, in both directions. Native 3D generation is the new modality, and Google got there first.
Most “multimodal” models you’ve used so far were really text-with-vision. GPT-4o handles voice but generates text. Gemini 1.5 ingests video but produces words about it. Omni breaks that asymmetry. You can prompt the model with a video and get a video as output. Or you can share some text plus an image and get a video out. The possibilities are endless.
Honestly, this is going to break the content industry.
Search, rewritten. For 25 years, Google Search was a text box that returned blue links. 4 billion people built their digital lives on top of that one interaction model. On Tuesday, Google replaced it.
The new search box accepts text, images, files, videos, and entire Chrome tabs as input. The headline feature is “Search agents”: persistent AIs that monitor the web 24/7 for topics you specify and synthesize updates without you asking. So if you want to know the instant any of your favorite pro athletes announces a sneaker collab, your agent will tell you when a new drop lands so you don’t miss out. Information agents will launch first for Google AI Pro & Ultra subscribers this summer.
Think about what this does to the rest of the web. Every site that depends on Google traffic now competes with Google’s synthesized answer. Every product team that builds search UX inside their own app now has Google demonstrating what users will expect.
Gemini Spark. Spark is what Search-as-a-Platform looks like in your pocket. It runs continuously, monitors your Gmail and calendar, tracks topics you tell it to care about, surfaces decisions when something needs your input, and (with permission and a default $100/month spending cap) can make purchases on your behalf.
A persistent process is holding your credit card! That’s a different product category from the chat-with-tools products everyone has been calling agents, and it just changes what agents are capable of.
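To make the “permission plus monthly cap” idea concrete, here is a minimal sketch of what such a guard could look like. This is my own illustration, not how Spark is implemented: the `SpendGuard` class and its `authorize` method are hypothetical names, and the only detail taken from the announcement is the $100/month default.

```python
from decimal import Decimal


class SpendGuard:
    """Hypothetical guard: a purchase needs user approval and must fit a monthly cap."""

    def __init__(self, monthly_cap=Decimal("100")):
        self.monthly_cap = monthly_cap
        self.spent = {}  # (year, month) -> total spent so far

    def authorize(self, amount, year, month, user_approved):
        # Refuse anything the user has not approved, and any non-positive amount.
        if not user_approved or amount <= 0:
            return False
        key = (year, month)
        already_spent = self.spent.get(key, Decimal("0"))
        if already_spent + amount > self.monthly_cap:
            return False  # would exceed the cap for this month
        self.spent[key] = already_spent + amount
        return True


guard = SpendGuard()
print(guard.authorize(Decimal("60"), 2026, 5, user_approved=True))
print(guard.authorize(Decimal("50"), 2026, 5, user_approved=True))
```

The point of the sketch is that the cap is enforced outside the model, in plain code that the agent cannot talk its way around.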
Android apps from natural language. Google’s AI Studio now generates native Android apps from a prompt. Describe an app, get an APK.
Mobile development is the largest long tail of developer work, and Google just dropped the cost of shipping a real Android app to roughly zero for the typical user. The same way no-code web tools created a decade of niche SaaS, this opens a long tail of niche mobile apps from non-developers.
I’d use it right now for prototyping internal tools, personal apps, and one-off utilities where production polish doesn’t matter.
Antigravity replaces Gemini CLI. The Gemini CLI sunsets June 18. The replacement, Antigravity CLI, is built for multi-agent workflows.
Antigravity is designed around orchestrating multiple agents (planner, executor, validator) instead of running a single chat session.
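If the planner/executor/validator split is new to you, the pattern looks roughly like the sketch below. This is a generic illustration in plain Python, not the Antigravity API: `run_pipeline` and the three toy agent functions are hypothetical stand-ins for what would be model calls in a real system.

```python
def run_pipeline(goal, planner, executor, validator, max_retries=2):
    """Plan the goal into steps, execute each step, and retry until it validates."""
    results = []
    for step in planner(goal):
        for _attempt in range(max_retries + 1):
            output = executor(step)
            if validator(step, output):
                results.append(output)
                break
        else:
            # No attempt passed validation: stop instead of pushing on blindly.
            raise RuntimeError(f"step failed validation: {step!r}")
    return results


# Toy stand-ins for the three agents (a real system would call models here).
def toy_planner(goal):
    return [part.strip() for part in goal.split(" then ")]


def toy_executor(step):
    return step.upper()


def toy_validator(step, output):
    return bool(output) and output.lower() == step.lower()


print(run_pipeline("write tests then run tests", toy_planner, toy_executor, toy_validator))
```

The part that matters is the validator sitting between steps. A single chat session has no equivalent checkpoint, which is why failures there tend to compound quietly.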
If you have anything in production using Gemini CLI, start migrating today. The deadline is less than a month away. Worth reading the Antigravity docs even if you don’t use the Gemini CLI, because the multi-agent patterns will leak into Cursor, Claude Code, and the rest of the agent CLI ecosystem within months.
What it adds up to. Every I/O announcement is a piece of the same bet: agents are the next interaction layer, and Google plans to own the entry point. Search is the platform, Spark is the consumer agent, Omni is the multimodal substrate, and the rest fills in around them.
🤖 Models That Dropped This Week
Gemini 3.5 Flash (Google, May 20) : Google’s new frontier model. 76.2% Terminal-Bench 2.1, 83.6% MCP Atlas, 4x faster output tokens/second than the competition. Google’s bet that agents need fast and cheap, not just smart.
Gemini Omni (Google, May 20) : Omni is Google’s first true any-to-any multimodal model: text, image, audio, video, and 3D, in both directions. That last clause is the part that matters.
DeepSeek V4-Pro permanent price cut (May 26) : The big one. $0.87/M output tokens, permanently. 34x cheaper than GPT-5.5 at roughly 95% of the quality on coding benchmarks.
If you’re running 50M output tokens/day, your bill just went from about $5,200/month to about $1,300/month.
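The monthly figure is easy to check. The sketch below assumes a 30-day month. The “implied old price” is simply backed out of the $5,200 figure above, so treat it as a derived number and not a published price.

```python
TOKENS_PER_DAY = 50_000_000
DAYS_PER_MONTH = 30      # assumption: a 30-day month
NEW_PRICE_PER_M = 0.87   # dollars per million output tokens, from the announcement


def monthly_cost(price_per_million, tokens_per_day=TOKENS_PER_DAY, days=DAYS_PER_MONTH):
    """Monthly spend in dollars for a given price per million output tokens."""
    millions_of_tokens = tokens_per_day * days / 1_000_000
    return millions_of_tokens * price_per_million


new_cost = monthly_cost(NEW_PRICE_PER_M)

# Back out the per-million price implied by the old $5,200/month figure.
implied_old_price = 5200 / (TOKENS_PER_DAY * DAYS_PER_MONTH / 1_000_000)

print(f"new monthly cost: ${new_cost:,.0f}")
print(f"implied old price: ${implied_old_price:.2f} per million tokens")
```

At 1,500M tokens a month, $0.87/M works out to about $1,305, which matches the rounded $1,300. The old figure implies roughly $3.47/M, so the cut is about 4x.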
Lance (ByteDance, May 20) : A 3B-parameter unified multimodal model that handles image understanding, video understanding, image generation, image editing, and video generation in a single framework. Open weights.
🧠 Papers That Matter
Agent Meltdowns: The Road to Hell Is Paved with Helpful Agents (Cornell) : This is the paper to read if you’re shipping agents. It shows that frontier models, including Claude and GPT-4, systematically fail by being too helpful.
A coding agent that can’t find a test file doesn’t stop. It modifies production code to remove the test requirement. A browsing agent that can’t load a page doesn’t report the error. It tries alternative sources with decreasing reliability. The more capable the model, the more creative the meltdown.
Interesting, right?
Memento: RAG for User History (Meta) : Meta applies RAG-style retrieval to user engagement history, treating the user’s full history as a document corpus and each ad request as a query. Maximal Marginal Relevance (MMR) retrieves diverse, relevant past interactions from that corpus, and the retrieved interactions feed the downstream ranking models. The result: a 1.2% CVR lift on Facebook Feed and Reels in production, with 5-10x better resource efficiency than linearly scaling to 365 days of history.
If your recommendation model uses a fixed-window user history (and it almost certainly does), this paper shows what you’re leaving on the table. You already know how to build retrieval systems. You already have the user history sitting in your feature store. Connect them.
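For reference, here is a minimal sketch of the MMR selection step in pure Python. It is the textbook greedy version, not Meta’s implementation. The embedding vectors, the `lam` trade-off weight, and the function names are all assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query, items, k, lam=0.7):
    """Greedy MMR: return indices of up to k items, trading relevance against redundancy."""
    relevance = [cosine(query, item) for item in items]
    selected = []
    candidates = list(range(len(items)))
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the similarity to the closest already-selected item.
            redundancy = max((cosine(items[i], items[j]) for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical 2-D embeddings: items 0 and 1 are near-duplicates, item 2 is different.
request = [1.0, 0.5]
history = [[1.0, 0.0], [0.98, 0.05], [0.0, 1.0]]
print(mmr_select(request, history, k=2, lam=0.5))
print(mmr_select(request, history, k=2, lam=1.0))
```

With `lam=0.5` the second pick skips the near-duplicate and takes the dissimilar item. With `lam=1.0` it collapses to plain top-k by relevance. That diversity term is what lets a small retrieved set stand in for a much longer raw history.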
⚡Quick Hits
Anthropic projects $10.9B Q2 revenue, first profit : Compute costs dropped from 71 cents to 56 cents per revenue dollar. October IPO target.
Mistral acquires Emmi AI : 30+ physics AI researchers join. Mistral’s bet: industrial simulation and digital twins are the next moat, not chat.
Vatican releases 42,300-word AI encyclical : Lethal autonomous weapons declared “not permissible.”
LinkedIn deprioritizes AI slop with 94% accuracy : Posts suppressed, not removed. Bot comments and engagement bait also targeted. Their own “rewrite with AI” button still exists.
Meta begins 8,000 layoffs while raising 2026 capex to $145B : Combined effective reduction of 14,000 positions.









