<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[MLWhiz: Recs|ML|GenAI]]></title><description><![CDATA[Making ML careers accessible and GenAI, Recsys, and MLOps understandable. 🔧 No-fluff guides and real-world insights to help you build, deploy, and advance in the machine learning and Generative AI ecosystem]]></description><link>https://www.mlwhiz.com</link><image><url>https://substackcdn.com/image/fetch/$s_!jdCB!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79ca48ed-d331-477b-aa19-029389751190_500x500.png</url><title>MLWhiz: Recs|ML|GenAI</title><link>https://www.mlwhiz.com</link></image><generator>Substack</generator><lastBuildDate>Sat, 13 Jun 2026 02:07:07 GMT</lastBuildDate><atom:link href="https://www.mlwhiz.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Rahul Agarwal]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[mlwhiz@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[mlwhiz@substack.com]]></itunes:email><itunes:name><![CDATA[Rahul Agarwal]]></itunes:name></itunes:owner><itunes:author><![CDATA[Rahul Agarwal]]></itunes:author><googleplay:owner><![CDATA[mlwhiz@substack.com]]></googleplay:owner><googleplay:email><![CDATA[mlwhiz@substack.com]]></googleplay:email><googleplay:author><![CDATA[Rahul Agarwal]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[MLWhiz Weekly Recsys/ML/GenAI Newsletter # 10 - The week AI infrastructure crossed from a technology story to a financial one ]]></title><description><![CDATA[Hey, Rahul 
here!]]></description><link>https://www.mlwhiz.com/p/mlwhiz-weekly-recsysmlgenai-newsletter-ed3</link><guid isPermaLink="false">https://www.mlwhiz.com/p/mlwhiz-weekly-recsysmlgenai-newsletter-ed3</guid><dc:creator><![CDATA[Rahul Agarwal]]></dc:creator><pubDate>Wed, 10 Jun 2026 23:56:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!gkhs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06405cf5-5aa1-4445-85da-8e6d616f71a6_2752x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yHq9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" width="1456" height="250" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:250,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77210,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p><em>Hey, Rahul here! 
&#128075; Each week, I publish long-form ML+AI posts covering ML, AI, and System design for MLwhiz. Paid subscribers also get how-to guides with full code walkthroughs. I publish occasional extra articles. If you&#8217;d like to become a paid subscriber, here&#8217;s a button for that:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.mlwhiz.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.mlwhiz.com/subscribe?"><span>Subscribe now</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B1mx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B1mx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 424w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 848w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1272w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png" width="995" height="80" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:80,&quot;width&quot;:995,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15990,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B1mx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 424w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 848w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1272w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><h2>&#127942; Story of the Week: The $35 Billion Deal That Turned AI Chips Into Toll Roads</h2><div 
class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gkhs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06405cf5-5aa1-4445-85da-8e6d616f71a6_2752x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gkhs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06405cf5-5aa1-4445-85da-8e6d616f71a6_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!gkhs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06405cf5-5aa1-4445-85da-8e6d616f71a6_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!gkhs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06405cf5-5aa1-4445-85da-8e6d616f71a6_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!gkhs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06405cf5-5aa1-4445-85da-8e6d616f71a6_2752x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gkhs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06405cf5-5aa1-4445-85da-8e6d616f71a6_2752x1536.png" width="1456" height="813" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06405cf5-5aa1-4445-85da-8e6d616f71a6_2752x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7317164,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.mlwhiz.com/i/201526627?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06405cf5-5aa1-4445-85da-8e6d616f71a6_2752x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gkhs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06405cf5-5aa1-4445-85da-8e6d616f71a6_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!gkhs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06405cf5-5aa1-4445-85da-8e6d616f71a6_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!gkhs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06405cf5-5aa1-4445-85da-8e6d616f71a6_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!gkhs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06405cf5-5aa1-4445-85da-8e6d616f71a6_2752x1536.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>Apollo, Blackstone, and a consortium of global banks <a href="https://www.apollo.com/insights-news/pressreleases/2026/06/apollo-leads-35-billion-capital-solution-for-broadcom-ai-xpv-platform-in-partnership-with-blackstone-and-leading-global-banks-3308896">closed the largest private financing in history</a> to back Broadcom&#8217;s new AI XPV Platform. The target: 20+ gigawatts of AI compute capacity through 2028, with Anthropic and OpenAI as anchor customers.</p><p>Here&#8217;s why this matters more than anything released this week.</p><p>Until now, the AI compute buildout was constrained by four balance sheets: Google, Microsoft, Meta, and Amazon. The pace of AI infrastructure was gated by how fast those companies could allocate capital. 
Apollo just took their moat away.</p><p>Pension funds, insurance companies, and sovereign wealth can now finance AI compute the way they finance power plants and highways.</p><p>The Broadcom angle is the second story inside this deal. Broadcom CEO Hock Tan is running the VMware playbook: control the infrastructure layer, make it financeable, let private capital scale the deployment.</p><p>Money is no longer the binding constraint.</p><p>I think this is the week AI infrastructure crossed from a technology story to a financial one. Your cost of compute will probably drop as private capital floods in. Your cost of power and compliance will rise. </p><div><hr></div><h2>&#129302; Models That Dropped This Week</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PfNd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F859ea03e-6b35-4a26-a25b-1d682de8f9be_3856x2256.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PfNd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F859ea03e-6b35-4a26-a25b-1d682de8f9be_3856x2256.png 424w, https://substackcdn.com/image/fetch/$s_!PfNd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F859ea03e-6b35-4a26-a25b-1d682de8f9be_3856x2256.png 848w, https://substackcdn.com/image/fetch/$s_!PfNd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F859ea03e-6b35-4a26-a25b-1d682de8f9be_3856x2256.png 1272w, 
https://substackcdn.com/image/fetch/$s_!PfNd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F859ea03e-6b35-4a26-a25b-1d682de8f9be_3856x2256.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PfNd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F859ea03e-6b35-4a26-a25b-1d682de8f9be_3856x2256.png" width="1456" height="852" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/859ea03e-6b35-4a26-a25b-1d682de8f9be_3856x2256.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:852,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;overview&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="overview" title="overview" srcset="https://substackcdn.com/image/fetch/$s_!PfNd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F859ea03e-6b35-4a26-a25b-1d682de8f9be_3856x2256.png 424w, https://substackcdn.com/image/fetch/$s_!PfNd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F859ea03e-6b35-4a26-a25b-1d682de8f9be_3856x2256.png 848w, https://substackcdn.com/image/fetch/$s_!PfNd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F859ea03e-6b35-4a26-a25b-1d682de8f9be_3856x2256.png 1272w, 
https://substackcdn.com/image/fetch/$s_!PfNd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F859ea03e-6b35-4a26-a25b-1d682de8f9be_3856x2256.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Gemma 4 12B (Google)</strong> &#8212; The first encoder-free open multimodal model that runs on a laptop. No separate vision encoder, no CLIP adapter. Raw image patches flow directly into the transformer alongside text tokens. At 12B parameters with quantization, it fits on consumer GPUs. 
<a href="https://blog.google/innovation-and-ai/technology/developers-tools/introducing-gemma-4-12b/">Download and test it</a>. I was able to get it working with Ollama <em><strong>without a GPU</strong></em>, but I could not get it working with Claude Code. Let me know if you manage to run Claude Code with it.</p><p><strong>MiniMax M3 (MiniMax)</strong> &#8212; A Shanghai lab <a href="https://the-decoder.com/minimax-m3-open-weight-model-with-a-million-token-context-challenges-proprietary-leaders/">claiming 59.0% on SWE-Bench Pro</a> (edging past GPT-5.5&#8217;s 58.6%), a 1M-token context window, and pricing at $0.60/$2.40 per 1M input/output tokens. That&#8217;s 15x cheaper than Claude Opus on input. The MiniMax Sparse Attention architecture delivers a 9x prefill speedup at 1M tokens. Benchmarks are vendor-reported and need verification.</p><p><strong>MAI-Code-1-Flash (Microsoft)</strong> &#8212; Microsoft&#8217;s first in-house coding model, <a href="https://microsoft.ai/news/introducingmai-code-1-flash/">built without OpenAI data</a>. 137B total / 5B active params via sparse MoE, 256K context. Claims +16 points over Claude Haiku 4.5 on SWE-Bench Pro. Priced at $0.75/M input tokens. Already rolling out in GitHub Copilot. The clearest signal yet of Microsoft-OpenAI decoupling.</p><div><hr></div><h2>&#129504; Papers That Matter</h2><p><strong>Mult-DPO: Multinomial Direct Preference Optimization for Recommender Systems (Netflix)</strong> &#8212; I keep telling teams that LLM alignment techniques will reshape RecSys. Netflix just proved it. DPO works on pairwise preferences (&#8220;A is better than B&#8221;), but recommendation data is set-wise: multiple positive items, multiple negatives, no meaningful ordering among the positives. Forcing set-wise data into pairwise comparisons loses information on every training step.</p><p>Mult-DPO extends DPO to a multinomial formulation that handles sets directly. 
The model learns &#8220;all items in set S+ should outrank all items in S&#8722;&#8221; without imposing artificial order within the positives. If you&#8217;re training recommendation models with any flavor of pairwise loss, you&#8217;re leaving performance on the table. <a href="https://arxiv.org/abs/2606.10078">Benchmark against this</a>.</p><div><hr></div><h2>&#128221; Some Good Reads</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!w5rS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173a5fe-577d-4429-95c6-60b0001ebf19_2200x1276.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!w5rS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173a5fe-577d-4429-95c6-60b0001ebf19_2200x1276.webp 424w, https://substackcdn.com/image/fetch/$s_!w5rS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173a5fe-577d-4429-95c6-60b0001ebf19_2200x1276.webp 848w, https://substackcdn.com/image/fetch/$s_!w5rS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173a5fe-577d-4429-95c6-60b0001ebf19_2200x1276.webp 1272w, https://substackcdn.com/image/fetch/$s_!w5rS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173a5fe-577d-4429-95c6-60b0001ebf19_2200x1276.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!w5rS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173a5fe-577d-4429-95c6-60b0001ebf19_2200x1276.webp" 
width="1456" height="844" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2173a5fe-577d-4429-95c6-60b0001ebf19_2200x1276.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:844,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Bar graph showing code contributed per person, per quarter, starting in Q2 2021 and ending in Q2 2026. The graph notes the release dates of eight different models: Claude 1, Claude 2, Claude 3, Claude 4, Claude Code, Claude Sonnet 4.5, Claude Opus 4.5, Claude Mythos Preview (internal access), and Claude Mythos Preview.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Bar graph showing code contributed per person, per quarter, starting in Q2 2021 and ending in Q2 2026. The graph notes the release dates of eight different models: Claude 1, Claude 2, Claude 3, Claude 4, Claude Code, Claude Sonnet 4.5, Claude Opus 4.5, Claude Mythos Preview (internal access), and Claude Mythos Preview." title="Bar graph showing code contributed per person, per quarter, starting in Q2 2021 and ending in Q2 2026. The graph notes the release dates of eight different models: Claude 1, Claude 2, Claude 3, Claude 4, Claude Code, Claude Sonnet 4.5, Claude Opus 4.5, Claude Mythos Preview (internal access), and Claude Mythos Preview." 
srcset="https://substackcdn.com/image/fetch/$s_!w5rS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173a5fe-577d-4429-95c6-60b0001ebf19_2200x1276.webp 424w, https://substackcdn.com/image/fetch/$s_!w5rS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173a5fe-577d-4429-95c6-60b0001ebf19_2200x1276.webp 848w, https://substackcdn.com/image/fetch/$s_!w5rS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173a5fe-577d-4429-95c6-60b0001ebf19_2200x1276.webp 1272w, https://substackcdn.com/image/fetch/$s_!w5rS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2173a5fe-577d-4429-95c6-60b0001ebf19_2200x1276.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>&#8220;When AI Builds Itself&#8221; (Anthropic Institute)</strong> &#8212; Anthropic published <a href="https://www.anthropic.com/institute/recursive-self-improvement">detailed research on recursive self-improvement</a> mid-IPO, showing they&#8217;re already delegating a growing share of AI development to AI systems. </p><p><strong>&#8220;Coding Is No Longer the Constraint&#8221; (Spotify Engineering)</strong> &#8212; Niklas Gustavsson at Spotify <a href="https://engineering.atspotify.com/2026/6/code-with-claude-coding-is-no-longer-the-constraint">published the numbers</a>: 99% of engineers use AI tools weekly, 650+ agent-generated PRs merged per month, 90% migration time reduction. The thesis: years of platform investment in CI/CD, testing, and docs now compound with AI agents. Companies that underinvest in dev platforms won&#8217;t benefit from AI coding.</p><div><hr></div><h2>&#9889; Quick Hits</h2><ul><li><p><strong><a href="https://techcrunch.com/2026/06/05/google-will-pay-spacex-920m-per-month-for-compute/">Google is paying SpaceX $920M/month to rent GPUs</a></strong> &#8212; the company that builds TPUs can&#8217;t build capacity fast enough. $11B/year to a competitor. </p></li><li><p><strong><a href="https://www.anthropic.com/news/expanding-project-glasswing">Anthropic&#8217;s Glasswing expanded to 200 organizations</a></strong> across 15+ countries. Mythos found 10,000+ high/critical vulnerabilities since April. 
</p></li><li><p><strong><a href="https://www.cnbc.com/2026/06/05/trump-open-ai-altman-stake.html">SpaceX/xAI prices its IPO Thursday at a $1.75T target valuation</a></strong> &#8212; Totally unethical. Just to put SpaceX in the Nasdaq 100. I think markets are in &#8220;greed mode.&#8221;</p></li><li><p><strong><a href="https://github.com/RyanCodrai/turbovec">TurboVec hit #1 on GitHub trending</a></strong> &#8212; a Rust vector index fitting 10M documents in 4GB (vs. 31GB float32), beating FAISS by 12-20% on ARM, with no training step and filter-at-search-time. Need to look into this myself.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[The Transformer, Demystified — Let's Actually Build One]]></title><description><![CDATA[GenAI Series Part 2: Implementing a Transformer]]></description><link>https://www.mlwhiz.com/p/the-transformer-demystified-lets</link><guid isPermaLink="false">https://www.mlwhiz.com/p/the-transformer-demystified-lets</guid><dc:creator><![CDATA[Rahul Agarwal]]></dc:creator><pubDate>Fri, 05 Jun 2026 22:47:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!bq_1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F317e653c-074e-4297-9a47-437631370409_2000x1125.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Hey, Rahul here! &#128075; Each week, I publish long-form ML+AI posts covering ML, AI, and System design for MLwhiz. Paid subscribers also get how-to guides with full code walkthroughs. I publish occasional extra articles. 
If you&#8217;d like to become a paid subscriber, here&#8217;s a button for that:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.mlwhiz.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.mlwhiz.com/subscribe?"><span>Subscribe now</span></a></p><blockquote><p><em>Over the coming weeks, I&#8217;ll be writing more about GenAI, including topics like pre-training and post-training. This post is the second one of the foundational pieces meant to set up that series.</em></p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B1mx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B1mx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 424w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 848w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1272w, 
https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png" width="995" height="80" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:80,&quot;width&quot;:995,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15990,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B1mx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 424w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 848w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1272w, 
https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bq_1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F317e653c-074e-4297-9a47-437631370409_2000x1125.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bq_1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F317e653c-074e-4297-9a47-437631370409_2000x1125.png 424w, https://substackcdn.com/image/fetch/$s_!bq_1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F317e653c-074e-4297-9a47-437631370409_2000x1125.png 848w, https://substackcdn.com/image/fetch/$s_!bq_1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F317e653c-074e-4297-9a47-437631370409_2000x1125.png 1272w, https://substackcdn.com/image/fetch/$s_!bq_1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F317e653c-074e-4297-9a47-437631370409_2000x1125.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bq_1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F317e653c-074e-4297-9a47-437631370409_2000x1125.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/317e653c-074e-4297-9a47-437631370409_2000x1125.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Understanding Transformers, the Programming Way&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Understanding Transformers, the Programming Way" title="Understanding Transformers, the Programming Way" srcset="https://substackcdn.com/image/fetch/$s_!bq_1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F317e653c-074e-4297-9a47-437631370409_2000x1125.png 424w, https://substackcdn.com/image/fetch/$s_!bq_1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F317e653c-074e-4297-9a47-437631370409_2000x1125.png 848w, https://substackcdn.com/image/fetch/$s_!bq_1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F317e653c-074e-4297-9a47-437631370409_2000x1125.png 1272w, https://substackcdn.com/image/fetch/$s_!bq_1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F317e653c-074e-4297-9a47-437631370409_2000x1125.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" 
stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Transformers run most of modern NLP, but they&#8217;re still surprisingly hard to internalize from a diagram alone. In my <a href="https://www.mlwhiz.com/p/transformers">last post</a>, I walked through how they work &#8212; the encoder, decoder, and the data flow between them. </p><p>This post is where we stop reading and start building: an end-to-end English-to-German translator in PyTorch, written from scratch with a Transformer at its core. Because the fastest way to actually understand something is to implement it.</p><div><hr></div><h2><strong>Task Description</strong></h2><p>We want to create a translator that uses transformers to convert English to German. 
So, if we look at it as a black box, our network takes as input an English sentence and returns a German sentence.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!f8Vo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0836bab3-77d4-4cd4-bf82-ac33c56a6a75_1683x273.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!f8Vo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0836bab3-77d4-4cd4-bf82-ac33c56a6a75_1683x273.png 424w, https://substackcdn.com/image/fetch/$s_!f8Vo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0836bab3-77d4-4cd4-bf82-ac33c56a6a75_1683x273.png 848w, https://substackcdn.com/image/fetch/$s_!f8Vo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0836bab3-77d4-4cd4-bf82-ac33c56a6a75_1683x273.png 1272w, https://substackcdn.com/image/fetch/$s_!f8Vo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0836bab3-77d4-4cd4-bf82-ac33c56a6a75_1683x273.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!f8Vo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0836bab3-77d4-4cd4-bf82-ac33c56a6a75_1683x273.png" width="1456" height="236" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0836bab3-77d4-4cd4-bf82-ac33c56a6a75_1683x273.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:236,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Transformer for Translation&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Transformer for Translation" title="Transformer for Translation" srcset="https://substackcdn.com/image/fetch/$s_!f8Vo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0836bab3-77d4-4cd4-bf82-ac33c56a6a75_1683x273.png 424w, https://substackcdn.com/image/fetch/$s_!f8Vo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0836bab3-77d4-4cd4-bf82-ac33c56a6a75_1683x273.png 848w, https://substackcdn.com/image/fetch/$s_!f8Vo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0836bab3-77d4-4cd4-bf82-ac33c56a6a75_1683x273.png 1272w, https://substackcdn.com/image/fetch/$s_!f8Vo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0836bab3-77d4-4cd4-bf82-ac33c56a6a75_1683x273.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Transformer for Translation</figcaption></figure></div><div><hr></div><h2><strong>Data Preprocessing</strong></h2><p>To train our English-German translation Model, we will need translated sentence pairs between English and German.</p><p>Fortunately, there is a 
pretty standard way to get these with the OPUS-100 dataset (English-German subset), a curated multilingual translation corpus we can access via HuggingFace datasets. </p><p>Also, before we really get into the whole coding part, let us understand what we need as input and output to the model while training. We will actually need two matrices to be input to our Network:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!K5hx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1239a84c-833b-4900-98e5-b5a3602bdba5_2664x860.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!K5hx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1239a84c-833b-4900-98e5-b5a3602bdba5_2664x860.png 424w, https://substackcdn.com/image/fetch/$s_!K5hx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1239a84c-833b-4900-98e5-b5a3602bdba5_2664x860.png 848w, https://substackcdn.com/image/fetch/$s_!K5hx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1239a84c-833b-4900-98e5-b5a3602bdba5_2664x860.png 1272w, https://substackcdn.com/image/fetch/$s_!K5hx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1239a84c-833b-4900-98e5-b5a3602bdba5_2664x860.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!K5hx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1239a84c-833b-4900-98e5-b5a3602bdba5_2664x860.png" width="1456" height="470" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1239a84c-833b-4900-98e5-b5a3602bdba5_2664x860.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:470,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:162548,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.mlwhiz.com/i/199921894?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1239a84c-833b-4900-98e5-b5a3602bdba5_2664x860.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!K5hx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1239a84c-833b-4900-98e5-b5a3602bdba5_2664x860.png 424w, https://substackcdn.com/image/fetch/$s_!K5hx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1239a84c-833b-4900-98e5-b5a3602bdba5_2664x860.png 848w, https://substackcdn.com/image/fetch/$s_!K5hx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1239a84c-833b-4900-98e5-b5a3602bdba5_2664x860.png 1272w, https://substackcdn.com/image/fetch/$s_!K5hx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1239a84c-833b-4900-98e5-b5a3602bdba5_2664x860.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div>
      <p>
          <a href="https://www.mlwhiz.com/p/the-transformer-demystified-lets">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[MLWhiz Weekly Recsys/ML/GenAI Newsletter # 9 - The week AI started its IPOs]]></title><description><![CDATA[The AI industry is about to stop being a private market story. Quarterly earnings calls ask harder questions than venture capitalists.]]></description><link>https://www.mlwhiz.com/p/mlwhiz-weekly-recsysmlgenai-newsletter-204</link><guid isPermaLink="false">https://www.mlwhiz.com/p/mlwhiz-weekly-recsysmlgenai-newsletter-204</guid><dc:creator><![CDATA[Rahul Agarwal]]></dc:creator><pubDate>Tue, 02 Jun 2026 23:30:44 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Tk2D!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13deaa0f-4905-4b5a-95f9-642caec9f0ff_2752x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yHq9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" width="1456" height="250" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:250,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77210,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p><em>Hey, Rahul here! &#128075; Each week, I publish long-form ML+AI posts covering ML, AI, and System design for MLwhiz. Paid subscribers also get how-to guides with full code walkthroughs. I publish occasional extra articles. If you&#8217;d like to become a paid subscriber, here&#8217;s a button for that:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.mlwhiz.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.mlwhiz.com/subscribe?"><span>Subscribe now</span></a></p>
      <p>
          <a href="https://www.mlwhiz.com/p/mlwhiz-weekly-recsysmlgenai-newsletter-204">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Understanding Transformers, the MLE Way]]></title><description><![CDATA[GenAI Series Part 1: What even are transformers?]]></description><link>https://www.mlwhiz.com/p/transformers</link><guid isPermaLink="false">https://www.mlwhiz.com/p/transformers</guid><dc:creator><![CDATA[Rahul Agarwal]]></dc:creator><pubDate>Fri, 29 May 2026 23:02:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!9xqm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46e91704-a007-4a59-8ea0-ec51c088a8f1_3840x2400.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yHq9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" width="1456" height="250" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:250,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77210,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p><em>Hey, Rahul here! &#128075; Each week, I publish long-form ML+AI posts covering ML, AI, and System design for MLwhiz. Paid subscribers also get how-to guides with full code walkthroughs. I publish occasional extra articles. If you&#8217;d like to become a paid subscriber, here&#8217;s a button for that:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.mlwhiz.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.mlwhiz.com/subscribe?"><span>Subscribe now</span></a></p><blockquote><p><em>Over the coming weeks, I&#8217;ll be writing more about GenAI, including topics like pre-training and post-training. 
This post is one of the foundational pieces meant to set up that series.</em></p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B1mx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B1mx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 424w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 848w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1272w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png" width="995" height="80" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:80,&quot;width&quot;:995,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15990,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B1mx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 424w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 848w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1272w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9xqm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46e91704-a007-4a59-8ea0-ec51c088a8f1_3840x2400.jpeg" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9xqm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46e91704-a007-4a59-8ea0-ec51c088a8f1_3840x2400.jpeg 424w, https://substackcdn.com/image/fetch/$s_!9xqm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46e91704-a007-4a59-8ea0-ec51c088a8f1_3840x2400.jpeg 848w, https://substackcdn.com/image/fetch/$s_!9xqm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46e91704-a007-4a59-8ea0-ec51c088a8f1_3840x2400.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!9xqm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46e91704-a007-4a59-8ea0-ec51c088a8f1_3840x2400.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9xqm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46e91704-a007-4a59-8ea0-ec51c088a8f1_3840x2400.jpeg" width="1456" height="910" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/46e91704-a007-4a59-8ea0-ec51c088a8f1_3840x2400.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Understanding Transformers, the Data Science Way&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Understanding Transformers, the Data Science Way" 
title="Understanding Transformers, the Data Science Way" srcset="https://substackcdn.com/image/fetch/$s_!9xqm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46e91704-a007-4a59-8ea0-ec51c088a8f1_3840x2400.jpeg 424w, https://substackcdn.com/image/fetch/$s_!9xqm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46e91704-a007-4a59-8ea0-ec51c088a8f1_3840x2400.jpeg 848w, https://substackcdn.com/image/fetch/$s_!9xqm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46e91704-a007-4a59-8ea0-ec51c088a8f1_3840x2400.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!9xqm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46e91704-a007-4a59-8ea0-ec51c088a8f1_3840x2400.jpeg 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" 
fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Transformers have become the de facto standard for almost everything. Though the architecture was introduced for NLP, it now powers computer vision, recommender systems, and, most importantly, the entire wave of modern LLMs.</p><p>Yet for all their ubiquity, transformers remain as hard to understand as ever.</p><p>It took me multiple readings of the Google research <strong><a href="https://arxiv.org/pdf/1706.03762.pdf">paper</a></strong> that first introduced transformers, along with a great many blog posts, to really understand how a transformer works.</p><p>So I decided to put the whole idea down in words as simple as possible, with some very basic math and a few puns, as I am a proponent of having some fun while learning. I will try to keep both the jargon and the technicality to a minimum, though with a topic like this I can only do so much. My goal is for the reader to understand even the goriest details of the Transformer by the end of this post.</p><p><em><strong>Also, this is officially my longest post, both in the time it took to write and in its length. Hence, I advise you to grab a coffee.</strong></em> &#9749;&#65039;</p><p>Before we dive in, here&#8217;s the path we&#8217;ll walk together: we&#8217;ll start with the big picture of what a transformer even does, then crack open the <strong>encoder</strong> stack (attention, feed-forward, positional encodings, and those mysterious &#8220;Add &amp; Norm&#8221; boxes). 
From there, we&#8217;ll move to the <strong>decoder</strong> stack and the masking trick that makes it tick, bolt on an <strong>output head</strong> to actually get our German words, and finish with how the whole thing is <strong>trained</strong> and how it makes <strong>predictions</strong> at test time. Long road, but I promise the view is worth it. Onwards.</p><div><hr></div><p><em><strong>Q: So, why should I even understand the Transformer?</strong></em></p><p>In the past, LSTM and GRU architectures (as explained in my earlier <strong><a href="https://www.mlwhiz.com/p/deeplearning_architectures_text_classification">post</a></strong> on NLP), along with the attention mechanism, were the state-of-the-art approach for language modeling problems (put very simply, predicting the next word) and translation systems. The main problem with these architectures is that they are recurrent in nature: they take a sentence and process its words one at a time, in a <em><strong>sequential</strong></em> way, so the runtime grows as the sentence gets longer.</p><p>The Transformer, a model architecture first introduced in the paper &#8220;Attention Is All You Need&#8221;, lets go of this recurrence and instead relies entirely on an attention mechanism to draw global dependencies between input and output. 
And that makes it FAST.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!l5i7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57b44dee-2f73-4c7e-8569-6f95c48d436a_380x560.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!l5i7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57b44dee-2f73-4c7e-8569-6f95c48d436a_380x560.png 424w, https://substackcdn.com/image/fetch/$s_!l5i7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57b44dee-2f73-4c7e-8569-6f95c48d436a_380x560.png 848w, https://substackcdn.com/image/fetch/$s_!l5i7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57b44dee-2f73-4c7e-8569-6f95c48d436a_380x560.png 1272w, https://substackcdn.com/image/fetch/$s_!l5i7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57b44dee-2f73-4c7e-8569-6f95c48d436a_380x560.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!l5i7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57b44dee-2f73-4c7e-8569-6f95c48d436a_380x560.png" width="380" height="560" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/57b44dee-2f73-4c7e-8569-6f95c48d436a_380x560.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:560,&quot;width&quot;:380,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;\n\n<a 
href=\&quot;https://arxiv.org/pdf/1706.03762.pdf\&quot; target=\&quot;_blank\&quot; rel=\&quot;nofollow noopener\&quot;>Source</a>\n&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="

Source: https://arxiv.org/pdf/1706.03762.pdf
" title="

Source: https://arxiv.org/pdf/1706.03762.pdf
" srcset="https://substackcdn.com/image/fetch/$s_!l5i7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57b44dee-2f73-4c7e-8569-6f95c48d436a_380x560.png 424w, https://substackcdn.com/image/fetch/$s_!l5i7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57b44dee-2f73-4c7e-8569-6f95c48d436a_380x560.png 848w, https://substackcdn.com/image/fetch/$s_!l5i7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57b44dee-2f73-4c7e-8569-6f95c48d436a_380x560.png 1272w, https://substackcdn.com/image/fetch/$s_!l5i7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F57b44dee-2f73-4c7e-8569-6f95c48d436a_380x560.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">From the Paper</figcaption></figure></div><p>This is the picture of the full transformer, as taken from the paper, and it surely is intimidating. I will aim to demystify it in this post by going through each piece, so read ahead.</p><div><hr></div><h2><strong>The Big Picture</strong></h2><p><em><strong>Q: That sounds interesting. So, what does a transformer do exactly?</strong></em></p><p>Essentially, a transformer can perform almost any NLP task. It can be used for language modeling, translation, or classification as required, and it does so fast by removing the sequential processing that recurrent models need. In a machine translation application, the transformer converts one language to another; for a classification problem, it provides the class probabilities using an appropriate output layer.</p><p>It all depends on the final output layer of the network; the basic structure of the transformer remains much the same for any task. For this post, I will continue with the machine translation example.</p><p>So, from a very high level, this is how the transformer looks for a translation task. 
It takes as input an English sentence and returns a German sentence.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DoSv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e97e6e-5c72-499d-854d-1b60fee54897_1683x273.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DoSv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e97e6e-5c72-499d-854d-1b60fee54897_1683x273.png 424w, https://substackcdn.com/image/fetch/$s_!DoSv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e97e6e-5c72-499d-854d-1b60fee54897_1683x273.png 848w, https://substackcdn.com/image/fetch/$s_!DoSv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e97e6e-5c72-499d-854d-1b60fee54897_1683x273.png 1272w, https://substackcdn.com/image/fetch/$s_!DoSv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e97e6e-5c72-499d-854d-1b60fee54897_1683x273.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DoSv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e97e6e-5c72-499d-854d-1b60fee54897_1683x273.png" width="1456" height="236" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/36e97e6e-5c72-499d-854d-1b60fee54897_1683x273.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:236,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Transformer for Translation&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Transformer for Translation" title="Transformer for Translation" srcset="https://substackcdn.com/image/fetch/$s_!DoSv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e97e6e-5c72-499d-854d-1b60fee54897_1683x273.png 424w, https://substackcdn.com/image/fetch/$s_!DoSv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e97e6e-5c72-499d-854d-1b60fee54897_1683x273.png 848w, https://substackcdn.com/image/fetch/$s_!DoSv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e97e6e-5c72-499d-854d-1b60fee54897_1683x273.png 1272w, https://substackcdn.com/image/fetch/$s_!DoSv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36e97e6e-5c72-499d-854d-1b60fee54897_1683x273.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Transformer for Translation</figcaption></figure></div><div><hr></div><h2><strong>The Building Blocks</strong></h2><p><em><strong>Q: That was too basic. &#128526; Can you expand on it?</strong></em></p><p>Okay, just remember in the end, you asked for it. 
Let&#8217;s go a little deeper and try to understand what a transformer is composed of.</p><p>A transformer is essentially composed of a stack of encoder layers and a stack of decoder layers. The role of the encoder layers is to encode the English sentence into a numerical form using the attention mechanism, while the decoder layers use that encoded information to produce the German translation of the English sentence.</p><p>In the figure below, the transformer is given an English sentence as input, which gets encoded by six encoder layers. The output of the final encoder layer then goes to each decoder layer to translate English to German.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iywS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F545d075a-b6b9-4ff9-907a-2f333b68d113_1530x1473.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iywS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F545d075a-b6b9-4ff9-907a-2f333b68d113_1530x1473.png 424w, https://substackcdn.com/image/fetch/$s_!iywS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F545d075a-b6b9-4ff9-907a-2f333b68d113_1530x1473.png 848w, https://substackcdn.com/image/fetch/$s_!iywS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F545d075a-b6b9-4ff9-907a-2f333b68d113_1530x1473.png 1272w, 
https://substackcdn.com/image/fetch/$s_!iywS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F545d075a-b6b9-4ff9-907a-2f333b68d113_1530x1473.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iywS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F545d075a-b6b9-4ff9-907a-2f333b68d113_1530x1473.png" width="1456" height="1402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/545d075a-b6b9-4ff9-907a-2f333b68d113_1530x1473.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1402,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Data Flow in a Transformer&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Data Flow in a Transformer" title="Data Flow in a Transformer" srcset="https://substackcdn.com/image/fetch/$s_!iywS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F545d075a-b6b9-4ff9-907a-2f333b68d113_1530x1473.png 424w, https://substackcdn.com/image/fetch/$s_!iywS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F545d075a-b6b9-4ff9-907a-2f333b68d113_1530x1473.png 848w, https://substackcdn.com/image/fetch/$s_!iywS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F545d075a-b6b9-4ff9-907a-2f333b68d113_1530x1473.png 1272w, 
https://substackcdn.com/image/fetch/$s_!iywS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F545d075a-b6b9-4ff9-907a-2f333b68d113_1530x1473.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Data Flow in a Transformer</figcaption></figure></div><div><hr></div><h2><strong>1. Encoder Architecture</strong></h2><p><em><strong>Q: That&#8217;s alright, but how does an encoder stack encode an English sentence exactly?</strong></em></p><p>Patience, I am getting to it. 
So, as I said, the encoder stack contains six encoder layers stacked on top of each other (as in the paper; later versions of transformers use even more layers). Each encoder in the stack has essentially two main layers:</p><ul><li><p><strong>a multi-head self-attention layer, and</strong></p></li><li><p><strong>a position-wise fully connected feed-forward network</strong></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TMPS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0d60a9-5325-475d-8427-79ba33253a5a_400x255.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TMPS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0d60a9-5325-475d-8427-79ba33253a5a_400x255.png 424w, https://substackcdn.com/image/fetch/$s_!TMPS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0d60a9-5325-475d-8427-79ba33253a5a_400x255.png 848w, https://substackcdn.com/image/fetch/$s_!TMPS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0d60a9-5325-475d-8427-79ba33253a5a_400x255.png 1272w, https://substackcdn.com/image/fetch/$s_!TMPS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0d60a9-5325-475d-8427-79ba33253a5a_400x255.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TMPS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0d60a9-5325-475d-8427-79ba33253a5a_400x255.png" width="400" 
height="255" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bd0d60a9-5325-475d-8427-79ba33253a5a_400x255.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:255,&quot;width&quot;:400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Very basic encoder Layer&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Very basic encoder Layer" title="Very basic encoder Layer" srcset="https://substackcdn.com/image/fetch/$s_!TMPS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0d60a9-5325-475d-8427-79ba33253a5a_400x255.png 424w, https://substackcdn.com/image/fetch/$s_!TMPS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0d60a9-5325-475d-8427-79ba33253a5a_400x255.png 848w, https://substackcdn.com/image/fetch/$s_!TMPS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0d60a9-5325-475d-8427-79ba33253a5a_400x255.png 1272w, https://substackcdn.com/image/fetch/$s_!TMPS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0d60a9-5325-475d-8427-79ba33253a5a_400x255.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Very basic encoder Layer</figcaption></figure></div><p>They are a mouthful, right? Don&#8217;t give up on me yet, as I will explain both of them in the coming sections. For now, just remember that the encoder layer incorporates attention and a position-wise feed-forward network.</p><p><em><strong>Q: But what shape does this layer expect its inputs to have?</strong></em></p><p>This layer expects its inputs to be of the shape <code>SxD</code> (as shown in the figure below), where <code>S</code> is the length of the source (English) sentence and <code>D</code> is the dimension of the embedding, whose weights are trained along with the network. In this post, we will use <code>D = 512</code> by default throughout. <code>S</code>, meanwhile, is the maximum length of a sentence in a batch, so it normally changes from batch to batch.</p>
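<p>To make the <code>SxD</code> shape concrete, here is a minimal sketch in plain Python. It is not the code from the paper or from any library. It pads a toy batch of two sentences to the batch maximum <code>S</code> and looks up a <code>D</code>-dimensional vector for every token. The whitespace tokenizer, the <code>[PAD]</code> token, the helper names <code>build_vocab</code> and <code>embed_batch</code>, and the random (untrained) embedding table are all assumptions made purely for illustration.</p>

```python
import random

D = 512  # embedding dimension used throughout the post


def build_vocab(sentences):
    """Map each distinct token to an integer id; id 0 is reserved for padding."""
    vocab = {"[PAD]": 0}
    for sentence in sentences:
        for token in sentence.split():  # assumption: naive whitespace tokenizer
            vocab.setdefault(token, len(vocab))
    return vocab


def embed_batch(sentences, vocab, table):
    """Pad every sentence to the batch maximum S, then look up one D-vector per token."""
    token_ids = [[vocab[token] for token in s.split()] for s in sentences]
    S = max(len(ids) for ids in token_ids)  # S depends on the batch
    padded = [ids + [0] * (S - len(ids)) for ids in token_ids]
    return [[table[i] for i in ids] for ids in padded]


batch = ["the quick brown fox", "hello world"]
vocab = build_vocab(batch)

# Hypothetical embedding table: randomly initialised here, whereas in a real
# model these weights are trained together with the rest of the network.
rng = random.Random(0)
table = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in vocab]

embedded = embed_batch(batch, vocab, table)
# batch size, S, D
print(len(embedded), len(embedded[0]), len(embedded[0][0]))  # prints: 2 4 512
```

<p>Each sentence in the batch comes out as an <code>SxD</code> grid (here 4 by 512), and the shorter sentence is filled out with the padding vector. A real implementation would use a trainable embedding layer from a deep learning framework rather than Python lists.</p>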
      <p>
          <a href="https://www.mlwhiz.com/p/transformers">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[HSTU From Scratch in PyTorch - A complete Walkthrough]]></title><description><![CDATA[RecSys for MLEs Part 9d: data pipeline, three sub-layers, retrieval + rating loss, and benchmarking against rectools' HSTU on MovieLens-1M]]></description><link>https://www.mlwhiz.com/p/hstu-from-scratch-in-pytorch-a-complete</link><guid isPermaLink="false">https://www.mlwhiz.com/p/hstu-from-scratch-in-pytorch-a-complete</guid><dc:creator><![CDATA[Rahul Agarwal]]></dc:creator><pubDate>Thu, 28 May 2026 02:11:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!flhs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94dcab62-17f1-401f-83ef-b333b161df68_2752x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yHq9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" width="1456" height="250" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:250,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77210,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p><em>Hey, Rahul here! &#128075; Each week, I publish long-form ML+AI posts covering ML, AI, and System design for MLwhiz. Paid subscribers also get how-to guides with full code walkthroughs. I publish occasional extra articles. If you&#8217;d like to become a paid subscriber, here&#8217;s a button for that:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.mlwhiz.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.mlwhiz.com/subscribe?"><span>Subscribe now</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B1mx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B1mx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 424w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 848w, 
https://substackcdn.com/image/fetch/$s_!B1mx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1272w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png" width="995" height="80" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:80,&quot;width&quot;:995,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15990,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B1mx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 424w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 848w, 
https://substackcdn.com/image/fetch/$s_!B1mx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1272w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!flhs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94dcab62-17f1-401f-83ef-b333b161df68_2752x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!flhs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94dcab62-17f1-401f-83ef-b333b161df68_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!flhs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94dcab62-17f1-401f-83ef-b333b161df68_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!flhs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94dcab62-17f1-401f-83ef-b333b161df68_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!flhs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94dcab62-17f1-401f-83ef-b333b161df68_2752x1536.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!flhs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94dcab62-17f1-401f-83ef-b333b161df68_2752x1536.png" width="1456" height="813" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/94dcab62-17f1-401f-83ef-b333b161df68_2752x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6149188,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.mlwhiz.com/i/199390246?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94dcab62-17f1-401f-83ef-b333b161df68_2752x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!flhs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94dcab62-17f1-401f-83ef-b333b161df68_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!flhs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94dcab62-17f1-401f-83ef-b333b161df68_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!flhs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94dcab62-17f1-401f-83ef-b333b161df68_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!flhs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94dcab62-17f1-401f-83ef-b333b161df68_2752x1536.png 1456w" 
sizes="100vw"></picture></div></a></figure></div><p><em>This is Part 9d of the RecSys for MLEs series. <a href="https://www.mlwhiz.com/p/hstu-how-meta-built-a-trillion-parameter">Part 9c explained why HSTU works</a>: the softmax-to-SiLU switch that preserves engagement intensity, relative attention bias that gives the model a sense of time, and M-FALCON for cheap candidate scoring. This post is the hands-on follow-up. 
We&#8217;re going to train an HSTU from scratch on the MovieLens-1M dataset and benchmark it against the rectools library&#8217;s reference HSTU implementation, so you have a real number to compare against.</em></p><div><hr></div><p>Last week, I published the conceptual deep dive on HSTU. Within 24 hours, the most common question in my replies was the same: <em>&#8220;OK, but how do I actually train one?&#8221;</em></p><p>Fair. So today we build it from scratch: the fused (item, action) input layer, all three HSTU sub-layers, the multi-task retrieval + rating heads, and the M-FALCON inference cache. We&#8217;ll train it on MovieLens-1M, then benchmark it head-to-head against rectools&#8217; reference HSTU implementation.</p><p>We&#8217;ll use PyTorch 2.x on a single GPU (a free Colab T4 works fine). I&#8217;m intentionally keeping the model small (D=64, 2 layers) so it trains in a few hours on free hardware. And to make sure I wasn&#8217;t fooling myself with vanity numbers, I trained the <a href="https://github.com/MobileTeleSystems/RecTools">rectools library&#8217;s HSTU</a> on the same data and split, then ran my from-scratch model with a similar training config. 
That way, if my numbers come out worse, I know exactly where to look.</p><p>Here&#8217;s what we&#8217;ll cover:</p><ul><li><p><strong>The dataset</strong>: how MovieLens ratings map to the (item, action, time) triples HSTU consumes</p></li><li><p><strong>The fused input layer</strong>: item embedding + action embedding + fusion MLP</p></li><li><p><strong>The HSTU block</strong>: all three sub-layers as a single PyTorch module &#8212; SiLU attention, RAB, gated transformation</p></li><li><p><strong>Multi-task heads</strong>: retrieval (sampled softmax with cosine similarity + temperature) and rating prediction</p></li><li><p><strong>Results</strong>: HR@10 and NDCG@10 against rectools HSTU and SASRec, plus the SiLU vs softmax ablation</p></li><li><p><strong>Example predictions</strong>: what the model actually recommends for specific MovieLens users</p></li><li><p><strong>Retrieval and ranking demos</strong>: brute-force, FAISS, and ranking by retrieval score + predicted rating</p></li><li><p><strong>M-FALCON inference</strong>: the K/V caching trick that makes serving feasible</p></li></ul><div class="callout-block" data-callout="true"><p><strong>Notebooks to read alongside the post:</strong> </p><p><a href="https://www.kaggle.com/code/mlwhiz/hstu-from-scratch">&#128211; hstu-from-scratch-ml1m-v2.ipynb</a> &#8212; the from-scratch HSTU we build in this post </p><p><a href="https://www.kaggle.com/code/mlwhiz/rectools-hstu">&#128211; rectools-ml1m.ipynb</a> &#8212; the rectools HSTU + SASRec baseline notebook for the comparison numbers </p></div><div><hr></div><h2>1. 
The dataset: MovieLens-1M ratings as (item, action, time) triples</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!k2g8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8cc014b-d6ac-4c94-ab86-573a37546195_2020x897.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!k2g8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8cc014b-d6ac-4c94-ab86-573a37546195_2020x897.png 424w, https://substackcdn.com/image/fetch/$s_!k2g8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8cc014b-d6ac-4c94-ab86-573a37546195_2020x897.png 848w, https://substackcdn.com/image/fetch/$s_!k2g8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8cc014b-d6ac-4c94-ab86-573a37546195_2020x897.png 1272w, https://substackcdn.com/image/fetch/$s_!k2g8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8cc014b-d6ac-4c94-ab86-573a37546195_2020x897.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!k2g8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8cc014b-d6ac-4c94-ab86-573a37546195_2020x897.png" width="1456" height="647" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f8cc014b-d6ac-4c94-ab86-573a37546195_2020x897.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:647,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Data pipeline: raw MovieLens ratings get mapped to POSITIVE/NEUTRAL/NEGATIVE actions, sorted into per-user sequences, then split into train/val/test via leave-one-out&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Data pipeline: raw MovieLens ratings get mapped to POSITIVE/NEUTRAL/NEGATIVE actions, sorted into per-user sequences, then split into train/val/test via leave-one-out" title="Data pipeline: raw MovieLens ratings get mapped to POSITIVE/NEUTRAL/NEGATIVE actions, sorted into per-user sequences, then split into train/val/test via leave-one-out" srcset="https://substackcdn.com/image/fetch/$s_!k2g8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8cc014b-d6ac-4c94-ab86-573a37546195_2020x897.png 424w, https://substackcdn.com/image/fetch/$s_!k2g8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8cc014b-d6ac-4c94-ab86-573a37546195_2020x897.png 848w, https://substackcdn.com/image/fetch/$s_!k2g8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8cc014b-d6ac-4c94-ab86-573a37546195_2020x897.png 1272w, 
https://substackcdn.com/image/fetch/$s_!k2g8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8cc014b-d6ac-4c94-ab86-573a37546195_2020x897.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The first question to answer is how the data is structured. We&#8217;re using <a href="https://grouplens.org/datasets/movielens/1m/">MovieLens-1M</a> from GroupLens, which contains ~1M ratings across 6,040 users and 3,706 movies.</p><p>Each row is <code>(user, movie, rating, timestamp)</code>, where rating is 1-5 stars. 
SASRec would treat every rating as one positive interaction. HSTU&#8217;s input can be richer &#8212; it can fuse the <em>action type</em> alongside the item ID &#8212; so I want to give the model the rating sentiment, not just the fact that the user watched the movie. This is how I&#8217;m incorporating an action signal in this model. Honestly, every setup can have a different action vocabulary &#8212; add-to-cart vs. purchase on an e-commerce store, click vs. like vs. share on a feed, watch-25% vs. watch-90% vs. skip on a video platform. The point I want to make here is that HSTU lets you encode whichever signal actually matters for your domain, not just &#8220;the user clicked this item.&#8221;</p><p>For our case, we create three behavioral signals, derived from the rating value:</p><ul><li><p><strong>POSITIVE</strong> (rating &#8805; 4): the user liked the movie</p></li><li><p><strong>NEUTRAL</strong> (rating = 3): the user was indifferent</p></li><li><p><strong>NEGATIVE</strong> (rating &lt; 3): the user disliked the movie</p></li></ul><p>The actual rating value (1-5) is kept separately as the label for the rating-prediction head.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import pandas as pd
import numpy as np

ratings = pd.read_csv(&quot;ml-1m/ratings.dat&quot;, sep=&quot;::&quot;, header=None,
                      names=[&quot;user&quot;, &quot;item&quot;, &quot;rating&quot;, &quot;ts&quot;], engine=&quot;python&quot;)

def rating_to_action(r):
    if r &gt;= 4: return &quot;POSITIVE&quot;
    if r == 3: return &quot;NEUTRAL&quot;
    return &quot;NEGATIVE&quot;

events = ratings.copy()
events[&quot;action&quot;] = events[&quot;rating&quot;].apply(rating_to_action)
events[&quot;value&quot;]  = events[&quot;rating&quot;].astype(np.float32)

print(events.action.value_counts())
# POSITIVE    575281
# NEUTRAL     261197
# NEGATIVE    163731</code></pre></div><p><strong>Train/test split: leave-one-out.</strong> For each user, the last interaction goes to test, and everything before that goes to training. During training, the <em>last item of the training sequence</em> is held out as the validation target.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">def split_seq(seq, max_len=200):
    n = len(seq[&quot;items&quot;])
    if n &lt; 3: return None
    history_slice = slice(max(0, n - 1 - max_len), n - 1)
    history = {k: seq[k][history_slice].tolist() for k in seq}
    test_target = (
        int(seq[&quot;items&quot;][n - 1]),
        int(seq[&quot;actions&quot;][n - 1]),
        int(seq[&quot;times&quot;][n - 1]),
        float(seq[&quot;values&quot;][n - 1]),
    )
    return history, test_target

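# `sequences` is assumed to hold one dict per user with time-sorted arrays under
# the keys &quot;items&quot;, &quot;actions&quot;, &quot;times&quot;, and &quot;values&quot; (construction not shown here).
splits = [s for s in (split_seq(seq) for seq in sequences) if s is not None]
# Train/test sequences: 6,040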
</code></pre></div><p><code>max_len=200</code> is the cap. With an average length of 165, most users will fit. Long-tail users get truncated to their most recent 200 events.</p><div><hr></div><h2>2. The fused input: item embedding + action embedding + fusion MLP</h2><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aaUT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e236ce-d3e9-49d9-9fc6-e271c52b323d_2798x525.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aaUT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e236ce-d3e9-49d9-9fc6-e271c52b323d_2798x525.png 424w, https://substackcdn.com/image/fetch/$s_!aaUT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e236ce-d3e9-49d9-9fc6-e271c52b323d_2798x525.png 848w, https://substackcdn.com/image/fetch/$s_!aaUT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e236ce-d3e9-49d9-9fc6-e271c52b323d_2798x525.png 1272w, https://substackcdn.com/image/fetch/$s_!aaUT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e236ce-d3e9-49d9-9fc6-e271c52b323d_2798x525.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aaUT!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e236ce-d3e9-49d9-9fc6-e271c52b323d_2798x525.png" width="1200" height="225" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/18e236ce-d3e9-49d9-9fc6-e271c52b323d_2798x525.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:273,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Fused input layer: item_emb (D) and action_emb (D) concatenate to (B, T, 2D), pass through Linear &#8594; SiLU &#8594; Linear MLP, output a single fused vector (B, T, D)&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="Fused input layer: item_emb (D) and action_emb (D) concatenate to (B, T, 2D), pass through Linear &#8594; SiLU &#8594; Linear MLP, output a single fused vector (B, T, D)" title="Fused input layer: item_emb (D) and action_emb (D) concatenate to (B, T, 2D), pass through Linear &#8594; SiLU &#8594; Linear MLP, output a single fused vector (B, T, D)" srcset="https://substackcdn.com/image/fetch/$s_!aaUT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e236ce-d3e9-49d9-9fc6-e271c52b323d_2798x525.png 424w, https://substackcdn.com/image/fetch/$s_!aaUT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e236ce-d3e9-49d9-9fc6-e271c52b323d_2798x525.png 848w, https://substackcdn.com/image/fetch/$s_!aaUT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e236ce-d3e9-49d9-9fc6-e271c52b323d_2798x525.png 1272w, 
https://substackcdn.com/image/fetch/$s_!aaUT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e236ce-d3e9-49d9-9fc6-e271c52b323d_2798x525.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Here&#8217;s the actual code.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import torch
import torch.nn as nn
import torch.nn.functional as F

class FusedInputEmbedding(nn.Module):
    &quot;&quot;&quot;Item embedding + action embedding, fused with an MLP into a single D-dim vector.&quot;&quot;&quot;
    def __init__(self, num_items, num_actions, dim):
        super().__init__()
        self.item_emb   = nn.Embedding(num_items + 1, dim, padding_idx=0)
        self.action_emb = nn.Embedding(num_actions + 1, dim, padding_idx=0)
        self.fuse = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.SiLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, item_ids, action_ids):
        i = self.item_emb(item_ids)      # (B, T, D)
        a = self.action_emb(action_ids)  # (B, T, D)
        return self.fuse(torch.cat([i, a], dim=-1))  # (B, T, D)
</code></pre></div><p>What&#8217;s happening here is that each item ID looks up a D-dim vector in the <code>item_emb</code> table (the &#8220;what is this item&#8221; signal), and each action ID looks up another D-dim vector in the <code>action_emb</code> table (the &#8220;how did the user engage with it&#8221; signal).</p><p>The forward pass then concatenates them into a single vector of size 2 &#215; D per token, and the <code>fuse</code> MLP projects it back to D dims.</p><p>The MLP matters. It&#8217;s what lets the model learn that &#8220;watched Toy Story with rating 5&#8221; represents something different from &#8220;watched Toy Story with rating 1&#8221;. A simple sum of the two embeddings, or a bare concatenation with no nonlinearity on top, wouldn&#8217;t give the model the capacity to learn that interaction.</p><div><hr></div><h2>3. The HSTU block in PyTorch</h2><p>Now it is time to write the actual HSTU block. If you want a refresher on what each piece does, <a href="https://www.mlwhiz.com/p/hstu-how-meta-built-a-trillion-parameter">Part 9c walks through the three sub-layers conceptually</a>. Here I&#8217;ll just translate that directly into code.</p><p>First, the relative attention bias (RAB) module. It has two learnable tables: one for relative position offset, one for roughly log-spaced time buckets.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">class RelativeAttentionBias(nn.Module):
    &quot;&quot;&quot;Learnable position + time biases, added to QK^T before SiLU.&quot;&quot;&quot;
    TIME_BUCKETS = [
        (0, 3600),                  # 0-1 hour
        (3600, 86400),              # 1-24 hours
        (86400, 86400 * 7),         # 1-7 days
        (86400 * 7, 86400 * 30),    # 7-30 days
        (86400 * 30, float(&quot;inf&quot;)), # 30+ days
    ]

    def __init__(self, max_seq_len):
        super().__init__()
        self.max_seq_len = max_seq_len
        self.pos_bias  = nn.Embedding(2 * max_seq_len - 1, 1)
        self.time_bias = nn.Embedding(len(self.TIME_BUCKETS), 1)
        nn.init.zeros_(self.pos_bias.weight)
        nn.init.zeros_(self.time_bias.weight)

    def _bucket(self, time_deltas):
        out = torch.zeros_like(time_deltas, dtype=torch.long)
        for k, (lo, hi) in enumerate(self.TIME_BUCKETS):
            mask = (time_deltas &gt;= lo) &amp; (time_deltas &lt; hi)
            out = torch.where(mask, torch.full_like(out, k), out)
        return out

    def forward(self, times):
        B, T = times.shape
        idx = torch.arange(T, device=times.device)
        rel_pos = (idx.view(T, 1) - idx.view(1, T)) + (self.max_seq_len - 1)
        pos_b = self.pos_bias(rel_pos).squeeze(-1)
        time_deltas = (times.unsqueeze(2) - times.unsqueeze(1)).abs()
        time_b = self.time_bias(self._bucket(time_deltas)).squeeze(-1)
        return pos_b.unsqueeze(0) + time_b
</code></pre></div><p>Both bias tables are initialized to zeros. That way, the first forward pass behaves like a bias-free attention block, and the position and time biases are learned through gradient updates.</p><p>Now the HSTU block itself:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">class HSTUBlock(nn.Module):
    def __init__(self, dim, max_seq_len, dropout=0.2):
        super().__init__()
        self.linear_in  = nn.Linear(dim, 4 * dim)
        self.rab        = RelativeAttentionBias(max_seq_len)
        self.norm       = nn.LayerNorm(dim)
        self.linear_out = nn.Linear(dim, dim)
        self.dropout    = nn.Dropout(dropout)

    def forward(self, x, times, attn_mask):
        B, T, D = x.shape

        # --- Sub-layer 1: pointwise projection ---
        proj = F.silu(self.linear_in(x))
        K, Q, V, U = proj.chunk(4, dim=-1)

        # --- Sub-layer 2: spatial aggregation (SiLU attention + RAB + causal/pad mask) ---
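        scores = torch.matmul(Q, K.transpose(-2, -1)) + self.rab(times)  # (B, T, T)
        # Masking by multiplication works because SiLU(0) = 0; softmax would need -inf instead.
        causal = torch.tril(torch.ones(T, T, device=x.device))  # query i sees keys j &lt;= i
        scores = scores * causal
        pad_mask = attn_mask.unsqueeze(1).float()                # (B, 1, T): zero out padded keys
        scores = scores * pad_mask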
        activated = F.silu(scores)                     # NOT softmax &#8212; pointwise SiLU
        attn_out  = torch.matmul(activated, V) / T     # 1/T normalization

        # --- Sub-layer 3: gated transformation + residual ---
        gated  = self.norm(attn_out) * U
        output = self.dropout(self.linear_out(gated))
        return output + x
</code></pre></div><p>Stacking blocks is one line:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">class HSTUEncoder(nn.Module):
    def __init__(self, num_items, num_actions, dim, num_layers, max_seq_len, dropout=0.2):
        super().__init__()
        self.embed  = FusedInputEmbedding(num_items, num_actions, dim)
        self.blocks = nn.ModuleList([
            HSTUBlock(dim, max_seq_len, dropout=dropout) for _ in range(num_layers)
        ])

    def forward(self, item_ids, action_ids, times, attn_mask):
        x = self.embed(item_ids, action_ids)
        for block in self.blocks:
            x = block(x, times, attn_mask)
        return x
</code></pre></div><p>That&#8217;s the encoder.</p><div><hr></div><p><em><strong>The rest of this post covers the multi-task heads (retrieval + rating), the full results table comparing our from-scratch HSTU to rectools HSTU and SASRec, the SiLU vs softmax ablation, the learned time-bias curve, M-FALCON inference benchmark (210&#215; speedup), and a production field guide.</strong></em><strong> </strong></p>
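<p>One detail of the HSTU block above is easy to miss: the causal and padding masks are applied by multiplying the raw scores by 0/1 masks <em>before</em> the activation. That is safe with SiLU because SiLU(0) = 0, whereas the same trick under softmax would still hand weight to masked positions. Here is a minimal sketch of that difference. The toy shapes, random scores, and variable names are my own assumptions for illustration, not values from the notebook.</p>

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, T = 1, 4
scores = torch.randn(B, T, T)              # toy stand-in for QK^T + RAB (assumption)

# Causal mask: each query position may only look at itself and earlier positions.
causal = torch.tril(torch.ones(T, T))
# Padding mask over keys: here the last position is treated as padding.
attn_mask = torch.tensor([[1, 1, 1, 0]])
pad_mask = attn_mask.unsqueeze(1).float()  # (B, 1, T), broadcasts over queries

keep = causal * pad_mask                   # (B, T, T): 1 = visible, 0 = masked
masked = keep == 0

# Multiplicative masking before SiLU: masked scores become 0, and SiLU(0) = 0.
silu_weights = F.silu(scores * keep)

# Same trick under softmax: masked scores become 0, but exp(0) = 1 still gets weight.
softmax_leaky = F.softmax(scores * keep, dim=-1)

# The usual softmax approach: fill masked positions with -inf before normalizing.
softmax_masked = F.softmax(scores.masked_fill(masked, float("-inf")), dim=-1)

print("SiLU, max |weight| on masked slots:   ", silu_weights[masked].abs().max().item())
print("softmax (multiplied), min masked w:   ", softmax_leaky[masked].min().item())
print("softmax (-inf fill), max masked w:    ", softmax_masked[masked].abs().max().item())
```

<p>With softmax you would reach for <code>masked_fill</code> with negative infinity; with pointwise SiLU, the multiplication alone is enough.</p>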
      <p>
          <a href="https://www.mlwhiz.com/p/hstu-from-scratch-in-pytorch-a-complete">
              Read more
          </a>
      </p>
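<p>The time buckets in <code>RelativeAttentionBias</code> are half-open intervals, so a gap of exactly one hour falls in the 1-24 hour bucket, not the 0-1 hour one. Below is a small standard-library sketch of the same bucketing rule, handy for sanity-checking edge cases. The helper name <code>time_bucket</code> and the label list are hypothetical, introduced only for this illustration.</p>

```python
from bisect import bisect_right

# Upper edges (in seconds) of the first four buckets; the fifth bucket is open-ended.
BOUNDARIES = [3600, 86400, 86400 * 7, 86400 * 30]
LABELS = ["0-1 hour", "1-24 hours", "1-7 days", "7-30 days", "30+ days"]


def time_bucket(delta_seconds):
    """Return the bucket index for a non-negative time gap in seconds."""
    # bisect_right counts the boundaries that are less than or equal to the gap,
    # which matches half-open [lo, hi) intervals: exactly 3600s lands in bucket 1.
    return bisect_right(BOUNDARIES, delta_seconds)


for gap in [0, 3599, 3600, 86400 * 3, 86400 * 365]:
    print(gap, "->", LABELS[time_bucket(gap)])
```

<p>For tensors, <code>torch.bucketize</code> with <code>right=True</code> should give the same half-open behavior in one vectorized call, though I would verify it against the loop in <code>_bucket</code> before swapping it in.</p>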
   ]]></content:encoded></item><item><title><![CDATA[MLWhiz Weekly Recsys/ML/GenAI Newsletter # 8 - The week of Google I/O 2026]]></title><description><![CDATA[Google I/O opened up a lot of eyes for major AI firms]]></description><link>https://www.mlwhiz.com/p/mlwhiz-weekly-recsysmlgenai-newsletter-3da</link><guid isPermaLink="false">https://www.mlwhiz.com/p/mlwhiz-weekly-recsysmlgenai-newsletter-3da</guid><dc:creator><![CDATA[Rahul Agarwal]]></dc:creator><pubDate>Tue, 26 May 2026 21:01:25 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/93b225fa-032d-404a-a4c8-77af99f9a054_2752x1566.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yHq9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" width="1456" height="250" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:250,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77210,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div>
      <p>
          <a href="https://www.mlwhiz.com/p/mlwhiz-weekly-recsysmlgenai-newsletter-3da">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[MLWhiz Weekly Recsys/ML/GenAI Newsletter # 7 - The week Karpathy Joined Anthropic]]></title><description><![CDATA[The week Andrej Karpathy picked his side, and everyone else picked theirs.]]></description><link>https://www.mlwhiz.com/p/mlwhiz-weekly-recsysmlgenai-newsletter-0e1</link><guid isPermaLink="false">https://www.mlwhiz.com/p/mlwhiz-weekly-recsysmlgenai-newsletter-0e1</guid><dc:creator><![CDATA[Rahul Agarwal]]></dc:creator><pubDate>Tue, 19 May 2026 22:40:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uYEj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F238cbfd7-6731-4b61-918c-cf2a4d49f950_1410x804.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yHq9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" width="1456" height="250" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:250,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77210,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p><em>Hey, Rahul here! &#128075; Each week, I publish long-form ML+AI posts covering ML, AI, and System design for MLwhiz. Paid subscribers also get how-to guides with full code walkthroughs. I publish occasional extra articles. If you&#8217;d like to become a paid subscriber, here&#8217;s a button for that:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.mlwhiz.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.mlwhiz.com/subscribe?"><span>Subscribe now</span></a></p>
      <p>
          <a href="https://www.mlwhiz.com/p/mlwhiz-weekly-recsysmlgenai-newsletter-0e1">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[HSTU: How Meta Built a Trillion-Parameter Recommender That Actually Scales]]></title><description><![CDATA[The architecture, the math, the code, and why every RecSys team is suddenly building one]]></description><link>https://www.mlwhiz.com/p/hstu-how-meta-built-a-trillion-parameter</link><guid isPermaLink="false">https://www.mlwhiz.com/p/hstu-how-meta-built-a-trillion-parameter</guid><dc:creator><![CDATA[Rahul Agarwal]]></dc:creator><pubDate>Mon, 18 May 2026 22:19:45 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!5pvs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26ba8291-e5b4-4682-bf7c-4921c9b062aa_2752x1536.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yHq9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" width="1456" height="250" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:250,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77210,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p><em>Hey, Rahul here! &#128075; Each week, I publish long-form ML+AI posts covering ML, AI, and System design for MLwhiz. Paid subscribers also get how-to guides with full code walkthroughs. I publish occasional extra articles. If you&#8217;d like to become a paid subscriber, here&#8217;s a button for that:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.mlwhiz.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.mlwhiz.com/subscribe?"><span>Subscribe now</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B1mx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B1mx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 424w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 848w, 
https://substackcdn.com/image/fetch/$s_!B1mx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1272w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png" width="995" height="80" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:80,&quot;width&quot;:995,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15990,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B1mx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 424w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 848w, 
https://substackcdn.com/image/fetch/$s_!B1mx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1272w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5pvs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26ba8291-e5b4-4682-bf7c-4921c9b062aa_2752x1536.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5pvs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26ba8291-e5b4-4682-bf7c-4921c9b062aa_2752x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!5pvs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26ba8291-e5b4-4682-bf7c-4921c9b062aa_2752x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!5pvs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26ba8291-e5b4-4682-bf7c-4921c9b062aa_2752x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!5pvs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26ba8291-e5b4-4682-bf7c-4921c9b062aa_2752x1536.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!5pvs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26ba8291-e5b4-4682-bf7c-4921c9b062aa_2752x1536.jpeg" width="1456" height="813" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/26ba8291-e5b4-4682-bf7c-4921c9b062aa_2752x1536.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;HSTU hero&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="HSTU hero" title="HSTU hero" srcset="https://substackcdn.com/image/fetch/$s_!5pvs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26ba8291-e5b4-4682-bf7c-4921c9b062aa_2752x1536.jpeg 424w, https://substackcdn.com/image/fetch/$s_!5pvs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26ba8291-e5b4-4682-bf7c-4921c9b062aa_2752x1536.jpeg 848w, https://substackcdn.com/image/fetch/$s_!5pvs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26ba8291-e5b4-4682-bf7c-4921c9b062aa_2752x1536.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!5pvs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26ba8291-e5b4-4682-bf7c-4921c9b062aa_2752x1536.jpeg 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>This is Part 9c of the RecSys for MLEs series. <a href="https://www.mlwhiz.com/p/rnns-to-transformers-sequential-recommenders">In Part 9a, we built GRU4Rec and SASRec from scratch</a> on the Steam Games dataset and got our hands dirty with sequential recommenders. <a href="https://www.mlwhiz.com/p/semantic-ids-rqvae-generative-recommender">In Part 9b, we covered Semantic IDs and TIGER</a>, Google&#8217;s clever approach to generative retrieval, where item IDs become a learned token vocabulary. Now we close the arc with the architecture that&#8217;s actually running at Meta scale: HSTU.</em></p><div><hr></div><p>Late 2022. 
You&#8217;re an ML engineer on a recommendation team, and you&#8217;ve just shipped a beefed-up SASRec model. The metrics look great. Your director walks over and says: <em>&#8220;What if we just scaled this thing? More layers, bigger embeddings, a hundred billion parameters. Like the LLM folks are doing.&#8221;</em></p><p>So you do. And for a while, the model gets better. Then it stops getting better. Then you throw more compute at it and get nothing. No improvement. A bigger, slower, more expensive model that performs about the same.</p><p>Frustrating, because over in NLP-land the scaling laws are <em>clean</em>: double the compute, get a predictably better model. GPT-3 had proven that. LLaMA was about to prove it again. DLRMs (Deep Learning Recommendation Models)? They kept plateauing. Something was fundamentally broken, and nobody could quite articulate what.</p><p>Meta&#8217;s answer was <strong>HSTU</strong> (Hierarchical Sequential Transduction Units).</p><p>Here&#8217;s what we&#8217;ll work through in this post:</p><ul><li><p><strong>The three structural issues</strong> that quietly sabotage standard Transformers when you point them at recommendation data</p></li><li><p><strong>What HSTU actually consumes</strong>: the fused (item, action) input format, plus the training data schema (impression table + history table)</p></li><li><p><strong>Inside the HSTU block</strong>: three sub-layers, with the math, the code, the intuition, and the diagrams</p></li><li><p><strong>M-FALCON</strong>: the caching optimization that lets you score 10,000 candidates without recomputing 8,000 history events 10,000 times</p></li></ul><p>By the end, you should understand exactly <em>why</em> HSTU works, <em>how</em> it works, and why every big tech RecSys team is suddenly building one.</p><div><hr></div><h2>1. 
Three reasons standard Transformers fail for RecSys</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hzpz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5527b6e5-fb30-4670-89e5-7ec9b60a4e0b_2168x1025.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hzpz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5527b6e5-fb30-4670-89e5-7ec9b60a4e0b_2168x1025.png 424w, https://substackcdn.com/image/fetch/$s_!hzpz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5527b6e5-fb30-4670-89e5-7ec9b60a4e0b_2168x1025.png 848w, https://substackcdn.com/image/fetch/$s_!hzpz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5527b6e5-fb30-4670-89e5-7ec9b60a4e0b_2168x1025.png 1272w, https://substackcdn.com/image/fetch/$s_!hzpz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5527b6e5-fb30-4670-89e5-7ec9b60a4e0b_2168x1025.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hzpz!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5527b6e5-fb30-4670-89e5-7ec9b60a4e0b_2168x1025.png" width="1200" height="567.032967032967" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5527b6e5-fb30-4670-89e5-7ec9b60a4e0b_2168x1025.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:688,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Softmax vs SiLU Attention&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="Softmax vs SiLU Attention" title="Softmax vs SiLU Attention" srcset="https://substackcdn.com/image/fetch/$s_!hzpz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5527b6e5-fb30-4670-89e5-7ec9b60a4e0b_2168x1025.png 424w, https://substackcdn.com/image/fetch/$s_!hzpz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5527b6e5-fb30-4670-89e5-7ec9b60a4e0b_2168x1025.png 848w, https://substackcdn.com/image/fetch/$s_!hzpz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5527b6e5-fb30-4670-89e5-7ec9b60a4e0b_2168x1025.png 1272w, https://substackcdn.com/image/fetch/$s_!hzpz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5527b6e5-fb30-4670-89e5-7ec9b60a4e0b_2168x1025.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Three structural mismatches between language data and recommendation data make standard Transformers fail at scale. These are fundamental, not edge cases you can patch with a clever hack.</p><p>If you&#8217;d like a refresher on <a href="https://www.mlwhiz.com/p/transformers">how standard Transformer attention works</a> before going further, that post will get you up to speed. From here on, I&#8217;ll assume you&#8217;re comfortable with Q, K, V, softmax, and self-attention.</p><h3>Problem 01: non-stationary vocabularies</h3><p>In NLP, your vocabulary is fixed. &#8220;The&#8221; is always token 42. &#8220;Recommendation&#8221; is always token 18,973. 
The model trains with a stable set of possible next tokens, and that set doesn&#8217;t really change between training and serving.</p><p>In recommendations, your &#8220;vocabulary&#8221; is the entire item catalog, and that catalog is constantly changing. New videos go live every second on Reels. A trending creator didn&#8217;t exist yesterday. The model you trained last Tuesday has never seen most of the items it&#8217;s being asked to score on Friday.</p><p><strong>Softmax was designed for a fixed vocabulary with stable class boundaries.</strong> It&#8217;s <em>defined</em> over a set of mutually exclusive classes whose probabilities sum to 1. When the set of possible &#8220;next tokens&#8221; is a moving target, when items appear and disappear from the catalog every minute, that assumption starts to break down silently. The probability mass keeps getting redistributed among the items the model happens to have seen, which is not the same set as the items it needs to recommend.</p><p>This is a much bigger deal than it sounds. Almost every metric you care about (CTR, watch time, retention) depends on the model surfacing <em>new</em> items that the user will love. If your normalization assumes a stable world, you&#8217;re starting from a broken assumption.</p>
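<p>To make the redistribution point concrete, here is a minimal sketch in plain Python. The scores are made-up illustrative values, and the pointwise SiLU comparison is an assumption borrowed from the title of the figure above (&#8220;Softmax vs SiLU Attention&#8221;); it is not a reproduction of HSTU&#8217;s attention. It only shows that a softmax-normalized score for an item changes when the catalog grows, while a pointwise score does not.</p>

```python
import math

def softmax(scores):
    """Normalize scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def silu(x):
    """Pointwise SiLU: x * sigmoid(x). It never looks at other items."""
    return x / (1.0 + math.exp(-x))

# Hypothetical raw scores for a three-item catalog (illustrative values).
catalog = [2.0, 1.0, 0.5]
# Two new items go live; the scores of the original three are unchanged.
grown_catalog = catalog + [2.0, 1.5]

softmax_before = softmax(catalog)
softmax_after = softmax(grown_catalog)
silu_before = [silu(s) for s in catalog]
silu_after = [silu(s) for s in grown_catalog]

# Softmax: item 0 loses probability mass purely because the set grew.
print(f"softmax item 0: {softmax_before[0]:.3f} then {softmax_after[0]:.3f}")
# Pointwise: item 0 keeps exactly the same value.
print(f"silu item 0:    {silu_before[0]:.3f} then {silu_after[0]:.3f}")
```

<p>The design point is that a normalized score is a statement about an item relative to a specific candidate set, so it shifts whenever that set does. A pointwise score is a statement about the item alone.</p>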
      <p>
          <a href="https://www.mlwhiz.com/p/hstu-how-meta-built-a-trillion-parameter">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[MLWhiz Weekly Recsys/ML/GenAI Newsletter # 6]]></title><description><![CDATA[The week the AI found curl vulnerabilities and the developers discussed AI usage while coding]]></description><link>https://www.mlwhiz.com/p/mlwhiz-weekly-recsysmlgenai-newsletter</link><guid isPermaLink="false">https://www.mlwhiz.com/p/mlwhiz-weekly-recsysmlgenai-newsletter</guid><dc:creator><![CDATA[Rahul Agarwal]]></dc:creator><pubDate>Fri, 15 May 2026 00:28:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!gEXe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35d5797c-81ec-4638-8348-37d7514d6fb8_1410x804.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yHq9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" width="1456" height="250" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:250,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77210,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p><em>Hey, Rahul here! &#128075; Each week, I publish long-form ML+AI posts covering ML, AI, and System design for MLwhiz. Paid subscribers also get how-to guides with full code walkthroughs. I publish occasional extra articles. If you&#8217;d like to become a paid subscriber, here&#8217;s a button for that:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.mlwhiz.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.mlwhiz.com/subscribe?"><span>Subscribe now</span></a></p><p>I love keeping track of everything week to week &#8212; here's what happened this week. Enjoy this free weekly post! For those who want to dive deeper into any of these topics, that's what my paid posts are for.</p>
      <p>
          <a href="https://www.mlwhiz.com/p/mlwhiz-weekly-recsysmlgenai-newsletter">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[From Random IDs to Semantic IDs: Building a Generative Recommender from Scratch]]></title><description><![CDATA[How RQVAE compresses item embeddings into meaningful tokens, enabling TIGER-style generative recommendation. Full code walkthrough on Steam Games with Qwen embeddings.]]></description><link>https://www.mlwhiz.com/p/semantic-ids-rqvae-generative-recommender</link><guid isPermaLink="false">https://www.mlwhiz.com/p/semantic-ids-rqvae-generative-recommender</guid><dc:creator><![CDATA[Rahul Agarwal]]></dc:creator><pubDate>Sat, 09 May 2026 01:29:54 GMT</pubDate><enclosure url="https://images.unsplash.com/photo-1720946922856-9456abc7e582?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwzN3x8dGlnZXJ8ZW58MHx8fHwxNzc4Mjg1OTc1fDA&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yHq9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" width="1456" height="250" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:250,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77210,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p><em>Hey, Rahul here! &#128075; Each week, I publish long-form ML+AI posts covering ML, AI, and System design for MLwhiz. Paid subscribers also get how-to guides with full code walkthroughs. I publish occasional extra articles. If you&#8217;d like to become a paid subscriber, here&#8217;s a button for that:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.mlwhiz.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.mlwhiz.com/subscribe?"><span>Subscribe now</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B1mx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B1mx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 424w, 
https://substackcdn.com/image/fetch/$s_!B1mx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 848w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1272w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png" width="995" height="80" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:80,&quot;width&quot;:995,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15990,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B1mx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 424w, 
https://substackcdn.com/image/fetch/$s_!B1mx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 848w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1272w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://images.unsplash.com/photo-1484820540004-14229fe36ca4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyN3x8YmxvY2tzfGVufDB8fHx8MTc3ODI5MjUwN3ww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://images.unsplash.com/photo-1484820540004-14229fe36ca4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyN3x8YmxvY2tzfGVufDB8fHx8MTc3ODI5MjUwN3ww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1484820540004-14229fe36ca4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyN3x8YmxvY2tzfGVufDB8fHx8MTc3ODI5MjUwN3ww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, https://images.unsplash.com/photo-1484820540004-14229fe36ca4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyN3x8YmxvY2tzfGVufDB8fHx8MTc3ODI5MjUwN3ww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, 
https://images.unsplash.com/photo-1484820540004-14229fe36ca4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyN3x8YmxvY2tzfGVufDB8fHx8MTc3ODI5MjUwN3ww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"><img src="https://images.unsplash.com/photo-1484820540004-14229fe36ca4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyN3x8YmxvY2tzfGVufDB8fHx8MTc3ODI5MjUwN3ww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080" width="3840" height="5760" data-attrs="{&quot;src&quot;:&quot;https://images.unsplash.com/photo-1484820540004-14229fe36ca4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyN3x8YmxvY2tzfGVufDB8fHx8MTc3ODI5MjUwN3ww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:5760,&quot;width&quot;:3840,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;child building an four boxes&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="child building an four boxes" title="child building an four boxes" srcset="https://images.unsplash.com/photo-1484820540004-14229fe36ca4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyN3x8YmxvY2tzfGVufDB8fHx8MTc3ODI5MjUwN3ww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 424w, https://images.unsplash.com/photo-1484820540004-14229fe36ca4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyN3x8YmxvY2tzfGVufDB8fHx8MTc3ODI5MjUwN3ww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 848w, 
https://images.unsplash.com/photo-1484820540004-14229fe36ca4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyN3x8YmxvY2tzfGVufDB8fHx8MTc3ODI5MjUwN3ww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1272w, https://images.unsplash.com/photo-1484820540004-14229fe36ca4?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=M3wzMDAzMzh8MHwxfHNlYXJjaHwyN3x8YmxvY2tzfGVufDB8fHx8MTc3ODI5MjUwN3ww&amp;ixlib=rb-4.1.0&amp;q=80&amp;w=1080 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@markusspiske">Markus Spiske</a> on <a 
href="https://unsplash.com">Unsplash</a></figcaption></figure></div><p>At the end of <a href="https://www.mlwhiz.com/p/rnns-to-transformers-sequential-recommenders">Part 1</a>, I left you with a question: <em><strong>What if items had semantic identifiers that captured their content?</strong></em></p><p>That question sounds innocent enough, but it&#8217;s the hinge point for the entire generative recommender revolution this post covers.</p><p>In Part 1, we built GRU4Rec and SASRec on the Steam Games dataset. But both of those models treat items as arbitrary integers. The model sees &#8220;Item 4,271 &#8594; Item 8,903 &#8594; Item 2,156&#8221; and learns statistical patterns between these numbers.</p><p>The key point is that everything about what these games actually <em>are</em> (their genre, their developer, their visual style, the <em>reason</em> a player moves from one to the next) lives entirely outside the model. That is a big missed opportunity.</p><p>Now imagine if every item carried an ID that <em>meant</em> something. Similar games naturally share ID prefixes. <em><strong>A brand-new title gets a meaningful ID the moment it&#8217;s published, before anyone plays it.</strong></em> The recommender can reason about games it&#8217;s never seen just from their ID, much as a human would from a game&#8217;s description or title.</p><p>That&#8217;s <strong>Semantic IDs</strong>.</p><p>And these Semantic IDs unlock something bigger: instead of scoring candidates from a retrieved shortlist (which is how SASRec and GRU4Rec work), a model can now <em>generate</em> the next item token by token, the way GPT generates words. 
</p><p>In this post, we will build a complete pipeline that does exactly that on the same Steam dataset: every game is compressed into meaningful tokens using a <strong>Residual Quantized VAE</strong> (RQVAE), and a generative recommender (<strong>TIGER</strong>) is trained from scratch.</p><p>Let&#8217;s dive in!</p><div><hr></div><h2>1. The Problem with Random Item IDs</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4hDh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9a79638-15c2-4cf2-9cf5-5578aebe55e8_2956x1084.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4hDh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9a79638-15c2-4cf2-9cf5-5578aebe55e8_2956x1084.png 424w, https://substackcdn.com/image/fetch/$s_!4hDh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9a79638-15c2-4cf2-9cf5-5578aebe55e8_2956x1084.png 848w, https://substackcdn.com/image/fetch/$s_!4hDh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9a79638-15c2-4cf2-9cf5-5578aebe55e8_2956x1084.png 1272w, https://substackcdn.com/image/fetch/$s_!4hDh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9a79638-15c2-4cf2-9cf5-5578aebe55e8_2956x1084.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4hDh!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9a79638-15c2-4cf2-9cf5-5578aebe55e8_2956x1084.png" width="1200" height="440.1098901098901" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e9a79638-15c2-4cf2-9cf5-5578aebe55e8_2956x1084.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:534,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:295213,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.mlwhiz.com/i/196328148?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9a79638-15c2-4cf2-9cf5-5578aebe55e8_2956x1084.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4hDh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9a79638-15c2-4cf2-9cf5-5578aebe55e8_2956x1084.png 424w, https://substackcdn.com/image/fetch/$s_!4hDh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9a79638-15c2-4cf2-9cf5-5578aebe55e8_2956x1084.png 848w, https://substackcdn.com/image/fetch/$s_!4hDh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9a79638-15c2-4cf2-9cf5-5578aebe55e8_2956x1084.png 1272w, https://substackcdn.com/image/fetch/$s_!4hDh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9a79638-15c2-4cf2-9cf5-5578aebe55e8_2956x1084.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Think about how SASRec works for a second. A user plays [Counter-Strike, Portal, Half-Life 2]. The model looks up the embedding for each game, runs self-attention, and predicts the next game. So far, so good.</p><p>But what does the model actually <em>know</em> about Counter-Strike? Nothing. It&#8217;s just item ID 4,271. The model has no idea it&#8217;s a first-person shooter made by Valve in 2004.</p><p>This creates three problems that compound in production:</p><ol><li><p><strong>No knowledge sharing.</strong> If 10,000 users play Counter-Strike &#8594; Team Fortress 2, the model learns that specific transition. But it learns <em>nothing</em> about why they&#8217;re related. 
A new Valve FPS arrives tomorrow, and the model has zero signal for it &#8212; even though any human could tell you &#8220;people who like Counter-Strike would probably like this.&#8221;</p></li><li><p><strong>Cold-start is brutal.</strong> New items have brand-new IDs with randomly initialized embeddings. As I covered in my <a href="https://www.mlwhiz.com/p/cold-start-problem-recsys-modern-approaches">post on the cold-start problem</a>, this means they need thousands of interactions before the model can meaningfully recommend them. In fast-moving catalogs &#8212; think news, short videos, new game releases &#8212; items can go stale before the model even learns to recommend them.</p></li><li><p><strong>No generalization.</strong> The model memorizes specific ID&#8594;ID transitions. It can&#8217;t reason about <em>categories</em> of items, <em>properties</em> of items, or <em>relationships</em> between items.</p></li></ol><p>And that&#8217;s where semantic IDs could help us.</p><div><hr></div><h2>2. 
Semantic IDs: Making Item IDs Mean Something</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CCZk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e2e8800-bf65-4ac1-b64a-079c1b138c10_1610x680.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CCZk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e2e8800-bf65-4ac1-b64a-079c1b138c10_1610x680.png 424w, https://substackcdn.com/image/fetch/$s_!CCZk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e2e8800-bf65-4ac1-b64a-079c1b138c10_1610x680.png 848w, https://substackcdn.com/image/fetch/$s_!CCZk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e2e8800-bf65-4ac1-b64a-079c1b138c10_1610x680.png 1272w, https://substackcdn.com/image/fetch/$s_!CCZk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e2e8800-bf65-4ac1-b64a-079c1b138c10_1610x680.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CCZk!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e2e8800-bf65-4ac1-b64a-079c1b138c10_1610x680.png" width="1200" height="506.86813186813185" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5e2e8800-bf65-4ac1-b64a-079c1b138c10_1610x680.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:615,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:174157,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.mlwhiz.com/i/196328148?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e2e8800-bf65-4ac1-b64a-079c1b138c10_1610x680.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CCZk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e2e8800-bf65-4ac1-b64a-079c1b138c10_1610x680.png 424w, https://substackcdn.com/image/fetch/$s_!CCZk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e2e8800-bf65-4ac1-b64a-079c1b138c10_1610x680.png 848w, https://substackcdn.com/image/fetch/$s_!CCZk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e2e8800-bf65-4ac1-b64a-079c1b138c10_1610x680.png 1272w, https://substackcdn.com/image/fetch/$s_!CCZk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e2e8800-bf65-4ac1-b64a-079c1b138c10_1610x680.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Now that we understand why Semantic IDs might be beneficial, let&#8217;s see how they are made. The idea behind <strong>Semantic IDs</strong> was introduced in <a href="https://papers.neurips.cc/paper_files/paper/2023/file/20dcab0f14046a5c6b02b61da9f13229-Paper-Conference.pdf">TIGER</a> (Transformer Index for Generative Recommenders) at NeurIPS 2023, and it changed how we think about item representation in recommender systems.</p><p>The pipeline works in three steps. 
Here&#8217;s what happens to a single game:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ah85!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84374c63-fea3-4005-972f-b6376ddbea4e_1766x608.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ah85!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84374c63-fea3-4005-972f-b6376ddbea4e_1766x608.png 424w, https://substackcdn.com/image/fetch/$s_!ah85!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84374c63-fea3-4005-972f-b6376ddbea4e_1766x608.png 848w, https://substackcdn.com/image/fetch/$s_!ah85!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84374c63-fea3-4005-972f-b6376ddbea4e_1766x608.png 1272w, https://substackcdn.com/image/fetch/$s_!ah85!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84374c63-fea3-4005-972f-b6376ddbea4e_1766x608.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ah85!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84374c63-fea3-4005-972f-b6376ddbea4e_1766x608.png" width="1200" height="412.9120879120879" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/84374c63-fea3-4005-972f-b6376ddbea4e_1766x608.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:501,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:167701,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.mlwhiz.com/i/196328148?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84374c63-fea3-4005-972f-b6376ddbea4e_1766x608.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ah85!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84374c63-fea3-4005-972f-b6376ddbea4e_1766x608.png 424w, https://substackcdn.com/image/fetch/$s_!ah85!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84374c63-fea3-4005-972f-b6376ddbea4e_1766x608.png 848w, https://substackcdn.com/image/fetch/$s_!ah85!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84374c63-fea3-4005-972f-b6376ddbea4e_1766x608.png 1272w, https://substackcdn.com/image/fetch/$s_!ah85!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84374c63-fea3-4005-972f-b6376ddbea4e_1766x608.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In Step 1, we get a content-based embedding of each item using a powerful text encoder.</p><p>In Step 2, we run RQVAE on that embedding to generate the item&#8217;s Semantic ID. (<em>Don&#8217;t worry, we will cover RQVAE later in this post. For now, just understand that it turns an item embedding into a sequence of discrete tokens.</em>)</p><p>In Step 3, the model <em>generates</em> the next item&#8217;s Semantic ID token by token, the same way GPT writes one word at a time. It&#8217;s not scoring a fixed list; it&#8217;s constructing an answer.</p><p>Now think about what this structure gives you. Counter-Strike and Half-Life might both get the prefix [42, 187, ...] because they&#8217;re both Valve FPS games. 
A brand-new Valve FPS that launched this morning might then be assigned a similar prefix based purely on its content, and the model already knows what to do with items that start with [42, 187, ...], even though it&#8217;s never seen a single player interact with this game.</p><p>Remember how, in the <a href="https://www.mlwhiz.com/p/cold-start-problem-recsys-modern-approaches">cold-start post</a>, I wrote that new items have nothing but their metadata to work with? With Semantic IDs, the metadata <em>becomes</em> their ID. The model doesn&#8217;t need thousands of interactions to figure out what a new game is; it already knows, just from reading the ID.</p><p>And notice what else changed: we replaced the entire <a href="https://www.mlwhiz.com/p/the-recommendation-engine-under-the">retrieve&#8594;rank&#8594;rerank pipeline</a> with a single model that generates recommendations directly. At inference time, the decoder uses <strong>beam search</strong>: instead of greedily committing to one token at each level and hoping for the best, it keeps the top B candidates at each step and extends them all in parallel. You end up with a ranked list of complete Semantic IDs, each mapping to a real item. That means you don&#8217;t have to maintain an ANN index, a candidate retrieval stage, and a separate ranking stage. You get all of this from a single model.</p><p>Now, the magic of this pipeline lives entirely in Step 2, the RQVAE. That is how we create these Semantic IDs. If the quantization is bad, the Semantic IDs are meaningless, and we&#8217;re back to square one. So let&#8217;s understand exactly how it works.</p><div><hr></div><h2>3. 
How RQVAE Works &#8212; The Engine Behind Semantic IDs</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bn71!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c4277d4-3d19-47ae-bfba-569aa203f16b_1620x974.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bn71!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c4277d4-3d19-47ae-bfba-569aa203f16b_1620x974.png 424w, https://substackcdn.com/image/fetch/$s_!bn71!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c4277d4-3d19-47ae-bfba-569aa203f16b_1620x974.png 848w, https://substackcdn.com/image/fetch/$s_!bn71!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c4277d4-3d19-47ae-bfba-569aa203f16b_1620x974.png 1272w, https://substackcdn.com/image/fetch/$s_!bn71!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c4277d4-3d19-47ae-bfba-569aa203f16b_1620x974.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bn71!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c4277d4-3d19-47ae-bfba-569aa203f16b_1620x974.png" width="1200" height="721.1538461538462" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1c4277d4-3d19-47ae-bfba-569aa203f16b_1620x974.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:875,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:238280,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.mlwhiz.com/i/196328148?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c4277d4-3d19-47ae-bfba-569aa203f16b_1620x974.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bn71!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c4277d4-3d19-47ae-bfba-569aa203f16b_1620x974.png 424w, https://substackcdn.com/image/fetch/$s_!bn71!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c4277d4-3d19-47ae-bfba-569aa203f16b_1620x974.png 848w, https://substackcdn.com/image/fetch/$s_!bn71!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c4277d4-3d19-47ae-bfba-569aa203f16b_1620x974.png 1272w, https://substackcdn.com/image/fetch/$s_!bn71!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c4277d4-3d19-47ae-bfba-569aa203f16b_1620x974.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Residual Quantized VAE</strong> sounds intimidating, but the intuition is straightforward. It&#8217;s progressive compression &#8212; like describing a location with increasing precision.</p><p>&#8220;North America&#8221; tells you the continent. &#8220;California&#8221; narrows it down. &#8220;San Francisco&#8221; pins it to a city. Each level adds detail that the previous level didn&#8217;t capture. RQVAE does exactly this, but for item embeddings.</p><p>Here&#8217;s the algorithm. </p><p><strong>Step 1: Encode.</strong> Take the 1024-dim item embedding and compress it through an encoder network (1024 &#8594; 512 &#8594; 256 &#8594; 128 &#8594; 32). 
It&#8217;s easier to follow with concrete numbers, so let&#8217;s walk through a simplified example &#8212; we will use 4 dimensions instead of 32, but the math is identical.</p><pre><code>Encoded: [0.8, -0.3, 0.5, 0.1]</code></pre><p><strong>Step 2: Level 1 Quantization.</strong> You have a <em>codebook</em> &#8212; a table of 256 learned vectors. Find the one closest to your latent vector:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!g4Y_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc72486f2-5c73-41d6-a890-8a991fd55605_950x422.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!g4Y_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc72486f2-5c73-41d6-a890-8a991fd55605_950x422.png 424w, https://substackcdn.com/image/fetch/$s_!g4Y_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc72486f2-5c73-41d6-a890-8a991fd55605_950x422.png 848w, https://substackcdn.com/image/fetch/$s_!g4Y_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc72486f2-5c73-41d6-a890-8a991fd55605_950x422.png 1272w, https://substackcdn.com/image/fetch/$s_!g4Y_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc72486f2-5c73-41d6-a890-8a991fd55605_950x422.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!g4Y_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc72486f2-5c73-41d6-a890-8a991fd55605_950x422.png" width="950" 
height="422" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c72486f2-5c73-41d6-a890-8a991fd55605_950x422.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:422,&quot;width&quot;:950,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:54806,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.mlwhiz.com/i/196328148?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc72486f2-5c73-41d6-a890-8a991fd55605_950x422.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!g4Y_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc72486f2-5c73-41d6-a890-8a991fd55605_950x422.png 424w, https://substackcdn.com/image/fetch/$s_!g4Y_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc72486f2-5c73-41d6-a890-8a991fd55605_950x422.png 848w, https://substackcdn.com/image/fetch/$s_!g4Y_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc72486f2-5c73-41d6-a890-8a991fd55605_950x422.png 1272w, https://substackcdn.com/image/fetch/$s_!g4Y_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc72486f2-5c73-41d6-a890-8a991fd55605_950x422.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div>
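<p>To make the lookup concrete, here is a minimal NumPy sketch of residual quantization, assuming squared Euclidean distance as the closeness measure. The codebooks below are random toy values standing in for learned ones, and the three levels, the <code>residual_quantize</code> name, and the shrinking codebook scale are illustrative assumptions rather than the exact setup of any particular implementation.</p>

```python
import numpy as np


def residual_quantize(latent, codebooks):
    """Return one codebook index per level, plus the final leftover residual."""
    residual = np.asarray(latent, dtype=np.float64)
    semantic_id = []
    for codebook in codebooks:
        # Squared Euclidean distance from the current residual to every codeword.
        distances = np.sum((codebook - residual) ** 2, axis=1)
        index = int(np.argmin(distances))
        semantic_id.append(index)
        # Whatever this level failed to capture is handed to the next level.
        residual = residual - codebook[index]
    return semantic_id, residual


rng = np.random.default_rng(0)
# Toy stand-ins for learned codebooks: 3 levels, 256 codewords each, 4 dims.
codebooks = [
    rng.normal(scale=0.5 / (level + 1), size=(256, 4)) for level in range(3)
]

encoded = np.array([0.8, -0.3, 0.5, 0.1])
semantic_id, residual = residual_quantize(encoded, codebooks)

# Summing the chosen codewords gives the quantized approximation of the latent.
reconstruction = sum(cb[i] for cb, i in zip(codebooks, semantic_id))
print("Semantic ID:", semantic_id)
print("Reconstruction error:", float(np.linalg.norm(encoded - reconstruction)))
```

<p>Each index in the printed list is one token of the Semantic ID. With trained codebooks in place of the random ones, the same loop would produce the ID that the decoder later learns to generate.</p>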
      <p>
          <a href="https://www.mlwhiz.com/p/semantic-ids-rqvae-generative-recommender">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[MLWhiz Weekly AI/ML/Recsys Newsletter # 5]]></title><description><![CDATA[The week the AI industry's partnership era ended &#8212; and the consulting era began.]]></description><link>https://www.mlwhiz.com/p/mlwhiz-weekly-aimlrecsys-newsletter</link><guid isPermaLink="false">https://www.mlwhiz.com/p/mlwhiz-weekly-aimlrecsys-newsletter</guid><dc:creator><![CDATA[Rahul Agarwal]]></dc:creator><pubDate>Wed, 06 May 2026 23:12:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!8F8b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe95b071d-ae1a-4496-bd11-17cf6da41fe5_1410x804.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8F8b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe95b071d-ae1a-4496-bd11-17cf6da41fe5_1410x804.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8F8b!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe95b071d-ae1a-4496-bd11-17cf6da41fe5_1410x804.png 424w, https://substackcdn.com/image/fetch/$s_!8F8b!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe95b071d-ae1a-4496-bd11-17cf6da41fe5_1410x804.png 848w, https://substackcdn.com/image/fetch/$s_!8F8b!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe95b071d-ae1a-4496-bd11-17cf6da41fe5_1410x804.png 1272w, 
https://substackcdn.com/image/fetch/$s_!8F8b!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe95b071d-ae1a-4496-bd11-17cf6da41fe5_1410x804.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8F8b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe95b071d-ae1a-4496-bd11-17cf6da41fe5_1410x804.png" width="1410" height="804" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e95b071d-ae1a-4496-bd11-17cf6da41fe5_1410x804.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:804,&quot;width&quot;:1410,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:98955,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.mlwhiz.com/i/196716889?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe95b071d-ae1a-4496-bd11-17cf6da41fe5_1410x804.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8F8b!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe95b071d-ae1a-4496-bd11-17cf6da41fe5_1410x804.png 424w, https://substackcdn.com/image/fetch/$s_!8F8b!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe95b071d-ae1a-4496-bd11-17cf6da41fe5_1410x804.png 848w, https://substackcdn.com/image/fetch/$s_!8F8b!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe95b071d-ae1a-4496-bd11-17cf6da41fe5_1410x804.png 
1272w, https://substackcdn.com/image/fetch/$s_!8F8b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe95b071d-ae1a-4496-bd11-17cf6da41fe5_1410x804.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Story of the Week: The Great Unbundling</h2><p>The Microsoft&#8211;OpenAI <a href="https://www.theverge.com/ai-artificial-intelligence/918981/openai-microsoft-renegotiate-contract">partnership</a> came apart Sunday night. Revenue sharing, the AGI clause, and exclusivity all dissolved in a single restructuring. 
</p><p>Twenty-four hours later, Sam Altman and AWS CEO Matt Garman were on Stratechery announcing OpenAI models on Amazon Bedrock. The speed tells you the deal was pre-negotiated; only the Microsoft contract was holding it back.</p><p><strong>The practical change for ML teams:</strong> GPT models now run natively on Bedrock alongside Claude, Llama, and Mistral. If you standardized on AWS but routed around for OpenAI access, that workaround is gone. <em><strong>If you picked Azure specifically for OpenAI, you have a real alternative for the first time.</strong></em> Multi-cloud routing for frontier models is table stakes now &#8212; expect inference prices to drop and cross-cloud model-parity to start mattering in vendor evaluations.</p><p>Then both labs made a stranger move. Anthropic and OpenAI launched enterprise consulting arms on the same day, aimed at different mar&#8230;</p>
      <p>
          <a href="https://www.mlwhiz.com/p/mlwhiz-weekly-aimlrecsys-newsletter">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Claude Code vs. Your ML Career: A 2026 Reality Check]]></title><description><![CDATA[The ML job market didn't die. It split into two &#8212; here's how to land on the right side.]]></description><link>https://www.mlwhiz.com/p/will-claude-code-take-your-ml-job</link><guid isPermaLink="false">https://www.mlwhiz.com/p/will-claude-code-take-your-ml-job</guid><dc:creator><![CDATA[Rahul Agarwal]]></dc:creator><pubDate>Tue, 28 Apr 2026 22:18:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!F1la!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a70a431-9858-4e27-ba05-099cc33a75f1_1200x720.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yHq9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" width="1456" height="250" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:250,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77210,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p><em>Hey, Rahul here! &#128075; Each week, I publish long-form ML+AI posts covering ML, AI, and System design for MLwhiz. Paid subscribers also get how-to guides with full code walkthroughs. I publish occasional extra articles. If you&#8217;d like to become a paid subscriber, here&#8217;s a button for that:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.mlwhiz.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.mlwhiz.com/subscribe?"><span>Subscribe now</span></a></p>
      <p>
          <a href="https://www.mlwhiz.com/p/will-claude-code-take-your-ml-job">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[The Most Complete Guide to PyTorch for Data Scientists]]></title><description><![CDATA[Pytorch is OG]]></description><link>https://www.mlwhiz.com/p/pytorch_guide</link><guid isPermaLink="false">https://www.mlwhiz.com/p/pytorch_guide</guid><dc:creator><![CDATA[Rahul Agarwal]]></dc:creator><pubDate>Mon, 27 Apr 2026 00:00:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AJst!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06047a8f-c4f0-4e72-a492-a82df82e196e_1920x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AJst!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06047a8f-c4f0-4e72-a492-a82df82e196e_1920x1280.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AJst!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06047a8f-c4f0-4e72-a492-a82df82e196e_1920x1280.png 424w, https://substackcdn.com/image/fetch/$s_!AJst!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06047a8f-c4f0-4e72-a492-a82df82e196e_1920x1280.png 848w, https://substackcdn.com/image/fetch/$s_!AJst!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06047a8f-c4f0-4e72-a492-a82df82e196e_1920x1280.png 1272w, 
https://substackcdn.com/image/fetch/$s_!AJst!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06047a8f-c4f0-4e72-a492-a82df82e196e_1920x1280.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AJst!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06047a8f-c4f0-4e72-a492-a82df82e196e_1920x1280.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06047a8f-c4f0-4e72-a492-a82df82e196e_1920x1280.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;The Most Complete Guide to PyTorch for Data Scientists&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The Most Complete Guide to PyTorch for Data Scientists" title="The Most Complete Guide to PyTorch for Data Scientists" srcset="https://substackcdn.com/image/fetch/$s_!AJst!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06047a8f-c4f0-4e72-a492-a82df82e196e_1920x1280.png 424w, https://substackcdn.com/image/fetch/$s_!AJst!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06047a8f-c4f0-4e72-a492-a82df82e196e_1920x1280.png 848w, https://substackcdn.com/image/fetch/$s_!AJst!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06047a8f-c4f0-4e72-a492-a82df82e196e_1920x1280.png 
1272w, https://substackcdn.com/image/fetch/$s_!AJst!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06047a8f-c4f0-4e72-a492-a82df82e196e_1920x1280.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em><strong>PyTorch</strong></em> has more or less become one of the de facto standards for creating Neural Networks, and I love its interface. Yet it is somehow a little difficult for beginners to get a hold of.</p><p>I remember picking PyTorch up only after some extensive experimentation a couple of years back. 
To tell you the truth, it took me a while to pick it up, but I am glad that I moved from <strong><a href="https://towardsdatascience.com/moving-from-keras-to-pytorch-f0d4fff4ce79">Keras to PyTorch</a></strong>. With its high customizability and Pythonic syntax, PyTorch is just a joy to work with, and I would recommend it to anyone who wants to do some heavy lifting with Deep Learning.</p><p>So, in this PyTorch guide, <em><strong>I will try to ease some of the pain with PyTorch for starters</strong></em> and go through some of the most important classes and modules that you will require while creating any Neural Network with PyTorch.</p><p>But that is not to say that this is aimed at beginners only, as <em><strong>I will also talk about the</strong></em> <em><strong>high customizability PyTorch provides, covering custom Layers, Datasets, Dataloaders, and Loss functions</strong></em>.</p><p>So let&#8217;s get some coffee &#9749;&#65039; and start it up.</p><div><hr></div><h2><strong>Tensors</strong></h2><p>Tensors are the basic building blocks in PyTorch and, put very simply, they are like NumPy arrays that can also live on a GPU. In this part, I will list some of the most used operations for working with Tensors. This is by no means an exhaustive list of operations you can do with Tensors, but it is helpful to understand what tensors are before moving on to the more exciting parts.</p><h3><strong>1. Create a Tensor</strong></h3><p>We can create a PyTorch tensor in multiple ways, including converting a NumPy array to a tensor. Below is just a small gist with some examples to start with, but you can do a whole lot <strong><a href="https://pytorch.org/docs/stable/tensors.html">more</a></strong> with tensors, just like you can with NumPy arrays.</p><pre><code><code>import numpy as np
import torch

# Using torch.Tensor
t = torch.Tensor([[1,2,3],[3,4,5]])
print(f"Created Tensor Using torch.Tensor:\n{t}")

# Using torch.randn
t = torch.randn(3, 5)
print(f"Created Tensor Using torch.randn:\n{t}")

# using torch.[ones|zeros](*size)
t = torch.ones(3, 5)
print(f"Created Tensor Using torch.ones:\n{t}")
t = torch.zeros(3, 5)
print(f"Created Tensor Using torch.zeros:\n{t}")

# Using torch.randint - a tensor of size (4, 5) with entries from 0 (inclusive) to 10 (exclusive)
t = torch.randint(low=0, high=10, size=(4, 5))
print(f"Created Tensor Using torch.randint:\n{t}")

# Using from_numpy to convert from Numpy Array to Tensor
a = np.array([[1,2,3],[3,4,5]])
t = torch.from_numpy(a)
print(f"Convert to Tensor From Numpy Array:\n{t}")

# Using .numpy() to convert from Tensor to Numpy array
t = t.numpy()
print(f"Convert to Numpy Array From Tensor:\n{t}")
</code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yFko!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df02c37-6da2-43cc-b2c1-4e84534dab60_1216x561.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yFko!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df02c37-6da2-43cc-b2c1-4e84534dab60_1216x561.png 424w, https://substackcdn.com/image/fetch/$s_!yFko!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df02c37-6da2-43cc-b2c1-4e84534dab60_1216x561.png 848w, https://substackcdn.com/image/fetch/$s_!yFko!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df02c37-6da2-43cc-b2c1-4e84534dab60_1216x561.png 1272w, https://substackcdn.com/image/fetch/$s_!yFko!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df02c37-6da2-43cc-b2c1-4e84534dab60_1216x561.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yFko!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df02c37-6da2-43cc-b2c1-4e84534dab60_1216x561.png" width="1216" height="561" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7df02c37-6da2-43cc-b2c1-4e84534dab60_1216x561.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:561,&quot;width&quot;:1216,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;MLWhiz: Data 
Science, Machine Learning, Artificial Intelligence&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="MLWhiz: Data Science, Machine Learning, Artificial Intelligence" title="MLWhiz: Data Science, Machine Learning, Artificial Intelligence" srcset="https://substackcdn.com/image/fetch/$s_!yFko!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df02c37-6da2-43cc-b2c1-4e84534dab60_1216x561.png 424w, https://substackcdn.com/image/fetch/$s_!yFko!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df02c37-6da2-43cc-b2c1-4e84534dab60_1216x561.png 848w, https://substackcdn.com/image/fetch/$s_!yFko!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df02c37-6da2-43cc-b2c1-4e84534dab60_1216x561.png 1272w, https://substackcdn.com/image/fetch/$s_!yFko!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7df02c37-6da2-43cc-b2c1-4e84534dab60_1216x561.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 
8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3><strong>2. Tensor Operations</strong></h3><p>Again, there are a lot of operations you can do on these tensors. The full list of functions can be found <strong><a href="https://pytorch.org/docs/stable/torch.html?highlight=mm#math-operations">here</a></strong> .</p><pre><code><code>A = torch.randn(3,4)
W = torch.randn(4,2)
# Multiply Matrix A and W
t = A.mm(W)
print(f"Created Tensor t by Multiplying A and W:\n{t}")
# Transpose Tensor t
t = t.t()
print(f"Transpose of Tensor t:\n{t}")
# Square each element of t
t = t**2
print(f"Square each element of Tensor t:\n{t}")
# return the size of a tensor
print(f"Size of Tensor t using .size():\n{t.size()}")
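
# A minimal sketch (not from the original post): for 2-D tensors, the @ operator
# and torch.matmul give the same result as .mm(). The names P, Q and R are
# illustrative assumptions.
import torch

P = torch.randn(3, 4)
Q = torch.randn(4, 2)
R = P.mm(Q)                      # (3, 4) times (4, 2) gives shape (3, 2)
print(R.shape)
print(torch.allclose(R, P @ Q), torch.allclose(R, torch.matmul(P, Q)))
print(R.t().shape)               # transposing swaps the two dimensions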
</code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!r_xE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43c973b3-9646-49f1-860d-059a2ead7930_1216x264.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!r_xE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43c973b3-9646-49f1-860d-059a2ead7930_1216x264.png 424w, https://substackcdn.com/image/fetch/$s_!r_xE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43c973b3-9646-49f1-860d-059a2ead7930_1216x264.png 848w, https://substackcdn.com/image/fetch/$s_!r_xE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43c973b3-9646-49f1-860d-059a2ead7930_1216x264.png 1272w, https://substackcdn.com/image/fetch/$s_!r_xE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43c973b3-9646-49f1-860d-059a2ead7930_1216x264.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!r_xE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43c973b3-9646-49f1-860d-059a2ead7930_1216x264.png" width="1216" height="264" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/43c973b3-9646-49f1-860d-059a2ead7930_1216x264.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:264,&quot;width&quot;:1216,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;MLWhiz: Data Science, 
Machine Learning, Artificial Intelligence&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="MLWhiz: Data Science, Machine Learning, Artificial Intelligence" title="MLWhiz: Data Science, Machine Learning, Artificial Intelligence" srcset="https://substackcdn.com/image/fetch/$s_!r_xE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43c973b3-9646-49f1-860d-059a2ead7930_1216x264.png 424w, https://substackcdn.com/image/fetch/$s_!r_xE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43c973b3-9646-49f1-860d-059a2ead7930_1216x264.png 848w, https://substackcdn.com/image/fetch/$s_!r_xE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43c973b3-9646-49f1-860d-059a2ead7930_1216x264.png 1272w, https://substackcdn.com/image/fetch/$s_!r_xE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43c973b3-9646-49f1-860d-059a2ead7930_1216x264.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><strong>Note:</strong> What are PyTorch Variables? In the previous versions of Pytorch, Tensor and Variables used to be different and provided different functionality, but now the Variable API is <strong><a href="https://pytorch.org/docs/stable/autograd.html#variable-deprecated">deprecated</a></strong> , and all methods for variables work with Tensors. 
So, if you don&#8217;t know about them, that&#8217;s fine, as they&#8217;re not needed, and if you do know them, you can forget about them.</p><div><hr></div><h2><strong>The nn.Module</strong></h2><p>Here comes the fun part, as we are now going to talk about some of the most used constructs in PyTorch for building deep learning projects. nn.Module lets you create your deep learning models as a class. You can inherit from nn.Module to define any model as a class. Every model class contains an <code>__init__</code> block and a block for the <code>forward</code> pass.</p><ul><li><p>In the <code>__init__</code> part, the user defines all the layers the network is going to have but doesn&#8217;t yet define how those layers are connected to each other.</p></li><li><p>In the <code>forward</code> pass block, the user defines how data flows from one layer to another inside the network.</p></li></ul><p>So, put simply, any network we define will look like:</p><pre><code><code>class myNeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Define all Layers Here
        self.lin1 = nn.Linear(784, 30)
        self.lin2 = nn.Linear(30, 10)
    def forward(self, x):
        # Connect the layer Outputs here to define the forward pass
        x = self.lin1(x)
        x = self.lin2(x)
        return x
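
# A hedged aside (not from the original post): when the forward pass is a plain
# chain of layers, nn.Sequential expresses the same stack without a custom class.
# The names seq_model and n_params are illustrative assumptions.
import torch
from torch import nn

seq_model = nn.Sequential(nn.Linear(784, 30), nn.Linear(30, 10))
n_params = sum(p.numel() for p in seq_model.parameters())
print(n_params)  # 784*30 + 30 + 30*10 + 10 = 23860
print(seq_model(torch.randn(5, 784)).shape)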
</code></code></pre><p>Here we have defined a very simple network that takes an input of size 784 and passes it through two linear layers sequentially. But the thing to note is that we can define any sort of calculation in the forward pass, and that makes PyTorch highly customizable for research purposes. For example, in our crazy experimentation mode, we might have used the network below, where we attach our layers arbitrarily. Here we add the input to the output of the second linear layer (a skip connection) and then send the result back through the first linear layer again (I honestly don&#8217;t know what that will do).</p><pre><code><code>class myCrazyNeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Define all Layers Here
        self.lin1 = nn.Linear(784, 30)
        self.lin2 = nn.Linear(30, 784)
        self.lin3 = nn.Linear(30, 10)

    def forward(self, x):
        # Connect the layer Outputs here to define the forward pass
        x_lin1 = self.lin1(x)
        x_lin2 = x + self.lin2(x_lin1)
        x_lin2 = self.lin1(x_lin2)
        x = self.lin3(x_lin2)
        return x
</code></code></pre><p>We can also check if the neural network forward pass works. I usually do that by first creating some random input and just passing that through the network I have created.</p><pre><code><code>x = torch.randn((100,784))
model = myCrazyNeuralNet()
model(x).size()
--------------------------
torch.Size([100, 10])
</code></code></pre><div><hr></div><h2><strong>A word about Layers</strong></h2><p>Pytorch is pretty powerful, and you can actually create any new experimental layer by yourself using <code>nn.Module</code>. For example, rather than using the predefined Linear Layer <code>nn.Linear</code> from Pytorch above, we could have created our <strong>custom linear layer</strong>.</p><pre><code><code>class myCustomLinearLayer(nn.Module):
    def __init__(self,in_size,out_size):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(in_size, out_size))
        self.bias = nn.Parameter(torch.zeros(out_size))
    def forward(self, x):
        return x.mm(self.weights) + self.bias
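
# A small check, sketched under assumptions (the class name TinyLayer is
# hypothetical): only tensors wrapped in nn.Parameter show up in parameters().
import torch
from torch import nn

class TinyLayer(nn.Module):
    def __init__(self):
        super().__init__()
        self.registered = nn.Parameter(torch.randn(4, 2))  # tracked by the module
        self.plain = torch.randn(4, 2)                     # ordinary attribute, not tracked

param_names = [name for name, _ in TinyLayer().named_parameters()]
print(param_names)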
</code></code></pre><p>You can see how we wrap our weights and bias tensors in <code>nn.Parameter</code>. This registers each tensor as a model parameter. From the PyTorch <strong><a href="https://pytorch.org/docs/stable/generated/torch.nn.parameter.Parameter.html#parameter">docs</a></strong>:</p><blockquote><p><em>Parameters are <strong><a href="https://pytorch.org/docs/stable/tensors.html#torch.Tensor">Tensor</a></strong> subclasses, that have a very special property when used with Module - when they&#8217;re assigned as Module attributes they are automatically added to the list of its parameters, and will appear in </em><code>parameters()</code><em> iterator</em></p></blockquote><p>As you will later see, the <code>model.parameters()</code> iterator will be an input to the optimizer. But more on that later.</p><p>For now, we can use this custom layer in any PyTorch network, just like any other layer.</p><pre><code><code>class myCustomNeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Define all Layers Here
        self.lin1 = myCustomLinearLayer(784,10)

    def forward(self, x):
        # Connect the layer Outputs here to define the forward pass
        x = self.lin1(x)
        return x
x = torch.randn((100,784))
model = myCustomNeuralNet()
model(x).size()
------------------------------------------
torch.Size([100, 10])
</code></code></pre><p>But then again, PyTorch would not be so widely used if it didn&#8217;t provide a lot of ready-made layers that are used frequently across a wide variety of neural network architectures. Some examples are: <strong><a href="https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear">nn.Linear</a></strong>, <strong><a href="https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html#torch.nn.Conv2d">nn.Conv2d</a></strong>, <strong><a href="https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html#torch.nn.MaxPool2d">nn.MaxPool2d</a></strong>, <strong><a href="https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html#torch.nn.ReLU">nn.ReLU</a></strong>, <strong><a href="https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html#torch.nn.BatchNorm2d">nn.BatchNorm2d</a></strong>, <strong><a href="https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html#torch.nn.Dropout">nn.Dropout</a></strong>, <strong><a href="https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html#torch.nn.Embedding">nn.Embedding</a></strong>, <strong><a href="https://pytorch.org/docs/stable/generated/torch.nn.GRU.html#torch.nn.GRU">nn.GRU</a></strong> / <strong><a href="https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html#torch.nn.LSTM">nn.LSTM</a></strong>, <strong><a href="https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html#torch.nn.Softmax">nn.Softmax</a></strong>, <strong><a href="https://pytorch.org/docs/stable/generated/torch.nn.LogSoftmax.html#torch.nn.LogSoftmax">nn.LogSoftmax</a></strong>, <strong><a href="https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html#torch.nn.MultiheadAttention">nn.MultiheadAttention</a></strong>, <strong><a href="https://pytorch.org/docs/stable/generated/torch.nn.TransformerEncoder.html#torch.nn.TransformerEncoder">nn.TransformerEncoder</a></strong>, <strong><a 
href="https://pytorch.org/docs/stable/generated/torch.nn.TransformerDecoder.html#torch.nn.TransformerDecoder">nn.TransformerDecoder</a></strong></p><p>I have linked all the layers to their source where you could read all about them, but to show how I usually try to understand a layer and read the docs, I would try to look at a very simple convolutional layer here.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Lie5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd35a99de-56c3-4811-981f-58eb15c6c82f_1462x515.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Lie5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd35a99de-56c3-4811-981f-58eb15c6c82f_1462x515.png 424w, https://substackcdn.com/image/fetch/$s_!Lie5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd35a99de-56c3-4811-981f-58eb15c6c82f_1462x515.png 848w, https://substackcdn.com/image/fetch/$s_!Lie5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd35a99de-56c3-4811-981f-58eb15c6c82f_1462x515.png 1272w, https://substackcdn.com/image/fetch/$s_!Lie5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd35a99de-56c3-4811-981f-58eb15c6c82f_1462x515.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Lie5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd35a99de-56c3-4811-981f-58eb15c6c82f_1462x515.png" width="1456" height="513" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d35a99de-56c3-4811-981f-58eb15c6c82f_1462x515.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:513,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;MLWhiz: Data Science, Machine Learning, Artificial Intelligence&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="MLWhiz: Data Science, Machine Learning, Artificial Intelligence" title="MLWhiz: Data Science, Machine Learning, Artificial Intelligence" srcset="https://substackcdn.com/image/fetch/$s_!Lie5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd35a99de-56c3-4811-981f-58eb15c6c82f_1462x515.png 424w, https://substackcdn.com/image/fetch/$s_!Lie5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd35a99de-56c3-4811-981f-58eb15c6c82f_1462x515.png 848w, https://substackcdn.com/image/fetch/$s_!Lie5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd35a99de-56c3-4811-981f-58eb15c6c82f_1462x515.png 1272w, https://substackcdn.com/image/fetch/$s_!Lie5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd35a99de-56c3-4811-981f-58eb15c6c82f_1462x515.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>So, a Conv2d Layer needs as input an Image of height H and width W, with <code>Cin</code> channels. Now, for the first layer in a convnet, the number of <code>in_channels</code> would be 3(RGB), and the number of <code>out_channels</code> can be defined by the user. The <code>kernel_size</code> mostly used is 3x3, and the <code>stride</code> normally used is 1.</p><p>To check a new layer which I don&#8217;t know much about, I usually try to see the input as well as output for the layer like below where I would first initialize the layer:</p><pre><code><code>conv_layer = nn.Conv2d(in_channels = 3, out_channels = 64, kernel_size = (3,3), stride = 1, padding=1)
</code></code></pre><p>And then pass some random input through it. Here 100 is the batch size.</p><pre><code><code>x = torch.randn((100,3,24,24))
conv_layer(x).size()
--------------------------------
torch.Size([100, 64, 24, 24])
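
# A sketch of the output-size arithmetic from the Conv2d docs, assuming dilation=1:
# out = floor((size + 2*padding - kernel_size) / stride) + 1.
# The helper name conv_out_size is hypothetical.
import torch
from torch import nn

def conv_out_size(size, kernel_size, stride=1, padding=0):
    return (size + 2 * padding - kernel_size) // stride + 1

print(conv_out_size(24, kernel_size=3, stride=1, padding=1))  # 24, matching the shape above
layer = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1)
print(layer(torch.randn(1, 3, 24, 24)).shape, conv_out_size(24, 3, stride=2, padding=1))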
</code></code></pre><p>So, we get the output from the convolution operation as required, and I have sufficient information on how to use this layer in any Neural Network I design.</p><div><hr></div><h2><strong>Datasets and DataLoaders</strong></h2><p>How would we pass data to our Neural nets while training or while testing? We can definitely pass tensors as we have done above, but Pytorch also provides us with pre-built Datasets to make it easier for us to pass data to our neural nets. You can check out the complete list of datasets provided at <strong><a href="https://pytorch.org/docs/stable/torchvision/datasets.html">torchvision.datasets</a></strong> and <strong><a href="https://pytorch.org/text/datasets.html">torchtext.datasets</a></strong> . But, to give a concrete example for datasets, let&#8217;s say we had to pass images to an Image Neural net using a folder which has images in this structure:</p><pre><code><code>data
    train
        sailboat
        kayak
        .
        .
</code></code></pre><p>We can use torchvision.datasets.ImageFolder dataset to get an example image like below:</p><pre><code><code>from torchvision import transforms
from torchvision.datasets import ImageFolder
traindir = "data/train/"
t = transforms.Compose([
    transforms.Resize(size=256),
    transforms.CenterCrop(size=224),
    transforms.ToTensor()])
train_dataset = ImageFolder(root=traindir,transform=t)
print("Num Images in Dataset:", len(train_dataset))
print("Example Image and Label:", train_dataset[2])
</code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7Po2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9b21897-4f40-4571-b0cb-29db6b14ac9d_989x513.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7Po2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9b21897-4f40-4571-b0cb-29db6b14ac9d_989x513.png 424w, https://substackcdn.com/image/fetch/$s_!7Po2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9b21897-4f40-4571-b0cb-29db6b14ac9d_989x513.png 848w, https://substackcdn.com/image/fetch/$s_!7Po2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9b21897-4f40-4571-b0cb-29db6b14ac9d_989x513.png 1272w, https://substackcdn.com/image/fetch/$s_!7Po2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9b21897-4f40-4571-b0cb-29db6b14ac9d_989x513.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7Po2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9b21897-4f40-4571-b0cb-29db6b14ac9d_989x513.png" width="989" height="513" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d9b21897-4f40-4571-b0cb-29db6b14ac9d_989x513.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:513,&quot;width&quot;:989,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;MLWhiz: Data Science, 
Machine Learning, Artificial Intelligence&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="MLWhiz: Data Science, Machine Learning, Artificial Intelligence" title="MLWhiz: Data Science, Machine Learning, Artificial Intelligence" srcset="https://substackcdn.com/image/fetch/$s_!7Po2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9b21897-4f40-4571-b0cb-29db6b14ac9d_989x513.png 424w, https://substackcdn.com/image/fetch/$s_!7Po2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9b21897-4f40-4571-b0cb-29db6b14ac9d_989x513.png 848w, https://substackcdn.com/image/fetch/$s_!7Po2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9b21897-4f40-4571-b0cb-29db6b14ac9d_989x513.png 1272w, https://substackcdn.com/image/fetch/$s_!7Po2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9b21897-4f40-4571-b0cb-29db6b14ac9d_989x513.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 
6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This dataset has 847 images, and we can get an image and its label using an index. Now we can pass images one by one to any image neural network using a for loop:</p><pre><code><code>for i in range(0,len(train_dataset)):
    image, label = train_dataset[i]
    # add a batch dimension: (3, 224, 224) becomes (1, 3, 224, 224)
    pred = model(image.unsqueeze(0))
</code></code></pre><p><em><strong>But that is not optimal. We want to do batching.</strong></em> We could write some more code to collect images and labels into a batch and then pass it to the neural network. But PyTorch provides a utility iterator, <code>torch.utils.data.DataLoader</code>, that does precisely that. We can simply wrap our <code>train_dataset</code> in a <code>DataLoader</code>, and we will get batches instead of individual examples.</p><pre><code><code>from torch.utils.data import DataLoader

train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=10)
</code></code></pre><p>We can simply iterate with batches using:</p><pre><code><code>for image_batch, label_batch in train_dataloader:
    print(image_batch.size(),label_batch.size())
    break
-------------------------------------------------
torch.Size([64, 3, 224, 224]) torch.Size([64])
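
# A self-contained sketch with fake data (TensorDataset stands in for the image
# folder; the tiny 3x8x8 tensors are an assumption to keep it cheap): 847 samples
# in batches of 64 give 13 full batches plus a final batch of 15.
import torch
from torch.utils.data import DataLoader, TensorDataset

fake_dataset = TensorDataset(torch.randn(847, 3, 8, 8), torch.randint(0, 2, (847,)))
fake_loader = DataLoader(fake_dataset, batch_size=64, shuffle=True)
batch_sizes = [labels.size(0) for _, labels in fake_loader]
print(len(batch_sizes), batch_sizes[-1])
# Passing drop_last=True to DataLoader would skip the smaller final batch.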
</code></code></pre><p>So actually, the whole process of using datasets and Dataloaders becomes:</p><pre><code><code>t = transforms.Compose([
    transforms.Resize(size=256),
    transforms.CenterCrop(size=224),
    transforms.ToTensor()])

train_dataset = ImageFolder(root=traindir, transform=t)
train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=10)

for image_batch, label_batch in train_dataloader:
    pred = myImageNeuralNet(image_batch)
</code></code></pre><p>You can look at this particular example in action in my previous blogpost on Image classification using Deep Learning <strong><a href="https://towardsdatascience.com/end-to-end-pipeline-for-setting-up-multiclass-image-classification-for-data-scientists-2e051081d41c">here</a></strong> .</p><p>This is great, and Pytorch does provide a lot of functionality out of the box. But the main power of Pytorch comes with its immense customization. We can also create our own custom datasets if the datasets provided by PyTorch don&#8217;t fit our use case.</p><div><hr></div><h3><strong>Understanding Custom Datasets</strong></h3><p>To write our custom datasets, we can make use of the abstract class <code>torch.utils.data.Dataset</code> provided by Pytorch. We need to inherit this <code>Dataset</code> class and need to define two methods to create a custom Dataset.</p><ul><li><p><code>__len__</code> : a function that returns the size of the dataset. This one is pretty simple to write in most cases.</p></li><li><p><code>__getitem__</code>: a function that takes as input an index i and returns the sample at index <code>i</code>.</p></li></ul><p>For example, we can create a simple custom dataset that returns an image and a label from a folder. See that most of the tasks are happening in <code>__init__</code> part where we use <code>glob.glob</code> to get image names and do some general preprocessing.</p><pre><code><code>from glob import glob
from PIL import Image
from torch.utils.data import Dataset

class customImageFolderDataset(Dataset):
    """Custom Image Loader dataset."""
    def __init__(self, root, transform=None):
        """
        Args:
            root (string): Path to the images organized in a particular folder structure.
            transform: Any Pytorch transform to be applied
        """
        # Get all image paths from a directory
        self.image_paths = glob(f"{root}/*/*")
        # Get the labels from the image paths
        self.labels = [x.split("/")[-2] for x in self.image_paths]
        # Create a dictionary mapping each label to an index from 0 to len(classes) - 1.
        # Sorting keeps the mapping identical across runs (plain set order can vary).
        self.label_to_idx = {x: i for i, x in enumerate(sorted(set(self.labels)))}
        self.transform = transform

    def __len__(self):
        # return length of dataset
        return len(self.image_paths)

    def __getitem__(self, idx):
        # open and send one image and label
        img_name = self.image_paths[idx]
        label = self.labels[idx]
        # convert to RGB so grayscale or RGBA files still give 3-channel tensors
        image = Image.open(img_name).convert("RGB")
        if self.transform:
            image = self.transform(image)
        return image,self.label_to_idx[label]
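
# A minimal in-memory analogue, sketched with made-up data (SquaresDataset is a
# hypothetical name): __len__ and __getitem__ are all that DataLoader needs.
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    def __init__(self, n):
        self.values = torch.arange(n, dtype=torch.float32)

    def __len__(self):
        return len(self.values)

    def __getitem__(self, idx):
        # return one (input, target) pair
        return self.values[idx], self.values[idx] ** 2

squares = SquaresDataset(10)
inputs, targets = next(iter(DataLoader(squares, batch_size=4)))
print(len(squares), inputs, targets)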
</code></code></pre><p>Also, note that we open our images one at a time in the <code>__getitem__</code> method and not while initializing. We avoid doing this in <code>__init__</code> because we don&#8217;t want to load all our images into memory; we only need to load the ones required for the current batch.</p><p>We can now use this dataset with the <code>DataLoader</code> utility just like before. It works just like the <code>ImageFolder</code> dataset provided by PyTorch, but without some of its utility functions.</p><pre><code><code>t = transforms.Compose([
    transforms.Resize(size=256),
    transforms.CenterCrop(size=224),
    transforms.ToTensor()])

train_dataset = customImageFolderDataset(root=traindir,transform=t)
train_dataloader = DataLoader(train_dataset,batch_size = 64, shuffle=True, num_workers=10)

for image_batch, label_batch in train_dataloader:
    pred = myImageNeuralNet(image_batch)
</code></code></pre><div><hr></div><h3><strong>Understanding Custom DataLoaders</strong></h3><p><strong>This particular section is a little advanced and can be skipped while going through this post, as it will not be needed in a lot of situations.</strong> But I am adding it for completeness here.</p><p>So let&#8217;s say you are looking to provide batches to a network that processes text input, and the network can take sequences of any length as long as the length is the same within a batch. For example, we can have a BiLSTM network that can process sequences of any length. It&#8217;s alright if you don&#8217;t understand the layers used in it right now; just know that it can process sequences of variable length.</p><pre><code><code>class BiLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden_size = 64
        drp = 0.1
        max_features, embed_size = 10000,300
        self.embedding = nn.Embedding(max_features, embed_size)
        self.lstm = nn.LSTM(embed_size, self.hidden_size, bidirectional=True, batch_first=True)
        self.linear = nn.Linear(self.hidden_size*4 , 64)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(drp)
        self.out = nn.Linear(64, 1)


    def forward(self, x):
        # (batch_size, seq_length) to (batch_size, seq_length, embed_size)
        h_embedding = self.embedding(x)

        h_lstm, _ = self.lstm(h_embedding)
        avg_pool = torch.mean(h_lstm, 1)
        max_pool, _ = torch.max(h_lstm, 1)
        conc = torch.cat(( avg_pool, max_pool), 1)
        conc = self.relu(self.linear(conc))
        conc = self.dropout(conc)
        out = self.out(conc)
        return out
</code></code></pre><p>This network expects its input to be of shape (<code>batch_size</code>, <code>seq_length</code>) and works with any <code>seq_length</code>. We can check this by passing our model two random batches with different sequence lengths (10 and 25).</p><pre><code><code>model = BiLSTM()
input_batch_1 = torch.randint(low = 0,high = 10000, size = (100,10))
input_batch_2 = torch.randint(low = 0,high = 10000, size = (100,25))
print(model(input_batch_1).size())
print(model(input_batch_2).size())
------------------------------------------------------------------
torch.Size([100, 1])
torch.Size([100, 1])
</code></code></pre><p>Now, we want to feed this model tight batches: every sequence in a batch is padded only up to the longest sequence in that batch, which keeps padding to a minimum. This has the added benefit of making the neural net run faster. It was, in fact, one of the methods used in the winning submission of the Quora Insincere Questions challenge on Kaggle, where running time was of utmost importance.</p><p>So, how do we do this? Let&#8217;s write a very simple custom dataset class first.</p><pre><code><code>class CustomTextDataset(Dataset):
    '''
    Simple Dataset initializes with X and y vectors
    We start by sorting our X and y vectors by sequence lengths
    '''
    def __init__(self, X, y):
        # y is required here: zip(X, None) would raise a TypeError
        self.data = list(zip(X,y))
        # Sort by length of first element in tuple
        self.data = sorted(self.data, key=lambda x: len(x[0]))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]
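
# --- Added sketch, not part of the original post ---
# Why sort by length? A rough illustration under made-up assumptions:
# count the padding tokens needed when every batch of 64 is padded to its
# own longest sequence, with and without sorting the lengths first.
# padding_needed and toy_lengths are hypothetical names used only here.
import numpy as np

def padding_needed(lengths, batch_size=64):
    # total pad tokens across all consecutive batches
    total = 0
    for start in range(0, len(lengths), batch_size):
        chunk = lengths[start:start + batch_size]
        total += int(sum(max(chunk) - n for n in chunk))
    return total

rng = np.random.default_rng(0)
toy_lengths = rng.integers(low=50, high=300, size=1024)
unsorted_pad = padding_needed(toy_lengths)
sorted_pad = padding_needed(np.sort(toy_lengths))
# sorting should need far fewer padding tokens than the original order
print(unsorted_pad, sorted_pad)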
</code></code></pre><p>Also, let&#8217;s generate some random data which we will use with this custom Dataset.</p><pre><code><code>import numpy as np
train_data_size = 1024
sizes = np.random.randint(low=50,high=300,size=(train_data_size,))
X = [np.random.randint(0,10000, (sizes[i])) for i in range(train_data_size)]
y = np.random.rand(train_data_size).round()
#checking one example in dataset
print((X[0],y[0]))
</code></code></pre><p><em>Example of one random sequence and label. Each integer in the sequence corresponds to a word in the sentence.</em></p><p>We can use the custom dataset now using:</p><pre><code><code>train_dataset = CustomTextDataset(X,y)
</code></code></pre><p>If we now try to use the Dataloader on this dataset with <code>batch_size</code>&gt;1, we will get an error. Why is that?</p><pre><code><code>train_dataloader = DataLoader(train_dataset,batch_size = 64, shuffle=False, num_workers=10)
for xb,yb in train_dataloader:
    print(xb.size(),yb.size())
</code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!w6zO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1f6c4f5-874e-418a-8ae0-cc9170e3fe8c_1069x29.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!w6zO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1f6c4f5-874e-418a-8ae0-cc9170e3fe8c_1069x29.png 424w, https://substackcdn.com/image/fetch/$s_!w6zO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1f6c4f5-874e-418a-8ae0-cc9170e3fe8c_1069x29.png 848w, https://substackcdn.com/image/fetch/$s_!w6zO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1f6c4f5-874e-418a-8ae0-cc9170e3fe8c_1069x29.png 1272w, https://substackcdn.com/image/fetch/$s_!w6zO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1f6c4f5-874e-418a-8ae0-cc9170e3fe8c_1069x29.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!w6zO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1f6c4f5-874e-418a-8ae0-cc9170e3fe8c_1069x29.png" width="1069" height="29" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e1f6c4f5-874e-418a-8ae0-cc9170e3fe8c_1069x29.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:29,&quot;width&quot;:1069,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;MLWhiz: Data Science, Machine 
Learning, Artificial Intelligence&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="MLWhiz: Data Science, Machine Learning, Artificial Intelligence" title="MLWhiz: Data Science, Machine Learning, Artificial Intelligence" srcset="https://substackcdn.com/image/fetch/$s_!w6zO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1f6c4f5-874e-418a-8ae0-cc9170e3fe8c_1069x29.png 424w, https://substackcdn.com/image/fetch/$s_!w6zO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1f6c4f5-874e-418a-8ae0-cc9170e3fe8c_1069x29.png 848w, https://substackcdn.com/image/fetch/$s_!w6zO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1f6c4f5-874e-418a-8ae0-cc9170e3fe8c_1069x29.png 1272w, https://substackcdn.com/image/fetch/$s_!w6zO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1f6c4f5-874e-418a-8ae0-cc9170e3fe8c_1069x29.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>This happens because the sequences have different lengths, and our data loader expects our sequences of the same length. 
Remember that in the previous image example, we resized all images to size 224 using the transforms, so we didn&#8217;t face this error.</p><p><em><strong>So, how do we iterate through this dataset so that each batch has sequences with the same length, but different batches may have different sequence lengths?</strong></em></p><p>We can use the <code>collate_fn</code> parameter of the DataLoader, which lets us define how the samples in a batch are stacked. To use it, we define a function that takes a batch (a list of samples) as input and returns (<code>x_batch</code>, <code>y_batch</code>), with every sequence padded to the longest sequence length in that batch. The function below uses only simple NumPy operations, and it is commented so you can follow what is happening.</p><pre><code><code>def collate_text(batch):
    # get text sequences in batch
    data = [item[0] for item in batch]
    # get labels in batch
    target = [item[1] for item in batch]
    # get max_seq_length in batch
    max_seq_len = max([len(x) for x in data])
    # pad text sequences based on max_seq_len
    data = [np.pad(p, (0, max_seq_len - len(p)), 'constant') for p in data]
    # stack the padded sequences into one array, then convert data and target to tensors
    data = torch.from_numpy(np.stack(data)).long()
    target = torch.as_tensor(np.asarray(target), dtype=torch.long)
    return [data, target]
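
# --- Added sketch, not part of the original post ---
# An alternative way to pad, assuming PyTorch's built-in
# torch.nn.utils.rnn.pad_sequence is acceptable in place of np.pad.
# collate_text_torch and toy_batch are hypothetical names used only here.
import numpy as np
import torch
from torch.nn.utils.rnn import pad_sequence

def collate_text_torch(batch):
    # one LongTensor per sequence; lengths may differ
    seqs = [torch.as_tensor(np.asarray(item[0]), dtype=torch.long) for item in batch]
    # labels as a 1-D LongTensor
    target = torch.as_tensor([int(item[1]) for item in batch], dtype=torch.long)
    # right-pad every sequence with zeros up to the longest one in the batch
    data = pad_sequence(seqs, batch_first=True, padding_value=0)
    return [data, target]

# toy batch: two sequences of lengths 2 and 4
toy_batch = [(np.array([1, 2]), 0.0), (np.array([3, 4, 5, 6]), 1.0)]
xb, yb = collate_text_torch(toy_batch)
print(xb.size(), yb.size())  # torch.Size([2, 4]) torch.Size([2])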
</code></code></pre><p>We can now use this <code>collate_fn</code> with our Dataloader as:</p><pre><code><code>train_dataloader = DataLoader(train_dataset,batch_size = 64, shuffle=False, num_workers=10,collate_fn = collate_text)

for xb,yb in train_dataloader:
    print(xb.size(),yb.size())
</code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GtRA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45587a34-cb22-4a58-a2be-775fe91d0603_1224x451.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GtRA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45587a34-cb22-4a58-a2be-775fe91d0603_1224x451.png 424w, https://substackcdn.com/image/fetch/$s_!GtRA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45587a34-cb22-4a58-a2be-775fe91d0603_1224x451.png 848w, https://substackcdn.com/image/fetch/$s_!GtRA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45587a34-cb22-4a58-a2be-775fe91d0603_1224x451.png 1272w, https://substackcdn.com/image/fetch/$s_!GtRA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45587a34-cb22-4a58-a2be-775fe91d0603_1224x451.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GtRA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45587a34-cb22-4a58-a2be-775fe91d0603_1224x451.png" width="1224" height="451" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/45587a34-cb22-4a58-a2be-775fe91d0603_1224x451.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:451,&quot;width&quot;:1224,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;See that the 
batches have different sequence lengths now&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="See that the batches have different sequence lengths now" title="See that the batches have different sequence lengths now" srcset="https://substackcdn.com/image/fetch/$s_!GtRA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45587a34-cb22-4a58-a2be-775fe91d0603_1224x451.png 424w, https://substackcdn.com/image/fetch/$s_!GtRA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45587a34-cb22-4a58-a2be-775fe91d0603_1224x451.png 848w, https://substackcdn.com/image/fetch/$s_!GtRA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45587a34-cb22-4a58-a2be-775fe91d0603_1224x451.png 1272w, https://substackcdn.com/image/fetch/$s_!GtRA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45587a34-cb22-4a58-a2be-775fe91d0603_1224x451.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 
6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>It will work this time as we have provided a custom <code>collate_fn</code>. And see that the batches have different sequence lengths now. Thus we would be able to train our BiLSTM using variable input sizes just like we wanted.</p><div><hr></div><h2><strong>Training a Neural Network</strong></h2><p>We know how to create a neural network using <code>nn.Module</code>. But how to train it? Any neural network that has to be trained will have a training loop that will look something similar to below:</p><pre><code><code>num_epochs = 5
for epoch in range(num_epochs):
    # Set model to train mode
    model.train()
    for x_batch,y_batch in train_dataloader:
        # Clear gradients
        optimizer.zero_grad()
        # Forward pass - Predicted outputs
        pred = model(x_batch)
        # Find Loss and backpropagation of gradients
        loss = loss_criterion(pred, y_batch)
        loss.backward()
        # Update the parameters
        optimizer.step()
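    # Set model to eval mode (dropout and similar layers switch to inference behaviour)
    model.eval()
    # Gradients are not needed for validation
    with torch.no_grad():
        for x_batch,y_batch in valid_dataloader:
            pred = model(x_batch)
            val_loss = loss_criterion(pred, y_batch)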
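
# --- Added sketch, not part of the original post ---
# A self-contained toy run of the same loop on a made-up regression task.
# toy_model, the synthetic tensors, and the SGD / MSELoss choices are
# illustrative assumptions, not the setup used elsewhere in this post.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
toy_X = torch.randn(256, 4)
toy_y = toy_X.sum(dim=1, keepdim=True)  # target: sum of the four features
toy_train = DataLoader(TensorDataset(toy_X[:200], toy_y[:200]), batch_size=32, shuffle=True)
toy_valid = DataLoader(TensorDataset(toy_X[200:], toy_y[200:]), batch_size=32)

toy_model = nn.Linear(4, 1)
toy_criterion = nn.MSELoss()
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)

history = []  # mean validation loss per epoch
for epoch in range(5):
    toy_model.train()
    for x_batch, y_batch in toy_train:
        toy_optimizer.zero_grad()
        loss = toy_criterion(toy_model(x_batch), y_batch)
        loss.backward()
        toy_optimizer.step()
    toy_model.eval()
    with torch.no_grad():  # no gradients needed for validation
        val_losses = [toy_criterion(toy_model(xv), yv).item() for xv, yv in toy_valid]
    history.append(sum(val_losses) / len(val_losses))
    print(epoch, history[-1])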
</code></code></pre><p>In the above code, we are running five epochs and in each epoch:</p><ol><li><p>We iterate through the dataset using a data loader.</p></li><li><p>In each iteration, we first clear the gradients left over from the previous step using <code>optimizer.zero_grad()</code> and then do a forward pass using <code>model(x_batch)</code>.</p></li><li><p>We calculate the loss using a <code>loss_criterion</code>.</p></li><li><p>We back-propagate that loss using the <code>loss.backward()</code> call. We don&#8217;t have to worry about the calculation of the gradients at all, as this simple call does it all for us.</p></li><li><p>We take an optimizer step to change the weights in the whole network using <code>optimizer.step()</code>. This is where the weights of the network get modified using the gradients calculated in the <code>loss.backward()</code> call.</p></li><li><p>We go through the validation data loader to check the validation score/metrics. Before doing validation, we set the model to eval mode using <code>model.eval()</code>, which switches layers such as dropout to their inference behaviour. Please note that we never call <code>loss.backward()</code> on the validation loss.</p></li></ol><p>Till now, we have talked about how to use <code>nn.Module</code> to create networks and how to use custom Datasets and DataLoaders with PyTorch. So let&#8217;s talk about the various options available for loss functions and optimizers.</p><div><hr></div><h2><strong>Loss functions</strong></h2><p>PyTorch provides us with a variety of <strong><a href="https://pytorch.org/docs/stable/nn.html#loss-functions">loss functions</a></strong> for our most common tasks, like classification and regression. 
Some of the most commonly used are <strong><a href="https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss">nn.CrossEntropyLoss</a></strong>, <strong><a href="https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html#torch.nn.NLLLoss">nn.NLLLoss</a></strong>, <strong><a href="https://pytorch.org/docs/stable/generated/torch.nn.KLDivLoss.html#torch.nn.KLDivLoss">nn.KLDivLoss</a></strong> and <strong><a href="https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss">nn.MSELoss</a></strong>. You can read the documentation of each loss function, but to explain how to use them, I will go through the example of <strong><a href="https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html#torch.nn.NLLLoss">nn.NLLLoss</a></strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qpgd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30bc9067-c381-4d9b-bcc2-990f344be0cd_1488x908.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qpgd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30bc9067-c381-4d9b-bcc2-990f344be0cd_1488x908.png 424w, https://substackcdn.com/image/fetch/$s_!qpgd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30bc9067-c381-4d9b-bcc2-990f344be0cd_1488x908.png 848w, https://substackcdn.com/image/fetch/$s_!qpgd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30bc9067-c381-4d9b-bcc2-990f344be0cd_1488x908.png 1272w, 
https://substackcdn.com/image/fetch/$s_!qpgd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30bc9067-c381-4d9b-bcc2-990f344be0cd_1488x908.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qpgd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30bc9067-c381-4d9b-bcc2-990f344be0cd_1488x908.png" width="1456" height="888" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30bc9067-c381-4d9b-bcc2-990f344be0cd_1488x908.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:888,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;MLWhiz: Data Science, Machine Learning, Artificial Intelligence&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="MLWhiz: Data Science, Machine Learning, Artificial Intelligence" title="MLWhiz: Data Science, Machine Learning, Artificial Intelligence" srcset="https://substackcdn.com/image/fetch/$s_!qpgd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30bc9067-c381-4d9b-bcc2-990f344be0cd_1488x908.png 424w, https://substackcdn.com/image/fetch/$s_!qpgd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30bc9067-c381-4d9b-bcc2-990f344be0cd_1488x908.png 848w, 
https://substackcdn.com/image/fetch/$s_!qpgd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30bc9067-c381-4d9b-bcc2-990f344be0cd_1488x908.png 1272w, https://substackcdn.com/image/fetch/$s_!qpgd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30bc9067-c381-4d9b-bcc2-990f344be0cd_1488x908.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The documentation for NLLLoss is pretty succinct. 
In short, this loss function is used for multiclass classification, and according to the documentation:</p><ul><li><p>The input needs to be of size (<code>batch_size</code> x <code>Num_Classes</code>). These are the predictions from the neural network we have created.</p></li><li><p>The input needs to contain the log-probabilities of each class. To get log-probabilities from a neural network, we can add a <code>LogSoftmax</code> layer as its last layer.</p></li><li><p>The target needs to be a tensor of class indices in the range [0, C-1], where C is the number of classes.</p></li></ul><p>So, we can try this loss function on a simple classification network. Please note the LogSoftmax layer after the final linear layer. If you don&#8217;t want to use this LogSoftmax layer, you can instead apply <strong><a href="https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss"><code>nn.CrossEntropyLoss</code></a></strong> directly to the output of the linear layer, since it combines <code>LogSoftmax</code> and <code>NLLLoss</code> in one step.</p><pre><code><code>class myClassificationNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Define all Layers Here
        self.lin = nn.Linear(784, 10)
        self.logsoftmax = nn.LogSoftmax(dim=1)
    def forward(self, x):
        # Connect the layer Outputs here to define the forward pass
        x = self.lin(x)
        x = self.logsoftmax(x)
        return x
</code></code></pre><p>Let&#8217;s define a random input to pass to our network to test it:</p><pre><code><code># some random input:

X = torch.randn(100,784)
y = torch.randint(low = 0,high = 10,size = (100,))
</code></code></pre><p>And pass it through the model to get predictions:</p><pre><code><code>model = myClassificationNet()
preds = model(X)
</code></code></pre><p>We can now get the loss as:</p><pre><code><code>criterion = nn.NLLLoss()
loss = criterion(preds,y)
loss
------------------------------------------
tensor(2.4852, grad_fn=&lt;NllLossBackward&gt;)
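
# --- Added sketch, not part of the original post ---
# A quick check of what NLLLoss computes, on made-up random inputs:
# LogSoftmax followed by NLLLoss should match CrossEntropyLoss on the raw
# scores, and both should equal minus the mean log-probability of the true class.
# toy_logits, toy_targets and toy_logp are hypothetical names used only here.
import torch
import torch.nn as nn

torch.manual_seed(0)
toy_logits = torch.randn(100, 10)  # raw scores, no softmax applied
toy_targets = torch.randint(low=0, high=10, size=(100,))

toy_logp = nn.LogSoftmax(dim=1)(toy_logits)
nll = nn.NLLLoss()(toy_logp, toy_targets)
ce = nn.CrossEntropyLoss()(toy_logits, toy_targets)
# pick the log-probability of the true class in every row and average
manual = -toy_logp[torch.arange(100), toy_targets].mean()
print(torch.allclose(nll, ce), torch.allclose(nll, manual))  # True True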
</code></code></pre><div><hr></div><h3><strong>Custom Loss Function</strong></h3>
      <p>
          <a href="https://www.mlwhiz.com/p/pytorch_guide">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[I Use Claude Code Every Day. Here's the Setup That Actually Matters]]></title><description><![CDATA[CLAUDE.md, Skills 2.0, permission modes, channels &#8212; an opinionated guide for beginners and the mildly curious]]></description><link>https://www.mlwhiz.com/p/i-use-claude-code-every-day-heres</link><guid isPermaLink="false">https://www.mlwhiz.com/p/i-use-claude-code-every-day-heres</guid><dc:creator><![CDATA[Rahul Agarwal]]></dc:creator><pubDate>Thu, 23 Apr 2026 02:06:45 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!zpGL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda038624-aeab-4f92-81b5-c1ed9ab38219_1280x737.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yHq9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" width="1456" height="250" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:250,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77210,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p><em>Hey, Rahul here! &#128075; Each week, I publish long-form ML+AI posts covering ML, AI, and System design for MLwhiz. Paid subscribers also get how-to guides with full code walkthroughs. I publish occasional extra articles. If you&#8217;d like to become a paid subscriber, here&#8217;s a button for that:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.mlwhiz.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.mlwhiz.com/subscribe?"><span>Subscribe now</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B1mx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B1mx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 424w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 848w, 
https://substackcdn.com/image/fetch/$s_!B1mx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1272w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png" width="995" height="80" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:80,&quot;width&quot;:995,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15990,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B1mx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 424w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 848w, 
https://substackcdn.com/image/fetch/$s_!B1mx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1272w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zpGL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda038624-aeab-4f92-81b5-c1ed9ab38219_1280x737.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zpGL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda038624-aeab-4f92-81b5-c1ed9ab38219_1280x737.jpeg 424w, https://substackcdn.com/image/fetch/$s_!zpGL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda038624-aeab-4f92-81b5-c1ed9ab38219_1280x737.jpeg 848w, https://substackcdn.com/image/fetch/$s_!zpGL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda038624-aeab-4f92-81b5-c1ed9ab38219_1280x737.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!zpGL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda038624-aeab-4f92-81b5-c1ed9ab38219_1280x737.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!zpGL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda038624-aeab-4f92-81b5-c1ed9ab38219_1280x737.jpeg" width="1280" height="737" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/da038624-aeab-4f92-81b5-c1ed9ab38219_1280x737.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:737,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Claude Code makes it easy to trigger a code check now with this simple  command | ZDNET&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Claude Code makes it easy to trigger a code check now with this simple  command | ZDNET" title="Claude Code makes it easy to trigger a code check now with this simple  command | ZDNET" srcset="https://substackcdn.com/image/fetch/$s_!zpGL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda038624-aeab-4f92-81b5-c1ed9ab38219_1280x737.jpeg 424w, https://substackcdn.com/image/fetch/$s_!zpGL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda038624-aeab-4f92-81b5-c1ed9ab38219_1280x737.jpeg 848w, https://substackcdn.com/image/fetch/$s_!zpGL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda038624-aeab-4f92-81b5-c1ed9ab38219_1280x737.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!zpGL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda038624-aeab-4f92-81b5-c1ed9ab38219_1280x737.jpeg 1456w" sizes="100vw"></picture></div></a></figure></div><p>Let me admit something upfront &#8594; there is always a small, mildly annoying blocker before picking up a new tool. </p><p>That little &lt;<em>do I really want to learn yet another thing?&gt;</em> feeling &#8212; even when half of LinkedIn is screaming that you must. </p><p>I felt it with Claude Code. 
I had it installed on my machine for two full weeks before I actually sat down to use it. And when I finally did, the first hour was a lot of reading docs, clicking &#8220;yes&#8221; to prompts I did not understand, and wondering if this was genuinely going to pay off or if I had just given another AI tool access to my filesystem.</p><p>Here is the thing about Claude Code, though. The hello-world is easy. </p><p>You <code>npm install</code>, you type <code>claude</code>, you ask it to fix a bug, it fixes the bug. You feel clever. Then you look at the docs and you feel lost. Are you making the most of it?</p><p>It is confusing. Which of it actually matters? And which of it can you safely ignore for now?</p><p>I&#8217;ve been using Claude Code daily for months since that slow start, and in the first week I made every setup mistake I could. I clicked &#8220;yes&#8221; to permission prompts a thousand times before I realized <code>acceptEdits</code> and <code>--dangerously-skip-permissions</code> existed. I racked up a $200 API bill before I discovered that you can also log in with a Max plan. I started forty-something <code>claude</code> terminal sessions and lost track of them because I didn&#8217;t know <code>claude -c</code>, which continues the most recent conversation, was a thing. </p><p>This post is the shortcut I wish someone had handed me in week one &#8212; an opinionated setup that gets you from &#8220;I installed it&#8221; to &#8220;I actually use this daily&#8221; in a weekend. The 20% of the surface area that delivers 80% of the value. Plus honest opinions on which features are worth your time and which ones you can skip. </p><div class="pullquote"><p>The <a href="https://code.claude.com/docs/en/overview">official docs</a> are great if you want the full feature dump; this is the opposite of that.</p></div><p>Let&#8217;s dive in.</p><div><hr></div><h2>1. 
What Claude Code Actually Is (And What It Isn&#8217;t)</h2><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GLPn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfe35ba1-1c58-44c2-9d3e-d16d776998ee_2958x433.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GLPn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfe35ba1-1c58-44c2-9d3e-d16d776998ee_2958x433.png 424w, https://substackcdn.com/image/fetch/$s_!GLPn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfe35ba1-1c58-44c2-9d3e-d16d776998ee_2958x433.png 848w, https://substackcdn.com/image/fetch/$s_!GLPn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfe35ba1-1c58-44c2-9d3e-d16d776998ee_2958x433.png 1272w, https://substackcdn.com/image/fetch/$s_!GLPn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfe35ba1-1c58-44c2-9d3e-d16d776998ee_2958x433.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GLPn!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfe35ba1-1c58-44c2-9d3e-d16d776998ee_2958x433.png" width="1200" height="175.54945054945054" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bfe35ba1-1c58-44c2-9d3e-d16d776998ee_2958x433.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:213,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;The Claude Code agent loop: you prompt the CLI, the CLI talks to Claude with your CLAUDE.md and context, Claude calls tools that read and write your files, observations come back, the loop repeats until the task is done&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="The Claude Code agent loop: you prompt the CLI, the CLI talks to Claude with your CLAUDE.md and context, Claude calls tools that read and write your files, observations come back, the loop repeats until the task is done" title="The Claude Code agent loop: you prompt the CLI, the CLI talks to Claude with your CLAUDE.md and context, Claude calls tools that read and write your files, observations come back, the loop repeats until the task is done" srcset="https://substackcdn.com/image/fetch/$s_!GLPn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfe35ba1-1c58-44c2-9d3e-d16d776998ee_2958x433.png 424w, https://substackcdn.com/image/fetch/$s_!GLPn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfe35ba1-1c58-44c2-9d3e-d16d776998ee_2958x433.png 848w, 
https://substackcdn.com/image/fetch/$s_!GLPn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfe35ba1-1c58-44c2-9d3e-d16d776998ee_2958x433.png 1272w, https://substackcdn.com/image/fetch/$s_!GLPn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfe35ba1-1c58-44c2-9d3e-d16d776998ee_2958x433.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Before we install anything, let&#8217;s get the mental model right. Getting it wrong is the single most common reason people bounce off Claude Code in the first hour.</p><p>Claude Code is <strong>not</strong> a VS Code extension that autocompletes your code. It is <strong>not</strong> a chat sidebar. It is <strong>not</strong> Cursor or Copilot. Yes, it has a VS Code integration, but the integration is a thin window over a CLI tool that runs in your terminal.</p><p>What Claude Code actually is: an <strong>agentic CLI</strong>. You run <code>claude</code> in your terminal, you tell it what you want in plain English, and it then reads files, writes files, runs bash commands, executes your tests, browses the web, calls APIs, and keeps iterating until the task is done. You watch it work, you can interrupt at any point, and <em><strong>you can steer.</strong></em></p><p>Think of it less like autocomplete and more like pair-programming with a junior developer who types fast, never gets tired, sometimes goes off the rails, and occasionally needs a hard &#8220;no, do not do that.&#8221;</p><p><em><strong>That loop, shown in the diagram at the start of this section, is the whole product</strong></em>. Read, think, act, observe, repeat. 
Your job is to give it good context up front and steer when it drifts.</p><p>If this sounds familiar, it should &#8212; I wrote about <a href="https://www.mlwhiz.com/p/genai-series-my-tryst-with-ai-assisted">my first dance with vibe coding</a> using Claude Pro a while back. Claude Code is what happens when that same loop moves out of a chat window and into your actual terminal, with access to your actual files and your actual tests.</p><p>And it is not a small thing. As of February 2026, roughly <strong>4% of all public commits on GitHub</strong> were authored by Claude Code &#8212; about 135,000 commits per day. Anthropic itself says <strong>90% of their internal code is now AI-written</strong>. ServiceNow has 29,000 daily users on it. This is not a Twitter fad. <em><strong>This is real and this is here to stay.</strong></em></p><p>The question is no longer &#8220;is this useful.&#8221; The question is how you go from &#8220;I installed it&#8221; to &#8220;I actually use it well.&#8221; Which is the rest of this post.</p><div class="pullquote"><p><strong>Anyone still calling this &#8220;autocomplete&#8221; has either not used it, or has not been paying attention.</strong></p></div><div><hr></div><h2>2. Install and Get Logged In</h2><p>Installation is genuinely a one-liner now.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">npm install -g @anthropic-ai/claude-code</code></pre></div><p>You need Node 18 or newer. That is the whole prerequisite list.</p><p>Open a terminal and run <code>claude</code> in any directory. The first time, it walks you through login. You get two choices: a <strong>claude.ai account</strong> (Pro or Max plan) or an <strong>API key</strong> from the Anthropic Console. Pick the first one if you are a human writing code daily. 
Pick the second one only if you have a specific reason &#8212; Bedrock, Vertex, headless CI, corporate use, that kind of thing.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8FrP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc16f3e4-b9f9-4a45-b426-c2068d12456f_1684x630.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8FrP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc16f3e4-b9f9-4a45-b426-c2068d12456f_1684x630.png 424w, https://substackcdn.com/image/fetch/$s_!8FrP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc16f3e4-b9f9-4a45-b426-c2068d12456f_1684x630.png 848w, https://substackcdn.com/image/fetch/$s_!8FrP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc16f3e4-b9f9-4a45-b426-c2068d12456f_1684x630.png 1272w, https://substackcdn.com/image/fetch/$s_!8FrP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc16f3e4-b9f9-4a45-b426-c2068d12456f_1684x630.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8FrP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc16f3e4-b9f9-4a45-b426-c2068d12456f_1684x630.png" width="1456" height="545" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dc16f3e4-b9f9-4a45-b426-c2068d12456f_1684x630.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:545,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:88901,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.mlwhiz.com/i/194608696?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc16f3e4-b9f9-4a45-b426-c2068d12456f_1684x630.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8FrP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc16f3e4-b9f9-4a45-b426-c2068d12456f_1684x630.png 424w, https://substackcdn.com/image/fetch/$s_!8FrP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc16f3e4-b9f9-4a45-b426-c2068d12456f_1684x630.png 848w, https://substackcdn.com/image/fetch/$s_!8FrP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc16f3e4-b9f9-4a45-b426-c2068d12456f_1684x630.png 1272w, https://substackcdn.com/image/fetch/$s_!8FrP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc16f3e4-b9f9-4a45-b426-c2068d12456f_1684x630.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Pricing reality check:</strong> The Max plan costs $100 a month for the 5x usage tier and $200 a month for the 20x tier. A typical 30-to-60-minute Claude Code session costs roughly $0.50 to $3.00 on the API. Do that math for yourself. If you are coding even a few hours a day, Max is the cheaper option, and it gives you Opus access without rate-limit anxiety.</p><p>If you want my full breakdown of how the AI subscriptions compare across coding, research, and general use, <a href="https://www.mlwhiz.com/p/which-ai-subscription-is-actually">I broke it down here</a>. </p><p>Once you are in, run <code>/status</code> to see which settings are active and how many tokens you have used up. 
We will come back to that command when things get weird.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7PAA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b62c4f3-a6f1-4839-8883-08999c133ef5_1776x1008.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7PAA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b62c4f3-a6f1-4839-8883-08999c133ef5_1776x1008.png 424w, https://substackcdn.com/image/fetch/$s_!7PAA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b62c4f3-a6f1-4839-8883-08999c133ef5_1776x1008.png 848w, https://substackcdn.com/image/fetch/$s_!7PAA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b62c4f3-a6f1-4839-8883-08999c133ef5_1776x1008.png 1272w, https://substackcdn.com/image/fetch/$s_!7PAA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b62c4f3-a6f1-4839-8883-08999c133ef5_1776x1008.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7PAA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b62c4f3-a6f1-4839-8883-08999c133ef5_1776x1008.png" width="1456" height="826" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6b62c4f3-a6f1-4839-8883-08999c133ef5_1776x1008.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:826,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:56688,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.mlwhiz.com/i/194608696?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b62c4f3-a6f1-4839-8883-08999c133ef5_1776x1008.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7PAA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b62c4f3-a6f1-4839-8883-08999c133ef5_1776x1008.png 424w, https://substackcdn.com/image/fetch/$s_!7PAA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b62c4f3-a6f1-4839-8883-08999c133ef5_1776x1008.png 848w, https://substackcdn.com/image/fetch/$s_!7PAA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b62c4f3-a6f1-4839-8883-08999c133ef5_1776x1008.png 1272w, https://substackcdn.com/image/fetch/$s_!7PAA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b62c4f3-a6f1-4839-8883-08999c133ef5_1776x1008.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h2>3. CLAUDE.md &#8212; The One File You Have to Write</h2><p>If you only learn one thing from this entire post, learn this.</p><p><code>CLAUDE.md</code> is a markdown file at the root of your project. Every time you start Claude Code in that directory, the contents of this file get loaded into the conversation automatically. Think of this file as your project&#8217;s onboarding document &#8212; except the onboardee shows up every single session and reads it cover to cover.</p><p>This is where you tell Claude things like &#8594; what the project does, which command runs the tests, what the build tool is, conventions the team follows, paths it should never touch, and the gotchas.</p><p>Here is a real-world <code>CLAUDE.md</code> for a Python data science project, slightly trimmed:</p>
      <p>
          <a href="https://www.mlwhiz.com/p/i-use-claude-code-every-day-heres">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[MLWhiz Weekly AI/ML Newsletter # 4]]></title><description><![CDATA[The week AI buyers became AI owners &#8212; and a lot of people decided they&#8217;re done with agents.]]></description><link>https://www.mlwhiz.com/p/mlwhiz-weekly-aiml-newsletter-4</link><guid isPermaLink="false">https://www.mlwhiz.com/p/mlwhiz-weekly-aiml-newsletter-4</guid><dc:creator><![CDATA[Rahul Agarwal]]></dc:creator><pubDate>Tue, 21 Apr 2026 22:02:50 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ZEwF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc173bef8-6812-466c-83b1-8a7f8e5045dd_1410x804.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZEwF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc173bef8-6812-466c-83b1-8a7f8e5045dd_1410x804.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZEwF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc173bef8-6812-466c-83b1-8a7f8e5045dd_1410x804.png 424w, https://substackcdn.com/image/fetch/$s_!ZEwF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc173bef8-6812-466c-83b1-8a7f8e5045dd_1410x804.png 848w, https://substackcdn.com/image/fetch/$s_!ZEwF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc173bef8-6812-466c-83b1-8a7f8e5045dd_1410x804.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ZEwF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc173bef8-6812-466c-83b1-8a7f8e5045dd_1410x804.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZEwF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc173bef8-6812-466c-83b1-8a7f8e5045dd_1410x804.png" width="1410" height="804" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c173bef8-6812-466c-83b1-8a7f8e5045dd_1410x804.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:804,&quot;width&quot;:1410,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:99270,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.mlwhiz.com/i/194950920?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc173bef8-6812-466c-83b1-8a7f8e5045dd_1410x804.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZEwF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc173bef8-6812-466c-83b1-8a7f8e5045dd_1410x804.png 424w, https://substackcdn.com/image/fetch/$s_!ZEwF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc173bef8-6812-466c-83b1-8a7f8e5045dd_1410x804.png 848w, https://substackcdn.com/image/fetch/$s_!ZEwF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc173bef8-6812-466c-83b1-8a7f8e5045dd_1410x804.png 
1272w, https://substackcdn.com/image/fetch/$s_!ZEwF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc173bef8-6812-466c-83b1-8a7f8e5045dd_1410x804.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>&#127942; Story of the Week: OpenAI Just Bought 10% of Cerebras</h2><p>For two years, I&#8217;ve been hearing the same question in every infrastructure conversation: how does the Nvidia monopoly end?</p><p>This week, we got the answer. And it&#8217;s not what anyone predicted.</p><p>It wasn&#8217;t a DOJ antitrust case. 
It was a procurement contract &#8212; but a procurement contract structured like nothing we&#8217;ve ever seen in this industry.</p><p><a href="https://techcrunch.com/2026/04/18/ai-chip-startup-cerebras-files-for-ipo/">Cerebras filed its S-1 on Friday</a>. But buried inside was a deal between OpenAI and Cerebras that is pretty hard to believe.</p><p>Here&#8217;s what OpenAI committed to:</p><ul><li><p><strong>$20+ billion</strong> in chip spending through 2028</p></li><li><p><strong>750 MW</strong> of capacity, with an option to expand to <strong>2 GW</strong></p></li><li><p>A <strong>$1 billion loan</strong> to OpenAI from Cerebras at 6% interest</p></li></ul><p>In exchange, OpenAI got about <strong>10% of Cerebras</strong> post-IPO.</p><p>Read that again. The customer got equity in the supplier. The customer also got a billion-dollar loan from the supplier.</p><p><em><strong>OpenAI is now Cerebras&#8217;s biggest customer, biggest creditor, and one of its biggest shareholders</strong></em>. People are calling it &#8220;<em><strong>circular&#8230;</strong></em></p>
      <p>
          <a href="https://www.mlwhiz.com/p/mlwhiz-weekly-aiml-newsletter-4">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[From RNNs to Transformers: Building Sequential Recommenders (Part 1)]]></title><description><![CDATA[RecSys Series Part 9a: Implementing GRU4Rec and SASRec on Steam Games &#8212; with production deployment patterns]]></description><link>https://www.mlwhiz.com/p/rnns-to-transformers-sequential-recommenders</link><guid isPermaLink="false">https://www.mlwhiz.com/p/rnns-to-transformers-sequential-recommenders</guid><dc:creator><![CDATA[Rahul Agarwal]]></dc:creator><pubDate>Sat, 18 Apr 2026 10:56:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!dabI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c923b5c-fcfc-4cbc-8ba3-6de7f243c698_1415x993.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yHq9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" width="1456" height="250" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:250,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77210,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p><em>Hey, Rahul here! &#128075; Each week, I publish long-form ML+AI posts covering ML, AI, and System design for MLwhiz. Paid subscribers also get how-to guides with full code walkthroughs. I publish occasional extra articles. If you&#8217;d like to become a paid subscriber, here&#8217;s a button for that:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.mlwhiz.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.mlwhiz.com/subscribe?"><span>Subscribe now</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B1mx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B1mx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 424w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 848w, 
https://substackcdn.com/image/fetch/$s_!B1mx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1272w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png" width="995" height="80" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:80,&quot;width&quot;:995,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15990,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B1mx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 424w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 848w, 
https://substackcdn.com/image/fetch/$s_!B1mx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1272w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><p>Every tech revolution follows the same pattern. First, we solve the problem one way. Then, we realize we&#8217;ve been solving the <em>wrong</em> problem.</p><p><strong>Natural language processing:</strong> we spent a decade on classification (sentiment, NER, QA as pick-the-right-answer). </p><p><strong>Then GPT said:</strong> generation subsumes classification. Just generate the output.</p><p>Recommendation systems are having their moment right now. We spent years building the pipeline that would retrieve 10K candidates with Two-Tower, score 1K with a ranker, re-rank the top 100. <em><strong>But what if the recommender could just generate the next item directly?</strong></em> That&#8217;s where this series is headed.</p><p>But you can&#8217;t understand the generative revolution without understanding what came before it. </p><p>In this two-part post, we&#8217;ll trace the full evolution: <em><strong>how RNNs first cracked sequential recommendation, how Transformers took over, and ultimately how generative models are rewriting the rules entirely.</strong></em></p><p>This is <strong>Part 1</strong> &#8212; covering </p><ul><li><p>GRU4Rec (2016), </p></li><li><p>SASRec (2018), </p></li><li><p>the BERT4Rec controversy, and </p></li><li><p>production deployment patterns. </p></li></ul><p>Part 2 will cover Semantic IDs, TIGER, HSTU, and who&#8217;s deploying generative recommenders in production today.</p><p>Let&#8217;s dive in!</p><div><hr></div><h2>1. 
The Sequential Problem: Why Order Matters</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dWff!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69d139ac-6af5-447f-a16b-e33431f33cac_2076x569.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dWff!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69d139ac-6af5-447f-a16b-e33431f33cac_2076x569.png 424w, https://substackcdn.com/image/fetch/$s_!dWff!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69d139ac-6af5-447f-a16b-e33431f33cac_2076x569.png 848w, https://substackcdn.com/image/fetch/$s_!dWff!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69d139ac-6af5-447f-a16b-e33431f33cac_2076x569.png 1272w, https://substackcdn.com/image/fetch/$s_!dWff!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69d139ac-6af5-447f-a16b-e33431f33cac_2076x569.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dWff!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69d139ac-6af5-447f-a16b-e33431f33cac_2076x569.png" width="1456" height="399" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/69d139ac-6af5-447f-a16b-e33431f33cac_2076x569.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:399,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Static CF vs Sequential Recommendation&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Static CF vs Sequential Recommendation" title="Static CF vs Sequential Recommendation" srcset="https://substackcdn.com/image/fetch/$s_!dWff!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69d139ac-6af5-447f-a16b-e33431f33cac_2076x569.png 424w, https://substackcdn.com/image/fetch/$s_!dWff!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69d139ac-6af5-447f-a16b-e33431f33cac_2076x569.png 848w, https://substackcdn.com/image/fetch/$s_!dWff!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69d139ac-6af5-447f-a16b-e33431f33cac_2076x569.png 1272w, https://substackcdn.com/image/fetch/$s_!dWff!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69d139ac-6af5-447f-a16b-e33431f33cac_2076x569.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Let&#8217;s say that you just finished watching <em>Inception</em>. Netflix recommends <em>Interstellar</em>. You watch it. Next up: <em>Arrival</em>.</p><p>This watching order is not random, and it is more than &#8220;you like sci-fi, so here are more sci-fi movies.&#8221;</p><p>There&#8217;s a <strong>trajectory here</strong>. The recommender is following your path through mind-bending sci-fi, from Christopher Nolan&#8217;s <em>Inception</em> and <em>Interstellar</em> to Denis Villeneuve&#8217;s <em>Arrival</em>: what you watched <em>second</em> might change what you should see <em>third</em>.</p><p>In <a href="https://www.mlwhiz.com/p/the-recommenders-playbook-algorithms">the first post of this series, we covered collaborative filtering</a>, which treats user history as a user-item matrix with no notion of order, in effect a bag of items. 
So, if you watched <em>[Inception, Interstellar, Arrival]</em>, traditional CF treats that the same as <em>[Arrival, Inception, Interstellar]</em>. But the order you watched them in tells you something completely different about what to recommend next.</p><p>Sequential models fixed that. They learned to predict not just &#8220;what you might like&#8221; but &#8220;what comes next.&#8221;</p><h3>Formalising the Problem</h3><p>Given a sequence of items a user has interacted with:</p><p style="text-align: center;"><strong>[i&#8321;, i&#8322;, i&#8323;, ..., i&#8345;]</strong></p><p>Predict the next item: <strong>i&#8345;&#8330;&#8321;</strong></p><p>This formulation applies across domains: </p><ul><li><p>E-commerce: product browsing &#8594; purchase prediction </p></li><li><p>Streaming: watch history &#8594; next video </p></li><li><p>Music: listening sequence &#8594; next song </p></li><li><p>News: reading pattern &#8594; next article</p></li></ul><h3>The Benchmark: Steam Games Dataset</h3><p>For this post, we&#8217;ll use the <strong>Steam Games</strong> dataset &#8212; a rich gaming interaction dataset from UCSD&#8217;s repository for building our models: </p><ul><li><p><strong>67,287 users</strong> (raw) &#8594; <strong>56,808 users</strong> (after 5-core filtering) </p></li><li><p><strong>32,133 games</strong> (raw) &#8594; <strong>6,382 games</strong> (after 5-core filtering) </p></li><li><p><strong>2,235,453 interactions</strong> (playtime &gt; 1 hour) </p></li><li><p>Average sequence length: 39.4 games (median: 26) </p></li></ul><p><strong>5-core filtering</strong> is a technique that removes all users and items with fewer than 5 interactions, applied iteratively until every remaining user and item has at least 5. 
It&#8217;s a standard preprocessing step in RecSys research to eliminate extreme cold-start cases (users who tried one game, games nobody played) that add noise without enough signal to learn from.</p><p>Now let&#8217;s see how different architectures tackle sequential prediction.</p><div><hr></div><h2>2. GRU4Rec &#8212; When RNNs Met Recommendations (2016)</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hbkG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88f6a116-966a-457b-b886-a84994ca6ad0_778x2368.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hbkG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88f6a116-966a-457b-b886-a84994ca6ad0_778x2368.png 424w, https://substackcdn.com/image/fetch/$s_!hbkG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88f6a116-966a-457b-b886-a84994ca6ad0_778x2368.png 848w, https://substackcdn.com/image/fetch/$s_!hbkG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88f6a116-966a-457b-b886-a84994ca6ad0_778x2368.png 1272w, https://substackcdn.com/image/fetch/$s_!hbkG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88f6a116-966a-457b-b886-a84994ca6ad0_778x2368.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hbkG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88f6a116-966a-457b-b886-a84994ca6ad0_778x2368.png" width="306" height="931.3727506426735" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/88f6a116-966a-457b-b886-a84994ca6ad0_778x2368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2368,&quot;width&quot;:778,&quot;resizeWidth&quot;:306,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;GRU4Rec Architecture&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="GRU4Rec Architecture" title="GRU4Rec Architecture" srcset="https://substackcdn.com/image/fetch/$s_!hbkG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88f6a116-966a-457b-b886-a84994ca6ad0_778x2368.png 424w, https://substackcdn.com/image/fetch/$s_!hbkG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88f6a116-966a-457b-b886-a84994ca6ad0_778x2368.png 848w, https://substackcdn.com/image/fetch/$s_!hbkG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88f6a116-966a-457b-b886-a84994ca6ad0_778x2368.png 1272w, https://substackcdn.com/image/fetch/$s_!hbkG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F88f6a116-966a-457b-b886-a84994ca6ad0_778x2368.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In 2016, Gravity R&amp;D published &#8220;<a href="https://arxiv.org/abs/1511.06939">Session-based Recommendations with Recurrent Neural Networks</a>&#8221; at ICLR. It was the first major work to apply RNNs to sequential recommendation, and it dominated the field for nearly two years.</p>
      <p>
          <a href="https://www.mlwhiz.com/p/rnns-to-transformers-sequential-recommenders">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[MLWhiz Weekly AI/ML Newsletter # 3]]></title><description><![CDATA[Here is what happened this week.]]></description><link>https://www.mlwhiz.com/p/mlwhiz-weekly-aiml-newsletter-3</link><guid isPermaLink="false">https://www.mlwhiz.com/p/mlwhiz-weekly-aiml-newsletter-3</guid><dc:creator><![CDATA[Rahul Agarwal]]></dc:creator><pubDate>Mon, 13 Apr 2026 04:35:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!5E41!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3172ddc0-460d-49b7-8a6e-236ee4004188_1410x804.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5E41!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3172ddc0-460d-49b7-8a6e-236ee4004188_1410x804.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5E41!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3172ddc0-460d-49b7-8a6e-236ee4004188_1410x804.png 424w, https://substackcdn.com/image/fetch/$s_!5E41!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3172ddc0-460d-49b7-8a6e-236ee4004188_1410x804.png 848w, https://substackcdn.com/image/fetch/$s_!5E41!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3172ddc0-460d-49b7-8a6e-236ee4004188_1410x804.png 1272w, 
https://substackcdn.com/image/fetch/$s_!5E41!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3172ddc0-460d-49b7-8a6e-236ee4004188_1410x804.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5E41!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3172ddc0-460d-49b7-8a6e-236ee4004188_1410x804.png" width="1410" height="804" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3172ddc0-460d-49b7-8a6e-236ee4004188_1410x804.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:804,&quot;width&quot;:1410,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:99237,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.mlwhiz.com/i/193923335?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3172ddc0-460d-49b7-8a6e-236ee4004188_1410x804.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5E41!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3172ddc0-460d-49b7-8a6e-236ee4004188_1410x804.png 424w, https://substackcdn.com/image/fetch/$s_!5E41!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3172ddc0-460d-49b7-8a6e-236ee4004188_1410x804.png 848w, https://substackcdn.com/image/fetch/$s_!5E41!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3172ddc0-460d-49b7-8a6e-236ee4004188_1410x804.png 
1272w, https://substackcdn.com/image/fetch/$s_!5E41!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3172ddc0-460d-49b7-8a6e-236ee4004188_1410x804.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>&#127942; Story of the Week: The AI Stack Fragments &#8212; From Silicon to Society</h2><p>This was the week the AI industry&#8217;s single-vendor era officially ended. 
Not in one dramatic announcement, but through a cascade of moves at every layer of the stack that collectively redrew the competitive map.</p><p><strong>Start at the bottom:</strong> Intel surged 4.2% on Monday after confirming its participation in Elon Musk&#8217;s Terafab project &#8212; the first serious attempt at a domestic US AI chip fab. The same day, reports confirmed that <a href="https://blog.mean.ceo/new-ai-model-releases-news-april-2026/">DeepSeek is building V4 entirely on Huawei Ascend 950PR chips</a>, fully decoupling from Nvidia. And Broadcom jumped 6.1% on expanded TPU deals with both Google and Anthropic. In a single session, the market priced in four distinct AI chip supply chains: Nvidia/TSMC (incumbent), Google/Broadcom TPUs, Intel/Terafab (domestic US), and Huawei/Ascend (China). The telling number: Nvidia fell 1.6% while the rest of the AI chip ecosystem rallied. SEMI data confirmed the investment cycle is real &#8212; global chip equipm&#8230;</p>
      <p>
          <a href="https://www.mlwhiz.com/p/mlwhiz-weekly-aiml-newsletter-3">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[Your Ranking Model Is Right. Your Recommendations Are Wrong]]></title><description><![CDATA[RecSys Series Part 8: How diversity, freshness, and business constraints turn a ranked list into a product-ready feed]]></description><link>https://www.mlwhiz.com/p/reranking-recsys-diversity-freshness</link><guid isPermaLink="false">https://www.mlwhiz.com/p/reranking-recsys-diversity-freshness</guid><dc:creator><![CDATA[Rahul Agarwal]]></dc:creator><pubDate>Sat, 11 Apr 2026 22:10:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!oaLi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a4f8c65-5956-4c62-9f94-9fd7b7ce3894_2747x2099.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yHq9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" width="1456" height="250" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:250,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77210,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p><em>Hey, Rahul here! &#128075; Each week, I publish long-form ML+AI posts covering ML, AI, and System design for MLwhiz. Paid subscribers also get how-to guides with full code walkthroughs. I publish occasional extra articles. If you&#8217;d like to become a paid subscriber, here&#8217;s a button for that:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.mlwhiz.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.mlwhiz.com/subscribe?"><span>Subscribe now</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B1mx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B1mx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 424w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 848w, 
https://substackcdn.com/image/fetch/$s_!B1mx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1272w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png" width="995" height="80" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:80,&quot;width&quot;:995,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15990,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B1mx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 424w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 848w, 
https://substackcdn.com/image/fetch/$s_!B1mx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1272w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><p>Here&#8217;s something they don&#8217;t teach you in ML courses: </p><p>A perfectly relevant recommendation list is usually a terrible one.</p><p>You spend months training a ranking model. Features, architectures, multi-task objectives &#8212; the works. Then the product team walks in: &#8220;Can you make sure we don&#8217;t show 5 horror movies in a row? And boost new releases? Oh, and reserve slot 3 for promoted content.&#8221;</p><p>Each request costs you relevance. The question isn&#8217;t <em>whether</em> to spend &#8212; it&#8217;s how much.</p><p>Think of it as a budget. <a href="https://www.mlwhiz.com/p/from-candidates-to-clicks-the-engineering">Your ranking model gives you relevance scores</a> for every item. Re-ranking is the art of spending that relevance wisely &#8212; trading some accuracy for diversity, freshness, fairness, and business value.</p><div class="pullquote"><p>A perfectly relevant recommendation list is usually a terrible one.</p></div><p>This is Part 8 of the RecSys for MLEs series. 
We&#8217;ve covered <a href="https://www.mlwhiz.com/p/the-recommenders-playbook-algorithms">the fundamentals</a>, <a href="https://www.mlwhiz.com/p/the-algorithmic-journey-of-recommender">the evolution from CF to deep learning</a>, <a href="https://www.mlwhiz.com/p/the-recommendation-engine-under-the">the 3-stage funnel</a> where I first introduced re-ranking as the &#8220;business layer,&#8221; <a href="https://www.mlwhiz.com/p/building-youtube-scale-recommendation">two-tower retrieval</a>, <a href="https://www.mlwhiz.com/p/vector-search-at-scale-the-missing">vector search</a>, <a href="https://www.mlwhiz.com/p/from-candidates-to-clicks-the-engineering">the ranking layer</a>, and <a href="https://www.mlwhiz.com/p/cold-start-problem-recsys-modern-approaches">the cold start problem</a>.</p><p>Today, we&#8217;re opening up that final layer. Here&#8217;s what we&#8217;ll cover:</p><ul><li><p><strong>The Set Problem</strong> &#8594; Why sorting by relevance produces bad recommendations</p></li><li><p><strong>Diversity</strong> &#8594; From dedup rules to Determinantal Point Processes (YouTube&#8217;s production system)</p></li><li><p><strong>Calibration</strong> &#8594; Matching your recommendations to the user&#8217;s taste distribution</p></li><li><p><strong>Freshness</strong> &#8594; Getting new content into the feed without wrecking relevance</p></li><li><p><strong>Business Constraints</strong> &#8594; The product rules that shape the final feed</p></li><li><p><strong>Multi-Objective Re-Ranking</strong> &#8594; Combining everything: scalarization, constraints, and 2D layouts</p></li><li><p><strong>The Practitioner&#8217;s Playbook</strong> &#8594; When to use what, and the pitfalls that trip everyone up</p></li></ul><p>Let&#8217;s dive in!</p><div><hr></div><h2>1. Why Re-Ranking Exists &#8212; The Set Problem</h2><p>Your ranking model scores items independently. Item A gets 0.92. Item B gets 0.89. Item C gets 0.87. Sort descending. 
Done.</p><p>Except it&#8217;s not done. Because when you look at your top-10 list, items A, B, and C are all psychological thrillers from the same director. Items D through G are also thrillers. The model did exactly what you asked &#8212; it found the most relevant items. But the resulting <em>set</em> is terrible.</p><div class="pullquote"><p>This is what I call the <strong>set problem</strong>: optimizing each item independently doesn&#8217;t optimize the set.</p></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MRsw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95d35455-cbcf-4dea-bce1-a721c95fa282_2847x623.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MRsw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95d35455-cbcf-4dea-bce1-a721c95fa282_2847x623.png 424w, https://substackcdn.com/image/fetch/$s_!MRsw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95d35455-cbcf-4dea-bce1-a721c95fa282_2847x623.png 848w, https://substackcdn.com/image/fetch/$s_!MRsw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95d35455-cbcf-4dea-bce1-a721c95fa282_2847x623.png 1272w, https://substackcdn.com/image/fetch/$s_!MRsw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95d35455-cbcf-4dea-bce1-a721c95fa282_2847x623.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!MRsw!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95d35455-cbcf-4dea-bce1-a721c95fa282_2847x623.png" width="1200" height="262.9120879120879" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/95d35455-cbcf-4dea-bce1-a721c95fa282_2847x623.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:319,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;RecSys Pipeline: Retrieval &#8594; Ranking &#8594; Re-Ranking &#8594; Serving&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="RecSys Pipeline: Retrieval &#8594; Ranking &#8594; Re-Ranking &#8594; Serving" title="RecSys Pipeline: Retrieval &#8594; Ranking &#8594; Re-Ranking &#8594; Serving" srcset="https://substackcdn.com/image/fetch/$s_!MRsw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95d35455-cbcf-4dea-bce1-a721c95fa282_2847x623.png 424w, https://substackcdn.com/image/fetch/$s_!MRsw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95d35455-cbcf-4dea-bce1-a721c95fa282_2847x623.png 848w, https://substackcdn.com/image/fetch/$s_!MRsw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95d35455-cbcf-4dea-bce1-a721c95fa282_2847x623.png 1272w, 
https://substackcdn.com/image/fetch/$s_!MRsw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95d35455-cbcf-4dea-bce1-a721c95fa282_2847x623.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Here&#8217;s how to think about it. Ranking answers: &#8220;How relevant is this item to this user?&#8221; Re-ranking answers a harder question: &#8220;What&#8217;s the best <em>collection</em> of items to show this user?&#8221;</p><p>The input to re-ranking is typically 100-500 scored items from your ranker. The output is the final 10-50 items in their display order. And the constraints are everything your ranking model doesn&#8217;t know about: diversity requirements, content freshness, promotional obligations, fairness targets, and a dozen product-specific rules.</p><p>I remember a team meeting where someone pulled up our top-10 list for a test user: ten nearly identical sci-fi action movies. &#8220;The model is working perfectly,&#8221; someone said. Technically correct &#8212; and completely useless. The top-10 wasn&#8217;t a recommendation; it was a redundancy report.</p><p>Netflix does this at massive scale &#8212; 15,000+ shows, nearly 300 million users, and a homepage that needs to feel both personally relevant and excitingly diverse. Their page construction system doesn&#8217;t just rank shows; it considers the <em>composition</em> of each row and the relationships <em>between</em> rows.</p><p>Here&#8217;s the key mental model I want you to hold for this entire post: <strong>re-ranking is spending a relevance budget.</strong> Your ranking model gives you a relevance score for each item. That score is currency. Every diversity constraint, every freshness boost, every business rule <em>costs</em> some of that relevance. The art is deciding how much to spend on each.</p><p>Let&#8217;s look at the algorithms that make this possible.</p><div><hr></div><h2>2. 
Diversity &#8212; From Rules to Determinantal Point Processes</h2><p>Diversity is the most visible re-ranking objective. When a user sees 10 items from the same genre, something has clearly gone wrong. But &#8220;add diversity&#8221; is easy to say and surprisingly hard to get right. Here are three levels of sophistication:</p><h3>Level 1: Rule-Based Dedup</h3><p>The simplest approach is just writing rules:</p><ul><li><p>&#8220;<em>No more than 2 items from the same category in the top 5</em>&#8221;</p></li><li><p>&#8220;<em>No two items from the same creator in a row</em>&#8221;</p></li><li><p>&#8220;<em>At least 1 item from &#8216;trending&#8217; in top 3</em>&#8221;</p></li></ul><p>Before YouTube deployed their DPP system (covered in Level 3 below), they used exactly these kinds of heuristics: <strong>fuzzy deduplication</strong> (removing items too similar to ones already selected) and <strong>sliding window constraints</strong> (at most n out of every m items from the same type).</p><p>Rules are fast, interpretable, and easy to debug. But they&#8217;re also brittle. They can&#8217;t capture nuanced notions of similarity: &#8220;these are both thrillers&#8221; is a rule; &#8220;these have similar emotional arcs&#8221; is not. And they compose badly: stack 5 rules on top of each other and you&#8217;ll find they frequently conflict.</p><h3>Level 2: Maximal Marginal Relevance (MMR)</h3><p><strong>MMR</strong> is the first real algorithmic approach to diversity. It was originally proposed for document retrieval, but it maps perfectly to recommendations.</p><p>The idea is beautifully simple. Instead of selecting items by relevance alone, you select greedily: at each step, pick the item that best balances relevance with <em>dissimilarity</em> to items you&#8217;ve already selected.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def mmr_rerank(relevance_scores, item_embeddings, lambda_param=0.5, top_k=10):
    """
    Maximal Marginal Relevance re-ranking.

    Args:
        relevance_scores: array of shape (N,), ranking model scores
        item_embeddings: array of shape (N, d), item feature vectors
        lambda_param: trade-off between relevance (1.0) and diversity (0.0)
        top_k: number of items to select

    Returns:
        selected: list of indices in selection order
    """
    n_items = len(relevance_scores)
    sim_matrix = cosine_similarity(item_embeddings)

    selected = []
    candidates = list(range(n_items))

    # Never try to select more items than there are candidates
    for _ in range(min(top_k, n_items)):
        best_score = -np.inf
        best_idx = None

        for idx in candidates:
            # Relevance term
            rel = relevance_scores[idx]

            # Max similarity to any already-selected item
            if selected:
                max_sim = max(sim_matrix[idx][s] for s in selected)
            else:
                max_sim = 0

            # MMR score: balance relevance vs. novelty
            score = lambda_param * rel - (1 - lambda_param) * max_sim

            if score &gt; best_score:
                best_score = score
                best_idx = idx

        selected.append(best_idx)
        candidates.remove(best_idx)

    return selected</code></pre></div><p>The <code>lambda_param</code> is your knob. At &#955;=1.0, MMR is pure relevance (no diversity). At &#955;=0.0, it&#8217;s pure diversity (ignores relevance). In practice, values between 0.5 and 0.7 work well.</p><p>MMR&#8217;s complexity is O(Nk) per selection, which is fast. But it has a fundamental limitation: it&#8217;s <strong>myopic</strong>. At each step, it only compares the candidate to items already selected. It never evaluates the global quality of the final set.</p><h3>Level 3: Determinantal Point Processes (DPP)</h3><p>This is where things get interesting.</p><p>A <strong>DPP</strong> is a probabilistic model that assigns higher probability to subsets of items that are both high-quality AND diverse. Unlike MMR&#8217;s pairwise comparisons, a DPP evaluates the <em>entire subset at once</em>.</p><p><strong>Here&#8217;s the intuition:</strong> Imagine each item as an arrow in a high-dimensional space. The arrow&#8217;s length represents quality (the ranking model&#8217;s score). The arrow&#8217;s direction represents the item&#8217;s characteristics (its <a href="https://www.mlwhiz.com/p/vector-search-at-scale-the-missing">embedding</a>). 
A DPP selects the set of arrows that spans the maximum volume &#8212; you want arrows that are both long (high quality) AND point in different directions (diverse).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!49pS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80ea4eda-d247-49bb-9679-ba9c9476add9_2552x693.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!49pS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80ea4eda-d247-49bb-9679-ba9c9476add9_2552x693.png 424w, https://substackcdn.com/image/fetch/$s_!49pS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80ea4eda-d247-49bb-9679-ba9c9476add9_2552x693.png 848w, https://substackcdn.com/image/fetch/$s_!49pS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80ea4eda-d247-49bb-9679-ba9c9476add9_2552x693.png 1272w, https://substackcdn.com/image/fetch/$s_!49pS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80ea4eda-d247-49bb-9679-ba9c9476add9_2552x693.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!49pS!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80ea4eda-d247-49bb-9679-ba9c9476add9_2552x693.png" width="1200" height="325.54945054945057" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/80ea4eda-d247-49bb-9679-ba9c9476add9_2552x693.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:395,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;DPP Volume Visualization: Quality &#215; Diversity&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="DPP Volume Visualization: Quality &#215; Diversity" title="DPP Volume Visualization: Quality &#215; Diversity" srcset="https://substackcdn.com/image/fetch/$s_!49pS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80ea4eda-d247-49bb-9679-ba9c9476add9_2552x693.png 424w, https://substackcdn.com/image/fetch/$s_!49pS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80ea4eda-d247-49bb-9679-ba9c9476add9_2552x693.png 848w, https://substackcdn.com/image/fetch/$s_!49pS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80ea4eda-d247-49bb-9679-ba9c9476add9_2552x693.png 1272w, https://substackcdn.com/image/fetch/$s_!49pS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80ea4eda-d247-49bb-9679-ba9c9476add9_2552x693.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div>
<p>Mathematically, we define a kernel matrix <strong>L</strong> where each entry captures both quality and similarity:</p><p><code>            L[i,j] = q_i &#215; q_j &#215; similarity(i,j)</code></p><p>where <code>q_i</code> is item i&#8217;s quality score (from your ranker) and <code>similarity(i,j)</code> is the cosine similarity between item embeddings. The probability of selecting a subset S is proportional to <code>det(L_S)</code>, the determinant of the submatrix formed by those items. Geometrically, that determinant is the <em>squared</em> volume of the parallelepiped spanned by the quality-scaled item vectors: for two items it works out to <code>q_i&#178; &#215; q_j&#178; &#215; (1 &#8722; similarity(i,j)&#178;)</code>, so the score collapses toward zero as the two items become near-duplicates.</p><p>That&#8217;s abstract. 
Let me walk through it with three movies.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ke1R!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a9ceb93-d123-4723-8ec1-a280630196ed_1185x1381.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ke1R!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a9ceb93-d123-4723-8ec1-a280630196ed_1185x1381.png 424w, https://substackcdn.com/image/fetch/$s_!ke1R!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a9ceb93-d123-4723-8ec1-a280630196ed_1185x1381.png 848w, https://substackcdn.com/image/fetch/$s_!ke1R!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a9ceb93-d123-4723-8ec1-a280630196ed_1185x1381.png 1272w, https://substackcdn.com/image/fetch/$s_!ke1R!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a9ceb93-d123-4723-8ec1-a280630196ed_1185x1381.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ke1R!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a9ceb93-d123-4723-8ec1-a280630196ed_1185x1381.png" width="1200" height="1398.4810126582279" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a9ceb93-d123-4723-8ec1-a280630196ed_1185x1381.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:1381,&quot;width&quot;:1185,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;DPP Kernel in Action: Three Movies, Pick Two&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="DPP Kernel in Action: Three Movies, Pick Two" title="DPP Kernel in Action: Three Movies, Pick Two" srcset="https://substackcdn.com/image/fetch/$s_!ke1R!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a9ceb93-d123-4723-8ec1-a280630196ed_1185x1381.png 424w, https://substackcdn.com/image/fetch/$s_!ke1R!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a9ceb93-d123-4723-8ec1-a280630196ed_1185x1381.png 848w, https://substackcdn.com/image/fetch/$s_!ke1R!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a9ceb93-d123-4723-8ec1-a280630196ed_1185x1381.png 1272w, https://substackcdn.com/image/fetch/$s_!ke1R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a9ceb93-d123-4723-8ec1-a280630196ed_1185x1381.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div>
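<p>If you want to poke at the &#8220;three movies, pick two&#8221; idea yourself, here is a minimal sketch. The titles, quality scores, and 2-D embeddings are made up for illustration (they are not the numbers from the figure above, and none of this is YouTube&#8217;s production code). It builds the kernel <code>L[i,j] = q_i &#215; q_j &#215; similarity(i,j)</code> and scores every pair by <code>det(L_S)</code>:</p>

```python
from itertools import combinations

import numpy as np

# Hypothetical data: titles, quality scores, and 2-D embeddings are invented
# for illustration and are not taken from any real system.
titles = ["Thriller A", "Thriller B", "Comedy C"]
quality = np.array([0.9, 0.85, 0.7])
embeddings = np.array([
    [1.0, 0.0],  # Thriller A
    [0.8, 0.6],  # Thriller B: cosine similarity 0.8 with Thriller A
    [0.0, 1.0],  # Comedy C: orthogonal to Thriller A
])

# Normalize rows so the dot product is the cosine similarity.
unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
similarity = unit @ unit.T

# DPP kernel: L[i, j] = q_i * q_j * similarity(i, j)
L = np.outer(quality, quality) * similarity

# Score every 2-item subset by the determinant of its submatrix L_S.
pair_scores = {}
for pair in combinations(range(len(titles)), 2):
    idx = list(pair)
    pair_scores[pair] = float(np.linalg.det(L[np.ix_(idx, idx)]))

best_pair = max(pair_scores, key=pair_scores.get)

for (i, j), score in sorted(pair_scores.items(), key=lambda kv: -kv[1]):
    print(f"{titles[i]} + {titles[j]}: det = {score:.4f}")
print("Best pair:", [titles[i] for i in best_pair])
```

<p>With these made-up numbers, the two thrillers have the highest individual quality scores, yet the winning pair is Thriller A + Comedy C: their embeddings are orthogonal, so nothing is lost to similarity. Brute-forcing every subset like this only works for toy examples; it is here to show what the determinant rewards, not how to serve it.</p>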
      <p>
          <a href="https://www.mlwhiz.com/p/reranking-recsys-diversity-freshness">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[3 Modern Approaches to Solving Cold Start in RecSys]]></title><description><![CDATA[Contextual bandits, meta-learning, and LLMs &#8212; how Spotify, TikTok, and YouTube handle new users and items. The practitioner's guide to cold start.]]></description><link>https://www.mlwhiz.com/p/cold-start-problem-recsys-modern-approaches</link><guid isPermaLink="false">https://www.mlwhiz.com/p/cold-start-problem-recsys-modern-approaches</guid><dc:creator><![CDATA[Rahul Agarwal]]></dc:creator><pubDate>Wed, 25 Mar 2026 03:14:12 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/0e54ac48-da66-4c0e-8cb5-3097859655c0_1808x1379.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yHq9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png" width="1456" height="250" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:250,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:77210,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yHq9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 424w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 848w, https://substackcdn.com/image/fetch/$s_!yHq9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1272w, 
https://substackcdn.com/image/fetch/$s_!yHq9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21050ae0-d6b0-4e64-9cdb-c017d983bf85_1501x258.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p><em>Hey, Rahul here! &#128075; Each week, I publish long-form ML+AI posts covering ML, AI, and System design for MLwhiz. Paid subscribers also get how-to guides with full code walkthroughs. I publish occasional extra articles. If you&#8217;d like to become a paid subscriber, here&#8217;s a button for that:</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.mlwhiz.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.mlwhiz.com/subscribe?"><span>Subscribe now</span></a></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B1mx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B1mx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 424w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 848w, 
https://substackcdn.com/image/fetch/$s_!B1mx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1272w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png" width="995" height="80" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:80,&quot;width&quot;:995,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15990,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B1mx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 424w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 848w, 
https://substackcdn.com/image/fetch/$s_!B1mx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1272w, https://substackcdn.com/image/fetch/$s_!B1mx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f624f4d-a6b2-4226-808a-e860524c63c7_995x80.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Bz_I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44523b7f-79aa-4e11-96c6-c4e9ce4fd331_1808x1379.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Bz_I!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44523b7f-79aa-4e11-96c6-c4e9ce4fd331_1808x1379.png 424w, https://substackcdn.com/image/fetch/$s_!Bz_I!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44523b7f-79aa-4e11-96c6-c4e9ce4fd331_1808x1379.png 848w, https://substackcdn.com/image/fetch/$s_!Bz_I!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44523b7f-79aa-4e11-96c6-c4e9ce4fd331_1808x1379.png 1272w, https://substackcdn.com/image/fetch/$s_!Bz_I!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44523b7f-79aa-4e11-96c6-c4e9ce4fd331_1808x1379.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Bz_I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44523b7f-79aa-4e11-96c6-c4e9ce4fd331_1808x1379.png" width="1456" height="1111" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/44523b7f-79aa-4e11-96c6-c4e9ce4fd331_1808x1379.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1111,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;The Cold Start Problem: three types of cold start (New User, New Item, New System) mapped to three modern solutions (Contextual Bandits, Meta-Learning, LLMs) leading to personalized recommendations&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The Cold Start Problem: three types of cold start (New User, New Item, New System) mapped to three modern solutions (Contextual Bandits, Meta-Learning, LLMs) leading to personalized recommendations" title="The Cold Start Problem: three types of cold start (New User, New Item, New System) mapped to three modern solutions (Contextual Bandits, Meta-Learning, LLMs) leading to personalized recommendations" srcset="https://substackcdn.com/image/fetch/$s_!Bz_I!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44523b7f-79aa-4e11-96c6-c4e9ce4fd331_1808x1379.png 424w, https://substackcdn.com/image/fetch/$s_!Bz_I!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44523b7f-79aa-4e11-96c6-c4e9ce4fd331_1808x1379.png 848w, 
https://substackcdn.com/image/fetch/$s_!Bz_I!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44523b7f-79aa-4e11-96c6-c4e9ce4fd331_1808x1379.png 1272w, https://substackcdn.com/image/fetch/$s_!Bz_I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F44523b7f-79aa-4e11-96c6-c4e9ce4fd331_1808x1379.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>A user signs up for your streaming platform. They&#8217;ve never watched anything. They&#8217;ve never rated anything. 
They&#8217;ve never even scrolled. And your recommendation engine &#8212; the same engine that serves 200 million personalized feeds per day &#8212; stares at this blank profile and essentially says: &#8220;I have no idea who you are.&#8221;</p><p>This is the <strong>Cold Start Problem</strong>, and I&#8217;ve been fighting it for the better part of four years &#8212; at Meta, where new creators needed to find their audience from day one, and in the streaming world, where every new user expects a personalized experience the moment they log in. It&#8217;s the problem that&#8217;s been discussed on Hacker News since 2010, has a 400-page book written about it (Andrew Chen&#8217;s <em>The Cold Start Problem</em>), and STILL doesn&#8217;t have a clean answer.</p><p>This is the next installment in <a href="https://www.mlwhiz.com/t/recsys">my RecSys series</a>. We&#8217;ve covered the <a href="https://www.mlwhiz.com/p/the-algorithmic-journey-of-recommender">algorithmic evolution</a> of recommendation systems, <a href="https://www.mlwhiz.com/p/building-youtube-scale-recommendation">built two-tower retrieval from scratch</a>, and <a href="https://www.mlwhiz.com/p/from-candidates-to-clicks-the-engineering">dissected the ranking layer</a> &#8212; all assuming we have user data to work with. 
Today we drop that assumption entirely.</p><p>Here&#8217;s what we&#8217;ll cover:</p><ul><li><p><strong>The 3 types of cold start</strong> &#8212; they&#8217;re different problems with different solutions</p></li><li><p><strong>Classical approaches</strong> &#8212; the baselines everyone ships first, and where they hit a ceiling</p></li><li><p><strong>3 modern frontiers</strong>: contextual bandits, meta-learning (MAML, prototypical networks, CMML), and LLMs (feature extraction, reasoning, data generation)</p></li><li><p><strong>How Spotify, TikTok, and YouTube actually solve this</strong> in production &#8212; with specific engineering details</p></li><li><p><strong>A decision framework</strong> &#8212; so you know which approach fits your system, your data, and your budget</p></li></ul><p>This is meant to be the definitive practitioner&#8217;s guide. Let&#8217;s dive in!</p><div><hr></div><h2>The Three Faces of Cold Start</h2><p>Before we jump into solutions, let&#8217;s be precise about what we&#8217;re solving. 
&#8220;Cold start&#8221; isn&#8217;t one problem &#8212; it&#8217;s three distinct problems, and confusing them is one of the most common mistakes I see engineers make.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TRap!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b7ff65-de91-4e1a-909a-9d582cd457cb_1478x1885.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TRap!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b7ff65-de91-4e1a-909a-9d582cd457cb_1478x1885.png 424w, https://substackcdn.com/image/fetch/$s_!TRap!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b7ff65-de91-4e1a-909a-9d582cd457cb_1478x1885.png 848w, https://substackcdn.com/image/fetch/$s_!TRap!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b7ff65-de91-4e1a-909a-9d582cd457cb_1478x1885.png 1272w, https://substackcdn.com/image/fetch/$s_!TRap!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b7ff65-de91-4e1a-909a-9d582cd457cb_1478x1885.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TRap!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b7ff65-de91-4e1a-909a-9d582cd457cb_1478x1885.png" width="560" height="714.2307692307693" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/60b7ff65-de91-4e1a-909a-9d582cd457cb_1478x1885.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1857,&quot;width&quot;:1456,&quot;resizeWidth&quot;:560,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;The three types of cold start: New User, New Item, and New System &#8212; with examples and severity ratings&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The three types of cold start: New User, New Item, and New System &#8212; with examples and severity ratings" title="The three types of cold start: New User, New Item, and New System &#8212; with examples and severity ratings" srcset="https://substackcdn.com/image/fetch/$s_!TRap!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b7ff65-de91-4e1a-909a-9d582cd457cb_1478x1885.png 424w, https://substackcdn.com/image/fetch/$s_!TRap!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b7ff65-de91-4e1a-909a-9d582cd457cb_1478x1885.png 848w, https://substackcdn.com/image/fetch/$s_!TRap!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b7ff65-de91-4e1a-909a-9d582cd457cb_1478x1885.png 1272w, https://substackcdn.com/image/fetch/$s_!TRap!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b7ff65-de91-4e1a-909a-9d582cd457cb_1478x1885.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex 
pc-gap-8 pc-reset"></div></div></div></a></figure></div><h3>New User Cold Start</h3><p>A user signs up. Zero watch history. Zero ratings. Zero clicks. Your collaborative filtering model &#8212; the one that works beautifully for your 50 million existing users &#8212; is completely blind. It relies on the user-item interaction matrix <strong>R</strong> where entry <strong>R(u, i)</strong> represents user <strong>u</strong>&#8217;s interaction with item <strong>i</strong>. 
For a new user <strong>u_new</strong>, the entire row <strong>R(u_new, :)</strong> is empty &#8212; a zero vector.</p><p>This means every technique that depends on finding similar users (nearest-neighbor CF), decomposing the interaction matrix (matrix factorization), or learning user embeddings from behavior (deep learning approaches) &#8212; the entire <a href="https://www.mlwhiz.com/p/the-algorithmic-journey-of-recommender">algorithmic evolution</a> we covered in Part 2 of this series &#8212; has literally nothing to work with. The new user is a point with no coordinates in preference space.</p><p>Mathematically, collaborative filtering predicts a rating as:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">r&#770;(u, i) = &#956; + q_i^T &#183; p_u
</code></pre></div><p>where <strong>&#956;</strong> is the global mean rating, <strong>p_u</strong> is the user&#8217;s latent factor vector, and <strong>q_i</strong> is the item&#8217;s latent factor vector (<a href="https://ieeexplore.ieee.org/document/5197422">Koren et al., 2009, <em>&#8220;Matrix Factorization Techniques for Recommender Systems&#8221;</em>, IEEE Computer</a>). For a new user, <strong>p_u</strong> is undefined &#8212; there are no interactions to learn it from. You can initialize it randomly, but then your predictions are random noise.</p><h3>New Item Cold Start</h3><p>A video gets uploaded. A product gets listed. A song gets released. No one has interacted with it yet. Even if your model is phenomenal at scoring items with behavioral data, this item has zero behavioral signal &#8212; no clicks, no watches, no purchases. Its column <strong>R(:, i_new)</strong> in the interaction matrix is all zeros.</p><p>This creates a vicious cycle that I&#8217;ve seen destroy content platforms: the item is invisible to your <a href="https://www.mlwhiz.com/p/the-recommendation-engine-under-the">retrieval-ranking pipeline</a> because it has no engagement data. Because it&#8217;s invisible, it gets no exposure. Because it gets no exposure, it accumulates no engagement data. The item is trapped in a black hole of non-existence.</p><p>This isn&#8217;t an abstract concern &#8212; it directly affects creator retention on any content platform. If a creator uploads a show and it gets zero impressions for a week, that creator doesn&#8217;t come back. And losing the creator means losing not just that show, but everything they would have made in the future.</p><h3>New System Cold Start</h3><p>You&#8217;re launching a recommendation system from scratch. No users with behavioral data, no items with engagement history, no interaction matrix at all. <strong>R</strong> is entirely empty. 
This is the rarest variant, but it&#8217;s also the one that every startup and every new product line faces.</p><p>Here&#8217;s the uncomfortable truth that most blog posts skip: in production, the <strong>new item</strong> problem is often harder and more damaging than the new user problem. New users at least have <em>some</em> context you can exploit (device, location, time). New items have nothing but their own metadata. <em><strong>And the business cost of item-side cold start &#8212; creator churn, catalog invisibility, content deserts &#8212; compounds far faster than user-side cold start.</strong></em></p><p>There&#8217;s also a regime between cold and warm that&#8217;s arguably even more important in practice: <strong>warm start</strong> &#8212; when you have 1-5 interactions. Not zero, but not enough for your models to be confident. This is where your system spends most of its time for the long tail of users and items, and it&#8217;s where the modern approaches we&#8217;ll cover really shine.</p><div><hr></div><h2>Classical Solutions (And Why They&#8217;re Not Enough)</h2><p>Every recommendation system starts here. These are the baseline approaches &#8212; they work, they ship fast, and they&#8217;re better than showing nothing. 
But they all hit a ceiling, and understanding exactly <em>where</em> that ceiling is tells you when to invest in the modern approaches.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qsKy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f226471-e695-45bb-bae6-f1b4d61c5648_1806x2294.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qsKy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f226471-e695-45bb-bae6-f1b4d61c5648_1806x2294.png 424w, https://substackcdn.com/image/fetch/$s_!qsKy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f226471-e695-45bb-bae6-f1b4d61c5648_1806x2294.png 848w, https://substackcdn.com/image/fetch/$s_!qsKy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f226471-e695-45bb-bae6-f1b4d61c5648_1806x2294.png 1272w, https://substackcdn.com/image/fetch/$s_!qsKy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f226471-e695-45bb-bae6-f1b4d61c5648_1806x2294.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qsKy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f226471-e695-45bb-bae6-f1b4d61c5648_1806x2294.png" width="728" height="924.5" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6f226471-e695-45bb-bae6-f1b4d61c5648_1806x2294.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:1849,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Classical cold-start approaches decision tree: choose based on scenario type, available demographics, item metadata, and friction tolerance&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-normal" alt="Classical cold-start approaches decision tree: choose based on scenario type, available demographics, item metadata, and friction tolerance" title="Classical cold-start approaches decision tree: choose based on scenario type, available demographics, item metadata, and friction tolerance" srcset="https://substackcdn.com/image/fetch/$s_!qsKy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f226471-e695-45bb-bae6-f1b4d61c5648_1806x2294.png 424w, https://substackcdn.com/image/fetch/$s_!qsKy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f226471-e695-45bb-bae6-f1b4d61c5648_1806x2294.png 848w, https://substackcdn.com/image/fetch/$s_!qsKy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f226471-e695-45bb-bae6-f1b4d61c5648_1806x2294.png 1272w, 
https://substackcdn.com/image/fetch/$s_!qsKy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6f226471-e695-45bb-bae6-f1b4d61c5648_1806x2294.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>Popularity-Based Ranking</h3><p><strong>The simplest possible move:</strong> show new users whatever is trending right now. It&#8217;s the &#8220;most popular dish&#8221; approach &#8212; safe, zero personalization required, trivial to implement. 
You&#8217;re essentially replacing the personalized score <strong>r&#770;(u, i)</strong> with the global popularity score:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">score(i) = &#931;_u R(u, i)  or  score(i) = count(clicks on i in last 24h)
</code></pre></div><p>The obvious problem: you&#8217;ll never discover that this specific user hates action movies and loves documentaries. Everyone gets the same feed, and you learn nothing about individual preferences. It also creates a rich-get-richer feedback loop &#8212; popular items get shown more, get more clicks, become more popular. This is the Matthew Effect in recommendation systems, and it&#8217;s brutal for new content.</p><p>That said, popularity-based ranking has one underappreciated strength: it surfaces items that are <em>currently relevant</em>. A user might not know they want to watch the Oscar-nominated film that was just released, but a time-decayed popularity score will surface it naturally.</p><h3>Content-Based Fallback</h3><p>Instead of using behavioral signals (which don&#8217;t exist for cold-start entities), use the item&#8217;s features directly. A movie&#8217;s genre, director, cast, plot keywords, runtime, year, language &#8212; these are all available from day one, before anyone watches it.</p><p>The basic approach: represent each item as a feature vector <strong>f_i</strong> (using TF-IDF, one-hot encoding, or pretrained embeddings), then represent the user as a weighted average of the features of the items they&#8217;ve interacted with:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">p_u = (1/|I_u|) &#931;_{i &#8712; I_u} w_i &#183; f_i

where I_u is the set of items the user has interacted with
and w_i is the weight for item i (1 in the basic case, or a time-decay factor)
</code></pre></div><p>Then score new items by cosine similarity:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">score(u, i) = cos(p_u, f_i) = (p_u &#183; f_i) / (||p_u|| &#183; ||f_i||)
</code></pre></div><p>This is exactly how <a href="https://www.mlwhiz.com/p/the-recommenders-playbook-algorithms">content-based and collaborative filtering</a> differ at their core. Content-based doesn&#8217;t need the interaction matrix &#8212; it needs item features and at least <em>some</em> user signal.</p><p>The catch: this works far better for new items than for new users. A new item has features you can compute similarity against. A new user with zero interactions has no profile vector <strong>p_u</strong> at all &#8212; you can&#8217;t compute an average of an empty set. You&#8217;d need at least one click to get started.</p><h3>Demographic Heuristics</h3><p>Use whatever you get at signup: geo-location, device type, language, age bracket, operating system. A user signing up from Tokyo on an iPhone at 11 PM likely has different preferences than someone from Texas on a smart TV at 2 PM on a Saturday.</p><p>Formally, you&#8217;re replacing the missing behavioral profile with a demographic profile:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">p_u = g(demographics_u)
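
# A minimal sketch of g as a lookup table; an illustration, not the post's
# implementation. Assumptions (all hypothetical): each warm user has a
# demographic bucket such as ("JP", "25-34") plus a learned embedding, and
# unseen buckets fall back to the global mean embedding.
def fit_demographic_prior(warm_users):
    """warm_users: list of (bucket, embedding) pairs; embedding is a list of floats."""
    if not warm_users:
        raise ValueError("need at least one warm user")
    by_bucket = {}
    for bucket, emb in warm_users:
        by_bucket.setdefault(bucket, []).append(emb)

    def mean(vectors):
        # Element-wise average of equally sized vectors
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    table = {bucket: mean(embs) for bucket, embs in by_bucket.items()}
    global_mean = mean([emb for _, emb in warm_users])
    return table, global_mean

def g(bucket, table, global_mean):
    """Cold-start stand-in for p_u, computed from demographics alone."""
    return table.get(bucket, global_mean)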
</code></pre></div><p>where <strong>g</strong> is a learned function (often a simple lookup table or a shallow neural network) that maps demographic features to the same embedding space as your warm users. You train <strong>g</strong> on your existing warm users &#8212; learning, for example, that users aged 25-34 in urban Japan tend to prefer anime and J-drama.</p><p>The obvious limitation: demographics are a coarse proxy for user preferences. Not every 30-year-old in Tokyo likes the same content. You&#8217;re fighting the <strong>stereotype problem</strong> &#8212; making assumptions about individuals based on group statistics. But in the zero-data regime, coarse is better than random.</p><h3>Onboarding Surveys</h3><p>&#8220;Choose 3 genres you like.&#8221; &#8220;Rate these 5 movies.&#8221; &#8220;Pick your favorite artists.&#8221; These give you a direct, explicit preference signal that bypasses the cold-start chicken-and-egg problem entirely.</p><p>The catch? Every additional question increases signup friction and hurts conversion. Research from the <a href="https://baymard.com/blog/checkout-flow-average-form-fields">Baymard Institute</a> shows that each additional step beyond 3-4 in a signup flow increases abandonment significantly &#8212; and streaming onboarding is no exception. And users lie &#8212; or more precisely, they pick aspirationally rather than truthfully. (A user saying &#8220;Yes, I definitely want to watch cerebral documentaries about climate change&#8221; might proceed to binge <em>Love Island</em> for 6 hours.)</p><p>There&#8217;s a rich literature on <em>optimal</em> onboarding question selection. <a href="https://dl.acm.org/doi/10.1145/1935826.1935910">Golbandi et al. 
(2011, </a><em><a href="https://dl.acm.org/doi/10.1145/1935826.1935910">&#8220;Adaptive Bootstrapping of Recommender Systems Using Decision Trees&#8221;</a></em><a href="https://dl.acm.org/doi/10.1145/1935826.1935910">, WSDM)</a> showed you can use decision trees to pick the maximally informative items to show in an onboarding survey &#8212; items where the user&#8217;s response tells you the most about their latent preferences.</p><h3>Hybrid Switching</h3><p>The textbook answer: start with content-based or popularity-based recommendations, gradually switch to collaborative filtering as behavioral data accumulates. Formally:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">r&#770;(u, i) = &#945;(u) &#183; r&#770;_PB(u, i) + (1 - &#945;(u)) &#183; r&#770;_CF(u, i)
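
# A minimal sketch of the blend above; an illustration, not the post's
# production code. Assumptions (all hypothetical): alpha is a logistic ramp
# centred at 10 interactions, and both score lists are rank-normalized to
# [0, 1] before mixing so their scales are comparable.
import math

def alpha(n_interactions, midpoint=10.0, steepness=0.5):
    """Blending weight: close to 1 for cold users, close to 0 for warm users."""
    z = min(steepness * (n_interactions - midpoint), 50.0)  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(z))

def rank_normalize(scores):
    """Map raw scores to [0, 1] by rank (lowest score maps to 0, highest to 1)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    denom = max(len(scores) - 1, 1)
    ranks = [0.0] * len(scores)
    for rank, idx in enumerate(order):
        ranks[idx] = rank / denom
    return ranks

def hybrid_scores(pb_scores, cf_scores, n_interactions):
    """alpha * popularity-based score + (1 - alpha) * collaborative-filtering score."""
    a = alpha(n_interactions)
    pb, cf = rank_normalize(pb_scores), rank_normalize(cf_scores)
    return [a * p + (1.0 - a) * c for p, c in zip(pb, cf)]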
</code></pre></div><p>where <strong>&#945;(u)</strong> is a function of how much data you have for user <strong>u</strong> &#8212; close to 1 for cold users (trust popularity-based), close to 0 for warm users (trust collaborative filtering).</p><p>It sounds clean in a blog post, but it can be incredibly messy in production. Here&#8217;s why:</p><ul><li><p><strong>The blending weight &#945;(u) needs a functional form.</strong> Is it a step function (hard cutover at 10 interactions)? Sigmoid? Linear ramp? Each choice creates different user experiences, and there&#8217;s no universal right answer &#8212; you have to tune it per domain.</p></li><li><p><strong>Score calibration is a nightmare.</strong> Content-based or popularity scores and CF scores live on completely different scales and distributions. Naively adding them produces garbage &#8212; you need score normalization (min-max? z-score? rank-based?) that itself requires careful calibration.</p></li><li><p><strong>The transition can jar users.</strong> A user at &#945;=0.6 today and &#945;=0.3 tomorrow might see a completely different feed. Without smoothing, users experience sudden recommendation &#8220;personality shifts&#8221; that erode trust.</p></li><li><p><strong>You&#8217;re running two serving pipelines.</strong> Two models to train, two feature stores to maintain, two sets of latency budgets. The operational complexity of production recommendation systems is one of the most underappreciated challenges &#8212; and hybrid switching doubles it.</p></li></ul><h3>The Ceiling</h3><p>All of these classical approaches are basically guessing. Educated guessing, sure &#8212; but still guessing. They don&#8217;t actively <em>try</em> to learn about the user. They wait passively for data to trickle in and hope the user sticks around long enough. That&#8217;s fine for day one, maybe week one. 
But if your cold-start strategy is still &#8220;show popular stuff and pray&#8221; after a month, you&#8217;re leaving massive value on the table.</p><p>Here&#8217;s a simple Python sketch of the popularity + content-based fallback:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def cold_start_recommend(user, items, item_features, popular_items, n=10):
    """Simple cold-start fallback: content-based if we have ANY
    signal, popularity otherwise."""

    if user.interactions:  # even 1 click gives us something
        # Build user profile from interacted item features
        profile = np.mean(
            [item_features[i] for i in user.interactions], axis=0
        )
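        scores = cosine_similarity([profile], item_features)[0]
        # Mask items the user already interacted with so they aren't
        # recommended back (assumes interactions are integer row indices)
        scores[list(user.interactions)] = -np.inf
        top_items = np.argsort(scores)[::-1][:n]
        return [items[i] for i in top_items]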

    # Zero interactions &#8594; fall back to popularity
    return popular_items[:n]
</code></pre></div><p>This is fine for day one. But the three modern approaches that follow are what separate a good recommendation system from a great one.</p><blockquote><p><em>The rest of this post covers the three modern frontiers &#8212; contextual bandits, meta-learning, and LLMs &#8212; plus how Spotify, TikTok, and YouTube solve cold start in production, and a decision framework for choosing the right approach. Subscribe to continue reading.</em></p></blockquote>
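<p>Going back to the hybrid blend discussed earlier, here&#8217;s a minimal sketch of one way to handle the first two pain points. It assumes the blend has the form &#945;(u) times the popularity score plus (1 minus &#945;(u)) times the CF score, picks a sigmoid for &#945;(u), and uses rank-based normalization. The function names (<code>alpha</code>, <code>rank_normalize</code>, <code>blend</code>) and the <code>midpoint</code> and <code>steepness</code> defaults are illustrative assumptions, not tuned values:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import numpy as np

def alpha(n_interactions, midpoint=10.0, steepness=0.5):
    """Sigmoid blending weight: near 1 for cold users, near 0 for warm ones.
    midpoint and steepness are hypothetical defaults; tune them per domain."""
    return 1.0 / (1.0 + np.exp(steepness * (n_interactions - midpoint)))

def rank_normalize(scores):
    """Map raw scores into (0, 1] by rank so both sources share one scale.
    Ties are broken arbitrarily by argsort."""
    scores = np.asarray(scores, dtype=float)
    ranks = scores.argsort().argsort()  # 0 = lowest score
    return (ranks + 1) / len(scores)

def blend(pop_scores, cf_scores, n_interactions):
    """Blend rank-normalized popularity and CF scores with weight alpha(u)."""
    a = alpha(n_interactions)
    return a * rank_normalize(pop_scores) + (1.0 - a) * rank_normalize(cf_scores)
</code></pre></div><p>With these defaults, a user with zero interactions gets &#945; of about 0.99 and a user with 10 interactions gets exactly 0.5. Rank normalization sidesteps the scale mismatch but throws away score magnitudes, so treat it as one option alongside min-max or z-score rather than the answer. The sketch does nothing about smoothing &#945; over time.</p>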
      <p>
          <a href="https://www.mlwhiz.com/p/cold-start-problem-recsys-modern-approaches">
              Read more
          </a>
      </p>
   ]]></content:encoded></item><item><title><![CDATA[MLWhiz Weekly AI/ML Newsletter # 2]]></title><description><![CDATA[Here is what happened this week.]]></description><link>https://www.mlwhiz.com/p/mlwhiz-weekly-aiml-newsletter-2</link><guid isPermaLink="false">https://www.mlwhiz.com/p/mlwhiz-weekly-aiml-newsletter-2</guid><dc:creator><![CDATA[Rahul Agarwal]]></dc:creator><pubDate>Mon, 23 Mar 2026 23:44:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Kda4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faab0015c-bd3a-4889-87dc-75a55ec262e4_1410x804.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Kda4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faab0015c-bd3a-4889-87dc-75a55ec262e4_1410x804.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Kda4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faab0015c-bd3a-4889-87dc-75a55ec262e4_1410x804.png 424w, https://substackcdn.com/image/fetch/$s_!Kda4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faab0015c-bd3a-4889-87dc-75a55ec262e4_1410x804.png 848w, https://substackcdn.com/image/fetch/$s_!Kda4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faab0015c-bd3a-4889-87dc-75a55ec262e4_1410x804.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Kda4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faab0015c-bd3a-4889-87dc-75a55ec262e4_1410x804.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Kda4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faab0015c-bd3a-4889-87dc-75a55ec262e4_1410x804.png" width="1410" height="804" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aab0015c-bd3a-4889-87dc-75a55ec262e4_1410x804.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:804,&quot;width&quot;:1410,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:72777,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.mlwhiz.com/i/191924238?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faab0015c-bd3a-4889-87dc-75a55ec262e4_1410x804.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Kda4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faab0015c-bd3a-4889-87dc-75a55ec262e4_1410x804.png 424w, https://substackcdn.com/image/fetch/$s_!Kda4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faab0015c-bd3a-4889-87dc-75a55ec262e4_1410x804.png 848w, https://substackcdn.com/image/fetch/$s_!Kda4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faab0015c-bd3a-4889-87dc-75a55ec262e4_1410x804.png 
1272w, https://substackcdn.com/image/fetch/$s_!Kda4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faab0015c-bd3a-4889-87dc-75a55ec262e4_1410x804.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>&#127942; Story of the Week: The Agent Platform War Has Officially Started</h2><p>For years, the AI industry has been in a model quality race. GPT vs Claude vs Gemini, benchmark after benchmark, parameter count after parameter count. 
This week, that race ended &#8212; and a new one began.</p><p>On Sunday, OpenAI&#8217;s CEO of Applications Fidji Simo announced what insiders are calling &#8220;Code Red&#8221;: the consolidation of ChatGPT, the Codex coding platform, and the Atlas browser into a single desktop superapp built around agentic task handling. The catalyst? Internal data showing <a href="https://www.cnbc.com/2026/03/19/openai-desktop-super-app-chatgpt-browser-codex.html">Anthropic&#8217;s enterprise market share climbing to 40%</a> while OpenAI&#8217;s fell to roughly 27%. Simo told employees they could no longer afford &#8220;side quests&#8221; &#8212; a direct shot at Sora, which briefly hit #1 in the App Store before usage flatlined.</p><p>But this isn&#8217;t just an OpenAI crisis story &#8212; it&#8217;s an industry-wide convergence. Within the same week, Meta shipped <a href="https://www.cnbc.com/2026/03/18/metas-manus-launches-desktop-app-to-bring-its-ai-agent-onto-personal-devices.html">&#8220;My Computer&#8221;</a>, a desktop agent from its $2B Manus acquisition, already integrated into Meta Ads Man&#8230;</p>
      <p>
          <a href="https://www.mlwhiz.com/p/mlwhiz-weekly-aiml-newsletter-2">
              Read more
          </a>
      </p>
   ]]></content:encoded></item></channel></rss>