<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Polymath707]]></title><description><![CDATA[I write about AI model & chip architectures]]></description><link>https://polymath707.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!zaIx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fpolymath707.substack.com%2Fimg%2Fsubstack.png</url><title>Polymath707</title><link>https://polymath707.substack.com</link></image><generator>Substack</generator><lastBuildDate>Mon, 01 Jun 2026 03:03:44 GMT</lastBuildDate><atom:link href="https://polymath707.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Polymath707]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[polymath707@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[polymath707@substack.com]]></itunes:email><itunes:name><![CDATA[Polymath707]]></itunes:name></itunes:owner><itunes:author><![CDATA[Polymath707]]></itunes:author><googleplay:owner><![CDATA[polymath707@substack.com]]></googleplay:owner><googleplay:email><![CDATA[polymath707@substack.com]]></googleplay:email><googleplay:author><![CDATA[Polymath707]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Getting closer to 'Her': From Moshi to Thinking Machines Interaction Model]]></title><description><![CDATA[a story of how an open source lab from France paved the way to this beautiful experience and Thinking Machines team took it to the next level. 
Also, how this gives Cerebras a strong product-market fit.]]></description><link>https://polymath707.substack.com/p/getting-closer-to-her-from-moshi</link><guid isPermaLink="false">https://polymath707.substack.com/p/getting-closer-to-her-from-moshi</guid><dc:creator><![CDATA[Polymath707]]></dc:creator><pubDate>Tue, 12 May 2026 17:27:39 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Vi1J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4677e79-659b-447a-bbf0-bfb780c8011b_700x933.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Vi1J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4677e79-659b-447a-bbf0-bfb780c8011b_700x933.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Vi1J!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4677e79-659b-447a-bbf0-bfb780c8011b_700x933.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Vi1J!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4677e79-659b-447a-bbf0-bfb780c8011b_700x933.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Vi1J!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4677e79-659b-447a-bbf0-bfb780c8011b_700x933.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Vi1J!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4677e79-659b-447a-bbf0-bfb780c8011b_700x933.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Vi1J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4677e79-659b-447a-bbf0-bfb780c8011b_700x933.jpeg" width="700" height="933" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b4677e79-659b-447a-bbf0-bfb780c8011b_700x933.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:933,&quot;width&quot;:700,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Her\&quot; minimalist movie poster :: Behance&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Her&quot; minimalist movie poster :: Behance" title="Her&quot; minimalist movie poster :: Behance" srcset="https://substackcdn.com/image/fetch/$s_!Vi1J!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4677e79-659b-447a-bbf0-bfb780c8011b_700x933.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Vi1J!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4677e79-659b-447a-bbf0-bfb780c8011b_700x933.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Vi1J!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4677e79-659b-447a-bbf0-bfb780c8011b_700x933.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Vi1J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb4677e79-659b-447a-bbf0-bfb780c8011b_700x933.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div 
class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>The new shape of intelligent experience</h3><p>For the last few years, every product team building voice experiences on top of language models has been forced to pick a corner of an awkward triangle: <strong>you could have </strong><em><strong>fast</strong></em><strong>, you could have </strong><em><strong>smart</strong></em><strong>, or you could have </strong><em><strong>cheap</strong></em><strong>, but not all three at once.</strong> Real-time voice assistants were small and dumb because anything bigger could not respond inside a human&#8217;s patience window. 
Frontier reasoning models were astonishing in benchmark tables, but their real-time voice versions felt lacking. If you found something that was intelligent and fast, it was also very expensive. <strong>In addition to experimenting with natively multimodal models,</strong> <strong>developers also tried cascading state-of-the-art models together (ASR, then LLM, then TTS), with each layer adding latency</strong>, each handoff losing context, and each silence between turns reminding you that you were talking to a machine.</p><p><strong>The interaction-model and background-model split,</strong> as proposed by Thinking Machines (TML), is the first credible answer to that triangle. <strong>By assigning the two halves of intelligence to different hardware with different latency budgets, the architecture sidesteps the trade-off rather than negotiating with it.</strong> The half that has to be <em>present</em> is allowed to be small, fast, multimodal, and continuously attentive &#8212; five times a second, no exceptions. The half that has to be <em>smart</em> is free to be enormous, slow, deliberative, and tool-using, because it is never on the critical path of the 200ms cadence. Neither side is asked to compromise on what it is good at. The user experiences both, fused.</p><p>What does that unlock concretely? Consider the experiences that have been technically promised for a decade and have never quite worked:</p><ol><li><p><strong>A language tutor</strong> who can listen to you stumble through Spanish, gently correct your pronunciation mid-sentence, <em>and</em> simultaneously plan a six-month curriculum tailored to where you actually struggle. The interaction model handles the listening and the correcting in real time. 
The background model is somewhere else, watching your error patterns accumulate, deciding when to introduce the subjunctive mood, drafting tomorrow&#8217;s lesson while you finish today&#8217;s.</p></li><li><p><strong>A cooking companion</strong> that watches your hands through your phone camera, calls out that the onions are about to burn, and pulls up a substitute when you realise you&#8217;re out of cumin &#8212; without any of those interactions feeling like separate apps. The visual perception is handled in the 200ms loop. The recipe knowledge, the substitution reasoning, and the meal-planning context live in the background.</p></li><li><p><strong>A therapy or coaching presence</strong> that can simply <em>be there</em> in long sessions, patient, responsive, never lagging, while the background model does the genuine work of remembering what was said three weeks ago, noticing a pattern across months, deciding when a gentle reframe is appropriate.</p></li><li><p><strong>A workplace assistant that is actually in the meeting with you,</strong> not transcribing it for later. It can interject with the relevant figure when someone mis-cites a number, surface the Slack thread that supersedes a decision being re-debated, and do all of that without the awkward &#8220;OK assistant, are you there?&#8221; preamble that makes today&#8217;s tools feel like talking to a Roomba.</p></li></ol><p>The pattern is the same in every case: <strong>the experience becomes natural when </strong><em><strong>being there</strong></em><strong> and </strong><em><strong>being smart</strong></em><strong> stop competing for the same compute budget.</strong> Once you stop forcing one model to do both, the entire UX problem dissolves. The model that&#8217;s there is allowed to be specialised for presence. The model that&#8217;s smart is allowed to be specialised for thought. 
<strong>They cooperate over a thin interface, and the user perceives a single attentive intelligence that, finally, does not punish them for talking the way humans talk.</strong></p><p>This is what people mean when they say voice is the next interface. It was never really about voice. It was about whether intelligence could show up in the room without making you wait for it. The split architecture is the first proof that it can.</p><p>Finally, we can have an ever-present, intelligent, responsive, compassionate assistant like the one shown in the 2013 science-fiction romance film &#8220;<a href="https://www.imdb.com/title/tt1798709/">Her</a>&#8221;. This post discusses the key technological milestones that led us to this point.</p><h3>Welcome Thinking Machines&#8217; Interaction Model</h3><p>I have been mesmerised by the demos of the <a href="https://x.com/thinkymachines/status/2053938892152435174?s=20">Thinking Machines</a> interaction model. It is the same kind of mesmerisation I felt the first time I saw <a href="https://kyutai.org/blog/2024-07-03-meet-moshi">Kyutai&#8217;s Moshi</a>, and then again when <a href="https://www.sesame.com/">Sesame</a> dropped (try the preview on their site &#8212; it is worth the click). We are getting closer to &#8220;Her&#8221;: top-tier intelligence, with high interactivity and compassion.</p><p>What is striking is that both Sesame and Thinking Machines&#8217; new interaction model lean heavily on ideas pioneered by Kyutai&#8217;s Moshi. If you want to understand how Thinking Machines manages to combine frontier-grade intelligence with sub-second responsiveness, the most useful prerequisite is <a href="https://arxiv.org/pdf/2410.00037">Moshi</a> and its retrieval-augmented successor <a href="https://arxiv.org/pdf/2604.12928">MoshiRAG</a>. 
</p><p><strong>The Thinking Machines (TML) <a href="https://thinkingmachines.ai/blog/interaction-models/">blog post</a> is, however, sparse on how the interaction model communicates with the background model that does the asynchronous heavy lifting, and that is the gap this post tries to fill.</strong></p><p><strong>I also call out the differences between TML&#8217;s approach and Moshi&#8217;s on the input side.</strong></p><p>I also want to take a moment to thank the Kyutai lab, who built and open-sourced Moshi, and the Thinking Machines team for carrying the torch further and bringing us closer to &#8220;Her&#8221;. As discussed, the applications this will enable are going to be profound. 
Here is one cool example:</p><div id="youtube2-n2GXGjy41HQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;n2GXGjy41HQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/n2GXGjy41HQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>Major milestones on the way</h2><p>Most language models still treat conversation as a turn-based protocol: you finish, then I speak, then I finish, then you speak. That works for chat boxes. It falls apart the moment you want a model to translate live, count your push-ups, or stop you mid-sentence when you have made a factual error. Real conversation is full-duplex: both parties produce and perceive at the same time, and silence and overlap carry as much information as the words.</p><p>Three lineages have been pushing at this problem from different angles. Moshi rebuilt the speech LM stack around a dual-channel architecture. Sesame scaled it to a level where it started feeling magical. MoshiRAG bolted on asynchronous retrieval to fix factuality without sacrificing real-time flow. Thinking Machines&#8217; <em>interaction model</em> then takes both ideas and scales them to a 276B MoE backbone, to video, and to a deliberately tiny 200ms micro-turn cadence.</p><h3>1. 
Moshi: the dual-channel foundation</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3jj6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4ca957-1383-4fd5-a170-4a0265f37df7_3484x1546.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3jj6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4ca957-1383-4fd5-a170-4a0265f37df7_3484x1546.png 424w, https://substackcdn.com/image/fetch/$s_!3jj6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4ca957-1383-4fd5-a170-4a0265f37df7_3484x1546.png 848w, https://substackcdn.com/image/fetch/$s_!3jj6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4ca957-1383-4fd5-a170-4a0265f37df7_3484x1546.png 1272w, https://substackcdn.com/image/fetch/$s_!3jj6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4ca957-1383-4fd5-a170-4a0265f37df7_3484x1546.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3jj6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4ca957-1383-4fd5-a170-4a0265f37df7_3484x1546.png" width="1456" height="646" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c4ca957-1383-4fd5-a170-4a0265f37df7_3484x1546.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:646,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Refer to caption&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Refer to caption" title="Refer to caption" srcset="https://substackcdn.com/image/fetch/$s_!3jj6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4ca957-1383-4fd5-a170-4a0265f37df7_3484x1546.png 424w, https://substackcdn.com/image/fetch/$s_!3jj6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4ca957-1383-4fd5-a170-4a0265f37df7_3484x1546.png 848w, https://substackcdn.com/image/fetch/$s_!3jj6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4ca957-1383-4fd5-a170-4a0265f37df7_3484x1546.png 1272w, https://substackcdn.com/image/fetch/$s_!3jj6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4ca957-1383-4fd5-a170-4a0265f37df7_3484x1546.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Overview of Moshi. Moshi is a speech-text foundation model which enables real-time spoken dialogue. The main components of Moshi&#8217;s architecture are: a bespoke text language model backbone; a neural audio codec with residual vector quantization and with semantic knowledge distilled from a self-supervised speech model; the streaming, hierarchical generation of semantic and acoustic tokens for both the user and Moshi, along with time-aligned text tokens for Moshi when using Inner Monologue.</figcaption></figure></div><p>Kyutai&#8217;s Moshi (2024) is a 7B parameter speech-text foundation model built on an RQ-Transformer (<a href="https://arxiv.org/pdf/2410.00037">the paper</a>). 
There are two transformers stacked together:</p><ul><li><p>A <strong>&#8220;temporal&#8221; Transformer</strong> running at 12.5 Hz, processing one frame every 80ms.</p></li><li><p>A <strong>&#8220;depth&#8221; Transformer</strong> that, at each temporal step, predicts 8 audio codebook tokens &#8212; Moshi&#8217;s actual speech output, encoded by the Mimi neural codec.</p></li></ul><p>The key trick is that Moshi is <em>dual-channel</em>: it ingests the user&#8217;s speech tokens and predicts its own speech tokens simultaneously, in interleaved streams. There is no voice-activity-detection harness deciding when it is &#8220;the model&#8217;s turn&#8221; &#8212; at every 80ms frame the model is free to speak, stay silent, backchannel, or interrupt. This is the architectural foundation that everyone in the modern full-duplex space, including Sesame, builds on. Thinking Machines changes it a bit, as we will see in detail.</p><h3>2. Sesame: refining the experience</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!L745!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e078863-d69b-4d4c-86e0-c84f44d4224c_1872x1461.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!L745!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e078863-d69b-4d4c-86e0-c84f44d4224c_1872x1461.jpeg 424w, https://substackcdn.com/image/fetch/$s_!L745!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e078863-d69b-4d4c-86e0-c84f44d4224c_1872x1461.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!L745!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e078863-d69b-4d4c-86e0-c84f44d4224c_1872x1461.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!L745!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e078863-d69b-4d4c-86e0-c84f44d4224c_1872x1461.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!L745!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e078863-d69b-4d4c-86e0-c84f44d4224c_1872x1461.jpeg" width="1456" height="1136" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3e078863-d69b-4d4c-86e0-c84f44d4224c_1872x1461.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1136,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;CSM model inference process&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="CSM model inference process" title="CSM model inference process" srcset="https://substackcdn.com/image/fetch/$s_!L745!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e078863-d69b-4d4c-86e0-c84f44d4224c_1872x1461.jpeg 424w, https://substackcdn.com/image/fetch/$s_!L745!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e078863-d69b-4d4c-86e0-c84f44d4224c_1872x1461.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!L745!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e078863-d69b-4d4c-86e0-c84f44d4224c_1872x1461.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!L745!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e078863-d69b-4d4c-86e0-c84f44d4224c_1872x1461.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><strong>CSM model inference process. 
Text (T) and audio (A) tokens are interleaved and fed sequentially into the Backbone, which predicts the zeroth level of the codebook. The Decoder then samples levels 1 through N &#8211; 1 conditioned on the predicted zeroth level. The reconstructed audio token (A) is then autoregressively fed back into the Backbone for the next step, continuing until the audio EOT symbol is emitted. This process begins again on the next inference request, with the interim audio (such as a user utterance) being represented by interleaved audio and text transcription tokens.</strong></figcaption></figure></div><p>Sesame&#8217;s Conversational Speech Model takes the same full-duplex spirit and pushes hard on voice quality and emotional naturalness. The public technical details are slimmer than Moshi&#8217;s (which has <a href="https://arxiv.org/pdf/2410.00037v2">an open paper</a>), but the design lineage &#8212; continuous bidirectional audio streams, no separate VAD harness, low-latency codec output &#8212; is unmistakably Moshi-style. If you have tried the preview, the <a href="https://www.sesame.com/">demo experience</a> speaks for itself.</p><h3>3. 
MoshiRAG: thinking while talking</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Px_7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32315b3f-d477-49d0-a385-12ecd148db5a_976x498.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Px_7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32315b3f-d477-49d0-a385-12ecd148db5a_976x498.png 424w, https://substackcdn.com/image/fetch/$s_!Px_7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32315b3f-d477-49d0-a385-12ecd148db5a_976x498.png 848w, https://substackcdn.com/image/fetch/$s_!Px_7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32315b3f-d477-49d0-a385-12ecd148db5a_976x498.png 1272w, https://substackcdn.com/image/fetch/$s_!Px_7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32315b3f-d477-49d0-a385-12ecd148db5a_976x498.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Px_7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32315b3f-d477-49d0-a385-12ecd148db5a_976x498.png" width="976" height="498" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/32315b3f-d477-49d0-a385-12ecd148db5a_976x498.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:498,&quot;width&quot;:976,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:119924,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/197354374?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32315b3f-d477-49d0-a385-12ecd148db5a_976x498.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Px_7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32315b3f-d477-49d0-a385-12ecd148db5a_976x498.png 424w, https://substackcdn.com/image/fetch/$s_!Px_7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32315b3f-d477-49d0-a385-12ecd148db5a_976x498.png 848w, https://substackcdn.com/image/fetch/$s_!Px_7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32315b3f-d477-49d0-a385-12ecd148db5a_976x498.png 1272w, https://substackcdn.com/image/fetch/$s_!Px_7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32315b3f-d477-49d0-a385-12ecd148db5a_976x498.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Illustration of the front-end and back-end components in MoshiRAG. When the model needs external information, it outputs a &#10216;ret&#10217; token. The conversation transcript is sent to the back end which operates asynchronously. Once ready, the result is injected into Moshi which then adapts its response with no interruption.</figcaption></figure></div><p>Moshi has one weakness that is hard to engineer around at 7B parameters: factuality &amp; intelligence. Speech LMs are trained on vastly less text (in word count) than text-only LMs, and it shows on QA benchmarks. The obvious fix is to make the model bigger, but a bigger model cannot run in real time. 
Kyutai&#8217;s answer, published as MoshiRAG in April 2026 (<a href="https://arxiv.org/pdf/2604.12928">the paper</a>), is to split the system in two:</p><ul><li><p>A <strong>front end</strong> that stays small and real-time (this is similar to TML&#8217;s interaction model) &#8212; Moshi 7B, a 1B streaming ASR, and a frozen reference text encoder (ARC-Encoder).</p></li><li><p>A <strong>back end</strong> that handles knowledge retrieval asynchronously (this is similar to TML&#8217;s use of a background model) &#8212; a local LLM like Gemma 3 27B or an API call to GPT-4.1.</p></li></ul><p>The two are connected by a single new token. When Moshi sees a knowledge-intensive question coming, it predicts a <code>&lt;ret&gt;</code> token in its text channel. That fires off a retrieval call. Crucially, Moshi does not <em>wait</em> for the answer &#8212; it keeps speaking, producing a &#8220;pre-RAG&#8221; lead like &#8220;Let me think about that...&#8221; or &#8220;In the Netflix series...&#8221; that buys the back end roughly two seconds of slack. By the time Moshi reaches the body of its answer, the retrieved reference is normally ready.</p><p>A streaming ASR model transcribes the user&#8217;s speech, and that transcript, together with useful context, is passed to the back-end model so it can do its work.</p><h4>How the retrieved answer gets back into the stream</h4><p>This is the part most people get wrong on first reading. The retrieved reference is <strong>not</strong> injected as user text, nor as model text, nor as audio. 
It comes back as a sequence of continuous embeddings &#8212; produced by ARC-Encoder, compressed 4&#215; so the sequence is short enough to fit Moshi&#8217;s 12.5 Hz frame budget, then projected through a single trainable linear layer.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8kpr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74cbc386-3d54-4c7f-b346-472e7701ff9b_952x562.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8kpr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74cbc386-3d54-4c7f-b346-472e7701ff9b_952x562.png 424w, https://substackcdn.com/image/fetch/$s_!8kpr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74cbc386-3d54-4c7f-b346-472e7701ff9b_952x562.png 848w, https://substackcdn.com/image/fetch/$s_!8kpr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74cbc386-3d54-4c7f-b346-472e7701ff9b_952x562.png 1272w, https://substackcdn.com/image/fetch/$s_!8kpr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74cbc386-3d54-4c7f-b346-472e7701ff9b_952x562.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8kpr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74cbc386-3d54-4c7f-b346-472e7701ff9b_952x562.png" width="952" height="562" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/74cbc386-3d54-4c7f-b346-472e7701ff9b_952x562.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:562,&quot;width&quot;:952,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:188775,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/197354374?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74cbc386-3d54-4c7f-b346-472e7701ff9b_952x562.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8kpr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74cbc386-3d54-4c7f-b346-472e7701ff9b_952x562.png 424w, https://substackcdn.com/image/fetch/$s_!8kpr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74cbc386-3d54-4c7f-b346-472e7701ff9b_952x562.png 848w, https://substackcdn.com/image/fetch/$s_!8kpr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74cbc386-3d54-4c7f-b346-472e7701ff9b_952x562.png 1272w, https://substackcdn.com/image/fetch/$s_!8kpr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74cbc386-3d54-4c7f-b346-472e7701ff9b_952x562.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Text and audio token streams of the inputs and outputs of MoshiRAG. Front-end Moshi receives at all time its previous step token predictions and the user speech tokens. When the retrieval result is ready, its representation is summed with the embeddings from other token streams and ingested over a number of time steps.</figcaption></figure></div><p>Those projected embeddings are then <strong>additively summed into Moshi&#8217;s input vector</strong> at the temporal Transformer level, over a short window. 
Mathematically, the input that was originally:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!M2Cu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb696628-7e3e-487a-b238-6dfbe5ea2760_802x144.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!M2Cu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb696628-7e3e-487a-b238-6dfbe5ea2760_802x144.png 424w, https://substackcdn.com/image/fetch/$s_!M2Cu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb696628-7e3e-487a-b238-6dfbe5ea2760_802x144.png 848w, https://substackcdn.com/image/fetch/$s_!M2Cu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb696628-7e3e-487a-b238-6dfbe5ea2760_802x144.png 1272w, https://substackcdn.com/image/fetch/$s_!M2Cu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb696628-7e3e-487a-b238-6dfbe5ea2760_802x144.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!M2Cu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb696628-7e3e-487a-b238-6dfbe5ea2760_802x144.png" width="802" height="144" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db696628-7e3e-487a-b238-6dfbe5ea2760_802x144.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:144,&quot;width&quot;:802,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19031,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/197354374?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb696628-7e3e-487a-b238-6dfbe5ea2760_802x144.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!M2Cu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb696628-7e3e-487a-b238-6dfbe5ea2760_802x144.png 424w, https://substackcdn.com/image/fetch/$s_!M2Cu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb696628-7e3e-487a-b238-6dfbe5ea2760_802x144.png 848w, https://substackcdn.com/image/fetch/$s_!M2Cu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb696628-7e3e-487a-b238-6dfbe5ea2760_802x144.png 1272w, https://substackcdn.com/image/fetch/$s_!M2Cu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb696628-7e3e-487a-b238-6dfbe5ea2760_802x144.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>becomes:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!43VJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F054b3305-c0de-46aa-b83d-a9c264d1465f_376x120.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!43VJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F054b3305-c0de-46aa-b83d-a9c264d1465f_376x120.png 424w, https://substackcdn.com/image/fetch/$s_!43VJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F054b3305-c0de-46aa-b83d-a9c264d1465f_376x120.png 848w, https://substackcdn.com/image/fetch/$s_!43VJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F054b3305-c0de-46aa-b83d-a9c264d1465f_376x120.png 1272w, https://substackcdn.com/image/fetch/$s_!43VJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F054b3305-c0de-46aa-b83d-a9c264d1465f_376x120.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!43VJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F054b3305-c0de-46aa-b83d-a9c264d1465f_376x120.png" width="376" height="120" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/054b3305-c0de-46aa-b83d-a9c264d1465f_376x120.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:120,&quot;width&quot;:376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:9599,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/197354374?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F054b3305-c0de-46aa-b83d-a9c264d1465f_376x120.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!43VJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F054b3305-c0de-46aa-b83d-a9c264d1465f_376x120.png 424w, https://substackcdn.com/image/fetch/$s_!43VJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F054b3305-c0de-46aa-b83d-a9c264d1465f_376x120.png 848w, https://substackcdn.com/image/fetch/$s_!43VJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F054b3305-c0de-46aa-b83d-a9c264d1465f_376x120.png 1272w, https://substackcdn.com/image/fetch/$s_!43VJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F054b3305-c0de-46aa-b83d-a9c264d1465f_376x120.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>for the duration of the reference&#8217;s compressed embedding sequence, and reverts to the plain <code>h_i</code> afterwards. 
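</p><p>The additive idea can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not Kyutai&#8217;s implementation: the helper names <code>add_vectors</code> and <code>inject_reference</code>, the two-dimensional vectors, and the start index are all hypothetical, and the real model works on learned embeddings inside the temporal Transformer.</p>
<pre>
```python
# Toy sketch of additive reference injection. All names and sizes are hypothetical.

def add_vectors(a, b):
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def inject_reference(frame_inputs, reference, start):
    """Add reference[k] to frame_inputs[start + k]; leave every other frame unchanged.

    frame_inputs: one input vector per frame (the usual sum of stream embeddings).
    reference: projected, compressed reference embeddings, one vector per frame.
    start: index of the first frame at which the reference is available.
    """
    out = [list(h) for h in frame_inputs]  # copy so the original frames are not mutated
    for k, ref_vec in enumerate(reference):
        i = start + k
        if i >= len(out):
            break  # the turn ended before the whole reference was consumed
        out[i] = add_vectors(out[i], ref_vec)
    return out


frames = [[1.0, 1.0] for _ in range(5)]      # toy per-frame input vectors
reference = [[0.5, -0.5], [0.25, 0.25]]      # toy two-step reference
injected = inject_reference(frames, reference, start=2)
```
</pre>
<p>In this toy run only frames 2 and 3 change; frames before and after the window keep their original values, which is the sense in which the sequence length never grows.</p><p>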
The reference is a transient side-channel that biases Moshi toward grounded facts during the body of its turn &#8212; nothing more, nothing less.</p><p>The authors did try &#8220;insertive&#8221; injection (treating the reference as extra tokens spliced into the input sequence) and found it scored higher on accuracy. They rejected it anyway, because lengthening Moshi&#8217;s input sequence hurts the model&#8217;s ability to hold long conversations. Additive injection was the better engineering trade-off.</p><h4>What changed in Moshi to make MoshiRAG work</h4><p>The architecture changes are surprisingly minimal relative to the original Moshi. The paper says it outright: &#8220;the only modifications from the original Moshi model are the introduction of a special retrieval trigger token <code>&lt;ret&gt;</code> and a reference text encoder.&#8221; In practice there are four small additions:</p><ol><li><p><strong>A new </strong><code>&lt;ret&gt;</code><strong> token</strong> in the text vocabulary, placed in training data immediately before the &#8220;lead&#8221; portion of every knowledge-intensive turn (using TTS forced alignment).</p></li><li><p><strong>A frozen ARC-Encoder</strong> that compresses retrieved text 4&#215; into Moshi&#8217;s embedding space.</p></li><li><p><strong>A one-layer trainable linear projection</strong> sitting between ARC-Encoder and Moshi.</p></li><li><p><strong>A learnable </strong><code>h_dropout</code><strong> vector</strong> that replaces the reference 20% of the time during training, so Moshi learns to behave gracefully when retrieval fails or is late.</p></li></ol><p>The base Moshi backbone &#8212; the RQ-Transformer, Mimi codec, dual-channel design &#8212; is untouched. MoshiRAG is fine-tuned from the original Moshi checkpoint with 100k updates at a tiny 2 &#215; 10&#8315;&#8310; learning rate.</p><h3>4. 
Thinking Machines&#8217; Interaction Model</h3><p>The TML blog post is up-front about the lineage: <em>&#8220;This approach builds upon prior work like Qwen-omni, KAME, MoshiRAG.&#8221;</em> But the scale and the design choices have moved on substantially. The demos are incredible, to say the least. Here is one more example; you can see more in <a href="https://thinkingmachines.ai/blog/interaction-models/">the blog</a>.</p><div id="youtube2-A12AVongNN4" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;A12AVongNN4&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/A12AVongNN4?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><strong>Time-aligned micro-turns.</strong> Where Moshi works at 12.5 Hz (80 ms frames), TML works in 200 ms chunks. Each chunk interleaves input and output across all modalities &#8212; audio, video, and text. There are no turn boundaries, no VAD, no harness. We will return to this in the next section.</p><p><strong>Encoder-free early fusion.</strong> The blog gives the following details about how signals enter and leave the interaction model. Audio enters as dMel features through a light embedding layer. Video enters as 40&#215;40 patches through a small hMLP. Audio output uses a flow head rather than a discrete codec. All of it is co-trained from scratch with the transformer. The same 12B active parameters handle every modality.</p><p><strong>The background model split.</strong> This is the part that most directly echoes MoshiRAG. The interaction model handles real-time presence. 
When a task needs deeper reasoning, planning, or tool use, the interaction model delegates to a background model that runs asynchronously, then weaves results back into the conversation as they arrive &#8212; at a moment &#8220;appropriate to what the user is currently doing, rather than as an abrupt context switch.&#8221; That last phrase is exactly the MoshiRAG philosophy: hide the retrieval latency inside the natural rhythm of speech. One caveat: where MoshiRAG bolts on a separate Whisper-class streaming ASR (1B parameters, frozen), the TML post does not say whether an ASR model supplies the background model&#8217;s input, though it seems quite likely that one does.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8TEh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714517bd-fddc-46bf-88a3-2e3b13d79791_1136x704.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8TEh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714517bd-fddc-46bf-88a3-2e3b13d79791_1136x704.png 424w, https://substackcdn.com/image/fetch/$s_!8TEh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714517bd-fddc-46bf-88a3-2e3b13d79791_1136x704.png 848w, https://substackcdn.com/image/fetch/$s_!8TEh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714517bd-fddc-46bf-88a3-2e3b13d79791_1136x704.png 1272w, 
https://substackcdn.com/image/fetch/$s_!8TEh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714517bd-fddc-46bf-88a3-2e3b13d79791_1136x704.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8TEh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714517bd-fddc-46bf-88a3-2e3b13d79791_1136x704.png" width="1136" height="704" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/714517bd-fddc-46bf-88a3-2e3b13d79791_1136x704.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:704,&quot;width&quot;:1136,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:69106,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/197354374?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714517bd-fddc-46bf-88a3-2e3b13d79791_1136x704.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!8TEh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714517bd-fddc-46bf-88a3-2e3b13d79791_1136x704.png 424w, https://substackcdn.com/image/fetch/$s_!8TEh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714517bd-fddc-46bf-88a3-2e3b13d79791_1136x704.png 848w, 
https://substackcdn.com/image/fetch/$s_!8TEh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714517bd-fddc-46bf-88a3-2e3b13d79791_1136x704.png 1272w, https://substackcdn.com/image/fetch/$s_!8TEh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F714517bd-fddc-46bf-88a3-2e3b13d79791_1136x704.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The user continuously interacts with the interaction model, while the background model performs asynchronous tasks. 
Both systems share their context.</figcaption></figure></div><p>What the Thinking Machines blog does <em>not</em> tell us is how results come back into the interaction stream. MoshiRAG&#8217;s answer &#8212; additive embedding injection at the temporal Transformer level &#8212; is documented in the paper. TML&#8217;s mechanism is not. But the surface-level analogy is striking enough that it is reasonable to guess the implementation belongs to the same family.</p><h2>Thinking Machines&#8217; approach to model input</h2><p>Here the design differs from Moshi.</p><p><strong>Moshi: combined at every frame.</strong> As we discussed earlier, Moshi&#8217;s temporal Transformer receives, at each 12.5 Hz time step, <em>one</em> input vector that is the element-wise sum of three streams: model text, model speech, and user speech embeddings (Equation 1 in the paper). MoshiRAG adds a fourth optional term for the reference embedding during the injection window. There is no notion of &#8220;input now, output later&#8221; within a frame. At every 80 ms tick the model simultaneously sees the user&#8217;s incoming audio <em>and</em> predicts its own next text and audio tokens. 
Three concurrent streams collapse into one combined vector per step.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uaBq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d99cb55-5346-44c2-8638-37dc051850a4_1112x778.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uaBq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d99cb55-5346-44c2-8638-37dc051850a4_1112x778.png 424w, https://substackcdn.com/image/fetch/$s_!uaBq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d99cb55-5346-44c2-8638-37dc051850a4_1112x778.png 848w, https://substackcdn.com/image/fetch/$s_!uaBq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d99cb55-5346-44c2-8638-37dc051850a4_1112x778.png 1272w, https://substackcdn.com/image/fetch/$s_!uaBq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d99cb55-5346-44c2-8638-37dc051850a4_1112x778.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uaBq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d99cb55-5346-44c2-8638-37dc051850a4_1112x778.png" width="1112" height="778" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6d99cb55-5346-44c2-8638-37dc051850a4_1112x778.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:778,&quot;width&quot;:1112,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:233112,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/197354374?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d99cb55-5346-44c2-8638-37dc051850a4_1112x778.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!uaBq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d99cb55-5346-44c2-8638-37dc051850a4_1112x778.png 424w, https://substackcdn.com/image/fetch/$s_!uaBq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d99cb55-5346-44c2-8638-37dc051850a4_1112x778.png 848w, https://substackcdn.com/image/fetch/$s_!uaBq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d99cb55-5346-44c2-8638-37dc051850a4_1112x778.png 1272w, https://substackcdn.com/image/fetch/$s_!uaBq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d99cb55-5346-44c2-8638-37dc051850a4_1112x778.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Representation of the joint sequence modeled by Moshi</figcaption></figure></div><p><strong>TML Interaction Model: interleaved at chunk granularity, combined within a chunk: </strong>Thinking Machines runs at 200 ms chunks (5 Hz). The blog illustration is explicit about the macro structure &#8212; <em>&#8220;the model receives a single interleaved token sequence&#8221;</em> &#8212; and shows it as <code>input_0 &#8594; output_0 &#8594; input_1 &#8594; output_1 &#8594; input_2 &#8594; output_2 &#8594; &#8230;</code>. So at the <em>between-chunk</em> level, the model alternates: it consumes 200 ms of input, then emits 200 ms of output, then consumes the next 200 ms of input, and so on. 
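</p><p>The ordering can be illustrated with a short Python sketch. It only reproduces the alternation described in the blog illustration; the function name <code>interleave</code>, the placeholder chunk strings, and the labels are hypothetical, and nothing here reflects how TML actually tokenizes a chunk.</p>
<pre>
```python
# Hypothetical sketch of chunk-level interleaving; this is not TML's code.
CHUNK_MS = 200  # chunk length stated in the blog post


def interleave(inputs, outputs):
    """Alternate input and output chunks: input_0, output_0, input_1, output_1, ..."""
    sequence = []
    for step, (inp, out) in enumerate(zip(inputs, outputs)):
        sequence.append((f"input_{step}", inp))
        sequence.append((f"output_{step}", out))
    return sequence


user_chunks = ["u0", "u1", "u2"]    # placeholders for 200 ms of user audio and video
model_chunks = ["m0", "m1", "m2"]   # placeholders for 200 ms of model output
sequence = interleave(user_chunks, model_chunks)
labels = [label for label, _ in sequence]
chunks_per_second = 1000 // CHUNK_MS  # 200 ms chunks correspond to 5 Hz
```
</pre>
<p>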
That is sequence-based, not parallel.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RZbs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1f86162-cc07-4de2-8122-20dd5df8f8e9_1156x474.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RZbs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1f86162-cc07-4de2-8122-20dd5df8f8e9_1156x474.png 424w, https://substackcdn.com/image/fetch/$s_!RZbs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1f86162-cc07-4de2-8122-20dd5df8f8e9_1156x474.png 848w, https://substackcdn.com/image/fetch/$s_!RZbs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1f86162-cc07-4de2-8122-20dd5df8f8e9_1156x474.png 1272w, https://substackcdn.com/image/fetch/$s_!RZbs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1f86162-cc07-4de2-8122-20dd5df8f8e9_1156x474.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RZbs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1f86162-cc07-4de2-8122-20dd5df8f8e9_1156x474.png" width="1156" height="474" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e1f86162-cc07-4de2-8122-20dd5df8f8e9_1156x474.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:474,&quot;width&quot;:1156,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:57719,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/197354374?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1f86162-cc07-4de2-8122-20dd5df8f8e9_1156x474.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!RZbs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1f86162-cc07-4de2-8122-20dd5df8f8e9_1156x474.png 424w, https://substackcdn.com/image/fetch/$s_!RZbs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1f86162-cc07-4de2-8122-20dd5df8f8e9_1156x474.png 848w, https://substackcdn.com/image/fetch/$s_!RZbs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1f86162-cc07-4de2-8122-20dd5df8f8e9_1156x474.png 1272w, https://substackcdn.com/image/fetch/$s_!RZbs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe1f86162-cc07-4de2-8122-20dd5df8f8e9_1156x474.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Human perception preserves concurrent input and output streams, while the model receives a single interleaved token sequence.</figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TfGC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febb20c49-8dd0-41db-a26f-9773b2c0fdf7_1214x896.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TfGC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febb20c49-8dd0-41db-a26f-9773b2c0fdf7_1214x896.png 424w, 
https://substackcdn.com/image/fetch/$s_!TfGC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febb20c49-8dd0-41db-a26f-9773b2c0fdf7_1214x896.png 848w, https://substackcdn.com/image/fetch/$s_!TfGC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febb20c49-8dd0-41db-a26f-9773b2c0fdf7_1214x896.png 1272w, https://substackcdn.com/image/fetch/$s_!TfGC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febb20c49-8dd0-41db-a26f-9773b2c0fdf7_1214x896.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TfGC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febb20c49-8dd0-41db-a26f-9773b2c0fdf7_1214x896.png" width="1214" height="896" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ebb20c49-8dd0-41db-a26f-9773b2c0fdf7_1214x896.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:896,&quot;width&quot;:1214,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:188699,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/197354374?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febb20c49-8dd0-41db-a26f-9773b2c0fdf7_1214x896.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" 
srcset="https://substackcdn.com/image/fetch/$s_!TfGC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febb20c49-8dd0-41db-a26f-9773b2c0fdf7_1214x896.png 424w, https://substackcdn.com/image/fetch/$s_!TfGC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febb20c49-8dd0-41db-a26f-9773b2c0fdf7_1214x896.png 848w, https://substackcdn.com/image/fetch/$s_!TfGC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febb20c49-8dd0-41db-a26f-9773b2c0fdf7_1214x896.png 1272w, https://substackcdn.com/image/fetch/$s_!TfGC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febb20c49-8dd0-41db-a26f-9773b2c0fdf7_1214x896.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Turn-based models see an alternating token sequence. Time-aware interaction models see a continuous stream of micro-turns, so silence, overlap, and interruption remain part of the model's context.</figcaption></figure></div><p>But <em>within</em> a single 200 ms chunk, multiple modalities are still combined, just in a different way. The architecture diagram in the blog shows text frames, audio dMel features, and 40&#215;40 video patches (encoded via hMLP) all flowing into the model input after being embedded.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GESL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfbfb192-84fa-43be-b0d7-fc6a13315552_1136x874.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GESL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfbfb192-84fa-43be-b0d7-fc6a13315552_1136x874.png 424w, https://substackcdn.com/image/fetch/$s_!GESL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfbfb192-84fa-43be-b0d7-fc6a13315552_1136x874.png 848w, https://substackcdn.com/image/fetch/$s_!GESL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfbfb192-84fa-43be-b0d7-fc6a13315552_1136x874.png 1272w, 
https://substackcdn.com/image/fetch/$s_!GESL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfbfb192-84fa-43be-b0d7-fc6a13315552_1136x874.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GESL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfbfb192-84fa-43be-b0d7-fc6a13315552_1136x874.png" width="1136" height="874" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bfbfb192-84fa-43be-b0d7-fc6a13315552_1136x874.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:874,&quot;width&quot;:1136,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:130566,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:&quot;&quot;,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/197354374?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfbfb192-84fa-43be-b0d7-fc6a13315552_1136x874.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!GESL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfbfb192-84fa-43be-b0d7-fc6a13315552_1136x874.png 424w, https://substackcdn.com/image/fetch/$s_!GESL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfbfb192-84fa-43be-b0d7-fc6a13315552_1136x874.png 848w, 
https://substackcdn.com/image/fetch/$s_!GESL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfbfb192-84fa-43be-b0d7-fc6a13315552_1136x874.png 1272w, https://substackcdn.com/image/fetch/$s_!GESL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfbfb192-84fa-43be-b0d7-fc6a13315552_1136x874.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">An illustration of the interaction model architecture for a single 200ms micro-turn. 
The model takes in any subset of text, audio, or video and predicts text and audio.</figcaption></figure></div><h3>Size comparison</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Jmje!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6625a9c-cf17-462a-81e2-89ebb520ebfa_1236x348.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Jmje!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6625a9c-cf17-462a-81e2-89ebb520ebfa_1236x348.png 424w, https://substackcdn.com/image/fetch/$s_!Jmje!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6625a9c-cf17-462a-81e2-89ebb520ebfa_1236x348.png 848w, https://substackcdn.com/image/fetch/$s_!Jmje!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6625a9c-cf17-462a-81e2-89ebb520ebfa_1236x348.png 1272w, https://substackcdn.com/image/fetch/$s_!Jmje!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6625a9c-cf17-462a-81e2-89ebb520ebfa_1236x348.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Jmje!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6625a9c-cf17-462a-81e2-89ebb520ebfa_1236x348.png" width="1236" height="348" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e6625a9c-cf17-462a-81e2-89ebb520ebfa_1236x348.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:348,&quot;width&quot;:1236,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:65971,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/197354374?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6625a9c-cf17-462a-81e2-89ebb520ebfa_1236x348.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Jmje!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6625a9c-cf17-462a-81e2-89ebb520ebfa_1236x348.png 424w, https://substackcdn.com/image/fetch/$s_!Jmje!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6625a9c-cf17-462a-81e2-89ebb520ebfa_1236x348.png 848w, https://substackcdn.com/image/fetch/$s_!Jmje!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6625a9c-cf17-462a-81e2-89ebb520ebfa_1236x348.png 1272w, https://substackcdn.com/image/fetch/$s_!Jmje!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6625a9c-cf17-462a-81e2-89ebb520ebfa_1236x348.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>TML Interaction Model&#8217;s Inference optimisations</h2><p>The blog is pretty brief on this. Here is my read: the 200ms micro-turn design fundamentally changes what inference looks like. Standard LLM serving is optimised for big prefills followed by long decodes; TML produces a <em>continuous stream</em> of tiny prefill+decode rounds, five per second per active session. Per-turn overhead  dominates the math, so most of the optimisation work is about killing that overhead.</p><p><strong>Streaming sessions.</strong> Instead of treating each 200ms chunk as a fresh request that re-allocates buffers and rebuilds metadata, the inference server keeps a persistent sequence in GPU memory and <em>appends</em> incoming chunks to it. 
A version of this is upstreamed to SGLang.</p><p><strong>Custom MoE kernels.</strong> TML-Interaction-Small is a 276B-parameter MoE with only 12B parameters active. At interaction time the per-step batch is tiny, so the standard grouped-GEMM kernel that MoE inference usually relies on wastes most of its throughput. Thinking Machines replaced it with a GATHER+GEMV strategy (gather small per-token activations to the right experts, then run GEMV rather than GEMM), drawing on similar work in PyTorch&#8217;s gpt-fast and Cursor&#8217;s &#8220;warp decode.&#8221;</p><p><strong>Latency-tuned kernel shapes.</strong> Beyond MoE, the kernels are explicitly tuned for the unusual shapes of bidirectional serving, where prefill and decode happen continuously at small sizes rather than in the canonical &#8220;big prefill, long decode&#8221; pattern.</p><p>The post is notably quiet on quantisation, speculative decoding, and KV-cache compression. </p><h2>TML Background Model&#8217;s Inference optimisations &amp; Hybrid Architecture</h2><p>While the blog does not state which background model the TML team used, it is probably a state-of-the-art model. If you use an off-the-shelf LLM API such as Sonnet 4.6 or Opus 4.7, you typically get 50-100 tokens per second per user. However, that is not fast enough when the background model has to generate tool calls, execute them, prepare a response and hand it back to the interaction model. In such scenarios, Cerebras-like performance can make a material difference in serving SOTA intelligence in real time. 
<a href="https://www.cerebras.ai/">Cerebras</a> can serve 1500+ tokens per second for SOTA open source models.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RjCP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F724bc254-6879-4740-9eac-438d518bd2ae_2500x1875.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RjCP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F724bc254-6879-4740-9eac-438d518bd2ae_2500x1875.jpeg 424w, https://substackcdn.com/image/fetch/$s_!RjCP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F724bc254-6879-4740-9eac-438d518bd2ae_2500x1875.jpeg 848w, https://substackcdn.com/image/fetch/$s_!RjCP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F724bc254-6879-4740-9eac-438d518bd2ae_2500x1875.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!RjCP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F724bc254-6879-4740-9eac-438d518bd2ae_2500x1875.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RjCP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F724bc254-6879-4740-9eac-438d518bd2ae_2500x1875.jpeg" width="1456" height="1092" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/724bc254-6879-4740-9eac-438d518bd2ae_2500x1875.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Giant Chips Give Supercomputers a Run for Their Money - IEEE Spectrum&quot;,&quot;title&quot;:&quot;Giant Chips Give Supercomputers a Run for Their Money - IEEE Spectrum&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Giant Chips Give Supercomputers a Run for Their Money - IEEE Spectrum" title="Giant Chips Give Supercomputers a Run for Their Money - IEEE Spectrum" srcset="https://substackcdn.com/image/fetch/$s_!RjCP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F724bc254-6879-4740-9eac-438d518bd2ae_2500x1875.jpeg 424w, https://substackcdn.com/image/fetch/$s_!RjCP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F724bc254-6879-4740-9eac-438d518bd2ae_2500x1875.jpeg 848w, https://substackcdn.com/image/fetch/$s_!RjCP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F724bc254-6879-4740-9eac-438d518bd2ae_2500x1875.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!RjCP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F724bc254-6879-4740-9eac-438d518bd2ae_2500x1875.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button 
tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Furthermore, one could deploy a hybrid architecture in which the interaction model runs on an edge server or on a server in a datacenter and connects to the background model over a fast network (within the same location or over the internet). Even if the connection to the background model is lost, the system will continue to work, as the interaction model has enough intelligence to carry on a conversation. The TML blog is not 100% clear on how the models interact. If it is text out, text in, it is easy. But if there is tighter integration involving activations, then it may require higher bandwidth (though I don&#8217;t think this is what is happening; it is most likely text out, text in). 
Also, for now Cerebras does not have built in web search from what I know, so there will need to be an intermediate agent - which would also need to be fast.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qplj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc5d8070-8353-4ff9-9974-b6da45134d20_1268x894.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qplj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc5d8070-8353-4ff9-9974-b6da45134d20_1268x894.png 424w, https://substackcdn.com/image/fetch/$s_!qplj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc5d8070-8353-4ff9-9974-b6da45134d20_1268x894.png 848w, https://substackcdn.com/image/fetch/$s_!qplj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc5d8070-8353-4ff9-9974-b6da45134d20_1268x894.png 1272w, https://substackcdn.com/image/fetch/$s_!qplj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc5d8070-8353-4ff9-9974-b6da45134d20_1268x894.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qplj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc5d8070-8353-4ff9-9974-b6da45134d20_1268x894.png" width="1268" height="894" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cc5d8070-8353-4ff9-9974-b6da45134d20_1268x894.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:894,&quot;width&quot;:1268,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:75832,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/197354374?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc5d8070-8353-4ff9-9974-b6da45134d20_1268x894.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qplj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc5d8070-8353-4ff9-9974-b6da45134d20_1268x894.png 424w, https://substackcdn.com/image/fetch/$s_!qplj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc5d8070-8353-4ff9-9974-b6da45134d20_1268x894.png 848w, https://substackcdn.com/image/fetch/$s_!qplj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc5d8070-8353-4ff9-9974-b6da45134d20_1268x894.png 1272w, https://substackcdn.com/image/fetch/$s_!qplj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc5d8070-8353-4ff9-9974-b6da45134d20_1268x894.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3><strong>Ambient AI</strong></h3><p><br>This is a privacy-preserving setup and may be desired by many, as the interaction model is exposed to voice and video, which may have sensitive content. This approach can safely enable &#8220;Ambient AI&#8221;. 
Ambient AI, often referred to as ambient intelligence (AmI) or ambient computing, <strong>represents a shift in artificial intelligence from reactive (waiting for prompts) to proactive and invisible (acting on context)</strong>.<br><br><strong>In my mental model, if coding is 1X of token use, AI co-working is going to be 10X of token use, and Ambient AI is going to be 100X of token use.</strong> This two-model architecture pattern and chips like those from Cerebras are vital to enabling it.</p><h3><strong>Trivia</strong></h3><p><br>One of my favourite bloggers, <a href="https://x.com/stratechery">Stratechery</a>, believes Cerebras&#8217;s main use case is not coding but <a href="https://stratechery.com/2026/the-inference-shift/">real-time interaction use cases in voice (Ambient AI)</a> and possibly robotics. I believe that Cerebras, in addition to such performance-sensitive applications, is also very important for coding and for personal AI agent use cases where users are in the driving seat. Stratechery believes background agents will replace all the coding and personal agent use cases. I believe users will want to be in the loop forever, not because they have to, but because they want to, just as people want to browse their social media themselves. It is immensely satisfying, and it can compound the output! 
But, that is a debate for another time&#8230;<br>You can also read this excellent deep dive on <a href="https://newsletter.semianalysis.com/p/cerebras-faster-tokens-please">Cerebras</a> by Semianalysis.</p><h2>Kyutai, take a bow!</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!alvh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F014e4db7-8271-4802-a39f-d3726c3f3037_2672x1272.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!alvh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F014e4db7-8271-4802-a39f-d3726c3f3037_2672x1272.png 424w, https://substackcdn.com/image/fetch/$s_!alvh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F014e4db7-8271-4802-a39f-d3726c3f3037_2672x1272.png 848w, https://substackcdn.com/image/fetch/$s_!alvh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F014e4db7-8271-4802-a39f-d3726c3f3037_2672x1272.png 1272w, https://substackcdn.com/image/fetch/$s_!alvh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F014e4db7-8271-4802-a39f-d3726c3f3037_2672x1272.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!alvh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F014e4db7-8271-4802-a39f-d3726c3f3037_2672x1272.png" width="1456" height="693" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/014e4db7-8271-4802-a39f-d3726c3f3037_2672x1272.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:693,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2908567,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/197354374?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F014e4db7-8271-4802-a39f-d3726c3f3037_2672x1272.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!alvh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F014e4db7-8271-4802-a39f-d3726c3f3037_2672x1272.png 424w, https://substackcdn.com/image/fetch/$s_!alvh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F014e4db7-8271-4802-a39f-d3726c3f3037_2672x1272.png 848w, https://substackcdn.com/image/fetch/$s_!alvh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F014e4db7-8271-4802-a39f-d3726c3f3037_2672x1272.png 1272w, https://substackcdn.com/image/fetch/$s_!alvh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F014e4db7-8271-4802-a39f-d3726c3f3037_2672x1272.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Image Source: <a href="https://techcrunch.com/2023/11/17/kyutai-is-an-french-ai-research-lab-with-a-330-million-budget-that-will-make-everything-open-source/">Techcrunch</a></p><p>Kyutai was launched in November 2023 at Station F in Paris, a non-profit AI lab seeded with roughly &#8364;300 million from Xavier Niel (Iliad), Rodolphe Saad&#233; (CMA CGM), and Eric Schmidt's foundation, with a mandate that is still rare in the field: <strong>do frontier research, publish everything, release the weights, and treat open science as the default rather than a marketing posture.</strong> Six scientists took the stage that morning, and their CVs read like a quiet history of European generative AI. 
</p><ol><li><p>Patrick P&#233;rez, who became CEO, had spent decades at Inria, Microsoft Research, Technicolor, and Valeo, where he ran the valeo.ai lab. </p></li><li><p>Herv&#233; J&#233;gou had helped found Meta's FAIR Paris office and is the mind behind product quantization and the FAISS vector-search library that quietly underpins much of modern retrieval. </p></li><li><p>Edouard Grave came from FAIR's language modelling group. </p></li><li><p>Laurent Mazar&#233; arrived from DeepMind by way of Jane Street. </p></li><li><p>Neil Zeghidour and Alexandre D&#233;fossez, both veterans of Google Brain, DeepMind, and Meta, had between them already created many of the neural audio codecs and audio generation models the field now takes for granted: SoundStream, EnCodec, MusicGen, AudioLM. </p></li></ol><p>When this team turned its attention to full-duplex speech, the result was Moshi: a system that did not just compete with the cascaded pipelines of the day but made them look like a category error. MoshiRAG followed in 2026, extending the same architecture with asynchronous retrieval while keeping the open-science commitment intact: every detail, down to the training-time retrieval-delay sampling distribution, is in the paper. In late 2025, Zeghidour, Mazar&#233;, and D&#233;fossez spun out Gradium to commercialise the voice work while keeping the lab close, a quietly elegant solution to the hard economic question of how a non-profit funds frontier compute. </p><p>While Thinking Machines deserves massive credit for bringing us this far, Moshi paved the way for this revolution. The lineage matters. So does the lab that chose to publish it. Many thanks to the team behind Kyutai!</p><h2>Closing thoughts</h2><p>Moshi proved you could ditch the cascaded ASR&#8211;LLM&#8211;TTS pipeline and get a real full-duplex speech model. MoshiRAG proved you could keep that real-time front end and still get factuality by farming knowledge work out asynchronously. 
</p><p>Thinking Machines is now proving you can scale the same two-tier idea to a 276B-parameter MoE backbone with 12B active parameters, add video, still hit sub-second turn-taking latency, and basically build your own Her. You could add more capable agents between the interaction model and the background model and create phenomenal applications that feel much more pleasant to interact with. This will also create a market for Cerebras-style inference accelerators. And it is a big market.</p><p>TML&#8217;s interaction model is incredibly impressive, and as they say: &#8220;this is the worst it will ever be&#8221;!</p><p>Also, if you haven&#8217;t watched the movie Her, let me tempt you with this trailer.</p><div id="youtube2-GV01B5kVsC0" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;GV01B5kVsC0&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/GV01B5kVsC0?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h3></h3><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://polymath707.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Moore's Law Is Dead. So Why Do GPUs Keep Getting Faster?]]></title><description><![CDATA[Story of silicon tiling, memory stacking, superchips, and rack-scale interconnects. We understand the story by taking a deeper look at Nvidia's approach. However, these approaches are universal.]]></description><link>https://polymath707.substack.com/p/moores-law-is-dead-so-why-do-gpus</link><guid isPermaLink="false">https://polymath707.substack.com/p/moores-law-is-dead-so-why-do-gpus</guid><dc:creator><![CDATA[Polymath707]]></dc:creator><pubDate>Fri, 17 Apr 2026 03:06:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!tnzl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7aa7998-b776-419b-ba85-413f512879de_800x494.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tnzl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7aa7998-b776-419b-ba85-413f512879de_800x494.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tnzl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7aa7998-b776-419b-ba85-413f512879de_800x494.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!tnzl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7aa7998-b776-419b-ba85-413f512879de_800x494.jpeg 848w, https://substackcdn.com/image/fetch/$s_!tnzl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7aa7998-b776-419b-ba85-413f512879de_800x494.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!tnzl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7aa7998-b776-419b-ba85-413f512879de_800x494.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tnzl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7aa7998-b776-419b-ba85-413f512879de_800x494.jpeg" width="800" height="494" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b7aa7998-b776-419b-ba85-413f512879de_800x494.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:494,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Personal Memories of Gordon Moore | Bill Gates&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Personal Memories of Gordon Moore | Bill Gates" title="Personal Memories of Gordon Moore | Bill Gates" srcset="https://substackcdn.com/image/fetch/$s_!tnzl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7aa7998-b776-419b-ba85-413f512879de_800x494.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!tnzl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7aa7998-b776-419b-ba85-413f512879de_800x494.jpeg 848w, https://substackcdn.com/image/fetch/$s_!tnzl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7aa7998-b776-419b-ba85-413f512879de_800x494.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!tnzl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7aa7998-b776-419b-ba85-413f512879de_800x494.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2><strong>Moore&#8217;s Law: why it&#8217;s over, and what comes after</strong></h2><p><br>In 1965, Gordon Moore, who went on to co-found Intel, observed that the number of transistors that could be economically put on a chip was doubling every year; in 1975 he revised the pace to roughly every two years. That trend held for about five decades and powered the entire digital revolution. But Moore&#8217;s Law was never a law of physics; it was an economic observation, and it depended on two things: transistors getting smaller, and smaller transistors being cheaper. Both have stopped. On the cost side, the price per transistor plateaued at the 28nm node around 2014 and is now <em>rising</em>: TSMC&#8217;s 2nm node, shipping in 2026, will be the first major node where cost per transistor goes up, not down. On the physics side, we&#8217;ve hit the atomic wall: at the 2nm process, critical transistor features are roughly 10 atoms wide. At TSMC&#8217;s upcoming 1.4nm node (2028), that shrinks to about 7 atoms. You cannot build a switch smaller than an atom; this isn&#8217;t an engineering problem to be solved, it&#8217;s a fundamental limit of matter. Quantum tunneling causes electrons to leak through barriers that should be impenetrable, blurring the distinction between &#8220;on&#8221; and &#8220;off&#8221; that all digital computing depends on. The golden era where each generation was simultaneously denser, faster, and cheaper is definitively over. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mQ2t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151f8895-64ee-41e2-8284-cf38a416cee8_2118x1564.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mQ2t!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151f8895-64ee-41e2-8284-cf38a416cee8_2118x1564.png 424w, https://substackcdn.com/image/fetch/$s_!mQ2t!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151f8895-64ee-41e2-8284-cf38a416cee8_2118x1564.png 848w, https://substackcdn.com/image/fetch/$s_!mQ2t!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151f8895-64ee-41e2-8284-cf38a416cee8_2118x1564.png 1272w, https://substackcdn.com/image/fetch/$s_!mQ2t!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151f8895-64ee-41e2-8284-cf38a416cee8_2118x1564.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mQ2t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151f8895-64ee-41e2-8284-cf38a416cee8_2118x1564.png" width="1456" height="1075" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/151f8895-64ee-41e2-8284-cf38a416cee8_2118x1564.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1075,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1068915,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/194444376?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151f8895-64ee-41e2-8284-cf38a416cee8_2118x1564.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mQ2t!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151f8895-64ee-41e2-8284-cf38a416cee8_2118x1564.png 424w, https://substackcdn.com/image/fetch/$s_!mQ2t!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151f8895-64ee-41e2-8284-cf38a416cee8_2118x1564.png 848w, https://substackcdn.com/image/fetch/$s_!mQ2t!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151f8895-64ee-41e2-8284-cf38a416cee8_2118x1564.png 1272w, https://substackcdn.com/image/fetch/$s_!mQ2t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F151f8895-64ee-41e2-8284-cf38a416cee8_2118x1564.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>That is why Jensen Huang, Nvidia&#8217;s CEO, keeps saying in interviews that Moore&#8217;s Law is dead! We can no longer shrink transistors at the pace Gordon Moore predicted in 1965. The doubling of transistor density every two years on a single chip has stalled. The number of transistors per GPU tile (we will talk more about tiles later) has grown only modestly from one generation to the next: 80 billion in Nvidia Hopper tiles, 104 billion in Nvidia Blackwell tiles, and 168 billion in Nvidia Rubin tiles. And yet, performance has improved much more! 
</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://polymath707.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Girish Patil! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>How? The answer isn&#8217;t one trick &#8212; it&#8217;s a system of engineering innovations across five axes (there is a name to it: <strong>extreme co-design</strong>):</p><blockquote><p>&#183; More tiles per GPU: tiling multiple dies into one GPU package</p><p>&#183; Lower precision: FP8 &#8594; FP4 doubles effective FLOPS per tensor core</p><p>&#183; More memory bandwidth: advancing HBM generations and adding more stacks</p><p>&#183; Superchips: fusing CPU + dual GPUs with coherent chip-to-chip interconnects</p><p>&#183; More GPU-to-GPU bandwidth: scaling NVLink and building rack-scale fabrics like NVL72</p></blockquote><p></p><h2>The Nvidia GPU Generations at a Glance</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Tkxs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd78774b7-006a-487f-ac6e-1839832bdb80_1592x1038.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!Tkxs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd78774b7-006a-487f-ac6e-1839832bdb80_1592x1038.png 424w, https://substackcdn.com/image/fetch/$s_!Tkxs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd78774b7-006a-487f-ac6e-1839832bdb80_1592x1038.png 848w, https://substackcdn.com/image/fetch/$s_!Tkxs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd78774b7-006a-487f-ac6e-1839832bdb80_1592x1038.png 1272w, https://substackcdn.com/image/fetch/$s_!Tkxs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd78774b7-006a-487f-ac6e-1839832bdb80_1592x1038.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Tkxs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd78774b7-006a-487f-ac6e-1839832bdb80_1592x1038.png" width="1456" height="949" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d78774b7-006a-487f-ac6e-1839832bdb80_1592x1038.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:949,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:188861,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/194444376?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd78774b7-006a-487f-ac6e-1839832bdb80_1592x1038.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Tkxs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd78774b7-006a-487f-ac6e-1839832bdb80_1592x1038.png 424w, https://substackcdn.com/image/fetch/$s_!Tkxs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd78774b7-006a-487f-ac6e-1839832bdb80_1592x1038.png 848w, https://substackcdn.com/image/fetch/$s_!Tkxs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd78774b7-006a-487f-ac6e-1839832bdb80_1592x1038.png 1272w, https://substackcdn.com/image/fetch/$s_!Tkxs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd78774b7-006a-487f-ac6e-1839832bdb80_1592x1038.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>(To keep the table simple, I have left out H200, B300 and Rubin Ultra. B300, for example, offers much higher peak FP4 throughput, around 14,000 TFLOPS. Rubin Ultra offers more performance and memory.) </p><p>For Rubin, Nvidia claims an effective 50,000 TFLOPS of FP4 performance, achieved with an updated 3<sup>rd</sup>-generation Transformer Engine that replaces the 2:4 structured sparsity of prior generations. Without it, the figure would be around 35,000 TFLOPS.</p><p>Rubin also has a larger variant, Rubin Ultra, which doubles the silicon to 4 reticle-limit chiplets, pairs them with 16 HBM4e stacks totalling 1 TB of memory, and delivers ~100,000 TFLOPS of FP4 compute with the 3<sup>rd</sup>-generation Transformer Engine. </p><p></p><h2>Axis 1: More Silicon &#8212; Tiling Past the Reticle Limit</h2><h3>The Reticle Wall</h3><p>Every chip is printed using a photolithography mask called a reticle. The maximum area a single exposure can print on a wafer is about 858 mm&#178; (a 26 mm &#215; 33 mm field), a hard physical limit of the lithography equipment. The H100, at 814 mm&#178;, was already pushing right up against this boundary. You simply cannot make a single chip bigger. This is the wall that killed the naive version of Moore&#8217;s Law for GPUs.</p><h3>Blackwell: Two Dies, One GPU</h3><p>NVIDIA&#8217;s B200 broke through this wall with a dual-die design. Two reticle-limit dies (each ~104 billion transistors on TSMC 4NP) are placed side by side and connected via NV-HBI (NVIDIA High-Bandwidth Interface) &#8212; a custom die-to-die interconnect running at 10 TB/s. 
The two dies appear as a single GPU to software. This is how NVIDIA jumped from 80B to 208B transistors without a dramatic process shrink.</p><h3>Rubin: Two Chiplets on 3nm</h3><p>The Rubin R100 GPU also uses a 2-die design, but on TSMC&#8217;s N3P (3nm-class) process. Two reticle-limit compute chiplets are connected on a CoWoS-L advanced package, reaching approximately 336 billion transistors. The density improvement from 4nm to 3nm, combined with architectural improvements, is how NVIDIA fits ~1.6&#215; as many transistors as Blackwell into the same number of tiles.</p><h3>Rubin Ultra (2027): Four Chiplets</h3><p>Rubin Ultra doubles the silicon again &#8212; four reticle-limit chiplets on a single package, using die-to-die bonding with a fast coherent interconnect. Combined with 16 HBM4e stacks (1 TB total), Rubin Ultra roughly doubles the compute of standard Rubin to ~100 PFLOPS FP4.</p><p><strong>Key insight: </strong>NVIDIA isn&#8217;t relying on Moore&#8217;s Law. They&#8217;re using packaging innovation &#8212; putting more dies on a single substrate &#8212; to scale transistor count far beyond what any single die could hold.</p><p>The images below can help you understand this point visually!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mPgv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd01d5964-7f15-4be5-94c6-7995d7595960_2004x1092.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mPgv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd01d5964-7f15-4be5-94c6-7995d7595960_2004x1092.png 424w, 
https://substackcdn.com/image/fetch/$s_!mPgv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd01d5964-7f15-4be5-94c6-7995d7595960_2004x1092.png 848w, https://substackcdn.com/image/fetch/$s_!mPgv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd01d5964-7f15-4be5-94c6-7995d7595960_2004x1092.png 1272w, https://substackcdn.com/image/fetch/$s_!mPgv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd01d5964-7f15-4be5-94c6-7995d7595960_2004x1092.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mPgv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd01d5964-7f15-4be5-94c6-7995d7595960_2004x1092.png" width="1456" height="793" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d01d5964-7f15-4be5-94c6-7995d7595960_2004x1092.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:793,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1450278,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/194444376?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd01d5964-7f15-4be5-94c6-7995d7595960_2004x1092.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!mPgv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd01d5964-7f15-4be5-94c6-7995d7595960_2004x1092.png 424w, https://substackcdn.com/image/fetch/$s_!mPgv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd01d5964-7f15-4be5-94c6-7995d7595960_2004x1092.png 848w, https://substackcdn.com/image/fetch/$s_!mPgv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd01d5964-7f15-4be5-94c6-7995d7595960_2004x1092.png 1272w, https://substackcdn.com/image/fetch/$s_!mPgv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd01d5964-7f15-4be5-94c6-7995d7595960_2004x1092.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>On the Blackwell GPU, two tiles are visible: one on the left and one on the right. The area is split roughly equally between the two.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ECHX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F381c962e-77de-46d5-82b0-e89e65c5ab26_2182x1420.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ECHX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F381c962e-77de-46d5-82b0-e89e65c5ab26_2182x1420.webp 424w, https://substackcdn.com/image/fetch/$s_!ECHX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F381c962e-77de-46d5-82b0-e89e65c5ab26_2182x1420.webp 848w, https://substackcdn.com/image/fetch/$s_!ECHX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F381c962e-77de-46d5-82b0-e89e65c5ab26_2182x1420.webp 1272w, https://substackcdn.com/image/fetch/$s_!ECHX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F381c962e-77de-46d5-82b0-e89e65c5ab26_2182x1420.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ECHX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F381c962e-77de-46d5-82b0-e89e65c5ab26_2182x1420.webp" width="1456" height="948" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/381c962e-77de-46d5-82b0-e89e65c5ab26_2182x1420.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:948,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Inside NVIDIA Blackwell Ultra: The Chip Powering the AI Factory Era | NVIDIA  Technical Blog&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Inside NVIDIA Blackwell Ultra: The Chip Powering the AI Factory Era | NVIDIA  Technical Blog" title="Inside NVIDIA Blackwell Ultra: The Chip Powering the AI Factory Era | NVIDIA  Technical Blog" srcset="https://substackcdn.com/image/fetch/$s_!ECHX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F381c962e-77de-46d5-82b0-e89e65c5ab26_2182x1420.webp 424w, https://substackcdn.com/image/fetch/$s_!ECHX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F381c962e-77de-46d5-82b0-e89e65c5ab26_2182x1420.webp 848w, https://substackcdn.com/image/fetch/$s_!ECHX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F381c962e-77de-46d5-82b0-e89e65c5ab26_2182x1420.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!ECHX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F381c962e-77de-46d5-82b0-e89e65c5ab26_2182x1420.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The Rubin GPU likewise shows two tiles: one on the left and one on the right. 
The area is split roughly equally between the two.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!t-5_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F978af2d7-b4f0-416f-a7a2-ded832de26c3_2078x1170.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!t-5_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F978af2d7-b4f0-416f-a7a2-ded832de26c3_2078x1170.png 424w, https://substackcdn.com/image/fetch/$s_!t-5_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F978af2d7-b4f0-416f-a7a2-ded832de26c3_2078x1170.png 848w, https://substackcdn.com/image/fetch/$s_!t-5_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F978af2d7-b4f0-416f-a7a2-ded832de26c3_2078x1170.png 1272w, https://substackcdn.com/image/fetch/$s_!t-5_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F978af2d7-b4f0-416f-a7a2-ded832de26c3_2078x1170.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!t-5_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F978af2d7-b4f0-416f-a7a2-ded832de26c3_2078x1170.png" width="1456" height="820" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/978af2d7-b4f0-416f-a7a2-ded832de26c3_2078x1170.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:820,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3090606,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/194444376?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F978af2d7-b4f0-416f-a7a2-ded832de26c3_2078x1170.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!t-5_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F978af2d7-b4f0-416f-a7a2-ded832de26c3_2078x1170.png 424w, https://substackcdn.com/image/fetch/$s_!t-5_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F978af2d7-b4f0-416f-a7a2-ded832de26c3_2078x1170.png 848w, https://substackcdn.com/image/fetch/$s_!t-5_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F978af2d7-b4f0-416f-a7a2-ded832de26c3_2078x1170.png 1272w, https://substackcdn.com/image/fetch/$s_!t-5_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F978af2d7-b4f0-416f-a7a2-ded832de26c3_2078x1170.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><h2>Axis 2: Lower Precision &#8212; How FP4 Doubles Effective FLOPS</h2><p>One of the most powerful &#8212; and least understood &#8212; levers NVIDIA pulls each generation is reducing numerical precision. This is not about making calculations less accurate. It&#8217;s about doing more useful work per clock cycle by using smaller numbers where full precision isn&#8217;t needed.</p><h3>The Precision Ladder</h3><p>Neural networks are remarkably tolerant of reduced precision. The weights and activations in a trained model don&#8217;t need 32-bit or even 16-bit floating point to produce correct results. 
NVIDIA has exploited this insight aggressively:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4sBp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6226ab17-b3db-46ad-9757-1a9bf73a0f78_1592x578.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4sBp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6226ab17-b3db-46ad-9757-1a9bf73a0f78_1592x578.png 424w, https://substackcdn.com/image/fetch/$s_!4sBp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6226ab17-b3db-46ad-9757-1a9bf73a0f78_1592x578.png 848w, https://substackcdn.com/image/fetch/$s_!4sBp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6226ab17-b3db-46ad-9757-1a9bf73a0f78_1592x578.png 1272w, https://substackcdn.com/image/fetch/$s_!4sBp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6226ab17-b3db-46ad-9757-1a9bf73a0f78_1592x578.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4sBp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6226ab17-b3db-46ad-9757-1a9bf73a0f78_1592x578.png" width="1456" height="529" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6226ab17-b3db-46ad-9757-1a9bf73a0f78_1592x578.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:529,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:121671,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/194444376?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6226ab17-b3db-46ad-9757-1a9bf73a0f78_1592x578.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4sBp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6226ab17-b3db-46ad-9757-1a9bf73a0f78_1592x578.png 424w, https://substackcdn.com/image/fetch/$s_!4sBp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6226ab17-b3db-46ad-9757-1a9bf73a0f78_1592x578.png 848w, https://substackcdn.com/image/fetch/$s_!4sBp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6226ab17-b3db-46ad-9757-1a9bf73a0f78_1592x578.png 1272w, https://substackcdn.com/image/fetch/$s_!4sBp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6226ab17-b3db-46ad-9757-1a9bf73a0f78_1592x578.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>Why Halving Precision Doubles FLOPS</h3><p>The math is straightforward: when you halve the number of bits per value, the tensor cores can process twice as many values per clock cycle through the same silicon. In the ideal case, a matrix multiply on 8-bit values processes twice as many elements per cycle as one on 16-bit values, provided the hardware natively supports the narrower format.</p><h3>NVFP4: Not Just Smaller Numbers</h3><p>NVIDIA&#8217;s NVFP4 format isn&#8217;t a naive truncation. 
It uses a two-level scaling scheme:</p><blockquote><p>&#183; Values are grouped into small blocks of 16 elements</p><p>&#183; Each block gets its own FP8 (E4M3) scaling factor that captures the local dynamic range, and a second, per-tensor FP32 scaling factor sets the global range (the two levels of the scheme)</p><p>&#183; Individual values within the block are stored as 4-bit floats in E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit)</p><p>&#183; The combination of per-block scaling + 4-bit values preserves accuracy within ~1% of FP8 on most LLM tasks</p></blockquote><p>This is why NVFP4 works for both inference AND training &#8212; the scaling factors compensate for the limited range of 4-bit values.</p><h3>The Double Benefit: FLOPS + Memory</h3><p>FP4 doesn&#8217;t just double compute throughput &#8212; it also halves the memory footprint of model weights compared to FP8:</p><blockquote><p>&#183; A 70B parameter model in FP16: ~140 GB (doesn&#8217;t fit on one 80 GB H100)</p><p>&#183; Same model in FP8: ~70 GB (fits on one H100 with ~10 GB of headroom)</p><p>&#183; Same model in FP4: ~35 GB (fits on one 192 GB B200 with ~157 GB free for KV-cache and batching)</p></blockquote><p>Smaller weights mean more of the GPU&#8217;s HBM is available for KV-cache (critical for long-context inference) and larger batch sizes (critical for throughput). This is why NVIDIA claims up to 15&#215; inference performance improvement from Blackwell over Hopper at system scale &#8212; it&#8217;s the compound effect of more FLOPS, more memory, more bandwidth, AND smaller model footprint.</p><p><strong>Key insight: </strong>Hopper&#8217;s lowest native precision is FP8. Blackwell and Rubin add native FP4 support, which effectively doubles the useful FLOPS per tensor core relative to FP8 without a matching increase in transistor count. This is a nearly &#8220;free&#8221; 2&#215; multiplier on top of all the other scaling axes.</p><p></p><h2>Axis 3: More Memory Bandwidth &#8212; The HBM Scaling Playbook</h2><p>Many AI workloads, LLM inference in particular, are memory-bandwidth-bound. 
A GPU can have all the FLOPS in the world, but if it can&#8217;t feed data to its compute cores fast enough, those cores sit idle. NVIDIA attacks this memory wall by advancing along three dimensions of HBM.</p><h3>How HBM Works (30-Second Primer)</h3><p>HBM (High Bandwidth Memory) is a 3D-stacked DRAM technology. Multiple DRAM dies are stacked vertically and connected using through-silicon vias (TSVs) &#8212; tiny vertical copper pillars that punch through each layer. The entire stack sits on a silicon interposer right next to the GPU die, connected via thousands of parallel wires.</p><h3>Three Levers NVIDIA Pulls Each Generation</h3><p><strong>1. More stacks on the package</strong></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hk2m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9838b7e-71eb-45cd-959c-b8386a6e26ce_1596x368.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hk2m!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9838b7e-71eb-45cd-959c-b8386a6e26ce_1596x368.png 424w, https://substackcdn.com/image/fetch/$s_!hk2m!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9838b7e-71eb-45cd-959c-b8386a6e26ce_1596x368.png 848w, https://substackcdn.com/image/fetch/$s_!hk2m!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9838b7e-71eb-45cd-959c-b8386a6e26ce_1596x368.png 1272w, 
https://substackcdn.com/image/fetch/$s_!hk2m!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9838b7e-71eb-45cd-959c-b8386a6e26ce_1596x368.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hk2m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9838b7e-71eb-45cd-959c-b8386a6e26ce_1596x368.png" width="1456" height="336" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a9838b7e-71eb-45cd-959c-b8386a6e26ce_1596x368.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:336,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:78006,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/194444376?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9838b7e-71eb-45cd-959c-b8386a6e26ce_1596x368.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hk2m!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9838b7e-71eb-45cd-959c-b8386a6e26ce_1596x368.png 424w, https://substackcdn.com/image/fetch/$s_!hk2m!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9838b7e-71eb-45cd-959c-b8386a6e26ce_1596x368.png 848w, 
https://substackcdn.com/image/fetch/$s_!hk2m!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9838b7e-71eb-45cd-959c-b8386a6e26ce_1596x368.png 1272w, https://substackcdn.com/image/fetch/$s_!hk2m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9838b7e-71eb-45cd-959c-b8386a6e26ce_1596x368.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Each generation adds more HBM stacks, requiring a larger silicon interposer (CoWoS-L for Rubin).</p><p><strong>2. More layers per stack (taller stacks)</strong></p><blockquote><p>&#183; HBM3: 8 DRAM layers per stack</p><p>&#183; HBM3e: 8&#8211;12 layers per stack</p><p>&#183; HBM4: 12&#8211;16 layers per stack</p><p>&#183; HBM4e: 16&#8211;20 layers per stack (rumoured)</p></blockquote><p><strong>3. Wider interface per stack</strong></p><blockquote><p>&#183; HBM3: 1,024-bit interface &#8594; up to ~819 GB/s per stack</p><p>&#183; HBM3e: 1,024-bit interface &#8594; up to ~1.2 TB/s per stack (higher data rate)</p><p>&#183; HBM4: 2,048-bit interface &#8594; up to ~2.0 TB/s per stack (doubled bus width)</p><p>&#183; HBM4e: 2,048-bit interface &#8594; up to ~3.0&#8211;4.0 TB/s per stack (same doubled bus width, higher data rate)</p></blockquote><p>HBM4 doubles the bus width from 1,024 to 2,048 bits, a fundamental architectural change in the JEDEC spec, and HBM4e keeps the wider interface.</p><h3>The Compound Effect</h3><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eKVZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8cf9f7c-b091-4ee8-9a73-cf8a57264c8b_1594x366.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!eKVZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8cf9f7c-b091-4ee8-9a73-cf8a57264c8b_1594x366.png 424w, https://substackcdn.com/image/fetch/$s_!eKVZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8cf9f7c-b091-4ee8-9a73-cf8a57264c8b_1594x366.png 848w, https://substackcdn.com/image/fetch/$s_!eKVZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8cf9f7c-b091-4ee8-9a73-cf8a57264c8b_1594x366.png 1272w, https://substackcdn.com/image/fetch/$s_!eKVZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8cf9f7c-b091-4ee8-9a73-cf8a57264c8b_1594x366.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eKVZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8cf9f7c-b091-4ee8-9a73-cf8a57264c8b_1594x366.png" width="1456" height="334" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e8cf9f7c-b091-4ee8-9a73-cf8a57264c8b_1594x366.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:334,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:72704,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/194444376?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8cf9f7c-b091-4ee8-9a73-cf8a57264c8b_1594x366.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eKVZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8cf9f7c-b091-4ee8-9a73-cf8a57264c8b_1594x366.png 424w, https://substackcdn.com/image/fetch/$s_!eKVZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8cf9f7c-b091-4ee8-9a73-cf8a57264c8b_1594x366.png 848w, https://substackcdn.com/image/fetch/$s_!eKVZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8cf9f7c-b091-4ee8-9a73-cf8a57264c8b_1594x366.png 1272w, https://substackcdn.com/image/fetch/$s_!eKVZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8cf9f7c-b091-4ee8-9a73-cf8a57264c8b_1594x366.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>This is packaging engineering &#8212; more chips, taller stacks, wider bus. 
Not Moore&#8217;s Law.</p><p></p><h2>Axis 4: The Superchip &#8212; Fusing CPU + Dual GPUs with NVLink-C2C</h2><p>Starting with Grace Hopper, NVIDIA introduced the Superchip: a single module that fuses a CPU and GPU(s) together using NVLink-C2C (Chip-to-Chip) &#8212; a coherent, high-bandwidth interconnect fundamentally different from PCIe.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bp_z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884e6da2-c972-41e2-99bf-8c451d2d3c4d_1999x1018.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bp_z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884e6da2-c972-41e2-99bf-8c451d2d3c4d_1999x1018.png 424w, https://substackcdn.com/image/fetch/$s_!bp_z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884e6da2-c972-41e2-99bf-8c451d2d3c4d_1999x1018.png 848w, https://substackcdn.com/image/fetch/$s_!bp_z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884e6da2-c972-41e2-99bf-8c451d2d3c4d_1999x1018.png 1272w, https://substackcdn.com/image/fetch/$s_!bp_z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884e6da2-c972-41e2-99bf-8c451d2d3c4d_1999x1018.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bp_z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884e6da2-c972-41e2-99bf-8c451d2d3c4d_1999x1018.png" width="1456" height="741" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/884e6da2-c972-41e2-99bf-8c451d2d3c4d_1999x1018.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:741,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;NVIDIA Grace Hopper Superchip Architecture In-Depth | NVIDIA Technical Blog&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="NVIDIA Grace Hopper Superchip Architecture In-Depth | NVIDIA Technical Blog" title="NVIDIA Grace Hopper Superchip Architecture In-Depth | NVIDIA Technical Blog" srcset="https://substackcdn.com/image/fetch/$s_!bp_z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884e6da2-c972-41e2-99bf-8c451d2d3c4d_1999x1018.png 424w, https://substackcdn.com/image/fetch/$s_!bp_z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884e6da2-c972-41e2-99bf-8c451d2d3c4d_1999x1018.png 848w, https://substackcdn.com/image/fetch/$s_!bp_z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884e6da2-c972-41e2-99bf-8c451d2d3c4d_1999x1018.png 1272w, https://substackcdn.com/image/fetch/$s_!bp_z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884e6da2-c972-41e2-99bf-8c451d2d3c4d_1999x1018.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft 
icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is what the Grace Hopper Superchip looks like. 
On the left you see Grace CPU and on the right you see Hopper GPU.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IyoL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3226e35f-4d18-44c4-8639-ce1eb22020d7_899x502.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IyoL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3226e35f-4d18-44c4-8639-ce1eb22020d7_899x502.png 424w, https://substackcdn.com/image/fetch/$s_!IyoL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3226e35f-4d18-44c4-8639-ce1eb22020d7_899x502.png 848w, https://substackcdn.com/image/fetch/$s_!IyoL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3226e35f-4d18-44c4-8639-ce1eb22020d7_899x502.png 1272w, https://substackcdn.com/image/fetch/$s_!IyoL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3226e35f-4d18-44c4-8639-ce1eb22020d7_899x502.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IyoL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3226e35f-4d18-44c4-8639-ce1eb22020d7_899x502.png" width="899" height="502" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3226e35f-4d18-44c4-8639-ce1eb22020d7_899x502.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:502,&quot;width&quot;:899,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;NVIDIA Grace Hopper &#8211; EXALIT Pte Ltd&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="NVIDIA Grace Hopper &#8211; EXALIT Pte Ltd" title="NVIDIA Grace Hopper &#8211; EXALIT Pte Ltd" srcset="https://substackcdn.com/image/fetch/$s_!IyoL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3226e35f-4d18-44c4-8639-ce1eb22020d7_899x502.png 424w, https://substackcdn.com/image/fetch/$s_!IyoL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3226e35f-4d18-44c4-8639-ce1eb22020d7_899x502.png 848w, https://substackcdn.com/image/fetch/$s_!IyoL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3226e35f-4d18-44c4-8639-ce1eb22020d7_899x502.png 1272w, https://substackcdn.com/image/fetch/$s_!IyoL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3226e35f-4d18-44c4-8639-ce1eb22020d7_899x502.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>What is NVLink-C2C?</h3><p>NVLink-C2C is a memory-coherent chip-to-chip interconnect that physically bonds the CPU and GPU dies on the same module. 
Unlike PCIe (which typically requires explicit data copies), NVLink-C2C creates a unified memory address space: CPU and GPU threads can access each other&#8217;s memory transparently.</p><p>Key properties of NVLink-C2C:</p><blockquote><p>&#183; Memory coherent: CPU and GPU see a single unified address space</p><p>&#183; 900 GB/s bidirectional bandwidth (Grace Hopper and Grace Blackwell Superchips), about 7&#215; the bandwidth of a PCIe Gen5 x16 link</p><p>&#183; 1.8 TB/s bidirectional bandwidth (Vera Rubin), about 7&#215; the bandwidth of a PCIe Gen6 x16 link</p><p>&#183; About 5&#215; more energy-efficient than PCIe per bit transferred (~1.3 pJ/bit)</p><p>&#183; Enables KV-cache offloading from GPU HBM to CPU LPDDR with a much smaller penalty than offloading over PCIe</p></blockquote><p><strong>Why this matters: </strong>In LLM inference, the KV-cache can consume most of the GPU&#8217;s HBM. With NVLink-C2C, the KV-cache can spill into the CPU&#8217;s large LPDDR memory (480 GB&#8211;1.5 TB) at NVLink-C2C bandwidth, which is below HBM bandwidth but far above PCIe. This substantially increases the effective context window without adding more GPUs. Let us see what these Superchips look like.</p><h3>Superchip Architecture for Grace Blackwell and Vera Rubin: <br>1 CPU + 2 GPUs</h3><p>The Grace Hopper Superchip paired one Grace CPU with one Hopper GPU; the Grace Blackwell and Vera Rubin Superchips pair one CPU with two GPUs.</p><p>Each Superchip connects one CPU to two GPUs via NVLink-C2C. The two GPUs also connect to each other via NVLink. 
This creates a tightly-coupled compute unit that serves as the building block for larger systems.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rImF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a71952-e4a6-437b-a911-57795f6d49cd_1596x432.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rImF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a71952-e4a6-437b-a911-57795f6d49cd_1596x432.png 424w, https://substackcdn.com/image/fetch/$s_!rImF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a71952-e4a6-437b-a911-57795f6d49cd_1596x432.png 848w, https://substackcdn.com/image/fetch/$s_!rImF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a71952-e4a6-437b-a911-57795f6d49cd_1596x432.png 1272w, https://substackcdn.com/image/fetch/$s_!rImF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a71952-e4a6-437b-a911-57795f6d49cd_1596x432.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rImF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a71952-e4a6-437b-a911-57795f6d49cd_1596x432.png" width="1456" height="394" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b9a71952-e4a6-437b-a911-57795f6d49cd_1596x432.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:394,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:103224,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/194444376?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a71952-e4a6-437b-a911-57795f6d49cd_1596x432.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rImF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a71952-e4a6-437b-a911-57795f6d49cd_1596x432.png 424w, https://substackcdn.com/image/fetch/$s_!rImF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a71952-e4a6-437b-a911-57795f6d49cd_1596x432.png 848w, https://substackcdn.com/image/fetch/$s_!rImF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a71952-e4a6-437b-a911-57795f6d49cd_1596x432.png 1272w, https://substackcdn.com/image/fetch/$s_!rImF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a71952-e4a6-437b-a911-57795f6d49cd_1596x432.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Here is a schematic diagram for Vera Rubin SuperChip. 
After that, we will see what the actual Superchips look like:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UIp_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37779084-16e2-453a-b80c-5f19d710c587_1248x1154.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UIp_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37779084-16e2-453a-b80c-5f19d710c587_1248x1154.png 424w, https://substackcdn.com/image/fetch/$s_!UIp_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37779084-16e2-453a-b80c-5f19d710c587_1248x1154.png 848w, https://substackcdn.com/image/fetch/$s_!UIp_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37779084-16e2-453a-b80c-5f19d710c587_1248x1154.png 1272w, https://substackcdn.com/image/fetch/$s_!UIp_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37779084-16e2-453a-b80c-5f19d710c587_1248x1154.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UIp_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37779084-16e2-453a-b80c-5f19d710c587_1248x1154.png" width="1248" height="1154" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/37779084-16e2-453a-b80c-5f19d710c587_1248x1154.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1154,&quot;width&quot;:1248,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Diagram illustrating coherent memory access between Vera CPU&#8217;s 1.5TB LPDDR5X and Rubin GPU&#8217;s 288GB HBM4 (per GPU) via 1.8TB/s NVLink-C2C.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Diagram illustrating coherent memory access between Vera CPU&#8217;s 1.5TB LPDDR5X and Rubin GPU&#8217;s 288GB HBM4 (per GPU) via 1.8TB/s NVLink-C2C." title="Diagram illustrating coherent memory access between Vera CPU&#8217;s 1.5TB LPDDR5X and Rubin GPU&#8217;s 288GB HBM4 (per GPU) via 1.8TB/s NVLink-C2C." 
srcset="https://substackcdn.com/image/fetch/$s_!UIp_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37779084-16e2-453a-b80c-5f19d710c587_1248x1154.png 424w, https://substackcdn.com/image/fetch/$s_!UIp_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37779084-16e2-453a-b80c-5f19d710c587_1248x1154.png 848w, https://substackcdn.com/image/fetch/$s_!UIp_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37779084-16e2-453a-b80c-5f19d710c587_1248x1154.png 1272w, https://substackcdn.com/image/fetch/$s_!UIp_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37779084-16e2-453a-b80c-5f19d710c587_1248x1154.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>GB200 SuperChip: </strong>Notice two Blackwell (B200) GPUs in the upper part and one Grace CPU in the lower part.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!s1ws!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7765d790-3938-4935-8b2d-ee5e1a1be74e_705x898.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!s1ws!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7765d790-3938-4935-8b2d-ee5e1a1be74e_705x898.jpeg 424w, https://substackcdn.com/image/fetch/$s_!s1ws!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7765d790-3938-4935-8b2d-ee5e1a1be74e_705x898.jpeg 848w, https://substackcdn.com/image/fetch/$s_!s1ws!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7765d790-3938-4935-8b2d-ee5e1a1be74e_705x898.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!s1ws!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7765d790-3938-4935-8b2d-ee5e1a1be74e_705x898.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!s1ws!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7765d790-3938-4935-8b2d-ee5e1a1be74e_705x898.jpeg" width="705" height="898" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7765d790-3938-4935-8b2d-ee5e1a1be74e_705x898.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:898,&quot;width&quot;:705,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;NVIDIA GB200 NVL72 &#8211; EXALIT Pte Ltd&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="NVIDIA GB200 NVL72 &#8211; EXALIT Pte Ltd" title="NVIDIA GB200 NVL72 &#8211; EXALIT Pte Ltd" srcset="https://substackcdn.com/image/fetch/$s_!s1ws!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7765d790-3938-4935-8b2d-ee5e1a1be74e_705x898.jpeg 424w, https://substackcdn.com/image/fetch/$s_!s1ws!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7765d790-3938-4935-8b2d-ee5e1a1be74e_705x898.jpeg 848w, https://substackcdn.com/image/fetch/$s_!s1ws!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7765d790-3938-4935-8b2d-ee5e1a1be74e_705x898.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!s1ws!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7765d790-3938-4935-8b2d-ee5e1a1be74e_705x898.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div 
class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>GB300 SuperChip is a performance advancement over GB200: </strong>Notice two Blackwell (B300) GPUs in the upper part and one Grace CPU in the lower part.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-Czl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a94c766-c9ad-4d25-be8c-51966f4a4027_960x540.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!-Czl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a94c766-c9ad-4d25-be8c-51966f4a4027_960x540.png 424w, https://substackcdn.com/image/fetch/$s_!-Czl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a94c766-c9ad-4d25-be8c-51966f4a4027_960x540.png 848w, https://substackcdn.com/image/fetch/$s_!-Czl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a94c766-c9ad-4d25-be8c-51966f4a4027_960x540.png 1272w, https://substackcdn.com/image/fetch/$s_!-Czl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a94c766-c9ad-4d25-be8c-51966f4a4027_960x540.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-Czl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a94c766-c9ad-4d25-be8c-51966f4a4027_960x540.png" width="960" height="540" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a94c766-c9ad-4d25-be8c-51966f4a4027_960x540.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:868749,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/194444376?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a94c766-c9ad-4d25-be8c-51966f4a4027_960x540.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" 
alt="" srcset="https://substackcdn.com/image/fetch/$s_!-Czl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a94c766-c9ad-4d25-be8c-51966f4a4027_960x540.png 424w, https://substackcdn.com/image/fetch/$s_!-Czl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a94c766-c9ad-4d25-be8c-51966f4a4027_960x540.png 848w, https://substackcdn.com/image/fetch/$s_!-Czl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a94c766-c9ad-4d25-be8c-51966f4a4027_960x540.png 1272w, https://substackcdn.com/image/fetch/$s_!-Czl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a94c766-c9ad-4d25-be8c-51966f4a4027_960x540.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Vera Rubin SuperChip: </strong>Notice two Rubin GPUs in the upper part and one Vera CPU in the lower part.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jE81!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab956247-3946-43f3-ae19-10b5f7032775_478x540.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jE81!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab956247-3946-43f3-ae19-10b5f7032775_478x540.png 424w, https://substackcdn.com/image/fetch/$s_!jE81!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab956247-3946-43f3-ae19-10b5f7032775_478x540.png 848w, https://substackcdn.com/image/fetch/$s_!jE81!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab956247-3946-43f3-ae19-10b5f7032775_478x540.png 1272w, https://substackcdn.com/image/fetch/$s_!jE81!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab956247-3946-43f3-ae19-10b5f7032775_478x540.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!jE81!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab956247-3946-43f3-ae19-10b5f7032775_478x540.png" width="726" height="820.1673640167364" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ab956247-3946-43f3-ae19-10b5f7032775_478x540.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:478,&quot;resizeWidth&quot;:726,&quot;bytes&quot;:405638,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/194444376?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab956247-3946-43f3-ae19-10b5f7032775_478x540.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jE81!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab956247-3946-43f3-ae19-10b5f7032775_478x540.png 424w, https://substackcdn.com/image/fetch/$s_!jE81!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab956247-3946-43f3-ae19-10b5f7032775_478x540.png 848w, https://substackcdn.com/image/fetch/$s_!jE81!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab956247-3946-43f3-ae19-10b5f7032775_478x540.png 1272w, https://substackcdn.com/image/fetch/$s_!jE81!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab956247-3946-43f3-ae19-10b5f7032775_478x540.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>Axis 5: GPU-to-GPU Bandwidth &#8212; NVLink and the NVL72 Rack</h2><p>A single GPU cannot train a trillion-parameter model alone. 
The speed at which GPUs exchange data &#8212; gradients during training, KV-cache during inference &#8212; directly determines system-level performance.</p><h3>How NVLink Scales</h3><p>NVLink bandwidth per GPU = number of links (ports) &#215; bandwidth per link.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5Y2E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845f8bcc-a545-4511-94d8-c0c1f785441a_1600x302.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5Y2E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845f8bcc-a545-4511-94d8-c0c1f785441a_1600x302.png 424w, https://substackcdn.com/image/fetch/$s_!5Y2E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845f8bcc-a545-4511-94d8-c0c1f785441a_1600x302.png 848w, https://substackcdn.com/image/fetch/$s_!5Y2E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845f8bcc-a545-4511-94d8-c0c1f785441a_1600x302.png 1272w, https://substackcdn.com/image/fetch/$s_!5Y2E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845f8bcc-a545-4511-94d8-c0c1f785441a_1600x302.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5Y2E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845f8bcc-a545-4511-94d8-c0c1f785441a_1600x302.png" width="1456" height="275" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/845f8bcc-a545-4511-94d8-c0c1f785441a_1600x302.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:275,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:65208,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/194444376?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845f8bcc-a545-4511-94d8-c0c1f785441a_1600x302.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5Y2E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845f8bcc-a545-4511-94d8-c0c1f785441a_1600x302.png 424w, https://substackcdn.com/image/fetch/$s_!5Y2E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845f8bcc-a545-4511-94d8-c0c1f785441a_1600x302.png 848w, https://substackcdn.com/image/fetch/$s_!5Y2E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845f8bcc-a545-4511-94d8-c0c1f785441a_1600x302.png 1272w, https://substackcdn.com/image/fetch/$s_!5Y2E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845f8bcc-a545-4511-94d8-c0c1f785441a_1600x302.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Each generation keeps the same 18 NVLink ports per GPU but increases the per-link speed through faster SerDes circuits. From H100 to B200: 2&#215;. 
From B200 to Rubin: another 2&#215;.</p><p></p><h2>The NVL72: When 72 GPUs Become One</h2><h3>The Blackwell Generation Innovation</h3><p>Perhaps the most architecturally significant innovation in the Blackwell generation isn&#8217;t the GPU itself: it&#8217;s the GB200 NVL72 rack system. It connects 72 B200 GPUs and 36 Grace CPUs (36 GB200 Superchips) into a single, liquid-cooled rack where all 72 GPUs communicate at full NVLink 5 speed. The entire rack forms a single NVLink domain with about 130 TB/s of aggregate GPU-to-GPU bandwidth (72 GPUs &#215; 1.8 TB/s each). A GB300 version of the NVL72 system also exists.</p><p>Each rack has 18 compute trays, each holding 2 Superchips (4 GPUs and 2 CPUs), for a total of 36 CPUs and 72 GPUs. Note, however, that the 18 trays present themselves as 18 nodes/servers, which you can split between prefill and decode functions. This is what a GB200 NVL72 rack looks like. As an exercise, can you find the 9 NVSwitch trays and the 18 compute trays?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BRSe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4c26287-4f27-42cd-aecb-31d0fb30b3ad_435x418.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BRSe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4c26287-4f27-42cd-aecb-31d0fb30b3ad_435x418.png 424w, https://substackcdn.com/image/fetch/$s_!BRSe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4c26287-4f27-42cd-aecb-31d0fb30b3ad_435x418.png 848w, 
https://substackcdn.com/image/fetch/$s_!BRSe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4c26287-4f27-42cd-aecb-31d0fb30b3ad_435x418.png 1272w, https://substackcdn.com/image/fetch/$s_!BRSe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4c26287-4f27-42cd-aecb-31d0fb30b3ad_435x418.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BRSe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4c26287-4f27-42cd-aecb-31d0fb30b3ad_435x418.png" width="435" height="418" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d4c26287-4f27-42cd-aecb-31d0fb30b3ad_435x418.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:418,&quot;width&quot;:435,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:252778,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/194444376?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4c26287-4f27-42cd-aecb-31d0fb30b3ad_435x418.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BRSe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4c26287-4f27-42cd-aecb-31d0fb30b3ad_435x418.png 424w, https://substackcdn.com/image/fetch/$s_!BRSe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4c26287-4f27-42cd-aecb-31d0fb30b3ad_435x418.png 
848w, https://substackcdn.com/image/fetch/$s_!BRSe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4c26287-4f27-42cd-aecb-31d0fb30b3ad_435x418.png 1272w, https://substackcdn.com/image/fetch/$s_!BRSe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4c26287-4f27-42cd-aecb-31d0fb30b3ad_435x418.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>How GPU-GPU Communication Works in an NVL72 System</h3><p>The rack contains 9 NVLink Switch trays, each housing 2 NVLink Switch 
ASICs. These 18 switch chips create a non-blocking, all-to-all fabric. Every GPU has 18 NVLink ports wired through the copper backplane to the switch trays, which route traffic to any destination GPU.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WQnN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F375a7923-f6a3-4a29-a595-a9264facfdf9_960x538.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WQnN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F375a7923-f6a3-4a29-a595-a9264facfdf9_960x538.png 424w, https://substackcdn.com/image/fetch/$s_!WQnN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F375a7923-f6a3-4a29-a595-a9264facfdf9_960x538.png 848w, https://substackcdn.com/image/fetch/$s_!WQnN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F375a7923-f6a3-4a29-a595-a9264facfdf9_960x538.png 1272w, https://substackcdn.com/image/fetch/$s_!WQnN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F375a7923-f6a3-4a29-a595-a9264facfdf9_960x538.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WQnN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F375a7923-f6a3-4a29-a595-a9264facfdf9_960x538.png" width="960" height="538" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/375a7923-f6a3-4a29-a595-a9264facfdf9_960x538.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:538,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:224309,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/194444376?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F375a7923-f6a3-4a29-a595-a9264facfdf9_960x538.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WQnN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F375a7923-f6a3-4a29-a595-a9264facfdf9_960x538.png 424w, https://substackcdn.com/image/fetch/$s_!WQnN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F375a7923-f6a3-4a29-a595-a9264facfdf9_960x538.png 848w, https://substackcdn.com/image/fetch/$s_!WQnN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F375a7923-f6a3-4a29-a595-a9264facfdf9_960x538.png 1272w, https://substackcdn.com/image/fetch/$s_!WQnN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F375a7923-f6a3-4a29-a595-a9264facfdf9_960x538.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>Comparing With an H100-Based 8-GPU Server Baseline</h3><p>Consider this popular topology developed by the <a href="https://www.lmsys.org/blog/2025-05-05-large-scale-ep/">SGLang</a> team for DeepSeek inference on 12 servers with 8 H100 GPUs each. The DeepSeek model has 61 layers; every layer has attention, and 58 of them have Mixture-of-Experts (MoE) blocks. More than 90% of the model parameters sit in the MoE experts.<br><br>They reserve 3 nodes for prefill (prompt processing and KV-cache generation) and 9 for decode (token-by-token generation). As the diagram shows, the MoE experts are divided into subgroups across nodes, whereas each node (in fact, each GPU) holds the full attention weights.</p><p>After tokens pass through an attention layer, each of them is routed to a small set of experts for that layer. 
Some of those experts may be on the same server GPUs and some of them will be on GPUs on different servers. Communication for the former happens over NVLink within respective servers, communication for the latter happens over Infiniband which is much slower than NVLink. This introduces a performance penalty!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2kC0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0ecc5a-38bb-49f7-a26d-89665dbba02c_6000x4500.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2kC0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0ecc5a-38bb-49f7-a26d-89665dbba02c_6000x4500.png 424w, https://substackcdn.com/image/fetch/$s_!2kC0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0ecc5a-38bb-49f7-a26d-89665dbba02c_6000x4500.png 848w, https://substackcdn.com/image/fetch/$s_!2kC0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0ecc5a-38bb-49f7-a26d-89665dbba02c_6000x4500.png 1272w, https://substackcdn.com/image/fetch/$s_!2kC0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0ecc5a-38bb-49f7-a26d-89665dbba02c_6000x4500.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2kC0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0ecc5a-38bb-49f7-a26d-89665dbba02c_6000x4500.png" width="1456" height="1092" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bd0ecc5a-38bb-49f7-a26d-89665dbba02c_6000x4500.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2kC0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0ecc5a-38bb-49f7-a26d-89665dbba02c_6000x4500.png 424w, https://substackcdn.com/image/fetch/$s_!2kC0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0ecc5a-38bb-49f7-a26d-89665dbba02c_6000x4500.png 848w, https://substackcdn.com/image/fetch/$s_!2kC0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0ecc5a-38bb-49f7-a26d-89665dbba02c_6000x4500.png 1272w, https://substackcdn.com/image/fetch/$s_!2kC0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0ecc5a-38bb-49f7-a26d-89665dbba02c_6000x4500.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Source: LMSYS</p><h4>How Do NVL72-Based Systems Solve This Problem?</h4><p>Traditional GPU clusters connect 8 GPUs per node via NVLink, then connect nodes via InfiniBand. <strong>Inter-node bandwidth is 10&#8211;50&#215; lower than intra-node NVLink bandwidth, creating a bandwidth cliff.</strong> The NVL72 eliminates this cliff for up to 72 GPUs. For mixture-of-experts LLM inference, a GB200-based NVL72 offers:</p><blockquote><p>&#183; Expert parallelism across 72 GPUs without bandwidth penalties</p><p>&#183; Roughly 13.5 TB of unified GPU memory (72 &#215; up to 192 GB) at NVLink speed</p><p>&#183; 30&#215; faster real-time inference on trillion-parameter models vs H100 (NVIDIA&#8217;s claim; we will look at real-world performance numbers later)</p></blockquote><h3>Grace Blackwell NVL72: Where It All Began!</h3><p>If my memory serves me right, the NVL72 was announced at GTC 2024. 
I highly recommend watching the following video:</p><div id="youtube2-0JxowHz0JsM" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;0JxowHz0JsM&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/0JxowHz0JsM?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h3>Vera Rubin NVL72: The Next Step!</h3><p>The Vera Rubin NVL72 packs 72 Rubin GPUs and 36 Vera CPUs (36 Vera Rubin Superchips) into a rack delivering 3.6 EFLOPS of FP4 inference. With NVLink 6 at 3.6 TB/s per GPU, the aggregate rack bandwidth reaches 260 TB/s &#8212; 2&#215; the Blackwell NVL72. NVIDIA also introduces silicon photonics for rack-to-rack links, enabling POD configurations of up to 576 GPUs.</p><p>I highly recommend watching the following video:</p><div id="youtube2-e3PIvqig1MM" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;e3PIvqig1MM&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/e3PIvqig1MM?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p></p><h2><strong>Role of GPU-HBM &amp; GPU-GPU Bandwidths </strong></h2><p>In both training and inference, GPU compute cores spend most of their time <em>waiting for data</em> &#8212; the bottleneck is almost never arithmetic, it&#8217;s how fast you can feed weights, activations, and gradients to the tensor cores. 
During training, each GPU must read the full model weights from HBM on every forward and backward pass, exchange gradients with every other GPU after each step, and write the updated weights back. HBM bandwidth therefore determines how fast a single GPU can churn through batches, while NVLink bandwidth determines how fast GPUs can synchronise with each other. </p><p>At the H100&#8217;s 3.35 TB/s of HBM bandwidth, a 70B-parameter model in FP16 (~140 GB; the H100 has only 80 GB of memory, but let us ignore that for now) takes about 42 milliseconds just to <em>read once</em> from memory (140 GB &#247; 3.35 TB/s &#8776; 0.042 s). The B200&#8217;s 8 TB/s cuts that to 17.5 ms, and Rubin&#8217;s 22 TB/s brings it under 6.4 ms, which translates directly into faster iteration. <br>NOTE: The weights are not all read from HBM at once; GPUs load them layer by layer. This exercise gives the cumulative time spent loading weights.</p><p>During inference, the constraint is even starker: autoregressive token generation is almost entirely memory-bandwidth-bound, because each token requires reading the full KV-cache and model weights but performs very little compute per byte read. This is why inference throughput scales nearly linearly with HBM bandwidth. On the NVLink side, large models that don&#8217;t fit on a single GPU must be split across multiple GPUs using tensor parallelism (splitting individual matrix multiplies) or pipeline parallelism (splitting layers), both of which require GPUs to exchange intermediate activations every few milliseconds. As a rule of thumb, if NVLink bandwidth is 2&#215; higher, you can split across roughly 2&#215; more GPUs before communication becomes the bottleneck. That is why the GB200 NVL72&#8217;s 130 TB/s of aggregate bandwidth lets mixture-of-experts models with hundreds of experts be distributed across 72 GPUs without the communication penalty that would cripple the same workload on 8-GPU H100-based nodes connected by InfiniBand. 
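</p><p>A minimal sketch of the read-time arithmetic above, assuming the peak HBM bandwidth figures quoted in this article and FP16 weights at 2 bytes per parameter. The function names are illustrative only, and real kernels reach just a fraction of peak bandwidth, so the printed times are best read as lower bounds.</p>
<pre>
```python
# Back-of-the-envelope estimate: time to stream a model's weights once from HBM.
# Bandwidth figures are the peak numbers quoted in this article (assumption:
# peak is never fully reached in practice, so these times are lower bounds).

HBM_BANDWIDTH_TB_PER_S = {
    "H100": 3.35,
    "B200": 8.0,
    "Rubin": 22.0,
}


def weight_bytes(num_params, bytes_per_param=2):
    """Size of the weights in bytes (2 bytes per parameter for FP16)."""
    return num_params * bytes_per_param


def read_time_ms(num_bytes, bandwidth_tb_per_s):
    """Milliseconds needed to read num_bytes once at the given peak bandwidth."""
    seconds = num_bytes / (bandwidth_tb_per_s * 1e12)
    return seconds * 1e3


if __name__ == "__main__":
    size = weight_bytes(70e9)  # 70B parameters in FP16, about 140 GB
    for gpu, bandwidth in HBM_BANDWIDTH_TB_PER_S.items():
        print(f"{gpu}: {read_time_ms(size, bandwidth):.1f} ms per full read")
```
</pre>
<p>Because decode re-reads the weights for every generated token, the reciprocal of this time gives a rough upper bound on single-stream tokens per second for a dense model.</p><p>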
In short: <strong>HBM bandwidth sets the ceiling on single-GPU performance, and NVLink bandwidth sets the ceiling on how many GPUs you can scale to before returns diminish. Every generation that raises both ceilings directly unlocks larger models, longer contexts, and lower latency.</strong></p><h2>Are These Innovations Helping? </h2><p>One of the best places to see the performance difference these innovations make is the <a href="https://inferencex.semianalysis.com/inference">InferenceMax</a> benchmark maintained by SemiAnalysis. Scaling to a larger number of GPUs in a single NVLink domain and moving to a newer GPU generation can each bring dramatic performance improvements. Below we compare the performance of H200 (8-GPU servers) with that of GB300 (an NVL72 rack with 72 GPUs) for inference of the DeepSeek R1 model.</p><p>Quote by SemiAnalysis: <strong>At GTC 2024, Jensen said that GB200 NVL72 was 35x faster than Hopper. Nobody believed it and thought it was classic fake Jensen Math. When we tested the performance of it, it wasn't just 35x faster, it was over 50x times faster even against an strong Hopper baseline with all of the inference optimisation composed together like MTP, Disagg prefill, wideEP, etc. 
View the nuanced results at InferenceX dot com.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6Cfl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5575a701-75de-4f26-bdd7-b1f8cdf72590_2376x1344.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6Cfl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5575a701-75de-4f26-bdd7-b1f8cdf72590_2376x1344.png 424w, https://substackcdn.com/image/fetch/$s_!6Cfl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5575a701-75de-4f26-bdd7-b1f8cdf72590_2376x1344.png 848w, https://substackcdn.com/image/fetch/$s_!6Cfl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5575a701-75de-4f26-bdd7-b1f8cdf72590_2376x1344.png 1272w, https://substackcdn.com/image/fetch/$s_!6Cfl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5575a701-75de-4f26-bdd7-b1f8cdf72590_2376x1344.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6Cfl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5575a701-75de-4f26-bdd7-b1f8cdf72590_2376x1344.png" width="1456" height="824" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5575a701-75de-4f26-bdd7-b1f8cdf72590_2376x1344.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:824,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:354138,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/194444376?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5575a701-75de-4f26-bdd7-b1f8cdf72590_2376x1344.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6Cfl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5575a701-75de-4f26-bdd7-b1f8cdf72590_2376x1344.png 424w, https://substackcdn.com/image/fetch/$s_!6Cfl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5575a701-75de-4f26-bdd7-b1f8cdf72590_2376x1344.png 848w, https://substackcdn.com/image/fetch/$s_!6Cfl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5575a701-75de-4f26-bdd7-b1f8cdf72590_2376x1344.png 1272w, https://substackcdn.com/image/fetch/$s_!6Cfl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5575a701-75de-4f26-bdd7-b1f8cdf72590_2376x1344.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Source: SemiAnalysis</p><h2>2026: The Era of Extreme Co-design</h2><p>In the latest Nvidia GTC conference (GTC 2026), Nvidia announced five rack types. We will see in upcoming articles the roles of other racks. 
But for now, remember that Groq3 LPX racks help accelerate decode performance, Vera CPU racks are useful for deploying a large number of agents (during RL training or inference), STX Storage racks are for the KV cache, and Spectrum-6 SPX racks are for networking.</p><p>Nvidia truly understands that &#8220;the whole can be greater than the sum of its parts&#8221;.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_u0m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea5339c-9cae-49ec-94dc-cb64afa82f61_953x510.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_u0m!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea5339c-9cae-49ec-94dc-cb64afa82f61_953x510.webp 424w, https://substackcdn.com/image/fetch/$s_!_u0m!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea5339c-9cae-49ec-94dc-cb64afa82f61_953x510.webp 848w, https://substackcdn.com/image/fetch/$s_!_u0m!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea5339c-9cae-49ec-94dc-cb64afa82f61_953x510.webp 1272w, https://substackcdn.com/image/fetch/$s_!_u0m!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea5339c-9cae-49ec-94dc-cb64afa82f61_953x510.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_u0m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea5339c-9cae-49ec-94dc-cb64afa82f61_953x510.webp" width="953" height="510" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ea5339c-9cae-49ec-94dc-cb64afa82f61_953x510.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:510,&quot;width&quot;:953,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;NVIDIA Vera Rubin POD: Seven Chips, Five Rack-Scale Systems, One AI  Supercomputer | NVIDIA Technical Blog&quot;,&quot;title&quot;:&quot;NVIDIA Vera Rubin POD: Seven Chips, Five Rack-Scale Systems, One AI  Supercomputer | NVIDIA Technical Blog&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="NVIDIA Vera Rubin POD: Seven Chips, Five Rack-Scale Systems, One AI  Supercomputer | NVIDIA Technical Blog" title="NVIDIA Vera Rubin POD: Seven Chips, Five Rack-Scale Systems, One AI  Supercomputer | NVIDIA Technical Blog" srcset="https://substackcdn.com/image/fetch/$s_!_u0m!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea5339c-9cae-49ec-94dc-cb64afa82f61_953x510.webp 424w, https://substackcdn.com/image/fetch/$s_!_u0m!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea5339c-9cae-49ec-94dc-cb64afa82f61_953x510.webp 848w, https://substackcdn.com/image/fetch/$s_!_u0m!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea5339c-9cae-49ec-94dc-cb64afa82f61_953x510.webp 1272w, https://substackcdn.com/image/fetch/$s_!_u0m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ea5339c-9cae-49ec-94dc-cb64afa82f61_953x510.webp 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>SUMMARY: Why Do GPUs Keep Getting Faster?</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7s0k!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F502cf94a-e5bf-419b-a611-c0431f30140f_1598x830.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!7s0k!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F502cf94a-e5bf-419b-a611-c0431f30140f_1598x830.png 424w, https://substackcdn.com/image/fetch/$s_!7s0k!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F502cf94a-e5bf-419b-a611-c0431f30140f_1598x830.png 848w, https://substackcdn.com/image/fetch/$s_!7s0k!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F502cf94a-e5bf-419b-a611-c0431f30140f_1598x830.png 1272w, https://substackcdn.com/image/fetch/$s_!7s0k!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F502cf94a-e5bf-419b-a611-c0431f30140f_1598x830.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7s0k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F502cf94a-e5bf-419b-a611-c0431f30140f_1598x830.png" width="1456" height="756" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/502cf94a-e5bf-419b-a611-c0431f30140f_1598x830.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:756,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:191060,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/194444376?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F502cf94a-e5bf-419b-a611-c0431f30140f_1598x830.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7s0k!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F502cf94a-e5bf-419b-a611-c0431f30140f_1598x830.png 424w, https://substackcdn.com/image/fetch/$s_!7s0k!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F502cf94a-e5bf-419b-a611-c0431f30140f_1598x830.png 848w, https://substackcdn.com/image/fetch/$s_!7s0k!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F502cf94a-e5bf-419b-a611-c0431f30140f_1598x830.png 1272w, https://substackcdn.com/image/fetch/$s_!7s0k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F502cf94a-e5bf-419b-a611-c0431f30140f_1598x830.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The era of &#8220;free&#8221; scaling from Moore&#8217;s Law is over. We are now in the era of system-level scaling, where performance comes from co-designing the chip, the precision format, the package, the memory, the interconnect, and the rack as a single integrated system. And NVIDIA has mastered this playbook.</p><p>Moore&#8217;s Law gave us exponential gains for free. The post-Moore era demands exponential engineering effort for each generation. NVIDIA&#8217;s bet is that AI workloads are valuable enough to justify that effort, and so far the market agrees.</p><h2>APPENDIX</h2><h3>Approach that Huawei is following:</h3><p>While Chinese chipmakers cannot get the EUV machines needed for chips below 7nm, they can still follow this extreme co-design approach and build efficient systems for AI training and inference that compensate for not having chips denser than 7nm. That is exactly what Huawei is doing with its CloudMatrix multi-rack solution. You can read more in this <a href="https://newsletter.semianalysis.com/p/huawei-ai-cloudmatrix-384-chinas-answer-to-nvidia-gb200-nvl72">SemiAnalysis</a> report. Of course, these systems need more power, but power is abundant in China.</p><p>Nvidia&#8217;s Western competitors, such as AMD, are following this approach as well, so this is a race where the fastest one wins. However, Nvidia also benefits from working with thousands of engineers at frontier labs, who tell them how the next generation of systems needs to be built. Otherwise they have no way of knowing. 
In return, engineers build model architectures and inference engines that take advantage of the Nvidia ecosystem&#8217;s capabilities. In Jensen&#8217;s view, China has abundant AI talent. If Nvidia is not active in China, they miss out on everything they could learn by working closely with that talent. Labs from the region also won&#8217;t build for the Nvidia ecosystem, so a great American company stands to lose its position.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pqIj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd653127-08ee-40ac-b023-5d629cca21b4_1000x485.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pqIj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd653127-08ee-40ac-b023-5d629cca21b4_1000x485.jpeg 424w, https://substackcdn.com/image/fetch/$s_!pqIj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd653127-08ee-40ac-b023-5d629cca21b4_1000x485.jpeg 848w, https://substackcdn.com/image/fetch/$s_!pqIj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd653127-08ee-40ac-b023-5d629cca21b4_1000x485.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!pqIj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd653127-08ee-40ac-b023-5d629cca21b4_1000x485.jpeg 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!pqIj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd653127-08ee-40ac-b023-5d629cca21b4_1000x485.jpeg" width="1000" height="485" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bd653127-08ee-40ac-b023-5d629cca21b4_1000x485.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:485,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;CloudMatrix M8&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="CloudMatrix M8" title="CloudMatrix M8" srcset="https://substackcdn.com/image/fetch/$s_!pqIj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd653127-08ee-40ac-b023-5d629cca21b4_1000x485.jpeg 424w, https://substackcdn.com/image/fetch/$s_!pqIj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd653127-08ee-40ac-b023-5d629cca21b4_1000x485.jpeg 848w, https://substackcdn.com/image/fetch/$s_!pqIj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd653127-08ee-40ac-b023-5d629cca21b4_1000x485.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!pqIj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd653127-08ee-40ac-b023-5d629cca21b4_1000x485.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button 
tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3><strong>Dedication</strong>:</h3><p>This article is dedicated to Gordon Moore, the giant who inspired the industry to advance at an exponential pace. 
He has a true successor in the form of Jensen Huang (Nvidia CEO), who continues to inspire the industry toward exponential improvements.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NZpH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9946ea88-7ec1-488e-91fa-6ddd54050c58_1746x1008.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NZpH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9946ea88-7ec1-488e-91fa-6ddd54050c58_1746x1008.png 424w, https://substackcdn.com/image/fetch/$s_!NZpH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9946ea88-7ec1-488e-91fa-6ddd54050c58_1746x1008.png 848w, https://substackcdn.com/image/fetch/$s_!NZpH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9946ea88-7ec1-488e-91fa-6ddd54050c58_1746x1008.png 1272w, https://substackcdn.com/image/fetch/$s_!NZpH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9946ea88-7ec1-488e-91fa-6ddd54050c58_1746x1008.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NZpH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9946ea88-7ec1-488e-91fa-6ddd54050c58_1746x1008.png" width="1456" height="841" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9946ea88-7ec1-488e-91fa-6ddd54050c58_1746x1008.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:841,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1170082,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/194444376?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9946ea88-7ec1-488e-91fa-6ddd54050c58_1746x1008.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!NZpH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9946ea88-7ec1-488e-91fa-6ddd54050c58_1746x1008.png 424w, https://substackcdn.com/image/fetch/$s_!NZpH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9946ea88-7ec1-488e-91fa-6ddd54050c58_1746x1008.png 848w, https://substackcdn.com/image/fetch/$s_!NZpH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9946ea88-7ec1-488e-91fa-6ddd54050c58_1746x1008.png 1272w, https://substackcdn.com/image/fetch/$s_!NZpH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9946ea88-7ec1-488e-91fa-6ddd54050c58_1746x1008.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://polymath707.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Girish Patil! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Claude Mythos class: Training Compute & Cost Analysis for the next gen models]]></title><description><![CDATA[Estimating infrastructure requirements for hypothetical frontier MoE models at 2T, 4T, and 10T total parameter scales with 5% MoE sparsity ratio on NVIDIA Blackwell GPUs.]]></description><link>https://polymath707.substack.com/p/claude-mythos-class-training-compute</link><guid isPermaLink="false">https://polymath707.substack.com/p/claude-mythos-class-training-compute</guid><pubDate>Wed, 08 Apr 2026 21:52:50 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!8EhS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d5f675-6287-496a-80a4-5853de9913f4_3004x1488.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Saying Claude Mythos is impressive is an understatement. It represents the next step change in capabilities, and as Dario said there is no upper limit.</p><p>We want to understand how much compute and data are needed to train the next generation of models which are going to be in 2T, 4T and 10T parameter count classes (rumours!). 
Anyways, Elon announced today that they are training their models in this class, so this would be a very timely exercise.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3H8p!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc703e1b-8ade-4ed0-80d9-f1ac48e985cc_720x255.avif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3H8p!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc703e1b-8ade-4ed0-80d9-f1ac48e985cc_720x255.avif 424w, https://substackcdn.com/image/fetch/$s_!3H8p!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc703e1b-8ade-4ed0-80d9-f1ac48e985cc_720x255.avif 848w, https://substackcdn.com/image/fetch/$s_!3H8p!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc703e1b-8ade-4ed0-80d9-f1ac48e985cc_720x255.avif 1272w, 
https://substackcdn.com/image/fetch/$s_!3H8p!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc703e1b-8ade-4ed0-80d9-f1ac48e985cc_720x255.avif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3H8p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc703e1b-8ade-4ed0-80d9-f1ac48e985cc_720x255.avif" width="720" height="255" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dc703e1b-8ade-4ed0-80d9-f1ac48e985cc_720x255.avif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:255,&quot;width&quot;:720,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Elon Musk Reveals New Grok Training on 10 Trillion Parameters: 'Need to catch up'&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Elon Musk Reveals New Grok Training on 10 Trillion Parameters: 'Need to catch up'" title="Elon Musk Reveals New Grok Training on 10 Trillion Parameters: 'Need to catch up'" srcset="https://substackcdn.com/image/fetch/$s_!3H8p!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc703e1b-8ade-4ed0-80d9-f1ac48e985cc_720x255.avif 424w, https://substackcdn.com/image/fetch/$s_!3H8p!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc703e1b-8ade-4ed0-80d9-f1ac48e985cc_720x255.avif 848w, 
https://substackcdn.com/image/fetch/$s_!3H8p!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc703e1b-8ade-4ed0-80d9-f1ac48e985cc_720x255.avif 1272w, https://substackcdn.com/image/fetch/$s_!3H8p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc703e1b-8ade-4ed0-80d9-f1ac48e985cc_720x255.avif 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Methodology</strong>: We use formulas and approaches used in estimating training costs for DeepSeek V3, ARCEE, KIMI 2.5 
etc. We will use GPU performance numbers from Nvidia data sheets for B200/B300 and NVL72 racks, and statistics on data availability from Epoch AI. When we say B200/B300, we are actually referring to them as part of GB200/GB300 superchips in NVL72 racks. An NVL72 rack has a world size of 72 GPUs, which makes it easier to train models in this class because those 72 GPUs can access each other&#8217;s HBM at high throughput through NVLink/NVSwitch connectivity. That said, data parallelism alone is not enough. You still need other forms of parallelism (pipeline, tensor, sequence). To account for all the bottlenecks, we assume an MFU (Model FLOPs Utilisation) of only 20%.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_LM-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d5f4c2-ac8b-4c55-81c4-a639948c8fea_1271x654.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_LM-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d5f4c2-ac8b-4c55-81c4-a639948c8fea_1271x654.png 424w, https://substackcdn.com/image/fetch/$s_!_LM-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d5f4c2-ac8b-4c55-81c4-a639948c8fea_1271x654.png 848w, https://substackcdn.com/image/fetch/$s_!_LM-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d5f4c2-ac8b-4c55-81c4-a639948c8fea_1271x654.png 1272w, https://substackcdn.com/image/fetch/$s_!_LM-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d5f4c2-ac8b-4c55-81c4-a639948c8fea_1271x654.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!_LM-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d5f4c2-ac8b-4c55-81c4-a639948c8fea_1271x654.png" width="1271" height="654" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b8d5f4c2-ac8b-4c55-81c4-a639948c8fea_1271x654.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:654,&quot;width&quot;:1271,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;AI factory&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="AI factory" title="AI factory" srcset="https://substackcdn.com/image/fetch/$s_!_LM-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d5f4c2-ac8b-4c55-81c4-a639948c8fea_1271x654.png 424w, https://substackcdn.com/image/fetch/$s_!_LM-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d5f4c2-ac8b-4c55-81c4-a639948c8fea_1271x654.png 848w, https://substackcdn.com/image/fetch/$s_!_LM-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d5f4c2-ac8b-4c55-81c4-a639948c8fea_1271x654.png 1272w, https://substackcdn.com/image/fetch/$s_!_LM-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8d5f4c2-ac8b-4c55-81c4-a639948c8fea_1271x654.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" 
class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4>1. 
Assumptions</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3dg3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36b70328-c6bd-46bd-bfce-5030bc0230b3_1600x586.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3dg3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36b70328-c6bd-46bd-bfce-5030bc0230b3_1600x586.png 424w, https://substackcdn.com/image/fetch/$s_!3dg3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36b70328-c6bd-46bd-bfce-5030bc0230b3_1600x586.png 848w, https://substackcdn.com/image/fetch/$s_!3dg3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36b70328-c6bd-46bd-bfce-5030bc0230b3_1600x586.png 1272w, https://substackcdn.com/image/fetch/$s_!3dg3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36b70328-c6bd-46bd-bfce-5030bc0230b3_1600x586.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3dg3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36b70328-c6bd-46bd-bfce-5030bc0230b3_1600x586.png" width="1456" height="533" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/36b70328-c6bd-46bd-bfce-5030bc0230b3_1600x586.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:533,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:148563,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/193616072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36b70328-c6bd-46bd-bfce-5030bc0230b3_1600x586.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3dg3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36b70328-c6bd-46bd-bfce-5030bc0230b3_1600x586.png 424w, https://substackcdn.com/image/fetch/$s_!3dg3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36b70328-c6bd-46bd-bfce-5030bc0230b3_1600x586.png 848w, https://substackcdn.com/image/fetch/$s_!3dg3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36b70328-c6bd-46bd-bfce-5030bc0230b3_1600x586.png 1272w, https://substackcdn.com/image/fetch/$s_!3dg3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36b70328-c6bd-46bd-bfce-5030bc0230b3_1600x586.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4>2. 
Training Data Requirements</h4><p><strong>Formula:</strong></p><blockquote><p>D = tokens_per_parameter &#215; N_total</p><p>D = 40 &#215; N_total</p></blockquote><p>Token count is based on total parameters &#8212; all expert weights must be trained across the full run, even though only 5% activate per token.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1hon!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb761e340-8b62-4252-ac1d-a0b1d9969ead_1648x254.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1hon!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb761e340-8b62-4252-ac1d-a0b1d9969ead_1648x254.png 424w, https://substackcdn.com/image/fetch/$s_!1hon!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb761e340-8b62-4252-ac1d-a0b1d9969ead_1648x254.png 848w, https://substackcdn.com/image/fetch/$s_!1hon!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb761e340-8b62-4252-ac1d-a0b1d9969ead_1648x254.png 1272w, https://substackcdn.com/image/fetch/$s_!1hon!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb761e340-8b62-4252-ac1d-a0b1d9969ead_1648x254.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1hon!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb761e340-8b62-4252-ac1d-a0b1d9969ead_1648x254.png" width="1456" height="224" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b761e340-8b62-4252-ac1d-a0b1d9969ead_1648x254.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:224,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:61972,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/193616072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb761e340-8b62-4252-ac1d-a0b1d9969ead_1648x254.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1hon!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb761e340-8b62-4252-ac1d-a0b1d9969ead_1648x254.png 424w, https://substackcdn.com/image/fetch/$s_!1hon!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb761e340-8b62-4252-ac1d-a0b1d9969ead_1648x254.png 848w, https://substackcdn.com/image/fetch/$s_!1hon!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb761e340-8b62-4252-ac1d-a0b1d9969ead_1648x254.png 1272w, https://substackcdn.com/image/fetch/$s_!1hon!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb761e340-8b62-4252-ac1d-a0b1d9969ead_1648x254.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The publicly crawlable internet provides ~100T tokens (<a href="https://epoch.ai/blog/will-we-run-out-of-data-limits-of-llm-scaling-based-on-human-generated-data/">Epoch AI</a> 
analysis; the 100T-token mark is highlighted below). Models that need more must rely on synthetic data generation, rephrasing, or multi-epoch training. As we will see later, synthetic data generation becomes very costly as we approach the 10T-parameter class.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TjlX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8988cdf-b29f-4dc1-960a-f8f06b5412ae_1628x864.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TjlX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8988cdf-b29f-4dc1-960a-f8f06b5412ae_1628x864.png 424w, https://substackcdn.com/image/fetch/$s_!TjlX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8988cdf-b29f-4dc1-960a-f8f06b5412ae_1628x864.png 848w, https://substackcdn.com/image/fetch/$s_!TjlX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8988cdf-b29f-4dc1-960a-f8f06b5412ae_1628x864.png 1272w, https://substackcdn.com/image/fetch/$s_!TjlX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8988cdf-b29f-4dc1-960a-f8f06b5412ae_1628x864.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TjlX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8988cdf-b29f-4dc1-960a-f8f06b5412ae_1628x864.png" width="1456" height="773" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e8988cdf-b29f-4dc1-960a-f8f06b5412ae_1628x864.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:773,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:994130,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/193616072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8988cdf-b29f-4dc1-960a-f8f06b5412ae_1628x864.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TjlX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8988cdf-b29f-4dc1-960a-f8f06b5412ae_1628x864.png 424w, https://substackcdn.com/image/fetch/$s_!TjlX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8988cdf-b29f-4dc1-960a-f8f06b5412ae_1628x864.png 848w, https://substackcdn.com/image/fetch/$s_!TjlX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8988cdf-b29f-4dc1-960a-f8f06b5412ae_1628x864.png 1272w, https://substackcdn.com/image/fetch/$s_!TjlX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8988cdf-b29f-4dc1-960a-f8f06b5412ae_1628x864.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h4>3. 
Sparsity Ratios of Frontier MoE Models</h4><p>To justify the 5% active ratio, we surveyed the latest frontier models:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IexE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d889303-a46f-43f8-9926-14ac14d28ecb_1594x450.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IexE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d889303-a46f-43f8-9926-14ac14d28ecb_1594x450.png 424w, https://substackcdn.com/image/fetch/$s_!IexE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d889303-a46f-43f8-9926-14ac14d28ecb_1594x450.png 848w, https://substackcdn.com/image/fetch/$s_!IexE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d889303-a46f-43f8-9926-14ac14d28ecb_1594x450.png 1272w, https://substackcdn.com/image/fetch/$s_!IexE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d889303-a46f-43f8-9926-14ac14d28ecb_1594x450.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IexE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d889303-a46f-43f8-9926-14ac14d28ecb_1594x450.png" width="1456" height="411" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9d889303-a46f-43f8-9926-14ac14d28ecb_1594x450.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:411,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:116273,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/193616072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d889303-a46f-43f8-9926-14ac14d28ecb_1594x450.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IexE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d889303-a46f-43f8-9926-14ac14d28ecb_1594x450.png 424w, https://substackcdn.com/image/fetch/$s_!IexE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d889303-a46f-43f8-9926-14ac14d28ecb_1594x450.png 848w, https://substackcdn.com/image/fetch/$s_!IexE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d889303-a46f-43f8-9926-14ac14d28ecb_1594x450.png 1272w, https://substackcdn.com/image/fetch/$s_!IexE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d889303-a46f-43f8-9926-14ac14d28ecb_1594x450.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h5>Why 5% Is a Sound Assumption</h5><blockquote><p>&#183; DeepSeek V3/V3.2 (5.4%), GLM-5 (5.4%), and MiniMax M2.5 (4.3%) all cluster around 4&#8211;6%, so 5% sits at the center of this range.</p><p>&#183; Kimi K2.5 is already at 3.1%, and the trend is toward more sparsity, so 5% is conservative for a next-gen model.</p><p>&#183; Older models (Mixtral, DBRX) used ~27% active. In 18 months the field moved to 3&#8211;6%, a decisive shift.</p><p>&#183; DeepSeek V3 (37B active of 671B total, about 5.5%) was trained on 14.8T tokens and reported benchmark results competitive with leading dense models.</p><p>&#183; At 5% active, the active-parameter counts stay practical: 2T&#8594;100B, 4T&#8594;200B, 10T&#8594;500B active per token, all within the range of dense models that have already been trained.</p></blockquote><h4>4. 
Training Compute (FLOPs)</h4><p><strong>With MoE, only the active parameters compute per token:</strong></p><blockquote><p>N_active = N_total &#215; 0.05</p><p>C = 6 &#215; N_active &#215; D</p><p>C = 6 &#215; (N_total &#215; 0.05) &#215; (40 &#215; N_total)</p><p>C = 12 &#215; N_total&#178;</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!e0Da!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb766c567-3aec-4e49-aa4f-3f00f4d46ff5_1600x214.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!e0Da!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb766c567-3aec-4e49-aa4f-3f00f4d46ff5_1600x214.png 424w, https://substackcdn.com/image/fetch/$s_!e0Da!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb766c567-3aec-4e49-aa4f-3f00f4d46ff5_1600x214.png 848w, https://substackcdn.com/image/fetch/$s_!e0Da!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb766c567-3aec-4e49-aa4f-3f00f4d46ff5_1600x214.png 1272w, https://substackcdn.com/image/fetch/$s_!e0Da!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb766c567-3aec-4e49-aa4f-3f00f4d46ff5_1600x214.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!e0Da!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb766c567-3aec-4e49-aa4f-3f00f4d46ff5_1600x214.png" width="1456" height="195" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b766c567-3aec-4e49-aa4f-3f00f4d46ff5_1600x214.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:195,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:49216,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/193616072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb766c567-3aec-4e49-aa4f-3f00f4d46ff5_1600x214.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!e0Da!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb766c567-3aec-4e49-aa4f-3f00f4d46ff5_1600x214.png 424w, https://substackcdn.com/image/fetch/$s_!e0Da!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb766c567-3aec-4e49-aa4f-3f00f4d46ff5_1600x214.png 848w, https://substackcdn.com/image/fetch/$s_!e0Da!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb766c567-3aec-4e49-aa4f-3f00f4d46ff5_1600x214.png 1272w, https://substackcdn.com/image/fetch/$s_!e0Da!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb766c567-3aec-4e49-aa4f-3f00f4d46ff5_1600x214.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h4>5. 
GPU Specifications</h4><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_fQs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62208cab-1d19-4d96-8e02-d10ead83b0e7_1596x360.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_fQs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62208cab-1d19-4d96-8e02-d10ead83b0e7_1596x360.png 424w, https://substackcdn.com/image/fetch/$s_!_fQs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62208cab-1d19-4d96-8e02-d10ead83b0e7_1596x360.png 848w, https://substackcdn.com/image/fetch/$s_!_fQs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62208cab-1d19-4d96-8e02-d10ead83b0e7_1596x360.png 1272w, https://substackcdn.com/image/fetch/$s_!_fQs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62208cab-1d19-4d96-8e02-d10ead83b0e7_1596x360.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_fQs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62208cab-1d19-4d96-8e02-d10ead83b0e7_1596x360.png" width="1456" height="328" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/62208cab-1d19-4d96-8e02-d10ead83b0e7_1596x360.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:328,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:69427,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/193616072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62208cab-1d19-4d96-8e02-d10ead83b0e7_1596x360.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_fQs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62208cab-1d19-4d96-8e02-d10ead83b0e7_1596x360.png 424w, https://substackcdn.com/image/fetch/$s_!_fQs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62208cab-1d19-4d96-8e02-d10ead83b0e7_1596x360.png 848w, https://substackcdn.com/image/fetch/$s_!_fQs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62208cab-1d19-4d96-8e02-d10ead83b0e7_1596x360.png 1272w, https://substackcdn.com/image/fetch/$s_!_fQs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62208cab-1d19-4d96-8e02-d10ead83b0e7_1596x360.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>B300 only outperforms B200 at FP4. At BF16, they are identical (2.25 PFLOPS).</p><h4>6. 
Calculation Chain</h4><blockquote><p>Step 1: Effective_FLOPS = Peak_FLOPS &#215; 0.20</p><p>Step 2: GPU_seconds = C &#247; Effective_FLOPS</p><p>Step 3: GPU_hours = GPU_seconds &#247; 3,600</p><p>Step 4: GPUs_needed = GPU_hours &#247; 2,160 (= 90 days &#215; 24 hrs)</p><p>Step 5: NVL72_Racks = GPUs_needed &#247; 72</p><p>Step 6: Total_Cost = GPU_hours &#215; $/GPU/hr</p></blockquote><h4>7. FP4 Scenario</h4><p>Theoretical best case: all FLOPs run at FP4 throughput.</p><p><em>As far as I know, this is not possible today, or at least not proven at this scale. <a href="https://arxiv.org/pdf/2509.25149">Nvidia has done it for smaller models like 12B</a> with 4-bit precision training, but even there they kept many components in higher precision. To quote: &#8220;<strong>Attention, embedding, non-linear layers, and other tensors:</strong> To ensure numerical stability during training, we retain the original precision (e.g., BF16 or FP32) for embeddings, the output projection head, normalization layers, non-linearities, and attention components, including softmax and the query-key and attention score-value batched GEMMs. The main weights (stored by the optimizer), weight gradients (used for gradient accumulation across microbatches and across data-parallel replicas), and optimizer states are also kept in FP32. 
Tensor parallel reductions are performed in BF16 precision.&#8221;</em></p><h5>B200 (FP4) &#8212; $5/GPU/hr</h5><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UbqF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0f79e7-0a2e-490b-b66b-0ba72048a86d_1590x212.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UbqF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0f79e7-0a2e-490b-b66b-0ba72048a86d_1590x212.png 424w, https://substackcdn.com/image/fetch/$s_!UbqF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0f79e7-0a2e-490b-b66b-0ba72048a86d_1590x212.png 848w, https://substackcdn.com/image/fetch/$s_!UbqF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0f79e7-0a2e-490b-b66b-0ba72048a86d_1590x212.png 1272w, https://substackcdn.com/image/fetch/$s_!UbqF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0f79e7-0a2e-490b-b66b-0ba72048a86d_1590x212.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UbqF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0f79e7-0a2e-490b-b66b-0ba72048a86d_1590x212.png" width="1456" height="194" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5e0f79e7-0a2e-490b-b66b-0ba72048a86d_1590x212.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:194,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:56201,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/193616072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0f79e7-0a2e-490b-b66b-0ba72048a86d_1590x212.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UbqF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0f79e7-0a2e-490b-b66b-0ba72048a86d_1590x212.png 424w, https://substackcdn.com/image/fetch/$s_!UbqF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0f79e7-0a2e-490b-b66b-0ba72048a86d_1590x212.png 848w, https://substackcdn.com/image/fetch/$s_!UbqF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0f79e7-0a2e-490b-b66b-0ba72048a86d_1590x212.png 1272w, https://substackcdn.com/image/fetch/$s_!UbqF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e0f79e7-0a2e-490b-b66b-0ba72048a86d_1590x212.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h5>B300 (FP4) &#8212; $6/GPU/hr</h5><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!9OsU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28464895-5837-4f9c-845a-4487a5ef30ae_1592x212.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9OsU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28464895-5837-4f9c-845a-4487a5ef30ae_1592x212.png 424w, https://substackcdn.com/image/fetch/$s_!9OsU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28464895-5837-4f9c-845a-4487a5ef30ae_1592x212.png 848w, https://substackcdn.com/image/fetch/$s_!9OsU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28464895-5837-4f9c-845a-4487a5ef30ae_1592x212.png 1272w, https://substackcdn.com/image/fetch/$s_!9OsU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28464895-5837-4f9c-845a-4487a5ef30ae_1592x212.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9OsU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28464895-5837-4f9c-845a-4487a5ef30ae_1592x212.png" width="1456" height="194" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/28464895-5837-4f9c-845a-4487a5ef30ae_1592x212.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:194,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:56125,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/193616072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28464895-5837-4f9c-845a-4487a5ef30ae_1592x212.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9OsU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28464895-5837-4f9c-845a-4487a5ef30ae_1592x212.png 424w, https://substackcdn.com/image/fetch/$s_!9OsU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28464895-5837-4f9c-845a-4487a5ef30ae_1592x212.png 848w, https://substackcdn.com/image/fetch/$s_!9OsU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28464895-5837-4f9c-845a-4487a5ef30ae_1592x212.png 1272w, https://substackcdn.com/image/fetch/$s_!9OsU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28464895-5837-4f9c-845a-4487a5ef30ae_1592x212.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h4>8. 
BF16 Scenario</h4><p>This is a more realistic scenario, in which they did all of their training in BF16.</p><h5>B200 (BF16) &#8212; $5/GPU/hr</h5><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!536T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81fba8ef-fc97-438f-8b95-2144e52f1b41_1596x212.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!536T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81fba8ef-fc97-438f-8b95-2144e52f1b41_1596x212.png 424w, https://substackcdn.com/image/fetch/$s_!536T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81fba8ef-fc97-438f-8b95-2144e52f1b41_1596x212.png 848w, https://substackcdn.com/image/fetch/$s_!536T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81fba8ef-fc97-438f-8b95-2144e52f1b41_1596x212.png 1272w, https://substackcdn.com/image/fetch/$s_!536T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81fba8ef-fc97-438f-8b95-2144e52f1b41_1596x212.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!536T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81fba8ef-fc97-438f-8b95-2144e52f1b41_1596x212.png" width="1456" height="193" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/81fba8ef-fc97-438f-8b95-2144e52f1b41_1596x212.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:193,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:57041,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/193616072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81fba8ef-fc97-438f-8b95-2144e52f1b41_1596x212.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!536T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81fba8ef-fc97-438f-8b95-2144e52f1b41_1596x212.png 424w, https://substackcdn.com/image/fetch/$s_!536T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81fba8ef-fc97-438f-8b95-2144e52f1b41_1596x212.png 848w, https://substackcdn.com/image/fetch/$s_!536T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81fba8ef-fc97-438f-8b95-2144e52f1b41_1596x212.png 1272w, https://substackcdn.com/image/fetch/$s_!536T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81fba8ef-fc97-438f-8b95-2144e52f1b41_1596x212.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h5>B300 (BF16) &#8212; $6/GPU/hr</h5><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!g0LZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9629b42-aabd-4a2c-a463-2a6b9e165f75_1596x212.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!g0LZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9629b42-aabd-4a2c-a463-2a6b9e165f75_1596x212.png 424w, https://substackcdn.com/image/fetch/$s_!g0LZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9629b42-aabd-4a2c-a463-2a6b9e165f75_1596x212.png 848w, https://substackcdn.com/image/fetch/$s_!g0LZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9629b42-aabd-4a2c-a463-2a6b9e165f75_1596x212.png 1272w, https://substackcdn.com/image/fetch/$s_!g0LZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9629b42-aabd-4a2c-a463-2a6b9e165f75_1596x212.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!g0LZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9629b42-aabd-4a2c-a463-2a6b9e165f75_1596x212.png" width="1456" height="193" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f9629b42-aabd-4a2c-a463-2a6b9e165f75_1596x212.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:193,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:56615,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/193616072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9629b42-aabd-4a2c-a463-2a6b9e165f75_1596x212.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!g0LZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9629b42-aabd-4a2c-a463-2a6b9e165f75_1596x212.png 424w, https://substackcdn.com/image/fetch/$s_!g0LZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9629b42-aabd-4a2c-a463-2a6b9e165f75_1596x212.png 848w, https://substackcdn.com/image/fetch/$s_!g0LZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9629b42-aabd-4a2c-a463-2a6b9e165f75_1596x212.png 1272w, https://substackcdn.com/image/fetch/$s_!g0LZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9629b42-aabd-4a2c-a463-2a6b9e165f75_1596x212.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p></p><h4>9. 
Cost Comparison Summary</h4><h5>Total Training Cost</h5><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EzPx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff69d4af8-c2a9-412d-bd40-2d5b34ebc630_1596x214.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EzPx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff69d4af8-c2a9-412d-bd40-2d5b34ebc630_1596x214.png 424w, https://substackcdn.com/image/fetch/$s_!EzPx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff69d4af8-c2a9-412d-bd40-2d5b34ebc630_1596x214.png 848w, https://substackcdn.com/image/fetch/$s_!EzPx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff69d4af8-c2a9-412d-bd40-2d5b34ebc630_1596x214.png 1272w, https://substackcdn.com/image/fetch/$s_!EzPx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff69d4af8-c2a9-412d-bd40-2d5b34ebc630_1596x214.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EzPx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff69d4af8-c2a9-412d-bd40-2d5b34ebc630_1596x214.png" width="1456" height="195" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f69d4af8-c2a9-412d-bd40-2d5b34ebc630_1596x214.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:195,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:48324,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/193616072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff69d4af8-c2a9-412d-bd40-2d5b34ebc630_1596x214.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EzPx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff69d4af8-c2a9-412d-bd40-2d5b34ebc630_1596x214.png 424w, https://substackcdn.com/image/fetch/$s_!EzPx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff69d4af8-c2a9-412d-bd40-2d5b34ebc630_1596x214.png 848w, https://substackcdn.com/image/fetch/$s_!EzPx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff69d4af8-c2a9-412d-bd40-2d5b34ebc630_1596x214.png 1272w, https://substackcdn.com/image/fetch/$s_!EzPx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff69d4af8-c2a9-412d-bd40-2d5b34ebc630_1596x214.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h5>NVL72 Racks Required</h5><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!wkRm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ce81372-4167-4084-be1e-c334672b2a8a_1602x212.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wkRm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ce81372-4167-4084-be1e-c334672b2a8a_1602x212.png 424w, https://substackcdn.com/image/fetch/$s_!wkRm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ce81372-4167-4084-be1e-c334672b2a8a_1602x212.png 848w, https://substackcdn.com/image/fetch/$s_!wkRm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ce81372-4167-4084-be1e-c334672b2a8a_1602x212.png 1272w, https://substackcdn.com/image/fetch/$s_!wkRm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ce81372-4167-4084-be1e-c334672b2a8a_1602x212.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wkRm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ce81372-4167-4084-be1e-c334672b2a8a_1602x212.png" width="1456" height="193" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ce81372-4167-4084-be1e-c334672b2a8a_1602x212.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:193,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:41705,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/193616072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ce81372-4167-4084-be1e-c334672b2a8a_1602x212.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wkRm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ce81372-4167-4084-be1e-c334672b2a8a_1602x212.png 424w, https://substackcdn.com/image/fetch/$s_!wkRm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ce81372-4167-4084-be1e-c334672b2a8a_1602x212.png 848w, https://substackcdn.com/image/fetch/$s_!wkRm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ce81372-4167-4084-be1e-c334672b2a8a_1602x212.png 1272w, https://substackcdn.com/image/fetch/$s_!wkRm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ce81372-4167-4084-be1e-c334672b2a8a_1602x212.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h4>10. 
Training Cost Visualisations</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8EhS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d5f675-6287-496a-80a4-5853de9913f4_3004x1488.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8EhS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d5f675-6287-496a-80a4-5853de9913f4_3004x1488.png 424w, https://substackcdn.com/image/fetch/$s_!8EhS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d5f675-6287-496a-80a4-5853de9913f4_3004x1488.png 848w, https://substackcdn.com/image/fetch/$s_!8EhS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d5f675-6287-496a-80a4-5853de9913f4_3004x1488.png 1272w, https://substackcdn.com/image/fetch/$s_!8EhS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d5f675-6287-496a-80a4-5853de9913f4_3004x1488.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8EhS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d5f675-6287-496a-80a4-5853de9913f4_3004x1488.png" width="1456" height="721" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b9d5f675-6287-496a-80a4-5853de9913f4_3004x1488.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:721,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:295202,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/193616072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d5f675-6287-496a-80a4-5853de9913f4_3004x1488.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8EhS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d5f675-6287-496a-80a4-5853de9913f4_3004x1488.png 424w, https://substackcdn.com/image/fetch/$s_!8EhS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d5f675-6287-496a-80a4-5853de9913f4_3004x1488.png 848w, https://substackcdn.com/image/fetch/$s_!8EhS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d5f675-6287-496a-80a4-5853de9913f4_3004x1488.png 1272w, https://substackcdn.com/image/fetch/$s_!8EhS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d5f675-6287-496a-80a4-5853de9913f4_3004x1488.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p style="text-align: center;"><strong>Figure 1: Training Cost by Model Size and GPU Configuration</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vPka!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F522358b1-8d9b-42d1-807c-a7ec99d9094c_2998x1478.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vPka!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F522358b1-8d9b-42d1-807c-a7ec99d9094c_2998x1478.png 424w, 
https://substackcdn.com/image/fetch/$s_!vPka!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F522358b1-8d9b-42d1-807c-a7ec99d9094c_2998x1478.png 848w, https://substackcdn.com/image/fetch/$s_!vPka!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F522358b1-8d9b-42d1-807c-a7ec99d9094c_2998x1478.png 1272w, https://substackcdn.com/image/fetch/$s_!vPka!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F522358b1-8d9b-42d1-807c-a7ec99d9094c_2998x1478.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vPka!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F522358b1-8d9b-42d1-807c-a7ec99d9094c_2998x1478.png" width="1456" height="718" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/522358b1-8d9b-42d1-807c-a7ec99d9094c_2998x1478.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:718,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:289005,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/193616072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F522358b1-8d9b-42d1-807c-a7ec99d9094c_2998x1478.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!vPka!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F522358b1-8d9b-42d1-807c-a7ec99d9094c_2998x1478.png 424w, https://substackcdn.com/image/fetch/$s_!vPka!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F522358b1-8d9b-42d1-807c-a7ec99d9094c_2998x1478.png 848w, https://substackcdn.com/image/fetch/$s_!vPka!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F522358b1-8d9b-42d1-807c-a7ec99d9094c_2998x1478.png 1272w, https://substackcdn.com/image/fetch/$s_!vPka!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F522358b1-8d9b-42d1-807c-a7ec99d9094c_2998x1478.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p style="text-align: center;"><strong>Figure 2: NVL72 Racks Required</strong></p><h4>11. Synthetic Data Generation Cost</h4><p>The 10T model requires 400T tokens but only ~100T are available from the internet. The remaining 300T must be generated synthetically. We analyze the cost using Claude Sonnet 4.6 as a representative frontier API model.</p><h5>Sonnet 4.6 Pricing</h5><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IzIg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd3de3d7-53f9-4636-a494-6ae7583f4327_1600x164.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IzIg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd3de3d7-53f9-4636-a494-6ae7583f4327_1600x164.png 424w, https://substackcdn.com/image/fetch/$s_!IzIg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd3de3d7-53f9-4636-a494-6ae7583f4327_1600x164.png 848w, https://substackcdn.com/image/fetch/$s_!IzIg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd3de3d7-53f9-4636-a494-6ae7583f4327_1600x164.png 1272w, 
https://substackcdn.com/image/fetch/$s_!IzIg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd3de3d7-53f9-4636-a494-6ae7583f4327_1600x164.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IzIg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd3de3d7-53f9-4636-a494-6ae7583f4327_1600x164.png" width="1456" height="149" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cd3de3d7-53f9-4636-a494-6ae7583f4327_1600x164.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:149,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:38317,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/193616072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd3de3d7-53f9-4636-a494-6ae7583f4327_1600x164.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IzIg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd3de3d7-53f9-4636-a494-6ae7583f4327_1600x164.png 424w, https://substackcdn.com/image/fetch/$s_!IzIg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd3de3d7-53f9-4636-a494-6ae7583f4327_1600x164.png 848w, 
https://substackcdn.com/image/fetch/$s_!IzIg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd3de3d7-53f9-4636-a494-6ae7583f4327_1600x164.png 1272w, https://substackcdn.com/image/fetch/$s_!IzIg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd3de3d7-53f9-4636-a494-6ae7583f4327_1600x164.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h5>Cost to Generate 300T Synthetic Tokens</h5><p>Assuming a 1:10 input-to-output ratio (30T prompt tokens &#8594; 300T generated tokens):</p><blockquote><p>Standard: (30T &#247; 1M &#215; $3) + (300T &#247; 1M &#215; $15) = $90M + $4,500M = $4.6B</p><p>Batch: (30T &#247; 1M &#215; $1.50) + (300T &#247; 1M &#215; $7.50) = $45M + $2,250M = $2.3B</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YUVw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb51639b8-563d-405c-8030-2b881a24501d_1592x300.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YUVw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb51639b8-563d-405c-8030-2b881a24501d_1592x300.png 424w, https://substackcdn.com/image/fetch/$s_!YUVw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb51639b8-563d-405c-8030-2b881a24501d_1592x300.png 848w, https://substackcdn.com/image/fetch/$s_!YUVw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb51639b8-563d-405c-8030-2b881a24501d_1592x300.png 1272w, 
https://substackcdn.com/image/fetch/$s_!YUVw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb51639b8-563d-405c-8030-2b881a24501d_1592x300.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YUVw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb51639b8-563d-405c-8030-2b881a24501d_1592x300.png" width="1456" height="274" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b51639b8-563d-405c-8030-2b881a24501d_1592x300.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:274,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:65155,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/193616072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb51639b8-563d-405c-8030-2b881a24501d_1592x300.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YUVw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb51639b8-563d-405c-8030-2b881a24501d_1592x300.png 424w, https://substackcdn.com/image/fetch/$s_!YUVw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb51639b8-563d-405c-8030-2b881a24501d_1592x300.png 848w, 
https://substackcdn.com/image/fetch/$s_!YUVw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb51639b8-563d-405c-8030-2b881a24501d_1592x300.png 1272w, https://substackcdn.com/image/fetch/$s_!YUVw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb51639b8-563d-405c-8030-2b881a24501d_1592x300.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>(Of course, for Anthropic itself this would cost much less than for outsiders, since it does not have to pay its own margins :) )</p><h4>12. Total Cost: Training + Synthetic Data (10T class)</h4><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qBMO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb46377af-ffca-4ac7-97d0-6621c69a60e0_1594x442.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qBMO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb46377af-ffca-4ac7-97d0-6621c69a60e0_1594x442.png 424w, https://substackcdn.com/image/fetch/$s_!qBMO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb46377af-ffca-4ac7-97d0-6621c69a60e0_1594x442.png 848w, https://substackcdn.com/image/fetch/$s_!qBMO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb46377af-ffca-4ac7-97d0-6621c69a60e0_1594x442.png 1272w, 
https://substackcdn.com/image/fetch/$s_!qBMO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb46377af-ffca-4ac7-97d0-6621c69a60e0_1594x442.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qBMO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb46377af-ffca-4ac7-97d0-6621c69a60e0_1594x442.png" width="1456" height="404" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b46377af-ffca-4ac7-97d0-6621c69a60e0_1594x442.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:404,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:98239,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/193616072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb46377af-ffca-4ac7-97d0-6621c69a60e0_1594x442.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qBMO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb46377af-ffca-4ac7-97d0-6621c69a60e0_1594x442.png 424w, https://substackcdn.com/image/fetch/$s_!qBMO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb46377af-ffca-4ac7-97d0-6621c69a60e0_1594x442.png 848w, 
https://substackcdn.com/image/fetch/$s_!qBMO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb46377af-ffca-4ac7-97d0-6621c69a60e0_1594x442.png 1272w, https://substackcdn.com/image/fetch/$s_!qBMO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb46377af-ffca-4ac7-97d0-6621c69a60e0_1594x442.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!QIDW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a808f9-e396-4995-b54c-2a0b01ef4004_3008x1786.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QIDW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a808f9-e396-4995-b54c-2a0b01ef4004_3008x1786.png 424w, https://substackcdn.com/image/fetch/$s_!QIDW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a808f9-e396-4995-b54c-2a0b01ef4004_3008x1786.png 848w, https://substackcdn.com/image/fetch/$s_!QIDW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a808f9-e396-4995-b54c-2a0b01ef4004_3008x1786.png 1272w, https://substackcdn.com/image/fetch/$s_!QIDW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a808f9-e396-4995-b54c-2a0b01ef4004_3008x1786.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QIDW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a808f9-e396-4995-b54c-2a0b01ef4004_3008x1786.png" width="1456" height="864" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c0a808f9-e396-4995-b54c-2a0b01ef4004_3008x1786.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:864,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:493793,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://polymath707.substack.com/i/193616072?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a808f9-e396-4995-b54c-2a0b01ef4004_3008x1786.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QIDW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a808f9-e396-4995-b54c-2a0b01ef4004_3008x1786.png 424w, https://substackcdn.com/image/fetch/$s_!QIDW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a808f9-e396-4995-b54c-2a0b01ef4004_3008x1786.png 848w, https://substackcdn.com/image/fetch/$s_!QIDW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a808f9-e396-4995-b54c-2a0b01ef4004_3008x1786.png 1272w, https://substackcdn.com/image/fetch/$s_!QIDW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0a808f9-e396-4995-b54c-2a0b01ef4004_3008x1786.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: center;"><strong>Figure 3: 10T class Cost Breakdown &#8212; Training vs Synthetic Data</strong></p><h5>Practical alternatives to save on synthetic data costs</h5><blockquote><p>&#183; Self-hosted inference: Running a Llama-class model on owned GPUs costs ~$50-100M for 300T tokens, roughly 20-40&#215; cheaper than the Sonnet 4.6 API. (MiniMax and similar models would be good choices, as they have fewer active parameters and robust performance.)</p><p>&#183; Multi-epoch training: Train on the same 100T tokens 4&#215; with curriculum learning. This has zero data generation cost, though returns diminish after 2-3 epochs.</p><p>&#183; Rephrasing/augmentation: Take 100T internet tokens and rephrase them 3-4&#215; using a cheap model. 
This is much cheaper than generating from scratch.</p><p>&#183; Hybrid approach: 100T internet + 50T high-quality synthetic tokens, trained for 3 epochs, yields approximately 400T effective tokens at minimal generation cost.</p></blockquote><h4>13. What about post-training?</h4><p>This exercise focuses mostly on pre-training and synthetic data generation cost. Post-training (SFT + RL) can cost an additional 20% (<a href="https://epoch.ai/gradient-updates/what-went-into-training-deepseek-r1/#:~:text=While%20DeepSeek%20has%20not%20made,estimate%20it's%20around%20%241M.">Epoch AI&#8217;s analysis of DeepSeek R1</a>). xAI has reportedly spent almost as much compute on post-training for Grok 4 as on pre-training the model.</p><h1>Key Findings of this thought exercise</h1><p><strong>2T class is very achievable: </strong>191 NVL72 racks, $148M at BF16. NeoLabs can still manage it. The data requirement (80T) is within internet availability.</p><p><strong>4T class is feasible: </strong>762 racks, $593M at BF16. It needs ~60T synthetic tokens to reach a total of 160T tokens, which is manageable but starts getting very expensive.</p><p><strong>10T class is within reach with hyperscaler partners and a truckload of cash: </strong>4,763 racks, $3.7B at BF16. But 300T synthetic tokens add $2.3B (API) or ~$75M (self-hosted cheaper models).</p><p><strong>Synthetic data can cost as much as training: </strong>At API batch pricing, 300T tokens via Sonnet 4.6 cost $2.3B, roughly 60% of the $3.7B training cost for a 10T class model on B200 at BF16 precision. Self-hosted generation may become essential at this scale.</p><p><strong>B300 only wins at FP4: </strong>At BF16, it has identical throughput to B200 but is 20% more expensive. As I understand it, Nvidia positions B300 for inference rather than training. 
</p><p><strong>Data is a major bottleneck: </strong>Compute and GPUs are not the only constraint. The 400T token requirement for 10T models is probably the hardest one to satisfy!</p><h4>Disclaimer</h4><p>This is just a thought exercise to discuss the order-of-magnitude compute needed for the next generation of models. I have used publicly available formulas and performance numbers throughout. Thanks!</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://polymath707.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item></channel></rss>