<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[AI Foundry]]></title><description><![CDATA[A community of practitioners building an open-source, composable AI ecosystem. Our goal is to reduce the complexity of the AI industry. Join our thriving community and share, collaborate and innovate with us.]]></description><link>https://blog.aifoundry.org</link><image><url>https://substackcdn.com/image/fetch/$s_!OUFo!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8247b19d-fe58-4cbd-a2c0-83484ae20625_230x230.png</url><title>AI Foundry</title><link>https://blog.aifoundry.org</link></image><generator>Substack</generator><lastBuildDate>Mon, 18 May 2026 03:20:45 GMT</lastBuildDate><atom:link href="https://blog.aifoundry.org/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[AI Foundry]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[aifoundryorg@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[aifoundryorg@substack.com]]></itunes:email><itunes:name><![CDATA[AI Foundry]]></itunes:name></itunes:owner><itunes:author><![CDATA[AI Foundry]]></itunes:author><googleplay:owner><![CDATA[aifoundryorg@substack.com]]></googleplay:owner><googleplay:email><![CDATA[aifoundryorg@substack.com]]></googleplay:email><googleplay:author><![CDATA[AI Foundry]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[The Next Thousand Chips]]></title><description><![CDATA[aka AInekko's Manifesto]]></description><link>https://blog.aifoundry.org/p/the-next-thousand-chips</link><guid 
isPermaLink="false">https://blog.aifoundry.org/p/the-next-thousand-chips</guid><dc:creator><![CDATA[Tanya Dadasheva]]></dc:creator><pubDate>Wed, 06 May 2026 04:53:38 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/45da3d47-1edd-47db-8f9c-5a3885377253_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most emerging semiconductor startups are trying to be the Nvidia-killer. That means two- to three-year design cycles, chasing the newest and most expensive process nodes, big teams that barely fit in a group photo on the website, and billions of VC dollars.</p><p>This framing is already outdated.</p><p>It assumes that the problem is to outperform existing architectures under the same design constraints: similar toolchains, similar abstractions, similar iteration cycles. It assumes the bottleneck is architecture, ISA or pure execution.</p><p>The old chip design stack was built around human scarcity and walled gardens. RTL, compilers, EDA flows, licensed IP, NDAs, verification boundaries, and organizational handoffs all made sense when the central problem was helping humans divide a system too complex to hold in one&#8217;s mind.</p><p>Even with AI agents applied to chip design, everybody rushed to make the old stack faster: write verification faster, help with synthesis. All these efforts empower the same teams with new tools, but at the end of the day Conway&#8217;s law still means you &#8220;ship the [old] org chart&#8221;. The same happened in software at first, when copilots were effectively autocomplete on steroids, before agentic systems entered the scene.</p><p>The next frontier will be defined by how many architectures can be explored, tested, refactored, and specialized by small teams using AI as a true collaborator. But how do we open the chip design space to more people than just a handful of well-funded chip designers? 
What is missing?</p><p>The answer is not a better RTL generator.</p><div><hr></div><h2><strong>Agents change the search space</strong></h2><p>Agents do not need to respect the boundaries humans created between software and hardware. They don&#8217;t need PyTorch, CUDA, PTX, or similar abstraction layers.</p><p>If you can vibe-code directly in assembly, you don&#8217;t need a compiler.<br>If you can reliably vibe-code RTL, you don&#8217;t need the layers above it.<br>If the design process itself becomes programmable, even RTL starts to look like an unnecessary intermediate representation. We are not suggesting vibe-coding GDS just yet, but you get the idea.</p><p>Should you be asking AI to create a chip design from scratch? Not really &#8211; it is an infinite search space. What agents can do is reason about model structure, data movement, memory layout, compute topology, scheduling, microarchitecture, and RTL as coupled choices. This turns hardware design into an end-to-end optimization problem: given a workload and constraints, find the best implementation across the software-hardware boundary. 
It&#8217;s still the human&#8217;s job to choose the constraints and the optimization function, otherwise the search will not converge, and you&#8217;ll get a hallucinated result.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Gs9j!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb3bf30-1a26-427a-89a2-091cfdd79bbb_1550x964.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Gs9j!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb3bf30-1a26-427a-89a2-091cfdd79bbb_1550x964.png 424w, https://substackcdn.com/image/fetch/$s_!Gs9j!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb3bf30-1a26-427a-89a2-091cfdd79bbb_1550x964.png 848w, https://substackcdn.com/image/fetch/$s_!Gs9j!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb3bf30-1a26-427a-89a2-091cfdd79bbb_1550x964.png 1272w, https://substackcdn.com/image/fetch/$s_!Gs9j!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb3bf30-1a26-427a-89a2-091cfdd79bbb_1550x964.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Gs9j!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb3bf30-1a26-427a-89a2-091cfdd79bbb_1550x964.png" width="524" height="326.06043956043953" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7cb3bf30-1a26-427a-89a2-091cfdd79bbb_1550x964.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:906,&quot;width&quot;:1456,&quot;resizeWidth&quot;:524,&quot;bytes&quot;:2428786,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.aifoundry.org/i/196616048?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb3bf30-1a26-427a-89a2-091cfdd79bbb_1550x964.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Gs9j!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb3bf30-1a26-427a-89a2-091cfdd79bbb_1550x964.png 424w, https://substackcdn.com/image/fetch/$s_!Gs9j!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb3bf30-1a26-427a-89a2-091cfdd79bbb_1550x964.png 848w, https://substackcdn.com/image/fetch/$s_!Gs9j!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb3bf30-1a26-427a-89a2-091cfdd79bbb_1550x964.png 1272w, https://substackcdn.com/image/fetch/$s_!Gs9j!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb3bf30-1a26-427a-89a2-091cfdd79bbb_1550x964.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Human-designed rocket engine nozzle vs. AI-generated</figcaption></figure></div><div><hr></div><h2><strong>Collapsing the stack</strong></h2><p>Intent, optimization function and constraints &#8211; how do we start?</p><p>Luckily, we are not the first to ask these questions.</p><p>One approach that works well is narrowing the goal. And the most prized goal right now is running AI models at the top speed ASICs are capable of. Models are defined by compute graphs, and those are not Turing-complete programs. If you fix the model, you can take the model graph and lower it directly into hardware implementations, bypassing traditional layers &#8211; etch the model into the chip. 
That&#8217;s what <a href="https://www.eetimes.com/taalas-specializes-to-extremes-for-extraordinary-token-speed/">Taalas</a> did by compiling models directly into Verilog, showing a 50x improvement vs. Nvidia.</p><p>This approach took away the notion that the abstraction stack is sacred.</p><p>But it only makes sense economically if the one model you lower is extremely popular and constitutes a big market segment. For fast-evolving markets or proprietary models, it doesn&#8217;t work that well.</p><p>If you want some level of reusability of your ASIC, the constraint space is not as tightly defined, so you need to introduce some other guardrails to make the search reasonable. For example, give agents a structure, or a substrate to map models to. In an interesting experiment at the University of Toronto (<a href="https://v2.talos.wtf/">Talos V2</a>), a team of researchers and agents started with the transformer model structure rather than a single model and ended up discovering a need for a substrate to map it to &#8211; some kind of reusable tile. Spoiler alert: they chose systolic arrays. More flexible, yet not an architecture from scratch.</p><p>With the explosion of Physical AI use cases, the requirement for a diverse collection of chips is back in the game, too. It&#8217;s not just one homogeneous data center environment we are talking about: size, power, latency and other constraints all vary, and new models and model architectures create pressure that old-school chip vendors can&#8217;t sustain.</p><p>Now another question: what if we could give agents a real, flexible substrate that can be morphed into many things?</p><div><hr></div><h2><strong>Open Substrate</strong></h2><p>The substrate has to be real. It must come from hardware that has survived contact with implementation and usage, not just from academic exercises that have never been taped out.</p><p>The substrate has to be open. 
No IP licenses, no NDAs, no worries about foundational LLMs CLAs - agents are welcomed.</p><p>The substrate has to have tests, simulation, co-simulation, synthesis paths, examples, documentation, and clear contribution flows.</p><p>Luckily now there is a substrate like that.</p><p>And it lives under the name <a href="https://github.com/openhwgroup/core-et">CORE-ET</a> in the OpenHW Group of the Eclipse Foundation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Hrbn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F265cc0fc-503a-4a75-b25c-daf22a27fdc9_900x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Hrbn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F265cc0fc-503a-4a75-b25c-daf22a27fdc9_900x900.png 424w, https://substackcdn.com/image/fetch/$s_!Hrbn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F265cc0fc-503a-4a75-b25c-daf22a27fdc9_900x900.png 848w, https://substackcdn.com/image/fetch/$s_!Hrbn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F265cc0fc-503a-4a75-b25c-daf22a27fdc9_900x900.png 1272w, https://substackcdn.com/image/fetch/$s_!Hrbn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F265cc0fc-503a-4a75-b25c-daf22a27fdc9_900x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Hrbn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F265cc0fc-503a-4a75-b25c-daf22a27fdc9_900x900.png" 
width="342" height="342" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/265cc0fc-503a-4a75-b25c-daf22a27fdc9_900x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:900,&quot;width&quot;:900,&quot;resizeWidth&quot;:342,&quot;bytes&quot;:265235,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.aifoundry.org/i/196616048?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F265cc0fc-503a-4a75-b25c-daf22a27fdc9_900x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Hrbn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F265cc0fc-503a-4a75-b25c-daf22a27fdc9_900x900.png 424w, https://substackcdn.com/image/fetch/$s_!Hrbn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F265cc0fc-503a-4a75-b25c-daf22a27fdc9_900x900.png 848w, https://substackcdn.com/image/fetch/$s_!Hrbn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F265cc0fc-503a-4a75-b25c-daf22a27fdc9_900x900.png 1272w, https://substackcdn.com/image/fetch/$s_!Hrbn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F265cc0fc-503a-4a75-b25c-daf22a27fdc9_900x900.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>CORE-ET is not a white paper about future chip design. The Erbium branch is the exact design that we have taped out and expect back from TSMC in a few months. It&#8217;s also a piece of a bigger design that was taped out <a href="https://www.esperanto.ai/wp-content/uploads/2022/05/Dave-IEEE-Micro.pdf">5 years ago</a> and still stands as a useful and very energy-efficient chip sporting 1088 Minion cores.</p><p>How sophisticated are the Minion cores? Here comes the geeky part. Minion is a tightly optimized, dual-threaded RISC-V core: small enough to replicate aggressively, predictable enough to compose into larger fabrics, and flexible enough to sit at the center of many different architectures. 
It has an optional vector/graphics coprocessor for dense data-parallel work.</p><p>But the real unit of design is not the core. Eight Minions are packaged together into an atomic compute tile: efficient, coordinated, and shaped for systolic-array-like fabrics. If you want a hint about the design philosophy, remember that the T in ET stands for Transputer.</p><p>That is what makes CORE-ET interesting as a substrate. It is a reusable unit of compute: compact enough for experimentation, powerful enough to matter, and structured enough for agents to reason about.</p><p>The gory microarchitectural details are <a href="https://github.com/openhwgroup/core-et/tree/erbium/docs">here</a>.</p><p>But! It&#8217;s not just RTL: it comes with real chips, a programmer&#8217;s manual, a simulator, a runtime, optimized kernels, even a bare-metal Go implementation. It can exist in the old-school paradigm, with one tile delivering roughly 1 TOPS at 0.5 W.</p><p>But for us this RTL is not the final product. It is the source material. It contains real implementation knowledge: the decisions, scars, constraints, and structures that only exist when a design has been pushed toward silicon.</p><p>And the things that we were able to do with it (with the help of agents) are the beginning of a bright future where chip design gets unlocked for millions of developers.</p><div><hr></div><h2><strong>New Economics</strong></h2><p>Once design is grounded in a composable substrate, the entire cost structure shifts.</p><p>Iteration becomes cheap.<br>Exploration becomes parallelizable.<br>Specialization becomes viable at a much smaller scale.</p><p>The limiting factor is no longer the cost of committing to a design, but the ability to search through possibilities efficiently.</p><p>Even better, the composability finally gives a boost to chiplet design. Tape out building blocks. Recompose them for different use cases. Increase the agentic search space to multiple composed blocks. 
Get custom chips without custom tape outs.</p><p>The difference is not incremental. It is structural.</p><p>It&#8217;s the difference that can serve the demand for a tape out every 9 months, even if you are not Tesla.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vgLy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F695c2cdb-cdc3-4f78-bd3f-8890b93d212d_1194x692.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vgLy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F695c2cdb-cdc3-4f78-bd3f-8890b93d212d_1194x692.png 424w, https://substackcdn.com/image/fetch/$s_!vgLy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F695c2cdb-cdc3-4f78-bd3f-8890b93d212d_1194x692.png 848w, https://substackcdn.com/image/fetch/$s_!vgLy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F695c2cdb-cdc3-4f78-bd3f-8890b93d212d_1194x692.png 1272w, https://substackcdn.com/image/fetch/$s_!vgLy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F695c2cdb-cdc3-4f78-bd3f-8890b93d212d_1194x692.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vgLy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F695c2cdb-cdc3-4f78-bd3f-8890b93d212d_1194x692.png" width="438" height="253.84924623115577" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/695c2cdb-cdc3-4f78-bd3f-8890b93d212d_1194x692.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:692,&quot;width&quot;:1194,&quot;resizeWidth&quot;:438,&quot;bytes&quot;:150421,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.aifoundry.org/i/196616048?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F695c2cdb-cdc3-4f78-bd3f-8890b93d212d_1194x692.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vgLy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F695c2cdb-cdc3-4f78-bd3f-8890b93d212d_1194x692.png 424w, https://substackcdn.com/image/fetch/$s_!vgLy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F695c2cdb-cdc3-4f78-bd3f-8890b93d212d_1194x692.png 848w, https://substackcdn.com/image/fetch/$s_!vgLy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F695c2cdb-cdc3-4f78-bd3f-8890b93d212d_1194x692.png 1272w, https://substackcdn.com/image/fetch/$s_!vgLy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F695c2cdb-cdc3-4f78-bd3f-8890b93d212d_1194x692.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h2><strong>What&#8217;s next</strong></h2><p>CORE-ET is not the only thing we have. We have a treasure trove of IP developed over the course of 10 years by a talented team of engineers in the old paradigm. Like any other chip company on a mission to fight Nvidia, they developed a lot of things (even the RBOX!), and we aim to keep open-sourcing them with the same purpose: getting more substrate for agents.</p><p>Luckily, the substrate is so flexible that with agentic help you can turn it into almost anything &#8211; from a systolic array architecture to Turing-complete coherent blocks, from 8 cores to 4000+ cores. We even have big out-of-order cores that are not so easy for agents to design right now, but maybe with the right substrate?</p><p>We are also open source people, so we couldn&#8217;t shy away from open tools. 
And guess what? With small enough designs you can not only put them in FPGAs by tool-calling Yosys in a matter of hours, but also give the open source SkyWater PDK to the agents, go through place and route with OpenROAD, and get yourself a Tiny Tapeout. Not just one chip per student anymore &#8211; one chip per Claude session!</p><p>Memory is next. It&#8217;s already emerging as the next bottleneck, as both Taalas and Talos discovered. How to store the weights throughout the system, how they move, how they are reused, and whether they should move at all is defining the next phase of substrate design.</p><p>That deserves its own post.</p><p>For now, the point is simple: the old race was to build one giant architecture to serve them all.</p><p>The new race is to make the next thousand chips.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://discord.gg/yAVjKxKy&quot;,&quot;text&quot;:&quot;Join discussions on  Discord&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://discord.gg/yAVjKxKy"><span>Join discussions on  Discord</span></a></p>]]></content:encoded></item><item><title><![CDATA[Investigating the ET-SoC-1 NoC]]></title><description><![CDATA[Because why ask your RTL engineers when you can have some fun]]></description><link>https://blog.aifoundry.org/p/investigating-the-et-soc-1-noc</link><guid isPermaLink="false">https://blog.aifoundry.org/p/investigating-the-et-soc-1-noc</guid><dc:creator><![CDATA[Martin Chang]]></dc:creator><pubDate>Thu, 30 Apr 2026 08:44:04 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!hgfC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918c2708-2535-44eb-adfd-e573e491fce2_917x687.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>Investigating the ET-SoC-1 NoC</h1><p>Ainekko purchased Esperanto&#8217;s IP and is developing in public. The chip was and still is a good idea; we believe that there&#8217;s a place for an open computing platform based on said IP, providing capabilities beyond what GPUs can offer architecturally. ET-SoC-1 is the beginning. We are taping out our test chip. Until the package from TSMC shows up, ET-SoC-1 is what I&#8217;m working on.</p><p>That&#8217;s my elevator pitch as an engineer working at Nekko.</p><p>As Nekko develops in public, I can share almost everything we do. Ironically, this is the best devrel possible. Show, not only tell. I am partially done with basic kernel work in our LLM inference stack and am starting to look at the processor as a proper systolic array to see what is possible. Looking at Esperanto&#8217;s code. 
There&#8217;s little use of the fact that the processor is a grid of cores. Either Esperanto figured out that it is not worth the effort or they are unable to make use of that fact. I don&#8217;t know which. And I care about my performance running FlashAttention and matrix multiplication on llama.cpp. Yet efforts continuously end up with &#8220;the bandwidth doesn&#8217;t make sense and I am optimizing blind&#8221;. To understand what we are working with, this is the rough NoC layout:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hgfC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918c2708-2535-44eb-adfd-e573e491fce2_917x687.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hgfC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918c2708-2535-44eb-adfd-e573e491fce2_917x687.webp 424w, https://substackcdn.com/image/fetch/$s_!hgfC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918c2708-2535-44eb-adfd-e573e491fce2_917x687.webp 848w, https://substackcdn.com/image/fetch/$s_!hgfC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918c2708-2535-44eb-adfd-e573e491fce2_917x687.webp 1272w, https://substackcdn.com/image/fetch/$s_!hgfC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918c2708-2535-44eb-adfd-e573e491fce2_917x687.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!hgfC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918c2708-2535-44eb-adfd-e573e491fce2_917x687.webp" width="917" height="687" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/918c2708-2535-44eb-adfd-e573e491fce2_917x687.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:687,&quot;width&quot;:917,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;The pre-binned NoC level view of the ET-SoC1. Shire = Compute Shire, DRAM = Memory Shire, PCIe = PCIe Shire, IO = IO Shire&quot;,&quot;title&quot;:&quot;Image: The pre-binned NoC level view of the ET-SoC1. Shire = Compute Shire, DRAM = Memory Shire, PCIe = PCIe Shire, IO = IO Shire&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The pre-binned NoC level view of the ET-SoC1. Shire = Compute Shire, DRAM = Memory Shire, PCIe = PCIe Shire, IO = IO Shire" title="Image: The pre-binned NoC level view of the ET-SoC1. 
Shire = Compute Shire, DRAM = Memory Shire, PCIe = PCIe Shire, IO = IO Shire" srcset="https://substackcdn.com/image/fetch/$s_!hgfC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918c2708-2535-44eb-adfd-e573e491fce2_917x687.webp 424w, https://substackcdn.com/image/fetch/$s_!hgfC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918c2708-2535-44eb-adfd-e573e491fce2_917x687.webp 848w, https://substackcdn.com/image/fetch/$s_!hgfC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918c2708-2535-44eb-adfd-e573e491fce2_917x687.webp 1272w, https://substackcdn.com/image/fetch/$s_!hgfC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F918c2708-2535-44eb-adfd-e573e491fce2_917x687.webp 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Image: The pre-binned NoC level view of the ET-SoC1. Shire = Compute Shire, DRAM = Memory Shire, PCIe = PCIe Shire, IO = IO Shire</figcaption></figure></div><p>And so experiments we do. First, what&#8217;s the bandwidth across shires? To be specific, what if we transfer a large-ish chunk (960B, to avoid aliasing in L2) across pairs of shires via TensorLoadL2SCP (think: hardware DMA against another shire&#8217;s scratchpad)?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UfYt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa393b1f8-7f02-4fd8-887f-49a61890c9e7_1296x1072.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UfYt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa393b1f8-7f02-4fd8-887f-49a61890c9e7_1296x1072.png 424w, https://substackcdn.com/image/fetch/$s_!UfYt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa393b1f8-7f02-4fd8-887f-49a61890c9e7_1296x1072.png 848w, https://substackcdn.com/image/fetch/$s_!UfYt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa393b1f8-7f02-4fd8-887f-49a61890c9e7_1296x1072.png 1272w, 
https://substackcdn.com/image/fetch/$s_!UfYt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa393b1f8-7f02-4fd8-887f-49a61890c9e7_1296x1072.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UfYt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa393b1f8-7f02-4fd8-887f-49a61890c9e7_1296x1072.png" width="1296" height="1072" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a393b1f8-7f02-4fd8-887f-49a61890c9e7_1296x1072.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1072,&quot;width&quot;:1296,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Shire Pair Bandwidth&quot;,&quot;title&quot;:&quot;Image: Shire Pair Bandwidth&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Shire Pair Bandwidth" title="Image: Shire Pair Bandwidth" srcset="https://substackcdn.com/image/fetch/$s_!UfYt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa393b1f8-7f02-4fd8-887f-49a61890c9e7_1296x1072.png 424w, https://substackcdn.com/image/fetch/$s_!UfYt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa393b1f8-7f02-4fd8-887f-49a61890c9e7_1296x1072.png 848w, https://substackcdn.com/image/fetch/$s_!UfYt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa393b1f8-7f02-4fd8-887f-49a61890c9e7_1296x1072.png 1272w, 
https://substackcdn.com/image/fetch/$s_!UfYt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa393b1f8-7f02-4fd8-887f-49a61890c9e7_1296x1072.png 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">Image: Shire Pair Bandwidth</figcaption></figure></div><p>Even though some shires may be disabled for yield reasons, the internal logical layout seems to be clean. The measurements show a clear-ish 8x4 pattern in bandwidth. 
This matches the <a href="https://github.com/aifoundry-org/et-man/blob/5fe80a34e9e1b799d0378968c5f76352b4ba7e55/ET%20Preliminary%20Datasheet%20Rev%201.0.pdf">SoC-1 datasheet&#8217;s</a> core layout, with shire id incrementing in the y direction. Bandwidth drops with each additional hop, but I assume that&#8217;s more latency than throughput. And the NoC is symmetric: there is no difference in bandwidth between sending and receiving.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_u59!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7beaf82-be3d-4fee-bddb-d1af063daa32_894x657.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_u59!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7beaf82-be3d-4fee-bddb-d1af063daa32_894x657.webp 424w, https://substackcdn.com/image/fetch/$s_!_u59!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7beaf82-be3d-4fee-bddb-d1af063daa32_894x657.webp 848w, https://substackcdn.com/image/fetch/$s_!_u59!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7beaf82-be3d-4fee-bddb-d1af063daa32_894x657.webp 1272w, https://substackcdn.com/image/fetch/$s_!_u59!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7beaf82-be3d-4fee-bddb-d1af063daa32_894x657.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!_u59!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7beaf82-be3d-4fee-bddb-d1af063daa32_894x657.webp" width="894" height="657" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a7beaf82-be3d-4fee-bddb-d1af063daa32_894x657.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:657,&quot;width&quot;:894,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;ET-SoC-1 Block Diagram with Shire ID Incrementing Direction&quot;,&quot;title&quot;:&quot;Image: ET-SoC-1 Block Diagram with Shire ID Incrementing Direction&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="ET-SoC-1 Block Diagram with Shire ID Incrementing Direction" title="Image: ET-SoC-1 Block Diagram with Shire ID Incrementing Direction" srcset="https://substackcdn.com/image/fetch/$s_!_u59!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7beaf82-be3d-4fee-bddb-d1af063daa32_894x657.webp 424w, https://substackcdn.com/image/fetch/$s_!_u59!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7beaf82-be3d-4fee-bddb-d1af063daa32_894x657.webp 848w, https://substackcdn.com/image/fetch/$s_!_u59!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7beaf82-be3d-4fee-bddb-d1af063daa32_894x657.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!_u59!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7beaf82-be3d-4fee-bddb-d1af063daa32_894x657.webp 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">Image: ET-SoC-1 Block Diagram with Shire ID Incrementing Direction</figcaption></figure></div><p>Question: does the shire id map to the physical location? ... We see bands, so they have to correspond to NoC hops. But the order is scrambled; it is not the simple pattern of physical location = shire id. 
Running simulated annealing, we can figure out the topology:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GEaY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70cef565-a675-4223-b61c-11757c920fa6_1856x1032.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GEaY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70cef565-a675-4223-b61c-11757c920fa6_1856x1032.webp 424w, https://substackcdn.com/image/fetch/$s_!GEaY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70cef565-a675-4223-b61c-11757c920fa6_1856x1032.webp 848w, https://substackcdn.com/image/fetch/$s_!GEaY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70cef565-a675-4223-b61c-11757c920fa6_1856x1032.webp 1272w, https://substackcdn.com/image/fetch/$s_!GEaY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70cef565-a675-4223-b61c-11757c920fa6_1856x1032.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GEaY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70cef565-a675-4223-b61c-11757c920fa6_1856x1032.webp" width="1456" height="810" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/70cef565-a675-4223-b61c-11757c920fa6_1856x1032.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:810,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;ET-SoC-1 Inferred Physical Topology&quot;,&quot;title&quot;:&quot;Image: ET-SoC-1 Inferred Physical Topology&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="ET-SoC-1 Inferred Physical Topology" title="Image: ET-SoC-1 Inferred Physical Topology" srcset="https://substackcdn.com/image/fetch/$s_!GEaY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70cef565-a675-4223-b61c-11757c920fa6_1856x1032.webp 424w, https://substackcdn.com/image/fetch/$s_!GEaY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70cef565-a675-4223-b61c-11757c920fa6_1856x1032.webp 848w, https://substackcdn.com/image/fetch/$s_!GEaY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70cef565-a675-4223-b61c-11757c920fa6_1856x1032.webp 1272w, https://substackcdn.com/image/fetch/$s_!GEaY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70cef565-a675-4223-b61c-11757c920fa6_1856x1032.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Image: ET-SoC-1 Inferred Physical Topology</figcaption></figure></div><p>The next natural question: How does congestion work on this NoC? Does it work like some chips where each link between nodes can congest? Is the NoC half or full duplex? 
Does the NoC have directionality?</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BfHi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3b58f84-0277-47aa-8f82-62b8f64e5aaf_850x216.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BfHi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3b58f84-0277-47aa-8f82-62b8f64e5aaf_850x216.png 424w, https://substackcdn.com/image/fetch/$s_!BfHi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3b58f84-0277-47aa-8f82-62b8f64e5aaf_850x216.png 848w, https://substackcdn.com/image/fetch/$s_!BfHi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3b58f84-0277-47aa-8f82-62b8f64e5aaf_850x216.png 1272w, https://substackcdn.com/image/fetch/$s_!BfHi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3b58f84-0277-47aa-8f82-62b8f64e5aaf_850x216.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BfHi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3b58f84-0277-47aa-8f82-62b8f64e5aaf_850x216.png" width="850" height="216" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d3b58f84-0277-47aa-8f82-62b8f64e5aaf_850x216.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:216,&quot;width&quot;:850,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14576,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://marty1885.substack.com/i/195846965?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3b58f84-0277-47aa-8f82-62b8f64e5aaf_850x216.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!BfHi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3b58f84-0277-47aa-8f82-62b8f64e5aaf_850x216.png 424w, https://substackcdn.com/image/fetch/$s_!BfHi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3b58f84-0277-47aa-8f82-62b8f64e5aaf_850x216.png 848w, https://substackcdn.com/image/fetch/$s_!BfHi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3b58f84-0277-47aa-8f82-62b8f64e5aaf_850x216.png 1272w, https://substackcdn.com/image/fetch/$s_!BfHi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3b58f84-0277-47aa-8f82-62b8f64e5aaf_850x216.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Measurement results say the NoC is bidirectional and/or the bandwidth is so high that the amount of traffic can be sustained without congestion. 
I cannot tell whether the NoC is sized to match the rest of the chip (maybe.. but a questionable choice, as that means most workloads will underutilize the bandwidth and waste chip power and area) or whether the pattern is too trivial and Esperanto designed the NoC to handle this specific case. I can&#8217;t tell from the numerical results. Some further pattern testing shows the same thing:</p><p>Scenarios:</p><ul><li><p>disjoint_adj_8: 8 independent flows (0-&gt;1, 2-&gt;3, 4-&gt;5, 6-&gt;7, 8-&gt;9, 10-&gt;11, 12-&gt;13, 14-&gt;15) on disjoint shire pairs. Does aggregate bandwidth scale linearly?</p></li><li><p>disjoint_adj_16: Same, but scaled to all 32 shires.</p></li><li><p>parallel_cols_4: 4 long flows (0-&gt;16, 1-&gt;17, 2-&gt;18, 3-&gt;19) down parallel columns. How does that work?</p></li><li><p>row_crossing_4: 4 flows (0-&gt;7, 1-&gt;6, 2-&gt;5, 3-&gt;4) in the same row that overlap. Does that congest?</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Vuv4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321f0585-daba-4ef6-b0ef-b8e1fb46d624_853x313.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Vuv4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321f0585-daba-4ef6-b0ef-b8e1fb46d624_853x313.png 424w, https://substackcdn.com/image/fetch/$s_!Vuv4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321f0585-daba-4ef6-b0ef-b8e1fb46d624_853x313.png 848w, 
https://substackcdn.com/image/fetch/$s_!Vuv4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321f0585-daba-4ef6-b0ef-b8e1fb46d624_853x313.png 1272w, https://substackcdn.com/image/fetch/$s_!Vuv4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321f0585-daba-4ef6-b0ef-b8e1fb46d624_853x313.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Vuv4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321f0585-daba-4ef6-b0ef-b8e1fb46d624_853x313.png" width="853" height="313" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/321f0585-daba-4ef6-b0ef-b8e1fb46d624_853x313.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:313,&quot;width&quot;:853,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19328,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://marty1885.substack.com/i/195846965?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321f0585-daba-4ef6-b0ef-b8e1fb46d624_853x313.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Vuv4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321f0585-daba-4ef6-b0ef-b8e1fb46d624_853x313.png 424w, 
https://substackcdn.com/image/fetch/$s_!Vuv4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321f0585-daba-4ef6-b0ef-b8e1fb46d624_853x313.png 848w, https://substackcdn.com/image/fetch/$s_!Vuv4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321f0585-daba-4ef6-b0ef-b8e1fb46d624_853x313.png 1272w, https://substackcdn.com/image/fetch/$s_!Vuv4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F321f0585-daba-4ef6-b0ef-b8e1fb46d624_853x313.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Apparently not. No congestion at all. The only bottleneck I can find is read bandwidth at the source: as more and more shires try to read from the same source, bandwidth plateaus. But again, I cannot tell whether this is a read port limitation or the L2 itself only supporting that much read bandwidth. And it&#8217;s kinda suspicious that the measured max L2SCP bandwidth is around the claimed DRAM bandwidth on the card. Coincidence? Your guess is as good as mine.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mfQb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ca95c6-dffd-403a-bacd-cf4dda0f53a3_1916x1372.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mfQb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ca95c6-dffd-403a-bacd-cf4dda0f53a3_1916x1372.png 424w, https://substackcdn.com/image/fetch/$s_!mfQb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ca95c6-dffd-403a-bacd-cf4dda0f53a3_1916x1372.png 848w, https://substackcdn.com/image/fetch/$s_!mfQb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ca95c6-dffd-403a-bacd-cf4dda0f53a3_1916x1372.png 1272w, https://substackcdn.com/image/fetch/$s_!mfQb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ca95c6-dffd-403a-bacd-cf4dda0f53a3_1916x1372.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!mfQb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ca95c6-dffd-403a-bacd-cf4dda0f53a3_1916x1372.png" width="1456" height="1043" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/70ca95c6-dffd-403a-bacd-cf4dda0f53a3_1916x1372.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1043,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Effective read bandwidth from L2SCP vs. number of readers&quot;,&quot;title&quot;:&quot;Image: Effective read bandwidth from L2SCP vs. number of readers&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Effective read bandwidth from L2SCP vs. number of readers" title="Image: Effective read bandwidth from L2SCP vs. 
number of readers" srcset="https://substackcdn.com/image/fetch/$s_!mfQb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ca95c6-dffd-403a-bacd-cf4dda0f53a3_1916x1372.png 424w, https://substackcdn.com/image/fetch/$s_!mfQb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ca95c6-dffd-403a-bacd-cf4dda0f53a3_1916x1372.png 848w, https://substackcdn.com/image/fetch/$s_!mfQb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ca95c6-dffd-403a-bacd-cf4dda0f53a3_1916x1372.png 1272w, https://substackcdn.com/image/fetch/$s_!mfQb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70ca95c6-dffd-403a-bacd-cf4dda0f53a3_1916x1372.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image: Effective read bandwidth from L2SCP vs. number of readers</figcaption></figure></div><p>If I target Shire 14 as the source (the center of the chip), aggregate bandwidth from L2SCP to all shires should be higher, right? Also no. It actually achieves a lower peak, and the aggregate bandwidth drops off at the end. NoC packet congestion...?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Wbbb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36980f8a-6624-4458-9739-74560693b366_1916x1372.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Wbbb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36980f8a-6624-4458-9739-74560693b366_1916x1372.png 424w, https://substackcdn.com/image/fetch/$s_!Wbbb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36980f8a-6624-4458-9739-74560693b366_1916x1372.png 848w, https://substackcdn.com/image/fetch/$s_!Wbbb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36980f8a-6624-4458-9739-74560693b366_1916x1372.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Wbbb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36980f8a-6624-4458-9739-74560693b366_1916x1372.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Wbbb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36980f8a-6624-4458-9739-74560693b366_1916x1372.png" width="1456" height="1043" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/36980f8a-6624-4458-9739-74560693b366_1916x1372.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1043,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Effective read bandwidth from L2SCP vs. number of readers (center shire)&quot;,&quot;title&quot;:&quot;Image: Effective read bandwidth from L2SCP vs. number of readers (center shire)&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Effective read bandwidth from L2SCP vs. number of readers (center shire)" title="Image: Effective read bandwidth from L2SCP vs. 
number of readers (center shire)" srcset="https://substackcdn.com/image/fetch/$s_!Wbbb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36980f8a-6624-4458-9739-74560693b366_1916x1372.png 424w, https://substackcdn.com/image/fetch/$s_!Wbbb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36980f8a-6624-4458-9739-74560693b366_1916x1372.png 848w, https://substackcdn.com/image/fetch/$s_!Wbbb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36980f8a-6624-4458-9739-74560693b366_1916x1372.png 1272w, https://substackcdn.com/image/fetch/$s_!Wbbb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36980f8a-6624-4458-9739-74560693b366_1916x1372.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image: Effective read bandwidth from L2SCP vs. number of readers (center shire)</figcaption></figure></div><p>I am as baffled as you are. New question: How does the DRAM work? Does it follow the same pattern? Let&#8217;s just read from the DRAM, both from a single shire and from all shires.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rqgX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d4cce69-a89a-4db5-8000-6873fd16be77_1916x1072.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rqgX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d4cce69-a89a-4db5-8000-6873fd16be77_1916x1072.png 424w, https://substackcdn.com/image/fetch/$s_!rqgX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d4cce69-a89a-4db5-8000-6873fd16be77_1916x1072.png 848w, https://substackcdn.com/image/fetch/$s_!rqgX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d4cce69-a89a-4db5-8000-6873fd16be77_1916x1072.png 1272w, 
https://substackcdn.com/image/fetch/$s_!rqgX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d4cce69-a89a-4db5-8000-6873fd16be77_1916x1072.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rqgX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d4cce69-a89a-4db5-8000-6873fd16be77_1916x1072.png" width="1456" height="815" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1d4cce69-a89a-4db5-8000-6873fd16be77_1916x1072.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:815,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;DRAM read performance vs. shire&quot;,&quot;title&quot;:&quot;Image: DRAM read performance vs. shire&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="DRAM read performance vs. shire" title="Image: DRAM read performance vs. 
shire" srcset="https://substackcdn.com/image/fetch/$s_!rqgX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d4cce69-a89a-4db5-8000-6873fd16be77_1916x1072.png 424w, https://substackcdn.com/image/fetch/$s_!rqgX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d4cce69-a89a-4db5-8000-6873fd16be77_1916x1072.png 848w, https://substackcdn.com/image/fetch/$s_!rqgX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d4cce69-a89a-4db5-8000-6873fd16be77_1916x1072.png 1272w, https://substackcdn.com/image/fetch/$s_!rqgX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d4cce69-a89a-4db5-8000-6873fd16be77_1916x1072.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image: DRAM read performance vs. shire</figcaption></figure></div><p>It is surprisingly even. I would expect more performance variance as shires get further and further away from the DRAM, wherever the DRAM is. But again, the NoC is faster than the DRAM bandwidth (thank god I am not working on a NoC-congested chip again). The block diagram in the datasheet also tells us nothing about where the DRAM is on the NoC. The best guess I have, given the memory read pattern:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!l8JP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd75130-51ff-4c09-adaa-d9460a63d3b5_1796x1072.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!l8JP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd75130-51ff-4c09-adaa-d9460a63d3b5_1796x1072.png 424w, https://substackcdn.com/image/fetch/$s_!l8JP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd75130-51ff-4c09-adaa-d9460a63d3b5_1796x1072.png 848w, https://substackcdn.com/image/fetch/$s_!l8JP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd75130-51ff-4c09-adaa-d9460a63d3b5_1796x1072.png 1272w, 
https://substackcdn.com/image/fetch/$s_!l8JP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd75130-51ff-4c09-adaa-d9460a63d3b5_1796x1072.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!l8JP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd75130-51ff-4c09-adaa-d9460a63d3b5_1796x1072.png" width="1456" height="869" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cbd75130-51ff-4c09-adaa-d9460a63d3b5_1796x1072.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:869,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;The inferred Memshire location based on the DRAM read pattern&quot;,&quot;title&quot;:&quot;Image: The inferred Memshire location based on the DRAM read pattern&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="The inferred Memshire location based on the DRAM read pattern" title="Image: The inferred Memshire location based on the DRAM read pattern" srcset="https://substackcdn.com/image/fetch/$s_!l8JP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd75130-51ff-4c09-adaa-d9460a63d3b5_1796x1072.png 424w, https://substackcdn.com/image/fetch/$s_!l8JP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd75130-51ff-4c09-adaa-d9460a63d3b5_1796x1072.png 848w, 
https://substackcdn.com/image/fetch/$s_!l8JP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd75130-51ff-4c09-adaa-d9460a63d3b5_1796x1072.png 1272w, https://substackcdn.com/image/fetch/$s_!l8JP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcbd75130-51ff-4c09-adaa-d9460a63d3b5_1796x1072.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image: The inferred Memshire location based on the DRAM read pattern</figcaption></figure></div><p>Also the 
Programmer&#8217;s reference manual is unclear on how L3 is used, especially in the context of the <code>TensorLoadL2Scp</code> instruction.</p><blockquote><p>The TensorLoadL2Scp instruction copies a tensor from memory, bypassing the L1 data cache and the L2 cache, to the L2 scratchpad.</p></blockquote><p>Some benchmarking shows that performance drops once the read set is larger than ~32MB, which coincides with the default L3 configuration (1MB per shire, 32MB total).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!w_Rh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6775d14-2a25-42a3-9dae-fa799d7cf882_1796x1072.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!w_Rh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6775d14-2a25-42a3-9dae-fa799d7cf882_1796x1072.png 424w, https://substackcdn.com/image/fetch/$s_!w_Rh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6775d14-2a25-42a3-9dae-fa799d7cf882_1796x1072.png 848w, https://substackcdn.com/image/fetch/$s_!w_Rh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6775d14-2a25-42a3-9dae-fa799d7cf882_1796x1072.png 1272w, https://substackcdn.com/image/fetch/$s_!w_Rh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6775d14-2a25-42a3-9dae-fa799d7cf882_1796x1072.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!w_Rh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6775d14-2a25-42a3-9dae-fa799d7cf882_1796x1072.png" width="1456" height="869" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b6775d14-2a25-42a3-9dae-fa799d7cf882_1796x1072.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:869,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Plot of disjoint vs shared memory region performance in TensorLoadL2Scp&quot;,&quot;title&quot;:&quot;Image: Plot of disjoint vs shared memory region performance in TensorLoadL2Scp&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Plot of disjoint vs shared memory region performance in TensorLoadL2Scp" title="Image: Plot of disjoint vs shared memory region performance in TensorLoadL2Scp" srcset="https://substackcdn.com/image/fetch/$s_!w_Rh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6775d14-2a25-42a3-9dae-fa799d7cf882_1796x1072.png 424w, https://substackcdn.com/image/fetch/$s_!w_Rh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6775d14-2a25-42a3-9dae-fa799d7cf882_1796x1072.png 848w, https://substackcdn.com/image/fetch/$s_!w_Rh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6775d14-2a25-42a3-9dae-fa799d7cf882_1796x1072.png 1272w, 
https://substackcdn.com/image/fetch/$s_!w_Rh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6775d14-2a25-42a3-9dae-fa799d7cf882_1796x1072.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image: Plot of disjoint vs shared memory region performance in TensorLoadL2Scp</figcaption></figure></div><p>We kinda know where the Memshire lives and can observe its effect on bandwidth, and we have a problem getting the full 120GB/s during real workloads. 
Can we do the same trick as some other grid-like AI accelerator companies and make each shire send requests to one specific Memshire, so we can optimize the memory access pattern and avoid crosstalk?</p><p>Not at large intervals, which is expected, because that&#8217;d be a terrible design.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RZOO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9adcac0-b392-4fad-93d4-c109adf77bdd_1898x1076.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RZOO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9adcac0-b392-4fad-93d4-c109adf77bdd_1898x1076.png 424w, https://substackcdn.com/image/fetch/$s_!RZOO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9adcac0-b392-4fad-93d4-c109adf77bdd_1898x1076.png 848w, https://substackcdn.com/image/fetch/$s_!RZOO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9adcac0-b392-4fad-93d4-c109adf77bdd_1898x1076.png 1272w, https://substackcdn.com/image/fetch/$s_!RZOO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9adcac0-b392-4fad-93d4-c109adf77bdd_1898x1076.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RZOO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9adcac0-b392-4fad-93d4-c109adf77bdd_1898x1076.png" width="1456" height="825" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b9adcac0-b392-4fad-93d4-c109adf77bdd_1898x1076.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:825,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Plot of DRAM bandwidth vs. address sweep at unit of 32MB&quot;,&quot;title&quot;:&quot;Image: Plot of DRAM bandwidth vs. address sweep at unit of 32MB&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Plot of DRAM bandwidth vs. address sweep at unit of 32MB" title="Image: Plot of DRAM bandwidth vs. address sweep at unit of 32MB" srcset="https://substackcdn.com/image/fetch/$s_!RZOO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9adcac0-b392-4fad-93d4-c109adf77bdd_1898x1076.png 424w, https://substackcdn.com/image/fetch/$s_!RZOO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9adcac0-b392-4fad-93d4-c109adf77bdd_1898x1076.png 848w, https://substackcdn.com/image/fetch/$s_!RZOO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9adcac0-b392-4fad-93d4-c109adf77bdd_1898x1076.png 1272w, https://substackcdn.com/image/fetch/$s_!RZOO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9adcac0-b392-4fad-93d4-c109adf77bdd_1898x1076.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image: Plot of DRAM bandwidth vs. address sweep at unit of 32MB</figcaption></figure></div><p>We can find patterns at really small intervals, though: near the cacheline size, there is a pattern that repeats every 512B.. so 8 cachelines (cacheline size is 64B).. 
mapping to the 8 Memshires?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9eWs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46cd497f-121e-4455-8826-7df3b1e08a9e_1998x2376.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9eWs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46cd497f-121e-4455-8826-7df3b1e08a9e_1998x2376.png 424w, https://substackcdn.com/image/fetch/$s_!9eWs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46cd497f-121e-4455-8826-7df3b1e08a9e_1998x2376.png 848w, https://substackcdn.com/image/fetch/$s_!9eWs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46cd497f-121e-4455-8826-7df3b1e08a9e_1998x2376.png 1272w, https://substackcdn.com/image/fetch/$s_!9eWs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46cd497f-121e-4455-8826-7df3b1e08a9e_1998x2376.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9eWs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46cd497f-121e-4455-8826-7df3b1e08a9e_1998x2376.png" width="1456" height="1731" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/46cd497f-121e-4455-8826-7df3b1e08a9e_1998x2376.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1731,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Plot of DRAM bandwidth vs. address sweep at small units&quot;,&quot;title&quot;:&quot;Image: Plot of DRAM bandwidth vs. address sweep at small units&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Plot of DRAM bandwidth vs. address sweep at small units" title="Image: Plot of DRAM bandwidth vs. address sweep at small units" srcset="https://substackcdn.com/image/fetch/$s_!9eWs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46cd497f-121e-4455-8826-7df3b1e08a9e_1998x2376.png 424w, https://substackcdn.com/image/fetch/$s_!9eWs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46cd497f-121e-4455-8826-7df3b1e08a9e_1998x2376.png 848w, https://substackcdn.com/image/fetch/$s_!9eWs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46cd497f-121e-4455-8826-7df3b1e08a9e_1998x2376.png 1272w, https://substackcdn.com/image/fetch/$s_!9eWs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46cd497f-121e-4455-8826-7df3b1e08a9e_1998x2376.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image: Plot of DRAM bandwidth vs. address sweep at small units</figcaption></figure></div><p>If we fold the cacheline-sized graph along 8 cachelines, we get the following diagram. 
That clearly shows different bandwidth to different Memshires, up to 50% difference.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!A0r4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7087000-6077-4807-9c2b-2d6024bcb0d5_1496x872.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!A0r4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7087000-6077-4807-9c2b-2d6024bcb0d5_1496x872.png 424w, https://substackcdn.com/image/fetch/$s_!A0r4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7087000-6077-4807-9c2b-2d6024bcb0d5_1496x872.png 848w, https://substackcdn.com/image/fetch/$s_!A0r4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7087000-6077-4807-9c2b-2d6024bcb0d5_1496x872.png 1272w, https://substackcdn.com/image/fetch/$s_!A0r4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7087000-6077-4807-9c2b-2d6024bcb0d5_1496x872.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!A0r4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7087000-6077-4807-9c2b-2d6024bcb0d5_1496x872.png" width="1456" height="849" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d7087000-6077-4807-9c2b-2d6024bcb0d5_1496x872.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:849,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Plot of DRAM bandwidth to certain lines&quot;,&quot;title&quot;:&quot;Image: Plot of DRAM bandwidth to certain lines&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Plot of DRAM bandwidth to certain lines" title="Image: Plot of DRAM bandwidth to certain lines" srcset="https://substackcdn.com/image/fetch/$s_!A0r4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7087000-6077-4807-9c2b-2d6024bcb0d5_1496x872.png 424w, https://substackcdn.com/image/fetch/$s_!A0r4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7087000-6077-4807-9c2b-2d6024bcb0d5_1496x872.png 848w, https://substackcdn.com/image/fetch/$s_!A0r4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7087000-6077-4807-9c2b-2d6024bcb0d5_1496x872.png 1272w, https://substackcdn.com/image/fetch/$s_!A0r4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7087000-6077-4807-9c2b-2d6024bcb0d5_1496x872.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image: Plot of DRAM bandwidth to certain lines</figcaption></figure></div><p>With that knowledge we can hit the specific lines that we know go to specific memshires, loading from a different address to avoid L3 effects. And bingo! 
We found approximately where each memshire is and the achievable bandwidth from them with only one shire asking for data:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pB-1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b70a59f-b3e8-4163-8958-1745c71ce749_1996x1372.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pB-1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b70a59f-b3e8-4163-8958-1745c71ce749_1996x1372.png 424w, https://substackcdn.com/image/fetch/$s_!pB-1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b70a59f-b3e8-4163-8958-1745c71ce749_1996x1372.png 848w, https://substackcdn.com/image/fetch/$s_!pB-1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b70a59f-b3e8-4163-8958-1745c71ce749_1996x1372.png 1272w, https://substackcdn.com/image/fetch/$s_!pB-1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b70a59f-b3e8-4163-8958-1745c71ce749_1996x1372.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pB-1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b70a59f-b3e8-4163-8958-1745c71ce749_1996x1372.png" width="1456" height="1001" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4b70a59f-b3e8-4163-8958-1745c71ce749_1996x1372.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1001,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;bandwidth of each shire hitting a specific memshire, with the shire with max bandwidth marked&quot;,&quot;title&quot;:&quot;Image: bandwidth of each shire hitting a specific memshire, with the shire with max bandwidth marked&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="bandwidth of each shire hitting a specific memshire, with the shire with max bandwidth marked" title="Image: bandwidth of each shire hitting a specific memshire, with the shire with max bandwidth marked" srcset="https://substackcdn.com/image/fetch/$s_!pB-1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b70a59f-b3e8-4163-8958-1745c71ce749_1996x1372.png 424w, https://substackcdn.com/image/fetch/$s_!pB-1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b70a59f-b3e8-4163-8958-1745c71ce749_1996x1372.png 848w, https://substackcdn.com/image/fetch/$s_!pB-1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b70a59f-b3e8-4163-8958-1745c71ce749_1996x1372.png 1272w, https://substackcdn.com/image/fetch/$s_!pB-1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b70a59f-b3e8-4163-8958-1745c71ce749_1996x1372.png 1456w" sizes="100vw" 
loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image: bandwidth of each shire hitting a specific memshire, with the shire with max bandwidth marked</figcaption></figure></div><p>Now the final and obvious question - where do the memshires live on the NoC? ... But before that, another question - is the 8x4 mesh correct? We assume the underlying topology is 8x4 because that&#8217;s what the datasheet and PRM say. We all know these can lie. Let&#8217;s do the same simulated annealing but make the topology itself also part of the model. 
And see if we can predict the bandwidth:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!p0HB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9afb09f-2f8c-4f8b-8dc3-445409600285_2586x1329.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p0HB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9afb09f-2f8c-4f8b-8dc3-445409600285_2586x1329.webp 424w, https://substackcdn.com/image/fetch/$s_!p0HB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9afb09f-2f8c-4f8b-8dc3-445409600285_2586x1329.webp 848w, https://substackcdn.com/image/fetch/$s_!p0HB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9afb09f-2f8c-4f8b-8dc3-445409600285_2586x1329.webp 1272w, https://substackcdn.com/image/fetch/$s_!p0HB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9afb09f-2f8c-4f8b-8dc3-445409600285_2586x1329.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!p0HB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9afb09f-2f8c-4f8b-8dc3-445409600285_2586x1329.webp" width="1456" height="748" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f9afb09f-2f8c-4f8b-8dc3-445409600285_2586x1329.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:748,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Computed real topology and predictive power of hops vs bandwidth&quot;,&quot;title&quot;:&quot;Image: Computed real topology and predictive power of hops vs bandwidth&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Computed real topology and predictive power of hops vs bandwidth" title="Image: Computed real topology and predictive power of hops vs bandwidth" srcset="https://substackcdn.com/image/fetch/$s_!p0HB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9afb09f-2f8c-4f8b-8dc3-445409600285_2586x1329.webp 424w, https://substackcdn.com/image/fetch/$s_!p0HB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9afb09f-2f8c-4f8b-8dc3-445409600285_2586x1329.webp 848w, https://substackcdn.com/image/fetch/$s_!p0HB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9afb09f-2f8c-4f8b-8dc3-445409600285_2586x1329.webp 1272w, https://substackcdn.com/image/fetch/$s_!p0HB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9afb09f-2f8c-4f8b-8dc3-445409600285_2586x1329.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image: Computed real topology and predictive power of hops vs bandwidth</figcaption></figure></div><p>So documents do lie. Turns out it is a mesh... but with missing links (I assume for yield reasons, so the topology on your chip might differ), definitely not the 8x4 mesh that the <s>datasheet</s> product brochure leads you to believe. Given the right topology, we can almost perfectly express the bandwidth between shires as a function of hops. But that hole at (5, 3) feels suspicious. And there are outliers in the predicted bandwidth. Why does our standard deviation suddenly go from very small to 0.25? Unless... 
there are other nodes on the NoC that transparently pass traffic to and from the shires but are not compute shires themselves, so I cannot measure their bandwidth. What if I just add a node there? Does that improve my predictive power?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Lkqy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dc45632-3af4-45c9-b02c-aaa9677c05b8_2682x1329.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Lkqy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dc45632-3af4-45c9-b02c-aaa9677c05b8_2682x1329.webp 424w, https://substackcdn.com/image/fetch/$s_!Lkqy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dc45632-3af4-45c9-b02c-aaa9677c05b8_2682x1329.webp 848w, https://substackcdn.com/image/fetch/$s_!Lkqy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dc45632-3af4-45c9-b02c-aaa9677c05b8_2682x1329.webp 1272w, https://substackcdn.com/image/fetch/$s_!Lkqy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dc45632-3af4-45c9-b02c-aaa9677c05b8_2682x1329.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Lkqy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dc45632-3af4-45c9-b02c-aaa9677c05b8_2682x1329.webp" width="1456" height="721" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2dc45632-3af4-45c9-b02c-aaa9677c05b8_2682x1329.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:721,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Real topology patched with node (5, 3) as a unknown NoC router&quot;,&quot;title&quot;:&quot;Image: Real topology patched with node (5, 3) as a unknown NoC router&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Real topology patched with node (5, 3) as a unknown NoC router" title="Image: Real topology patched with node (5, 3) as a unknown NoC router" srcset="https://substackcdn.com/image/fetch/$s_!Lkqy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dc45632-3af4-45c9-b02c-aaa9677c05b8_2682x1329.webp 424w, https://substackcdn.com/image/fetch/$s_!Lkqy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dc45632-3af4-45c9-b02c-aaa9677c05b8_2682x1329.webp 848w, https://substackcdn.com/image/fetch/$s_!Lkqy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dc45632-3af4-45c9-b02c-aaa9677c05b8_2682x1329.webp 1272w, https://substackcdn.com/image/fetch/$s_!Lkqy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2dc45632-3af4-45c9-b02c-aaa9677c05b8_2682x1329.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image: Real topology patched with node (5, 3) as an unknown NoC router</figcaption></figure></div><p>And now all the outliers are gone. That left me suspecting that the 3 empty nodes are actually routers too. Maybe...? But there is not enough topological information to determine whether they exist. By the same approach we can figure out where the memshires live on the NoC -- the predictive power of the graph is already high, so they must not sit on their own NoC hop in the mesh; they either sit outside the current mesh or inject themselves into it. Some simulated annealing later... we have the result. Note that I (personally) assumed the injection location is not uniform. 
Memshires 0, 1 and 7 have 2 locations (because the plots seem to indicate so). But know that <strong>this is likely wrong and contaminated by L3 access speed</strong>, as the chip floorplan would be insane if designed that way:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Fyca!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0007c1f-da61-479d-9ec2-379cea3fcc0d_1185x1498.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Fyca!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0007c1f-da61-479d-9ec2-379cea3fcc0d_1185x1498.webp 424w, https://substackcdn.com/image/fetch/$s_!Fyca!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0007c1f-da61-479d-9ec2-379cea3fcc0d_1185x1498.webp 848w, https://substackcdn.com/image/fetch/$s_!Fyca!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0007c1f-da61-479d-9ec2-379cea3fcc0d_1185x1498.webp 1272w, https://substackcdn.com/image/fetch/$s_!Fyca!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0007c1f-da61-479d-9ec2-379cea3fcc0d_1185x1498.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Fyca!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0007c1f-da61-479d-9ec2-379cea3fcc0d_1185x1498.webp" width="1185" height="1498" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a0007c1f-da61-479d-9ec2-379cea3fcc0d_1185x1498.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1498,&quot;width&quot;:1185,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Where the memshire lives on the NoC&quot;,&quot;title&quot;:&quot;Image: Where the memshire lives on the NoC&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Where the memshire lives on the NoC" title="Image: Where the memshire lives on the NoC" srcset="https://substackcdn.com/image/fetch/$s_!Fyca!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0007c1f-da61-479d-9ec2-379cea3fcc0d_1185x1498.webp 424w, https://substackcdn.com/image/fetch/$s_!Fyca!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0007c1f-da61-479d-9ec2-379cea3fcc0d_1185x1498.webp 848w, https://substackcdn.com/image/fetch/$s_!Fyca!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0007c1f-da61-479d-9ec2-379cea3fcc0d_1185x1498.webp 1272w, https://substackcdn.com/image/fetch/$s_!Fyca!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa0007c1f-da61-479d-9ec2-379cea3fcc0d_1185x1498.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image: Where the memshire lives on the NoC</figcaption></figure></div><p>And we can update our visualization of Memshire access performance and it looks <strong>absolutely beautiful</strong>. But this does not sit well with me - you wouldn&#8217;t put memory access at the center of the NoC. Maybe it&#8217;s L3 working as intended? Instead of <code>shire -&gt; Victim L3 -&gt; shire (not found) -&gt; memshire</code>, maybe the chip does <code>shire -&gt; Victim L3 -&gt; memshire -&gt; shire</code>? 
Either way bandwidth is bandwidth and that&#8217;s what matters for a real kernel.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LZZV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fff2d4f-d699-4504-943f-b8f0643fff5f_2198x1476.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LZZV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fff2d4f-d699-4504-943f-b8f0643fff5f_2198x1476.png 424w, https://substackcdn.com/image/fetch/$s_!LZZV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fff2d4f-d699-4504-943f-b8f0643fff5f_2198x1476.png 848w, https://substackcdn.com/image/fetch/$s_!LZZV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fff2d4f-d699-4504-943f-b8f0643fff5f_2198x1476.png 1272w, https://substackcdn.com/image/fetch/$s_!LZZV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fff2d4f-d699-4504-943f-b8f0643fff5f_2198x1476.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LZZV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fff2d4f-d699-4504-943f-b8f0643fff5f_2198x1476.png" width="1456" height="978" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5fff2d4f-d699-4504-943f-b8f0643fff5f_2198x1476.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:978,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Memshire to Memshire single-issue bandwidth (corrected topology)&quot;,&quot;title&quot;:&quot;Image: Memshire to Memshire single-issue bandwidth (corrected topology)&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Memshire to Memshire single-issue bandwidth (corrected topology)" title="Image: Memshire to Memshire single-issue bandwidth (corrected topology)" srcset="https://substackcdn.com/image/fetch/$s_!LZZV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fff2d4f-d699-4504-943f-b8f0643fff5f_2198x1476.png 424w, https://substackcdn.com/image/fetch/$s_!LZZV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fff2d4f-d699-4504-943f-b8f0643fff5f_2198x1476.png 848w, https://substackcdn.com/image/fetch/$s_!LZZV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fff2d4f-d699-4504-943f-b8f0643fff5f_2198x1476.png 1272w, https://substackcdn.com/image/fetch/$s_!LZZV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5fff2d4f-d699-4504-943f-b8f0643fff5f_2198x1476.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image: Memshire to Memshire single-issue bandwidth (corrected topology)</figcaption></figure></div><p>We can finally figure out the max bandwidth possible against a single memshire. 
And calculate the actual peak bandwidth achievable to be ~88GB/s on my specific chip.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yqfv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f6ae670-90ca-47db-b6d9-a15ab8df869a_1096x692.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yqfv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f6ae670-90ca-47db-b6d9-a15ab8df869a_1096x692.png 424w, https://substackcdn.com/image/fetch/$s_!yqfv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f6ae670-90ca-47db-b6d9-a15ab8df869a_1096x692.png 848w, https://substackcdn.com/image/fetch/$s_!yqfv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f6ae670-90ca-47db-b6d9-a15ab8df869a_1096x692.png 1272w, https://substackcdn.com/image/fetch/$s_!yqfv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f6ae670-90ca-47db-b6d9-a15ab8df869a_1096x692.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yqfv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f6ae670-90ca-47db-b6d9-a15ab8df869a_1096x692.png" width="1096" height="692" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4f6ae670-90ca-47db-b6d9-a15ab8df869a_1096x692.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:692,&quot;width&quot;:1096,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Measured peak bandwidth against a single memshire&quot;,&quot;title&quot;:&quot;Image: Measured peak bandwidth against a single memshire&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Measured peak bandwidth against a single memshire" title="Image: Measured peak bandwidth against a single memshire" srcset="https://substackcdn.com/image/fetch/$s_!yqfv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f6ae670-90ca-47db-b6d9-a15ab8df869a_1096x692.png 424w, https://substackcdn.com/image/fetch/$s_!yqfv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f6ae670-90ca-47db-b6d9-a15ab8df869a_1096x692.png 848w, https://substackcdn.com/image/fetch/$s_!yqfv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f6ae670-90ca-47db-b6d9-a15ab8df869a_1096x692.png 1272w, https://substackcdn.com/image/fetch/$s_!yqfv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f6ae670-90ca-47db-b6d9-a15ab8df869a_1096x692.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image: Measured peak bandwidth against a single memshire</figcaption></figure></div><p>As a side note, I tried to figure out if I can use latency instead of bandwidth to locate the Memshire. No. Atomics share the same pattern. Presumably the chip uses L3 as the atomic target instead of DRAM. Fair. But apparently atomics have a baseline 120-cycle latency regardless of where they are in memory (plus measurable overhead, but that&#8217;s relatively low). And each hop contributes a clean 6-cycle latency increase. 
Which indicates 3 cycles per direction.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oY4x!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafa21ba4-4da5-4518-8479-168aa1aac001_1996x1372.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oY4x!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafa21ba4-4da5-4518-8479-168aa1aac001_1996x1372.png 424w, https://substackcdn.com/image/fetch/$s_!oY4x!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafa21ba4-4da5-4518-8479-168aa1aac001_1996x1372.png 848w, https://substackcdn.com/image/fetch/$s_!oY4x!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafa21ba4-4da5-4518-8479-168aa1aac001_1996x1372.png 1272w, https://substackcdn.com/image/fetch/$s_!oY4x!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafa21ba4-4da5-4518-8479-168aa1aac001_1996x1372.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oY4x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafa21ba4-4da5-4518-8479-168aa1aac001_1996x1372.png" width="1456" height="1001" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/afa21ba4-4da5-4518-8479-168aa1aac001_1996x1372.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1001,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Atomic latency map&quot;,&quot;title&quot;:&quot;Image: Atomic latency map&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Atomic latency map" title="Image: Atomic latency map" srcset="https://substackcdn.com/image/fetch/$s_!oY4x!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafa21ba4-4da5-4518-8479-168aa1aac001_1996x1372.png 424w, https://substackcdn.com/image/fetch/$s_!oY4x!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafa21ba4-4da5-4518-8479-168aa1aac001_1996x1372.png 848w, https://substackcdn.com/image/fetch/$s_!oY4x!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafa21ba4-4da5-4518-8479-168aa1aac001_1996x1372.png 1272w, https://substackcdn.com/image/fetch/$s_!oY4x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fafa21ba4-4da5-4518-8479-168aa1aac001_1996x1372.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image: Atomic latency map</figcaption></figure></div><div><hr></div><p>The ET-SoC-1 chip is really nice. It behaves like a flat processor without the NoC being super obvious. You can still see it. And you can still use it. But most applications ought to treat the processor as a flat array of cores and not step onto any footguns. 
Best of both worlds.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.aifoundry.org/p/investigating-the-et-soc-1-noc/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.aifoundry.org/p/investigating-the-et-soc-1-noc/comments"><span>Leave a comment</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[The $100 Robot Arm: Low-Cost Open-Source Infrastructure for Embodied AI Inference]]></title><description><![CDATA[Sergey Sergyenko, CEO @Cybergizer]]></description><link>https://blog.aifoundry.org/p/the-100-robot-arm-low-cost-open-source</link><guid isPermaLink="false">https://blog.aifoundry.org/p/the-100-robot-arm-low-cost-open-source</guid><pubDate>Tue, 24 Mar 2026 15:46:58 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/191992882/b1f8755ce221c7a199c069e9b95ad7ee.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>On Feb 2, 2026, in Brussels, we got together at the second edition of the AI Plumbers FOSDEM fringe event, where Sergey Sergyenko, CEO at Cybergizer brought a bunch of robot hands for everybody to try. </p><p>Purely a community effort. If you haven&#8217;t seen them yet, find local events or a Makerspace - it&#8217;s amazing how something so small and toy-looking can lead to real-life use cases and a new paradigm of training robots, not models. 
</p><p>Watch Sergey&#8217;s talk to find out more!</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.aifoundry.org/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.aifoundry.org/subscribe?"><span>Subscribe now</span></a></p><div class="file-embed-wrapper" data-component-name="FileToDOM"><div class="file-embed-container-reader"><div class="file-embed-container-top"><image class="file-embed-thumbnail-default" src="https://substackcdn.com/image/fetch/$s_!0Cy0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack.com%2Fimg%2Fattachment_icon.svg"></image><div class="file-embed-details"><div class="file-embed-details-h1">Ai Plumbers Fosdem 2026 Lerobot</div><div class="file-embed-details-h2">3.67MB &#8729; PDF file</div></div><a class="file-embed-button wide" href="https://blog.aifoundry.org/api/v1/file/d837f5d6-efde-4847-aaab-c05d8aaffca6.pdf"><span class="file-embed-button-text">Download</span></a></div><a class="file-embed-button narrow" href="https://blog.aifoundry.org/api/v1/file/d837f5d6-efde-4847-aaab-c05d8aaffca6.pdf"><span class="file-embed-button-text">Download</span></a></div></div><p> </p>]]></content:encoded></item><item><title><![CDATA[Kernels Deep Dive ]]></title><description><![CDATA[Ben Burtenshaw, Community Education in AI @ Hugging Face]]></description><link>https://blog.aifoundry.org/p/kernels-deep-dive</link><guid isPermaLink="false">https://blog.aifoundry.org/p/kernels-deep-dive</guid><pubDate>Tue, 10 Mar 2026 08:35:49 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/190190433/1084782824eac11ca81f1e45d32cfe52.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>On Feb 2, 2026, in Brussels, we got together at the second edition of the AI Plumbers FOSDEM fringe event, where Ben Burtenshaw from Hugging Face broke down why optimized kernels are 
critical for real-world deep learning performance and how the Hugging Face Kernels ecosystem makes them easier to build and use.</p><p>He covers memory-bound bottlenecks, the kernel-builder workflow, reproducible multi-hardware builds with Nix, and practical PyTorch/Transformers integration patterns that reduce setup time from hours to seconds. <br><br><strong>Key moments from the talk:</strong><br>0:00 Intro and speaker background <br>1:35 Why Hugging Face Kernels matters <br>2:05 Compute vs memory bottlenecks in deep learning <br>3:30 Fused kernels and why they speed things up <br>5:05 Talk agenda and ecosystem overview <br>5:35 Kernel pain points: fragmentation and long installs <br>7:12 Supporting older, cheaper hardware for the community <br>8:18 Goal: from CMake errors to one-line kernel usage <br>8:54 Kernels + kernel-builder architecture <br>10:00 Reproducible builds with Nix and support matrix <br>11:5 Kernel project structure (`build.toml`, sources, ton) <br>12:23 Publishing kernels to the Hugging Face Hub <br>13:03 Real-world gain: faster FlashAttup <br>14:18 Docs, repos, and how to get started <br>16:00 Verifying compatibility and loading kernhon <br>17:20 Managing local cache with ls` <br>17:55 Kernelizing PyTorch layers with hpgs <br>19:23 Transformers integration (`use_ke`) <br>20:48 Performance chart and closing resources<br></p><div class="file-embed-wrapper" data-component-name="FileToDOM"><div class="file-embed-container-reader"><div class="file-embed-container-top"><image class="file-embed-thumbnail-default" src="https://substackcdn.com/image/fetch/$s_!0Cy0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack.com%2Fimg%2Fattachment_icon.svg"></image><div class="file-embed-details"><div class="file-embed-details-h1">Hf Kernels Community</div><div class="file-embed-details-h2">3.68MB &#8729; PDF file</div></div><a class="file-embed-button wide" href="https://blog.aifoundry.org/api/v1/file/d17ab7eb-2159-4e1c-964e-987be0b34240.pdf"><span 
class="file-embed-button-text">Download</span></a></div><a class="file-embed-button narrow" href="https://blog.aifoundry.org/api/v1/file/d17ab7eb-2159-4e1c-964e-987be0b34240.pdf"><span class="file-embed-button-text">Download</span></a></div></div><p><br></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.aifoundry.org/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.aifoundry.org/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[From AI Agents to Faster Kernels ]]></title><description><![CDATA[Felix LeClair, Ainekko & Ben Burtenshaw, HF]]></description><link>https://blog.aifoundry.org/p/from-ai-agents-to-faster-kernels</link><guid isPermaLink="false">https://blog.aifoundry.org/p/from-ai-agents-to-faster-kernels</guid><pubDate>Tue, 10 Mar 2026 08:15:24 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/190190018/4a4068c368b4e33c43e6ab8e1ce2e2c9.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>On Feb 2, 2026, in Brussels, we got together at the second edition of the AI Plumbers FOSDEM fringe event, where Felix LeClair, Platform Engineer at Ainekko, sat down with Ben Burtenshaw from Hugging Face to discuss how the Hugging Face Kernels ecosystem makes high-performance kernels easier to build, share, and run.  </p><p><br>They cover agent-assisted kernel development, benchmarking results across model families, and why reproducible kernel infrastructure is key for broader hardware support.<br><br><strong>Key moments from the talk:</strong><br>0:00 Intro and guest welcome  <br>0:28 What Hugging Face Kernels solves  <br>1:53 Can LLMs help write kernels?  
<br>2:06 Using Claude Code + skills to generate kernels  <br>3:21 Benchmarking different models for kernel generation  <br>4:30 Agentic workflows and performance engineering  <br>5:16 Supporting older and affordable hardware  <br>6:55 Why infrastructure matters beyond generated code  <br>7:46 How Kernels Hub helps developers learn and contribute  <br>9:20 How to get involved with Hugging Face Kernels  <br>10:00 Wrap-up</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.aifoundry.org/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.aifoundry.org/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Introducing ET-SOC - the fully open source manycore platform]]></title><description><![CDATA[Gianluca Guida, Head of Software, Ainekko, AI Plumbers: San Francisco Edition]]></description><link>https://blog.aifoundry.org/p/introducing-et-soc-the-fully-open</link><guid isPermaLink="false">https://blog.aifoundry.org/p/introducing-et-soc-the-fully-open</guid><pubDate>Mon, 19 Jan 2026 14:13:22 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/185059150/2bc6193076fefa026564bc7624bf7581.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>On October 25th, in SF we got together to discuss &#8220;What&#8217;s missing in an open-source full-stack AI platform?&#8221;</p><p>The AI Plumbers Unconference: San Francisco Edition is an open-source meetup for builders of low-level AI systems to dive into the plumbing of modern AI, from modern data infrastructure to AI accelerators.</p><p>Watch the #AIPlumbers presentation by Gianluca Guida, Head of Software at Ainekko, introducing ET-SOC, the fully open source manycore platform. It is the first talk ever to introduce it!</p><p><strong>Key moments from 
the talk:</strong></p><p>00:00 &#8211; 00:25 &#8212; Introduction to the topic</p><p>00:26 &#8211; 01:00 &#8212; Gianluca Guida&#8217;s background</p><p>01:01 &#8211; 02:00 &#8212; Agenda</p><p>02:01 &#8211; 03:54 &#8212; ET Platform Software</p><p>03:55 &#8211; 05:32 &#8212; ETSOC1 Hardware: The PCIe board &#8212; Overview</p><p>05:33 &#8211; 07:25 &#8212; ETSOC1 Hardware: ETSOC-1 &#8212; Overview</p><p>07:26 &#8211; 09:05 &#8212; ETSOC1 Hardware: Minion &#8212; Overview</p><p>09:06 &#8211; 13:03 &#8212; How can I take part in this project?</p><p>13:04 &#8211; 15:21 &#8212; Q&amp;A</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.aifoundry.org/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.aifoundry.org/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[NVIDIA DYNAMO: Serving LLMs at AI-Factory Scale ]]></title><description><![CDATA[Anish Maddipoti, Product Manager and Rohan Varma, AI Developer, NVIDIA, AI Plumbers: San Francisco Edition]]></description><link>https://blog.aifoundry.org/p/nvidia-dynamo-serving-llms-at-ai</link><guid isPermaLink="false">https://blog.aifoundry.org/p/nvidia-dynamo-serving-llms-at-ai</guid><pubDate>Sat, 13 Dec 2025 10:45:17 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/180524640/40f9d9904c3fded16453e8a4a278867f.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>On October 25th, in SF we got together to discuss &#8220;What&#8217;s missing in an open-source full-stack AI platform?&#8221;</p><p>&#8203;&#8203;The AI Plumbers Unconference: San Francisco Edition is an open-source meetup for builders of low-level AI systems to dive into the plumbing of modern AI, from modern data infrastructure to AI accelerators.</p><p>Watch #AIPlumbers presentation by NVIDIA team on Dynamo, a deep 
dive into production environments for inference at scale, where both compute and memory demands are exploding exponentially.</p><p>Disaggregated serving, intelligent scheduling, multi-tier memory management, KV-routing, and high-availability mechanics &#8212; all designed to push inference efficiency to the maximum.</p><p>This #AIPlumbers talk showcased production-grade engineering: from offline performance configurators that find optimal cluster layouts, to dynamic K8s scheduling that understands physical GPU topology, coordinated multi-GPU serving, etc. Lots of clever tricks for handling compute-bound vs memory-bound workloads that I&#8217;ve heard people discuss before, but this time not just in theory! And it&#8217;s all #opensource.</p><p>I also really hope to hear more from the Dynamo team at #FOSDEM26 - don&#8217;t miss it!</p><p><strong>Key moments from the talk:</strong></p><p>00:00 &#8211; 01:02 &#8212; Dynamo: Inference at Scale</p><p>01:03 &#8211; 02:49 &#8212; Inference Compute Requirements Scaling Exponentially</p><p>02:50 &#8211; 05:59 &#8212; Dynamo: A Systematic Approach to AI Inference at Scale</p><p>06:00 &#8211; 08:54 &#8212; Memory Management</p><p>08:55 &#8211; 12:19 &#8212; KV Router</p><p>12:20 &#8211; 15:00 &#8212; Production-Grade Serving with Dynamo</p><p>15:01 &#8211; 16:33 &#8212; Offline Perf Configurator</p><p>16:34 &#8211; 18:39 &#8212; Offline Perf Optimizer</p><p>18:40 &#8211; 26:00 &#8212; Topology-Optimized Dynamic K8s Scheduling</p><p>26:01 &#8211; 29:22 &#8212; Fault Tolerance</p><p>29:23 &#8211; 32:32 &#8212; How Dynamo Works</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.aifoundry.org/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.aifoundry.org/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item><item><title><![CDATA[What's missing in the 
open ecosystem for AI? ]]></title><description><![CDATA[Tanya Dadasheva and Roman Shaposhnik, AI Plumbers: San Francisco Edition]]></description><link>https://blog.aifoundry.org/p/whats-missing-in-the-open-ecosystem</link><guid isPermaLink="false">https://blog.aifoundry.org/p/whats-missing-in-the-open-ecosystem</guid><dc:creator><![CDATA[Tanya Dadasheva]]></dc:creator><pubDate>Fri, 21 Nov 2025 10:18:06 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/179539687/65e7b34003762ee9c4828629b8f3d91b.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>On October 25th, in SF we got together to discuss &#8220;What&#8217;s missing in an open-source full-stack AI platform?&#8221;</p><p>&#8203;&#8203;The AI Plumbers Unconference: San Francisco Edition is an open-source meetup for builders of low-level AI systems to dive into the plumbing of modern AI, from modern data infrastructure to AI accelerators.</p><p>Watch #AIPlumbers presentation by Tanya Dadasheva, CEO &amp; CO-Founder, Ainekko and Roman Shaposhnik, CTO &amp; Co-Founder, Ainekko - their first presentation to the community and proper introduction of Ainekko. Well, technically it&#8217;s the second presentation, the first one was at #RISCV summit!</p><p>This #AIPlumbers took place in the industrialized comfort of Studio45 - makerspace in SF, where Ainekko lab is. 
Nothing shows the vibes of Ainekko more than this - not a fancy office, not an official sales/marketing pitch - here we are with our closest friends, the open source community.</p><p><strong>Key moments from the talk:</strong></p><p>00:00 &#8211; 01:19 &#8212; Intro</p><p>01:20 &#8211; 02:04 &#8212; Industry take on AI Infrastructure</p><p>02:05 &#8211; 02:47 &#8212; Evolutionary take on AI Infrastructure</p><p>02:48 &#8211; 03:15 &#8212; VC take on AI Infrastructure</p><p>03:16 &#8211; 04:36 &#8212; Evolution of Chip &amp; Framework Codependency</p><p>04:37 &#8211; 06:43 &#8212; Open Community take on Infrastructure</p><p>06:44 &#8211; 08:52 &#8212; From SW to Manycore Architecture</p><p>08:53 &#8211; 13:10 &#8212; RISC-V Manycore LAMP Stack: &#8220;Less is More&#8221;</p><p>13:11 &#8211; 13:38 &#8212; 100% Evaluation Platform</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.aifoundry.org/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.aifoundry.org/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Adventures in Model Quantization and GPU performance]]></title><description><![CDATA[John Leimgruber, AI Plumbers: San Francisco Edition]]></description><link>https://blog.aifoundry.org/p/adventures-in-model-quantization</link><guid isPermaLink="false">https://blog.aifoundry.org/p/adventures-in-model-quantization</guid><pubDate>Tue, 11 Nov 2025 10:21:49 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/178577856/2b9dceb8e68ed6b924a4830fdef5a2d8.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>On October 25th, in SF we got together to discuss &#8220;What&#8217;s missing in an open-source full-stack AI platform?&#8221;</p><p>The AI Plumbers Unconference: San Francisco Edition is an open-source meetup for 
builders of low-level AI systems to dive into the plumbing of modern AI, from modern data infrastructure to AI accelerators.</p><p>Watch the #AIPlumbers presentation by John Leimgruber, Community LLM Quantizer. Also known as <strong>ubergarm</strong>, he is one of the best-known and most productive quantizers in the AI world. I&#8217;m not joking about productivity - he has about 30TB of quants on Hugging Face, and they even started to limit his uploads (all fixed now, no worries, there will be more!). I wish there were people with the job title &#8220;quantizer&#8221; on LinkedIn so at least we would know how many there are, but I&#8217;m not so sure all of them even have LinkedIn. Anyway, I don&#8217;t know too many, but I sure know the great ones! </p><p>Btw, if you haven&#8217;t seen it, go watch <a href="https://mirrors.dotsrc.org/fosdem/2025/ub2252a/fosdem-2025-5991-history-and-advances-of-quantization-in-llama-cpp.mp4">Iwan Kawrakow&#8217;s talk from #FOSDEM 2025</a>!</p><p>But also do watch John&#8217;s talk to learn what it takes and how to start the journey into #quantization, how to benchmark different quantizations, and what metrics to use - what is speed? what is inside the quant? is fp8 a data type or a quantization type? 
So if you don&#8217;t have time to download the whole #ggml github discussions and PRs and grep to figure it all out - listen to somebody who did.</p><p>And in the true #AIplumbers tradition it doesn&#8217;t stop on SW optimizations - different #HW backends, memory usage optimizations, thermal considerations and more!</p><p><strong>Key moments from the talk:</strong></p><p>00:00 &#8211; 04:30       Personal Background and Journey</p><p>04:31 &#8211; 7:30         Quant Cooking Quick-Start</p><p>07:31 &#8211; 10:17       Benchmarking Quantization &#8220;Quality&#8221;</p><p>10:18 &#8211; 12:39       Quant comparison for &#8220;Quality&#8221;</p><p>12:40 &#8211; 16:15       Benchmarking Quantization &#8220;Speed&#8221;</p><p>16:16 &#8211; 18:20        LLM Tensors</p><p>18:21 &#8211; 20:38        llama-quantize &#8212; help</p><p>20:39 &#8211; 21:57        MXFP-4 Quantization with 4-bit blocks</p><p>21:58 &#8211; 27:27        GPU Tuning</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://ubergarm.com/images/AI-Plumbers-Conference-2025-SF.pdf&quot;,&quot;text&quot;:&quot;John&#8217;s presentation is here!&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://ubergarm.com/images/AI-Plumbers-Conference-2025-SF.pdf"><span>John&#8217;s presentation is here!</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Trends with Physical AI ]]></title><description><![CDATA[Dhruv Diddi, AI Plumbers: San Francisco Edition]]></description><link>https://blog.aifoundry.org/p/trends-with-physical-ai</link><guid isPermaLink="false">https://blog.aifoundry.org/p/trends-with-physical-ai</guid><pubDate>Fri, 31 Oct 2025 15:25:15 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/177637865/206babe0e00adc59a1e873522541e576.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><strong>On October 25th, in SF we got together to discuss &#8220;What&#8217;s 
missing in an open-source full-stack AI platform?&#8221;</strong></p><p>The AI Plumbers Unconference: San Francisco Edition is an open-source meetup for builders of low-level AI systems to dive into the plumbing of modern AI, from modern data infrastructure to AI accelerators.</p><p>Watch the <strong>#AIPlumbers</strong> presentation by Dhruv Diddi, Founder of Solo Tech, who walked us through the demands of #physicalAI, from chips to the fine-tuning of open robotics models. We all know and love the #lerobot and SmolVLA models from Hugging Face that bootstrapped that ecosystem, and now there is enough substance to innovate, customize and optimise. There are also some demos and interesting numbers in that video - so watch till the end!</p><p>And if you have never seen those cute robots, check out Dhruv&#8217;s #RoboticsGym events or just hang out more at makerspaces!</p><p><strong>Key moments from the talk:</strong></p><p>00:00 &#8211; 00:45       Opening and Purpose of the Talk</p><p>00:46 &#8211; 1:40         Personal Background and Journey</p><p>01:41 &#8211; 02:48       Redefining Intelligence and Artificial Intelligence</p><p>04:30 &#8211; 05:00       From Early AI Models to the Open-Source Revolution</p><p>05:01 &#8211; 06:20       The Rise of Physical AI and Robotics Challenges</p><p>06:21 &#8211; 7:26         Use case: Robotics</p><p>7:27 &#8211; 10:40         Solo. 
Open Collaboration and Measurable Impact</p><p>10:40 &#8211; 12:10       Looking Ahead: The Future of Physical AI</p><p>12:11 - 14:00        Q/A session</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.aifoundry.org/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.aifoundry.org/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[LAMP STACK FOR AI]]></title><description><![CDATA[Roman Shaposhnik, AI Plumbers Conference: 2nd edition]]></description><link>https://blog.aifoundry.org/p/lamp-stack-for-ai</link><guid isPermaLink="false">https://blog.aifoundry.org/p/lamp-stack-for-ai</guid><dc:creator><![CDATA[Roman Shaposhnik]]></dc:creator><pubDate>Fri, 17 Oct 2025 06:29:47 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/176390459/f106a8cf583f455baf5fb4fbc033b188.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><strong>On July 15, in Berlin, we got together at the second edition of the AI Plumbers Conference, an open source meetup for low-level AI builders to dive deep into the plumbing of modern AI, from cutting-edge data infrastructure to AI accelerators. Take a look at how it was!</strong></p><p>Watch the <strong>#AIPlumbers</strong> presentation by <strong><a href="https://www.linkedin.com/preload/#">Roman Shaposhnik</a></strong>. It raises a lot of questions: what stack can the industry standardize on; by analogy with the LAMP stack of the previous era, who are the L, A, M and P of the <strong>#AIcomputing</strong> industry; and what role does <strong>#HW</strong> play in it?</p><p>Questions were raised, community discussions happened, and now we are coming back with some of the answers. 
And, as always, we would love community feedback, as we don&#8217;t want to be developing anything in isolation.</p><p><strong>Key moments from the talk:</strong></p><p>00:00 &#8211; 02:03 From the Internet Boom to the AI Era</p><p>02:04 &#8211; 05:20 The Rise of Digital Entrepreneurship</p><p>05:21 &#8211; 08:00 Scaling Up: From Closed Systems to Open Ecosystems</p><p>08:01 &#8211; 10:30 The Nvidia Monopoly and the New AI Reality</p><p>10:31 &#8211; 13:10 Rethinking the Modern AI Stack</p><p>13:11 &#8211; 14:39 What Enterprises Actually Use in Production</p><p>14:40 &#8211; 18:45 Building the Open-Source AI Infrastructure</p><p>18:46 &#8211; 21:53 Observations. Moving towards Collaborative and Sustainable AI Systems</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.aifoundry.org/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.aifoundry.org/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[Shaping the AI Landscape: Hugging Face on Community and Innovation]]></title><description><![CDATA[Tanya Dadasheva talking to VB (whatever needs doing @ Hugging Face), AI Plumbers Conference: 2nd edition]]></description><link>https://blog.aifoundry.org/p/shaping-the-ai-landscape-hugging</link><guid isPermaLink="false">https://blog.aifoundry.org/p/shaping-the-ai-landscape-hugging</guid><pubDate>Sun, 12 Oct 2025 09:45:25 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/175938483/ffd1f1e32bb63346647ed3c568cbae98.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>On July 15, in Berlin, we got together at the AI Plumbers Conference second edition &#8212; an open source meetup for low-level AI builders to dive deep into the plumbing of modern AI, from cutting-edge data infrastructure to AI accelerators. 
</p><p>With the next edition coming on the 25th of October 2025 (and this time in SF, USA!), let&#8217;s hear the advice and insights from the last edition on how to assist the open source community, especially in emerging sectors.</p><p>It was great to pick VB&#8217;s brain on where to find contributors and on the role of companies like Hugging Face in seeding communities and making different use cases more accessible. A great example is the #LeRobot arm and the hackathons around it. I&#8217;ve seen more and more of these arms in makerspaces and at different events, but what is even cooler is that I&#8217;ve seen a few startups picking up the SmolVLA model, which was developed for this almost-toy robot, and fine-tuning it for real-life use cases like excavators and warehouse robots. </p><p>Of course, we have ported the SmolVLA model to our HW to run inference locally, and we&#8217;ll be demoing it at the upcoming AIPlumbers event!</p><p>Key moments from the talk:</p><p>0:00 &#8211; 1:15 | Introduction &amp; Speaker Presentation</p><p>1:16 &#8211; 3:08 | Work at Hugging Face &amp; the Role of Open Source</p><p>3:09 &#8211; 5:10 | Engaging with the Community</p><p>5:11 &#8211; 6:37 | Favorite Use Cases of AI</p><p>6:38 &#8211; 8:22 | Local Models vs. 
SaaS</p><p>8:23 &#8211; 11:10 | Robotics &amp; Hugging Face</p><p>11:20 &#8211; 12:53 | Future &amp; Potential of AI</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.aifoundry.org/p/shaping-the-ai-landscape-hugging/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.aifoundry.org/p/shaping-the-ai-landscape-hugging/comments"><span>Leave a comment</span></a></p>]]></content:encoded></item><item><title><![CDATA[Porting models to dataflow architectures: From Joining the Discord to Drug Discovery]]></title><description><![CDATA[Roman Shaposhnik talking to Moritz Thuning, AI Plumbers Conference: 2nd edition]]></description><link>https://blog.aifoundry.org/p/porting-models-to-dataflow-architectures</link><guid isPermaLink="false">https://blog.aifoundry.org/p/porting-models-to-dataflow-architectures</guid><pubDate>Tue, 23 Sep 2025 10:42:45 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/174324232/9377a71c4537127df5b303b6e2a11452.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>On July 15, in Berlin, we got together at the AI Plumbers Conference second edition &#8212; an open source meetup for low-level AI builders to dive deep into the plumbing of modern AI, from cutting-edge data infrastructure to AI accelerators. Take a look at how it was!</p><p>How do you start in <a href="https://www.linkedin.com/search/results/all/?keywords=%23ai&amp;origin=HASH_TAG_FROM_FEED">#AI</a> as a developer, not just a user? We are pretty sure a lot of people have asked themselves this question, and it's broader than a question of skills. We would translate it to "how to find the right community, where what you are doing is non-trivial and useful but you don't have to do all the heavy lifting alone". 
<br><br>Well, <a href="https://www.linkedin.com/search/results/all/?keywords=%23opensource&amp;origin=HASH_TAG_FROM_FEED">#opensource</a> is always a good answer. The good news is that there are a lot of really novel areas where you will get the excitement of being on the frontier, being the first. <a href="https://www.linkedin.com/in/moritzthuening/">Moritz Th&#252;ning</a>'s journey is exactly like this: <br><br>stumbling upon the <a href="https://www.linkedin.com/company/tenstorrent-inc./">Tenstorrent</a> community, working with new and exciting HW, and porting models in the new and exciting research field of drug discovery. That's a way to do your PhD and beyond! Of course, we couldn't miss the opportunity at <a href="https://www.linkedin.com/search/results/all/?keywords=%23aiplumbers&amp;origin=HASH_TAG_FROM_FEED">#AIPlumbers</a> in Berlin to get a personal perspective, so instead of a formal talk, it's a friendly chat with sincere answers. </p><p> Key moments from the talk: </p><p>00:00 &#8211; 01:19 Intro &amp; First Impressions at AI Plumbers </p><p>01:20 &#8211; 02:33 Moritz&#8217;s Background &amp; Journey into AI </p><p>02:34 &#8211; 04:44 AlphaFold on Tenstorrent Hardware </p><p>04:45 &#8211; 06:29 Open-Source Culture &amp; Community Experience </p><p>06:30 &#8211; 08:15 Advice for Beginners in AI + Closing</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.aifoundry.org/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.aifoundry.org/subscribe?"><span>Subscribe now</span></a></p><p><br><br></p>]]></content:encoded></item><item><title><![CDATA[From laptop to production: building and scaling LLM-enabled apps with Open-Source tools]]></title><description><![CDATA[Karsten Gresch, AI Plumbers Conference: 2nd 
edition]]></description><link>https://blog.aifoundry.org/p/from-laptop-to-production-building</link><guid isPermaLink="false">https://blog.aifoundry.org/p/from-laptop-to-production-building</guid><pubDate>Fri, 05 Sep 2025 06:02:21 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/168944473/582c09d55c1f9d07ac4fa6cfbff825f5.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>On July 15, in Berlin, we got together at the AI Plumbers Conference second edition &#8212; an open source meetup for low-level AI builders to dive deep into the plumbing of modern AI, from cutting-edge data infrastructure to AI accelerators. Take a look at how it was!</p><p>Have you ever wondered if it&#8217;s possible to cover the full model development and deployment lifecycle with open tools only? Karsten walked us through the different stages with <s>almost real time</s> prerecorded demos. It still takes time, but hopefully one day the process will fit in the length of a talk. And hearing it from a Red Hat field engineer, you get a feel for how it works in production in the real world, not just on a single laptop.</p><p><strong>Key moments from the talk:</strong></p><p>1.00 &#8212; Introduction. 
Who is Karsten Gresch?</p><p>2.43 &#8212; The stages of the model lifecycle to be demoed and the tools used </p><p>5.37 &#8212; Demo 1 - Running model inference locally: Podman AI Lab, Granite open models</p><p>8.40 &#8212; Demo 2 - App communicating with a locally deployed LLM: Quarkus, LangChain4j</p><p>15.19 &#8212; Demo 3 - Retraining a model locally: InstructLab</p><p>24.55 &#8212; Demo 4 - Running models in production: Backstage</p><p>31.12 &#8212; Summary</p><p>32.15 &#8212; Q&amp;A</p><p></p><p><strong>The presentation slides are available here:</strong></p><div class="file-embed-wrapper" data-component-name="FileToDOM"><div class="file-embed-container-reader"><div class="file-embed-container-top"><image class="file-embed-thumbnail-default" src="https://substackcdn.com/image/fetch/$s_!0Cy0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack.com%2Fimg%2Fattachment_icon.svg"></image><div class="file-embed-details"><div class="file-embed-details-h1">Ai Plumbers Local Llms To Enterprise Ai From Your Laptop To Production</div><div class="file-embed-details-h2">5.41MB &#8729; PDF file</div></div><a class="file-embed-button wide" href="https://blog.aifoundry.org/api/v1/file/59d3a7e2-9c49-4db3-b699-deff93df20ca.pdf"><span class="file-embed-button-text">Download</span></a></div><a class="file-embed-button narrow" href="https://blog.aifoundry.org/api/v1/file/59d3a7e2-9c49-4db3-b699-deff93df20ca.pdf"><span class="file-embed-button-text">Download</span></a></div></div><p> </p>]]></content:encoded></item><item><title><![CDATA[Generative AI at Hugging Face]]></title><description><![CDATA[VB (GPU poor @Hugging Face), AI Plumbers Conference: 2nd edition]]></description><link>https://blog.aifoundry.org/p/generative-ai-at-hugging-face</link><guid isPermaLink="false">https://blog.aifoundry.org/p/generative-ai-at-hugging-face</guid><pubDate>Wed, 27 Aug 2025 11:23:15 GMT</pubDate><enclosure 
url="https://api.substack.com/feed/podcast/168934283/6b457b17d0dcd2c990505e95708b9624.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>On July 15, in Berlin, we got together at the AI Plumbers Conference second edition &#8212; an open source meetup for low-level AI builders to dive deep into the plumbing of modern AI, from cutting-edge data infrastructure to AI accelerators. Take a look at how it was!</p><p>Ever wondered how much companies use their own tools? This is an opportunity to look behind the curtain and see all the AI use cases that Hugging Face has implemented internally. Kicking off our event in Berlin, Vaibhav Srivastav (VB) of Hugging Face gave the most demo-intensive talk, explaining the implementations of those use cases, which models were used (it&#8217;s actually quite a variety), which pipelines, and some advice on how to start implementing your own use cases. Go try the example from the talk on HF and get inspired!</p><p><strong>Key moments from the talk:</strong></p><p>0.29 - Introduction of VB and the use of generative AI and ML in the company&#8217;s products</p><p>2.25 - The 8 useful AI use cases at Hugging Face that we are going to <s>deep</s> dive into</p><p>4.24 - Demo of Translation use case</p><p>6.20 - Summarization of research papers indexed on HF</p><p>7.50 - Emoji generation - SDXL LoRA used on the Hub</p><p>8.45 - Demo of Semantic Search on Spaces with natural language </p><p>11.18 - Demo of Semantic Search on Daily Papers </p><p>13.20 - Structured Generation &amp; Parsing - summarizing relevant papers from arXiv for use by researchers (highest level of automation and one of the coolest use cases from VB&#8217;s perspective) </p><p>16.45 - Demo of SQL Generation</p><p>20.05 - Presentation of the entire HF Hub acting as an MCP (Model Context Protocol) server </p><p>22.36 - Recommendations for using generative AI in your own projects</p><p></p><p>The presentation slides are available here:</p><div class="file-embed-wrapper" 
data-component-name="FileToDOM"><div class="file-embed-container-reader"><div class="file-embed-container-top"><image class="file-embed-thumbnail-default" src="https://substackcdn.com/image/fetch/$s_!0Cy0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack.com%2Fimg%2Fattachment_icon.svg"></image><div class="file-embed-details"><div class="file-embed-details-h1">Generative Ai At Hf</div><div class="file-embed-details-h2">5.23MB &#8729; PDF file</div></div><a class="file-embed-button wide" href="https://blog.aifoundry.org/api/v1/file/1e637fbe-f8d6-4e52-b023-debbf0e4243d.pdf"><span class="file-embed-button-text">Download</span></a></div><a class="file-embed-button narrow" href="https://blog.aifoundry.org/api/v1/file/1e637fbe-f8d6-4e52-b023-debbf0e4243d.pdf"><span class="file-embed-button-text">Download</span></a></div></div><p> </p>]]></content:encoded></item><item><title><![CDATA[TT-Boltz: AlphaFold 3 on Tenstorrent Wormhole]]></title><description><![CDATA[Moritz Th&#252;ning (Implementing AlphaFold 3 on Tenstorrent Wormhole | CS @TUM), AI Plumbers Conference: 2nd edition]]></description><link>https://blog.aifoundry.org/p/it-boltz-alphafold-3-on-tenstorrent</link><guid isPermaLink="false">https://blog.aifoundry.org/p/it-boltz-alphafold-3-on-tenstorrent</guid><pubDate>Thu, 21 Aug 2025 09:51:18 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/168936857/dff0c0735d3665abddc04b8328a45a37.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>On July 15, in Berlin, we got together at the AI Plumbers Conference second edition &#8212; an open source meetup for low-level AI builders to dive deep into the plumbing of modern AI, from cutting-edge data infrastructure to AI accelerators. Take a look at how it was!</p><p>If you know what AlphaFold is (and you should, it won a Nobel Prize) and are thinking that this talk requires you to have a PhD in biology, we have news for you! 
Sure, the creation of the model required a lot of research, but it doesn&#8217;t stop there. There are a lot of ways to contribute as an engineer by improving the way the model is run and the systems it&#8217;s running on. That way you can become part of this cutting-edge solution for predicting protein structures. </p><p>Watch Moritz Th&#252;ning demonstrate how he ported the model to a new HW architecture. There are definitely hacks required, as there is not yet a compiler that can just take any model and run it, but as Moritz demonstrates, it&#8217;s doable and absolutely worth it! There is a very special energy in taking fresh new research and being the first one to run it on certain HW. This kind of exercise truly gives you the end-to-end experience that #AIPlumbers is all about. And since it is all so new, there&#8217;s a lot of room for optimizations, and you can be the first one to implement them!</p><p><strong>Key moments from the talk:</strong></p><p>0:28 &#8212; Introduction to the TT Boltz project (running AlphaFold on Tenstorrent) and how Moritz ran into Tenstorrent and decided to do this project</p><p>3:22 &#8212; Dataflow architecture - is it a successful paradigm?</p><p>5:15 &#8212; Tenstorrent hardware overview</p><p>9:27 &#8212; Biology side of things - protein folding problem</p><p>10:23 &#8212; Results and architecture of AlphaFold 3 (a very weird diffusion model) </p><p>11:17 &#8212; Why Boltz exists - open source licensing restrictions of AlphaFold</p><p>12:08 &#8212; Profiling runtime on CPU to find modules with highest memory and time complexity</p><p>12:46 &#8212; Rewriting PyTorch modules (Pairformer and diffusion) in tt-nn + wrapper, integrating them back as an existence proof until there is a proper compiler</p><p>14:22 &#8212; Results - Prediction of a protein</p><p>14:51 &#8212; Performance across different hardware</p><p>15:58 &#8212; Triangle Self-attention (in Pairformer) and possible optimizations to fit it on the chip, trading memory complexity for time 
complexity, data/model/tensor parallelism </p><p>18:18 &#8212; Join the work on GitHub!</p><p></p><p>The presentation slides are available here:</p><div class="file-embed-wrapper" data-component-name="FileToDOM"><div class="file-embed-container-reader"><div class="file-embed-container-top"><image class="file-embed-thumbnail-default" src="https://substackcdn.com/image/fetch/$s_!0Cy0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack.com%2Fimg%2Fattachment_icon.svg"></image><div class="file-embed-details"><div class="file-embed-details-h1">Tt Boltz Berlin</div><div class="file-embed-details-h2">4.32MB &#8729; PDF file</div></div><a class="file-embed-button wide" href="https://blog.aifoundry.org/api/v1/file/6c64b4a2-fba5-40d2-b5ec-4eadad0bc238.pdf"><span class="file-embed-button-text">Download</span></a></div><a class="file-embed-button narrow" href="https://blog.aifoundry.org/api/v1/file/6c64b4a2-fba5-40d2-b5ec-4eadad0bc238.pdf"><span class="file-embed-button-text">Download</span></a></div></div><p> </p>]]></content:encoded></item><item><title><![CDATA[Overhauling vision support in llama.cpp and llama-server]]></title><description><![CDATA[Xuan-Son Nguyen (Engineer @Hugging Face), AI Plumbers Conference: 2nd edition]]></description><link>https://blog.aifoundry.org/p/overhauling-vision-support-in-llamacpp</link><guid isPermaLink="false">https://blog.aifoundry.org/p/overhauling-vision-support-in-llamacpp</guid><pubDate>Wed, 13 Aug 2025 08:20:32 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/168946268/a9882aa0644f08a1737fa0845a5b7e45.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>On July 15, in Berlin, we got together at the AI Plumbers Conference second edition &#8212; an open source meetup for low-level AI builders to dive deep into the plumbing of modern AI, from cutting-edge data infrastructure to AI accelerators. 
Take a look at how it was!</p><p>Community choices, a perfectionist overhaul and locally run demos: Xuan-Son Nguyen demonstrates how vision support was added to llama.cpp for multimodal use cases, along with the obstacles and the clever hacks. There is still work to do, so go try it on Hugging Face Spaces (of course) or locally and contribute to llama.cpp!</p><p><strong>Key moments from the talk:</strong></p><p>0:55 &#8212; Demo of llama-server running locally with the Qwen 3B omni model with image and audio input</p><p>3:21 &#8212; Introduction: who is Xuan-Son Nguyen</p><p>4:10 &#8212; A little bit about history - how multimodal works</p><p>6:10 &#8212; History - adding and removing multimodal (LLaVA) support in llama.cpp </p><p>9:21 &#8212; History - what caused the problems in the llava.cpp / clip.cpp implementation</p><p>10:45 &#8212; How to fix it?</p><p>12:08 &#8212; Enter libmtmd</p><p>13:12 &#8212; libmtmd architecture</p><p>16:50 &#8212; libmtmd: minimal, simple, well-documented API (adding audio support didn&#8217;t require API change!)</p><p>17:30 &#8212; LM Studio is one of the earliest adopters of libmtmd</p><p>18:10 &#8212; Demo of mtmd-cli</p><p>19:14 &#8212; Bringing this work to llama-server (some functionality) </p><p>21:55 &#8212; llama-server WebUI</p><p>23:58 &#8212; <a href="https://github.com/ngxson/smolvlm-realtime-webcam">Viral demo</a> - try it!</p><p>25:10 &#8212; TODO</p><p></p><p><strong>The presentation slides are available here:</strong></p><div class="file-embed-wrapper" data-component-name="FileToDOM"><div class="file-embed-container-reader"><div class="file-embed-container-top"><image class="file-embed-thumbnail-default" src="https://substackcdn.com/image/fetch/$s_!0Cy0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack.com%2Fimg%2Fattachment_icon.svg"></image><div class="file-embed-details"><div class="file-embed-details-h1">Ai Plumbers 2nd Edition</div><div class="file-embed-details-h2">1.77MB &#8729; PDF file</div></div><a 
class="file-embed-button wide" href="https://blog.aifoundry.org/api/v1/file/33c6eda1-e0d8-42e2-90ba-7946b2991879.pdf"><span class="file-embed-button-text">Download</span></a></div><a class="file-embed-button narrow" href="https://blog.aifoundry.org/api/v1/file/33c6eda1-e0d8-42e2-90ba-7946b2991879.pdf"><span class="file-embed-button-text">Download</span></a></div></div><p><strong> </strong></p>]]></content:encoded></item><item><title><![CDATA[How to Build a Community at an AI Hardware Company, AI Plumbers Conference]]></title><description><![CDATA[Podcast with Tanya Dadasheva & Shubham Saboo, Head of Developer Relations @ Tenstorrent]]></description><link>https://blog.aifoundry.org/p/how-to-build-a-community-at-an-ai</link><guid isPermaLink="false">https://blog.aifoundry.org/p/how-to-build-a-community-at-an-ai</guid><pubDate>Mon, 09 Jun 2025 10:48:47 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/164224377/288861a5a8895bb02936e02317a7bcd3.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p><strong>How do you build a strong developer community around hardware? And, an even more complicated task: how do you do it in a nascent market?</strong></p><p>In this conversation, Tanya sits down with Shubham Saboo, Head of Developer Relations at Tenstorrent, to dive into what it takes to connect developers, open-source contributors, and AI innovators across the stack, from conference insights to real-world engagement tactics.</p><p></p><p>00:30 &#8212; What does &#8220;community&#8221; mean for the Head of Developer Relations at Tenstorrent, and how did Shubham get started in this role? </p><p>02:08 &#8212; What conferences does Shubham attend to learn about model enhancement and the rest of the stack? </p><p>03:35 &#8212; What has Shubham learned from different conferences related to open source? 
</p><p>04:54 &#8212; Meeting developers, vendors, and model builders where they are</p><p>06:33 &#8212; A detailed look at Tenstorrent&#8217;s community-building approach</p><p>07:40 &#8212; Hacks and tips to engage people</p><p>10:08 &#8212; What excites Shubham, as Head of Developer Relations at Tenstorrent, about working in the AI ecosystem? </p><p></p><p>Whether you're building AI tools, contributing to open source, or simply passionate about community in tech &#8212; tune in and share with those shaping the future of AI, one connection at a time.</p>]]></content:encoded></item><item><title><![CDATA[Nerds Talking to Nerds, AI Plumbers Conference]]></title><description><![CDATA[Podcast with Roman Shaposhnik & Felix LeClair, HPC Specialist @ Tenstorrent]]></description><link>https://blog.aifoundry.org/p/nerds-talking-to-nerds-ai-plumbers</link><guid isPermaLink="false">https://blog.aifoundry.org/p/nerds-talking-to-nerds-ai-plumbers</guid><pubDate>Mon, 02 Jun 2025 12:47:49 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/164226023/1203bdaea69eca542c603c25dbf090a6.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>When two engineers sit down for a conversation, it rarely stays on the surface. Unfortunately, you can&#8217;t really film all of the conversations over beer or in a hallway track. For those who missed it at FOSDEM, we sneaked out during the AI Plumbers fringe event to film some of those unscripted discussions. 
Tune in if you enjoy hearing people who speak your language.</p><p></p><p>1:00 &#8212; Felix&#8217;s background and big start in the world of OpenBLAS and Formula 1</p><p>1:53 &#8212; Engineers having a real job in AI, finally!</p><p>2:50 &#8212; Open source plays a major role in fostering discussions like this and for people arguing about compilers </p><p>3:17 &#8212; &#8220;When you build something, you want it to be the best&#8221; - that&#8217;s the honesty OSS gives you</p><p>4:20 &#8212; A place to think about end-to-end problems - putting all the parts of the puzzle together</p><p>5:06 &#8212; The most mind-blowing CPU architecture for Felix</p><p>6:30 &#8212; Applying AI to chip development</p><p>7:33 &#8212; What compiler friends are talking about </p><p>8:32 &#8212; &#8220;Machines building machines&#8221; = AI-driven silicon design</p><p>9:03 &#8212; &#8220;Eating your own dog food&#8221; and doing it in open source </p><p>Felix&#8217;s insights are both experienced and bold &#8212; a reminder that if you&#8217;re building, build to be the best. 
Share with fellow nerds who love thinking at this level.</p>]]></content:encoded></item><item><title><![CDATA[Trevor Grant with presentation "Chain of Thought Reasoning And Other LLM Tricks"]]></title><description><![CDATA[(Re)visit the talk from the 1st edition of AI Plumbers]]></description><link>https://blog.aifoundry.org/p/trevor-grant-with-presentation-chain</link><guid isPermaLink="false">https://blog.aifoundry.org/p/trevor-grant-with-presentation-chain</guid><pubDate>Mon, 21 Apr 2025 12:16:00 GMT</pubDate><enclosure url="https://api.substack.com/feed/podcast/161304259/758f96d21b329f25a0e0586d2b0cd690.mp3" length="0" type="audio/mpeg"/><content:encoded><![CDATA[<p>Check out the talk by Trevor Grant, Apache Mahout PMC, at the AI Plumbers conference, a fringe event of the <a href="https://fosdem.org/2025/schedule/track/ai/">FOSDEM 2025 Low-level AI Engineering and Hacking DevRoom</a> in Brussels, Belgium.</p><p><strong>Key moments from the talk:</strong></p><p>2:07 - About the AI Alliance...</p><p>3:27 - Do I Need a Ridiculously Large Closed Source Model?</p><p>4:24 - An (Embarrassingly Quick) Overview of LLM Function Calling</p><p>10:21 - Overview of Chain of Thought</p><p>14:29 - Implementation - The Code</p><p>15:26 - Implementation - The Demo</p><p>15:45 - Do Math</p><p>16:30 - Hot Take - Your Laptop Is Not An LLM Server</p><p>18:25 - Hosting your LLM</p><p>The presentation slides are available here:</p><div class="file-embed-wrapper" data-component-name="FileToDOM"><div class="file-embed-container-reader"><div class="file-embed-container-top"><image class="file-embed-thumbnail-default" src="https://substackcdn.com/image/fetch/$s_!0Cy0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack.com%2Fimg%2Fattachment_icon.svg"></image><div class="file-embed-details"><div class="file-embed-details-h1">Trevor Grant @ AI Plumber Ghent</div><div class="file-embed-details-h2">2.42MB &#8729; PDF file</div></div><a class="file-embed-button wide" 
href="https://aifoundryorg.substack.com/api/v1/file/40f6764f-45e1-4d73-9bab-10f2e4bba199.pdf"><span class="file-embed-button-text">Download</span></a></div><a class="file-embed-button narrow" href="https://aifoundryorg.substack.com/api/v1/file/40f6764f-45e1-4d73-9bab-10f2e4bba199.pdf"><span class="file-embed-button-text">Download</span></a></div></div><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.aifoundry.org/p/trevor-grant-with-presentation-chain/comments&quot;,&quot;text&quot;:&quot;Leave a comment&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.aifoundry.org/p/trevor-grant-with-presentation-chain/comments"><span>Leave a comment</span></a></p><p></p>]]></content:encoded></item></channel></rss>