On July 15 we got together in Berlin for the second edition of the AI Plumbers Conference — an open-source meetup for low-level AI builders to dive deep into the plumbing of modern AI, from cutting-edge data infrastructure to AI accelerators. Take a look at how it was!
If you know what AlphaFold is (and you should, it won a Nobel Prize) and are thinking this talk requires a PhD in biology, we have news for you! Sure, creating the model took a lot of research, but it doesn't stop there. There are plenty of ways to contribute as an engineer by improving how the model is run and the systems it runs on. That way you too can become part of a cutting-edge solution that predicts protein structures.
Watch Moritz Thüning demonstrate how he ported the model to a new hardware architecture. Some hacks are definitely required, since there is not yet a compiler that can just take any model and run it, but as Moritz shows, it's doable and absolutely worth it! There is a very special energy in taking fresh research and being the first to run it on a particular piece of hardware. This kind of exercise gives you exactly the end-to-end experience that #AIPlumbers is all about. And since it's all so new, there is plenty of room for optimizations, and you can be the first to implement them!
Key moments from the talk:
0:28 — Introduction to the TT Boltz project (running AlphaFold on Tenstorrent) and how Moritz ran into Tenstorrent and decided to do this project
3:22 — Dataflow architecture: is it a successful paradigm?
5:15 — Tenstorrent hardware overview
9:27 — The biology side of things: the protein folding problem
10:23 — Results and architecture of AlphaFold 3 (a very weird diffusion model)
11:17 — Why Boltz exists: the licensing restrictions on AlphaFold
12:08 — Profiling the runtime on CPU to find the modules with the highest memory and time complexity (see the profiling sketch after this list)
12:46 — Rewriting the PyTorch modules (Pairformer and diffusion) in tt-nn plus a wrapper, and integrating them back as an existence proof until there is a proper compiler (see the wrapper sketch after this list)
14:22 — Results: prediction of a protein
14:51 — Performance across different hardware
15:58 — Triangle self-attention (in the Pairformer) and possible optimizations to fit it on the chip: trading memory complexity for time complexity, and data/model/tensor parallelism (see the chunking sketch after this list)
18:18 — Join the work on GitHub!
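If you want to try the profiling step from 12:08 yourself, a minimal sketch using PyTorch's built-in profiler looks roughly like this. The model below is a stand-in, not Boltz itself:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Stand-in model: any torch.nn.Module works the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(256, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 256),
)
x = torch.randn(32, 256)

# Record per-operator CPU time and memory during one forward pass.
with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    with torch.no_grad():
        model(x)

# Rank operators by self CPU time (or sort by "self_cpu_memory_usage")
# to find the modules worth porting first.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```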
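The wrapper trick from 12:46 is, in essence, a torch.nn.Module whose forward pass round-trips through tt-nn, so the ported piece can be dropped back into the otherwise unmodified PyTorch model. Here is a minimal sketch, assuming the ttnn Python API (from_torch, matmul, to_torch) and using an illustrative linear layer rather than Moritz's actual modules:

```python
import torch
import ttnn

class TTLinear(torch.nn.Module):
    """Hypothetical wrapper: torch tensors in, torch tensors out,
    with the matmul itself running on the Tenstorrent device."""

    def __init__(self, linear: torch.nn.Linear, device):
        super().__init__()
        self.device = device
        # Upload the weight once, transposed to (in, out), in tiled bf16.
        self.weight = ttnn.from_torch(
            linear.weight.T.contiguous(),
            dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=device,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tt_x = ttnn.from_torch(
            x, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=self.device
        )
        tt_y = ttnn.matmul(tt_x, self.weight)  # runs on-device
        return ttnn.to_torch(tt_y).to(x.dtype)

device = ttnn.open_device(device_id=0)
layer = TTLinear(torch.nn.Linear(256, 256, bias=False), device)
y = layer(torch.randn(32, 256))  # existence proof: drop-in replacement
ttnn.close_device(device)
```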
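And the memory-for-time trade mentioned at 15:58 can be illustrated in plain PyTorch: instead of materializing the full attention score matrix, process queries in chunks so only a small score block is live at a time. This is a generic chunking sketch, not the actual triangle-attention kernel, and the chunk size is an assumed tuning knob:

```python
import torch
import torch.nn.functional as F

def chunked_attention(q, k, v, chunk=128):
    """Attention over query chunks: peak memory drops from the full
    O(N^2) score matrix to O(chunk * N), at the cost of a loop."""
    outs = []
    for i in range(0, q.shape[-2], chunk):
        qc = q[..., i:i + chunk, :]
        scores = qc @ k.transpose(-1, -2) / k.shape[-1] ** 0.5
        outs.append(F.softmax(scores, dim=-1) @ v)
    return torch.cat(outs, dim=-2)

q = k = v = torch.randn(1, 4096, 64)
out = chunked_attention(q, k, v)  # matches full attention numerically
```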
The presentation slides are available here: