Roman Shaposhnik "~~Attention~~ ~~Transformers~~ llama.cpp is all you need"

(Re)visit the talk from the 1st edition of AI Plumbers.
After a whole day of talks at the AI DevRoom at FOSDEM and the talks track at AI Plumbers, this was the final presentation, meant to spark controversy and raise questions for the unconference session that followed.

Key moments from the talk:
0:30 - Picking up where "The Local AI Rebellion" left off
1:15 - Inference vs. training needs - inference vs. training chips
3:03 - What kind of silicon architecture would work best
4:25 - Frameworks, compilers, "stuff" that makes a system 
6:18 - History lesson on inference frameworks - how we ended up where we are
18:00 - Ancient history lesson - recognizing similar patterns
19:57 - Refactoring? Why are things so complicated? What is the right level of abstraction? Figuring it out is on us!
26:13 - The tinygrad approach - RISC for op types?
30:00 - Ongoing work, predictions, and AIFoundry principles

The presentation slides are available here:
Roman Shaposhnik @ AI Plumbers Ghent
7.17MB ∙ PDF file
Download
P.S. Don't miss the hot takes if you want to be in on all the inside jokes: look for the discussion of the main use case for local AI (if you know, you know - thank you, kobold.cpp), the "uncovering" of hidden dependencies in ML frameworks, some blackhat use cases, and much more!
