Roman Shaposhnik "~~Attention~~ ~~Transformers~~ llama.cpp is all you need"

(Re)visit the talk from the 1st edition of AI Plumbers.
After a whole day of talks at the AI DevRoom at FOSDEM and the talks track at AI Plumbers, this was the final presentation, meant to spark controversy and raise questions for the unconference session that followed.

Key moments from the talk:
0:30 - Picking up where "The Local AI Rebellion" left off
1:15 - Inference vs. training needs - inference vs. training chips
3:03 - What kind of silicon architecture would work best
4:25 - Frameworks, compilers, "stuff" that makes a system 
6:18 - History lesson on inference frameworks - how we ended up where we are
18:00 - Ancient history lesson - recognizing similar patterns
19:57 - Refactoring? Why are things so complicated? What is the right level of abstraction? Figuring it out is on us!
26:13 - The tinygrad approach - RISC for op types?
30:00 - Ongoing work, predictions, and AIFoundry principles

The presentation slides are available here:
Roman Shaposhnik @ AI Plumbers Ghent
7.17MB ∙ PDF file
Download
P.S. Don't miss the hot takes if you want to be in on all the inside jokes: look for the discussion of the main use case for local AI (if you know, you know - thank you, kobold.cpp), the "uncovering" of hidden dependencies in ML frameworks, some blackhat use cases, and much more!
