On Feb 2, 2026, in Brussels, we got together at the second edition of the AI Plumbers FOSDEM fringe event, where Ben Burtenshaw from Hugging Face broke down why optimized kernels are critical for real-world deep learning performance and how the Hugging Face Kernels ecosystem makes them easier to build and use.
He covered memory-bound bottlenecks, the kernel-builder workflow, reproducible multi-hardware builds with Nix, and practical PyTorch/Transformers integration patterns that reduce setup time from hours to seconds.
Key moments from the talk:
0:00 Intro and speaker background
1:35 Why Hugging Face Kernels matters
2:05 Compute vs memory bottlenecks in deep learning
3:30 Fused kernels and why they speed things up
5:05 Talk agenda and ecosystem overview
5:35 Kernel pain points: fragmentation and long installs
7:12 Supporting older, cheaper hardware for the community
8:18 Goal: from CMake errors to one-line kernel usage
8:54 Kernels + kernel-builder architecture
10:00 Reproducible builds with Nix and support matrix
11:5 Kernel project structure (`build.toml`, sources, ton)
12:23 Publishing kernels to the Hugging Face Hub
13:03 Real-world gain: faster FlashAttention setup
14:18 Docs, repos, and how to get started
16:00 Verifying compatibility and loading kernels in Python
17:20 Managing local cache with ls`
17:55 Kernelizing PyTorch layers with hpgs
19:23 Transformers integration (`use_ke`)
20:48 Performance chart and closing resources
