On July 15 in Berlin, we got together for the second edition of the AI Plumbers Conference - an open-source meetup for low-level AI builders to dive deep into the plumbing of modern AI, from cutting-edge data infrastructure to AI accelerators. Take a look at how it was!
Community choices, a perfectionism overhaul, and locally run demos: Xuan-Son Nguyen demonstrates how vision support was added to llama.cpp for multimodal use cases, the obstacles along the way, and the clever hacks. There is still work to do - go try it on Hugging Face Spaces (of course) or locally, and contribute to llama.cpp!
Key moments from the talk:
0:55 — Demo: running llama-server locally with a 3B Qwen Omni model, with image and audio input
3:21 — Introduction: who is Xuan-Son Nguyen
4:10 — A bit of history: how multimodal support works
6:10 — History: adding and removing multimodal (LLaVA) support in llama.cpp
9:21 — History: what caused the problems in the llava.cpp / clip.cpp implementation
10:45 — How to fix it?
12:08 — Enter libmtmd
13:12 — libmtmd architecture
16:50 — libmtmd: a minimal, simple, well-documented API (adding audio support didn’t require an API change!)
17:30 — LM Studio is one of the earliest adopters of libmtmd
18:10 — Demo: mtmd-cli
19:14 — Bringing this work to llama-server (some of the functionality)
21:55 — llama-server WebUI
23:58 — Viral demo - try it!
25:10 — TODO
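If you want to reproduce the kind of local demos shown in the talk, here is a minimal sketch of running llama.cpp's multimodal tooling from the command line. The model and projector file names below are placeholders, and the flags reflect recent llama.cpp builds; check your build's `--help` output for the exact options.

```shell
# One-off multimodal inference with the mtmd CLI.
# model.gguf / mmproj.gguf are placeholder paths: use a multimodal
# GGUF model plus its matching multimodal projector file.
llama-mtmd-cli -m model.gguf --mmproj mmproj.gguf \
  --image photo.jpg -p "Describe this image."

# Serve the same model over HTTP (with the built-in WebUI) instead:
llama-server -m model.gguf --mmproj mmproj.gguf --port 8080
```

The key design point from the talk is visible here: the text model and the multimodal projector ship as separate GGUF files, and libmtmd wires them together behind one small API, so the same pair of flags works for both the CLI and the server.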
The presentation slides are available here: