Overhauling vision support in llama.cpp and llama-server

Xuan-Son Nguyen (Engineer @Hugging Face), AI Plumbers Conference: 2nd edition

On July 15, in Berlin, we got together for the second edition of the AI Plumbers Conference, an open-source meetup for low-level AI builders to dive deep into the plumbing of modern AI, from cutting-edge data infrastructure to AI accelerators. Take a look at how it went!

Community choices, a perfectionist overhaul, and locally run demos: Xuan-Son Nguyen demonstrates how vision support was added to llama.cpp for multimodal use cases, the obstacles along the way, and the clever hacks. There is still work to do, so go try it on Hugging Face Spaces (of course) or locally, and contribute to llama.cpp!
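To try it locally, here is a minimal sketch of querying a llama-server instance with an image, assuming the server was started with a vision-capable GGUF model and its --mmproj projector file; the model file names, port, and image path below are placeholders, and the request uses the OpenAI-compatible chat completions endpoint that llama-server exposes.

```python
# Minimal sketch: send an image to a locally running llama-server.
# Assumes the server was started with a vision-capable model, e.g.:
#   llama-server -m model.gguf --mmproj mmproj.gguf --port 8080
# (file names and port are placeholders)
import base64
import requests

# Encode a local image as a base64 data URI (path is a placeholder).
with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
}

# llama-server exposes an OpenAI-compatible chat completions endpoint.
resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```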

Key moments from the talk:

0:55 — Demo: running llama-server locally with a Qwen 3B omni model, with image and audio input

3:21 — Introduction: who is Xuan-Son Nguyen

4:10 — A little bit of history - how multimodal models work

6:10 — History - adding and removing multimodal (LLaVA) support in llama.cpp

9:21 — History - what caused the problems in the llava.cpp / clip.cpp implementation

10:45 — How to fix it?

12:08 — Enter libmtmd

13:12 — libmtmd architecture

16:50 — libmtmd: a minimal, simple, well-documented API (adding audio support didn't require an API change!)

17:30 — LM Studio is one of the earliest adopters of libmtmd

18:10 — Demo: mtmd-cli

19:14 — Bringing this work to llama-server (some of the functionality)

21:55 — llama-server WebUI

23:58 — Viral demo - try it!

25:10 — TODO

The presentation slides are available here:

AI Plumbers 2nd Edition
1.77MB ∙ PDF file
Download
