Hi, I'm
Xuan-Son Nguyen
Software engineer at Hugging Face
What I am doing
- Writing blog posts about LLMs
- Core maintainer of llama.cpp
- Hacking real-time vision models
- Ollama - Hugging Face integration

My slogan

I do AI for its science and its power to shape the future.
Profit is not my purpose - impact is.

Where to find me
My favorite recipe

Ingredients
- A computer (GPU is not required)
- A modern OS (Linux, Mac, Windows)
- 10GB+ of disk space
- A terminal or command-line interface
- Maybe a cup of coffee or tea, whatever
Step 1: Install llama.cpp
Right, listen up! Installing llama.cpp is simple!
Follow this beautiful install guide for your OS.
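If the guide isn't to hand, here's one common route per OS - a sketch, assuming you have Homebrew on Mac/Linux or winget on Windows (building from source with CMake works everywhere):

```shell
# Mac / Linux, via Homebrew (assumes brew is installed)
brew install llama.cpp

# Windows, via winget (assumes winget is available)
winget install llama.cpp

# Or build from source - assumes git and cmake are installed
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```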
Step 2: Pick a model
I've teamed up with the absolute legends at LM Studio, Bartowski, and Unsloth to serve you the most exquisite, perfectly quantized GGUF models. Models around 8 billion parameters? Chef's kiss - that's your sweet spot, the perfect balance between performance and quality. Beautiful!
And here's the beautiful part - no manual downloads, no faff! Just grab the model's Hugging Face repository name in the format <user>/<model>. Write it down, respect it - we'll need this golden ticket for the next step.
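To make the format concrete, here's what such a repo name looks like - the specific name below is just one plausible pick from Bartowski's GGUF collection, and the :tag suffix for pinning a quantization is shown as an assumption:

```shell
# A repo name in <user>/<model> form (example pick):
#   bartowski/Meta-Llama-3.1-8B-Instruct-GGUF
# Optionally pin a specific quantization with a :tag suffix:
#   <user>/<model>:Q4_K_M
```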
Step 3: Fire it up!
Command-line warriors, this one's for you - clean and simple:
llama-cli -hf <user>/<model>
Or maybe you want the full restaurant experience? Spawn a server and get that gorgeous web UI at http://127.0.0.1:8080 - it's stunning!
llama-server -hf <user>/<model>
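Once the server is up, you can also skip the web UI entirely and talk to it from the terminal - a minimal sketch, assuming the default host and port and the OpenAI-compatible endpoint that llama-server exposes:

```shell
# Ask the running llama-server a question via its OpenAI-compatible API
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Explain GGUF in one sentence."}
    ]
  }'
```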
The secret sauce: Multimodal magic!
Listen, I didn't just add vision and audio support to llama.cpp - I perfected it! Grab any compatible model from this incredible collection and upload images or audio files straight through the web UI. It's so smooth, so elegant - even Gordon Ramsay would be proud!
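Prefer the terminal to the web UI? Something like this should do - a sketch, assuming the llama-mtmd-cli multimodal tool from a recent llama.cpp build, and a vision-capable GGUF repo (the repo name here is just an example pick):

```shell
# Describe a local image with a multimodal model (repo name is an example)
llama-mtmd-cli -hf ggml-org/gemma-3-4b-it-GGUF \
  --image ./photo.jpg \
  -p "Describe this image in detail."
```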
