Show HN: Xybrid – run LLM and speech locally in your app (no back end, Rust)
TL;DR
Xybrid is a Rust library that embeds LLM and speech pipelines directly inside your app – no server, no daemon, just one binary.
Key Points
- Supports GGUF, ONNX, and CoreML; integrations available for Flutter, Swift, Kotlin, Unity, and Tauri.
- On current smartphones the library achieves roughly 20 tok/s on Android and 40 tok/s on iOS with quantized ~3B models.
- Demo: 6 NPCs in a Unity tavern scene generate real-time dialogue entirely on-device – no API key, no internet, no per-request cost.
Nauti's Take
The fact that a Unity tavern scene with six talking NPCs is the most compelling proof point for a serious infrastructure library says a lot about how far on-device AI has come. Xybrid solves a real problem: "no separate server" sounds trivial, but in practice it is the biggest friction point when shipping embedded AI features.
The 40 tok/s on iOS is impressive – as long as expectations stay in the 3B-model range and nobody expects a mid-range Android phone to match cloud-scale quality. Open-source, Rust, broad platform support, clear positioning: this is not a hype project, and it deserves a proper look.