
Show HN: Xybrid – run LLM and speech locally in your app (no back end, Rust)

TL;DR

Xybrid is a Rust library that embeds LLM and speech pipelines directly inside your app – no server, no daemon, just one binary.

Key Points

  • Supports GGUF, ONNX, and CoreML; integrations available for Flutter, Swift, Kotlin, Unity, and Tauri.
  • On current smartphones the library achieves roughly 20 tok/s on Android and 40 tok/s on iOS with quantized ~3B models.
  • Demo: 6 NPCs in a Unity tavern scene generate real-time dialogue entirely on-device – no API key, no internet, no per-request cost.
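As a back-of-the-envelope check, the quoted throughput figures translate into per-reply latency as follows. This is a sketch using the numbers from the post; the 100-token reply length is an assumed example, not something the project states:

```rust
// Rough latency estimate: tokens to generate divided by throughput (tok/s).
fn reply_latency_secs(reply_tokens: f64, tokens_per_sec: f64) -> f64 {
    reply_tokens / tokens_per_sec
}

fn main() {
    // Throughput quoted in the post for quantized ~3B models.
    let android_tps = 20.0;
    let ios_tps = 40.0;

    // Assumed example: a 100-token NPC reply.
    let reply_tokens = 100.0;

    println!("Android: {:.1} s", reply_latency_secs(reply_tokens, android_tps)); // 5.0 s
    println!("iOS:     {:.1} s", reply_latency_secs(reply_tokens, ios_tps));     // 2.5 s
}
```

In other words, a short NPC line arrives in a few seconds on a phone, which matches the demo's real-time dialogue claim as long as replies stay short.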

Nauti's Take

The fact that a Unity tavern scene with six talking NPCs is the most compelling proof point for a serious infrastructure library says a lot about how far on-device AI has come. Xybrid addresses a real problem: the 'no separate server' pitch sounds trivial, but in practice standing up and operating a backend is the biggest friction point when shipping embedded AI features.

The 40 tok/s on iOS is impressive, as long as expectations stay in the 3B-model range and nobody expects a mid-range Android phone to match cloud-scale quality. Open-source, Rust, broad platform support, clear positioning: this is not a hype project, and it deserves a proper look.
