Show HN: Xybrid – run LLM and speech locally in your app (no back end, Rust)
TL;DR
Xybrid is a Rust library that embeds LLM and speech pipelines directly inside your app – no server, no daemon, just one binary.
Key Points
- Supports GGUF, ONNX, and CoreML; integrations available for Flutter, Swift, Kotlin, Unity, and Tauri.
- On current smartphones the library achieves roughly 20 tok/s on Android and 40 tok/s on iOS with quantized ~3B models.
- Demo: 6 NPCs in a Unity tavern scene generate real-time dialogue entirely on-device – no API key, no internet, no per-request cost.
Nauti's Take
The fact that a Unity tavern scene with six talking NPCs is the most compelling proof point for a serious infrastructure library says a lot about how far on-device AI has come. Xybrid solves a real problem: "no separate server" sounds trivial, but in practice it is the biggest friction point when shipping embedded AI features.
The 40 tok/s on iOS is impressive – as long as expectations stay in the 3B-model range and nobody expects a mid-range Android phone to match cloud-scale quality. Open-source, Rust, broad platform support, clear positioning: this is not a hype project, and it deserves a proper look.