---
title: "Show HN: Xybrid – run LLM and speech locally in your app (no back end, Rust)"
slug: "show-hn-xybrid-run-llm-and-speech-locally-in-your-app-no-back-end-rust"
date: 2026-03-18
category: community
tags: []
language: en
sources_count: 1
featured: false
publisher: AInauten News
url: https://news.ainauten.com/en/story/show-hn-xybrid-run-llm-and-speech-locally-in-your-app-no-back-end-rust
---

# Show HN: Xybrid – run LLM and speech locally in your app (no back end, Rust)

**Published**: 2026-03-18 | **Category**: community | **Sources**: 1

---

## TL;DR

- Xybrid is a Rust library that embeds LLM and speech pipelines directly inside your app – no server, no daemon, just one binary.

---

## Summary

- Xybrid is a Rust library that embeds LLM and speech pipelines directly inside your app – no server, no daemon, just one binary.
- Supports GGUF, ONNX, and CoreML; integrations available for Flutter, Swift, Kotlin, Unity, and Tauri.
- On current smartphones the library achieves roughly 20 tok/s on Android and 40 tok/s on iOS with quantized ~3B models.
- Demo: 6 NPCs in a Unity tavern scene generate real-time dialogue entirely on-device – no API key, no internet, no per-request cost.
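The source does not show Xybrid's actual API, but the "embedded pipeline, one binary" idea can be sketched in plain Rust. The names below (`TextGenerator`, `LocalModel`, `generate`, the model path) are invented for illustration and are not Xybrid's real interface:

```rust
// Hypothetical sketch of an in-process text-generation pipeline:
// everything runs inside the app binary, with no server round-trip.
// All names here are illustrative, NOT Xybrid's actual API.

/// A text generator that lives inside the application process.
trait TextGenerator {
    fn generate(&self, prompt: &str, max_tokens: usize) -> String;
}

/// Stand-in for a quantized on-device model (e.g. a ~3B GGUF file
/// bundled with the app).
struct LocalModel {
    model_path: String,
}

impl TextGenerator for LocalModel {
    fn generate(&self, prompt: &str, max_tokens: usize) -> String {
        // A real implementation would run inference against the model
        // file here; this stub just echoes its inputs so the example
        // stays self-contained and runnable.
        format!(
            "[<= {} tokens, model {}] reply to: {}",
            max_tokens, self.model_path, prompt
        )
    }
}

fn main() {
    // No API key, no network: the model file ships with the app.
    let npc = LocalModel {
        model_path: "models/npc-3b-q4.gguf".into(),
    };
    let line = npc.generate("Greet the adventurer entering the tavern.", 64);
    println!("{}", line);
}
```

The point of the trait split is that the calling code (a game loop, a Flutter plugin) only sees a synchronous local call; there is no endpoint, key, or per-request billing anywhere in the path.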

---

## Why it matters

Shipping AI features usually means standing up and paying for a backend; Xybrid removes that requirement entirely. Because the LLM and speech pipelines are embedded in the app binary and run on-device, features work without an API key, without an internet connection, and without per-request cost.

---


## Nauti's Take

The fact that a Unity tavern scene with six talking NPCs is the most compelling proof point for a serious infrastructure library says a lot about how far on-device AI has come. Xybrid solves a real problem: shipping without a separate server sounds trivial, but that server is in practice the biggest friction point when adding embedded AI features. The 40 tok/s on iOS is impressive – as long as expectations stay in the 3B-model range and nobody expects a mid-range Android phone to match cloud-scale quality. Open-source, Rust, broad platform support, clear positioning: this is not a hype project, and it deserves a proper look.

---


## FAQ

**Q:** What is Xybrid?

**A:** A Rust library that embeds LLM and speech pipelines directly inside your app – no server, no daemon, just one binary.

**Q:** Why does it matter?

**A:** It removes the need for a separate backend: AI features ship inside the app binary and run entirely on-device, with no API key, no internet connection, and no per-request cost.

**Q:** What are the key takeaways?

**A:** Xybrid supports GGUF, ONNX, and CoreML, with integrations for Flutter, Swift, Kotlin, Unity, and Tauri. On current smartphones it reaches roughly 20 tok/s on Android and 40 tok/s on iOS with quantized ~3B models.

---

## Related Topics

- —

---

## Sources

- [Show HN: Xybrid – run LLM and speech locally in your app (no back end, Rust)](https://github.com/xybrid-ai/xybrid) - Hacker News AI

---

## About This Article

This article is a synthesis of 1 sources, curated and summarized by AInauten News. We aggregate AI news from trusted sources and provide bilingual (German/English) coverage.

**Publisher**: [AInauten](https://www.ainauten.com) | **Site**: [news.ainauten.com](https://news.ainauten.com)

---

*Last Updated: 2026-03-20*
