
Show HN: Dewey – Ingest docs, search semantically, get cited AI answers

TL;DR

Dewey is a RAG framework that models documents, sections, and chunks as first-class API primitives rather than treating a PDF as a flat bag of paragraphs.

Key Points

  • A 'section manifest' provides the full heading hierarchy with byte offsets, letting agents scan document structure cheaply before committing to full chunk retrieval.
  • The /research endpoint runs an agent capable of multi-hop reasoning across multiple documents, enabling synthesis of conflicting findings from different sources.
  • The primary target use case is scientific literature, where standard RAG (embed, top-k, generate) breaks down when research depth is required.
  • Answers include citations at the section level, allowing results to be traced back to the exact passage in a methods section or similar.
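
To make the section manifest idea concrete, here is a minimal sketch of how a heading hierarchy with byte offsets could be built. It is not Dewey's actual API or schema: `SectionEntry`, `build_manifest`, and the markdown-style `#` heading convention are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectionEntry:
    """One row of a hypothetical section manifest (not Dewey's real schema)."""
    path: tuple[str, ...]  # heading hierarchy, e.g. ("Methods", "Sampling")
    start: int             # byte offset where the section begins
    end: int               # byte offset where the section ends (exclusive)


def build_manifest(doc: bytes) -> list[SectionEntry]:
    """Scan markdown-style headings and record each section's byte range."""
    headings = []  # (byte offset, heading level, title)
    offset = 0
    for line in doc.splitlines(keepends=True):
        text = line.decode("utf-8").strip()
        if text.startswith("#"):
            level = len(text) - len(text.lstrip("#"))
            headings.append((offset, level, text.lstrip("#").strip()))
        offset += len(line)

    manifest = []
    stack = []  # open headings as (level, title), outermost first
    for i, (start, level, title) in enumerate(headings):
        # Close any headings at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, title))
        # A section runs until the next heading at the same or a higher level.
        end = len(doc)
        for next_start, next_level, _ in headings[i + 1:]:
            if next_level <= level:
                end = next_start
                break
        manifest.append(SectionEntry(tuple(t for _, t in stack), start, end))
    return manifest
```

An agent holding only this manifest can read the outline of a long paper and later slice `doc[entry.start:entry.end]` for the few sections it decides are worth retrieving in full.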

Nauti's Take

The core insight isn't new – the RAG community has been discussing hierarchical chunking since at least 2023 – but Dewey ships it as a clean API rather than a research proof-of-concept, and that is the meaningful step forward. The section manifest as a scan layer before actual retrieval is essentially a classic database index pattern applied to document structure: query the index first, then fetch the rows.
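
The index-first pattern can be sketched in a few lines. Everything here is hypothetical: `MANIFEST`, `CHUNKS`, `scan_manifest`, and `retrieve` are illustrative names, and plain keyword overlap stands in for whatever semantic matching Dewey actually performs.

```python
from typing import Callable

# Hypothetical manifest: heading path -> section id. Small enough to scan in full.
MANIFEST = {
    ("Introduction",): "s1",
    ("Methods", "Participants"): "s2",
    ("Methods", "Statistical Analysis"): "s3",
    ("Results",): "s4",
}

# Hypothetical chunk store; stands in for the expensive per-section fetch.
CHUNKS = {
    "s1": ["Prior work on the topic."],
    "s2": ["We recruited 40 adults."],
    "s3": ["We used a mixed-effects model.", "Alpha was set to 0.05."],
    "s4": ["The effect was significant."],
}


def scan_manifest(query: str, limit: int = 2) -> list[tuple[tuple[str, ...], str]]:
    """Step 1: rank sections by keyword overlap with their heading path."""
    terms = set(query.lower().split())
    scored = []
    for path, section_id in MANIFEST.items():
        words = set(" ".join(path).lower().split())
        overlap = len(terms & words)
        if overlap:
            scored.append((overlap, path, section_id))
    scored.sort(key=lambda row: -row[0])  # best match first; ties keep manifest order
    return [(path, section_id) for _, path, section_id in scored[:limit]]


def retrieve(
    query: str,
    fetch: Callable[[str], list[str]] = CHUNKS.__getitem__,
) -> list[dict]:
    """Step 2: fetch chunks only for the sections the scan selected."""
    results = []
    for path, section_id in scan_manifest(query):
        for chunk in fetch(section_id):
            # Each chunk carries its heading path as a section-level citation.
            results.append({"citation": " > ".join(path), "text": chunk})
    return results
```

Calling `retrieve("statistical analysis methods")` touches only the two Methods subsections and never loads the Introduction or Results chunks, which is the same saving a database gets from consulting an index before reading rows.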

Running a full agent loop in /research rather than just returning top-k results is ambitious, but quality will depend heavily on how well the source documents are structured to begin with. Poorly OCR'd PDFs or documents lacking consistent headings will push the system to its limits quickly.
