AI document Q&A for legal-tech SaaS (anonymised)

Rebuilt a retrofitted GPT Q&A feature as an AI-native pipeline: pgvector retrieval, reranking, structured outputs, an eval harness, and an LLM gateway with caching. Cost per question dropped 74%, median latency dropped 77%, and the hallucination rate fell below 1%.

Section 01 Problem

What we were asked to solve.

The client had a GPT wrapper retrofitted onto their existing legal research product. The feature was slow and expensive, and its 6% hallucination rate was noticeable. They were considering abandoning the feature.

Section 02 Approach

How we engineered it.

Three-week engagement. Week 1: migrated document embeddings into pgvector, designed the reranking pipeline, and built a 40-case eval set. Week 2: migrated to structured outputs and built an LLM gateway with caching and model routing. Week 3: production cut-over, with two days of shadow-mode comparison followed by full rollout behind a feature flag.
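
The Week 2 gateway can be pictured roughly as follows. This is a minimal sketch, not the client's implementation: the model names, the length-based routing rule, and the `fake_model` stub are all hypothetical, and a production gateway would call a provider SDK and would likely use a shared cache rather than an in-process dictionary.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical model tiers; the case study does not name the models used.
CHEAP_MODEL = "small-model"
STRONG_MODEL = "large-model"


def route(question: str, context: str) -> str:
    """Pick a model tier. Assumed rule: short prompts go to the cheap tier."""
    return CHEAP_MODEL if len(question) + len(context) < 2000 else STRONG_MODEL


def cache_key(model: str, question: str, context: str) -> str:
    """Hash the normalised request so equivalent questions share one entry."""
    normalised = " ".join(question.lower().split())
    payload = "\x1f".join([model, normalised, context])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class Gateway:
    # call_model takes (model, question, context) and returns the answer text.
    call_model: Callable[[str, str, str], str]
    cache: Dict[str, str] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def ask(self, question: str, context: str) -> str:
        model = route(question, context)
        key = cache_key(model, question, context)
        if key in self.cache:
            # Cache hit: no model call, so no added cost or model latency.
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        answer = self.call_model(model, question, context)
        self.cache[key] = answer
        return answer


def fake_model(model: str, question: str, context: str) -> str:
    """Stand-in for a provider call so the sketch runs offline."""
    return f"[{model}] answer to: {question.strip()}"


gateway = Gateway(call_model=fake_model)
first = gateway.ask("What is the limitation period?", "short excerpt")
# Same question with different casing and spacing: served from the cache.
second = gateway.ask("what is the   limitation period?", "short excerpt")
```

In this sketch the second call never reaches the model. Repeated questions are one plausible source of both cost and latency savings, though the case study does not break the savings down by cause.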

Section 03 Stack

What we built with.

· Next.js

· Postgres + pgvector

· LangChain

· Langfuse

Section 04 Outcome

What shipped.

Result 01

Cost per question: $0.08 → $0.021 (74% reduction)

Result 02

Median latency: 4s → 900ms (77% reduction)

Result 03

Hallucination rate: 6% → under 1% (over 8 weeks post-launch)

Result 04

Monthly AI bill: $4,800 → $1,260
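
As a cross-check on Results 01, 02 and 04, the percentages can be recomputed from the before and after figures. The sketch below uses only the numbers reported above. The implied monthly question volume is an inference from dividing the bill by the per-question cost, not a figure stated in the case study.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100


cost_reduction = reduction_pct(0.08, 0.021)    # cost per question, USD
latency_reduction = reduction_pct(4000, 900)   # median latency, ms
bill_reduction = reduction_pct(4800, 1260)     # monthly AI bill, USD

# Inferred volume: monthly bill divided by cost per question.
questions_before = round(4800 / 0.08)
questions_after = round(1260 / 0.021)

print(f"cost per question: {cost_reduction:.2f}% lower")
print(f"median latency:    {latency_reduction:.2f}% lower")
print(f"monthly bill:      {bill_reduction:.2f}% lower")
print(f"implied questions per month: {questions_before} before, {questions_after} after")
```

The cost and bill reductions both come to 73.75%, reported as 74%, and the latency reduction comes to 77.5%, reported as 77%. Both bill figures imply about 60,000 questions per month, so the per-question and monthly numbers are consistent with each other at constant volume.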

Result 05

40 evals shipped in CI to catch regressions
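
A CI eval gate of this kind might look like the sketch below. It is illustrative only: the case format, the `answer_question` stub, the substring-based grading, and the pass threshold are assumptions, since the case study does not describe how the 40 cases are scored.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    question: str
    must_contain: List[str]      # phrases a grounded answer should include
    must_not_contain: List[str]  # phrases that would signal a fabricated claim


def passes(case: EvalCase, answer: str) -> bool:
    """Case-insensitive substring grading (an assumed, deliberately simple rule)."""
    text = answer.lower()
    has_required = all(p.lower() in text for p in case.must_contain)
    has_forbidden = any(p.lower() in text for p in case.must_not_contain)
    return has_required and not has_forbidden


def run_evals(
    cases: List[EvalCase],
    answer_fn: Callable[[str], str],
    min_pass_rate: float = 1.0,
) -> bool:
    """Return True when the share of passing cases meets the threshold."""
    results = [passes(case, answer_fn(case.question)) for case in cases]
    return sum(results) / len(results) >= min_pass_rate


# Hypothetical cases; a real set would be drawn from the client's documents.
CASES = [
    EvalCase("Which clause covers termination?", ["clause 12"], ["clause 99"]),
    EvalCase("What is the notice period?", ["30 days"], []),
]


def answer_question(question: str) -> str:
    """Stub standing in for the real Q&A pipeline so the sketch runs offline."""
    canned = {
        "Which clause covers termination?": "Termination is covered by Clause 12.",
        "What is the notice period?": "The notice period is 30 days.",
    }
    return canned[question]


def main() -> int:
    """Exit code for CI: 0 when the eval set passes, 1 otherwise."""
    return 0 if run_evals(CASES, answer_question) else 1


exit_code = main()
```

A CI job would pass `main()` to `sys.exit`, so a non-zero exit code fails the build when a prompt, model, or retrieval change breaks a previously passing case.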

