I investigate inefficiencies in production LLM systems. Sometimes the fix goes upstream; sometimes I build the tool myself.
Investigations, reproducers, and the occasional upstream fix.
- A 13-month-old LlamaIndex bug re-embeds unchanged content
A 13-month-old hashing default in LlamaIndex silently re-embeds byte-identical content. The bug fires on half the supported storage backends today; the other half is one upstream commit away from activating.
- A code graph for Claude Code cut my investigation tokens by 59%
AI coding agents burn tokens reconstructing codebase structure on every session: grep, read, grep again, piece together what a symbol graph already knows. I built CodeGraph, a local MCP server that parses a repo with tree-sitter and exposes a pre-computed call graph to Claude Code through six tools. I use it daily to keep Claude Code token bills sane. A headless benchmark on a 484-file FastAPI stack measured a 59% drop in tokens, a 60% drop in turns, and 82 seconds less wall time per investigation, with file-level recall held at 100%.
- The Chrome extension 60,000 people use to keep ChatGPT fast
I built ChatGPT LightSession to fix a slowdown in my own long threads. Sixty thousand people have installed it since, and the number keeps climbing. The extension trims the conversation JSON on the way in, keeping long sessions responsive without touching a server.
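The trimming idea behind LightSession can be sketched in a few lines. This is a minimal illustration, not the extension's actual code: it assumes a hypothetical payload with a flat `messages` list, and the names `trim_conversation` and `keep_last` are invented for this example. The real conversation JSON may be shaped differently, and the extension itself runs in the browser rather than in Python.

```python
from typing import Any


def trim_conversation(payload: dict[str, Any], keep_last: int = 20) -> dict[str, Any]:
    """Return a copy of `payload` holding only the most recent messages.

    Assumption: `payload["messages"]` is a flat, oldest-first list.
    That shape is hypothetical and chosen only for illustration.
    """
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")

    messages = payload.get("messages", [])

    # Shallow-copy the top level so the caller's payload is left untouched.
    trimmed = dict(payload)
    # Keep the tail of the list: the newest `keep_last` messages.
    trimmed["messages"] = messages[-keep_last:]
    return trimmed


# Usage: a 50-message thread trimmed down to its 5 newest messages.
thread = {"title": "long thread", "messages": [{"id": i} for i in range(50)]}
short = trim_conversation(thread, keep_last=5)
```

Trimming a copy rather than mutating the input keeps the full history available to whatever produced it, which matches the "without touching a server" constraint: only what the page has to render shrinks.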