
Engineering blog

How we build eval infrastructure for AI-generated code. Architecture decisions, benchmarks, and lessons from production.

How runlit works

A technical deep-dive into the eval pipeline that scores AI-generated code before it ships. Four signals, one score, sub-4-second latency.