Kabir Khandpur

SWE-bench Multilingual · May 2025

Introducing a new dataset in the SWE-bench family with 300 curated tasks in 9 programming languages to evaluate LLMs on software engineering tasks.

How well can LLMs see? · Nov 2024

Creating a small benchmark to test how well multimodal language models can find specific objects in complex Where's Waldo-style illustrations.

Using the checker idiom to monitor software systems · Jul 2024

Exploring a missing piece of the software observability stack to monitor business logic.

Making Quartz site rebuilds 25x faster · Jun 2024

Implementing partial builds to reduce the time taken to live reload a static site by >99%.

Notes on Britain's finances · Jan 2024

Understanding how the UK raises and spends money, drawing from Paul Johnson's book Follow the Money.