Open-world evaluations for measuring frontier AI capabilitiesAI Snake OilSayash KapoorApr 16, 2026, 01:47 PMView OriginalIntroducing CRUX, a new project for evaluating AI on long, messy tasksView OriginalBack to List