Perspectives on AI evaluation, benchmarking methodology, and what it takes to build reliable autonomous systems.