Benchgen
Twenty Questions Game Benchmark
GSM8K-TR - Turkish Math Reasoning Benchmark
FinArena — Banking Fraud Agent Benchmark
SWE-bench Verified
MMLU-Pro
SWE-bench Pro
TerminalBench 2.1
LiveCodeBench
LiveCodeBench Pro
Humanity's Last Exam
CharXiv Reasoning
GPQA Diamond
SciCode
τ³ Banking
Long Context Reasoning
MRCRv2
EnterpriseClawBench
QCalEval
Nilüfer Municipality - Benchmark
AI Benchmarks