Discrepancy

Blog
digitpatroxApril 22, 2025
66

AI Benchmark Discrepancy Reveals Gaps in Performance Claims

FrontierMath accuracy for OpenAI’s o3 and o4-mini compared to leading models. Image: Epoch AI The latest results from FrontierMath, a benchmark test for generative AI on advanced math problems, show OpenAI’s o3 model performed worse than OpenAI originally stated. While newer OpenAI models now outperform o3, the discrepancy highlights the need to scrutinize AI benchmarks closely. Epoch AI, the research…
Read More »

close