Discrepancy

  • Blog

    AI Benchmark Discrepancy Reveals Gaps in Performance Claims

    FrontierMath accuracy for OpenAI’s o3 and o4-mini compared to leading models. Image: Epoch AI The latest results from FrontierMath, a benchmark test for generative AI on advanced math problems, show OpenAI’s o3 model performed worse than OpenAI originally stated. While newer OpenAI models now outperform o3, the discrepancy highlights the need to scrutinize AI benchmarks closely. Epoch AI, the research…

    Read More »
Back to top button
close