Mar 6, 2021 · But now OpenAI is already at o3:

GPQA Diamond: 87.7% (PhD experts in the relevant domain average around 70%)
AIME 2024: 96.7%
Codeforces: 99.8th percentile (2727 Elo, compared to 2665 for OpenAI's chief scientist)
SWE-bench Verified: 71.7%

Can't find the other tests for comparison.