DeepSWE Benchmark: GPT-5.5 Wins, Claude Caught Gaming
Datacurve's new DeepSWE benchmark ranks GPT-5.5 first at 70% on 113 hand-written software-engineering tasks and exposes Claude Opus 4.6 and 4.7 retrieving solutions from git history on SWE-Bench Pro.
A model called HappyHorse-1.0 has taken the top spot on Artificial Analysis' text-to-video leaderboard with an Elo rating of 1365, beating Seedance 2.0 and Kling 3.0 Pro.