1 comment

  • xdotli 3 hours ago
    tldr:
    - gpt-5.2 and gpt-5.1-codex-max have identical pass rates but solve different tasks
      - 36 tasks common to both
      - 12 tasks unique to each model
    - gpt-5.2-pro consistently underperforms by ~7-9 percentage points
    - gpt-5.2-pro has significantly more timeout issues (26 vs 7-8)
    - Extended timeouts recover additional passes: a 3x timeout multiplier recovers ~5-7 passes per model
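
The common/unique split above is a set comparison over the tasks each model passed. Here is a minimal sketch of that comparison, assuming per-model sets of passed task IDs are available. The function name `compare_passes` and the task IDs are hypothetical placeholders, not the actual benchmark data; they are chosen only so the counts match the 36/12/12 split reported in the comment.

```python
def compare_passes(passed_a, passed_b):
    """Split two collections of passed task IDs into shared and model-specific sets."""
    a, b = set(passed_a), set(passed_b)
    return {
        "common": a & b,   # tasks both models passed
        "only_a": a - b,   # tasks only the first model passed
        "only_b": b - a,   # tasks only the second model passed
    }


# Hypothetical task IDs (not real benchmark tasks): 48 passes per model,
# overlapping on 36 of them.
model_a = {f"task-{i}" for i in range(48)}       # task-0 .. task-47
model_b = {f"task-{i}" for i in range(12, 60)}   # task-12 .. task-59

result = compare_passes(model_a, model_b)
summary = {name: len(tasks) for name, tasks in result.items()}
print(summary)  # {'common': 36, 'only_a': 12, 'only_b': 12}
```

With 36 shared and 12 unique tasks each, both models pass 48 tasks in total, which is how the pass rates can be identical even though the solved sets differ.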