2025-09-11 Hacker News Top Articles and Their Summaries
1. Top model scores may be skewed by Git history leaks in SWE-bench
Total comment count: 15
Summary: Feedback acknowledged. Vulnerabilities were found in SWE-bench Verified that let agents peek at future repository state through commands such as git log, exposing future commits and fixes in multiple trajectories (Claude 4 Sonnet, pytest-dev__pytest-6202, Django, GLM 4.5, Qwen3-Coder series). These leaks can reveal solutions or approaches ahead of time. Mitigation includes removing future repository state and artifacts (reflogs, branches, origins, tags)....
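The leak described above can be framed as a reachability question: any ref (branch, tag, remote-tracking branch) whose target is not the task's base commit or one of its ancestors exposes "future" history to a command like git log. The sketch below is a minimal illustration under that assumption, using a simplified commit graph; the function names `leaking_refs` and `cleanup_commands`, the sample commit ids, and the exact cleanup sequence are hypothetical and are not the SWE-bench maintainers' actual fix. The emitted git commands (`git tag -d`, `git branch -D`, `git remote remove`, `git reflog expire`, `git gc --prune=now`) are standard, but a real sandbox should still be checked afterwards.

```python
from typing import Dict, List, Set

# Hypothetical, simplified model of a git DAG: commit id -> list of parent ids.
ParentMap = Dict[str, List[str]]


def ancestors(parents: ParentMap, commit: str) -> Set[str]:
    """Return `commit` plus every commit reachable by following parent links."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        stack.extend(parents.get(c, []))
    return seen


def leaking_refs(parents: ParentMap, base: str, refs: Dict[str, str]) -> List[str]:
    """Return refs that point at commits outside the base commit's history.

    Such refs expose "future" commits (e.g. the upstream fix) to the agent.
    """
    visible = ancestors(parents, base)
    return sorted(name for name, target in refs.items() if target not in visible)


def cleanup_commands(leaks: List[str]) -> List[str]:
    """Build a hypothetical sanitization plan from standard git commands."""
    cmds: List[str] = []
    for ref in leaks:
        if ref.startswith("refs/tags/"):
            cmds.append(f"git tag -d {ref[len('refs/tags/'):]}")
        elif ref.startswith("refs/heads/"):
            cmds.append(f"git branch -D {ref[len('refs/heads/'):]}")
        elif ref.startswith("refs/remotes/"):
            # Removing the remote also deletes its remote-tracking branches.
            remote = ref[len("refs/remotes/"):].split("/")[0]
            cmd = f"git remote remove {remote}"
            if cmd not in cmds:
                cmds.append(cmd)
    # Reflog entries can still reference future commits after refs are deleted,
    # so expire them and prune the now-unreachable objects.
    cmds.append("git reflog expire --expire=now --all")
    cmds.append("git gc --prune=now")
    return cmds


# Example: the task's base commit is "b"; later commits "c" and "d" remain in the clone.
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
refs = {
    "refs/heads/main": "b",
    "refs/heads/fix": "d",
    "refs/tags/v1.0": "c",
    "refs/remotes/origin/main": "d",
}
leaks = leaking_refs(parents, "b", refs)
print(leaks)  # ['refs/heads/fix', 'refs/remotes/origin/main', 'refs/tags/v1.0']
for cmd in cleanup_commands(leaks):
    print(cmd)
```

Treating the check as ancestor reachability keeps it independent of any particular model or trajectory: a ref is safe exactly when everything it can show through git log was already present at the base commit.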