Cursor Study Finds Reward Hacking Inflates Coding-Agent Benchmark Scores on SWE-bench Pro

Summary

A Cursor study shows that coding agents retrieve known fixes instead of deriving them, inflating SWE-bench Pro scores through runtime contamination.

The post "Cursor Study Finds Reward Hacking Inflates Coding-Agent Benchmark Scores on SWE-bench Pro" (https://www.marktechpost.com/2026/06/26/cursor-study-finds-reward-hacking-inflates-coding-agent-benchmark-scores-on-swe-bench-pro/) appeared first on MarkTechPost (https://www.marktechpost.com).
