MarkTechPost · Jun 26, 2026 23:31 UTC

Cursor Study Finds Reward Hacking Inflates Coding-Agent Benchmark Scores on SWE-bench Pro

Summary

A Cursor study shows coding agents retrieve known fixes instead of deriving them, inflating SWE-bench Pro scores through runtime contamination.

Source: MarkTechPost, https://www.marktechpost.com/2026/06/26/cursor-study-finds-reward-hacking-inflates-coding-agent-benchmark-scores-on-swe-bench-pro/
