Building Supervised Fine-Tuning Data from NVIDIA Open-SWE-Traces: Trajectory Parsing, Patch Analysis, Token Budgets, and Tool-Use Metrics
Summary
<p>In this tutorial, we work with NVIDIA's Open-SWE-Traces dataset to study agentic software-engineering trajectories for fine-tuning. We stream the data directly from Hugging Face, so we can process it efficiently in Google Colab without downloading everything locally. We normalize multi-turn agent conversations, parse final code patches, and build an analysis DataFrame covering trajectory length, tool usage, patch size, language distribution, and resolution outcomes. We then curate a supervised fine-tuning subset using success labels, token limits, language filters, and patch availability.</p> <p>The post <a href="https://www.marktechpost.com/2026/06/26/building-supervised-fine-tuning-data-from-nvidia-open-swe-traces-trajectory-parsing-patch-analysis-token-budgets-and-tool-use-metrics/">Building Supervised Fine-Tuning Data from NVIDIA Open-SWE-Traces: Trajectory Parsing, Patch Analysis, Token Budgets, and Tool-Use Metrics</a> appeared first on <a href="https://www.marktechpost.com">MarkTechPost</a>.</p>
Discussion on
Trending posts from X.
the reason we excluded frontier models from our data page is they are artificially low in usage and didn't want people concluding this
— dax (@thdxr) June 26, 2026
most people using them use them directly, they don't buy them through us
we wouldn't be surprised if they're 90%+ of total token spend https://t.co/NQkuK4CAQ7