What AI benchmarks miss about real-world performance
Summary
<p><i>Presented by F5</i></p><hr /><p>Enterprise AI teams have spent years solving for compute: securing GPU allocations, negotiating cloud capacity, and benchmarking training throughput. The assumption embedded in that work is that the path between storage and compute will keep up. In production, that assumption increasingly does not hold. Real traffic introduces latency spikes, network jitter, and node degradation that controlled benchmarks fail to capture, resulting in pipelines that perform well in the lab but stall in deployment. A growing response is <a href="https://www.f5.com/solutions/use-cases/ai-data-delivery">AI data delivery</a>: deploying an application delivery controller (ADC) or application delivery and security platform (ADSP) in front of storage as a resilient and secure control point.</p><p>"Provisioning solves for capacity but not for delivery, and that is where the constraint now hides," says Hunter Smit, senior manager of product marketing at F5. "Enterprises buy enough GPUs and enough storage, then assume the path between them will keep up, but AI traffic is bursty, highly concurrent, and random in its reads in ways ordinary storage networking was never built to absorb."</p><h2>The production gap benchmarks don't show</h2><p>Standard benchmark methodology compounds the problem, says Paul Pindell, principal solutions architect for technology alliances at F5.</p><p>"Benchmark testing is usually built to produce the best possible performance or security result, not the most realistic one," he says. "With S3, latency is a known factor in degrading performance, so meaningful testing has to introduce consistent latency into the path."</p><p>Most benchmark environments never do that, which means the performance numbers enterprises rely on for infrastructure decisions are drawn from conditions that production systems will never replicate. To measure how much that omission matters, F5 and MinIO conducted throughput testing under degraded network conditions.
</p><p>"What stood out was how quickly S3 throughput falls off once you introduce latency," Pindell says. "Even modest latency takes a real bite out of it, and as latency climbs toward long-haul distances, the degradation gets severe."</p><p>The testing also showed that latency mattered far more than jitter as a driver of throughput loss, the reverse of what the team had expected going in. The upshot for enterprise architects is that S3 object storage deployments cannot be designed around clean-room assumptions; they have to be engineered for the degraded network conditions they will actually face.</p><h2>The cost of fragile data paths</h2><p>"In AI infrastructure, people naturally focus on GPUs because they're the most visible and expensive resource," says Tanu Mutreja, senior director of product management at F5. "But in production environments, GPUs generate only as much value as the data path that feeds them."</p><p>That path runs through storage, networking, databases, security, and orchestration layers, often stitched together from multiple vendors. Customers experience none of those seams; they experience the output of the whole system.</p><p>When the data path degrades, the effects compound. GPU underutilization is the most immediate and visible symptom, but Mutreja points to a wider set of consequences: degraded inference performance, poor-quality AI outputs, higher egress costs from unnecessary data replication, and growing operational complexity.</p><p>"At scale, data-path efficiency becomes a strategic business lever rather than technical optimization," she says. "When the data path is engineered well, GPUs remain productive, AI applications stay responsive and trustworthy, operations scale efficiently, and organizations maximize the return on their AI investments."</p><p>AI workloads are structurally more exposed to these failures than traditional enterprise applications.
Databases, ERP systems, and web services absorb transient storage delays through caching and buffering. AI workloads running across massively parallel GPU clusters have no equivalent protection. As Mutreja notes, even minor latency spikes or bandwidth bottlenecks can cascade across large GPU clusters, simultaneously hitting utilization, training efficiency, and the customer experience.</p><h2>Treating the storage edge as a control point</h2><p>For decades, storage and intelligence operated as sequential concerns in enterprise architecture: data was stored first, then analyzed downstream. Mutreja argues that this model no longer fits the demands of AI.</p><p>"Competitive advantage is determined not only by the volume of data, but also by relevance, lineage, security, and performant delivery of data," she says. "Across the industry, from NVIDIA and AWS to enterprise storage providers, the movement is toward embedding intelligence directly into data infrastructure rather than stacking it on top."</p><p>F5’s integration with MinIO applies this approach at the layer where storage and compute actually interact. As part of the F5 ADSP, BIG-IP sits in the data path, continuously monitoring the health of MinIO’s distributed storage nodes and directing requests only to those that remain available.</p><p>The operational impact of that capability becomes clear when nodes degrade, which is expected in distributed storage clusters. Without intelligent routing, clients that land on an unhealthy node must retry and may land on another degraded node, dragging down overall performance.</p><p>"F5 makes sure traffic only goes to healthy nodes, or even the least busy ones, so S3 client traffic is always processed in the most efficient way," Pindell says.</p><h2>Governance across distributed environments</h2><p>The challenge grows at scale, when AI pipelines stretch across multiple locations, clouds, or edge environments.
</p><p>"Once an AI pipeline crosses regions and clouds, the question stops being about performance and becomes about control," Smit says. "You are operating under different rules in every jurisdiction, and digital sovereignty is now a design constraint. Where your data is allowed to live, who is permitted to touch it, and which borders it cannot cross now shapes the architecture before anyone talks about speed."</p><p>That pressure is driving a visible trend of enterprises repatriating AI workloads from public cloud onto infrastructure they own and govern directly. The architecture Smit describes addresses this by decoupling applications from any single storage location and placing a unified control point between applications and storage that enforces consistent policy across every location.</p><p>"Sovereignty, resilience, and cost stop being trade-offs you manage one region at a time," he explains. "They become a capability you run as a system."</p><h2>Storage-to-compute path as a managed control point</h2><p>To solve these issues, enterprise teams need to stop treating the storage-to-compute path as a direct connection and start treating it as a managed control point, Smit says. SecureIQLab's independent validation of F5 BIG-IP in storage deployments has confirmed that the approach delivers resilience without surrendering throughput.</p><p>"Insert a full-proxy ADC between the two, and the path becomes observable, programmable, and failure-aware, with health-based routing, quality of service, and security enforced inline," he explains. "That single move converts data delivery from an assumption into an engineered discipline, which is what keeps GPUs fed when conditions degrade."</p><hr /><p><i>Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they’re always clearly marked. For more information, contact </i><a href="mailto:sales@venturebeat.com"><i><u>sales@venturebeat.com</u></i></a><i>.</i></p>