AMD eyes networking efficiency gains in bid to streamline AI data center operations

AMD has announced a sweeping expansion of its high-performance networking portfolio in a bid to supercharge data center efficiency in the age of AI.

At its Advancing AI conference in San Francisco this week, the chip maker officially unveiled its new Pensando Salina data processing unit (DPU), which is aimed specifically at streamlining AI cluster performance.

AMD said the new DPU offers twice the performance, bandwidth, and scale of previous generations and can support 400G throughput. All told, this means faster data transfer rates and ultimately more efficient infrastructure capable of contending with surging AI workload demands.

Networking has grown significantly in importance since the onset of the generative AI race in late 2022 and has emerged as a key bottleneck to data center efficiency, AMD executives told attendees.

This is a double-whammy problem many organizations face, with efficiency on both the front and back end becoming a key focus.

On the front end, the efficient delivery of data to an AI cluster is critical. On the back end, which manages data transfers between clusters, fast communication between the central processing units (CPUs) and graphics processing units (GPUs) within those clusters is vital.

Blockages within these networking paths can create serious problems for data center operators, resulting in lower performance and heightened costs.

The Salina DPU is designed to tackle these potential congestion issues within the front end. The second announcement from AMD focused on the back end with the launch of the Pensando Pollara 400 network interface card (NIC).

In a press briefing ahead of the event, AMD executives said the Pollara 400 is the industry’s first Ultra Ethernet Consortium (UEC)-ready AI NIC, which “reduces complexity of performance tuning and helps improve time to production”.

AMD’s networking focus shows maturity

Andrew Buss, senior research director for EMEA at IDC, told ITPro that while this announcement may not have hit the headlines to the extent of its Instinct GPU or AI PC reveals, it marks a significant moment for the chip maker.

“To be strong in infrastructure, you’ve already got to be strong across the board, because you want to be able to store the bits, move the bits, and change the bits,” he said.

“They’ve got a pretty good stack, and I think they’re well set to drive that forward, particularly with Ultra Ethernet. So to me, some of the more important but maybe less visible announcements that they did make were this new DPU and the front-end and back-end in the network and the fabric and Ultra Ethernet.”

Improving networking communication will be the key to optimizing large scale clusters, Buss added, and AMD appears very much focused on ramping up efforts in this domain.

Hints of this strategy have emerged in recent months, notably in AMD’s recent acquisitions, such as the $4.9 billion deal for ZT Systems in August.

“I think that shows maturity,” he told ITPro. “AMD just recently acquired ZT Systems as well. So what we’re going to see is them following Nvidia’s footsteps of being able to engineer systems-level approaches.”
