Michael Cooney
Senior Editor

Juniper amps up AI networking plans

News
Feb 23, 2024, 7 mins
Generative AI, Networking

Juniper upgraded its PTX and QFX platforms to support 800G Ethernet as momentum builds around making Ethernet a go-to technology for AI networking.


Juniper Networks is fleshing out its strategy for adding more power to the routers and switches that will be at the heart of its AI networking plans for data center, enterprise and service provider networks. The enhancements are targeted at the company’s PTX routing and QFX switching platforms, which have been outfitted to support 800G Ethernet in anticipation of a future that relies on Ethernet-based AI networking environments.

“Our 800GE platforms [are] designed to manage AI training workloads effectively,” said Julius Francis, senior director of product marketing and strategy for Juniper. “We are now expanding the capabilities of our 800GE platforms to cater to a wider range of WAN use cases while advancing scalable network capacity and density.”

It’s a persistent challenge for service providers, cloud providers, and large enterprises to balance the need to meet traffic demands with the goals of sustainability and automation, Francis said. “These entities often struggle to provide the required capacity and scale across various congestion points, such as metro aggregation, peering, core networks, data center interconnects (DCI), and DCI edge.”

Achieving optimal GPU efficiency to reduce job completion times is essential for managing AI costs for both enterprises and cloud providers, Francis said.

“Traditionally, InfiniBand has been the go-to networking technology in the AI networking ecosystem, known for its performance yet hindered by its higher cost and limited availability compared to Ethernet – the most prevalent Layer 2 (L2) technology globally,” Francis said.

Juniper is now offering an Ethernet-based alternative with 400GE and 800GE options in its PTX and QFX platforms, which are enhanced by Apstra AIOps. Apstra is Juniper’s intent-based data center software that maintains a real-time repository of configuration, telemetry, security and validation information to ensure a network is doing what the organization wants it to do.

Juniper recently tightened ties between Apstra and its AI-Native Networking Platform, which is anchored by the vendor’s cloud-based, natural language Mist AI and Marvis virtual network assistant (VNA) technology.

Juniper expects the PTX and QFX platforms, which run the Junos operating system, to be at the forefront of its AI networking effort. With their support for a high-radix routing architecture, deep buffers, and a cell-based switch fabric, they’ll be ideal choices for spine or leaf roles in AI data center networking settings, Francis said.

Additional capabilities of the PTX and QFX platforms that are tailored for AI data center networking include: efficient, deep-buffered interfaces; a scalable cell-based fabric architecture; virtual output queue (VOQ) scheduling; RDMA over converged Ethernet (RoCEv2); adaptive load balancing; and integrated IPFIX and in-band network telemetry metadata (INT-MD), Francis said. Juniper PTX boxes also support IP over Dense Wavelength Division Multiplexing (IPoDWDM) as part of the company’s Converged Optical Routing Architecture (CORA).

“Moving from traditional, siloed IP and optical control planes to a converged mesh architecture lets you dramatically improve network utilization and sustainability. CORA collapses network layers, frees up unused WDM capacity and eliminates the need for external transponders in many applications – enabling up to 54% power savings and 55% lower carbon emissions,” wrote Juniper’s Amit Bhardwaj, vice president of product management, in a blog about the AI networking directions.
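
To make one of those features concrete: virtual output queuing keeps a separate queue at each ingress port for every egress port, so a packet waiting on a congested output does not hold up packets bound for idle ones (the problem known as head-of-line blocking). The toy Python model below is a simplification, not Juniper's scheduler. It compares a single FIFO against per-output queues for one ingress port that can send one packet per time slot; the packet labels and the `busy_until` congestion map are hypothetical.

```python
from collections import deque


def drain_fifo(packets, busy_until):
    """Single FIFO: only the head packet may leave, so a blocked head stalls everything."""
    queue = deque(packets)
    sent = []  # (time slot, output) pairs
    slot = 0
    while queue:
        head = queue[0]
        if slot >= busy_until.get(head, 0):  # head's output is free this slot
            sent.append((slot, queue.popleft()))
        slot += 1
    return sent


def drain_voq(packets, busy_until):
    """VOQ: one queue per output; each slot, send from any queue whose output is free."""
    voqs = {}
    for out in packets:
        voqs.setdefault(out, deque()).append(out)
    sent = []
    slot = 0
    while any(voqs.values()):
        for out, q in voqs.items():
            if q and slot >= busy_until.get(out, 0):
                sent.append((slot, q.popleft()))
                break  # the ingress port still sends at most one packet per slot
        slot += 1
    return sent


# Hypothetical scenario: output "A" is congested until slot 3; output "B" is idle.
packets = ["A", "B", "B", "B"]
congestion = {"A": 3}
print("FIFO:", drain_fifo(packets, congestion))
print("VOQ: ", drain_voq(packets, congestion))
```

With output A busy until slot 3, the FIFO holds the three B packets behind A until slots 4 through 6, while the VOQ version sends them in slots 0 through 2. Production VOQ fabrics also add a fabric-wide scheduler, typically credit-based, that this sketch leaves out.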

Cisco and Arista also developing AI networking plans

Juniper is expected to be a key player in the AI networking arena, where competitors Cisco and Arista also continue to develop technology to handle AI workloads.

A core component of Cisco’s AI blueprint is its Nexus 9000 data center switches, which support up to 25.6Tbps of bandwidth per ASIC and “have the hardware and software capabilities available today to provide the right latency, congestion management mechanisms, and telemetry to meet the requirements of AI/ML applications,” Cisco wrote in its Data Center Networking Blueprint for AI/ML Applications. “Coupled with tools such as Cisco Nexus Dashboard Insights for visibility and Nexus Dashboard Fabric Controller for automation, Cisco Nexus 9000 switches become ideal platforms to build a high-performance AI/ML network fabric.”

Another element of Cisco’s AI network infrastructure is its new high-end programmable Silicon One processors, which are aimed at large-scale AI/ML infrastructures for enterprises and hyperscalers. Core to the Silicon One system is its support for enhanced Ethernet features, such as improved flow control, congestion awareness, and avoidance. The system also includes advanced load-balancing capabilities and “packet-spraying” that spreads traffic across multiple GPUs or switches to avoid congestion and improve latency. Hardware-based link-failure recovery also helps ensure the network operates at peak efficiency, according to Cisco.

Combining these enhanced Ethernet technologies and taking them a step further ultimately lets customers set up what Cisco calls a Scheduled Fabric. In a Scheduled Fabric, the physical components – chips, optics, switches – are tied together like one big modular chassis and communicate with each other to provide optimal scheduling behavior and much higher bandwidth throughput, especially for flows like AI/ML, Cisco said.

Arista, meanwhile, has detailed an AI networking technology called AI Spine. Controlled by Arista EOS, it uses data-center switches with deep packet buffers and networking software to efficiently control AI traffic.

Arista’s AI Spine is based on its 7800R3 Series data-center switches, which, at the high end, support 460Tbps of switching capacity and hundreds of 40Gbps, 50Gbps, 100Gbps, or 400Gbps interfaces, along with 384GB of deep buffering. AI Spine systems would create high-bandwidth, lossless, low-latency, Ethernet-based networks that can ultimately interconnect thousands of GPUs at speeds of 100Gbps, 400Gbps, and 800Gbps, according to Arista.

Juniper touts open-standard, interoperable Ethernet fabrics for AI

Juniper’s Francis said this about the competitive landscape: “The challenge of managing a relatively small number of large flows, typical of AI workloads, is a significant obstacle for traditional network designs that rely on per-flow load balancing. Efficient load balancing and effective congestion management protocols are crucial for supporting the network fabrics behind AI training workloads. Undetected or unresolved network bottlenecks and inefficiencies can lead to substantial costs for AI infrastructures.”
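
Francis's point about a small number of large flows can be illustrated with a little arithmetic. Conventional ECMP hashes each flow's 5-tuple once and pins every packet of that flow to one link, so a handful of large "elephant" flows can pile onto the same uplink; packet spraying, the technique Cisco describes above, spreads individual packets across links instead. The Python sketch below is a simplified model, not any vendor's implementation: the flow addresses are hypothetical, and `zlib.crc32` stands in for a switch's hash function. RoCEv2 traffic uses UDP destination port 4791, which is why only the source port varies between flows here.

```python
import zlib


def ecmp_per_flow(flows, n_links):
    """Per-flow ECMP: hash each 5-tuple once; every packet of that flow uses the same link."""
    loads = [0] * n_links
    for five_tuple, n_packets in flows:
        key = "|".join(map(str, five_tuple)).encode()
        link = zlib.crc32(key) % n_links
        loads[link] += n_packets
    return loads


def packet_spray(flows, n_links):
    """Packet spraying: assign successive packets to links round-robin, ignoring flows."""
    loads = [0] * n_links
    next_link = 0
    for _five_tuple, n_packets in flows:
        for _ in range(n_packets):
            loads[next_link] += 1
            next_link = (next_link + 1) % n_links
    return loads


# Hypothetical workload: 9 large RoCEv2 flows (UDP, dst port 4791), 100 packets each,
# sharing 8 uplinks. Tuple layout: (src IP, dst IP, IP protocol, src port, dst port).
flows = [
    (("10.0.0.%d" % i, "10.0.1.%d" % i, 17, 49152 + i, 4791), 100)
    for i in range(9)
]
per_flow = ecmp_per_flow(flows, 8)
sprayed = packet_spray(flows, 8)
print("per-flow ECMP link loads:", per_flow, "max:", max(per_flow))
print("packet spraying link loads:", sprayed, "max:", max(sprayed))
```

Because nine flows must share eight links, per-flow hashing is guaranteed to put at least two flows (200 packets) on some link, while round-robin spraying keeps every link within one packet of the 112.5-packet average. The trade-off is that sprayed packets can arrive out of order, so endpoints or NICs must tolerate reordering; adaptive load balancing, which shifts flows or flowlets based on measured link load, is one middle ground.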

While proprietary, scheduled Ethernet fabric solutions that enhance load balancing exist, they bring their own set of operational and visibility challenges, not to mention vendor lock-in similar to that seen with InfiniBand fabrics, Francis said.

“The most effective strategy for addressing AI networking challenges involves leveraging open standard, interoperable Ethernet fabrics. This approach prioritizes enhancements in network operations to cater specifically to the diverse needs of various AI workload types,” Francis said.

“Whether implemented in fixed form factors or large chassis switches suitable for multiplanar, multistage Clos architectures, or high-radix spine topologies, Ethernet offers the most cost-effective and flexible solution for data center technology,” Francis said. Clos is a multistage network topology that Juniper uses to build large data center fabrics; the company implements it with its EVPN-VXLAN fabric to offer increased network scalability and segmentation.

“As a converged technology, Ethernet fabrics support multivendor integration and operations, providing a range of design options to achieve the desired balance of performance, resiliency, and cost efficiency for the back-end networking of AI data centers and their broader AI infrastructures.”
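
The appeal of high-radix switches in these Clos designs comes down to simple arithmetic. In a two-tier leaf-spine (three-stage folded Clos) fabric with no oversubscription, each leaf splits its ports evenly between hosts and spines, and each spine port connects to a different leaf, so the spine's port count caps the number of leaves. The Python sketch below assumes one link per leaf-spine pair and equal port speeds throughout; the `LeafSpine` and `size_leaf_spine` names and the radix values in the example are hypothetical, not Juniper product specifications.

```python
from dataclasses import dataclass


@dataclass
class LeafSpine:
    leaves: int
    spines: int
    host_ports: int


def size_leaf_spine(leaf_radix: int, spine_radix: int) -> LeafSpine:
    """Size a non-oversubscribed two-tier Clos with one link per leaf-spine pair."""
    if leaf_radix % 2:
        raise ValueError("leaf radix must be even to split ports 1:1 down/up")
    uplinks_per_leaf = leaf_radix // 2
    return LeafSpine(
        leaves=spine_radix,            # every spine port reaches a different leaf
        spines=uplinks_per_leaf,       # every leaf has one uplink to every spine
        host_ports=spine_radix * (leaf_radix // 2),  # half of each leaf's ports face hosts
    )


# Hypothetical comparison: same 64-port leaves, 64-port vs 512-port spines.
for leaf_radix, spine_radix in [(64, 64), (64, 512)]:
    print(leaf_radix, spine_radix, size_leaf_spine(leaf_radix, spine_radix))
```

Under these assumptions, raising the spine radix from 64 to 512 ports grows the fabric from 64 leaves and 2,048 host-facing ports to 512 leaves and 16,384 ports without adding a third switching tier, which is the basic argument for large chassis or high-radix spines in AI back-end networks.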

Juniper’s AI technology was one of the core reasons Hewlett Packard Enterprise recently said it would acquire Juniper Networks for $14 billion. Networking will become the new core business and architecture foundation for HPE’s hybrid cloud and AI solutions delivered through the company’s GreenLake hybrid cloud platform, the companies stated.

The combined company will offer secure, end-to-end AI-native solutions that are built on the foundation of cloud, high performance, and experience-first, and will also have the ability to collect, analyze, and act on aggregated telemetry across a broader installed base, according to the companies. The deal is expected to close by early 2025 at the latest.

Michael Cooney is a Senior Editor with Network World who has written about the IT world for more than 25 years. He can be reached at michael_cooney@foundryco.com.