Scale and Ethernet: Dell PowerScale Integration with NVIDIA DGX SuperPOD

By Rob Strechay | September 23, 2024

Introduction

Have you heard of a SuperPOD yet? No? In the age of generative AI and large-scale machine learning, processing and storing vast amounts of data efficiently is critical. NVIDIA’s DGX SuperPOD, a high-performance AI infrastructure blueprint, is designed to meet the demanding needs of modern AI workflows. A significant advancement in this ecosystem is the integration of Dell’s PowerScale storage solution, which brings unique advantages to the DGX SuperPOD architecture. This research note explores the architecture of NVIDIA’s DGX SuperPOD with Dell PowerScale, the benefits of this integration, and how it enables superior GPU utilization for AI workloads.

NVIDIA DGX SuperPOD Architecture with PowerScale

The NVIDIA DGX SuperPOD is a scalable, modular system composed of DGX servers, with each POD unit containing 32 servers. Traditionally, SuperPODs have used InfiniBand as the interconnect for high-performance networking, but Dell’s PowerScale integration introduces the first Ethernet-based storage fabric for DGX SuperPOD, a key differentiator in this evolving space.

Ethernet has become a dominant networking protocol in data centers due to its scalability and growing bandwidth capabilities. Dell’s PowerScale leverages high-performance Ethernet infrastructure (100-400 gigabit, with future scaling to 800 and 1600 gigabit) to offer a storage fabric that aligns with the rapidly increasing data demands of AI workloads. This solution can be integrated into existing data centers seamlessly, offering a turnkey AI infrastructure for enterprises looking to harness the power of AI with minimal disruption.
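
To put those line rates in perspective, here is a rough back-of-the-envelope sketch of the best-case time to move a dataset over a single link. The link speeds are the ones quoted above; the 10 TB dataset size and the function name are illustrative assumptions, and real-world throughput will be lower once protocol and storage overhead are included.

```python
# Best-case time to move a dataset over one Ethernet link.
# Link rates are taken from the article; the 10 TB dataset is a made-up example.
# Protocol, filesystem, and storage overhead are deliberately ignored.

LINK_RATES_GBPS = [100, 400, 800, 1600]  # gigabits per second


def ideal_transfer_seconds(dataset_terabytes: float, link_gbps: float) -> float:
    """Return the ideal transfer time in seconds at full line rate."""
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    dataset_bits = dataset_terabytes * 1e12 * 8  # decimal terabytes -> bits
    return dataset_bits / (link_gbps * 1e9)


if __name__ == "__main__":
    for rate in LINK_RATES_GBPS:
        seconds = ideal_transfer_seconds(10, rate)
        print(f"{rate:>5} Gb/s: {seconds:7.1f} s")
```

Each doubling of link speed halves the ideal transfer time, which is why the move from 100 to 400 gigabit and beyond matters for large training sets.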

Advantages of Dell PowerScale in DGX SuperPOD

The integration of Dell’s PowerScale into the DGX SuperPOD architecture brings several key benefits, particularly in scalability, performance, and efficiency:

Scalability: Dell PowerScale’s modular architecture complements DGX SuperPOD’s scalable nature. Each PowerScale unit (such as the F710 platform) is a dense, rack-mounted node that can be added incrementally. This allows organizations to scale both storage and compute in tandem as their AI needs grow.
Concurrent Performance: AI workloads often require handling thousands of concurrent requests from GPUs. PowerScale is designed to manage high concurrency levels, ensuring that even in large AI infrastructures with thousands of GPUs, performance remains consistent. Its ability to process high numbers of concurrent connections is vital in environments where AI models are being fine-tuned or trained on large datasets.
Data Reduction and Efficiency: PowerScale offers data reduction capabilities, such as 2:1 data compression, which minimizes the total storage footprint. Coupled with PowerScale’s advanced power and cooling technologies, this results in a lower total cost of ownership (TCO) for organizations deploying DGX SuperPODs.
Multi-Protocol and Secure Access: PowerScale provides multiprotocol capabilities and secure, multi-tenant access, which makes it ideal for complex AI environments that may need to support various users, applications, and workflows simultaneously. This flexibility is critical for service providers offering GPU-as-a-service, who need to manage different types of AI workloads efficiently.
NFS over RDMA and Multipath Driver: One of the technical advantages of PowerScale is its support for NFS over RDMA (Remote Direct Memory Access), which allows for low-latency, high-throughput communication between the PowerScale nodes and DGX servers. Additionally, the introduction of a multipath driver in Dell’s latest software allows IO from all cluster nodes through a single mount point, simplifying storage management while enhancing performance for both read and write operations.
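
The 2:1 data reduction figure above can be turned into a quick sizing estimate. The sketch below is only an illustration: the node counts and per-node raw capacities are hypothetical inputs rather than Dell specifications, it ignores data protection overhead, and the reduction ratio actually achieved depends on how compressible the data is.

```python
import math

# Rough sizing sketch for a given data reduction ratio (2:1 by default).
# Node counts and raw capacities are hypothetical inputs, not Dell specs.
# Data protection overhead is ignored to keep the arithmetic simple.


def effective_capacity_tb(nodes: int, raw_tb_per_node: float,
                          reduction_ratio: float = 2.0) -> float:
    """Logical capacity available if data reduces at the given ratio."""
    if nodes < 0 or raw_tb_per_node < 0 or reduction_ratio <= 0:
        raise ValueError("inputs must be non-negative and ratio positive")
    return nodes * raw_tb_per_node * reduction_ratio


def nodes_needed(logical_tb: float, raw_tb_per_node: float,
                 reduction_ratio: float = 2.0) -> int:
    """Smallest node count whose raw capacity holds the reduced data."""
    if raw_tb_per_node <= 0 or reduction_ratio <= 0:
        raise ValueError("capacity and ratio must be positive")
    physical_tb = logical_tb / reduction_ratio
    return math.ceil(physical_tb / raw_tb_per_node)


if __name__ == "__main__":
    print(effective_capacity_tb(nodes=4, raw_tb_per_node=100))
    print(nodes_needed(logical_tb=1000, raw_tb_per_node=100))
```

Running the same numbers with `reduction_ratio=1.0` shows the footprint without data reduction, which makes the TCO argument easy to check against your own data.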

Maximizing GPU Utilization with PowerScale

One of the biggest challenges in AI infrastructure is keeping the GPUs fully utilized, particularly given their cost and the heavy investments organizations make in GPU-powered systems like DGX SuperPOD. Dell PowerScale’s architecture directly addresses this challenge.

Data Staging and Concurrency: PowerScale’s ability to handle vast numbers of concurrent connections ensures that data can be staged and fed to GPUs efficiently, keeping the GPUs busy with continuous data ingestion for AI model training and fine-tuning. Whether the workload involves hundreds or thousands of GPUs, PowerScale ensures consistent data flow, preventing idle GPUs and maximizing ROI.
Checkpointing for Fault Tolerance: As AI models are fine-tuned, creating checkpoints—stateful copies of the model at different stages—becomes crucial for ensuring fault tolerance. PowerScale efficiently handles the high-volume sequential writes required for checkpointing, enabling organizations to resume AI training from the last checkpoint in case of a failure, thus minimizing downtime.
Non-Disruptive Upgrades: PowerScale’s architecture also allows for non-disruptive upgrades, ensuring that the storage system can scale or be enhanced without taking the DGX SuperPOD offline. This is crucial for maintaining continuous AI workloads while benefiting from the latest advancements in PowerScale technology.
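
The checkpoint-and-resume pattern described above can be sketched in a few lines. This is a minimal illustration and not how any particular training framework or PowerScale feature implements it: the function names, the `ckpt_*.pkl` naming scheme, and the use of `pickle` are all assumptions made for the example. The checkpoint directory is assumed to sit on a shared mount, and the write-then-rename step relies on POSIX rename semantics, which should be verified for the specific mount in use.

```python
import os
import pickle
import tempfile
from pathlib import Path
from typing import Any, Optional, Tuple


def save_checkpoint(directory: Path, step: int, state: Any) -> Path:
    """Write a checkpoint so readers never see a partially written file."""
    directory.mkdir(parents=True, exist_ok=True)
    final_path = directory / f"ckpt_{step:08d}.pkl"
    # Write to a temporary file in the same directory, flush it to storage,
    # then rename it into place in a single step.
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as handle:
        pickle.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_name, final_path)
    return final_path


def load_latest_checkpoint(directory: Path) -> Optional[Tuple[int, Any]]:
    """Return (step, state) for the newest checkpoint, or None if absent."""
    if not directory.is_dir():
        return None
    # Zero-padded step numbers make lexicographic order match numeric order.
    candidates = sorted(directory.glob("ckpt_*.pkl"))
    if not candidates:
        return None
    latest = candidates[-1]
    step = int(latest.stem.split("_")[1])
    with latest.open("rb") as handle:
        return step, pickle.load(handle)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ckpt_dir = Path(tmp) / "checkpoints"
        resumed = load_latest_checkpoint(ckpt_dir)
        start = resumed[0] + 1 if resumed else 0
        for step in range(start, start + 3):
            save_checkpoint(ckpt_dir, step, {"step": step, "weights": [0.0]})
        print(load_latest_checkpoint(ckpt_dir))
```

On restart, a training job would call `load_latest_checkpoint` first and continue from the returned step, so a failure costs only the work done since the last save.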

Key Use Cases for DGX SuperPOD with PowerScale

The combination of NVIDIA DGX SuperPOD and Dell PowerScale is ideal for a range of advanced AI applications, particularly those that involve fine-tuning and training large language models (LLMs), vision models, and healthcare-related AI workloads. The high-performance and secure multi-tenancy features make this integration particularly attractive for service providers offering GPU-as-a-service, where the flexibility to handle diverse AI workloads is paramount.

Our Perspective

We see the integration of Dell PowerScale with NVIDIA’s DGX SuperPOD as just the first step for Dell in the AI-at-scale data journey, and as a compelling solution for enterprises and service providers aiming to accelerate their AI initiatives. With its scalability, concurrency management, and data reduction features, PowerScale helps keep DGX SuperPOD GPUs fully utilized, maximizing performance and efficiency. In addition, PowerScale systems that are already deployed can participate, which means less data to copy. For example, you might be building Gen AI using RAG based on internal customer support knowledgebase articles that are already stored on PowerScale. Organizations could also deploy new PowerScale nodes and migrate or copy data efficiently to seed the AI SuperPOD. As AI workloads grow in complexity, this partnership between Dell and NVIDIA offers a powerful and flexible architecture capable of supporting the future of AI at scale.

Disclosure: This theCUBE Research Analyst Brief was commissioned by Dell Technologies and is distributed under license from theCUBE Research. theCUBE Research is a research and advisory services firm that engages or has engaged in research, analysis, and advisory services with many technology companies, which can include those mentioned in this article. Analysis and opinions expressed herein are specific to the analyst individually, and data and other information that might have been provided for validation, not those of theCUBE Research or SiliconANGLE Media as a whole.

Rob Strechay

Analyst with a unique combination of product, engineering, marketing, sales, and operations experience. Rob has held senior executive positions within startups and Fortune 500 organizations, leading world-class teams delivering, marketing, and selling products in the areas of cloud, SaaS, MSPs, storage, application management, disaster recovery, networks, analytics, infrastructure operations, and management.
