Authored by: Shar Narasimhan, Group Product Manager for AI – NVIDIA
Look who just set new speed records for training AI models: Dell Technologies, Inspur, Supermicro and Azure (in its MLPerf debut), all using NVIDIA AI.
Our platform set records across all eight popular workloads in the MLPerf Training 1.1 results announced recently.
NVIDIA A100 Tensor Core GPUs delivered the best normalised per-chip performance. They scaled with NVIDIA InfiniBand networking and our software stack to deliver the fastest time to train on Selene, our in-house AI supercomputer based on the modular NVIDIA DGX SuperPOD.
A Cloud Sails to the Top
When it comes to training AI models, Azure’s NDm A100 v4 instance is the fastest on the planet, according to the latest results. It ran every test in the latest round and scaled up to 2,048 A100 GPUs.
Azure showed not just great performance, but great performance that anyone can rent and use today in six regions across the U.S.
AI training is a big job that requires big iron. And we want users to train models at record speed with the service or system of their choice.
That is why we are enabling NVIDIA AI with products for cloud services, co-location services, corporations and scientific computing centres, too.
Server Makers Flex Their Muscles
Among OEMs, Inspur set the most records in single-node performance with its eight-way GPU systems, the NF5688M6 and the liquid-cooled NF5488A5. Dell and Supermicro set records on four-way A100 GPU systems.
A total of 10 NVIDIA partners submitted results in the round: eight OEMs and two cloud-service providers. Together they made up more than 90% of all submissions. This is the fifth and strongest showing to date for the NVIDIA ecosystem in MLPerf training tests.
Our partners do this work because they know MLPerf is the only industry-standard, peer-reviewed benchmark for AI training and inference. It is a valuable tool for customers evaluating AI platforms and vendors.
Servers Certified for Speed
Baidu PaddlePaddle, Dell Technologies, Fujitsu, GIGABYTE, Hewlett Packard Enterprise, Inspur, Lenovo and Supermicro submitted results in local data centres, running jobs on both single and multiple nodes.
Nearly all our OEM partners ran tests on NVIDIA-Certified Systems, servers we validate for enterprise customers who want accelerated computing.
The range of submissions shows the breadth and maturity of an NVIDIA platform that provides optimal solutions for businesses working at any scale.
Both Fast and Flexible
NVIDIA AI was the only platform participants used to make submissions across all benchmarks and use cases, demonstrating versatility as well as high performance. Systems that are both fast and flexible provide the productivity customers need to speed their work.
The training benchmarks cover eight of today’s most popular AI workloads and scenarios—computer vision, natural language processing, recommendation systems, reinforcement learning and more.
MLPerf’s tests are transparent and objective, so users can rely on the results to make informed buying decisions. The industry benchmarking group, formed in May 2018, is backed by dozens of industry leaders, including Alibaba, Arm, Google, Intel and NVIDIA.
20x Speedups in Three Years
Looking back, the numbers show performance gains on our A100 GPUs of over 5x in just the last 18 months. That is thanks to continuous innovations in software, the lion’s share of our work these days.
NVIDIA’s performance has increased more than 20x since the MLPerf tests debuted three years ago. That massive speedup is a result of the advances we make across our full-stack offering of GPUs, networks, systems and software.
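To put the cumulative figures above on a common footing, the short sketch below converts a total speedup over a period into the equivalent compounded per-year factor. It assumes steady, compounding growth, which the results do not claim; real gains arrive in uneven steps with each software release and hardware generation. The function name is hypothetical.

```python
def annualized_speedup(total_speedup: float, years: float) -> float:
    """Per-year factor that, compounded over `years`, yields `total_speedup`.

    Assumes smooth compounding growth (a simplification for illustration).
    """
    return total_speedup ** (1.0 / years)


# Figures quoted in the article: more than 20x in three years,
# and over 5x on A100 in the last 18 months.
three_year_rate = annualized_speedup(20.0, 3.0)
recent_rate = annualized_speedup(5.0, 1.5)

print(f"20x over 3 years  -> about {three_year_rate:.2f}x per year")
print(f"5x over 18 months -> about {recent_rate:.2f}x per year")
```

Because the quoted figures are lower bounds ("more than", "over"), the per-year rates computed from them are lower bounds too.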
Constantly Improving Software
Our latest advances came from multiple software improvements.
For example, using a new class of memory copy operations, we achieved 2.5x faster operations on the 3D-UNet benchmark for medical imaging.
By tuning how GPUs handle parallel work, we realised a 10% speedup on the Mask R-CNN object detection test and a 27% boost for recommender systems. We simply overlapped independent operations, a technique that is especially powerful for jobs that run across many GPUs.
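Why overlap pays off most across many GPUs can be seen in a toy timing model of one common case: sending each layer's gradients to other GPUs while the backward pass is still computing earlier layers. This is a generic illustration, not a description of the specific changes made for these submissions. The per-layer durations are hypothetical, and the model idealises things by assuming compute and communication never slow each other down.

```python
from typing import List


def serial_time(compute: List[float], comm: List[float]) -> float:
    """All backward compute finishes first, then every gradient is communicated."""
    return sum(compute) + sum(comm)


def overlapped_time(compute: List[float], comm: List[float]) -> float:
    """Each layer's gradient transfer starts as soon as that layer's backward
    compute is done and the (single) communication link is free, while
    compute moves on to the next layer independently."""
    compute_done = 0.0  # time when the current layer's backward compute ends
    link_free = 0.0     # time when the communication link becomes idle
    for c, m in zip(compute, comm):
        compute_done += c                       # compute never waits on comm
        start = max(compute_done, link_free)    # need the gradient and the link
        link_free = start + m
    return max(compute_done, link_free)


# Hypothetical per-layer durations (arbitrary time units) for a 3-layer model.
compute = [4.0, 4.0, 4.0]
comm = [3.0, 3.0, 3.0]
print("serial:    ", serial_time(compute, comm))
print("overlapped:", overlapped_time(compute, comm))
```

In this example all but the last transfer hides behind compute. The more time a job spends communicating, as it typically does when spread over many GPUs, the more there is to hide.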
We expanded our use of CUDA graphs to minimise communication with the host CPU. That brought a 6% performance gain on the ResNet-50 benchmark for image classification.
And we implemented two new techniques in NCCL, our library that optimises communication among GPUs. That sped up results by up to 5% on large language models such as BERT.
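For readers new to NCCL, its workhorse in data-parallel training is all-reduce: every GPU ends up with the element-wise sum of all GPUs' gradient buffers. One algorithm NCCL can use for this is a ring. The sketch below simulates a textbook ring all-reduce in plain Python, with lists standing in for per-GPU buffers. It is not NCCL code and not the two new techniques mentioned above, which this post does not detail; the function name is hypothetical.

```python
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate ring all-reduce: reduce-scatter, then all-gather."""
    n = len(buffers)
    data = [list(b) for b in buffers]  # copy so the inputs stay untouched
    length = len(data[0])
    bounds = [length * i // n for i in range(n + 1)]  # chunk i = bounds[i]:bounds[i+1]

    def chunk(i: int) -> range:
        return range(bounds[i], bounds[i + 1])

    # Phase 1, reduce-scatter: in each step every rank passes one chunk to its
    # right-hand neighbour, which adds it in. After n-1 steps, rank r holds
    # the fully summed chunk (r + 1) % n.
    for step in range(n - 1):
        sends = [((r + 1) % n, (r - step) % n) for r in range(n)]
        msgs = [(dst, c, [data[(dst - 1) % n][k] for k in chunk(c)]) for dst, c in sends]
        for dst, c, vals in msgs:  # apply after collecting: sends are simultaneous
            for k, v in zip(chunk(c), vals):
                data[dst][k] += v

    # Phase 2, all-gather: the completed chunks travel around the ring and
    # overwrite stale copies until every rank holds every summed chunk.
    for step in range(n - 1):
        sends = [((r + 1) % n, (r + 1 - step) % n) for r in range(n)]
        msgs = [(dst, c, [data[(dst - 1) % n][k] for k in chunk(c)]) for dst, c in sends]
        for dst, c, vals in msgs:
            for k, v in zip(chunk(c), vals):
                data[dst][k] = v

    return data


gpus = [[1.0, 2.0, 3.0, 4.0, 5.0],
        [10.0, 20.0, 30.0, 40.0, 50.0],
        [100.0, 200.0, 300.0, 400.0, 500.0]]
for rank, buf in enumerate(ring_allreduce(gpus)):
    print(rank, buf)
```

Each rank sends only about 2(n-1)/n of its buffer in total, so the data each GPU moves stays nearly constant as the ring grows. That is one reason ring-style collectives scale well to large GPU counts.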
Leverage Our Hard Work
All the software we used is available from the MLPerf repository, so anyone can reproduce our world-class results. We continuously fold these optimisations into containers available on NGC, our software hub for GPU applications.
This software is part of a full-stack platform, proven in the latest industry benchmarks and available from a variety of partners to tackle real AI jobs today.