NVIDIA and its partners continued to provide the best overall Artificial Intelligence (AI) training performance and the most submissions across all benchmarks with 90% of all entries coming from the ecosystem, according to MLPerf benchmarks released recently.
The NVIDIA AI platform covered all eight benchmarks in the MLPerf Training 2.0 round, highlighting its leading versatility.
No other accelerator ran all benchmarks, which represent popular AI use cases including speech recognition, natural language processing, recommender systems, object detection, image classification and more. NVIDIA has done so consistently since submitting in December 2018 to the first round of MLPerf, an industry-standard suite of AI benchmarks.
Leading Benchmark Results, Availability
In its fourth consecutive MLPerf Training submission, the NVIDIA A100 Tensor Core GPU based on the NVIDIA Ampere architecture continued to excel.
Selene — our in-house AI supercomputer based on the modular NVIDIA DGX SuperPOD and powered by NVIDIA A100 GPUs, our software stack and NVIDIA InfiniBand networking — turned in the fastest time to train on four out of eight tests.
NVIDIA A100 also continued its per-chip leadership, proving the fastest on six of the eight tests.
A total of 16 partners submitted results this round using the NVIDIA AI platform. They include ASUS, Baidu, CASIA (Institute of Automation, Chinese Academy of Sciences), Dell Technologies, Fujitsu, GIGABYTE, H3C, Hewlett Packard Enterprise, Inspur, KRAI, Lenovo, MosaicML, Nettrix and Supermicro.
Most of our OEM partners submitted results using NVIDIA-Certified Systems, servers validated by NVIDIA to provide great performance, manageability, security and scalability for enterprise deployments.
Many Models Power Real AI Applications
An AI application may need to understand a user’s spoken request, classify an image, make a recommendation and deliver a response as a spoken message.
These tasks require multiple kinds of AI models to work in sequence, also known as a pipeline. Users need to design, train, deploy and optimise these models fast and flexibly.
That’s why both versatility (the ability to run every model in MLPerf and beyond) and leading performance are vital for bringing real-world AI into production.
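The pipeline idea described above can be illustrated with a minimal sketch. It assumes nothing about any real NVIDIA API: every stage function below is a hypothetical placeholder standing in for a trained model, and the only point is that each stage consumes the output of the one before it.

```python
from typing import Callable, Dict, List

# Each stage receives the running state and returns an updated copy.
State = Dict[str, str]
Stage = Callable[[State], State]


def recognise_speech(state: State) -> State:
    # Hypothetical stand-in for a speech recognition model.
    return {**state, "text": "placeholder transcript of " + state["audio"]}


def classify_image(state: State) -> State:
    # Hypothetical stand-in for an image classification model.
    return {**state, "label": "placeholder label for " + state["image"]}


def recommend(state: State) -> State:
    # Hypothetical stand-in for a recommender model; depends on the label.
    return {**state, "item": "placeholder item for " + state["label"]}


def synthesise_reply(state: State) -> State:
    # Hypothetical stand-in for a text-to-speech model; depends on the item.
    return {**state, "reply": "spoken: " + state["item"]}


def run_pipeline(stages: List[Stage], state: State) -> State:
    # Run the stages strictly in order, threading the state through each one.
    for stage in stages:
        state = stage(state)
    return state


result = run_pipeline(
    [recognise_speech, classify_image, recommend, synthesise_reply],
    {"audio": "request.wav", "image": "photo.jpg"},
)
print(result["reply"])
```

Because later stages read keys written by earlier ones, a platform that cannot run one of the model types leaves the whole chain incomplete, which is the versatility argument made above.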
Delivering ROI with AI
For customers, data science and engineering teams are their most precious resources, and the productivity of those teams determines the return on investment for AI infrastructure. Customers must consider the cost of these expensive teams, which often makes up a significant part of the total cost of deploying AI, alongside the relatively small cost of the AI infrastructure itself.
AI researcher productivity depends on the ability to quickly test new ideas, which requires both the versatility to train any model and the speed afforded by training those models at the largest scale. That’s why organisations focus on overall productivity per dollar to determine the best AI platforms, a more comprehensive view that more accurately represents the true cost of deploying AI.
In addition, the utilisation of their AI infrastructure relies on its fungibility, or the ability to accelerate the entire AI workflow—from data prep to training to inference—on a single platform.
With NVIDIA AI, customers can use the same infrastructure for the entire AI pipeline, repurposing it to match the varying demands between data preparation, training and inference, which dramatically boosts utilisation, leading to very high ROI.
And, as researchers discover new AI breakthroughs, supporting the latest model innovations is key to maximising the useful life of AI infrastructure.
NVIDIA AI delivers the highest productivity per dollar as it is universal and performant for every model, scales to any size and accelerates AI from end to end, from data prep to training to inference.
These results provide the latest demonstration of NVIDIA’s broad and deep AI expertise shown in every MLPerf training, inference and HPC round to date.
23x More Performance in 3.5 Years
In the two years since our first MLPerf submission with A100, our platform has delivered 6x more performance. Continuous optimisations to our software stack helped fuel those gains.
Since the advent of MLPerf, the NVIDIA AI platform has delivered 23x more performance in 3.5 years on the benchmark, the result of full-stack innovation spanning GPUs, software and at-scale improvements. It is this continuous commitment to innovation that assures customers that the AI platform they invest in today and keep in service for three to five years will continue to advance to support the state of the art.
In addition, the NVIDIA Hopper architecture, announced in March, promises another giant leap in performance in future MLPerf rounds.
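To put the 23x and 6x figures quoted in this section on a common per-year footing, the short sketch below converts a total speedup into an equivalent annual multiplier. It assumes the gains compound smoothly from year to year, which is a simplification for illustration and not something the benchmark results themselves state; the function name is our own.

```python
def annualised_factor(total_speedup: float, years: float) -> float:
    """Per-year multiplier that compounds to total_speedup over the given years.

    Assumes smooth compounding, a simplification made only for this example.
    """
    if total_speedup <= 0 or years <= 0:
        raise ValueError("total_speedup and years must be positive")
    return total_speedup ** (1.0 / years)


# Figures quoted in the text: 23x over 3.5 years, and 6x over 2 years on A100.
since_mlperf_start = annualised_factor(23.0, 3.5)
since_a100_debut = annualised_factor(6.0, 2.0)

print(f"Platform since first MLPerf round: {since_mlperf_start:.2f}x per year")
print(f"A100 since its first submission:   {since_a100_debut:.2f}x per year")
```

Under that assumption, both figures work out to roughly the same pace, a little under 2.5x per year.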
How We Did It
Software innovation continues to unlock more performance on the NVIDIA Ampere architecture.
For example, CUDA Graphs, software that helps minimise launch overhead on jobs that run across many accelerators, is used extensively across our submissions. Optimised kernels in our libraries like cuDNN and pre-processing in DALI unlocked additional speedups. We also implemented full-stack improvements across hardware, software and networking, such as NVIDIA Magnum IO and SHARP, which offloads some AI functions into the network to drive even greater performance, especially at scale.
All the software we use is available from the MLPerf repository, so everyone can get our world-class results. We continuously fold these optimisations into containers available on NGC, our software hub for GPU applications, and offer NVIDIA AI Enterprise to deliver optimised software, fully supported by NVIDIA.
Two years after the debut of A100, the NVIDIA AI platform continues to deliver the highest performance in MLPerf 2.0, and is the only platform to submit on every single benchmark. Our next-generation Hopper architecture promises another giant leap in future MLPerf rounds.
Our platform is universal for every model and framework at any scale and provides the fungibility to handle every part of the AI workload. It is available from every major cloud and server maker.