NVIDIA has announced major updates to its Artificial Intelligence (AI) inference platform, which is now being used by Capital One, Microsoft, Samsung Medison, Siemens Energy and Snap, among its 25,000+ customers.
The updates include new capabilities in the open source NVIDIA Triton Inference Server™ software, which provides cross-platform inference on all AI models and frameworks, and NVIDIA TensorRT™, which optimises AI models and provides a runtime for high-performance inference on NVIDIA GPUs.
The company also introduced the NVIDIA A2 Tensor Core GPU, a low-power, small-footprint accelerator for AI inference at the edge that offers up to 20x more inference performance than CPUs.
“NVIDIA’s AI inference platform is driving breakthroughs across virtually every industry, including healthcare, financial services, retail, manufacturing and supercomputing”, said Ian Buck, Vice President and General Manager of Accelerated Computing at NVIDIA. “Whether delivering smarter recommendations, harnessing the power of conversational AI or advancing scientific discovery, NVIDIA’s platform for inference provides low-latency, high-throughput, versatile performance with the ease of use required to power key new AI applications worldwide”.
Key software updates to the Triton Inference Server include:
- Triton Model Analyzer. This new tool automates a key optimisation task, helping select the best configuration for an AI model from hundreds of possibilities. It achieves optimal performance while ensuring the quality of service that applications require.
- Multi-GPU Multinode Functionality. This new functionality enables Transformer-based large language models, such as Megatron 530B, that no longer fit in a single GPU to be served for inference across multiple GPUs and server nodes, with real-time performance.
- RAPIDS FIL. This new backend for GPU or CPU inference of random forest and gradient-boosted decision tree models gives developers a single deployment engine in Triton for both deep learning and traditional machine learning.
- Amazon SageMaker Integration. This seamless integration allows customers to easily deploy multi-framework models with high performance using Triton within SageMaker, AWS’s fully managed AI service.
- Support for Arm CPUs. Triton now includes backends to optimise AI inference workloads on Arm CPUs, in addition to NVIDIA GPUs and x86 CPUs.
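As a concrete illustration of the kind of per-model settings Triton works with, a minimal `config.pbtxt` placed in a model's repository directory might look like the sketch below. The model name, tensor names and dimensions here are hypothetical; settings such as instance count and dynamic batching are among the configuration options a tool like Model Analyzer sweeps when searching for the best-performing combination.

```
name: "example_model"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
instance_group [ { count: 2, kind: KIND_GPU } ]
dynamic_batching { max_queue_delay_microseconds: 100 }
```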
Triton provides AI inference on GPUs and CPUs in the cloud, data centre, enterprise edge and embedded; is integrated into AWS, Google Cloud, Microsoft Azure and Alibaba Cloud PAI-EAS; and is included in NVIDIA AI Enterprise.
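Triton's HTTP/REST endpoint follows the standard KServe v2 inference protocol, so a request body can be composed with nothing beyond the standard library. The sketch below builds the JSON payload for a `POST /v2/models/<model>/infer` call; the tensor name, shape and data are hypothetical placeholders.

```python
import json

def build_infer_request(model_inputs):
    """Build a KServe v2 inference request body for a Triton HTTP endpoint.

    model_inputs: list of (name, shape, datatype, flat_data) tuples,
    where flat_data is the tensor contents in row-major order.
    """
    return json.dumps({
        "inputs": [
            {"name": name, "shape": shape, "datatype": dtype, "data": data}
            for name, shape, dtype, data in model_inputs
        ]
    })

# Example: one hypothetical FP32 input tensor of shape [1, 4].
body = build_infer_request(
    [("input__0", [1, 4], "FP32", [0.1, 0.2, 0.3, 0.4])]
)
print(body)
```

The resulting string would be sent as the body of an HTTP POST to the model's `/infer` endpoint; the server replies with an `outputs` array in the same tensor format.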
NVIDIA AI Enterprise is an end-to-end software suite for development and deployment of AI. It is optimised, certified and supported by NVIDIA to enable customers to run AI workloads on mainstream servers in on-prem data centres and private clouds.
In addition to the Triton updates, TensorRT is now integrated with TensorFlow and PyTorch, delivering up to 3x faster performance than in-framework inference with just one line of code. This gives developers the power of TensorRT in a vastly simplified workflow.
NVIDIA TensorRT 8.2, the latest version of the SDK, accelerates high-performance, deep learning inference, delivering high throughput and low latency in the cloud, on premises or at the edge. With new optimisations, language models with billions of parameters can be run in real time.
Industry Leaders Embrace NVIDIA AI Platform for Inference
Industry leaders are using the NVIDIA AI inference platform to improve their business operations and offer customers new AI-enabled services.
Microsoft Azure Cognitive Services provides cloud-based APIs to high-quality AI models for creating intelligent applications. It uses Triton to run speech-to-text models that provide Microsoft Teams users with accurate live captions and transcriptions.
“Microsoft Teams is an essential tool for communication and collaboration worldwide, with nearly 250 million monthly active users”, said Shalendra Chhabra, Principal PM Manager for Teams Calling and Meetings and Devices at Microsoft. “AI models like these are incredibly complex, requiring tens of millions of neural network parameters to deliver accurate results across dozens of different languages. The bigger a model is, the harder it is to run cost-effectively in real time. NVIDIA GPUs and Triton Inference Server on Microsoft Azure Cognitive Services are helping boost live captioning and transcription capabilities in a cost-effective way, using 28 languages and dialects, with AI in near real time”.
Samsung Medison, a global medical equipment company and an affiliate of Samsung Electronics, is using NVIDIA TensorRT to provide enhanced medical image quality through the Intelligent Assist features of its ultrasound systems. Samsung Medison is dedicated to improving the lives of patients and healthcare professionals by enhancing their comfort, reducing scan time, simplifying workflow and, ultimately, increasing system throughput.
“By leveraging NVIDIA TensorRT in the upcoming V8 high-end ultrasound system, we’re able to better support medical experts when reading and diagnosing images”, said Won-Chul Bang, Vice President and Head of the Customer Experience Team at Samsung Medison. “We are actively introducing AI-based technologies to our ultrasound systems to provide better support for medical professionals, so they can focus on the more important aspects of diagnosis and treatment of patients”.
Siemens Energy, a pure-play energy company with leading energy technology solutions, is using Triton to help its power plant customers manage their facilities with AI.
“The flexibility of NVIDIA Triton Inference Server is enabling highly complicated power plants, often equipped with cameras and sensors but with legacy software systems, to join the autonomous industrial revolution”, said Arik Ott, Portfolio Manager of Autonomous Operations at Siemens Energy.
Snap, the global camera and social media company comprising products and services such as Snapchat, Spectacles and Bitmoji, is using NVIDIA technology to improve monetisation and lower costs.
“Snap used NVIDIA GPUs and TensorRT to improve machine learning inference cost-efficiency by 50% and decrease serving latency by 2x”, said Nima Khajehnouri, Vice President of Engineering for the Mapping and Monetisation Group at Snap. “This provides us the compute headroom to experiment and deploy heavier, more accurate ad and content ranking models”.
NVIDIA AI Platform for Inference Includes New NVIDIA-Certified Systems, New A2 GPU
NVIDIA-Certified Systems™ enable customers to identify, acquire and deploy systems for diverse modern AI applications on high-performance, cost-effective and scalable infrastructure, and the programme now includes two new categories for edge AI.
The expanded categories allow NVIDIA’s systems partners to offer customers a complete lineup of NVIDIA-Certified Systems powered by NVIDIA Ampere architecture-based GPUs to handle virtually every workload. This includes the new NVIDIA A2 GPU, an entry-level, low-power, compact accelerator for inference and edge AI in edge servers. Together with the NVIDIA A30 for mainstream enterprise servers and the NVIDIA A100 for the highest-performance AI servers, the A2 completes comprehensive AI inference acceleration across edge, data centre and cloud.
Leading global enterprise system providers such as Atos, Dell Technologies, GIGABYTE, Hewlett Packard Enterprise, Inspur, Lenovo and Supermicro support NVIDIA AI Enterprise on NVIDIA-Certified Systems in their AI systems portfolios.
Additional system providers such as Advantech, ASRock Rack, ASUS, H3C, Nettrix and QCT also offer NVIDIA-Certified Systems for a variety of workloads. The first NVIDIA-Certified Systems to pass certification in the new edge categories will be available soon from leading providers including Advantech, GIGABYTE and Lenovo.
Availability
Triton is available from the NVIDIA NGC™ catalogue, a hub for GPU-optimised AI software including frameworks, toolkits, pretrained models and Jupyter Notebooks, and as open source code from the Triton GitHub repository. TensorRT is available to members of the NVIDIA Developer program from the TensorRT page. The latest versions of plugins, parsers and samples are also available as open source from the TensorRT GitHub repository.
The NVIDIA AI Enterprise software suite is available from worldwide NVIDIA channel partners, including Atea, Axians, Carahsoft Technology Corp., Computacenter, Insight Enterprises, Presidio, Sirius, SoftServe, SVA System Vertrieb Alexander GmbH, TD SYNNEX, Trace3 and WWT.