NVIDIA Announces Major Updates to Triton Inference Server as 25,000+ Companies Worldwide Deploy NVIDIA AI Inference
November 15, 2021 News


NVIDIA has announced major updates to its Artificial Intelligence (AI) inference platform, which is now being used by Capital One, Microsoft, Samsung Medison, Siemens Energy and Snap, among its 25,000+ customers.

The updates include new capabilities in the open source NVIDIA Triton Inference Server™ software, which provides cross-platform inference on all AI models and frameworks, and NVIDIA TensorRT™, which optimises AI models and provides a runtime for high-performance inference on NVIDIA GPUs.

The company also introduced the NVIDIA A2 Tensor Core GPU, a low-power, small-footprint accelerator for AI inference at the edge that offers up to 20x more inference performance than CPUs.

“NVIDIA’s AI inference platform is driving breakthroughs across virtually every industry, including healthcare, financial services, retail, manufacturing and supercomputing”, said Ian Buck, Vice President and General Manager of Accelerated Computing at NVIDIA. “Whether delivering smarter recommendations, harnessing the power of conversational AI or advancing scientific discovery, NVIDIA’s platform for inference provides low-latency, high-throughput, versatile performance with the ease of use required to power key new AI applications worldwide”.

Key software updates to the Triton Inference Server include:

  • Triton Model Analyzer. This new tool automates a key optimisation task by helping select the best configurations for AI models from hundreds of possibilities. It achieves optimal performance while ensuring the quality of service that applications require.
  • Multi-GPU Multinode Functionality. This new functionality enables Transformer-based large language models that no longer fit in a single GPU, such as Megatron 530B, to run inference across multiple GPUs and server nodes while providing real-time inference performance.
  • RAPIDS FIL. This new backend for GPU or CPU inference of random forest and gradient-boosted decision tree models provides developers a unified deployment engine for both deep learning and traditional machine learning with Triton.
  • Amazon SageMaker Integration. This seamless integration allows customers to easily deploy multi-framework models with high performance using Triton within SageMaker, AWS’s fully managed AI service.
  • Support for Arm CPUs. Triton now includes backends to optimise AI inference workloads on Arm CPUs, in addition to NVIDIA GPUs and x86 CPUs.
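
The kind of selection that Model Analyzer automates can be illustrated with a minimal sketch. This is not Model Analyzer's own code or API: the `ConfigResult` type, the `pick_best` function and every measurement below are hypothetical, invented only to show the idea of choosing the highest-throughput configuration that still meets a latency target.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ConfigResult:
    """One profiled model configuration (hypothetical fields, for illustration)."""
    max_batch_size: int
    instance_count: int
    throughput_ips: float   # inferences per second
    p99_latency_ms: float   # 99th-percentile latency in milliseconds


def pick_best(results: Iterable[ConfigResult],
              latency_budget_ms: float) -> Optional[ConfigResult]:
    """Return the highest-throughput configuration within the latency budget.

    Returns None when no configuration satisfies the budget.
    """
    feasible = [r for r in results if r.p99_latency_ms <= latency_budget_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda r: r.throughput_ips)


# Made-up measurements; a real sweep would cover many more candidates.
measurements = [
    ConfigResult(1, 1, 310.0, 4.1),
    ConfigResult(4, 1, 820.0, 9.8),
    ConfigResult(8, 2, 1450.0, 17.5),
    ConfigResult(16, 2, 1900.0, 31.0),
]

best = pick_best(measurements, latency_budget_ms=20.0)
print(best)
```

With a 20 ms budget, the sketch passes over the fastest configuration because it exceeds the latency limit, which is the trade-off between performance and quality of service described above.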

Triton provides AI inference on GPUs and CPUs in the cloud, data centre, enterprise edge and embedded; is integrated into AWS, Google Cloud, Microsoft Azure and Alibaba Cloud PAI-EAS; and is included in NVIDIA AI Enterprise.

NVIDIA AI Enterprise is an end-to-end software suite for development and deployment of AI. It is optimised, certified and supported by NVIDIA to enable customers to run AI workloads on mainstream servers in on-prem data centres and private clouds.

In addition to the Triton updates, TensorRT is now integrated with TensorFlow and PyTorch, providing 3x faster performance than in-framework inference with just one line of code. This provides developers with the power of TensorRT in a vastly simplified workflow.

NVIDIA TensorRT 8.2, the latest version of the SDK, accelerates high-performance deep learning inference, delivering high throughput and low latency in the cloud, on premises or at the edge. With new optimisations, language models with billions of parameters can be run in real time.
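
To give a sense of scale for models of this size, the following back-of-the-envelope sketch estimates the memory needed just to hold the weights of a 530-billion-parameter model such as Megatron 530B. The figures are assumptions for illustration, not numbers from NVIDIA: weights are taken to be stored in 16-bit floating point (2 bytes each), each GPU is taken to have 80 GB of memory, and activations and other runtime overhead are ignored.

```python
import math

BYTES_PER_PARAM_FP16 = 2  # assumption: weights stored in 16-bit floating point


def weight_memory_gb(num_params: int,
                     bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(num_params: int, gpu_memory_gb: float,
                         bytes_per_param: int = BYTES_PER_PARAM_FP16) -> int:
    """Lower bound on the GPU count, ignoring activations and overhead."""
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / gpu_memory_gb)


params = 530_000_000_000   # 530 billion parameters
gpu_memory_gb = 80.0       # assumption: 80 GB of memory per GPU

print(weight_memory_gb(params))                      # 1060.0
print(min_gpus_for_weights(params, gpu_memory_gb))   # 14
```

Under these assumptions the weights alone occupy about 1,060 GB, so at least 14 such GPUs would be needed before accounting for anything else, which is why such models must be split across multiple GPUs and server nodes.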

Industry Leaders Embrace NVIDIA AI Platform for Inference

Industry leaders are using the NVIDIA AI inference platform to improve their business operations and offer customers new AI-enabled services.

Microsoft Azure Cognitive Services provides cloud-based APIs to high-quality AI models for creating intelligent applications. It is using Triton to run speech-to-text models that provide Microsoft Teams users with accurate live captions and transcriptions.

“Microsoft Teams is an essential tool for communication and collaboration worldwide, with nearly 250 million monthly active users”, said Shalendra Chhabra, Principal PM Manager for Teams Calling and Meetings and Devices at Microsoft. “AI models like these are incredibly complex, requiring tens of millions of neural network parameters to deliver accurate results across dozens of different languages. The bigger a model is, the harder it is to run cost-effectively in real time. NVIDIA GPUs and Triton Inference Server on Microsoft Azure Cognitive Services are helping boost live captioning and transcription capabilities in a cost-effective way, using 28 languages and dialects, with AI in near real time”.

Samsung Medison, a global medical equipment company and an affiliate of Samsung Electronics, is using NVIDIA TensorRT to provide enhanced medical image quality using Intelligent Assist features for its ultrasound systems. Samsung Medison is dedicated to improving the lives of patients and healthcare professionals by enhancing their comfort, reducing scan time, simplifying workflow and ultimately increasing system throughput.

“By leveraging NVIDIA TensorRT in the new coming V8 high-end Ultrasound system, we’re able to better support medical experts when reading and diagnosing images”, said Won-Chul Bang, Vice President and Head of the Customer Experience Team at Samsung Medison. “We are actively introducing AI-based technologies to our ultrasound systems for providing better support for medical professionals, so they can focus on the more important aspects of diagnosis and treatment of patients”.

Siemens Energy, a pure-play energy company with leading energy technology solutions, is using Triton to help its power plant customers manage their facilities with AI.

“The flexibility of NVIDIA Triton Inference Server is enabling highly complicated power plants, often equipped with cameras and sensors but with legacy software systems, to join the autonomous industrial revolution”, said Arik Ott, Portfolio Manager of Autonomous Operations at Siemens Energy.

Snap, the global camera and social media company comprising products and services such as Snapchat, Spectacles and Bitmoji, is using NVIDIA technology to improve monetisation and lower costs.

“Snap used NVIDIA GPUs and TensorRT to improve machine learning inference cost-efficiency by 50% and decrease serving latency by 2x”, said Nima Khajehnouri, Vice President of Engineering for the Mapping and Monetisation Group at Snap. “This provides us the compute headroom to experiment and deploy heavier, more accurate ad and content ranking models”.

NVIDIA AI Platform for Inference Includes New NVIDIA-Certified Systems, New A2 GPU

NVIDIA-Certified Systems™ enable customers to identify, acquire and deploy systems for diverse modern AI applications on a high-performance, cost-effective and scalable infrastructure, and now include two new categories for edge AI.

The expanded categories allow NVIDIA’s systems partners to offer customers a complete lineup of NVIDIA-Certified Systems powered by NVIDIA Ampere architecture-based GPUs to handle virtually every workload. This includes the new NVIDIA A2 GPU, an entry-level, low-power, compact accelerator for inference and edge AI in edge servers. With the NVIDIA A30 for mainstream enterprise servers and the NVIDIA A100 for the highest performance AI servers, the addition of NVIDIA A2 delivers comprehensive AI inference acceleration across edge, data centre and cloud.

Leading global enterprise system providers such as Atos, Dell Technologies, GIGABYTE, Hewlett Packard Enterprise, Inspur, Lenovo and Supermicro support NVIDIA AI Enterprise on NVIDIA-Certified Systems in their AI systems portfolios.

Additional system providers such as Advantech, ASRock Rack, ASUS, H3C, Nettrix and QCT also offer NVIDIA-Certified Systems for a variety of workloads. The first NVIDIA-Certified Systems to pass certification in the new edge categories will be available soon from leading providers including Advantech, GIGABYTE and Lenovo.


Triton is available from the NVIDIA NGC™ catalogue, a hub for GPU-optimised AI software including frameworks, toolkits, pretrained models and Jupyter Notebooks, and as open source code from the Triton GitHub repository. TensorRT is available to members of the NVIDIA Developer program from the TensorRT page. The latest versions of plugins, parsers and samples are also available as open source from the TensorRT GitHub repository.

The NVIDIA AI Enterprise software suite is available from worldwide NVIDIA channel partners, including Atea, Axians, Carahsoft Technology Corp., Computacenter, Insight Enterprises, Presidio, Sirius, SoftServe, SVA System Vertrieb Alexander GmbH, TD SYNNEX, Trace3 and WWT.