
Hewlett Packard Enterprise Accelerates AI Journey from POC to Production with New Solution for AI Development and Training at Scale
April 29, 2022 News

 

Hewlett Packard Enterprise has announced that it is removing barriers for enterprises to easily build and train machine learning models at scale, to realise value faster, with the new HPE Machine Learning Development System. The new system, which is purpose-built for Artificial Intelligence (AI), is an end-to-end solution that integrates a machine learning software platform, compute, accelerators, and networking to develop and train more accurate AI models faster, and at scale.

The HPE Machine Learning Development System builds on HPE’s strategic investment in acquiring Determined AI to combine its robust Machine Learning (ML) platform, now formally called the HPE Machine Learning Development Environment, with HPE’s world-leading AI and high performance computing (HPC) offerings. With the new HPE Machine Learning Development System, users can cut the typical time-to-value for building and training machine learning models from weeks or months to days.

Early Adopter of HPE Machine Learning Development System Launches Training of Giant Multimodal AI Model in Record Speed

HPE also announced that Aleph Alpha, a German AI startup, has adopted the HPE Machine Learning Development System to train its multimodal AI, which includes Natural Language Processing (NLP) and computer vision.

By combining image and text processing in five languages with almost humanlike context understanding, the models push the boundaries of modern AI for all kinds of transformative language- and image-based use cases, such as AI assistants for creating complex texts, summaries with higher-level understanding, searches for highly specific information across hundreds of documents, and the use of specialised knowledge in a conversational context. By adopting the HPE Machine Learning Development System, Aleph Alpha had the system up and running immediately and began training efficiently in record time, combining and monitoring hundreds of GPUs.

“We are seeing astonishing efficiency and performance of more than 150 teraflops by using the HPE Machine Learning Development System. The system was quickly set up and we began training our models in hours instead of weeks. While running these massive workloads, combined with our ongoing research, being able to rely on an integrated solution for deployment and monitoring makes all the difference,” said Jonas Andrulis, Founder and CEO of Aleph Alpha.

“Enterprises seek to incorporate AI and Machine Learning to differentiate their products and services, but are often confronted with complexity in setting up the infrastructure required to build and train accurate AI models at scale,” said Justin Hotard, Executive Vice President and General Manager, HPC and AI, at HPE. “The HPE Machine Learning Development System combines our proven end-to-end HPC solutions for deep learning with our innovative machine learning software platform into one system, to provide a performant out-of-the-box solution to accelerate time to value and outcomes with AI.”

Removing Barriers to Realise Full Potential of AI with a Complete ML Solution

Organisations have yet to reach maturity in their AI infrastructure, which, according to IDC, is the most significant and costly investment required for enterprises that want to speed up their experimentation or prototyping phase to develop AI products and services. Typically, adopting AI infrastructure to support model development and training at scale requires a complex, multi-step process involving the purchase, setup and management of a highly parallel software ecosystem and infrastructure spanning specialised compute, storage, interconnect and accelerators.

The HPE Machine Learning Development System helps enterprises bypass the high complexity associated with adopting AI infrastructure by offering the only solution that combines software and specialised computing such as accelerators, networking and services, allowing enterprises to immediately begin efficiently building and training optimised ML models at scale.

Gaining Accurate Models to Unlock Value Faster with the HPE Machine Learning Development System

The system also helps improve model accuracy faster with state-of-the-art distributed training, automated hyperparameter optimisation and neural architecture search, which are key to ML algorithms. The HPE Machine Learning Development System delivers optimised compute, accelerated compute and interconnect, which are key performance drivers for scaling models efficiently across a mix of workloads, starting at a small configuration of 32 GPUs and going all the way to a larger configuration of 256 GPUs.
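To illustrate what automated hyperparameter optimisation means in general terms, the following is a minimal sketch of random search, one of the simplest such strategies. It is not the HPE Machine Learning Development Environment's API or its search algorithm: the objective function `validation_loss`, the parameter ranges and the function `random_search` are all hypothetical stand-ins chosen for illustration.

```python
import math
import random


def validation_loss(learning_rate, batch_size):
    """Toy stand-in for a real training run (an assumption, not a real model).

    The loss is lowest near learning_rate=0.01 and batch_size=64.
    """
    lr_term = (math.log10(learning_rate) + 2.0) ** 2
    batch_term = ((batch_size - 64) / 64.0) ** 2
    return lr_term + batch_term


def random_search(n_trials, seed=0):
    """Sample hyperparameters at random and keep the best-scoring trial."""
    rng = random.Random(seed)
    best_loss, best_params = None, None
    for _ in range(n_trials):
        params = {
            # Sample the learning rate log-uniformly between 1e-4 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "batch_size": rng.choice([16, 32, 64, 128]),
        }
        loss = validation_loss(**params)
        if best_loss is None or loss < best_loss:
            best_loss, best_params = loss, params
    return best_loss, best_params


if __name__ == "__main__":
    loss, params = random_search(n_trials=50)
    print(f"best loss {loss:.4f} with {params}")
```

In a real system each trial would be a full training job, which is why running many trials in parallel across GPUs matters for this kind of search.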

On a small configuration of 32 GPUs, the HPE Machine Learning Development System delivers approximately 90% scaling efficiency for workloads such as Natural Language Processing (NLP) and Computer Vision. Additionally, based on internal testing, the HPE Machine Learning Development System with 32 GPUs delivers up to 5.7X faster throughput for an NLP workload compared to another offering containing 32 identical GPUs, but with a sub-optimal interconnect.
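As a rough guide to what a scaling-efficiency figure means, the sketch below uses the common definition: measured throughput on N GPUs divided by N times the single-GPU throughput. The article does not state how HPE computed its figure, so this definition is an assumption, and the single-GPU throughput of 100 samples per second is a made-up value used only to show the arithmetic.

```python
def scaling_efficiency(throughput_n, throughput_1, n_gpus):
    """Fraction of ideal linear speed-up achieved on n_gpus."""
    return throughput_n / (n_gpus * throughput_1)


def throughput_at(efficiency, throughput_1, n_gpus):
    """Aggregate throughput implied by a given scaling efficiency."""
    return efficiency * n_gpus * throughput_1


if __name__ == "__main__":
    single_gpu = 100.0  # hypothetical samples/second on one GPU
    cluster = throughput_at(0.90, single_gpu, 32)
    speedup = cluster / single_gpu
    # At 90% efficiency, 32 GPUs behave like roughly 28.8 ideal GPUs.
    print(f"throughput: {cluster:.0f} samples/s, speed-up: {speedup:.1f}x")
```

Under this definition, 90% efficiency on 32 GPUs corresponds to an effective speed-up of about 28.8 times over a single GPU rather than the ideal 32 times.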

Speeding Up POC to Production with Ready-to-Use, AI Model Development and Training Solution

The HPE Machine Learning Development System is offered as one integrated solution that provides preconfigured, fully installed AI infrastructure for turnkey model development and training at scale. As part of the offering, HPE Pointnext Services will provide onsite installation and software setup, allowing users to immediately implement and train ML models for faster and more accurate insights from their data.

The HPE Machine Learning Development System is offered as a small building block, with options to scale up. The small configuration includes the following:

  • Innovative ML platform with the HPE Machine Learning Development Environment to enable enterprises to rapidly develop, iterate and scale high-quality models from POC to production
  • Optimised AI infrastructure using the HPE Apollo 6500 Gen10 Plus system to provide massive, specialised computing capabilities to train and optimise AI models, starting with eight NVIDIA A100 80GB GPUs for accelerated compute
  • Enabling fine-grained centralised monitoring and management for optimal performance with the HPE Performance Cluster Manager, a system management software solution
  • Management stack to control and manage system components using HPE ProLiant DL325 servers and a 1Gb Ethernet Aruba CX 6300 switch
  • Ensuring performance of compute and storage communications using the NVIDIA Quantum InfiniBand networking platform

Availability

The HPE Machine Learning Development System is available now worldwide. For more information, please visit: hpe.com/info/machine-learning-development-system.

 

 
