At the Data + AI Summit, Databricks announced the latest generation of its industry-leading machine learning (ML) offering with the launch of Databricks Machine Learning, a new data-native platform built on top of an open lakehouse architecture. With Databricks Machine Learning, new and existing ML capabilities on the Databricks Lakehouse Platform are integrated into a collaborative, purpose-built experience that gives ML engineers everything they need to build, train, deploy, and manage ML models from experimentation to production, uniquely combining data and the full ML lifecycle. Databricks Machine Learning also includes two new capabilities: Databricks AutoML, which automates the tedious steps that data scientists currently perform by hand while still exposing control and transparency, and Databricks Feature Store, which improves the discoverability, reuse, and governance of model features in a system integrated with the enterprise’s data engineering platform.
Many ML platforms fall short because they ignore a key challenge in machine learning: they assume that data is already available at high quality and ready for training. In practice it often is not, so data teams have to stitch together solutions that are good at data but not AI with others that are good at AI but not data. To complicate things further, the people responsible for data platforms and pipelines (data engineers) are different from those who train ML models (data scientists), who are in turn different from those who deploy product applications (engineering teams who own business applications). As a result, solutions for ML need to bridge gaps between data and AI, between the tools required, and between the people involved.
Databricks Machine Learning provides each member of the data team with the right tools in one collaborative environment. Users can switch between the Data Science / Engineering, SQL Analytics, and new Machine Learning experiences to access the tools and features relevant to their everyday workflow. Databricks Machine Learning also provides a new ML-focused start page that surfaces the new ML capabilities and resources, with quick access to Experiments, the Feature Store, and the Model Registry. Built on an open lakehouse foundation, Databricks Machine Learning lets customers work with any type of data for machine learning, at any scale, from traditional structured tables to unstructured data like videos and images to streaming data from real-time applications and IoT sensors, and move quickly through the ML workflow to get more models to production faster.
“Humana’s machine learning platform, FlorenceAI, is enabling us to automate and accelerate the delivery lifecycle of ML solutions at scale. Databricks has been an essential underlying technology, with hundreds of our data scientists using the platform to deliver dozens of models in production, so that our teams are able to operate at orders of magnitude faster than before,” said Slawek Kierner, Senior Vice President of Enterprise Data and Analytics at Humana.
Databricks AutoML: Jumpstart new projects and automate tedious ML tasks
AutoML has the potential to let data teams build ML models more quickly by automating much of the heavy lifting involved in the experimentation and training phases. But enterprises that use AutoML tools today often struggle to get AutoML models into production. This happens because the tools provide no visibility into how they arrive at their final model, which makes it impossible to improve the model's performance or troubleshoot it when edge cases in the data lead to low-confidence predictions. Additionally, it can be difficult for organisations to meet compliance requirements that oblige them to explain how a model works, because they lack visibility into the model’s code.
The AutoML capabilities within Databricks ML take a unique ‘glass box’ approach instead. AutoML not only lets data teams quickly produce trained models through either a UI or an API, but also auto-generates the underlying experiments and notebooks with code, so data scientists can easily validate an unfamiliar data set or modify the generated ML project. Data scientists have full transparency into how a model operates and can take control at any time. This transparency is critical in highly regulated environments and for collaboration with expert data scientists.
All AutoML experiments are integrated with the rest of the Databricks Lakehouse Platform, including MLflow, which tracks the parameters, metrics, artifacts, and models associated with every trial run, making it easy to compare models and deploy them to production.
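To make the idea of comparing trial runs concrete, here is a minimal sketch in plain Python. It is not the MLflow or Databricks AutoML API: `Trial`, `TrialLog`, and `best_trial` are hypothetical names, and the parameter and metric values are made up for illustration. It only shows the pattern of recording parameters and metrics per trial and then selecting the best run by a chosen metric.

```python
from dataclasses import dataclass, field


@dataclass
class Trial:
    """One trial run: its parameters, metrics, and artifact paths."""
    name: str
    params: dict
    metrics: dict
    artifacts: list = field(default_factory=list)


class TrialLog:
    """Hypothetical in-memory stand-in for an experiment tracker."""

    def __init__(self):
        self.trials = []

    def record(self, trial):
        self.trials.append(trial)

    def best_trial(self, metric, higher_is_better=True):
        # Only compare trials that actually logged the requested metric.
        candidates = [t for t in self.trials if metric in t.metrics]
        if not candidates:
            raise ValueError(f"no trial logged metric {metric!r}")
        pick = max if higher_is_better else min
        return pick(candidates, key=lambda t: t.metrics[metric])


# Made-up illustrative values, not real benchmark results.
log = TrialLog()
log.record(Trial("logistic_regression", {"C": 1.0}, {"f1": 0.81}))
log.record(Trial("random_forest", {"n_estimators": 200}, {"f1": 0.86}))
log.record(Trial("gradient_boosting", {"max_depth": 4}, {"f1": 0.84}))

best = log.best_trial("f1")
print(best.name, best.params)
```

Because every trial keeps its own parameters next to its metrics, the selected run carries what is needed to reproduce or modify it, which is the property the tracking integration is meant to provide.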
Databricks Feature Store: Streamline ML at scale with simplified feature sharing and discovery
Machine learning models are built using features, which are the attributes a model uses to make a prediction. To work efficiently, data scientists need to be able to discover what features exist within their organisation, how they are built, and where they are used, rather than wasting significant time reinventing them. Additionally, feature code needs to be kept consistent across the several teams that participate in the ML workflow. Otherwise, model performance will diverge between real-time and batch use cases, a problem called online/offline skew.
The Databricks Feature Store is the first of its kind to be co-designed with a data and MLOps platform. Tight integration with the popular open-source frameworks Delta Lake and MLflow guarantees that data stored in the Feature Store is open and that models trained with any ML framework can benefit from the integration of the Feature Store with the MLflow model format. Most importantly, the Feature Store eliminates online/offline skew by packaging feature store references with the model, so that the model itself can look up features from the Feature Store instead of requiring a client application to do so. As a result, features can be updated without any changes to the client application that sends requests to the model. The Feature Store also enables reusability and discoverability through automated lineage tracking, which records the data sources used for feature computation as well as the exact version of the code that was used. With this, a data scientist can find all of the features that have already been defined on the raw data they plan to use. Finally, the Feature Store knows exactly which models and endpoints consume any given feature, which facilitates end-to-end lineage as well as safe decisions about whether a feature can be updated or deleted.
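The idea of packaging feature references with the model can be illustrated with a small sketch in plain Python. This is not the Databricks Feature Store API: `FeatureTable`, `PackagedModel`, the feature names, and the weights are all hypothetical and the values are made up. The sketch only shows the design point that the client sends a lookup key, while the model resolves its own features from the shared table.

```python
class FeatureTable:
    """Hypothetical stand-in for a feature table keyed by an entity id."""

    def __init__(self, name, rows):
        self.name = name
        self._rows = rows  # {key: {feature_name: value}}

    def lookup(self, key, feature_names):
        row = self._rows[key]
        return [row[f] for f in feature_names]

    def update(self, key, feature_name, value):
        self._rows[key][feature_name] = value


class PackagedModel:
    """A toy linear model bundled with the feature references it needs."""

    def __init__(self, weights, table, feature_names):
        self.weights = weights
        self.table = table
        self.feature_names = feature_names

    def predict(self, key):
        # The model resolves its own features; the client passes only the key.
        values = self.table.lookup(key, self.feature_names)
        return sum(w * v for w, v in zip(self.weights, values))


# Made-up illustrative values.
table = FeatureTable(
    "customer_features",
    {"c1": {"orders_30d": 4.0, "avg_basket": 25.0}},
)
model = PackagedModel([0.5, 0.1], table, ["orders_30d", "avg_basket"])

before = model.predict("c1")           # client code: key in, score out
table.update("c1", "orders_30d", 6.0)  # feature refreshed upstream
after = model.predict("c1")            # same client call, new feature value
print(before, after)
```

The client call is identical before and after the feature update, and batch and real-time callers read the same stored values rather than recomputing features with their own code, which is how this arrangement avoids skew.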