Databricks announces major contributions to flagship open source projects
The data and AI specialist will contribute new Delta Lake features and enhancements to the Linux Foundation
Databricks has announced several new contributions to popular data and AI open source projects, including Delta Lake, MLflow, and Apache Spark.
At its Data + AI Summit, the data and AI specialist said it will contribute all features and enhancements it has made to Delta Lake to the Linux Foundation, as well as make all Delta Lake APIs open source as part of its Delta Lake 2.0 release.
That means the open source community will benefit from the full functionality and enhanced performance of the Delta Lake 2.0 ecosystem, enabling the building of high-performance data lakehouses on open standards. The Delta Lake 2.0 Release Candidate is now available, with a full release expected later this year.
The firm also announced MLflow 2.0, the next iteration of the open source machine learning platform, which introduces MLflow Pipelines. The addition aims to substantially reduce time to production and improve execution at scale through standardisation.
MLflow Pipelines offers data scientists pre-defined, production-ready templates based on the model type they’re building, allowing them to bootstrap and accelerate model development without requiring intervention from production engineers.
Additionally, Databricks revealed Spark Connect, which will enable the use of Apache Spark, its unified data analytics engine, on virtually any device, as well as Project Lightspeed, a next-generation Spark Structured Streaming engine for data streaming on the lakehouse platform.
“From the beginning, Databricks has been committed to open standards and the open source community,” commented Ali Ghodsi, Co-Founder and CEO of Databricks. “We have created, contributed to, fostered the growth of, and donated some of the most impactful innovations in modern open source technology.
“Open data lakehouses are quickly becoming the standard for how the most innovative companies handle their data and AI. Delta Lake, MLflow and Spark are all core to this architectural transformation, and we’re proud to do our part in accelerating their innovation and adoption.”
New Lakehouse innovations
Databricks also unveiled several innovations for its Lakehouse Platform at the Data + AI Summit. New capabilities include improved data warehousing performance and functionality, expanded data governance, and new data sharing features, including Databricks Marketplace and Data Cleanrooms for secure data collaboration.
There’s also automatic cost optimisation for ETL operations, as well as machine learning lifecycle improvements to “radically simplify” MLOps at production scale.
“Today’s announcements are a significant step forward in advancing our Lakehouse vision, as we are making it faster and easier than ever to maximise the value of data, both within and across companies,” Ghodsi said.