Microsoft yesterday unveiled Microsoft Fabric, a new offering that unites its suite of data management, analytics, and machine learning tools in a single platform. The solution is built on OneLake, a new data lake that’s currently in preview.
Microsoft Fabric is an “end-to-end, unified analytics platform that brings together all the data and analytics tools that organizations need,” Arun Ulagaratchagan, Microsoft corporate VP of Azure Data, writes in a blog post.
That includes everything from data governance and ETL pipelines to traditional SQL analytics and machine learning workloads. Power BI plays a role, as expected. And there’s even a streaming analytics component, as well as a ChatGPT-like Copilot for authoring reports.
Fabric is based on OneLake, the new lakehouse that Microsoft also announced yesterday. Every piece of data that Microsoft Fabric users access comes from OneLake, which provides unified data governance, discovery, sharing, lineage, and compliance capabilities.
Data is stored in OneLake using Parquet and Delta, which is Databricks’ open table format (as opposed to other formats, like Apache Iceberg or Apache Hudi).
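To make the format concrete: a Delta table is a set of Parquet data files plus a transaction log of JSON commits that record which files were added or removed. The sketch below replays a simplified, in-memory log to find the files that make up the current table version. The file names and log contents are invented for illustration, and the real protocol carries more (metadata, checkpoints, statistics), so treat this as a rough model rather than how OneLake or Delta Lake is implemented.

```python
import json

# Hypothetical commits, one newline-delimited JSON string per table version.
# Real Delta logs live under _delta_log/ and contain more action types.
COMMITS = [
    '{"add": {"path": "part-0000.parquet"}}\n{"add": {"path": "part-0001.parquet"}}',
    '{"remove": {"path": "part-0000.parquet"}}\n{"add": {"path": "part-0002.parquet"}}',
]


def live_files(commits):
    """Replay add/remove actions in order and return the current data files."""
    files = set()
    for commit in commits:
        for line in commit.splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return sorted(files)


print(live_files(COMMITS))
```

Because readers only ever consult the log to decide which Parquet files count, any engine that understands the log can work from the same files without a private copy.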
“By adopting OneLake as our store and Delta and Parquet as the common format for all workloads, we offer customers a data stack that’s unified at the most fundamental level,” Ulagaratchagan writes. “Customers do not need to maintain different copies of data for databases, data lakes, data warehousing, business intelligence, or real-time analytics. Instead, a single copy of the data in OneLake can directly power all the workloads.”
OneLake can also “virtualize” data lake storage in Microsoft’s Azure Data Lake Storage Gen2 (ADLSg2) and AWS’s Amazon S3, with support for Google Storage coming soon.
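One way to picture this kind of virtualization is as a lookup table that maps a logical path in the lake to data that physically lives elsewhere, so nothing is copied. The sketch below is only a conceptual model: the `SHORTCUTS` table, the `resolve` function, the `lake://` placeholder scheme, and all of the account, bucket, and path names are hypothetical and are not OneLake's API.

```python
# Hypothetical mapping from logical lake folders to external storage locations.
SHORTCUTS = {
    "sales/raw": "s3://example-bucket/sales",
    "sales/curated": "abfss://data@exampleaccount.dfs.core.windows.net/curated",
}


def resolve(logical_path):
    """Return the physical URI for a logical path, using the longest matching shortcut."""
    for prefix in sorted(SHORTCUTS, key=len, reverse=True):
        if logical_path == prefix or logical_path.startswith(prefix + "/"):
            return SHORTCUTS[prefix] + logical_path[len(prefix):]
    # No shortcut applies: treat the data as stored in the lake itself
    # ("lake://" is a placeholder scheme, not a real one).
    return "lake://" + logical_path


print(resolve("sales/raw/2023/05/orders.parquet"))
```

Callers see a single namespace, while the mapping decides whether a read goes to the lake's own storage or to an external bucket.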
Atop OneLake are seven key components that deliver specific functionality. According to Ulagaratchagan, these include:
- Data Factory (in preview), which provides 150+ connectors to cloud and on-premises data sources, drag-and-drop experiences for data transformation, and the ability to orchestrate data pipelines;
- Synapse Data Engineering (in preview), which enables authoring experiences for Spark, instant start with live pools, and the ability to collaborate;
- Synapse Data Science (in preview), which provides an end-to-end workflow for data scientists to build sophisticated AI models, collaborate easily, and train, deploy, and manage machine learning models;
- Synapse Data Warehousing (in preview), which provides a converged lakehouse and data warehouse experience on open data formats;
- Synapse Real-Time Analytics (in preview), which enables developers to work with data streaming in from Internet of Things (IoT) devices, telemetry, logs, and more, and analyze volumes of semi-structured data;
- Power BI in Fabric, which provides visualization and AI-driven analytics;
- Data Activator (coming soon), which provides real-time detection and monitoring of data and can trigger notifications and actions when it finds specified patterns in data, all in a no-code experience.
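The detect-and-act pattern described for Data Activator can be illustrated with a small rule that watches a stream of readings and fires a callback when a condition holds for several consecutive values. This is a generic sketch, not Data Activator itself (which Microsoft describes as no-code); the `Rule` class, its parameters, and the sample temperatures are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Rule:
    """Fire `action` once `condition` has held for `consecutive` readings in a row."""
    condition: Callable[[float], bool]
    consecutive: int
    action: Callable[[float], None]
    _streak: int = field(default=0, init=False)

    def observe(self, value: float) -> None:
        # Extend the streak on a match, reset it otherwise.
        self._streak = self._streak + 1 if self.condition(value) else 0
        # Fire exactly once, at the moment the streak reaches the threshold.
        if self._streak == self.consecutive:
            self.action(value)


alerts: List[str] = []
rule = Rule(
    condition=lambda temp: temp > 8.0,
    consecutive=3,
    action=lambda temp: alerts.append(f"cooler too warm: {temp}"),
)

for reading in [7.5, 8.2, 8.4, 7.9, 8.1, 8.3, 8.6, 9.0]:
    rule.observe(reading)

print(alerts)
```

Requiring several consecutive matches is one simple way to keep a single noisy reading from triggering a notification.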
Microsoft has a close partnership with OpenAI, and so it’s natural that Fabric will also utilize OpenAI to power Copilot for generative AI capabilities. Ulagaratchagan writes:
“We are infusing Fabric with Azure OpenAI Service at every layer to help customers unlock the full potential of their data, enabling developers to leverage the power of generative AI against their data and assisting business users to find insights in their data. With Copilot in Microsoft Fabric in every data experience, users can use conversational language to create dataflows and data pipelines, generate code and entire functions, build machine learning models, or visualize results. Customers can even create their own conversational language experiences that combine Azure OpenAI Service models and their data and publish them as plug-ins.”
Microsoft Fabric is currently in preview, but several customers have already used early versions, including Ferguson, T-Mobile, and Aon.
Geoffrey Freeman, who works in data solutions and analytics for T-Mobile, says Fabric will help it eliminate data silos. “Querying across the lakehouse and warehouse from a single engine–that’s a game changer,” Freeman says, according to the Microsoft blog. “Spark compute on-demand, rather than waiting for clusters to spin up, is a huge improvement for both standard data engineering and advanced analytics.”
For more information, see www.microsoft.com/en-us/microsoft-fabric.