The Data Lake feature lets you run analytics on your data usage and prepare reports. A data lake is a large repository that stores both structured and unstructured data. Data Lake Storage combines the scalability and cost benefits of object storage with the reliability and performance of a big data file system. The following illustration shows how Azure Data Lake stores all your business data and makes it available for analysis.
The proliferation of SaaS (Software as a Service) has made delivering technology to the business easier, faster, and cheaper. SaaS is now a common system of record for organizations. This change has revolutionized the modern workplace and upended the traditional way of managing and securing an organization's IT services. It has also brought IT teams a completely new paradigm for managing, securing, and supporting this landscape.
Organizations must understand exactly how SaaS applications operate and interact with each other. That includes understanding which information needs to be centralized and made discoverable, and building insights on the data that is relevant to increasing operational efficiency. To reduce security risks and increase compliance, organizations must introduce automation where possible and apply analytics to operational data to avoid alert fatigue.
A comprehensive data strategy including centralization, discoverability, insights, action, automation, delegation and auditability is needed to fill the gaps introduced by today’s SaaS environments and to gain the level of control and clarity that is essential for properly securing the corporate environment.
Suppose you work in the analytics department of a large health system. Your organization's IT infrastructure is hybrid, both on-premises and cloud-based, and all data, including customer interactions and services information, resides in Azure SQL Data Warehouse. Your department analyzes customer services usage patterns and identifies inefficiencies in the processes based on its findings. You can achieve the desired results by using the robust machine learning and deep learning functions of Azure Databricks in conjunction with Azure SQL Data Warehouse.
Azure Databricks is a fully managed, cloud-based big data and machine learning platform. It enables developers to accelerate AI implementation by simplifying the process of building enterprise-grade production data applications. Built in a joint effort by Microsoft and the team that started Apache Spark, Azure Databricks provides data science and engineering teams with a single platform for big data processing and machine learning.
By combining an end-to-end, managed Apache Spark platform optimized for the cloud with the enterprise scale and security of the Azure platform, Azure Databricks makes it easy to run large-scale Spark workloads.
You can access SQL Data Warehouse from Azure Databricks by using the SQL Data Warehouse connector. The connector is a data source implementation for Apache Spark that uses Azure Blob storage and PolyBase in SQL Data Warehouse to transfer large volumes of data efficiently between an Azure Databricks cluster and a SQL Data Warehouse instance.
Both the Azure Databricks cluster and the SQL Data Warehouse instance access a common Blob storage container to exchange data. In Azure Databricks, Spark jobs are triggered by the SQL Data Warehouse connector to read data from and write data to the Blob storage container. On the SQL Data Warehouse side, data loading and unloading operations performed by PolyBase are triggered by the SQL Data Warehouse connector through JDBC.
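The data flow described above can be sketched in Python as it might appear in an Azure Databricks notebook. This is a minimal illustration under stated assumptions, not a definitive implementation: the helper names (build_sqldw_options, read_table, write_table) and any server, container, or table names passed to them are hypothetical. The option keys (url, tempDir, forwardSparkAzureStorageCredentials, dbTable) and the com.databricks.spark.sqldw format name are those of the connector.

```python
from typing import Dict

# Data source name under which the SQL Data Warehouse connector is registered.
SQLDW_FORMAT = "com.databricks.spark.sqldw"


def build_sqldw_options(jdbc_url: str, temp_dir: str, table: str) -> Dict[str, str]:
    """Collect the connector options shared by reads and writes (hypothetical helper)."""
    # The cluster and the warehouse exchange data through a common Blob
    # storage container, so the staging location must be a Blob storage URI.
    if not temp_dir.startswith(("wasbs://", "wasb://")):
        raise ValueError("temp_dir must be a Blob storage URI such as wasbs://...")
    return {
        "url": jdbc_url,          # JDBC connection used to trigger PolyBase
        "tempDir": temp_dir,      # shared Blob storage staging folder
        "forwardSparkAzureStorageCredentials": "true",
        "dbTable": table,         # table in SQL Data Warehouse
    }


def read_table(spark, options: Dict[str, str]):
    """Load a SQL Data Warehouse table into a Spark DataFrame."""
    return spark.read.format(SQLDW_FORMAT).options(**options).load()


def write_table(df, options: Dict[str, str], mode: str = "append") -> None:
    """Write a Spark DataFrame to a SQL Data Warehouse table."""
    df.write.format(SQLDW_FORMAT).options(**options).mode(mode).save()
```

In a notebook, where the spark session is already defined, you would build the options once and pass them to read_table(spark, options). Forwarding credentials assumes that the storage account access key has already been set in the Spark session configuration.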
PolyBase is a technology that accesses data outside of a database by using T-SQL. In Azure SQL Data Warehouse, you can use it to import data from and export data to Azure Blob storage and Azure Data Lake Store.
Azure Data Factory is a cloud-based data integration service. It lets you create data-driven workflows in the cloud for orchestrating and automating data movement and data transformation. Data Factory supports various data stores. In this case it uses Azure SQL Database as a data source.