The Data Lake feature lets you analyze your data usage and prepare reports. A data lake is a large repository that stores both structured and unstructured data. Data Lake Storage combines the scalability and cost benefits of object storage with the reliability and performance of big data file system capabilities. The following illustration shows how Azure Data Lake stores all your business data and makes it available for analysis.
The goal of cloud computing is to make running a business easier and more efficient, whether it’s a small start-up or a large enterprise. Every business is unique and has different needs. To meet those needs, cloud computing providers offer a wide range of services. Cloud compute services, including Virtual Machines, Containers, App Service, and Serverless computing, offer application development and deployment approaches that, if applied correctly, can save time and money. Each service provides benefits as well as tradeoffs against the other options, so IT needs a good understanding of these compute services.
A virtual machine (VM) is an emulation of a physical computer. It offers more control than the other options, but that control comes with maintenance overhead.
Containers provide a consistent, isolated execution environment for applications. They are similar to VMs, except that they don’t require a guest operating system. Instead, the application and all its dependencies are packaged into a “container”, and a standard runtime environment is then used to execute the app. This allows the container to start up in just a few seconds, because there’s no OS to boot and initialize. Only the app needs to launch.
Serverless computing lets you run application code without creating, configuring, or maintaining a server. The core idea is that your application is broken into separate functions that run when triggered by some action. This is ideal for automated tasks. Each of these compute approaches is optimized for a specific use case.
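The trigger-and-function idea can be illustrated with a minimal sketch in plain Python. It is not any provider's API: the registry, the `on` decorator, the `fire` function, and the event names are all hypothetical, and `fire` only simulates what a serverless platform would do when an action occurs.

```python
from typing import Callable, Dict, List

# Hypothetical in-process registry: maps an event name to the functions it triggers.
_handlers: Dict[str, List[Callable[[dict], str]]] = {}


def on(event_name: str):
    """Register the decorated function to run when `event_name` fires."""
    def register(func: Callable[[dict], str]) -> Callable[[dict], str]:
        _handlers.setdefault(event_name, []).append(func)
        return func
    return register


@on("blob_uploaded")
def make_thumbnail(event: dict) -> str:
    # A small, single-purpose function; it runs only when a file is uploaded.
    return f"thumbnail created for {event['name']}"


@on("timer_tick")
def nightly_cleanup(event: dict) -> str:
    # An automated task bound to a (simulated) schedule trigger.
    return f"cleanup ran at {event['time']}"


def fire(event_name: str, event: dict) -> List[str]:
    """Simulate the platform: invoke every function bound to the event."""
    return [handler(event) for handler in _handlers.get(event_name, [])]


if __name__ == "__main__":
    print(fire("blob_uploaded", {"name": "scan-001.png"}))
```

Each function here is independent of the others and does nothing until its trigger fires, which is the property that makes the model a good fit for automated tasks.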
Suppose you work in the analytics department of a large health system. Your organization’s IT infrastructure is hybrid, both on-premises and cloud-based, and all data, including customer interactions and services information, resides in Azure SQL Data Warehouse. Your department analyzes customer services usage patterns and, based on its findings, proposes fixes for inefficiencies in the processes. You can achieve the desired results by using the robust machine learning and deep learning functions of Azure Databricks in conjunction with Azure SQL Data Warehouse.
Azure Databricks is a fully managed, cloud-based big data and machine learning platform. It enables developers to accelerate AI implementation by simplifying the process of building enterprise-grade production data applications. Built in a joint effort by Microsoft and the team that started Apache Spark, Azure Databricks provides data science and engineering teams with a single platform for big data processing and machine learning.
By combining an end-to-end, managed Apache Spark platform optimized for the cloud with the enterprise scale and security of the Azure platform, Azure Databricks makes it easy to run large-scale Spark workloads.
You can access SQL Data Warehouse from Azure Databricks by using the SQL Data Warehouse connector. The SQL Data Warehouse connector is a data source implementation for Apache Spark that uses Azure Blob storage and PolyBase in SQL Data Warehouse to transfer large volumes of data efficiently between an Azure Databricks cluster and a SQL Data Warehouse instance.
Both the Azure Databricks cluster and the SQL Data Warehouse instance access a common Blob storage container to exchange data. In Azure Databricks, Spark jobs are triggered by the SQL Data Warehouse connector to read data from and write data to the Blob storage container. On the SQL Data Warehouse side, data loading and unloading operations performed by PolyBase are triggered by the SQL Data Warehouse connector through JDBC.
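The exchange described above can be sketched in PySpark as follows. This is a minimal sketch, not a definitive setup: it assumes it runs in an Azure Databricks notebook where a `spark` session already exists, that the connector is addressed by the data source name `com.databricks.spark.sqldw`, and that the cluster already holds credentials for the Blob storage account. The helper names (`sqldw_options`, `read_table`, `write_table`) and all connection values are hypothetical placeholders.

```python
from typing import Dict

# Data source name of the SQL Data Warehouse connector (assumed, see lead-in).
SQLDW_FORMAT = "com.databricks.spark.sqldw"


def sqldw_options(jdbc_url: str, temp_dir: str, table: str) -> Dict[str, str]:
    """Build the connector options shared by reads and writes."""
    return {
        # JDBC connection used to trigger PolyBase load/unload in the warehouse.
        "url": jdbc_url,
        # Common Blob storage container both sides use to exchange data.
        "tempDir": temp_dir,
        # Pass the cluster's storage credentials through to the warehouse.
        "forwardSparkAzureStorageCredentials": "true",
        # Warehouse table to read from or write to.
        "dbTable": table,
    }


def read_table(spark, options: Dict[str, str]):
    """Load a warehouse table into a Spark DataFrame via the staging container."""
    return spark.read.format(SQLDW_FORMAT).options(**options).load()


def write_table(df, options: Dict[str, str], mode: str = "append") -> None:
    """Write a Spark DataFrame to a warehouse table via the staging container."""
    df.write.format(SQLDW_FORMAT).options(**options).mode(mode).save()


# Hypothetical placeholder values; replace with your own before running.
example_options = sqldw_options(
    jdbc_url="jdbc:sqlserver://<server>.database.windows.net:1433;database=<database>",
    temp_dir="wasbs://<container>@<account>.blob.core.windows.net/<directory>",
    table="<schema>.<table>",
)
```

In a notebook, `read_table(spark, example_options)` would return a DataFrame once the placeholders are filled in. Keeping the options in one helper makes it clear that reads and writes share the same JDBC URL and the same staging container.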
PolyBase is a technology that accesses data outside of a database through T-SQL. In Azure SQL Data Warehouse, you can use it to import data from, and export data to, Azure Blob storage and Azure Data Lake Store.
Azure Data Factory is a cloud-based data integration service. It lets you create data-driven workflows in the cloud that orchestrate and automate data movement and data transformation. Data Factory supports various data stores. In this case, it uses Azure SQL Database as a data source.