In the previous post, we learned about the various ways to store data in Azure. In this post, let’s look at the services Azure provides to catalog, process, analyze, and transform that data.
Azure Data Catalog: As the name suggests, this is a catalog, or index, of the metadata for data assets stored in various formats and data stores across the organization. Data Catalog acts as a data dictionary in which users can search for the data they need. A data source can only be discovered after it has been registered in the catalog, so for the catalog to be useful, all relevant data sources should be registered first.
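To make the registration requirement concrete, here is a minimal in-memory sketch of a metadata catalog. The `MetadataCatalog` and `DataAsset` names are hypothetical and this is not the Azure Data Catalog API; it only shows that the catalog stores descriptions of data (name, store, format, tags), not the data itself, and that a search can only find what was registered.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """Metadata describing one registered data source (not the data itself)."""
    name: str
    store: str  # illustrative label, e.g. "Azure SQL Database" or "ADLS Gen2"
    data_format: str
    tags: list[str] = field(default_factory=list)


class MetadataCatalog:
    """Hypothetical in-memory catalog: it only knows about assets that were registered."""

    def __init__(self) -> None:
        self._assets: list[DataAsset] = []

    def register(self, asset: DataAsset) -> None:
        self._assets.append(asset)

    def search(self, term: str) -> list[str]:
        """Return names of registered assets whose name or tags contain the term."""
        term = term.lower()
        return [
            a.name
            for a in self._assets
            if term in a.name.lower() or any(term in t.lower() for t in a.tags)
        ]


catalog = MetadataCatalog()
catalog.register(DataAsset("sales_orders", "Azure SQL Database", "table", ["sales", "orders"]))
catalog.register(DataAsset("clickstream_raw", "ADLS Gen2", "json", ["web", "events"]))

print(catalog.search("sales"))      # ['sales_orders']
print(catalog.search("inventory"))  # [] because no such source was ever registered
```

A real catalog adds access control, annotations, and connectors that harvest metadata automatically, but the core idea is the same: discovery works on registered metadata.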
Azure Data Lake Storage Gen2 (ADLS Gen2): This is a general-purpose data store for structured and unstructured data. It can store any type of data, including files, videos, and social media streams. Under the hood, ADLS Gen2 is built on Azure Blob Storage and is optimized for data analytics workloads, i.e. processing large amounts of data.
Azure Synapse Analytics (formerly Azure SQL Data Warehouse): This solution combines data warehousing and big data analytics. Data must be transformed into a relational format before it can be loaded into Synapse Analytics, which enables the creation of a schema optimized for data retrieval and analytics. In other words, it is optimized for business reporting.
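The "transform into a relational format" step can be illustrated with a small Python sketch. It assumes some nested JSON order events (made-up sample data), flattens them into a simple fact/dimension layout, and runs a reporting query. Python's built-in `sqlite3` is only a local stand-in for a warehouse here; nothing in the sketch talks to Synapse, and the table names are hypothetical.

```python
import json
import sqlite3

# Semi-structured input, e.g. JSON order events (made-up sample data).
raw_events = [
    '{"order_id": 1, "customer": {"id": "C1", "name": "Contoso"}, "amount": 120.5}',
    '{"order_id": 2, "customer": {"id": "C2", "name": "Fabrikam"}, "amount": 75.0}',
    '{"order_id": 3, "customer": {"id": "C1", "name": "Contoso"}, "amount": 30.0}',
]

# sqlite3 stands in for the warehouse; the fact/dimension layout is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_customer (customer_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (order_id INTEGER PRIMARY KEY, customer_id TEXT, amount REAL);
    """
)

# Transform: flatten each nested JSON document into rows that fit the fixed schema.
for line in raw_events:
    event = json.loads(line)
    cust = event["customer"]
    # Each customer is stored once in the dimension table, however many orders it has.
    conn.execute(
        "INSERT OR IGNORE INTO dim_customer VALUES (?, ?)", (cust["id"], cust["name"])
    )
    conn.execute(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        (event["order_id"], cust["id"], event["amount"]),
    )

# Reporting query: total sales per customer.
report = conn.execute(
    """
    SELECT c.name, SUM(f.amount)
    FROM fact_sales f JOIN dim_customer c ON f.customer_id = c.customer_id
    GROUP BY c.name ORDER BY c.name
    """
).fetchall()
print(report)  # [('Contoso', 150.5), ('Fabrikam', 75.0)]
```

Once the data is in a fixed, query-friendly schema, reporting becomes a simple join and aggregation, which is the kind of workload a data warehouse is tuned for.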
Azure HDInsight: This is Microsoft’s implementation of the Hadoop stack. Hadoop is an open-source suite of programs developed from the ground up to handle and process big data workloads. HDInsight supports big data technologies such as Hadoop, Spark, Hive, Storm, and many others.
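Hadoop's original processing model is MapReduce: a map step emits key/value pairs, the framework groups them by key, and a reduce step combines each group. The single-process Python sketch below implements the classic word count in that style. It uses no Hadoop or HDInsight APIs; on a real cluster the map and reduce functions would run in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str) -> list[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


lines = ["big data on Hadoop", "Spark and Hadoop process big data"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["hadoop"], counts["big"], counts["spark"])  # 2 2 1
```

Because each mapper only sees its own input and each reducer only sees one key, both phases can be split across machines, which is what lets this model scale to big data workloads.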
Azure Data Factory: This service is used to automate data workflows on Azure, primarily data pipelines that move data from one store to another. For example, using Data Factory, it is possible to load data into Azure Data Lake Storage Gen2, process it with HDInsight, and move the results to Azure Synapse Analytics without any human intervention.
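The example pipeline (land data, process it, load the results) can be mimicked with a tiny Python runner that executes steps in order and passes each step's output to the next. The activity functions are hypothetical stand-ins, not Data Factory APIs. In Data Factory itself, pipelines are authored in the visual designer or as JSON and are started by triggers such as a schedule; this sketch only shows the sequencing idea.

```python
from typing import Callable

# Each hypothetical "activity" takes the previous activity's output and returns its own.
Activity = Callable[[object], object]


def copy_to_lake(_: object) -> list[str]:
    # Stand-in for a copy activity landing raw records in ADLS Gen2.
    return ["3", "5", "bad", "7"]


def transform(records: list[str]) -> list[int]:
    # Stand-in for a processing step (e.g. an HDInsight job): keep valid numbers only.
    return [int(r) for r in records if r.isdigit()]


def load_to_warehouse(rows: list[int]) -> dict:
    # Stand-in for loading the results into Azure Synapse Analytics.
    return {"rows_loaded": len(rows), "total": sum(rows)}


def run_pipeline(activities: list[Activity]) -> object:
    """Run activities in order, feeding each output into the next.

    An exception raised by any activity propagates and stops the run.
    """
    result: object = None
    for activity in activities:
        print(f"running {activity.__name__}")
        result = activity(result)
    return result


outcome = run_pipeline([copy_to_lake, transform, load_to_warehouse])
print(outcome)  # {'rows_loaded': 3, 'total': 15}
```

The value of an orchestration service is that this chain runs unattended on a trigger, with retries and monitoring handled by the service rather than by a person.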
Azure Data Lake Analytics: Data Lake Analytics is an analytics platform for analyzing both structured and unstructured data at petabyte scale. The service is intended to be paired with Azure Data Lake Storage Gen1 (it does not support ADLS Gen2), but it can also process data from structured data stores on Azure such as Synapse Analytics and Azure SQL Database.
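Data Lake Analytics jobs are written in U-SQL, which typically applies a schema to raw files as they are read (often called schema-on-read) instead of requiring the data to be loaded into typed tables first. The Python sketch below is only an analogy for that idea, not U-SQL: the sample log and the `Request` record type are made up for illustration.

```python
import csv
import io
from dataclasses import dataclass

# Raw, untyped text as it might sit in a data lake (made-up sample).
RAW_LOG = """2024-01-05,web,200,134
2024-01-05,api,500,87
2024-01-06,web,200,95
"""


@dataclass
class Request:
    day: str
    source: str
    status: int
    latency_ms: int


def extract(raw: str) -> list[Request]:
    """Apply a schema at read time; the stored text itself carries no types."""
    return [
        Request(day, source, int(status), int(latency))
        for day, source, status, latency in csv.reader(io.StringIO(raw))
    ]


rows = extract(RAW_LOG)
errors = [r for r in rows if r.status >= 500]
web = [r for r in rows if r.source == "web"]
avg_web_latency = sum(r.latency_ms for r in web) / len(web)
print(len(errors), avg_web_latency)  # 1 114.5
```

The same raw file could be read with a different schema by another job, which is what makes this approach suitable for unstructured and semi-structured data.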
Azure Analysis Services: This is a platform as a service (PaaS) offering for data modeling, based on SQL Server Analysis Services (SSAS). Traditionally, SSAS has been used to create a semantic layer, optimized for analytical queries, over the data stored in a data warehouse. Azure Analysis Services extends this data modeling platform to include big data sources such as ADLS Gen2.
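The idea of a semantic layer is that business-friendly measures are defined once, on top of the warehouse tables, so every report calculates them the same way. In Azure Analysis Services this would be a tabular model with measures written in DAX; the Python below only mimics the concept with a hypothetical `MEASURES` dictionary over made-up fact rows.

```python
from collections import defaultdict

# Warehouse-style fact rows (made-up sample data).
FACT_SALES = [
    {"region": "East", "year": 2023, "amount": 100.0, "cost": 60.0},
    {"region": "East", "year": 2024, "amount": 150.0, "cost": 90.0},
    {"region": "West", "year": 2024, "amount": 80.0, "cost": 50.0},
]

# Hypothetical semantic model: named measures defined once, so every report
# computes "Total Sales" and "Margin" with the same logic.
MEASURES = {
    "Total Sales": lambda rows: sum(r["amount"] for r in rows),
    "Margin": lambda rows: sum(r["amount"] - r["cost"] for r in rows),
}


def query(measure: str, by: str) -> dict:
    """Evaluate a named measure grouped by one attribute (a tiny analytical query)."""
    groups = defaultdict(list)
    for row in FACT_SALES:
        groups[row[by]].append(row)
    return {key: MEASURES[measure](rows) for key, rows in sorted(groups.items())}


print(query("Total Sales", by="region"))  # {'East': 250.0, 'West': 80.0}
print(query("Margin", by="year"))         # {2023: 40.0, 2024: 90.0}
```

Report authors then ask for "Margin by year" instead of rewriting the cost subtraction in every report, which is the consistency a semantic layer is meant to provide.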
There are other data services available in Azure, which will be discussed in future posts. In the next post, we will discuss the differences between ADLS Gen2 and Azure Synapse Analytics.