The idea behind starting this blog was to help people who are interested in data engineering as a career. The blog is named Azure Data Engineering because my experience is mostly with Microsoft technologies.
For the 100th post, I have listed the top 50 questions that are most likely to be asked in an interview for a Microsoft Azure Data Engineer position.
Each question includes a link to the relevant post(s) on this blog in case you would like to learn more about the underlying concept and revisit material covered in earlier posts.
Each linked post, in turn, points to the relevant Microsoft Docs page for the concept.
Interview Questions:
- What is Microsoft Azure?
https://azurede.com/2020/04/23/what-is-azure/
- What are the various storage types available in Azure?
https://azurede.com/2020/04/27/azure-storage/
- What is data redundancy? What data redundancy options are available in Azure? Data redundancy is the practice of storing multiple copies of data so that it remains available even during unexpected events such as a disk failure or a natural disaster.
https://azurede.com/2020/05/07/azure-data-redundancy-options/
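For illustration (this is not from the linked post), the redundancy option is simply the SKU you pick when the storage account is created. A minimal Python sketch assuming the azure-mgmt-storage SDK, with placeholder subscription, resource group, and account names:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

# Placeholder subscription ID; DefaultAzureCredential resolves whatever
# login is available (Azure CLI, managed identity, etc.).
client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

# The redundancy option is the SKU name:
# Standard_LRS, Standard_ZRS, Standard_GRS, Standard_RAGRS, Standard_GZRS, ...
poller = client.storage_accounts.begin_create(
    "<resource-group>",
    "<storageaccountname>",
    {
        "location": "eastus",
        "kind": "StorageV2",
        "sku": {"name": "Standard_GRS"},  # geo-redundant storage
    },
)
account = poller.result()
print(account.sku.name)
```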
- What are multi-model databases? What is the primary multi-model database service available on the Microsoft Azure platform?
https://azurede.com/2020/05/03/azure-cosmos-db/
https://azurede.com/2020/05/04/azure-cosmos-db-data-models/
- What are some ways to ingest data from on-prem storage to Azure?
https://azurede.com/2020/06/11/azure-data-transfer-solutions/
- What is the best way to migrate data from on-prem databases to Azure?
https://azurede.com/2020/04/30/moving-on-prem-sql-server-to-azure/
- What is the difference between Azure Data Lake Storage (ADLS) and Azure Synapse Analytics?
https://azurede.com/2020/04/29/difference-between-azure-data-lake-gen2-and-synapse-analytics/
- What are the various consistency models available in Azure Cosmos DB?
https://azurede.com/2020/05/05/azure-cosmos-db-consistency-models/
- What is Cosmos DB Synthetic Partition Key?
https://azurede.com/2020/07/05/azure-cosmos-db-synthetic-partition-key/
- How do you capture streaming data (e.g., website clickstream, social media feed etc.) in Azure?
https://azurede.com/2020/05/22/azure-stream-analytics/
- What is Azure Storage Explorer? What is it used for?
https://azurede.com/2020/05/16/azure-storage-explorer/
- What is Azure Databricks? How is it different from the original Databricks?
https://azurede.com/2020/05/15/azure-databricks/
- What is the primary ETL (Extract, Transform, Load) service in Azure? How is it different from on-prem tools such as SSIS? Azure Data Factory offers data transformation and integration functionality similar to SSIS, with more comprehensive task automation and orchestration features.
https://azurede.com/2020/05/14/azure-data-factory/
- What is serverless database computing? How is it implemented in Azure?
https://azurede.com/2020/05/19/serverless-database-computing-azure-cosmos-db-triggers-and-azure-functions/
- How is data security implemented in ADLS Gen2?
https://azurede.com/2020/05/21/adls-gen2-security-layers/
- What are the various windowing functions in Azure Stream Analytics?
https://azurede.com/2020/05/23/azure-stream-analytics-windowing-functions/
- What data security options are available in Azure SQL DB?
https://azurede.com/2020/05/26/database-security-on-azure/
- Which service would you use to create a Data Warehouse in Azure? Azure Synapse Analytics
- Can you explain the architecture of Azure Synapse Analytics?
https://azurede.com/2020/05/29/azure-synapse-analytics-architecture/
- What are the data masking features available in Azure SQL Database?
https://azurede.com/2020/06/04/azure-sql-database-data-masking/
- What is PolyBase? What are some use cases for PolyBase?
https://azurede.com/2020/06/05/polybase-introduction/
https://azurede.com/2020/06/06/importing-data-into-azure-synapse-analytics-using-polybase/
- What is reserved capacity in Azure Storage?
https://azurede.com/2020/06/09/azure-storage-reserved-capacity/
- What are pipelines and activities in Azure Data Factory? What is the difference between the two?
https://azurede.com/2020/06/14/azure-data-factory-pipelines-and-acitivities/
- How do you manually execute an Azure Data Factory pipeline? There are several ways to manually execute an ADF pipeline. One way is using PowerShell:
https://azurede.com/2020/06/15/azure-data-factory-pipeline-manual-execution/
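Besides PowerShell, a pipeline run can also be started programmatically. Here is a rough Python sketch assuming the azure-mgmt-datafactory SDK; the subscription, resource group, factory, and pipeline names, and the parameter, are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# Placeholder subscription ID, resource group, factory, and pipeline names.
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Kick off a single run of the pipeline, optionally passing pipeline parameters.
run = adf_client.pipelines.create_run(
    "<resource-group>",
    "<data-factory-name>",
    "<pipeline-name>",
    parameters={"inputPath": "raw/2020/06"},  # hypothetical parameter
)

# Check the status of the run we just started.
status = adf_client.pipeline_runs.get(
    "<resource-group>", "<data-factory-name>", run.run_id
)
print(status.status)  # e.g. InProgress, Succeeded, Failed
```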
- What is the difference between control flow and data flow in the context of Azure Data Factory?
https://azurede.com/2020/06/21/azure-data-factory-control-flow-vs-data-flow/
- What are the various Data Flow Partitioning Schemes available in Azure Data Factory?
https://azurede.com/2020/06/24/azure-data-factory-data-flow-partitioning-schemes/
- What is Azure Table storage? How is it different from other storage types in Azure?
https://azurede.com/2020/04/27/azure-storage/
https://azurede.com/2020/06/27/azure-table-storage-table-entities/
https://azurede.com/2020/06/28/azure-table-storage-partitions/
- What are partition sets in Azure Cosmos DB?
https://azurede.com/2020/07/03/azure-cosmos-db-partition-sets/
- What is a watermark in Azure Stream Analytics?
https://azurede.com/2020/07/11/time-concepts-in-azure-stream-analytics/
- What are some optimization best practices for Azure Stream Analytics?
https://azurede.com/2020/07/12/optimizing-azure-stream-analytics-output-to-azure-sql-database/
https://azurede.com/2020/07/13/azure-stream-analytics-sql-azure-output-and-in-memory-tables/
- What are streaming units?
https://azurede.com/2020/07/15/azure-synapse-analytics-streaming-units/
- Can you call an Azure Function from Azure Stream Analytics?
https://azurede.com/2020/07/18/run-azure-functions-from-azure-stream-analytics/
- What is Azure Synapse Link?
https://azurede.com/2021/03/07/azure-synapse-link-for-cosmos-db/
- What are the machine learning features available in Azure Synapse Analytics?
https://azurede.com/2021/04/05/machine-learning-features-of-azure-synapse-analytics/
- What is Azure Security Benchmark?
https://azurede.com/2021/05/03/azure-security-benchmark/
- What are the various ways to change the DWU allocation in Azure Synapse Analytics?
https://azurede.com/2021/05/15/how-to-change-the-dwu-allocation-in-azure-synapse-analytics/
- What are serverless SQL pools?
https://azurede.com/2021/03/04/serverless-sql-pool-in-azure-synapse-analytics/
- What are dedicated SQL pools?
https://azurede.com/2021/03/02/dedicated-sql-pool-in-azure-synapse-analytics/
- What are DWUs?
https://azurede.com/2021/04/27/dwus-and-cdwus-in-synapse-sql-pool/
- What are cDWUs? What is the difference between DWUs and cDWUs?
https://azurede.com/2021/04/27/dwus-and-cdwus-in-synapse-sql-pool/
- How do you estimate the costs before starting an Azure Synapse Analytics project?
https://azurede.com/2021/05/22/estimating-costs-for-azure-synapse-analytics/
- What are mapping data flows?
https://azurede.com/2021/04/10/mapping-data-flows-in-azure-data-factory/
- What is the SSIS Integration Runtime?
https://azurede.com/2020/06/17/azure-data-factory-integration-runtime/
- What are the various integration runtime types available in Azure Data Factory?
https://azurede.com/2020/06/18/azure-data-factory-integration-runtime-types/
- How can we monitor an Azure Data Factory integration runtime?
https://azurede.com/2020/06/25/monitor-azure-data-factory-integration-runtime-using-powershell/
- What is Azure Data Factory trigger execution? What are the benefits of using trigger execution?
https://azurede.com/2020/06/16/azure-data-factory-pipeline-trigger-executions/
- What are the various data sources supported by Azure Data Factory? The current list of supported data stores can be found here:
https://docs.microsoft.com/en-us/azure/data-factory/connector-overview
- What is a sink in Azure Data Factory?
https://azurede.com/2021/06/06/azure-data-factory-source-and-sink/
- What is a Linked Service in Azure Data Factory? Can it be parameterized?
https://azurede.com/2021/06/12/azure-data-factory-linked-services-and-datasets/
- What do you understand by Data Engineering? What are the responsibilities of a Data Engineer?
https://azurede.com/2020/04/26/responibilities-of-a-data-engineer/
Hello, thanks for your detailed questions.
If you have time, could you please answer my query?
I have received a case study where I need to perform some ETL operations using a dataset provided by the interviewer.
I have the inputs below. My question is: can I access this dataset using the information below, or do I need anything else? Do I need a subscription to download the dataset from the storage account, or is there another way to access it?
Use the link to get access to the storage account:
Storage account name: XX
Container: XX
Connection string: XX
A link to download the access key: XX
Hi Sushil,
Thanks for your comment.
I am assuming that the dataset that you would like to access is stored in an Azure Storage account and you want to use Azure Data Factory Pipelines for data transformation.
Based on the information you have provided, you will need access to an Azure subscription to transform the data in the storage account.
You have two options here:
1. Create your own subscription by going to https://azure.microsoft.com/ (Microsoft currently offers a free one-year trial account with a spending limit).
OR
2. Ask for access to an existing Azure subscription.
Once you have access to an Azure subscription, you will be able to log in here: https://portal.azure.com
You will need a user ID, password, and some form of multi-factor authentication (SMS or Authenticator app passcode) to log in.
You can then create an Azure Data Factory pipeline that connects to the storage account using the connection string and access key provided (by creating a Linked Service in Azure Data Factory).
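For example, even before the Linked Service is in place, the connection string alone is enough to confirm access by listing the blobs in the container from Python. A minimal sketch assuming the azure-storage-blob package, with the container name as a placeholder:

```python
from azure.storage.blob import BlobServiceClient

# Placeholder values; use the connection string and container name you were given.
conn_str = "<connection-string>"
service = BlobServiceClient.from_connection_string(conn_str)
container = service.get_container_client("<container-name>")

# List the files (blobs) in the container to confirm the credentials work.
for blob in container.list_blobs():
    print(blob.name, blob.size)
```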
Hope that answers your question.
Best Regards,
Ashish
Hi Ashish,
Thanks for replying back.
I’m able to access the dataset in Azure Storage using the connection string and key.
The folder contains many JSON files, and I need to apply some basic cleaning and transformation logic to get them into a readable format.
Do you have a link or a page with information on how to process JSON files using PySpark?
Thanks in advance.
Hi Sushil,
Glad to know that you are able to access the dataset now. I don't have a post on this blog about PySpark (which is supported within Azure Databricks) yet, but I found the links below after a quick search.
https://forums.databricks.com/questions/21594/how-to-read-multiple-files-in-a-loop-from-blob-sto.html
https://stackoverflow.com/questions/51314267/using-pyspark-how-do-i-read-multiple-json-documents-on-a-single-line-in-a-file
This is for Amazon S3 but should be similar for Azure storage: https://stackoverflow.com/questions/28685874/pyspark-how-to-read-many-json-files-multiple-records-per-file
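As a starting point, here is a rough PySpark sketch along the lines of those links, assuming the JSON files sit in a blob container that Spark reaches with the storage account key; the account, container, and folder names are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Let Spark authenticate to the blob container with the storage account key.
spark.conf.set(
    "fs.azure.account.key.<storage-account>.blob.core.windows.net",
    "<access-key>",
)

# Read every JSON file in the folder into a single DataFrame.
# Add multiLine=True if each file holds one pretty-printed JSON document.
df = spark.read.json(
    "wasbs://<container>@<storage-account>.blob.core.windows.net/<folder>/*.json"
)

df.printSchema()
df.show(5, truncate=False)
```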
Hope this helps.
Best Regards,
Ashish