The most convenient way to move data to the cloud for analytics use cases is to create a data lake using Azure Data Lake Storage (ADLS). Azure Data Factory (or Synapse Analytics) pipelines can then be used to transform this data for further analysis. To reach the data, Data Factory connects to ADLS through a Linked Service. We discussed linked services and Data Factory pipelines in a previous post.
When we create a linked service, we are creating an authenticated connection between Azure Data Factory and ADLS. In today’s post, let’s look at the options for configuring Data Factory access using the ADLS Gen2 connector.
There are four authentication types available for the ADLS Gen2 connector:
1. Account Key: This method requires two primary properties to be configured.
a. The URL of the ADLS Gen2 storage account endpoint. This can be found in the properties of the ADLS storage account in the Azure Portal.
b. A storage account access key. This is available under Settings > Access keys on the storage account page in the Azure Portal. Microsoft recommends storing access keys in Azure Key Vault for secure access.
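To make the two properties concrete, here is a minimal sketch that builds an account key linked service definition as a plain Python dictionary, with the key referenced from Key Vault rather than stored inline. The property names (`AzureBlobFS`, `url`, `accountKey`, `AzureKeyVaultSecret`) follow the connector's JSON format as we understand it, so check them against the current connector documentation. The account, secret, and linked service names are hypothetical placeholders.

```python
import json


def account_key_linked_service(name, account_name, key_vault_linked_service, secret_name):
    """Build an ADLS Gen2 linked service definition that uses account key auth.

    The access key is not embedded; it is referenced from Azure Key Vault
    through an existing Key Vault linked service (name assumed to exist).
    """
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobFS",  # connector type used for ADLS Gen2
            "typeProperties": {
                # Property 1: the storage account endpoint URL
                "url": f"https://{account_name}.dfs.core.windows.net",
                # Property 2: the access key, resolved from Key Vault at runtime
                "accountKey": {
                    "type": "AzureKeyVaultSecret",
                    "store": {
                        "referenceName": key_vault_linked_service,
                        "type": "LinkedServiceReference",
                    },
                    "secretName": secret_name,
                },
            },
        },
    }


# Hypothetical names, for illustration only
definition = account_key_linked_service(
    name="AdlsGen2AccountKey",
    account_name="mydatalake",
    key_vault_linked_service="MyKeyVault",
    secret_name="adls-access-key",
)
print(json.dumps(definition, indent=2))
```

The printed JSON can be compared with what the Data Factory UI generates in the linked service's code view.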
2. Service Principal Authentication: In this method, an application needs to be registered in Azure Active Directory (Azure AD) for Data Factory to authenticate with. Instructions about the process can be found here. Once registered, the following information is required to configure Service Principal authentication:
Also, the service principal must be granted appropriate permissions on the ADLS account to enable access.
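As a rough illustration, the sketch below builds a service principal linked service definition in Python. It assumes the connector's JSON format asks for the application (client) ID, a client secret, and the tenant, under the property names `servicePrincipalId`, `servicePrincipalCredentialType`, `servicePrincipalCredential`, and `tenant`; treat these as assumptions to verify against the connector documentation. All values shown are hypothetical placeholders.

```python
import json


def service_principal_linked_service(name, account_name, client_id, client_secret, tenant_id):
    """Build an ADLS Gen2 linked service definition that uses a service principal.

    client_id, client_secret and tenant_id come from the Azure AD app
    registration. In practice the secret would be referenced from Key Vault
    instead of being passed around as plain text.
    """
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobFS",
            "typeProperties": {
                "url": f"https://{account_name}.dfs.core.windows.net",
                "servicePrincipalId": client_id,
                "servicePrincipalCredentialType": "ServicePrincipalKey",
                "servicePrincipalCredential": {
                    "type": "SecureString",
                    "value": client_secret,
                },
                "tenant": tenant_id,
            },
        },
    }


# Placeholder values, for illustration only
definition = service_principal_linked_service(
    name="AdlsGen2ServicePrincipal",
    account_name="mydatalake",
    client_id="<application-client-id>",
    client_secret="<client-secret>",
    tenant_id="<tenant-id>",
)
print(json.dumps(definition, indent=2))
```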
3. System-Assigned Managed Identity: The system-assigned managed identity is a unique identity for the Data Factory instance, created by default when the Data Factory is provisioned. This managed identity needs to be granted the appropriate role on the ADLS path, depending on the type of access the linked service requires:
|Linked Service Type|Role|
|Source|Storage Blob Data Reader|
|Sink|Storage Blob Data Contributor|
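The role mapping in the table, together with the linked service itself, can be sketched in a few lines of Python. With the system-assigned identity there is no secret to supply, so the definition below carries only the endpoint URL; this reflects our understanding of the connector's JSON format and should be verified. The helper names and the account name are hypothetical.

```python
import json

# Role to grant the Data Factory managed identity, per the table above
ROLE_BY_USAGE = {
    "source": "Storage Blob Data Reader",
    "sink": "Storage Blob Data Contributor",
}


def required_role(usage):
    """Return the storage role needed for a linked service used as source or sink."""
    key = usage.strip().lower()
    if key not in ROLE_BY_USAGE:
        raise ValueError(f"usage must be 'source' or 'sink', got {usage!r}")
    return ROLE_BY_USAGE[key]


def system_assigned_linked_service(name, account_name):
    """Build an ADLS Gen2 linked service definition for the system-assigned identity.

    No credential properties are set: Data Factory authenticates as itself.
    """
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobFS",
            "typeProperties": {
                "url": f"https://{account_name}.dfs.core.windows.net",
            },
        },
    }


definition = system_assigned_linked_service("AdlsGen2ManagedIdentity", "mydatalake")
print(json.dumps(definition, indent=2))
print("Role to grant for a sink:", required_role("sink"))
```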
4. User-Assigned Managed Identity: This method is similar to the System-Assigned Managed Identity method, with some differences:
a. A Data Factory can be assigned multiple User-Assigned Managed Identities, instead of just one System-Assigned Managed Identity.
b. A credential must be created for each User-Assigned Managed Identity that is to be used for authentication when creating Linked Services.
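To show how the credential from point b ties into a linked service, here is a minimal Python sketch. It assumes the linked service points at a Data Factory credential through a `credential` property of type `CredentialReference`, which is our reading of the connector's JSON format and worth confirming. The credential and account names are hypothetical.

```python
import json


def user_assigned_linked_service(name, account_name, credential_name):
    """Build an ADLS Gen2 linked service definition for a user-assigned identity.

    credential_name is the Data Factory credential created for the
    user-assigned managed identity (assumed to exist already).
    """
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobFS",
            "typeProperties": {
                "url": f"https://{account_name}.dfs.core.windows.net",
                # Which of the factory's identities to authenticate as
                "credential": {
                    "referenceName": credential_name,
                    "type": "CredentialReference",
                },
            },
        },
    }


# Because a factory can hold several user-assigned identities, each linked
# service can reference a different credential (hypothetical names below).
for cred in ("reader-identity", "writer-identity"):
    definition = user_assigned_linked_service(f"AdlsGen2-{cred}", "mydatalake", cred)
    print(json.dumps(definition, indent=2))
```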