Azure Data Factory: Lookup Activity

In the previous post, about Execute Pipeline Activity, we used Lookup Activity to return the table name and then passed it on to the Execute Pipeline Activity. In today’s post, let’s discuss the Lookup Activity in more detail.

Lookup Activity is designed for use cases where we need to return a result set based on a dataset. The act of searching for and returning data is called “lookup”. This is akin to looking up someone’s contact number in the contact list on your smartphone by typing the person’s name.

There are some pre-requisites for using Lookup Activity. To perform a “lookup” we need to specify a dataset on which the “lookup” will be performed. So, we need to have a dataset created beforehand. Most data stores require a Linked Service to be created to enable access from Data Factory, so we will also need to create a Linked Service.

Lookup activity configuration options change depending on the type of dataset that we are looking up. For the sake of simplicity and keeping this post brief, we are going to use an example where we are “looking up” an SQL dataset.

Lookup Activity supports “looking up” data for SQL datasets in three ways:  

  1. Table:  We use table lookup when we simply want to read a column value from a database object such as a table or view. We must create a dataset that points to the database object and use the dataset in the Lookup activity settings as shown below:
Azure Data Factory: Lookup Activity Settings – Table

Other important setting to note here is the transaction isolation levels for SQL databases. For most scenarios, where the table is not being updated by multiple processes, leaving this as None would be alright. The other options will be discussed in detail in a future post.

The Partition Option setting enables users to specify, if we have any physical partitions setup for the table in the database. There is also an option to select Dynamic Range, which lets us specify the partition column (usually a date field), the upper bound and the lower bound for the partition. This will be discussed in more detail in a future post.

Azure Data Factory: Lookup Activity Partition Settings – Dynamic Range
  1. Query: Query lookup is very similar to the table lookup option in functionality, with the added benefit of being able to write custom SQL query which can be used to select a subset of the columns from the table.
Azure Data Factory: Lookup Activity Settings – Query
  1. Stored Procedure: If we would like to use more complex SQL to get to the lookup data. We can create a Stored Proc that accepts input parameters, does the data processing, and returns the values that we are after as an output parameter. We can then execute the Stored Proc using this option.
Azure Data Factory: Lookup Activity Settings – Stored Procedure

Lookup Activity comes with the following limitations:

  • The Lookup Activity result set is limited to 5000 rows and 4MB in size.
  • The longest timeout duration that can be set is 24 hours.
  • Query or Stored Proc (Options 2 and 3 above), options support one and only one result set.

The above limitations are there for a reason, the main purpose of the Lookup activity is to return a targeted piece of information and not to process huge datasets.

For a list of supported data sources, please check the reference link below.

Reference: https://docs.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity

One thought on “Azure Data Factory: Lookup Activity

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: