The For Each activity is a Control Flow activity in Azure Data Factory that lets users iterate over a collection and execute specific activities in a loop. To understand what control flow is, please read my previous post on Azure Data Factory control flows and data flows.
If you have worked in the data analytics space for any amount of time, you must have come across scenarios where certain tasks need to be repeated programmatically. One of the most common requirements in ETL is loading multiple files from a shared drive folder. The For Each activity provides this functionality within Azure Data Factory.
The For Each activity is very similar in functionality to the Foreach Loop Container in SSIS.
There are three main settings that we need to configure for the For Each activity:
Sequential: This check box lets the user run the loop sequentially. If it is checked, the activity waits for the previous iteration to finish before starting the next one. Otherwise, the iterations run in parallel.
Batch count: If the Sequential check box is not checked, this setting lets the user set the maximum number of iterations that run in parallel. The default is 20 and the maximum allowed is 50.
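Items: This is the most important setting for the For Each activity. This is where you provide the collection that the For Each activity loops over, as an expression that returns an array, e.g. an array of file names when loading multiple files. Inside the loop, the current element is available through the @item() expression.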
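To make the three settings concrete, here is a minimal sketch in Python that builds the JSON definition of a For Each activity as a plain dictionary. The property names (isSequential, batchCount, items, activities) follow the activity's JSON format in MS Docs. The helper build_for_each, the activity name LoadEachFile and the pipeline parameter fileNames are hypothetical names used only for illustration, and the inner activity list is left empty as a placeholder.

```python
import json


def build_for_each(name, items_expression, inner_activities,
                   sequential=False, batch_count=20):
    """Build a For Each activity definition as a plain dict (illustrative helper)."""
    # Batch count: default 20, maximum 50, as described above.
    if not 1 <= batch_count <= 50:
        raise ValueError("batch_count must be between 1 and 50")

    type_properties = {
        # Sequential: True runs one iteration at a time.
        "isSequential": sequential,
        # Items: an expression that must evaluate to an array.
        "items": {"value": items_expression, "type": "Expression"},
        # Activities to run in every iteration.
        "activities": inner_activities,
    }
    # Batch count only matters when the loop runs in parallel.
    if not sequential:
        type_properties["batchCount"] = batch_count

    return {"name": name, "type": "ForEach", "typeProperties": type_properties}


# Hypothetical example: loop over a pipeline parameter holding file names,
# running at most 10 iterations in parallel.
for_each = build_for_each(
    name="LoadEachFile",
    items_expression="@pipeline().parameters.fileNames",
    inner_activities=[],  # placeholder: a real loop needs at least one activity
    batch_count=10,
)

print(json.dumps(for_each, indent=2))
```

The printed JSON is roughly what you would see in the code view of the pipeline for this activity, although a real definition would list at least one inner activity.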
The For Each activity can run one or more activities in each iteration. For iterating over multiple activities, Microsoft recommends placing them in a separate child pipeline and calling it with the Execute Pipeline activity from the For Each activity in the master pipeline.
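The master and child pipeline pattern can be sketched in the same way. The Python snippet below builds a For Each activity whose only inner activity is an Execute Pipeline activity that passes the current item to a child pipeline. The pipeline name LoadSingleFile, the parameter names fileName and fileNames, and the helper build_execute_pipeline are assumptions made up for this example. Only the JSON property names come from the activity formats in MS Docs.

```python
import json


def build_execute_pipeline(name, child_pipeline, parameters):
    """Build an Execute Pipeline activity definition (illustrative helper)."""
    return {
        "name": name,
        "type": "ExecutePipeline",
        "typeProperties": {
            # Reference to the child pipeline that holds the actual work.
            "pipeline": {
                "referenceName": child_pipeline,
                "type": "PipelineReference",
            },
            # Values handed to the child pipeline's parameters.
            "parameters": parameters,
            # Wait for the child run to finish before the iteration completes.
            "waitOnCompletion": True,
        },
    }


# Hypothetical master-pipeline loop: one child pipeline run per file name.
master_for_each = {
    "name": "ForEachFile",
    "type": "ForEach",
    "typeProperties": {
        "isSequential": True,
        "items": {
            "value": "@pipeline().parameters.fileNames",
            "type": "Expression",
        },
        "activities": [
            build_execute_pipeline(
                name="RunLoadSingleFile",
                child_pipeline="LoadSingleFile",
                # @item() resolves to the current element of the Items array.
                parameters={
                    "fileName": {"value": "@item()", "type": "Expression"}
                },
            )
        ],
    },
}

print(json.dumps(master_for_each, indent=2))
```

Keeping the per-file steps in the child pipeline means the loop body stays a single activity, and the child pipeline can be tested on its own with one file name.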
For the JSON definition and some examples, please visit the reference MS Docs article.