We learnt about activities and pipelines in the previous post. In today's post, we are going to discuss the copy activity. The copy activity is the most commonly used activity in Azure Data Factory. As the name suggests, a copy activity copies data between data stores. These data stores can be located on-prem or in the cloud.
The copy activity is usually the first activity in a Data Factory pipeline. Once the data is copied, other activities can be used to further analyze and transform it. Another important use of the copy activity is to publish the transformed data to data visualization and business intelligence tools.
The copy activity runs on an integration runtime. The type of integration runtime to use for the copy activity depends on various factors, for example:
- If the data stores used in the copy activity are publicly accessible, i.e. in the cloud, then we can use the Azure integration runtime.
- If the data stores are located on-prem or inside an access-controlled network such as a VPN, then we must use a self-hosted integration runtime.
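The two rules above can be sketched as a small decision function. This is only an illustration in Python, not part of any Azure SDK: the `DataStore` class, the `publicly_accessible` flag and the `choose_integration_runtime` function are hypothetical names invented for this example. The handling of the mixed case (one public store and one private store) is an assumption that simply applies the second rule whenever any store is not publicly reachable.

```python
from dataclasses import dataclass


@dataclass
class DataStore:
    """Hypothetical description of a data store, for illustration only."""
    name: str
    publicly_accessible: bool  # False for on-prem or access-controlled networks


def choose_integration_runtime(source: DataStore, sink: DataStore) -> str:
    """Return the integration runtime type suggested by the two rules above.

    Assumption: if either store is not publicly accessible, a self-hosted
    integration runtime is needed to reach it.
    """
    if source.publicly_accessible and sink.publicly_accessible:
        return "Azure"
    return "SelfHosted"


# Example stores (names are made up for this sketch).
cloud_blob = DataStore("cloud-blob-storage", publicly_accessible=True)
cloud_sql = DataStore("cloud-sql-database", publicly_accessible=True)
onprem_sql = DataStore("onprem-sql-server", publicly_accessible=False)

cloud_to_cloud = choose_integration_runtime(cloud_blob, cloud_sql)
onprem_to_cloud = choose_integration_runtime(onprem_sql, cloud_blob)
print(cloud_to_cloud)
print(onprem_to_cloud)
```

In a real data factory this choice is not made in code like this; it follows from how the data stores are connected to integration runtimes, which is covered next.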
The source (source data store) and the sink (destination data store) also need to be associated with integration runtimes. The integration runtime types for source and sink will be discussed in a future post.
At execution time, the service that runs the copy activity reads the data from the source data store, performs certain operations based on the activity configuration and the input and output dataset configurations, and finally writes the data to the sink data store.
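To make the pieces mentioned above more concrete, here is a minimal sketch of what a copy activity definition can look like, built as a Python dictionary and printed as JSON. The overall shape (an activity of type `Copy` with `inputs`, `outputs`, and a `typeProperties` section holding `source` and `sink`) follows the general form of a copy activity in a pipeline definition. The helper `build_copy_activity`, the activity name, and the dataset names are hypothetical, and the source and sink type values are just one possible pairing, so please check the reference for the exact properties of your data stores.

```python
import json


def build_copy_activity(name, input_dataset, output_dataset, source_type, sink_type):
    """Return a dict shaped like a copy activity entry in a pipeline definition.

    Hypothetical helper for illustration; it only assembles the structure and
    does not talk to Azure Data Factory.
    """
    return {
        "name": name,
        "type": "Copy",
        # Input dataset: describes the data in the source data store.
        "inputs": [{"referenceName": input_dataset, "type": "DatasetReference"}],
        # Output dataset: describes the data in the sink data store.
        "outputs": [{"referenceName": output_dataset, "type": "DatasetReference"}],
        # Activity configuration: how to read from the source and write to the sink.
        "typeProperties": {
            "source": {"type": source_type},
            "sink": {"type": sink_type},
        },
    }


# All names below are made up for this example.
activity = build_copy_activity(
    name="CopySalesData",
    input_dataset="SalesCsvDataset",
    output_dataset="SalesSqlDataset",
    source_type="DelimitedTextSource",
    sink_type="AzureSqlSink",
)

print(json.dumps(activity, indent=2))
```

The printed JSON shows where each part of the description lives: the input and output datasets are referenced by name, while the source and sink settings sit under `typeProperties`.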
For a list of data stores and formats supported by the Copy activity, please see the reference below.