Sometimes during a data load we want to delete the existing files or folders and start with a clean slate. This most often applies to a temporary storage area (such as a Staging Area or Landing Zone), where data is stored before it is copied to the final table (usually a Data Warehouse table). Once the data has been successfully copied to the final table, the data in the temporary storage area becomes redundant and is no longer needed. Data Factory has an activity to do just that, and it is aptly named the Delete Activity.
The Delete Activity can be found under the General section in the Data Factory Studio UI.
The Delete Activity can be used in the following three ways depending on the use case:
- Delete a specific file or the entire folder: As a prerequisite, we need to create a Linked Service to the data store where the files (or the folder) are stored and provide the Linked Service name in the Delete Activity settings. An added benefit of the Delete Activity is that it can log the names of the deleted files or folders in a CSV file. We provide the folder path where the log should be written, and the CSV file there is populated with the names of the deleted items.
- Under the Source settings, there is an option to parameterize the items that we would like to delete. This option provides an easy and flexible way to delete files dynamically.
- The Delete Activity can be used in combination with the Dataset settings to delete files that were last modified within a certain time window (or before/after a specific date and time). This setting can be found under the Connection tab in the Dataset settings. The Start Time and End Time values can also be parameterized. We can use this method to delete only expired files, e.g., files that were last modified at least 30 days ago.
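To make the three options above more concrete, here is a minimal Python sketch that assembles a Delete activity definition as a plain dictionary and prints it as pipeline JSON. It is an illustration under assumptions, not a definitive template: the helper `build_delete_activity`, the activity name `DeleteExpiredFiles`, the dataset name `StagingFiles`, the Linked Service name `LogStore`, the log path and the `folderPath` parameter are all hypothetical. It also assumes the source is Azure Blob Storage (hence `AzureBlobStorageReadSettings`) and that the modified-time filter is expressed on the activity's store settings, which may differ from where your version of the UI shows it.

```python
import json
from typing import Any, Dict, Optional


def build_delete_activity(
    dataset_name: str,
    log_linked_service: str,
    log_path: str,
    min_age_days: int = 30,
    dataset_parameters: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a Delete activity definition as a plain dict (hypothetical helper)."""
    if min_age_days < 0:
        raise ValueError("min_age_days must be non-negative")

    dataset_ref: Dict[str, Any] = {
        "referenceName": dataset_name,
        "type": "DatasetReference",
    }
    if dataset_parameters:
        # Values passed to a parameterized dataset, e.g. the folder to clean.
        dataset_ref["parameters"] = dataset_parameters

    return {
        "name": "DeleteExpiredFiles",  # hypothetical activity name
        "type": "Delete",
        "typeProperties": {
            "dataset": dataset_ref,
            "storeSettings": {
                # Assumption: the files live in Azure Blob Storage.
                "type": "AzureBlobStorageReadSettings",
                "recursive": True,
                # Only touch files last modified before (now - min_age_days).
                "modifiedDatetimeEnd": {
                    "value": f"@addDays(utcNow(), -{min_age_days})",
                    "type": "Expression",
                },
            },
            # Write the names of deleted items to a CSV log.
            "enableLogging": True,
            "logStorageSettings": {
                "linkedServiceName": {
                    "referenceName": log_linked_service,
                    "type": "LinkedServiceReference",
                },
                "path": log_path,
            },
        },
    }


if __name__ == "__main__":
    activity = build_delete_activity(
        dataset_name="StagingFiles",          # hypothetical dataset
        log_linked_service="LogStore",        # hypothetical Linked Service
        log_path="logs/delete-activity",      # hypothetical log folder
        min_age_days=30,
        dataset_parameters={
            "folderPath": {
                "value": "@pipeline().parameters.folderPath",
                "type": "Expression",
            }
        },
    )
    print(json.dumps(activity, indent=2))
```

Building the definition in code rather than by hand keeps the retention window and the target folder as explicit arguments, which mirrors the parameterized options described in the list. Compare the printed JSON against the code view of a Delete Activity in your own factory before relying on it.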