Azure Data Factory: Load Filename list into an SQL Table

In the previous post, we learnt about the Get Metadata activity and the various metadata types that can be extracted using the activity. One of the most common use cases of the Get Metadata activity is to extract a list of files from a storage folder. This is a very common scenario while loading data into a Data Warehouse.

In this example, we will be using SQL to parse the output from the Get Metadata activity, extract a list of Filenames (and the corresponding Filetypes) from a folder and load these into an SQL table named JSON_Output.

 The output from the Get Metadata activity is generated in JSON format, in-order- to run the SQL statements to parse the JSON output, we will use a Stored Procedure activity.

Let’s look at the steps:

Step 1: Create a table in the database using the following code:

CREATE TABLE dbo.JSON_Output
( 
Filelist [NVARCHAR] (max),
FileName [NVARCHAR] (max),
FileType [NVARCHAR] (100)
)

Step 2: Create a Stored Procedure as shown below. The Stored Proc takes an input variable @JSON, which we are going to pass from the data factory pipeline in the next steps.

CREATE PROC dbo.Process_JsonOutput @JSON [NVARCHAR] (max)
AS 
BEGIN
INSERT INTO dbo.JSON_Output
(
Filelist, 
FileName,
FileType
)

SELECT  Filelist
 	,filename
	,filetype
FROM OPENJSON(@JSON)
                                WITH (
                                Filelist NVARCHAR(MAX) '$' AS JSON
                                        
                                ) J1

                            OUTER APPLY OPENJSON (J1.Filelist)
                                WITH(
                                        filename NVARCHAR(MAX) '$.name' 
                                        ,filetype VARCHAR(500) '$.type'

                                )J2

END

Step 3: Link the output of the Get Metadata Activity to the stored Procedure activity. Configure the settings for the Linked Service and select the stored proc that we created in Step 2.

Azure Data Factory: Configure Stored Procedure Activity

Step 4: The last and the most important step is to pass the JSON output from the Get Metadata activity as an input parameter to the stored proc that we created in Step 2 above. To do this, we will create a new parameter of data type String named “Json” under the Stored Procedure activity settings.

Use the following code as the parameter value.

@{activity(‘Get Metadata1’).output.childItems}

Note: I have used Get Metadata1 as an example, replace ‘Get Metadata1’ with the actual name of the Get Metadata activity in your pipeline.

Azure Data Factory: Stored Proc Activity Input Parameter

2 thoughts on “Azure Data Factory: Load Filename list into an SQL Table

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: