{"id":15832,"date":"2018-12-12T18:03:00","date_gmt":"2018-12-13T02:03:00","guid":{"rendered":"https:\/\/devwww.3cloudsolutions.com\/post\/using-azure-data-factory-v2-activities-dynamic-content-to-direct-your-files-2\/"},"modified":"2024-03-01T12:49:10","modified_gmt":"2024-03-01T20:49:10","slug":"using-azure-data-factory-v2-activities-dynamic-content-to-direct-your-files","status":"publish","type":"post","link":"https:\/\/3cloudsolutions.com\/resources\/using-azure-data-factory-v2-activities-dynamic-content-to-direct-your-files\/","title":{"rendered":"Using Azure Data Factory V2 Activities &#038; Dynamic Content to Direct Your Files"},"content":{"rendered":"<p><a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/data-factory\/\">Azure Data Factory<\/a> (ADF) V2 is a powerful data movement service ready to tackle nearly any challenge. A common task includes movement of data based upon some characteristic of the data file. Maybe our CSV files need to be placed in a separate folder, we only want to move files starting with the prefix \u201cprod\u201d, or we want to append text to a filename. By combining Azure Data Factory V2 Dynamic Content and Activities, we can build in our own logical data movement solutions. 
Follow our walkthrough below to discover how.<\/p>\n<div class=\"hs-embed-wrapper hs-fullwidth-embed\" style=\"position: relative; overflow: hidden; width: 100%; height: auto; padding: 0px; min-width: 256px; display: block; margin: auto;\" data-service=\"youtube\" data-responsive=\"true\">\n<div class=\"hs-embed-content-wrapper\">\n<div style=\"position: relative; overflow: hidden; max-width: 100%; padding-bottom: 56.25%; margin: 0px;\"><iframe loading=\"lazy\" style=\"position: absolute; top: 0px; left: 0px; width: 100%; height: 100%; border: none;\" src=\"https:\/\/www.youtube.com\/embed\/Jj7S9ulFY5E?feature=oembed\" width=\"480\" height=\"270\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\" data-mce-src=\"https:\/\/www.youtube.com\/embed\/Jj7S9ulFY5E?feature=oembed\" data-mce-style=\"position: absolute; top: 0px; left: 0px; width: 100%; height: 100%; border: none;\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<h2>Dynamic Content: What, Why, Where and How to Use<\/h2>\n<p><strong>What:<\/strong> Dynamic Content is an expression language that uses built-in functions to alter the behavior of activities in your pipeline. Many of the functions, like IF() and CONCAT(), are familiar to many users of Excel or SQL.<\/p>\n<p><strong>Why:<\/strong> Dynamic Content decreases the need for hard-coded solutions and makes ADF V2 Pipelines flexible and reusable.<\/p>\n<p><strong>Where:<\/strong> To access Dynamic Content, place your cursor into the file path or file name areas of Datasets.<\/p>\n<p><strong>How to use:<\/strong> Combine expressions to easily create endless dynamic pathing options. 
From updating filenames using CONCAT() to complicated directory structures and file pathing based upon pipeline and file names, pipeline execution time, and more.<\/p>\n<p><!--more--><\/p>\n<h3><strong>Tutorial Requirements<\/strong><\/h3>\n<p>We\u2019ll use three additional Azure Data Factory V2 tools for our use case:<\/p>\n<ol>\n<li>To copy our data from one location to another we will use the <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-factory\/copy-activity-overview\" target=\"_blank\" rel=\"noopener\">Copy Activity<\/a><\/li>\n<li>For file or directory information (like the contents of a directory), we need the <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-factory\/control-flow-get-metadata-activity\" target=\"_blank\" rel=\"noopener\">Get Metadata Activity<\/a><\/li>\n<li>To iterate through a list of files contained within a directory, we have the <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-factory\/control-flow-for-each-activity\" target=\"_blank\" rel=\"noopener\">ForEach Activity<\/a><\/li>\n<\/ol>\n<h3><strong>Tackle Complex Data Movement with Ease<\/strong><\/h3>\n<p>Now that we have some background, let\u2019s get to our use case. The Census Bureau releases new American Community Survey data annually, which contains U.S. household education, housing, and demographic information. We\u2019re going to split the individual files out by multiple criteria. Estimate files and margin files each need to be grouped together and segmented by sequence, plus some files need to be handled based upon their file extension.<\/p>\n<p>To get our data, we used the HTTP connection with the ZipDeflate option to download Alabama_All_Geographies_Not_Tracts_Block_Groups.zip, a file containing census data for the state of Alabama, to our Azure Blob Storage account. 
Alabama_All_Geographies_Not_Tracts_Block_Groups.zip has over 200 files that contain city- and county-level information along with corresponding statistics.<\/p>\n<p><img decoding=\"async\" style=\"width: 805px; display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/Azure-Blob-Storage-1.png\" alt=\"Azure Blob Storage\" width=\"805\" \/><\/p>\n<p>We want to segment the files to make it possible for another application to efficiently process entire folders of files that share the same schema. The rules for segmentation are below:<\/p>\n<ul>\n<li>If the file starts with \u2018e\u2019 for estimate, place in a folder like \u2018seq\/\/e\/\u2019.<\/li>\n<li>If the file starts with \u2018m\u2019 for margin of error, place in a folder like \u2018seq\/\/m\/\u2019.<\/li>\n<li>If the file starts with \u2018g\u2019 for geo, place in a folder like \u2018geo\/csv\u2019 or \u2018geo\/txt\u2019 based upon the file extension.<\/li>\n<\/ul>\n<p>For example, e20165fl0001000.txt starts with \u2018e\u2019, indicating it is an estimate file. The sequence number for the file is 001 (the three characters beginning at the tenth character of the name). The final location for the file should then be \u2018seq\/001\/e\/e20165fl0001000.txt\u2019.<\/p>\n<p>We handle the margin file, m20165fl0001000.txt, in a similar way. The final location should be \u2018seq\/001\/m\/m20165fl0001000.txt\u2019.<\/p>\n<p>The Census Bureau also has geo files like g20165fl.csv and g20165fl.txt, which are not specific to a particular sequence, so we will handle them differently.<\/p>\n<p>The final location for g20165fl.csv will be \u2018geo\/csv\/g20165fl.csv\u2019. For g20165fl.txt, the location we want is \u2018geo\/txt\/g20165fl.txt\u2019.<\/p>\n<p>To complete our goal, we will need to use Get Metadata, ForEach, and Copy activities in combination with the Dynamic Content functionality provided in ADF V2.<\/p>\n<p>We start by creating a dataset referencing our Alabama census zip file. 
We can put in any name we would like to use (I named mine AlabamaCensusZip) and then point to the blob storage location where we have saved our zip file. Note that when we point to the zip, it displays in the Directory portion of the file path.<\/p>\n<p><img decoding=\"async\" style=\"width: 805px; display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/file-path-1-1.png\" alt=\"file path-1\" width=\"805\" \/><\/p>\n<p>Once we have the dataset created, we can start moving the activities into our pipeline. First is the Get Metadata activity. We point the Get Metadata activity at our newly created dataset and then add an Argument and choose the \u201cChild Items\u201d option. The Child Items option reads in the file names contained within the .zip and loads the names into an array we will iterate through. We can see this by running Debug on the pipeline and then viewing the output of the Get Metadata activity.<\/p>\n<p><img decoding=\"async\" style=\"width: 805px; display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/meta-data-1.png\" alt=\"meta data\" width=\"805\" \/><\/p>\n<p>Next, we add the \u201cForEach\u201d activity to the pipeline. We name our activity and connect the Get Metadata activity to the ForEach activity. Once named and connected, we access the Settings tab of the ForEach activity and then reference the Get Metadata activity child items in the \u201cItems\u201d section by entering \u201c@activity('ZipMetadata').output.childItems\u201d, where ZipMetadata is the name of our Get Metadata activity. Note that the expression requires straight single quotes. 
Now the ForEach activity is set up to iterate through the files contained within the zip file (the same items we saw as output from the Get Metadata activity).<\/p>\n<p><img decoding=\"async\" style=\"width: 805px; display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/ZipMetaData.png\" alt=\"ZipMetaData\" width=\"805\" \/><\/p>\n<p>After we update the settings, we access the Activities menu inside the ForEach activity and drag in the Copy activity.<\/p>\n<p><img decoding=\"async\" style=\"width: 805px; display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/Copy-Data-1.png\" alt=\"Copy Data\" width=\"805\" \/><\/p>\n<p>In the Copy activity, we create a new Source that points at the zip file location in our blob store, then use the \u201c<strong>@item().name<\/strong>\u201d value generated by the ForEach activity to reference the individual files within the zip.<\/p>\n<p><img decoding=\"async\" style=\"width: 805px; display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/Foreach.png\" alt=\"ForEach\" width=\"805\" \/><\/p>\n<p>Our next task is determining where our files will go by using Dynamic Content. Looking back at our requirements, we need to split the individual files out by multiple criteria. Estimate files and margin files each need to be grouped together and segmented by sequence, plus some files need to be handled based upon their file extension. Let\u2019s return to our Copy activity; we select Sink and add in a new location. Then we select the Connection tab and place the cursor in the Directory box. 
You should see \u201cAdd dynamic content\u201d appear below.<\/p>\n<p><img decoding=\"async\" style=\"width: 805px; display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/Dynamic-Content-1.png\" alt=\"Dynamic Content\" width=\"805\" \/><\/p>\n<p>Click the Add dynamic content link to bring up the editor.<\/p>\n<p><img decoding=\"async\" style=\"width: 805px; display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/Add-Dynamic-Content-1.png\" alt=\"Add Dynamic Content\" width=\"805\" \/><\/p>\n<p>Now we get to translate our requirements into code using the Dynamic Content expressions provided by ADF V2. If we look back to our requirements, we see the following:<\/p>\n<ul>\n<li>If the file starts with \u2018e\u2019 for estimate, place in a folder like \u2018seq\/\/e\/\u2019.<\/li>\n<li>If the file starts with \u2018m\u2019 for margin of error, place in a folder like \u2018seq\/\/m\/\u2019.<\/li>\n<li>If the file starts with \u2018g\u2019 for geo, place in a folder like \u2018geo\/csv\u2019 or \u2018geo\/txt\u2019 based upon the file extension.<\/li>\n<\/ul>\n<p>Given those requirements, we first identify the starting value of the file names. In the Dynamic Content editor, we can see there are several function categories, including logical functions and string functions. To reference the names of the files within our zip, recall we are operating inside of the ForEach activity and the files can be referenced with <span style=\"background-color: #e6e7e8;\"><code>item()<\/code><\/span>. For the purposes of this example, we will use <code><span style=\"background-color: #e6e7e8;\">item().name<\/span><\/code>. 
Let\u2019s combine the <span style=\"background-color: #e6e7e8;\"><code>IF<\/code><\/span> function with <span style=\"background-color: #e6e7e8;\"><code>STARTSWITH, CONCAT, SUBSTRING, and item().name<\/code><\/span> to tackle the first piece of our scenario: what to do when the file starts with \u2018e\u2019.<\/p>\n<p style=\"text-align: center;\"><code><span style=\"background-color: #e6e7e8;\">@{IF(STARTSWITH(item().name,'e'),CONCAT('seq\/',SUBSTRING(item().name,9,3),'\/e\/'),'false')}<\/span><\/code><\/p>\n<p>As you can see, Dynamic Content in ADF V2 looks like what we see in the Excel formula bar or SQL editor. This should be comforting for new users. However, there are a few things new users should be aware of. With Dynamic Content, we wrap our entire expression within braces <code><span style=\"background-color: #e6e7e8;\">{}<\/span><\/code> and place <code><span style=\"background-color: #e6e7e8;\">@<\/span><\/code> at the beginning. Also, the <span style=\"background-color: #e6e7e8;\"><code>SUBSTRING<\/code><\/span> starting position is zero-based, so position 9 is the tenth character of the file name.<\/p>\n<p>After validating the pipeline and then selecting Debug, we let the pipeline run and check how our Dynamic Content expression handled our files.<\/p>\n<p><img decoding=\"async\" style=\"width: 805px; display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/DynamicContent.png\" alt=\"DynamicContent\" width=\"805\" \/><\/p>\n<p>It worked! 
From here, we can continue to expand our Dynamic Content expression to handle the remaining items from our use case.<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"background-color: #e6e7e8;\"><code>@{IF(STARTSWITH(item().name,'e'),CONCAT('seq\/',SUBSTRING(item().name,9,3),'\/e\/'),<br \/>\n<\/code><\/span><span style=\"background-color: #e6e7e8;\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 IF(STARTSWITH(item().name,'m'),CONCAT('seq\/',SUBSTRING(item().name,9,3),'\/m\/'),<br \/>\n<\/span><span style=\"background-color: #e6e7e8;\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 IF(AND(STARTSWITH(item().name,'g'),ENDSWITH(item().name,'txt')),'geo\/txt\/',<br \/>\n<\/span><span style=\"background-color: #e6e7e8;\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 IF(AND(STARTSWITH(item().name,'g'),ENDSWITH(item().name,'csv')),'geo\/csv\/','fail')<br \/>\n<\/span><span style=\"background-color: #e6e7e8;\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0)<br \/>\n<\/span><span style=\"background-color: #e6e7e8;\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0)<br \/>\n<\/span><span style=\"background-color: #e6e7e8;\">\u00a0 \u00a0 \u00a0 \u00a0 \u00a0\u00a0)<br \/>\n<\/span><span style=\"background-color: #e6e7e8;\">}<\/span><\/p>\n<p><span style=\"background-color: transparent;\">Our use case is complete. The 200+ files contained in the Alabama census zip file will now automatically move to the proper location. Because our solution isn\u2019t hard-coded, we can use the same code to ingest additional states from the Census website. 
Dynamic Content can even be used to create <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/data-factory\/control-flow-lookup-activity\" target=\"_blank\" rel=\"noopener\">dynamic SQL queries<\/a><\/span><span style=\"background-color: transparent;\">! When Azure Data Factory V2 Dynamic Content and Activities are combined, even complicated data movement can be easily tackled.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Azure Data Factory V2 is a powerful data service ready to tackle any challenge. A common task includes movement of data based upon some characteristic of the data file. 
By combining Azure Data Factory V2 Dynamic Content and Activities, we can build in our own logical data movement solutions.<\/p>\n","protected":false},"author":21,"featured_media":14152,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[260],"tags":[304],"class_list":["post-15832","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai","tag-modern-data-platform","topics-blog"],"acf":[],"_links":{"self":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/15832","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/comments?post=15832"}],"version-history":[{"count":0,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/15832\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media\/14152"}],"wp:attachment":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media?parent=15832"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/categories?post=15832"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/tags?post=15832"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}