{"id":10203,"date":"2018-11-06T00:00:00","date_gmt":"2018-11-06T06:00:00","guid":{"rendered":"https:\/\/threecloud.wpengine.com\/post\/azure-data-factory-data-flow-elements\/"},"modified":"2022-11-30T09:12:01","modified_gmt":"2022-11-30T15:12:01","slug":"azure-data-factory-data-flow-elements","status":"publish","type":"post","link":"https:\/\/3cloudsolutions.com\/resources\/azure-data-factory-data-flow-elements\/","title":{"rendered":"Azure Data Factory &#8211; Data Flow"},"content":{"rendered":"<p>I\u2019m excited to announce that Azure Data Factory Data Flow is now in public preview, and I\u2019ll give you a look at it here. Data Flow is a new feature of Azure Data Factory (ADF) that lets you build data transformation logic graphically and execute it as activities within ADF pipelines.<\/p>\n<p>The intent of ADF Data Flows is to provide a fully visual experience with no coding required. Your Data Flow executes on your own Azure Databricks cluster, using Spark for scaled-out data processing. 
ADF handles all the code translation, Spark optimization, and execution of the transformations in Data Flows, so it can process massive amounts of data quickly.<\/p>\n<p>In the current public preview, the following Data Flow transformations are available:<\/p>\n<ul>\n<li>Join \u2013 join data from two streams based on a condition<\/li>\n<li>Conditional Split \u2013 route data to different streams based on conditions<\/li>\n<li>Union \u2013 combine data from multiple streams<\/li>\n<li>Lookup \u2013 look up data from another stream<\/li>\n<li>Derived Column \u2013 create new columns based on existing ones<\/li>\n<li>Aggregate \u2013 calculate aggregations on the stream<\/li>\n<li>Surrogate Key \u2013 add an incrementing key column to the output stream, starting from a specified value<\/li>\n<li>Exists \u2013 check whether data exists in another stream<\/li>\n<li>Select \u2013 choose which columns flow on to the next step in the stream<\/li>\n<li>Filter \u2013 filter a stream based on a condition<\/li>\n<li>Sort \u2013 order data in the stream based on columns<\/li>\n<\/ul>\n<p>To get started with Data Flow, you\u2019ll need to sign up for the preview by emailing <a href=\"mailto:adfdataflowext@microsoft.com\">adfdataflowext@microsoft.com<\/a> with the ID of the subscription you want to develop in. You\u2019ll receive a reply once the subscription has been added, and you can then start adding Data Flow activities.<\/p>\n<p>Once that\u2019s done, creating a data factory gives you three version options: Version 1, Version 2, and Version 2 with Data Flow.<\/p>\n<p>Next, go to aka.ms\/adfdataflowdocs, where you\u2019ll find the documentation you need to build your first Data Flows, along with some prebuilt samples you can work and experiment with. 
You can then create your own Data Flows and add a Data Flow activity to a pipeline, which lets you execute and test the Data Flow in the pipeline\u2019s debug mode. Alternatively, you can use Trigger Now on the pipeline to test your Data Flow through its pipeline activity.<\/p>\n<p>Finally, you can operationalize your Data Flow by scheduling and monitoring the Data Factory pipeline that executes the Data Flow activity.<\/p>\n<p>With Data Flow, we now have the data transformation piece that had been missing from ADF\u2019s data orchestration. Together they give us a complete picture for the ETL\/ELT scenarios we want to run in cloud or hybrid environments, whether on-premises to cloud or cloud to cloud.<\/p>\n<p><iframe loading=\"lazy\" src=\"https:\/\/www.youtube.com\/embed\/ZYFVXsQimMo\" width=\"560\" height=\"315\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\"><\/iframe><br \/>\nWith Data Flow, Azure Data Factory has become a true cloud replacement for SSIS, and Data Flow should reach general availability (GA) by year\u2019s end. It is well designed and has some neat features. I particularly like the new way you set up expressions, which in my opinion works better than in SSIS.<\/p>\n<p>Need further help? Our expert team and solution offerings can help your business with any Azure product or service, including Managed Services offerings. 
Contact us at 888-8AZURE or <a href=\"mailto:sales@3cloudsolutions.com\">sales@3cloudsolutions.com<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I\u2019m excited to announce that Azure Data Factory Data Flow is now in public preview&mldr;<\/p>\n","protected":false},"author":21,"featured_media":9428,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[260],"tags":[],"class_list":["post-10203","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai","topics-blog"],"acf":[],"_links":{"self":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/10203","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/comments?post=10203"}],"version-history":[{"count":0,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/10203\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media\/9428"}],"wp:attachment":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media?parent=10203"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/categories?post=10203"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/tags?post=10203"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}