{"id":15988,"date":"2016-08-23T15:54:36","date_gmt":"2016-08-23T22:54:36","guid":{"rendered":"https:\/\/devwww.3cloudsolutions.com\/post\/deploy-cloud-analytics-solutions-with-power-bi-and-azure-2\/"},"modified":"2024-01-04T07:29:12","modified_gmt":"2024-01-04T15:29:12","slug":"deploy-cloud-analytics-solutions-with-power-bi-and-azure","status":"publish","type":"post","link":"https:\/\/3cloudsolutions.com\/resources\/deploy-cloud-analytics-solutions-with-power-bi-and-azure\/","title":{"rendered":"Deploy Cloud Analytics Solutions with Power BI and Azure"},"content":{"rendered":"<p>Power BI is able to connect to all sorts of different data sources (and more are being added every month), and as my colleague\u00a0Josh Fennessy\u00a0pointed out in a recent blog post, there&#8217;s no shortage of different design patterns for implementing a big data cloud solution with HDInsight. \u00a0This can make figuring out the best way to visualize your big data sources from HDInsight using Power BI a little\u00a0challenging. \u00a0Below, I will try\u00a0to outline some common approaches to using Power BI with HDInsight.<\/p>\n<p><!--more--><\/p>\n<h2>Refresh from\u00a0Hive Tables\/Queries<\/h2>\n<p><img decoding=\"async\" style=\"width: 479px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/diag-Hive.png\" alt=\"diag-Hive.png\" width=\"479\" \/><\/p>\n<p>If you are using a Hadoop cluster in HDInsight, one way you might use Power BI to connect to your data\u00a0is with Hive tables. \u00a0Hive provides a logical layer for you to extract the data from. 
\u00a0Using the <a href=\"https:\/\/www.microsoft.com\/en-us\/download\/details.aspx?id=40886\" target=\"_blank\" rel=\"noopener\">Microsoft Hive ODBC Driver<\/a>, you can import entire Hive tables into Power BI or write Hive queries to import data directly into Power BI.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/hiveodbc.png\" alt=\"hiveodbc.png\" width=\"596\" height=\"396\" \/><\/p>\n<p>Importing from Hive tables\/queries provides a familiar SQL-like feel and syntax, and you are able to use the whole\u00a0gamut of modeling capabilities within Power BI Desktop. \u00a0It is important to understand, however, that data refreshes with this method can often be slow, as a Hive job will be executed on your cluster before transferring the data. \u00a0Because this method imports data directly into the Power BI model, you may also be limited in the size of the data that you can work with, as the model will have to fit into the memory of your machine and\/or the file size limits of the Power BI service. \u00a0Your cluster will also need to be up and running at the time of a Power BI refresh so that Power BI can see\/access any Hive objects.<\/p>\n<h2>Refresh from Flat Files<\/h2>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/diag-File.png\" alt=\"diag-File.png\" width=\"593\" height=\"127\" \/><\/p>\n<p>Another way to import data from HDInsight into Power BI is by connecting to flat files in either Blob or the Data Lake Store. 
\u00a0In this scenario, you would use HDInsight to process your data\u00a0and write the resulting curated or aggregated data into text files (CSV, TAB, etc.).\u00a0\u00a0Unlike refreshing from Hive tables, refreshing from flat files would allow you to only run your HDInsight cluster while processing data (deleting a\u00a0cluster when it is not active can help save money in Azure consumption) as the resulting text files would still exist in Azure storage even when the cluster is deleted.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/flatfiles-1.png\" alt=\"flatfiles-1.png\" width=\"565\" height=\"371\" \/><\/p>\n<p>Data refreshes into Power BI are also likely to be faster when pulling directly from flat files versus using the Hive ODBC Driver referenced above. \u00a0You will still run into the same model size restrictions as when using Hive tables\/queries, as you are still importing data directly into a Power BI model.<\/p>\n<p>The previous\u00a0two methods for using Power BI with HDInsight required importing data into a Power BI model, thus limiting the size of your model to the constraints of your computer or the Power BI Service. The next\u00a0two methods will utilize DirectQuery in which your data will stay at the data source, thus removing the model size limitations discussed before.<\/p>\n<h2>DirectQuery with Spark<\/h2>\n<p><img decoding=\"async\" style=\"width: 483px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/diag-Spark-1.png\" alt=\"diag-Spark-1.png\" width=\"483\" \/><\/p>\n<p>The first method is connecting directly to tables in a Spark cluster. \u00a0With this method, you will process your data\u00a0in Spark and then put the resulting data into tables on the cluster. 
Power BI can then use\u00a0Spark SQL to interactively query the tables in Power BI&#8217;s DirectQuery mode.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/spark.png\" alt=\"spark.png\" width=\"603\" height=\"361\" \/><\/p>\n<p>Again, the key advantage here is that the data stays at the source, which removes the need to schedule Power BI refreshes and worry about Power BI model size. \u00a0A downside to this approach is that in order for Power BI to connect to a table in Spark, the cluster must be running, which likely means the cluster is on all the time (depending on when your users will be using the reports\/model). \u00a0This can sometimes be more expensive than only turning your cluster on for processing data and deleting it when you are done (as\u00a0with the flat files\u00a0method).<\/p>\n<h2>DirectQuery from Azure SQL DB<\/h2>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/diag-ASQL.png\" alt=\"diag-ASQL.png\" width=\"570\" height=\"160\" \/><\/p>\n<p>The final method\u00a0we will look at is DirectQuery using Azure SQL Database (DB). \u00a0In this approach, similar to the flat files approach, you would process your data in your cluster, but write the resulting curated and\/or aggregated data to tables in Azure SQL DB (or Azure SQL Data Warehouse).<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/azsqldb.png\" alt=\"azsqldb.png\" width=\"600\" height=\"368\" \/><\/p>\n<p>As with the flat files approach, this allows you to delete your cluster when you are done processing data (which helps control costs) as the processed data will still reside in the SQL DB when the cluster is gone. \u00a0However, unlike the flat files approach, Azure SQL DB can be highly\u00a0optimized and is well suited for DirectQuery in Power BI. 
\u00a0This means that you will not have to worry about data refreshes at the Power BI level. \u00a0When connected to Azure SQL DB with Power BI using DirectQuery, you also still have a lot of additional modeling capabilities included in Power BI, like creating new measures, calculated columns, and relationships between tables. \u00a0The downside to this approach is that you will need to set up and configure an Azure SQL DB service, and that service will likely need to run all the\u00a0time (depending on when your users will be using the reports\/model). \u00a0Also, while DirectQuery virtually removes any limitations on the Power BI model size, you still have to be conscious of the size, performance, and cost of your Azure SQL DB instance.<\/p>\n<p>While there are likely other ways to work with HDInsight using Power BI, these are some of the more common scenarios we have run into and deployed. \u00a0Hopefully this helps in planning your architecture for visualizing your big data in the cloud.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Outline of design patterns for implementing a big data cloud solution with Microsoft Power BI and Azure 
HDInsight.<\/p>\n","protected":false},"author":21,"featured_media":14782,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[260],"tags":[343,305,304],"class_list":["post-15988","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai","tag-hdinsight","tag-modern-bi","tag-modern-data-platform","topics-blog"],"acf":[],"_links":{"self":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/15988","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/comments?post=15988"}],"version-history":[{"count":0,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/15988\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media\/14782"}],"wp:attachment":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media?parent=15988"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/categories?post=15988"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/tags?post=15988"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}