{"id":16043,"date":"2015-08-18T16:57:48","date_gmt":"2015-08-18T23:57:48","guid":{"rendered":"https:\/\/devwww.3cloudsolutions.com\/post\/demo-day-simplify-analysis-of-big-data-with-spark-on-azure-hdinsight-2\/"},"modified":"2023-11-29T15:13:27","modified_gmt":"2023-11-29T23:13:27","slug":"demo-day-simplify-analysis-of-big-data-with-spark-on-azure-hdinsight","status":"publish","type":"post","link":"https:\/\/3cloudsolutions.com\/resources\/demo-day-simplify-analysis-of-big-data-with-spark-on-azure-hdinsight\/","title":{"rendered":"Demo Day: Simplify Analysis of Big Data with Spark on Azure HDInsight"},"content":{"rendered":"<p>Some of the key tasks in data science involve basic exploration of new or existing data.\u00a0 Raw data is given structure, data can be joined to other datasets, features are selected for later analysis, and much more.\u00a0 Depending on the questions to which you seek answers as well as other requirements, the process repeats until you have data that is ideal for further, more advanced, analytics.<\/p>\n<p><img decoding=\"async\" style=\"width: 481px; float: right;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/ZeppelinChart.png\" alt=\"ZeppelinChart.png\" data-constrained=\"false\" \/><\/p>\n<p>With <a href=\"http:\/\/www.blue-granite.com\/blog\/first-impressions-of-apache-spark-on-azure-hdinsight\" target=\"_blank\" rel=\"noopener\">Apache Spark on Azure HDInsight<\/a>, these core tasks are made simpler with the inclusion of both the Apache Zeppelin and Jupyter notebooks.\u00a0 In this Demo Day video, I walk through basic exploration of a city&#8217;s traffic crash history using Zeppelin with both Spark DataFrames and Spark SQL.\u00a0 I discuss some of the advantages of using Zeppelin and Spark for data of any volume.\u00a0 Working with a new text file, I obtain an initial look at what features are available, see what cleansing may need to take place, and obtain a basic feel for the dataset through querying and 
visualization.\u00a0 At this stage, I compute summary statistics and develop a repeatable process that can be used later.\u00a0 While this is descriptive analysis, how can the data be prepared for other applications such as <a href=\"http:\/\/www.blue-granite.com\/blog\/3-things-you-need-to-know-about-using-azure-ml\" target=\"_blank\" rel=\"noopener\">predictive analytics<\/a>?<\/p>\n<p><!--more-->Overall, the data brings me closer to answering my initial questions and prompts new ones.\u00a0 For example:<\/p>\n<ul>\n<li>Weather impacts road conditions.\u00a0 During a snowstorm, am I usually safer taking a two-lane road or a freeway?\u00a0 Freeways may have more accidents overall, but they also have a much higher traffic volume.\u00a0 Factoring in a road&#8217;s average daily traffic, do accidents during snow increase at similar rates for all road types&#8211;or increase at all?<\/li>\n<li>College football home games increase traffic congestion.\u00a0 Is there an increase in accidents that correlates with that congestion?\u00a0 Do accidents on game days take place along main corridors to the stadium, or are they dispersed throughout the city?<\/li>\n<\/ul>\n<p>View the video below to see how the Zeppelin notebook on a Spark on Azure HDInsight cluster can help me get answers.<\/p>\n<p><iframe loading=\"lazy\" class=\"wistia_embed\" src=\"\/\/fast.wistia.net\/embed\/iframe\/t1qqmtzbaa\" name=\"wistia_embed\" width=\"640\" height=\"360\" frameborder=\"0\" scrolling=\"no\" allowfullscreen=\"allowfullscreen\"><\/iframe><\/p>\n<p><span id=\"hs_cos_wrapper_post_body\" class=\"hs_cos_wrapper hs_cos_wrapper_meta_field hs_cos_wrapper_type_rich_text\" data-hs-cos-type=\"rich_text\" data-hs-cos-general-type=\"meta_field\">Want to learn more about how data science in Azure can help your business?\u00a0 <a href=\"http:\/\/www.blue-granite.com\/contact-us\" target=\"_blank\" rel=\"noopener\">Contact us<\/a> for a 
consultation.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Early descriptive analysis helps explore existing questions and creates new ones. See how Zeppelin with Spark on Azure HDInsight can help get answers.<\/p>\n","protected":false},"author":21,"featured_media":14874,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[260],"tags":[342,343,351],"class_list":["post-16043","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai","tag-demo-day","tag-hdinsight","tag-spark","topics-blog"],"acf":[],"_links":{"self":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/16043","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/comments?post=16043"}],"version-history":[{"count":0,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/16043\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media\/14874"}],"wp:attachment":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media?parent=16043"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/categories?post=16043"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/tags?post=16043"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}