{"id":10301,"date":"2020-08-18T00:00:00","date_gmt":"2020-08-18T05:00:00","guid":{"rendered":"https:\/\/threecloud.wpengine.com\/post\/getting-started-with-azure-databricks-2\/"},"modified":"2022-11-30T09:24:38","modified_gmt":"2022-11-30T15:24:38","slug":"getting-started-with-azure-databricks","status":"publish","type":"post","link":"https:\/\/3cloudsolutions.com\/resources\/getting-started-with-azure-databricks\/","title":{"rendered":"Getting Started with Azure Databricks"},"content":{"rendered":"<p>As part of our ongoing series on Azure Databricks, I\u2019ll walk you through <strong>getting started by creating your own Databricks Service and Databricks cluster.<\/strong> First off, it\u2019s important to know that Databricks is not available with an Azure free subscription; you must have an Azure pay-as-you-go account. However, there is a free 14-day premium trial available.<\/p>\n<p>The video below is a demo that takes you <strong>step by step through creating a Databricks Service and cluster.<\/strong><\/p>\n<ul>\n<li>To get started, you\u2019ll need to log in to the Azure portal and select the plus (+) to create a new resource. You can find Databricks under the Analytics category or by searching for it.<\/li>\n<li>Once selected, the Azure Databricks Service page will open. You\u2019ll create a new resource and enter the name for your Databricks workspace. Then select the location and pricing tier, and click Create.<\/li>\n<li>Once the deployment is complete, we can launch a Databricks workspace by clicking the Go to resource button.<\/li>\n<li>Databricks uses Azure Active Directory for authentication. Once you\u2019ve signed in, the Databricks workspace will open, and the next step is to create a cluster.<\/li>\n<li>Select New cluster on the launch page or click the cluster icon on the left side menu.\n<ul>\n<li>This will open the cluster manager page. 
Select Create cluster, enter a name, and select the cluster mode.\n<ul>\n<li>High concurrency is optimized for concurrent workloads with SQL, Python and R; it does not support Scala.<\/li>\n<li>Standard mode is recommended for single-user clusters and supports all languages.<\/li>\n<\/ul>\n<\/li>\n<li>Next, select the Databricks runtime version. The pull-down lists all the supported versions and beta versions currently available. There are also two machine learning variants available: standard and GPU (graphics processing unit).<\/li>\n<li>Autopilot options include Enable autoscaling, which toggles between a variable and a static number of workers.<\/li>\n<li>The auto-termination setting lets you specify a period of time; the cluster shuts down once that period has elapsed with no jobs running.<\/li>\n<li>The next field is where we\u2019ll select the number of workers for our cluster, either a fixed number or a minimum and maximum, depending on whether you enabled autoscaling.\n<ul>\n<li>If you choose a fixed-size cluster, Databricks will always use that number of workers.<\/li>\n<li>If you provide a range, Databricks will pick the appropriate number of workers required for the job. The system validates processing capacity and will warn you if the account doesn\u2019t have enough CPUs for the number of workers selected. If you get a warning, you\u2019ll have to lower the maximum number of workers.<\/li>\n<\/ul>\n<\/li>\n<li>Now, we need to choose the kind of workers from the dropdown. The larger the worker, the faster the processing, but also the higher the cost. In my demo, I select a lightweight general-purpose worker with a DBU (Databricks Unit) of 0.75. A DBU is a unit of processing capability per hour, and prices range from 7 cents to 55 cents per DBU.<\/li>\n<li>I\u2019m also going to use the same driver type as the worker, although I could use a beefier driver. 
The driver node runs the main functions and executes the parallel operations on the worker nodes.<\/li>\n<li>With all of our options selected, we click the Create cluster button.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><strong>We now have a Databricks workspace with a running cluster that we\u2019ll use in our Databricks notebook development.<\/strong> That is how easy it is to get started with Databricks.<\/p>\n<div class=\"hs-embed-wrapper\" style=\"position: relative; overflow: hidden; width: 100%; height: auto; padding: 0; max-width: 560px; max-height: 315px; min-width: 256px; display: block; margin: auto;\" data-service=\"youtube\" data-responsive=\"true\">\n<div class=\"hs-embed-content-wrapper\">\n<div style=\"position: relative; overflow: hidden; max-width: 100%; padding-bottom: 56.25%; margin: 0px;\"><iframe loading=\"lazy\" src=\"https:\/\/www.youtube.com\/embed\/DusQJDFRAe4\" width=\"560\" height=\"315\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<p><strong>Need further help? Our expert team and solution offerings can help your business with any Azure product or service, including Managed Services offerings. 
Contact us at 888-8AZURE or\u00a0 <a href=\"mailto:sales@3cloudsolutions.com\">sales@3cloudsolutions.com<\/a>.<\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>As part of our ongoing series on Azure Databricks, I\u2019ll walk you through getting started&mldr;<\/p>\n","protected":false},"author":28,"featured_media":10823,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[260],"tags":[],"class_list":["post-10301","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai","topics-blog"],"acf":[],"_links":{"self":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/10301","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/users\/28"}],"replies":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/comments?post=10301"}],"version-history":[{"count":0,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/10301\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media\/10823"}],"wp:attachment":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media?parent=10301"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/categories?post=10301"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/tags?post=10301"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}