{"id":15869,"date":"2018-05-30T17:03:00","date_gmt":"2018-05-31T00:03:00","guid":{"rendered":"https:\/\/devwww.3cloudsolutions.com\/post\/statistics-functions-to-utilize-in-dax-power-bi-4\/"},"modified":"2024-06-18T09:03:35","modified_gmt":"2024-06-18T16:03:35","slug":"statistics-functions-to-utilize-in-dax-power-bi-4","status":"publish","type":"post","link":"https:\/\/3cloudsolutions.com\/resources\/statistics-functions-to-utilize-in-dax-power-bi-4\/","title":{"rendered":"Statistics Functions to Utilize in DAX &#038; Power BI"},"content":{"rendered":"<p>What is data science? At its heart, it\u2019s the ability to extract insight from data. Successful practitioners know that understanding basic statistics is the first step toward mastering this skill. In this post I\u2019ll cover some beginning statistics concepts, then explain how to calculate statistics in <a href=\"https:\/\/msdn.microsoft.com\/query-bi\/dax\/data-analysis-expressions-dax-reference\">Data Analysis Expressions<\/a> (DAX), and how to create histograms to communicate your statistical findings in Microsoft\u2019s <a href=\"https:\/\/powerbi.microsoft.com\/en-us\/get-started\/?&amp;OCID=AID631257_SEM_UGv5CoXC&amp;gclid=EAIaIQobChMIh5GEtbmc2wIVA9bACh2_OwRNEAAYASAAEgIk0fD_BwE\">Power BI<\/a>.<\/p>\n<p><img decoding=\"async\" style=\"width: 805px; display: block; margin: 0px auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/Statistics-in-Power-BI-2.png\" alt=\"Statistics in Power BI\" width=\"805\" \/><\/p>\n<p><!--more--><\/p>\n<h2>Definitions and Statistical Notation<\/h2>\n<p>Before we begin, let\u2019s cover a few mathematical terms and how to easily communicate those terms via notation.<\/p>\n<ul>\n<li><strong>x<\/strong> \u2013 A variable we\u2019re trying to find an answer for. 
<code><strong>[Amount]<\/strong><\/code> is our variable in these examples.<\/li>\n<li><strong>\u2211<\/strong> \u2013 This equates to <code><strong>SUM<\/strong><\/code> or <code><strong>SUMX<\/strong><\/code> in DAX.<\/li>\n<li><strong>N<\/strong> \u2013 The Population Size of your data set. If there are 100 records in your data set your Population Size is 100. <strong>N<\/strong> = 100 in this case.<\/li>\n<li><strong>\u00b5<\/strong> \u2013 The Mean or Average of the measure in our data set. <code><strong>AVERAGE<\/strong><\/code> in DAX.<\/li>\n<li><strong>\u03c3<\/strong> \u2013 Standard Deviation. A measure of how far the values in our data set typically fall from the Mean. The larger the Std Dev, the more spread out our data set is. <code><strong>STDEV.P<\/strong><\/code> in DAX.<\/li>\n<\/ul>\n<h2>Making a Histogram in Power BI<\/h2>\n<p>Histograms or Bell Curves are the most common ways to display statistics about data sets. In Power BI terms, the only real difference between these is the chart type; the Histogram uses a <strong>Bar Chart <\/strong>while the Bell Curve uses an <strong>Area Chart<\/strong>.<\/p>\n<table style=\"height: 342px;\" width=\"851\">\n<tbody>\n<tr>\n<td style=\"text-align: center; width: 430px;\"><img decoding=\"async\" style=\"background-color: transparent; width: 390px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/Histogram-1-2.png\" alt=\"Histogram 1\" width=\"390\" \/><\/td>\n<td style=\"text-align: center; width: 413px;\"><img decoding=\"async\" style=\"width: 374px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/Histogram-2-2.png\" alt=\"Histogram 2\" width=\"374\" \/><\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center; width: 430px;\"><strong>Bar Chart<\/strong><\/td>\n<td style=\"text-align: center; width: 413px;\"><strong>Area Chart<\/strong><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><span style=\"background-color: transparent;\"><br \/>\nA <\/span><strong style=\"background-color: 
transparent;\">Histogram<\/strong><span style=\"background-color: transparent;\"> differs slightly from a standard Bar Chart. A typical Bar Chart relates two variables; in BI speak, a <\/span><strong style=\"background-color: transparent;\">Measure<\/strong><span style=\"background-color: transparent;\"> and a <\/span><strong style=\"background-color: transparent;\">Dimension<\/strong><span style=\"background-color: transparent;\">. A Histogram, however, <\/span><em style=\"background-color: transparent;\">only visualizes a single variable<\/em><span style=\"background-color: transparent;\">. The variable goes on the x-axis (in this case <\/span><code><strong style=\"background-color: transparent;\">[Amount]<\/strong><\/code><span style=\"background-color: transparent;\">), and the frequency of that variable goes on the y-axis. To get the frequency, we just need to count the rows in the data set that fall into each bucket.<\/span><\/p>\n<p style=\"text-align: center;\"><strong><code>Row Count = COUNTROWS('MyTable')<\/code><\/strong><\/p>\n<p>We can then create our <strong>Amount<\/strong> groupings. I do this in two steps.<\/p>\n<p style=\"padding-left: 30px;\"><span style=\"background-color: transparent;\">1. Create a <\/span><strong style=\"background-color: transparent;\">New Column<\/strong><span style=\"background-color: transparent;\">.<\/span><\/p>\n<p style=\"padding-left: 60px;\"><span style=\"background-color: transparent;\"><img decoding=\"async\" style=\"width: 297px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/column.png\" alt=\"column\" width=\"297\" \/><\/span><\/p>\n<p style=\"padding-left: 60px;\"><code>Histogram Buckets = [Amount]<\/code><\/p>\n<p style=\"padding-left: 30px;\">2. 
Select your new Column and add a <strong>New Group.<\/strong><\/p>\n<table style=\"padding-left: 60px;\">\n<tbody style=\"padding-left: 60px;\">\n<tr style=\"padding-left: 60px;\">\n<td style=\"padding-left: 60px;\"><img decoding=\"async\" style=\"width: 147px; display: block; margin: 0px auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/new-group-1-2.png\" alt=\"new group 1\" width=\"147\" \/><\/td>\n<\/tr>\n<tr style=\"padding-left: 60px;\">\n<td style=\"padding-left: 60px;\"><img decoding=\"async\" style=\"width: 572px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/new-group-2-2.png\" alt=\"new group 2\" width=\"572\" \/><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><span style=\"background-color: transparent;\"><br \/>\nYou can then create a new <\/span><strong style=\"background-color: transparent;\">Histogram<\/strong><span style=\"background-color: transparent;\"> with the <\/span><code><strong style=\"background-color: transparent;\">[bins]<\/strong><\/code><span style=\"background-color: transparent;\"> on the Axis and <\/span><code><strong style=\"background-color: transparent;\">[Row Count]<\/strong><\/code><span style=\"background-color: transparent;\"> on the <\/span><strong style=\"background-color: transparent;\">Value<\/strong><span style=\"background-color: transparent;\">.<\/span><\/p>\n<p><img decoding=\"async\" style=\"width: 936px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/histogram-3-2.png\" alt=\"histogram 3\" width=\"936\" \/><\/p>\n<h2>Applying Statistics to Data<\/h2>\n<p>Now that we\u2019ve got our Histogram, we can apply our Statistics. For our example, assume we\u2019re looking for Outliers in our data set. 
An Outlier is typically defined as a data value that falls outside of 3 Standard Deviations of the Mean.<\/p>\n<p>First, we find the <strong>Mean<\/strong> of our data set.<\/p>\n<p style=\"padding-left: 60px;\"><strong><span style=\"text-decoration: underline;\">Mean:<\/span><br \/>\n<\/strong><strong style=\"background-color: transparent;\">\u00a0<img decoding=\"async\" style=\"width: 90px; margin: 0px 10px 10px 0px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/mean.png\" alt=\"mean\" width=\"90\" \/><\/strong><strong><br \/>\n<\/strong><\/p>\n<p style=\"padding-left: 60px;\"><span style=\"text-decoration: underline;\"><strong>DAX:<br \/>\n<\/strong><\/span><code>Mean =<br \/>\n<span style=\"color: #007cba;\">CALCULATE <\/span>(<br \/>\n<span style=\"color: #007cba;\">AVERAGE<\/span>(MyTable[Amount])<br \/>\n,<span style=\"color: #007cba;\">ALL<\/span>(MyTable)<br \/>\n)<\/code><\/p>\n<p>We can then apply the Mean to the Histogram. Using the formula <strong>(x &#8211; \u00b5) <\/strong>moves the center point of the curve to 0.<\/p>\n<p style=\"text-align: center;\"><code><strong style=\"text-align: center; background-color: transparent;\">Histogram Buckets = ([Amount]-[Mean])<\/strong><\/code><\/p>\n<p style=\"text-align: center;\"><code><strong style=\"text-align: center; background-color: transparent;\"><img decoding=\"async\" style=\"width: 638px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/histogram-buckets-3.png\" alt=\"histogram buckets\" width=\"638\" \/><\/strong><\/code><\/p>\n<p>Next, we need to find our <strong>Standard Deviation<\/strong> for the Population.<\/p>\n<p style=\"padding-left: 60px;\"><span style=\"text-decoration: underline;\"><strong>Standard Deviation:<\/strong><\/span><br \/>\n<img decoding=\"async\" style=\"width: 152px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/standard-deviation-2.png\" alt=\"standard deviation\" width=\"152\" \/><\/p>\n<p 
style=\"padding-left: 60px;\"><span style=\"text-decoration: underline;\"><strong>DAX:<br \/>\n<\/strong><\/span><code><span style=\"background-color: transparent;\">Std Dev =<br \/>\n<\/span><span style=\"background-color: transparent;\"><span style=\"color: #007cba;\">CALCULATE<\/span> <\/span><span style=\"background-color: transparent;\">(<br \/>\n<\/span><span style=\"background-color: transparent; color: #007cba;\">\u00a0 \u00a0 \u00a0STDEV.P<\/span><span style=\"background-color: transparent;\">(MyTable[Amount])<br \/>\n<\/span><span style=\"background-color: transparent;\">\u00a0 \u00a0 \u00a0,<span style=\"color: #007cba;\">ALL<\/span>(MyTable)<br \/>\n<\/span><span style=\"background-color: transparent;\">)<\/span><\/code><\/p>\n<p>We can then apply the Standard Deviation to the Histogram. Using the formula <strong>((x &#8211; \u00b5) \/ <\/strong><strong>\u03c3)<\/strong> moves the center point of the data set to 0 and divides each centered <code><strong>[Amount]<\/strong><\/code> value by the Standard Deviation, converting the values into standard scores (z-scores) that measure distance from the Mean in Standard Deviations. This rescales the x-axis; it does not change the shape of the distribution. 
Apply the formula to <code><strong>[Histogram Buckets]<\/strong><\/code> and change the <strong>Bin Size<\/strong> to 0.5 (feel free to change the Bin Size to whatever makes sense in your data set).<\/p>\n<p style=\"text-align: center;\"><code><strong style=\"background-color: transparent;\">Histogram Buckets = <span style=\"color: #007cba;\">DIVIDE<\/span>(([Amount]-[Mean]),[Std Dev],0)<\/strong><\/code><\/p>\n<p style=\"text-align: center;\"><code><strong style=\"background-color: transparent;\"><img decoding=\"async\" style=\"width: 942px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/histogram-buckets-1-2.png\" alt=\"histogram buckets 1\" width=\"942\" \/><\/strong><\/code><\/p>\n<p>Now that our data has been standardized, we can easily see our <strong>Outliers (bars over 3 or under -3)<\/strong>.<\/p>\n<p>For more DAX tips and tricks, be sure to check out this tutorial from BlueGranite\u2019s blog: <a href=\"\/blog\/5-useful-data-analysis-expressions-dax-functions-for-beginners\" target=\"_blank\" rel=\"noopener\">5 Useful DAX Functions for Beginners<\/a>;\u00a0and Microsoft\u2019s handy <a href=\"https:\/\/msdn.microsoft.com\/query-bi\/dax\/data-analysis-expressions-dax-reference\" target=\"_blank\" rel=\"noopener\">DAX reference<\/a>.<\/p>\n<p>Looking to master and truly own your organization\u2019s data? BlueGranite can help! <a href=\"https:\/\/www.blue-granite.com\/contact-us\">Contact us<\/a> today to learn more about our on-site <a href=\"https:\/\/www.blue-granite.com\/power-bi-training\">Power BI training<\/a>. Whatever your data requirements, we customize our analytics solution to meet your company\u2019s needs.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>What is data science? At its heart, it\u2019s the ability to extract insight from data. Successful practitioners know that understanding basic statistics is the first step toward mastering this skill. 
Read on to learn how to calculate statistics in DAX using Microsoft\u2019s Power BI.<\/p>\n","protected":false},"author":21,"featured_media":14358,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[260],"tags":[305,273],"class_list":["post-15869","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai","tag-modern-bi","tag-power-bi","topics-blog"],"acf":[],"_links":{"self":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/15869","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/comments?post=15869"}],"version-history":[{"count":0,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/15869\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media\/14358"}],"wp:attachment":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media?parent=15869"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/categories?post=15869"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/tags?post=15869"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}