{"id":15764,"date":"2020-01-07T13:30:00","date_gmt":"2020-01-07T21:30:00","guid":{"rendered":"https:\/\/devwww.3cloudsolutions.com\/post\/deep-dive-into-linear-regression-in-power-bi-key-influencers-3\/"},"modified":"2024-01-03T15:56:58","modified_gmt":"2024-01-03T23:56:58","slug":"deep-dive-into-linear-regression-in-power-bi-key-influencers","status":"publish","type":"post","link":"https:\/\/3cloudsolutions.com\/resources\/deep-dive-into-linear-regression-in-power-bi-key-influencers\/","title":{"rendered":"Deep Dive into Linear Regression in Power BI: Key Influencers"},"content":{"rendered":"<p>I\u2019ve gotten quite a few questions over the past few months about the new <a href=\"\/blog\/exploring-power-bis-key-influencers\" rel=\" noopener\">Power BI Key Influencers visualization<\/a>. While many of my colleagues and clients really love the concept and the visualization, others are struggling to understand what exactly it is and how it works. I\u2019ve done some research over the past few weeks on the underpinnings of the Key Influencers viz and have come up with some interesting results.<\/p>\n<p><img decoding=\"async\" style=\"width: 1000px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/key-influencers-img-11.png\" alt=\"key-influencers-img-11\" width=\"1000\" \/><\/p>\n<p><!--more--><\/p>\n<p>The example that follows resulted from my work helping an education client analyze student data. In their case, they wondered what effects certain variables had on their senior cohort\u2019s Admissions Index &#8211; a measure used within higher education to help qualify prospective students. After computing a few descriptive statistics for the cohort, we began throwing variables into the Key Influencers visualization to see what kind of answers we could come up with. The client was amazed by the visualization and the fact that they now had AI at their fingertips. 
And then we got the inevitable question: \u201cWhat do these numbers mean?\u201d<\/p>\n<p>The visualization itself has two tabs: Key Influencers and Top Segments. These two tabs operate by way of different models. Key Influencers uses regression models, while Top Segments uses Decision Trees. To make things just a little more complex, Key Influencers also uses two types of regression, depending on the metric being analyzed: Linear Regression when it is continuous, and Logistic Regression when it is categorical.<\/p>\n<p style=\"text-align: center;\"><strong>Models Used in Key Influencers Visual in Power BI<\/strong><\/p>\n<p><img decoding=\"async\" style=\"width: 1692px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/key-influencers-img-10.png\" alt=\"key-influencers-img-10\" width=\"1692\" \/><\/p>\n<p>Today we\u2019ll be sticking strictly with the Linear Regression model as our example, as it\u2019s the most straightforward to understand. (For more info on Linear Regression, see my post on <a href=\"https:\/\/www.blue-granite.com\/blog\/simple-linear-regression-in-power-bi\">Simple Linear Regression in Power BI<\/a>.)<\/p>\n<p>In a nutshell, Linear Regression works by plotting two variables \u2013 x and y, or input and output, or independent and dependent \u2013 against each other, then calculating the trend line that best fits the data. Without getting into the math, the typical method is Ordinary Least Squares, which finds the trend line that minimizes the total squared distance, or error, between the line and the actual data points. The output of the model is a linear equation.<\/p>\n<div><img decoding=\"async\" style=\"display: block; margin: 0px auto 10px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/key-influencers-img-1.png\" alt=\"key-influencers-img-1\" \/><\/div>\n<p>The one thing that we need to take away from this equation is b. This number is the coefficient of the input. 
This number describes the relationship between the input and output: b is equal to the amount that y increases for every one-unit increase in x. If you recall from your early math days, this is the slope, or Rise over Run.<\/p>\n<p>In the example we\u2019re using, plotting GPA Average against the Admissions Index results in a coefficient of 26.8217. So, for every 1-point increase in GPA, we\u2019d expect roughly a 27-point increase in the index.<\/p>\n<p>Now that we understand how to get our coefficient, let\u2019s make it a little more complex and add in multiple inputs like we see in our Key Influencers visualization.<\/p>\n<div><img decoding=\"async\" style=\"margin: 0px auto 10px; display: block;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/key-influencers-img-2.png\" alt=\"key-influencers-img-2\" \/><\/div>\n<p>Mathematics aside, we now have multiple coefficients for our multiple inputs.<\/p>\n<div><img decoding=\"async\" style=\"display: block; margin-left: auto; margin-right: auto; width: 300px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/key-influencers-img-3.png\" alt=\"key-influencers-img-3\" width=\"300\" \/><\/div>\n<p>There\u2019s one problem with this. The scales of these inputs can be (and usually are) completely different. Below I\u2019ve plotted four of our dimensions as histograms to demonstrate. (<a href=\"https:\/\/www.blue-granite.com\/blog\/statistics-functions-to-utilize-in-dax-power-bi\">More on histograms and how to make them in Power BI<\/a>). Notice how each histogram has its own scale on the x-axis. 
It wouldn\u2019t make sense to compare the coefficients from these inputs when one ranges from 0 to 5, while another ranges from 0 to 20, and yet another from 15 to 40.<\/p>\n<div><img decoding=\"async\" style=\"display: block; margin-left: auto; margin-right: auto; width: 500px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/key-influencers-img-4.png\" alt=\"key-influencers-img-4\" width=\"500\" \/><\/div>\n<p>To get around this issue, the Key Influencers visualization normalizes the dimensions using their Standard Deviations. In our example, the standard deviation of Composite ACT scores is 4.56 and the standard deviation of GPA Average is 0.91.<\/p>\n<p><img decoding=\"async\" style=\"display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/12\/Applying-Standard-Deviation.gif\" alt=\"Applying Standard Deviation\" \/><\/p>\n<p>Combining these numbers with their respective coefficients allows Key Influencers to report each input\u2019s effect on the output on the same scale, so we are now comparing like with like in our visualization. And we can see from our example below that, according to our model, Unexcused Absences have the greatest impact on a student\u2019s Admissions Index, followed by GPA and Credits Earned, then by Composite ACT scores.<\/p>\n<div><img decoding=\"async\" style=\"display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/key-influencers-img-6.png\" alt=\"key-influencers-img-6\" \/><\/div>\n<div><\/div>\n<p>One last thing to note: this visualization uses a proprietary version of the Linear Regression algorithm called Fast Linear. You will not be able to get an exact match between Key Influencers and a standard Linear Regression model using Excel, Python, or R. 
The coefficients adjusted for standard deviation should follow a similar pattern to the visualization, though.<\/p>\n<p>As we\u2019ve demonstrated, at its core, Key Influencers is a visualization of multiple linear regression. Visualizing this type of model has historically been difficult; statisticians typically skip the visual and report only the model output. This, in my opinion, is the most impressive attempt at visualizing these types of models that I\u2019ve seen to date. It does take a little work to interpret, but it is extremely useful when applied properly.<\/p>\n<p>In our example today we\u2019ve seen how we can get insight into what drives student success in education by using Multiple Linear Regression via the Key Influencers AI visualization. This simple example can help educators understand the influences behind learning metrics, home in on students who may be having issues, and get them on the correct path to a good education.<\/p>\n<p>To reiterate my comments in my previous post, Linear Regression has an endless number of uses. It can offer insights into our budget trends. We can analyze the effect of marketing on sales and profits. Or it can clue a company in to how raising prices may affect a consumer\u2019s buying habits. Insurance companies can also use this technique to assess risk between customer demographics and insurance claims.<\/p>\n<p>Next time we\u2019ll discuss some of the pitfalls of Linear Regression, and by extension Key Influencers, and explore solutions to avoid them. Explore more of 3Cloud&#8217;s Machine Learning &amp; AI experience here to learn how we can help your organization maximize <a href=\"https:\/\/powerbi.microsoft.com\/en-us\/\" target=\"_blank\" rel=\"noopener\">Power BI\u2019s<\/a> expansive capabilities.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In Power BI the Key Influencers visualization uses multiple linear regression models. 
Explore these models, what exactly they are, and how they work.<\/p>\n","protected":false},"author":21,"featured_media":13115,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[260],"tags":[319,273],"class_list":["post-15764","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai","tag-machine-learning-ai","tag-power-bi","topics-blog"],"acf":[],"_links":{"self":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/15764","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/comments?post=15764"}],"version-history":[{"count":0,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/15764\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media\/13115"}],"wp:attachment":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media?parent=15764"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/categories?post=15764"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/tags?post=15764"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}