{"id":15834,"date":"2018-11-27T18:08:00","date_gmt":"2018-11-28T02:08:00","guid":{"rendered":"https:\/\/devwww.3cloudsolutions.com\/post\/the-importance-of-structure-in-analytics-part-2-the-dimensional-model-and-power-bi-2\/"},"modified":"2024-01-08T11:47:00","modified_gmt":"2024-01-08T19:47:00","slug":"the-importance-of-structure-in-analytics-part-the-dimensional-model-and-power-bi","status":"publish","type":"post","link":"https:\/\/3cloudsolutions.com\/resources\/the-importance-of-structure-in-analytics-part-the-dimensional-model-and-power-bi\/","title":{"rendered":"The Importance of Structure in Analytics \u2013 Part 2: The Dimensional Model and Power BI"},"content":{"rendered":"<p>This is the second part in my running series on the importance of data structure in analytics, where I seek to demystify \u2013 and perhaps justify \u2013 some of the commonly accepted \u201cbest practices\u201d for how we, as data engineers and business intelligence developers, arrange data for analytical purposes. For this entry, I\u2019ve decided to focus on a narrow (and thus defensible) slice of a broad and sometimes contentious topic: the dimensional model, or star schema, and its role in the Microsoft analytics world. 
Specifically, I want to look at why adhering to what can seem like an esoteric concept really can pay dividends when working with Microsoft analytics tools, and especially <a href=\"https:\/\/powerbi.microsoft.com\/en-us\/\">Power BI<\/a>, one of the most capable and popular analytics tools available today.<\/p>\n<p>Let\u2019s first take a quick look at the star schema and a little of its history. Then we\u2019ll look at how it has remained a gold-standard best practice in Microsoft analytics, and how it can help guide your efforts when working in Power BI.<\/p>\n<p><img decoding=\"async\" style=\"width: 805px; display: block; margin: 0px auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/Structure-in-Analytics-\u2013-Part-2-1.png\" alt=\"Structure in Analytics \u2013 Part 2\" width=\"805\" \/><!--more--><\/p>\n<h3>The Dimensional Model \u2013 A (Very) Brief History<\/h3>\n<p>I realize some of you may be thinking, \u201cUgh, did he say \u2018schema\u2019?\u201d Others may have no idea what I mean by a <em>dimensional model<\/em>. For those of you who already know, please bear with me for a moment.<\/p>\n<p>Here, \u201cstar schema\u201d and \u201cdimensional model\u201d are used interchangeably. Both refer to a set of related data tables: how those tables are built, what data they contain, and, quite specifically, how they are related to one another. We can think of it as a particular type of <em>data model<\/em>, or way of structuring and storing our data, and one designed specifically with analytical use cases in mind. This differs from data models designed to favor the speed of <em>writing<\/em> data over reading it, such as highly normalized table structures. 
Dimensional models, conversely, tend to be denormalized. They are built with relatively few tables, in a way that makes analysis more natural and intuitive (more on that in just a minute).<\/p>\n<p style=\"text-align: center;\"><img decoding=\"async\" style=\"width: 600px; display: block; margin: 0px auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/Star-Schema-1.png\" alt=\"Star Schema\" width=\"600\" \/><em><span style=\"text-align: center; background-color: transparent;\">Figure 1 &#8211; The Star Schema<\/span><\/em><\/p>\n<p>The dimensional model has been around for quite some time, at least relative to the fast pace of information technology as a whole, and it continues to benefit its adherents today. Popularized by <a href=\"https:\/\/en.wikipedia.org\/wiki\/Ralph_Kimball\">Ralph Kimball<\/a>, the dimensional model was central to Kimball\u2019s prescribed approach to data warehouse architecture. As a quick level-set for this post, we can think of a data warehouse as a common repository for data from many different source systems. There the data is combined and conformed (made to adhere to common definitions and concepts) to facilitate analytics and reporting, usually at a large scale. Plenty of data warehouses are still built using this design today. Along the way, other analytics tools embraced some of its core benefits, and those legacies live on, particularly within the Microsoft business intelligence and analytics tool set.<\/p>\n<h3>The Dimensional Model and Power BI<\/h3>\n<p>Modern tools like Power BI might let you get away with building reports from, and generating analytical value out of, data in an unstructured or improperly structured data model. 
However, this traditional architecture was embraced early in the evolution of analytics tools, especially by Microsoft, and that is key to its importance today. It matters right alongside the benefits that transcend any particular technology.<\/p>\n<p>For Microsoft in particular, the first incarnation of its dedicated analytics engine, <a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/analysis-services\/analysis-services?view=sql-server-2017\">SQL Server Analysis Services<\/a> (part of the SQL Server data platform), depended on source data structured as a dimensional model. These models were called \u201ccubes\u201d, and they produced aggregate values at the \u201cintersections\u201d of their dimensions. The newer Tabular mode of Analysis Services is somewhat less dependent on this type of structure, but it still benefits immensely from it. Power BI\u2019s analytics engine is based on this same Tabular technology. As a result, the techniques that benefit you in an enterprise setting, with something like Analysis Services or <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/analysis-services\/\">Azure Analysis Services<\/a>, apply just as much to analytics and self-service reporting in Power BI.<\/p>\n<p>In addition to consuming data from existing structured sources, Power BI lets users build their own data model within the tool itself. This offers a number of useful capabilities:<\/p>\n<ul>\n<li>Drawing from and analyzing many different data sources at the same time.<\/li>\n<li>Adding custom, repeatable logic with <a href=\"https:\/\/docs.microsoft.com\/en-us\/dax\/data-analysis-expressions-dax-reference\">DAX<\/a> (Data Analysis Expressions, Microsoft\u2019s formula and query language) to translate business and analytical requirements into results.<\/li>\n<li>Controlling the end-user experience.<\/li>\n<\/ul>\n<p>
Reports and analyses should be performant and intuitive, and that all begins in the data model itself.<\/p>\n<h3>What It Is And Why It Matters<\/h3>\n<p>I\u2019ll be perfectly honest with you: a lot goes into building a proper dimensional model, and sometimes knowing what to do comes down to good old exposure and time. But understanding how to get the most out of your Power BI data model can make all the difference. Even just knowing that a particular type of model works best can separate something that \u201cjust works\u201d from something that \u201cmaybe works OK some of the time\u201d. Unfortunately, and perhaps obviously, there\u2019s more to it than I can fit into a blog post, or even a series of posts, but covering some of the basics is still worthwhile.<\/p>\n<p>In very general terms, a dimensional model divides your data into two big groups, and thus two different types of tables: facts and dimensions. It\u2019s helpful to think of a \u201cfact\u201d here as some event we\u2019re interested in analyzing. This can be a line item in a sales transaction, an encounter between a physician and a patient, or an entry in a general ledger. This event defines the \u201cgrain\u201d (or granularity) of a fact table: what a single row represents, and therefore what causes a new row to be created. A fact table also holds the keys that link each row to its dimensions. Beyond those keys, we strive to keep its contents as close to strictly numeric values as possible. Thus, it holds the <em>quantifiable<\/em> data points that describe an event: How much was an item sold for? How many items were sold? What was the amount posted to the general ledger? 
Sometimes our event carries little or no inherent numeric information, such as a patient making an appointment to see their doctor. In these cases the fact table (sometimes called a \u201cfactless\u201d fact table) simply records that the event occurred, acting as a \u201cbridge\u201d between the related dimensions.<\/p>\n<p>If fact tables contain the quantifiable data we\u2019re interested in, dimension tables contain the <em>contextual<\/em> information we want to analyze it by: Who was the client? Where did the event take place? On what day did it take place? In a dimensional model, we generally take all of this contextual data and create a dimension table for each logical grouping, with one row per distinct member. For example, we would have a <em>Date<\/em> dimension with one row for each day on the calendar, along with attributes such as month and year, to supply temporal context. We might also have a <em>Client<\/em> dimension with one row for each unique client we do business with. It would have a column for each aspect, or attribute, of a client that is analytically useful.<\/p>\n<p>The end result for users is a logical and intuitive grouping of values. Numeric values are stored in one place, while related contextual values are grouped into as few thematic sets as needed. This is especially helpful for analytics tools that take a \u201cdrag-and-drop\u201d approach to data exploration and report building, which is central to both Power BI and Microsoft Excel. People naturally break events down this way. We take some metric and then ask, \u201cWhere was this metric impacted?\u201d \u201cWho or what impacted it?\u201d And so on. 
This alone goes a long way toward explaining why tools, particularly those aimed at business users, were built around this approach. It is one of the less obvious benefits of using a dimensional model as the source for analytical initiatives, and it can\u2019t be stressed enough.<\/p>\n<p>Imported Power BI data models are loaded into memory. For those not particularly interested in the intricacies of modern computing hardware, this generally means they can crunch numbers very quickly, seemingly regardless of how your data is structured. However, and this may especially resonate with those of you who came here because you saw \u201cPower BI\u201d in the title, it\u2019s still easy to end up with a data model that isn\u2019t fast at all. Even a Ferrari can only go as fast as the cars in front of it in a traffic jam. The right data model design is often the best defense against traffic jams, in short because the tool was designed with just such a model in mind. In the vast majority of cases, Power BI simply performs best when its data comes from a dimensional design.<\/p>\n<p>Lastly, there are analytical languages like the earlier-mentioned DAX. The engine a tool uses to process data is designed with certain prerequisites in mind. In the same way, analytical languages work best with certain data shapes. DAX, used in Power BI, is often described as \u201celegant\u201d because it can do in a single line of code what might require many lines of SQL. The same work could also take a lot of head scratching and convoluted statement building with earlier languages like <a href=\"https:\/\/docs.microsoft.com\/en-us\/sql\/analysis-services\/multidimensional-models\/mdx\/mdx-query-the-basic-query?view=sql-server-2017\">MDX<\/a> (Multidimensional Expressions). But this is not at all guaranteed, no matter what the sales pitch is! 
Consider a Power BI data model built from one immense table with nearly a hundred columns. It could force you to write incredibly complicated (and slow) DAX. The same results could be produced simply and elegantly if that data were transformed into a dimensional model.<\/p>\n<h3>What To Do Next<\/h3>\n<p>There are lots of great resources on this subject. There are certainly enough to help at just about any level of experience when designing or modifying data models to better suit analytics and reporting needs. At 3Cloud, we can lend our expertise, as well as offer training, to give you that extra leg up when working with Power BI, dimensional models, modern analytics and more!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The dimensional model has been around for quite some time and continues to confer benefits to its adherents today. Made popular by Ralph Kimball years ago, the dimensional model was central to Kimball\u2019s prescribed approach to data warehouse 
architecture.<\/p>\n","protected":false},"author":21,"featured_media":14154,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[260],"tags":[305,273],"class_list":["post-15834","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai","tag-modern-bi","tag-power-bi","topics-blog"],"acf":[],"_links":{"self":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/15834","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/comments?post=15834"}],"version-history":[{"count":0,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/15834\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media\/14154"}],"wp:attachment":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media?parent=15834"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/categories?post=15834"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/tags?post=15834"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}