{"id":15820,"date":"2019-02-13T14:18:00","date_gmt":"2019-02-13T22:18:00","guid":{"rendered":"https:\/\/devwww.3cloudsolutions.com\/post\/dimensional-modeling-in-the-advanced-analytics-age-2\/"},"modified":"2024-01-04T08:18:26","modified_gmt":"2024-01-04T16:18:26","slug":"dimensional-modeling-in-the-advanced-analytics-age","status":"publish","type":"post","link":"https:\/\/3cloudsolutions.com\/resources\/dimensional-modeling-in-the-advanced-analytics-age\/","title":{"rendered":"Dimensional Modeling in the Advanced Analytics Age"},"content":{"rendered":"<p>Over the past 5 years, the world of dimensional modeling and Business Intelligence has experienced major and disruptive technological change. Big data, machine learning, data science, deep learning \u2013 these terms and technologies are not all new, but they are now at the forefront of discussion for data practitioners.<\/p>\n<p>We\u2019ve seen the emergence of various Business Intelligence tools claiming to have either automated or replaced the need for data modeling. There has also been an explosion, both in terms of general interest and implementation, of new data storage technologies, such as data lakes. In today\u2019s climate, data modeling can be portrayed negatively, as a relic of the past.<\/p>\n<p><img decoding=\"async\" style=\"width: 805px; display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/Dimensional-Modeling-in-the-Advanced-Analytics-Age-1.png\" alt=\"Dimensional Modeling\" width=\"805\" \/><\/p>\n<p>For Business Intelligence projects, data modeling usually means dimensional modeling \u2013 the approach was developed by <a href=\"https:\/\/en.wikipedia.org\/wiki\/Ralph_Kimball\">Ralph Kimball<\/a>; this will be the heart of our discussion. Rather than rehash the merits of this approach over any other in the database structure context, let\u2019s examine its relevancy after recent upheavals in technology. 
To understand current best practices, we will start with a review of dimensional modeling\u2019s evolution.<\/p>\n<p>Dimensional modeling applies to any data practitioner \u2013 from a financial analyst who needs to create an executive dashboard in Power Pivot, to a data architect in a multinational company, or even anyone who spends more than 20% of their time manipulating data. If your job involves creating reports, dashboards, or basic forecasts for the year, month, and weekday, then dimensional modeling is relevant to you.<\/p>\n<h2>Laying an Intuitive &amp; Effective Groundwork<\/h2>\n<p>Let\u2019s go back in time and review why dimensional modeling was created in the first place, to see whether these arguments still hold true today. The initial goals were:<\/p>\n<ol>\n<li><u>Performance:<\/u> By denormalizing and simplifying the schema (fewer joins), we obtained better performance and could better predict the performance of our data warehouse. It was also easier to create aggregate tables on a star or snowflake schema than on a normalized schema.<\/li>\n<\/ol>\n<ol start=\"2\">\n<li><u>Integration:<\/u> The enterprise bus matrix was designed to integrate various business processes, agnostic to the application that implements them. For instance, a conformed customer dimension allowed finance, marketing, and sales teams to have one common customer reference regardless of the source application.<\/li>\n<\/ol>\n<ol start=\"3\">\n<li><u>Extensibility:<\/u> Dimensional modeling is modular by nature; many components can and should be re-used. While there was no agile project management 20 years ago, this modular nature, in theory, helped build the data warehouse incrementally and avoid a big bang approach.<\/li>\n<\/ol>\n<ol start=\"4\">\n<li><u>Ease of understanding:<\/u> The simple structure of the database allowed a non-technical end user (e.g. 
an accountant or marketing analyst) to easily query the model without wondering whether a relationship was 1-n or n-n, or whether there was a loop in the model.<\/li>\n<\/ol>\n<h2>Stable Groundwork Still Key<\/h2>\n<p>Fast forward 20 years and the original tenets underlying a dimensional model\u2019s use still hold true today.<\/p>\n<h3 style=\"padding-left: 30px;\">Performance<\/h3>\n<p style=\"padding-left: 30px;\">Hardware and software have improved dramatically: a multi-month project that required a top-of-the-line server 20 years ago can now be prototyped, with better performance, on a decent laptop with <a href=\"https:\/\/powerbi.microsoft.com\/en-us\/\">Power BI<\/a> in less than a week. The emergence of cloud computing also allows greater access to massively parallel databases for a fraction of the price. And today\u2019s big data technologies allow BI practitioners to manipulate a quantity of data that was unfathomable when dimensional modeling was created. While raw performance has improved, end users now expect less data latency. Most users won\u2019t accept having their finance data refreshed just once a month; this was the norm 15 years ago. A daily refresh, or even several refreshes a day, is now typical, but it is only achievable if the whole reporting infrastructure is optimized, so dimensional modeling still plays a role in performance.<\/p>\n<h3 style=\"padding-left: 30px;\">Integration<\/h3>\n<p style=\"padding-left: 30px;\">The integration capability of the bus matrix was probably one of the most important features of dimensional modeling. However, the theory often failed to deliver in practice. Rotating stakeholders with changing needs often muddied an enterprise\u2019s efforts to make use of its data. 
Additionally, the wait to add a new business process to the data warehouse often seemed too long, so businesses opted for quicker departmental solutions.<\/p>\n<p style=\"padding-left: 30px;\">IT departments talked about data integration, while business units wanted to break departmental silos; though everyone agreed that solving these organizational issues was one of the most important data warehouse roles, success here was a challenge for most BI teams. Effective data modeling requires bringing the IT, marketing, and sales departments to the same table to decide on common definitions, along with the skills required to create the databases, tables, and ETL processes \u2013 a combination rarely found in BI teams of the past. Team members were typically either too technical or too functional.<\/p>\n<p style=\"padding-left: 30px;\">The creation of hybrid teams, combining data engineers, business experts, and, with increasing frequency, data scientists, is part of today\u2019s answer to the integration challenge. These hybrid teams take various names in an organization, such as BI Center of Excellence or BI Competency Center. These cross-departmental teams are often derived from the more generic <a href=\"https:\/\/en.wikipedia.org\/wiki\/Community_of_practice\">Community of Practice<\/a>.<\/p>\n<p style=\"padding-left: 30px;\">Master Data Management (MDM) projects, which use a systematic data integration approach, tackle the other challenges; MDM emphasizes governance and business processes. Integration is still a valid argument in favor of dimensional modeling, but perhaps more so it is an argument for MDM.<\/p>\n<h3 style=\"padding-left: 30px;\">Extensibility<\/h3>\n<p style=\"padding-left: 30px;\">The main concepts of dimensional modeling are facts and dimensions. Dimensions are (or should be) designed independently from their source system. The surrogate key, which identifies each member of a dimension, is independent from the source system. 
As a result, each dimension can and should be re-used for different business areas. Because each dimension (and, as a result, each fact) is designed independently, the data warehouse is modular by design. This is still true. The advent of agile project management only confirms that developing a data warehouse in manageable chunks is key to the success of a solution. The extensibility of dimensional modeling is still a relevant, key feature in favor of its use.<\/p>\n<h3 style=\"padding-left: 30px;\">Ease of Understanding<\/h3>\n<p style=\"padding-left: 30px;\">The recent dramatic evolution of technology has given rise to an increasingly complex data landscape. It includes semi-structured data, text analytics, sensor analytics, and web-related statistics. But if all of this information can\u2019t benefit its end users, it\u2019s useless to the enterprise. And dimensional modeling still offers the best route to making sense of mounds of data.<\/p>\n<p style=\"padding-left: 30px;\">Data is typically consumed through reports or dashboards built from tables or graphs. A well-built dimensional model can field nearly any related end-user query, in the form of an easily digestible flat table or graph.<\/p>\n<p style=\"padding-left: 30px;\">Even seasoned report designers benefit from well-designed data models. Calculations are easier to develop, report creation is faster, and reports are more consistent from one developer to another. Dimensional models aren\u2019t just key to dashboards, reports, and simple data analysis \u2013 they also benefit data scientists.<\/p>\n<p style=\"padding-left: 30px;\">Most data scientists spend around 80% of their time wrangling, cleaning, and organizing data to obtain a tidy dataset\u00a0(<a href=\"https:\/\/www.jstatsoft.org\/article\/view\/v059i10\" target=\"_blank\" rel=\"noopener\">Wickham, 2014<\/a>): one observation per row and one variable per column. This type of data structure is extremely easy to obtain from a dimensional model. 
Join the fact table to the relevant dimensions, aggregate the indicators, and you have a tidy tabular dataset.<\/p>\n<p style=\"padding-left: 30px;\">Cleaned, organized data ensures that data scientists \u2013 who are rare and expensive \u2013 can focus on actual data science, rather than on engineering tasks that your BI team has already completed.<\/p>\n<p>The real strength of dimensional modeling is its ability to be easily understood and used for a wide range of business problems, regardless of an end user\u2019s technical knowledge. A carefully designed model saves your analysts, report designers, and data scientists countless hours. The time saved on cleaning and organizing data allows them to focus on gaining valuable insight.<\/p>\n<p>Dimensional modeling is not dead; far from it. As the data landscape evolves toward more complexity, dimensional modeling continues to allow more people to access and use the information buried in the mountains of data generated every day.<\/p>\n<p>In other words, the question is not whether you <em>should<\/em> build a dimensional model, but <em>who will create it and when<\/em>. If you want to learn more about how 3Cloud can help with your data modeling needs, <a href=\"\/get-started\/\">contact us<\/a> today and we will be happy to answer your questions.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>For Business Intelligence projects, data modeling usually means dimensional modeling. 
Rather than rehash the merits of this approach over any other in the data warehousing context, let\u2019s examine its relevancy after recent upheavals in technology.<\/p>\n","protected":false},"author":21,"featured_media":14100,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[260],"tags":[305],"class_list":["post-15820","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai","tag-modern-bi","topics-blog"],"acf":[],"_links":{"self":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/15820","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/comments?post=15820"}],"version-history":[{"count":0,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/15820\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media\/14100"}],"wp:attachment":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media?parent=15820"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/categories?post=15820"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/tags?post=15820"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}