{"id":16127,"date":"2013-05-13T14:00:00","date_gmt":"2013-05-13T21:00:00","guid":{"rendered":"https:\/\/devwww.3cloudsolutions.com\/post\/overview-of-data-quality-assurance-in-data-warehousing-2\/"},"modified":"2024-01-08T10:49:22","modified_gmt":"2024-01-08T18:49:22","slug":"overview-of-data-quality-assurance-in-data-warehousing","status":"publish","type":"post","link":"https:\/\/3cloudsolutions.com\/resources\/overview-of-data-quality-assurance-in-data-warehousing\/","title":{"rendered":"Overview of Data Quality Assurance in Data Warehousing"},"content":{"rendered":"<div class=\"hs-migrated-cms-post\">\n<p>As with any IT initiative, proper quality assurance processes can make or break a project.\u00a0 In data warehousing, there are a number of steps that you can take to make sure your solution is not only successful, but highly trusted and extremely stable.<\/p>\n<p><span style=\"letter-spacing: 0.02em;\">One of the high points of a data warehousing project is when the <\/span><a style=\"letter-spacing: 0.02em; background-color: #ffffff;\" href=\"https:\/\/3cloudsolutions.com\/resources\/top-five-differences-between-data-lakes-and-data-warehouses\/\">data warehouse<\/a><span style=\"letter-spacing: 0.02em;\">, the &#8216;new kid on the block&#8217;, begins to point out data issues in legacy reporting that have been lingering for years.\u00a0 To me, this is a major milestone that I always celebrate on my projects (and you should do the same).\u00a0 However, there are a number of data warehouses that struggle to reach this point.\u00a0 This is likely due to a number of reasons, and one of the major obstacles is poor quality assurance!<\/span><\/p>\n<p><strong>Nothing casts doubt on a data warehouse quicker than incorrectly reporting information.\u00a0 It is critical that data warehousing projects do everything in their power to mitigate this risk.<\/strong><\/p>\n<p>This article is intended to give an overview of some of the key concepts of quality assurance in 
data warehousing and business intelligence projects to ensure you don&#8217;t have to struggle with quality issues on your project.\u00a0 If you follow the framework that I lay out below, the integrity and stability of your solutions should increase significantly.<\/p>\n<p><strong>Overview of Quality Assurance in Data Warehousing<\/strong><\/p>\n<p>There are six types of testing that must be considered when implementing a data warehouse, as illustrated in the image below.\u00a0\u00a0The four types of testing I will spend most of my time discussing are Unit Testing, System Integration Testing, Data Validation, and User Acceptance Testing.\u00a0\u00a0I will revisit each of these in a future post to fully describe each type in turn.<\/p>\n<h2>\u00a0<img decoding=\"async\" style=\"display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/qafordw-resized-600.jpg\" alt=\"Quality Assurance in Data Warehousing\" border=\"0\" \/><\/h2>\n<p><strong>Unit Testing<\/strong><\/p>\n<p>Unit Testing is the process of validating each of the constituent parts of a solution.\u00a0 Unit Testing is entirely the responsibility of the developer, and MUST be done during development.\u00a0 There is no effective way to do Unit Testing after the fact; deferring it will only serve to introduce bugs into the data warehouse.<\/p>\n<p>In a data warehouse\/business intelligence solution, the most critical items to unit test are ETL logic, business rules and calculations implemented in the OLAP layer, KPI logic, and individual report validation.\u00a0 Depending on the solution, of course, there could be other important items to unit test, but these are the high risk areas.<\/p>\n<p>Ideally, Unit Testing should be automated, since this form of testing is done repeatedly throughout the course of a project.\u00a0 Consideration will be given to this in future articles.<\/p>\n<p><strong>System Integration 
Testing is intended to confirm the system acts as expected once the constituent parts of the solution are put together.\u00a0 System Integration Testing is completely dependent on successfully Unit Testing your data warehousing solution first!<\/p>\n<p>System Integration Testing should accomplish two main objectives.\u00a0 First, you must perform system build testing to ensure that you can successfully build and deploy into your system integration testing environment.\u00a0 Second, once the solution is deployed and configured, you must execute all jobs and process data to ensure no issues arise during job execution.\u00a0 If you have multiple production jobs, you will want to run them all under real world circumstances if possible.<\/p>\n<p>If\u00a0you haven&#8217;t yet adopted System Integration Testing in your data warehouse development cycle, doing so is a giant step forward.\u00a0 If you have\u00a0been using these techniques and are\u00a0ready to take it a step further, consider creating scripts to stage specific cases in your test data.\u00a0\u00a0These cases should\u00a0check\u00a0the handling of specific circumstances such\u00a0as bounds testing, calculation logic, condition processing, and any other core data processing logic to really put your system through its paces.<\/p>\n<p><strong>Data Validation\u00a0<\/strong><\/p>\n<p>Data Validation is the process of testing the data within a data warehouse.\u00a0 A common way to perform this test is by using an ad hoc query tool (such as Excel) to retrieve data in a format similar to existing operational reports.\u00a0 Data that should be validated includes dimension member completeness, base measure accuracy, and business calculations.<\/p>\n<p>If the data ties out between the data warehouse and the operational report, then the data is valid (unless, of course, the original report is flawed).\u00a0 Once a number of reports have been validated, you can be reasonably confident that the data within the 
warehouse is correct.<\/p>\n<p>It is imperative to note that data validation should be performed by a business representative!\u00a0It is marginal at best to have the data warehousing team perform data validation on the data within the warehouse.\u00a0 The data warehouse team can detect some data issues, but the individuals who know the data best are integral to successfully validating the data within the system.<\/p>\n<p>If you are unable to get commitment from a business representative, then likely you don&#8217;t have proper backing from the correct stakeholder for your data warehouse project.\u00a0 This is a huge red flag that will need to be addressed.\u00a0 Data warehouses without appropriate stakeholder backing and proper business representation are at high risk of failure, and will continue to struggle throughout the course of the project.<\/p>\n<p><strong>User Acceptance Testing<\/strong><\/p>\n<p>The objective of User Acceptance Testing is to ensure two things.\u00a0 First, that the data that is being provided to the end user is what is expected.\u00a0 Second, that the tools provided to the end user meet their expectations.<\/p>\n<p>Ideally if there is a problem with the scope of data being provided, it is best to discover potential issues early on.\u00a0 It is critical to reconcile end user expectations with the scope of the project at the beginning to reduce the risk of having to rearchitect the data model (and everything built on top of it)\u00a0once you&#8217;re ready to promote the data warehouse to production.<\/p>\n<p>Tool validation is typically much more flexible if there are issues discovered during user acceptance testing.\u00a0 In fact, I&#8217;ve found it helpful to conduct rapid development sessions where you construct dashboards or reports on the fly with end users.\u00a0 This is a great way to give them exactly what they want, provide the data warehouse team with immediate feedback, and greatly enhance the chances of successfully 
implementing the end user tools.<\/p>\n<p><strong>Performance Testing<\/strong><\/p>\n<p>Performance Testing is a very complex topic that I won&#8217;t spend much time on here.\u00a0 Suffice it to say that properly validating the performance of your solution under real world conditions is important to satisfying user expectations.\u00a0 There are a number of factors to consider, including data architecture, hardware configuration, system scalability, query complexity, concurrent users, etc.\u00a0 Look for discussions on these topics to come in future articles.<\/p>\n<p><strong>Regression Testing<\/strong><\/p>\n<p>Regression Testing is the process of retesting functionality to ensure that new development has not broken anything that was previously known to work.\u00a0 Each of the\u00a0different testing categories defined\u00a0above is\u00a0subject to\u00a0Regression Testing.<\/p>\n<p>Since\u00a0techniques for handling Regression Testing are different for each of the quality assurance categories noted, this topic will be integrated into the future articles about each respective quality assurance category.<\/p>\n<p><strong>Conclusion<\/strong><\/p>\n<p>The testing framework that I&#8217;ve laid out above has been critical to the success of the projects that I&#8217;ve been involved with.\u00a0\u00a0I&#8217;ve gotten feedback from a number of clients that\u00a0the data warehouse efforts that I&#8217;ve led have been extremely stable, requiring minimal effort to keep them running.<\/p>\n<p>In fact, one of the technical\u00a0leads that I recently handed over a project to claimed that the solution was boring to maintain because nothing ever goes wrong!\u00a0 As far as criticism goes, that&#8217;s about the best feedback you can hope for.<\/p>\n<p>It&#8217;s my hope that you will be able to take the concepts that I&#8217;ve presented above and use them on your implementations.\u00a0 If you do use any of these techniques, please let me know. 
\u00a0I&#8217;d love to hear about your experiences.\u00a0 And, of course I&#8217;m always open to feedback, please leave a comment!<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Introduction to a framework for implementing highly effective quality assurance on data warehouse projects.<\/p>\n","protected":false},"author":21,"featured_media":15594,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[297],"tags":[304],"class_list":["post-16127","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-platform","tag-modern-data-platform","topics-blog"],"acf":[],"_links":{"self":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/16127","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/comments?post=16127"}],"version-history":[{"count":0,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/16127\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media\/15594"}],"wp:attachment":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media?parent=16127"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/categories?post=16127"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/tags?post=16127"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}