{"id":16105,"date":"2014-07-09T19:44:00","date_gmt":"2014-07-10T02:44:00","guid":{"rendered":"https:\/\/devwww.3cloudsolutions.com\/post\/on-big-data-the-three-vs-are-only-one-third-of-the-story-2\/"},"modified":"2024-04-17T08:32:54","modified_gmt":"2024-04-17T15:32:54","slug":"on-big-data-the-three-vs-are-only-one-third-of-the-story","status":"publish","type":"post","link":"https:\/\/3cloudsolutions.com\/resources\/on-big-data-the-three-vs-are-only-one-third-of-the-story\/","title":{"rendered":"On Big Data, the Three V\u2019s are Only One-third of the Story"},"content":{"rendered":"<div class=\"hs-migrated-cms-post\">\n<p>We all know that the textbook definition of \u201cBig Data\u201d describes data sets that fit one or more of the three \u201cV\u201d attributes: \u00a0High Volume, High Velocity, and High Variety.<img decoding=\"async\" id=\"img-1404938227333\" class=\"alignRight\" style=\"background-color: transparent; margin: 0px 0px 10px 10px; float: right;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/incompletebrain.png\" alt=\"IncompleteBrain\" border=\"0\" \/><\/p>\n<p><!--more--><\/p>\n<p>This simple definition of Big Data is what most vendors teach us in their product literature (such as <a title=\"this definition of Big Data on the SAS Institute's web site\" href=\"http:\/\/www.sas.com\/en_us\/insights\/big-data\/what-is-big-data.html\" target=\"_blank\" rel=\"noopener\">this definition of Big Data on the SAS Institute&#8217;s web site<\/a>). \u00a0It&#8217;s also the definition included by many authors who write books and articles on the topic.<\/p>\n<p>What most don&#8217;t know is that this isn&#8217;t really the way Gartner originally defined the term. Not entirely, anyway. It&#8217;s the first nine words. The first idea in a set of three related ideas that were meant to be taken as a set.<\/p>\n<h2>What\u2019s a \u201cBig Data Platform\u201d?<\/h2>\n<p>But most know only the three Vs. 
And if the 3 Vs define Big Data, then a data platform is a \u201cBig Data Platform\u201d if it can\u2014somehow\u2014process data that has one or more of these &#8220;V attributes&#8221;. By this definition, many kinds of data platforms are \u201cBig Data Platforms\u201d. Of course, platforms based on Hadoop are accepted as \u201cBig Data Platforms\u201d, because in the minds of most people, Hadoop and \u201cBig Data\u201d are almost synonymous (actually, they&#8217;re not). But some conventional data processing platforms have adopted &#8220;Big Data&#8221; in their brand message as well.<\/p>\n<p>Marketing ideas follow hype more often than they create it. Many kinds of data processing platforms (and even front-end data visualization tools) have co-opted the \u201cBig Data\u201d label. And who could blame them? If a traditional data platform company has a platform that can process \u201cHigh Volume\u201d, or can handle data streaming at \u201cHigh Velocity\u201d, why not describe it with the term \u201cBig Data\u201d to generate more interest in it? Is that wrong?<\/p>\n<h3>Big Data wasn\u2019t intended to describe a product<\/h3>\n<p>To be a relational database management system (RDBMS), a platform needs\u2014at least\u2014to comply with <a title=\"E.F. Codd\u2019s 12 rules\" href=\"http:\/\/en.wikipedia.org\/wiki\/Codd's_12_rules\" target=\"_blank\" rel=\"noopener\">E.F. Codd\u2019s 12 rules<\/a>, which define the relational database management system, and are quite specific. Because the definition is precise at an engineering level, if a vendor calls a system an \u201cRDBMS\u201d when it\u2019s not, it\u2019s plain as day to see that the system is being over-sold.<\/p>\n<p>By contrast, Big Data was a term coined (according to my reading of it) by Gartner to describe a concept, not to serve as a label for a product category the way E.F. Codd\u2019s rules do. 
\u00a0As a result, it&#8217;s much harder to measure products and architectures against the much less precise (and not engineering-based) &#8220;Big Data&#8221; definition.<\/p>\n<h3>So what is \u201cBig Data\u201d then?<\/h3>\n<p>Colleagues and clients are frustrated with what they perceive as the uselessness of the term \u201cBig Data\u201d today. But I suspect that frustration is partially rooted in a broad misunderstanding of the term (as well as its over-use to hype products and services).<\/p>\n<p>If we really look at what the term \u201cBig Data\u201d means, it becomes a much more useful tool in identifying and planning next-generation data analytics solutions.<\/p>\n<p>As restated in <a title=\"Gartner Research Director Svetlana Sicular\u2019s article in Forbes\" href=\"http:\/\/www.forbes.com\/sites\/gartnergroup\/2013\/03\/27\/gartners-big-data-definition-consists-of-three-parts-not-to-be-confused-with-three-vs\/\" target=\"_blank\" rel=\"noopener\">Gartner Research Director Svetlana Sicular\u2019s article in Forbes<\/a>:<\/p>\n<address><strong>\u201cBig data\u201d is high-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.<\/strong><\/address>\n<p>Concise? Yes. \u00a0Specific? Not really. Powerful idea? Definitely.<\/p>\n<p>The \u201creal\u201d big-data definition isn\u2019t merely \u201cThe Three V\u2019s\u201d, as many have been conditioned to believe. As Ms. 
Sicular reminds us, the definition has <strong>three<\/strong> parts:<\/p>\n<ol>\n<li><strong>Three V\u2019s<\/strong> \u2013 the physical characteristics of the data<\/li>\n<li><strong>Cost-effective, innovative technology<\/strong> \u2013 attributes of the platforms needed to process the data<\/li>\n<li><strong>Enhanced insight and Decision-making<\/strong> \u2013 the business outcomes that result from processing the data with the appropriate platform.<\/li>\n<\/ol>\n<p>Let&#8217;s dig into each of these parts of the true definition.<\/p>\n<h3>The Three V\u2019s<\/h3>\n<p>So, yes, the Three V\u2019s do exist! But they only serve to describe\u2014at a very high level\u2014the attributes of data sets. Nothing more. And they certainly don\u2019t tell us anything about whether a particular data platform is a good choice to process data that has characteristics like one or more of the V\u2019s.<\/p>\n<h3>Cost-effective, innovative<\/h3>\n<p>This is certainly the most subjective part of the definition. What is cost-effective? Well, it depends on what you spend now. Are you accustomed to shelling out $20 million per year for an exotic MPP platform, or is a $10,000 SMB SQL database your idea of cost-effective?<\/p>\n<p>How about innovative? I might think a solution I just designed is innovative, but you might not. If I\u2019m a vendor, then of course I think my product is innovative, and it\u2019s my job to help you see my point of view!<\/p>\n<p>I don\u2019t think this part of the \u201cBig Data\u201d definition is intended to sort systems by their level of cost-effectiveness or innovation. What it really means (to me), is that while organizations might have streaming data or high volumes of web logs, they won\u2019t blindly spend exorbitant sums of cash to buy systems to process these data. Do vendors who sell \u201cBig Data Platforms\u201d built on expensive, exotic hardware and software foundations ignore this part of the \u201cBig Data\u201d definition on purpose? 
You be the judge.<\/p>\n<h3>Enhanced Insight and Decision-making<\/h3>\n<p>This part of the definition is crucially important. No customer will buy or implement a system that analyzes data just for the sake of doing so. There has to be a reward. An ROI. A purpose.<\/p>\n<p><img decoding=\"async\" class=\"alignLeft\" style=\"float: left;\" src=\"\/\/cdn2.hubspot.net\/hub\/257922\/file-1207354360-jpg\/images\/infoknowledge.jpg\" alt=\"InfoKnowledge\" border=\"0\" \/>If a Big Data project is doomed to fail, it\u2019s often because this part of the Big Data definition was unknown (or worse, ignored). Technical staff may enjoy exploring new technologies because learning new things brings them a sense of accomplishment and pleasure. And arguably, if you employ technical staff who <strong>aren\u2019t<\/strong> interested in learning new technologies, you probably should revisit your hiring strategy!<\/p>\n<p>But for a Big Data initiative to get beyond the first, low-cost lab experiment, it has to excite the business as well. For example, it may need to bring insights not possible with existing technologies. It needs to pick up the use cases for which RDBMS systems are too expensive to operate (extremely large scale) or cannot process (unstructured data).<\/p>\n<p>It\u2019s too bad that the perfectly valid \u201cBig Data\u201d term has been misunderstood by so many who use it. We tend to learn about new technology paradigms from product vendors. 
But in the case of Big Data there are few (if any) commercial products that truly address all three parts of the actual definition, so we often get only part of the story.<\/p>\n<p>But if we, as practitioners, really look at the conceptual idea in its entirety, we can begin to map out how to make our organizations (or our clients\u2019 organizations) successful in their use of Big Data technology.<\/p>\n<h3>Where to go from here?<\/h3>\n<p>I\u2019ll leave you with some thoughts about how to examine your own Big Data opportunities, and select the right ideas, technologies, and approaches:<\/p>\n<h4>Make sure the data sources\/sets you\u2019re looking at aren\u2019t already being fully addressed by existing technologies.<\/h4>\n<p><img decoding=\"async\" id=\"img-1404938385415\" class=\"alignRight\" style=\"float: right;\" src=\"\/\/cdn2.hubspot.net\/hub\/257922\/file-1201395284-jpg\/images\/whichwaytogo.jpg\" alt=\"WhichWayToGo\" border=\"0\" \/><\/p>\n<p>The fastest way for your Big Data idea to die on the vine is to solve a problem that\u2019s already been solved. Look for opportunities to bring insights the business side wants but can\u2019t get any other way. Talk to the mid-level analysts and managers at your company. They all have questions that aren\u2019t being answered by your ERP or Data Warehouse system.<\/p>\n<p>Conversely, don\u2019t be drawn into proving Big Data technologies in your company by, for example, using Hadoop to implement an ERP, CRM or small-scale Data Warehouse workload. Refer to definition part #1 \u2013 the three V\u2019s. A relational data warehouse holding 6TB of structured, transactional data isn\u2019t a big data problem. It\u2019s an RDBMS problem, so leave it where it belongs. Find out whether there\u2019s another 100TB of useful, historical data at <a title=\"Iron Mountain\" href=\"http:\/\/www.ironmountain.com\" target=\"_blank\" rel=\"noopener\">Iron Mountain<\/a> that nobody can query. 
That could be a meaningful problem to solve.<\/p>\n<h3>Don\u2019t use exotic hardware<\/h3>\n<p>Your organization probably wants enhanced insight into data it isn\u2019t able to process and query today. But that doesn\u2019t mean it will spend a fortune to do so. When weighing the status quo against gigantic capital investments, the status quo is a powerful contender.<\/p>\n<p>You need to show that you can address 3V workloads for 1\/10th the cost of traditional systems. You might not be able to do that if you buy your Big Data system from a hardware vendor who prefers to sell exotic hardware at a high gross margin. Think outside the box on this, and be ready to leave your comfort zone.<\/p>\n<h3>Business users don\u2019t use the command line<\/h3>\n<p>If you\u2019re an IT Professional experimenting with Big Data technologies like <a title=\"Hadoop\" href=\"http:\/\/hadoop.apache.org\/\" target=\"_blank\" rel=\"noopener\">Hadoop<\/a>, most likely you\u2019re using very \u201ctechie\u201d user interfaces. Technical people are used to the command line. We\u2019re used to highly complex, difficult-to-learn, and over-engineered software. Learning some new IT platforms can be like fighting a Balrog in the Mines of Moria, and many technical people love the challenge. I promise you that the business sponsor who will fund your \u201cPhase 2\u201d project doesn\u2019t share that feeling.<\/p>\n<p>The business needs to see insights, and see them <strong>visually<\/strong>. If the first thing you plan to demonstrate is how to run a MapReduce job on a 10-node cluster, don&#8217;t bother. The business sponsors will leave the meeting (literally or mentally), and you&#8217;ll accomplish nothing. Instead, plan a dashboard over the resulting data. Plan a real-time visualization of Twitter keywords for your industry. Use <a title=\"Excel\" href=\"http:\/\/office.microsoft.com\/en-us\/excel\/\" target=\"_blank\" rel=\"noopener\">Excel<\/a> to demonstrate accessibility. Or Power BI. 
Or <a title=\"Tableau\" href=\"http:\/\/www.tableausoftware.com\" target=\"_blank\" rel=\"noopener\">Tableau<\/a>. Make the output of your Big Data project satisfying and visually engaging for those whose support you\u2019ll need going forward.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>We all know that the textbook definition of \u201cBig Data\u201d describes data sets that fit one or more of the three \u201cV\u201d attributes.<\/p>\n","protected":false},"author":21,"featured_media":15018,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[297],"tags":[304],"class_list":["post-16105","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-platform","tag-modern-data-platform","topics-blog"],"acf":[],"_links":{"self":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/16105","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/comments?post=16105"}],"version-history":[{"count":0,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/16105\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media\/15018"}],"wp:attachment":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media?parent=16105"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/categories?post=16105"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/
v2\/tags?post=16105"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}