{"id":16107,"date":"2014-07-08T11:40:00","date_gmt":"2014-07-08T18:40:00","guid":{"rendered":"https:\/\/devwww.3cloudsolutions.com\/post\/hadoops-momentum-is-unstoppable-2\/"},"modified":"2024-01-04T10:33:55","modified_gmt":"2024-01-04T18:33:55","slug":"hadoops-momentum-is-unstoppable","status":"publish","type":"post","link":"https:\/\/3cloudsolutions.com\/resources\/hadoops-momentum-is-unstoppable\/","title":{"rendered":"Hadoop&#8217;s Momentum is Unstoppable"},"content":{"rendered":"<div class=\"hs-migrated-cms-post\">\n<p><img loading=\"lazy\" decoding=\"async\" id=\"img-1404819680859\" class=\"alignLeft\" style=\"margin: 0px 0px 0px 10px; float: right;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/pastpresentfuture.jpg\" alt=\"Past Present Future\" width=\"200\" height=\"150\" border=\"0\" \/>Every new technology has a cycle of introduction, awareness, evaluation and&#8211;if the technology is relevant&#8211;eventual mainstream acceptance and adoption. \u00a0Gartner has even coined a term for it: <a title=\"the hype cycle\" href=\"http:\/\/en.wikipedia.org\/wiki\/Hype_cycle\" target=\"_blank\" rel=\"noopener\">the hype cycle<\/a>.<\/p>\n<p><!--more--><\/p>\n<h2>Hadoop&#8217;s Meteoric Rise<\/h2>\n<p>Hadoop technology is working through its acceptance cycle too. What&#8217;s been striking, though, is how quickly Hadoop technologies have moved through the cycle. \u00a0Its progress in finding effective use cases and proving its worth has been nothing short of stunning. 
Its momentum seems&#8211;as Forrester calls it in its <a title=\"Q1\/2014 Forrester Wave analysis\" href=\"http:\/\/info.hortonworks.com\/ForresterWave_Hadoop.html\" target=\"_blank\" rel=\"noopener\">Q1\/2014 Forrester Wave analysis (downloadable from HortonWorks)<\/a>&#8211;&#8220;unstoppable&#8221;.<\/p>\n<p>As part of our <a title=\"Strategy &amp; Architecture Services\" href=\"\/services-and-solutions\/services\/strategy-architecture\/\" target=\"_blank\" rel=\"noopener\">Strategy &amp; Architecture Services<\/a>, BlueGranite frequently consults with key clients to understand what technologies are on their radar and which are the most relevant to them. Less than two years ago, when we spoke with clients about <a title=\"Hadoop\" href=\"http:\/\/en.wikipedia.org\/wiki\/Apache_Hadoop\" target=\"_blank\" rel=\"noopener\">Hadoop<\/a>, most viewed it as a curiosity, and possibly an over-hyped technology that would soon be exposed as irrelevant to &#8220;real&#8221; enterprise customers.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" id=\"img-1404819647362\" class=\"alignRight\" style=\"float: right;\" src=\"\/\/cdn2.hubspot.net\/hub\/257922\/file-1186152434-jpg\/images\/launchroad.jpg\" alt=\"Launch Road\" width=\"200\" height=\"200\" border=\"0\" \/><\/p>\n<p>How things have changed! \u00a0Today about half our enterprise customers have deployed Hadoop. Some did so with the help of our design and implementation teams, others on their own. \u00a0For the majority, deployments are still within the scope of evaluation and organizational learning. 
\u00a0But we&#8217;re starting to see production deployments for new data processing and analytical use cases&#8211;and the results are compelling.<\/p>\n<h2>Why Hadoop will evolve much faster than Linux<\/h2>\n<p>A few short years ago, the best thing Hadoop had going for it was its <a title=\"open source\" href=\"http:\/\/en.wikipedia.org\/wiki\/Open-source_software\" target=\"_blank\" rel=\"noopener\">open source<\/a> distribution model that <em>can<\/em> bring large-scale data processing capabilities free of licensing cost and exotic hardware. \u00a0Yet&#8211;at that time&#8211;the trade-off for enjoying this low-cost platform was giving up the comfort of enterprise-grade vendor support and product stability.<\/p>\n<p>I clearly recall listening, not long ago, to Hadoop thought leaders suggesting that only large-scale tech companies could support Hadoop because only they could hire the Stanford PhDs required to patch and <a title=\"recompile the Apache code\" href=\"https:\/\/cwiki.apache.org\/confluence\/display\/Hive\/DeveloperGuide\" target=\"_blank\" rel=\"noopener\">recompile the Apache code<\/a>. \u00a0With the rock-solid backstop of BlueGranite&#8217;s Hadoop distribution partners,\u00a0<a title=\"HortonWorks\" href=\"http:\/\/hortonworks.com\/partner\/bluegranite\/\" target=\"_blank\" rel=\"noopener\">HortonWorks<\/a>\u00a0and\u00a0<a title=\"Microsoft\" href=\"http:\/\/pinpoint.microsoft.com\/en-US\/partners\/BlueGranite-4295488054?LocId=282716222779958\" target=\"_blank\" rel=\"noopener\">Microsoft<\/a>, that line of thinking would be laughable today. 
The vast majority of enterprise customers will only think about Hadoop as vendor-provided software that&#8217;s installed (not open source software that&#8217;s compiled on-site).<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" id=\"img-1404819361582\" class=\"alignLeft\" style=\"float: left;\" src=\"\/\/cdn2.hubspot.net\/hub\/257922\/file-1188010391-jpg\/images\/timeevolvestechnology.jpg\" alt=\"TimeEvolvesTechnology\" width=\"200\" height=\"150\" border=\"0\" \/>The situation was not unlike\u00a0<a title=\" the early days of Linux\" href=\"http:\/\/en.wikipedia.org\/wiki\/Linux\" target=\"_blank\" rel=\"noopener\">the early days of Linux<\/a>, when using the free operating system meant adding staff who could self-support low-level operating system malfunctions.\u00a0But Linux matured: distribution vendors like <a title=\"Red Hat\" href=\"http:\/\/www.redhat.com\" target=\"_blank\" rel=\"noopener\">Red Hat<\/a> and <a title=\"SuSE\" href=\"https:\/\/www.suse.com\" target=\"_blank\" rel=\"noopener\">SuSE<\/a> made it a supportable platform enterprise customers could trust. A healthy ecosystem of systems integrators and consultants developed to fill knowledge gaps, further reducing the risk and cost of Linux implementation for enterprises.<\/p>\n<p>The same is happening with Hadoop. But this time, a <a title=\"successful distribution\/support model\" href=\"http:\/\/wiki.apache.org\/hadoop\/Distributions%20and%20Commercial%20Support\" target=\"_blank\" rel=\"noopener\">successful distribution\/support model<\/a> pulled directly from the playbook of\u00a0<a title=\"Linux vendors and free distributions\" href=\"http:\/\/en.wikipedia.org\/wiki\/Linux_distribution\" target=\"_blank\" rel=\"noopener\">Linux vendors and free distributions<\/a> was quickly adopted. 
Add to that an unprecedented level of venture capital and corporate sponsorship that funds the development of the platform, and a sometimes mystifying level of collaboration by competitors on the technology itself. The result is progress at a breakneck pace.\u00a0Today Hadoop has already evolved into a data platform whose capabilities&#8211;in many ways&#8211;rival the world&#8217;s best proprietary data platforms.<\/p>\n<h2>Is Hadoop for Everyone?<\/h2>\n<p>Whether Hadoop is for everyone is certainly a loaded question. \u00a0The short answer: of course not! \u00a0No single technology is a silver bullet for everyone. \u00a0There are organizations that will never need Hadoop, just as there are those that will never need an intranet portal or a cloud strategy.<\/p>\n<p>But for most enterprises (and midsize companies that have all the trappings of enterprises but at a smaller scale), the answer is &#8220;Yes&#8221;.<\/p>\n<p>Hadoop will soon be about much more than <a title=\"Gartner's the three V's model (Volume, Velocity, Variability of data)\" href=\"http:\/\/blogs.gartner.com\/doug-laney\/deja-vvvue-others-claiming-gartners-volume-velocity-variety-construct-for-big-data\/\" target=\"_blank\" rel=\"noopener\">Gartner&#8217;s three V&#8217;s model (Volume, Velocity, Variety of data)<\/a>. \u00a0It&#8217;s evolving into a data processing platform that will be better and less expensive than what most enterprises use today. \u00a0<a title=\"Yarn\" href=\"http:\/\/hadoop.apache.org\/docs\/current\/hadoop-yarn\/hadoop-yarn-site\/YARN.html\" target=\"_blank\" rel=\"noopener\">YARN<\/a> gives it resource management (control over shared computing power allocated to various workloads). 
\u00a0<a title=\"Hive\" href=\"https:\/\/hive.apache.org\" target=\"_blank\" rel=\"noopener\">Hive<\/a>, <a title=\"Impala\" href=\"http:\/\/www.cloudera.com\/content\/cloudera\/en\/products-and-services\/cdh\/impala.html\" target=\"_blank\" rel=\"noopener\">Impala<\/a> and <a title=\"Spark\" href=\"http:\/\/spark.apache.org\" target=\"_blank\" rel=\"noopener\">Spark<\/a> are beginning to <a title=\"turbocharge SQL performance against data stored in Hadoop\" href=\"http:\/\/msbiacademy.com\/?p=8651\" target=\"_blank\" rel=\"noopener\">turbocharge SQL performance against data stored in Hadoop<\/a>. And the bottom line: TCO. Hadoop provides Saks Fifth Avenue level <a title=\"MPP (Massively Parallel Processing)\" href=\"http:\/\/en.wikipedia.org\/wiki\/Massively_parallel_computer\" target=\"_blank\" rel=\"noopener\">MPP (Massively Parallel Processing)<\/a> technology at WalMart prices anyone can afford.<\/p>\n<p>Even deploying Hadoop at small scale to address specific workloads is compelling. \u00a0For example, I recently posted <a title=\"an article on my personal blog\" href=\"http:\/\/robkerr.com\/how-to-stream-twitter-data-with-hortonworks-hdp-2-1-and-flume\/\" target=\"_blank\" rel=\"noopener\">an article on my personal blog<\/a> showing how to use Hadoop to <a title=\"collect Twitter data\" href=\"https:\/\/dev.twitter.com\/docs\/streaming-apis\/streams\/public\" target=\"_blank\" rel=\"noopener\">collect Twitter data<\/a> containing a set of keywords. Sure, you can probably figure out how to do this in Excel. \u00a0But how scalable is that? It&#8217;s not. \u00a0<a title=\"Try it yourself\" href=\"http:\/\/robkerr.com\/how-to-stream-twitter-data-with-hortonworks-hdp-2-1-and-flume\/\" target=\"_blank\" rel=\"noopener\">Try it yourself<\/a>: it takes maybe a half hour to get up and running on a Hadoop cluster. \u00a0Don&#8217;t have one? 
Use a <a title=\"free, pre-configured, single-server Sandbox VM\" href=\"http:\/\/hortonworks.com\/products\/hortonworks-sandbox\/\" target=\"_blank\" rel=\"noopener\">free, pre-configured, single-server Sandbox VM<\/a>. \u00a0Value? High. Cost? Free. Easy? Yes. Stanford PhD required? Definitely not.<\/p>\n<h2>We live in interesting times<\/h2>\n<p>In technology, every few years we seem to be on the cusp of a pivotal moment. \u00a0Moving from inflexible, not-very-scalable <a title=\"ISAM databases\" href=\"http:\/\/en.wikipedia.org\/wiki\/ISAM\" target=\"_blank\" rel=\"noopener\">ISAM databases<\/a> to <a title=\"RDBMS technologies\" href=\"http:\/\/en.wikipedia.org\/wiki\/RDBMS\" target=\"_blank\" rel=\"noopener\">RDBMS technologies<\/a> was pivotal. \u00a0Moving from <a title=\"Client\/Server\" href=\"http:\/\/en.wikipedia.org\/wiki\/Client\/server\" target=\"_blank\" rel=\"noopener\">Client\/Server<\/a> to <a title=\"multi-tiered software architectures\" href=\"http:\/\/en.wikipedia.org\/wiki\/3-tier\" target=\"_blank\" rel=\"noopener\">multi-tiered software architectures<\/a>\u00a0was pivotal. Moving to <a title=\"the cloud\" href=\"http:\/\/en.wikipedia.org\/wiki\/Cloud_computing\" target=\"_blank\" rel=\"noopener\">the cloud<\/a> is definitely pivotal.<\/p>\n<p>Putting <a title=\"massively parallel processing (MPP) technology\" href=\"http:\/\/en.wikipedia.org\/wiki\/Massively_parallel_computer\" target=\"_blank\" rel=\"noopener\">massively parallel processing (MPP) technology<\/a> in the hands of <em>every organization <\/em>will prove to be equally pivotal. 
\u00a0It&#8217;s no less significant than how the Internet has leveled the playing field, giving small <a title=\"Indie Developers\" href=\"http:\/\/en.wikipedia.org\/wiki\/Independent_video_game_development\" target=\"_blank\" rel=\"noopener\">Indie Developers<\/a> the same marketing opportunities as Fortune 1000 companies.<\/p>\n<p>Yes, we live in <em>very<\/em> interesting times.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Every new technology has a cycle of introduction, awareness, evaluation and&#8211;if the technology is relevant&#8211;eventual mainstream acceptance.<\/p>\n","protected":false},"author":21,"featured_media":15020,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[260],"tags":[304],"class_list":["post-16107","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai","tag-modern-data-platform","topics-blog"],"acf":[],"_links":{"self":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/16107","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/comments?post=16107"}],"version-history":[{"count":0,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/16107\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media\/15020"}],"wp:attachment":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media?parent=16107"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/categories?post=16107"},{"taxonomy":"post_tag","embed
dable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/tags?post=16107"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}