{"id":15961,"date":"2017-01-12T15:08:00","date_gmt":"2017-01-12T23:08:00","guid":{"rendered":"https:\/\/devwww.3cloudsolutions.com\/post\/data-lakes-a-mess-of-data-a-mess-of-insight-2\/"},"modified":"2024-01-03T15:00:35","modified_gmt":"2024-01-03T23:00:35","slug":"data-lakes-a-mess-of-data-a-mess-of-insight","status":"publish","type":"post","link":"https:\/\/3cloudsolutions.com\/resources\/data-lakes-a-mess-of-data-a-mess-of-insight\/","title":{"rendered":"Data Lakes: A Mess of Data, A Mess of Insight"},"content":{"rendered":"<h3><strong><span style=\"font-size: 24px;\">Everything, Including the Kitchen Sink<\/span><\/strong><\/h3>\n<p>A <a href=\"https:\/\/3cloudsolutions.com\/resources\/top-five-differences-between-data-lakes-and-data-warehouses\/\">data lake<\/a> is a persistent raw archive of any potentially actionable data.\u00a0 The philosophy really is \u201ceverything, <em>including<\/em> the kitchen sink.\u201d This means that a data lake will archive data from many different business systems and non-traditional sources, including sensor data, logs, image data, streaming data, and audio or video data.<\/p>\n<p><!--more--><\/p>\n<p style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/iStock-524578371edited.png\" alt=\"iStock-524578371edited.png\" width=\"805\" height=\"509\" \/><\/p>\n<p>An ambitious data lake may also include information from external sources, such as weather, traffic, or stock market data.\u00a0 But that\u2019s not all! 
\u00a0A data lake won\u2019t just store the current version of a record or file; it will also retain every revision it can get.\u00a0 By capturing everything, undiluted, a business will be able to answer the questions of today and the new questions of tomorrow.<\/p>\n<p>That raw material, though, is a whole mess of data.\u00a0 It requires gobs of storage, gobs of processing power, and gobs of connectivity to continually archive all this data as it is generated.\u00a0 For this whole-hog process, a traditional relational database does not scale easily into the petabyte range, nor does it effectively store or consume unstructured data.\u00a0 At this scale, you\u2019re looking for a platform with near-infinite scalability paired with elastic storage and processing power.<\/p>\n<h3><strong><span style=\"font-size: 24px;\">An Organized Mess<\/span><\/strong><\/h3>\n<p>Despite the name, the mess of data in a data lake need not be total chaos.\u00a0 In fact, it should be an <em>organized<\/em> mess.\u00a0 Metadata captured during the data ingestion process does not transform the data, but it does give the data additional context.\u00a0 At the very least, data can then be categorized by its source and the date it was captured.\u00a0 This enriches the data and gives it some level of organization without contaminating its raw nature.<\/p>\n<h3><span style=\"font-size: 24px;\">Multiple Tools\u00a0Have Many Uses<\/span><\/h3>\n<p>With a variety of data sources and types, there is an array of tools to get the job done.\u00a0 For ingesting relational data alone, there are at least half a dozen tools.\u00a0 Broadly speaking, there are three types of data to be ingested: <strong>batched data<\/strong>, <strong>streaming data<\/strong>, and <strong>binary data<\/strong>.<\/p>\n<p>Much to my dismay, there is no one tool that is ideally suited for all three types. A true data lake at an enterprise may wind up using two or three (or more!) 
ingestion tools for dozens of data sources.\u00a0 Orchestrating that aspect alone is a significant task, but that giant mess of data is the raw material for insights now and into the future.<\/p>\n<h3><strong><span style=\"font-size: 24px;\">What to Do with All that Mess<\/span><\/strong><\/h3>\n<p>Here\u2019s the problem with a data lake (stop me if you&#8217;ve heard this already):\u00a0 It\u2019s raw data.\u00a0 It\u2019s a mess.\u00a0 To get insight out of it, you need to make sense of that mess and integrate it into something coherent, which might sound like a data warehouse.\u00a0 And for some enterprises, a data lake can be little more than a massive primary staging layer.\u00a0 For others, it can be a data science playground.<\/p>\n<p>With a data lake feeding a data warehouse, adding new items to the warehouse is merely a matter of sourcing the required information from the data lake.\u00a0 The data will already be captured and available.\u00a0 In fact, it may even be possible to make a virtual data warehouse as a layer of views on the data lake itself.\u00a0 Either way, the data lake makes the data warehouse much more agile.<\/p>\n<p>A data lake is not only for feeding data warehouses, though.\u00a0 It can become a one-stop shop for data science efforts too.\u00a0 Because it captures everything, the data lake may hold a treasure trove of hidden insights.\u00a0 Machine learning, text analysis, image recognition, and other processes will have the gobs of data they need.\u00a0 That can yield new insights into the workings of a business and help audit conventional wisdom about your business processes.<\/p>\n<p>In the data-driven world of today and tomorrow, having all your business data available to gain a competitive edge is a must, not an option. To learn about the differences between data lakes and data warehouses, check out <a href=\"\/resources\/top-five-differences-between-data-lakes-and-data-warehouses\">this blog post<\/a>. 
If you are planning your data lake and need help getting started, contact 3Cloud today for insights into the right solution for your company.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A data lake is a raw archive of any potentially actionable data which will archive data from many different business systems and non-traditional sources.<\/p>\n","protected":false},"author":21,"featured_media":14732,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[297],"tags":[336,304],"class_list":["post-15961","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-platform","tag-data-lake","tag-modern-data-platform","topics-blog"],"acf":[],"_links":{"self":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/15961","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/comments?post=15961"}],"version-history":[{"count":0,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/15961\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media\/14732"}],"wp:attachment":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media?parent=15961"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/categories?post=15961"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/tags?post=15961"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}