{"id":16104,"date":"2014-07-18T20:06:00","date_gmt":"2014-07-19T03:06:00","guid":{"rendered":"https:\/\/devwww.3cloudsolutions.com\/post\/want-to-get-started-with-big-data-stop-throwing-data-away-2\/"},"modified":"2024-04-17T14:05:23","modified_gmt":"2024-04-17T21:05:23","slug":"want-to-get-started-with-big-data-stop-throwing-data-away","status":"publish","type":"post","link":"https:\/\/3cloudsolutions.com\/resources\/want-to-get-started-with-big-data-stop-throwing-data-away\/","title":{"rendered":"Want to Get Started with Big Data? Stop Throwing Data Away!"},"content":{"rendered":"<div class=\"hs-migrated-cms-post\">\n<p><img decoding=\"async\" id=\"img-1405713427824\" style=\"float: right;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/do_not_delete.jpg\" alt=\"Do Not Delete\" border=\"0\" hspace=\"10\" \/>Recently I\u2019ve been talking with a number of customers who have similar problems. Their systems generate high volumes of transaction-level data that they are unable to collect and store for analysis. For example, most manufacturers have sensors on their production lines. Every piece of equipment has one or more sensors collecting data about its operation. Information such as tolerances, temperatures and cycle times gives valuable insight into how the process is running. Very often, these sensors take measurements multiple times a minute or even multiple times a second. The data is used to track processes as they happen, but because the volume is so high, it is archived or discarded once it is a week or even just a day old. At that volume, it\u2019s not really feasible to keep 1, 5 or 10 years\u2019 worth of it. But if you had that deep history, imagine how much insight you could gain into your processes. 
With that history, trend analysis could reveal anomalies that go unnoticed over a span of days but stand out over months, pointing to places where you could improve processes and quality.<\/p>\n<p><!--more--><\/p>\n<p>Manufacturing is one example, but many other types of businesses have data that they regularly throw away or ignore, either because they don\u2019t think it\u2019s relevant to business operations or because there is too much of it in a format that is hard to work with. Web server and other software logs usually accumulate until someone decides to delete or archive them. How about survey results?\u00a0Companies run lots of surveys and then act on the outcome. Do you keep the data accessible so that you can combine past results with new ones and\u00a0analyze them over time? Do you cross-compare them with your sales results? How about clicks, tweets and likes? Do you have customer comments or product reviews on your website? Can you correlate them over time, show trends in sentiment toward your products and demonstrate how that sentiment affects sales?<\/p>\n<p>The promise of <a href=\"https:\/\/3cloudsolutions.com\/resources\/big-data-management-challenges-solutions\/\">big data<\/a> is that it will give you access to all of your data, no matter what its volume, source or format. But many companies are saying things like \u201cwe want to look into big data, but we don\u2019t have the time\/resources\/money right now.\u201d OK, so you\u2019re going to do big data in six months. You still don\u2019t have to throw all that valuable data away between now and then. Those log files, process streams and social feeds have value. Don\u2019t just arbitrarily say \u201cwe\u2019ll get to it when we can.\u201d Start collecting that data now so that when you are ready to analyze it, you\u2019ll already have built up a stockpile.<\/p>\n<p>\u201cBut wait!\u201d I hear you saying. \u201cIT told me it\u2019s too expensive to keep all that data around! 
They told me all that disk space costs too much!\u201d IT has a point. Disk space in the data center is limited, and adding more can be costly. One great way to mitigate that is to use cloud storage such as Azure Blob Storage. If you\u2019re new to the concept of <a href=\"http:\/\/en.wikipedia.org\/wiki\/Cloud_computing\">cloud<\/a> beyond the buzzword, it really just means computing services that are hosted by a service provider and made available over the Internet. With cloud-based services, you don\u2019t have to buy any infrastructure or software; you just pay a monthly fee. <a href=\"http:\/\/azure.microsoft.com\/\">Microsoft Azure<\/a> is Microsoft\u2019s umbrella for its cloud-based offerings, one of which is data storage. The term \u201c<a href=\"http:\/\/en.wikipedia.org\/wiki\/Binary_large_object\">Blob<\/a>\u201d (often written BLOB) is commonly read as an acronym for Binary Large Object. It just means a collection of binary data that is stored together. So, put that all together and Azure Blob Storage is Microsoft\u2019s cloud offering for storing large blocks of data. Whew!<\/p>\n<p>Cloud storage has a lot of advantages over storage in the data center:<\/p>\n<ul>\n<li>There\u2019s nothing to buy up front; you just sign up for an account and pay monthly.<\/li>\n<li>You only pay for what you use. You can reserve 100 terabytes if you want, but if you only use one, you only pay for one.<\/li>\n<li>Data is stored redundantly, either as multiple copies in the same data center or as copies in data centers around the world. 
You get to choose how much redundancy you want, and you don\u2019t have to manage those copies yourself.<\/li>\n<li>You can shut it down at any time if you decide you don\u2019t need it, and you\u2019re not out any investment beyond what you\u2019ve already paid each month.<\/li>\n<li>IT won\u2019t keep asking you when you\u2019re going to delete all those old files.<\/li>\n<\/ul>\n<p>Using cloud storage is a bit different from using folders on your hard drive or network, but not too different. There are a number of tools available that make it as easy as dragging and dropping files on your PC. One that is free and easy to use is <a href=\"http:\/\/blogs.msdn.com\/b\/windowsazurestorage\/archive\/2010\/04\/17\/windows-azure-storage-explorers.aspx\">CloudBerry Explorer<\/a>. There are also ways to automate the transfer of files into cloud storage that you may want to explore.<\/p>\n<p>Don\u2019t worry about the file formats; just start saving the files and let them pile up. Name them so that you can keep track of what\u2019s in each one. If they are log files, they\u2019ll probably be dated already. Otherwise, make sure to include the source and date in the file name, for example \u201cProductSurveyResults-20140718.csv\u201d. This collection becomes the beginning of your <a title=\"Data Lake\" href=\"http:\/\/en.wiktionary.org\/wiki\/data_lake\" target=\"_self\" rel=\"noopener\">Data Lake<\/a>.<\/p>\n<p>Once you have accumulated some data and you\u2019re getting ready to use it, the really great news is that the analysis can all be done from the cloud too!<\/p>\n<ul>\n<li>At a small scale, Microsoft\u2019s Power Query can read data directly out of Blob storage into Excel, where you can build data models with Power Pivot and visualizations with Power View and Power Map. All of these tools fall under the heading of Power BI.<\/li>\n<li>If you\u2019re ready for a big data solution, Hadoop is the way to go. Microsoft offers its HDInsight Hadoop distribution in Azure. 
HDInsight is built on Hortonworks\u2019 Hadoop distribution and optimized for the cloud. It is easy to set up and can read directly from Blob storage.<\/li>\n<li>If a larger Hadoop cluster is what you\u2019re after, it is possible to create a full Hadoop infrastructure in the cloud using Azure Virtual Machines and Blob storage.<\/li>\n<\/ul>\n<p>It\u2019s time to start thinking about big data. If you\u2019re not, your competitors are. If you\u2019re not ready to get started, at least stop throwing all of that valuable data away!<\/p>\n<p>If you have questions about big data, Microsoft Azure\u00a0or\u00a0Hadoop, or would like to have a conversation about your data analytics, please <a href=\"\/get-started\/\">contact us<\/a>.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Want to get started with Big Data? Stop throwing data away! Cloud storage offers the ability to store unlimited data affordably.<\/p>\n","protected":false},"author":21,"featured_media":15017,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[297],"tags":[304],"class_list":["post-16104","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-platform","tag-modern-data-platform","topics-blog"],"acf":[],"_links":{"self":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/16104","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/comments?post=16104"}],"version-history":[{"count":0,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/16104\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media\/15017"}],"wp:attachment":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media?parent=16104"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/categories?post=16104"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/tags?post=16104"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}