{"id":15819,"date":"2019-02-23T15:37:00","date_gmt":"2019-02-23T23:37:00","guid":{"rendered":"https:\/\/devwww.3cloudsolutions.com\/post\/lambda-architecture-in-azure-2\/"},"modified":"2024-01-08T15:18:07","modified_gmt":"2024-01-08T23:18:07","slug":"lambda-architecture-in-azure","status":"publish","type":"post","link":"https:\/\/3cloudsolutions.com\/resources\/lambda-architecture-in-azure\/","title":{"rendered":"Lambda Architecture in Azure"},"content":{"rendered":"<p>Lambda architecture is the state-of-the-industry, Big Data workload pattern for handling batch and streaming workloads in a single system. If you\u2019re researching how to modernize your data program, the lambda architecture is the place to start.<\/p>\n<p>Let\u2019s review the key concepts, parse through the tooling options in <a href=\"https:\/\/azure.microsoft.com\/en-us\/\" target=\"_blank\" rel=\"noopener\">Microsoft Azure<\/a>, examine some sample reference architectures, and discuss common criticisms of lambda. In a follow-up post, we\u2019ll introduce the emerging kappa architecture and compare its benefits and limitations against lambda.<\/p>\n<p><!--more--><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" style=\"display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/Lambda-architechture-1.png\" alt=\"Lambda architecture\" width=\"805\" height=\"509\" \/><\/p>\n<h2>Lambda Architecture Overview<\/h2>\n<p>The key components of the lambda architecture are the <strong>hot<\/strong> and <strong>cold<\/strong> data processing paths, and a common serving layer that combines outputs from both paths. The <strong>hot path<\/strong> refers to streaming data workloads and the <strong>cold path<\/strong> applies to batch-processed data. 
The goal of the architecture is to present a holistic view of an organization\u2019s data, both from history and in near real-time, within a combined serving layer, as the following <a href=\"https:\/\/blogs.msdn.microsoft.com\/uk_faculty_connection\/2017\/02\/24\/big-data-on-azure-with-no-limits-data-analytics-and-managed-clusters\/\" target=\"_blank\" rel=\"noopener\">Microsoft<\/a> visual illustrates.<\/p>\n<table style=\"height: 393px; margin-left: auto; margin-right: auto;\" width=\"758\">\n<tbody>\n<tr>\n<td style=\"width: 752px;\"><span style=\"font-size: 9px;\"><a href=\"https:\/\/blogs.msdn.microsoft.com\/uk_faculty_connection\/2017\/02\/24\/big-data-on-azure-with-no-limits-data-analytics-and-managed-clusters\/\" target=\"_blank\" rel=\"noopener\" data-mce-target=\"_blank\"><img loading=\"lazy\" decoding=\"async\" style=\"display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/Hot-Cold-Path-1.png\" alt=\"Lambda Architecture Hot Cold Path\" width=\"744\" height=\"319\" \/><\/a><\/span><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 752px;\"><span style=\"font-size: 9px;\">Lee Stott, Microsoft UK,<em>\u00a0Big Data on Azure with No Limits Data, Analytics and Managed Clusters.<\/em>\u00a0Retrieved from \u00a0<a href=\"https:\/\/blogs.msdn.microsoft.com\/uk_faculty_connection\/2017\/02\/24\/big-data-on-azure-with-no-limits-data-analytics-and-managed-clusters\/\" target=\"_blank\" rel=\"noopener\">https:\/\/blogs.msdn.microsoft.com\/uk_faculty_connection\/2017\/02\/24\/big-data-on-azure-with-no-limits-data-analytics-and-managed-clusters<\/a><\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2><span style=\"background-color: transparent;\">Tooling for Lambda\u00a0<\/span><span style=\"background-color: transparent;\">Architecture\u00a0in Azure<\/span><\/h2>\n<p><span style=\"background-color: transparent;\">You can run any tooling you want in Azure as Infrastructure as a Service (IaaS), but the value-add 
in cloud platforms for Big Data is in the Platform as a Service (PaaS) and Software as a Service (SaaS) tiers. The non-IaaS options in Azure are the fully managed, native Azure services; <\/span><a style=\"background-color: transparent;\" href=\"https:\/\/azure.microsoft.com\/en-us\/services\/hdinsight\/\" target=\"_blank\" rel=\"noopener\">HDInsight<\/a><span style=\"background-color: transparent;\">; or the new <\/span><a style=\"background-color: transparent;\" href=\"https:\/\/www.blue-granite.com\/blog\/microsoft-azure-databricks-cloud-scale-spark-power\" target=\"_blank\" rel=\"noopener\">Azure Databricks offering<\/a><span style=\"background-color: transparent;\"> (currently in preview).<\/span><\/p>\n<h3>Reference Architecture using Native Azure-Managed Services<\/h3>\n<p>Implementing native, Azure-managed services for lambda narrows the list of services you have to choose from. This may or may not be a good thing. The exciting and frustrating pace of change in distributed systems means there\u2019s always something new on the horizon, so if you go all in with native Azure-managed services, you tie your capabilities to the Azure product team\u2019s release cycle. This isn\u2019t a concern most of the time; however, there might be edge cases where managed services just can\u2019t meet your organization\u2019s needs. For example, when considering <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/stream-analytics\/\" target=\"_blank\" rel=\"noopener\">Stream Analytics<\/a>: is your reference data for streaming greater than 100 MB? Do you need throughput of more than 1 GB\/sec? 
Is a maximum time to live (TTL) of 7 days sufficient?<\/p>\n<p>Unless your requirements already exceed a service\u2019s current capabilities (or soon will), the managed service architecture is the best place to start.<\/p>\n<p><img decoding=\"async\" style=\"width: 823px; display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/AMS-Lambda-1.jpg\" alt=\"AMS Lambda Architecture\" width=\"823\" \/><\/p>\n<p>There are lots of options for augmenting, substituting, and extending this architecture. The goal here is to give a baseline of what you would probably need in your ecosystem to support lambda if you prefer native Azure services.<\/p>\n<h3>Reference Architecture for HDInsight<\/h3>\n<p>Below is an implementation with preference for <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/hdinsight\/\">HDInsight services<\/a>:<\/p>\n<p><img decoding=\"async\" style=\"width: 823px; display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/HDInsight-Lambda-1.jpg\" alt=\"HDInsight Lambda Architecture\" width=\"823\" \/><\/p>\n<p>You\u2019ll notice that we\u2019ve listed <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/data-factory\/\">Azure Data Factory<\/a> (ADF) as the ingest engine for batch. The data movement, lineage, monitoring, and orchestration capabilities of ADF are extremely difficult to substitute for in the Azure cloud. Running HDInsight jobs as Data Factory linked services even handles the spin-up and teardown of clusters for you automatically. 
So, while you have plenty of options for analytics, machine learning, querying, and compute services, if you\u2019re considering any type of Big Data workload in Azure, planning on ADF as part of your architecture will simplify and accelerate your development cycle.<\/p>\n<h2>Limitations of Lambda<\/h2>\n<h3>Extending for concurrency and frequency<\/h3>\n<p>An assumption that you often see with lambda, as modeled above, is low concurrency and\/or frequency, specifically in the cold path. This does not fit many large organizations\u2019 internal needs, or even those of small organizations offering reporting and analytics to their end customers. In practice, the serving layer is usually extended to include a hub-and-spoke architecture that incorporates a structured data mart to support the most commonly queried data (either by partition or entity). I don\u2019t know of a common name for this, but I like to think of it as the <strong>warm path<\/strong>. We purposely prioritize some batch-processed data by placing it in services that support higher concurrency at a lower cost. Over time, the partitions are aged out of the warm path but persist within the cold query path. This augmentation to lambda also helps simplify multi-tenancy, self-service BI, and embedded analytics use cases.<\/p>\n<p><img decoding=\"async\" style=\"width: 823px; display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/AMS-Lambda-hub-and-spoke-1.jpg\" alt=\"AMS Lambda Architecture hub and spoke\" width=\"823\" \/><\/p>\n<h2>Common Criticisms<\/h2>\n<p>Lambda is an organic result of the limitations of existing tools. Distributed systems architects and developers commonly criticize its complexity \u2013 and rightly so. Those of us who have worked extensively in Extract-Transform-Load (ETL) and symmetric multiprocessing systems see red flags when code is replicated in multiple services. 
Ensuring data quality and code conformity across multiple systems, whether massively parallel processing (MPP) or symmetric multiprocessing (SMP), follows the same best practice: <strong>the fewest times you can reproduce code is always the correct number of times.<\/strong><\/p>\n<p>We reproduce code in lambda because different services in MPP systems are better at different tasks. The maturity of tools historically hasn\u2019t allowed us to process streams and batch in a single tool. This is starting to change, with <a href=\"https:\/\/spark.apache.org\/\">Apache Spark<\/a> emerging as a single preferred compute service for stream and batch querying, hence the timing of <a href=\"https:\/\/databricks.com\/product\/azure\">Azure Databricks<\/a>. However, on the storage side, the data lake, which was meant to be an immutable store, can in practice become the dreaded swamp when governance or testing fails, which is not uncommon. A fundamentally different assumption about how we process data is required to combat this degradation. Enter: the kappa architecture, which we\u2019ll examine in the next post of this series.<\/p>\n<p>For those looking to delve further into the lambda architecture, there are several highly detailed resources available. Check out <a href=\"https:\/\/www.blue-granite.com\/blog\/lambda-architecture-low-latency-data-in-a-batch-processing-world\">this blog post by BlueGranite\u2019s Josh Fennessy<\/a> for starters.<\/p>\n<p>Want to discover the best way to handle your organization\u2019s data? We design our custom cloud analytics solutions around your business. <a href=\"https:\/\/www.blue-granite.com\/contact-us\">Contact<\/a> BlueGranite today to learn more.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Lambda architecture is the state-of-the-industry, Big Data workload pattern for handling batch and streaming workloads in a single system. 
If you want to learn how to modernize your data program, the lambda architecture is the place to start.<\/p>\n","protected":false},"author":21,"featured_media":14095,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[260],"tags":[304],"class_list":["post-15819","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai","tag-modern-data-platform","topics-blog"],"acf":[],"_links":{"self":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/15819","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/comments?post=15819"}],"version-history":[{"count":0,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/15819\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media\/14095"}],"wp:attachment":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media?parent=15819"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/categories?post=15819"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/tags?post=15819"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}