{"id":15760,"date":"2020-02-03T20:15:26","date_gmt":"2020-02-04T04:15:26","guid":{"rendered":"https:\/\/devwww.3cloudsolutions.com\/post\/recap-of-rstudioconf2020-for-data-science-and-machine-learning-3\/"},"modified":"2024-06-18T08:58:07","modified_gmt":"2024-06-18T15:58:07","slug":"recap-of-rstudioconf2020-for-data-science-and-machine-learning","status":"publish","type":"post","link":"https:\/\/3cloudsolutions.com\/resources\/recap-of-rstudioconf2020-for-data-science-and-machine-learning\/","title":{"rendered":"Recap of rstudio::conf(2020) for Data Science and Machine Learning"},"content":{"rendered":"<p>Hosted in beautiful San Francisco from January 27th to the 30th, rstudio::conf(2020) kicked off with two days of training followed by two days of jam-packed sessions on everything R. Every year, this conference grows and welcomes more statisticians, data scientists, researchers, viz experts, and others. This year, attendance jumped to an impressive 2,242 people! In this post, I want to recap some common themes that I heard at the event, along with some cool packages I learned about along the way.<\/p>\n<p><img decoding=\"async\" style=\"width: 848px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/99BFF544-7EF5-4A63-91A7-F7E7A3C5EEE8_2-1.jpg\" alt=\"99BFF544-7EF5-4A63-91A7-F7E7A3C5EEE8_2-1\" width=\"848\" \/><\/p>\n<p><!--more--><\/p>\n<h2>RStudio is now a B Corporation<\/h2>\n<p>At the keynote on day 1, RStudio CEO J. J. Allaire announced that RStudio, Inc. is now RStudio, PBC, a Public Benefit Corporation. This new structure allows the company to focus on open source software development and to put its mission and other stakeholders on equal footing with shareholders. 
It will be interesting to see how this change lets RStudio balance building professional products (like RStudio Server and RStudio Connect, which it sells to companies) with continuing to make amazing open source products that benefit us all. To me, the most interesting part is the open annual reporting on how RStudio&#8217;s contributions benefit everyone, which can be seen at <a href=\"https:\/\/bcorporation.net\/directory\/rstudio\" target=\"_blank\" rel=\"noopener\">bcorporation.net\/directory\/rstudio<\/a>.<\/p>\n<p style=\"text-align: right;\">Read more about their change to a B Corp <a href=\"https:\/\/blog.rstudio.com\/2020\/01\/29\/rstudio-pbc\/\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/p>\n<h2>Cool Packages<\/h2>\n<p>In many of the sessions, speakers talked about the innovative packages they have developed for a variety of purposes. Here are just a few that stood out to me as either really useful or just plain fun&#8230;<\/p>\n<table style=\"width: 100%; margin-left: auto; margin-right: auto; border-color: #99acc2; border-style: none; border-collapse: collapse; table-layout: fixed; height: 324px;\" border=\"0\" cellpadding=\"4\">\n<tbody>\n<tr style=\"height: 36px;\">\n<td style=\"width: 33.3333%; height: 36px; vertical-align: top;\"><span style=\"font-family: 'courier new', courier;\"><a href=\"https:\/\/github.com\/r-lib\/vctrs\" target=\"_blank\" rel=\"noopener\">vctrs<\/a><\/span><\/td>\n<td style=\"width: 33.3333%; height: 36px; vertical-align: top;\">Provides size and type stability for vectors and avoids undesirable behavior when mixing different S3 types.<\/td>\n<td style=\"width: 33.3333%; height: 36px; text-align: center;\"><img decoding=\"async\" style=\"width: 173px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/vctrs.png\" alt=\"vctrs\" width=\"173\" \/><\/td>\n<\/tr>\n<tr style=\"height: 36px;\">\n<td style=\"width: 33.3333%; height: 36px; vertical-align: top;\"><span style=\"font-family: 
'courier new', courier;\"><a href=\"https:\/\/github.com\/vegawidget\/vlbuildr\" target=\"_blank\" rel=\"noopener\">vlbuildr<\/a><\/span><\/td>\n<td style=\"width: 33.3333%; height: 36px; vertical-align: top;\">Create interactive Vega-Lite graphics in R.<\/td>\n<td style=\"width: 33.3333%; height: 36px;\"><img decoding=\"async\" style=\"width: 174px; display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/vlbuildr_hex_big-1.png\" alt=\"vlbuildr_hex_big-1\" width=\"174\" \/><\/td>\n<\/tr>\n<tr style=\"height: 36px;\">\n<td style=\"width: 33.3333%; height: 36px; vertical-align: top;\"><span style=\"font-family: 'courier new', courier;\"><a href=\"https:\/\/github.com\/tidymodels\/tidymodels\" target=\"_blank\" rel=\"noopener\">tidymodels<\/a><\/span><\/td>\n<td style=\"width: 33.3333%; height: 36px; vertical-align: top;\"><p>A collection of packages for modeling and statistical analysis that share the grammar and data structures of the tidyverse.<\/p>\n<p>Includes <span style=\"font-family: 'courier new', courier;\"><a href=\"https:\/\/tidymodels.github.io\/recipes\/\" target=\"_blank\" rel=\"noopener\">recipes<\/a><\/span>, <span style=\"font-family: 'courier new', courier;\"><a href=\"https:\/\/tidymodels.github.io\/parsnip\/\" target=\"_blank\" rel=\"noopener\">parsnip<\/a><\/span>, and <span style=\"font-family: 'courier new', courier;\"><a href=\"https:\/\/tidymodels.github.io\/tune\/\" target=\"_blank\" rel=\"noopener\">tune<\/a><\/span>, which are awesome packages for data preparation, ML modeling, and tuning, respectively.<\/p><\/td>\n<td style=\"width: 33.3333%; height: 36px; text-align: center;\"><img decoding=\"async\" style=\"width: 200px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/tidymodels.png\" alt=\"tidymodels\" width=\"200\" \/><\/td>\n<\/tr>\n<tr style=\"height: 36px;\">\n<td style=\"width: 33.3333%; height: 36px; vertical-align: 
top;\"><span style=\"font-family: 'courier new', courier;\"><a href=\"https:\/\/github.com\/ryantimpe\/brickr\" rel=\" noopener\">brickr<\/a><\/span><\/td>\n<td style=\"width: 33.3333%; height: 36px; vertical-align: top;\">Create 2D and 3D LEGO\u00ae art and generate step-by-step instructions for building your brick art in the real world.<\/td>\n<td style=\"width: 33.3333%; height: 36px;\"><img decoding=\"async\" style=\"width: 174px; display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/brickr.png\" alt=\"brickr\" width=\"174\" \/><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Asynchronous Programming<\/h2>\n<p>Beyond the packages listed above, I attended quite a few talks on scalability in R. Out of the box, R is a serial scripting language. Thanks to the hard work of some developers, we can now use packages that allow R to complete tasks asynchronously. In other words, we can scale our code by running multiple tasks at once. Previously, this was done using the <span style=\"font-family: 'courier new', courier;\">foreach<\/span> package along with a backend package like <span style=\"font-family: 'courier new', courier;\">doParallel<\/span> to parallelize iterations of a task.<\/p>\n<p>Henrik Bengtsson&#8217;s package, called <span style=\"font-family: 'courier new', courier;\"><a href=\"https:\/\/github.com\/HenrikBengtsson\/future\" target=\"_blank\" rel=\"noopener\">future<\/a><\/span>, provides a very simple and uniform way of evaluating R expressions asynchronously on whatever resources the user has available. 
This is accomplished by assigning with the package&#8217;s <span style=\"font-family: 'courier new', courier; background-color: #cccccc;\">%&lt;-%<\/span> operator and giving the package a &#8220;plan&#8221; for how to execute.<\/p>\n<table style=\"width: 500px; margin-left: auto; margin-right: auto; border-color: #99acc2; border-style: none; border-collapse: collapse; table-layout: fixed;\" border=\"0\" cellpadding=\"4\">\n<tbody>\n<tr>\n<td style=\"width: 100%;\">\n<pre class=\"hljs\" style=\"display: block; overflow-x: auto; padding: 0.5em; background: #f0f0f0 none repeat scroll 0% 0%; color: #444444;\"><span class=\"hljs-keyword\" style=\"font-weight: bold;\">library<\/span>(future)\r\n\r\nplan(sequential)\r\n<span class=\"hljs-comment\" style=\"color: #888888;\"># or plan(multisession), plan(cluster), etc.<\/span>\r\ny %&lt;-% {\r\n     x &lt;- <span class=\"hljs-number\" style=\"color: #880000;\">2<\/span>\r\n     <span class=\"hljs-number\" style=\"color: #880000;\">2<\/span> * x\r\n}<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>A common complaint about distributing code is that it&#8217;s difficult to tell how much work has been completed while you&#8217;re waiting. 
Bengtsson&#8217;s solution is the <span style=\"font-family: 'courier new', courier;\"><a href=\"https:\/\/github.com\/HenrikBengtsson\/progressr\" target=\"_blank\" rel=\"noopener\">progressr<\/a><\/span> package, which reports the overall progress of a task back to the user, even when the task is distributed across CPU cores or machines in a cluster.<\/p>\n<table style=\"width: 500px; margin-left: auto; margin-right: auto; border-color: #99acc2; border-style: none; border-collapse: collapse; table-layout: fixed;\" border=\"0\" cellpadding=\"4\">\n<tbody>\n<tr>\n<td style=\"width: 100%;\">\n<pre class=\"hljs\" style=\"display: block; overflow-x: auto; padding: 0.5em; background: #f0f0f0 none repeat scroll 0% 0%; color: #444444;\"><span class=\"hljs-keyword\" style=\"font-weight: bold;\">library<\/span>(progressr)\r\n\r\nslow_sum &lt;- <span class=\"hljs-keyword\" style=\"font-weight: bold;\">function<\/span>(x) {\r\n  p &lt;- progressr::progressor(along = x)\r\n  sum &lt;- <span class=\"hljs-number\" style=\"color: #880000;\">0<\/span>\r\n  <span class=\"hljs-keyword\" style=\"font-weight: bold;\">for<\/span> (kk <span class=\"hljs-keyword\" style=\"font-weight: bold;\">in<\/span> seq_along(x)) {\r\n    Sys.sleep(<span class=\"hljs-number\" style=\"color: #880000;\">0.1<\/span>)\r\n    sum &lt;- sum + x[kk]\r\n    p(message = sprintf(<span class=\"hljs-string\" style=\"color: #880000;\">\"Added %g\"<\/span>, x[kk]))\r\n  }\r\n  sum\r\n}\r\n\r\nwith_progress(y &lt;- slow_sum(<span class=\"hljs-number\" style=\"color: #880000;\">1<\/span>:<span class=\"hljs-number\" style=\"color: #880000;\">10<\/span>))\r\n[<span class=\"hljs-number\" style=\"color: #880000;\">1<\/span>] |===================== | <span class=\"hljs-number\" style=\"color: #880000;\">40<\/span>%<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Lots of Shiny<\/h2>\n<p>For those of you who don&#8217;t know, Shiny is a web development framework for R. 
It lets R programmers create responsive web applications easily using R and all their favorite packages.<\/p>\n<p>Traditionally, Shiny works by rendering a Bootstrap page complete with UI elements, graphics, and the works. If you&#8217;ve ever created a Shiny app before, you&#8217;ve probably used the <span style=\"font-family: 'courier new', courier;\"><a href=\"https:\/\/rstudio.github.io\/shinythemes\/\" target=\"_blank\" rel=\"noopener\">shinythemes<\/a><\/span> package to quickly give your app some color and style.<\/p>\n<p><img decoding=\"async\" style=\"width: 600px; display: block; margin: 0px auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/united_shiny.png\" alt=\"united_shiny\" width=\"600\" \/>However, if you wanted to completely change all the styles of a Shiny app (for example, to match your company&#8217;s brand standards), you had to write custom CSS. Now, thanks to the <span style=\"font-family: 'courier new', courier;\"><a href=\"https:\/\/github.com\/rstudio\/bootstraplib\/\" target=\"_blank\" rel=\"noopener\">bootstraplib<\/a><\/span> package, you can set global style variables in your R code to fine-tune your app&#8217;s look. There&#8217;s even a theme customizer for interactively trying out a new look.<\/p>\n<p><img decoding=\"async\" style=\"width: 600px; display: block; margin: 0px auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/68747470733a2f2f692e696d6775722e636f6d2f696c366e64384a2e676966.gif\" alt=\"68747470733a2f2f692e696d6775722e636f6d2f696c366e64384a2e676966\" width=\"600\" \/><\/p>\n<p>Lastly, developers often create Shiny apps to provide an easy-to-use interface into a particular type of analysis, data, or visualization.<\/p>\n<p>Let&#8217;s take my application as an example: <a href=\"https:\/\/strainhub.io\/\" target=\"_blank\" rel=\"noopener\">StrainHub<\/a>. 
In a nutshell, StrainHub is a phylogenetic tool built in Shiny that allows researchers to build transmission networks from metadata. In the screenshot below, a transmission network is generated using a dataset of Hepatitis C isolates (specifically, a phylogenetic tree plus accompanying country metadata).<\/p>\n<p><img decoding=\"async\" style=\"width: 600px; display: block; margin: 0px auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/Strainhub.png\" alt=\"Strainhub\" width=\"600\" \/>While this application is a great interactive tool for epidemiologists or public health researchers who don&#8217;t want to use R to generate the visuals, Shiny apps can be a bit ephemeral: reproducing this output (for research transparency and publication purposes) requires that the user have the exact same data and analyze it with the same version of StrainHub in the future to guarantee the same result. This poses a problem if there are code changes down the line.<\/p>\n<p>Enter <span style=\"font-family: 'courier new', courier;\">shinymeta<\/span>! The <span style=\"font-family: 'courier new', courier;\"><a href=\"https:\/\/github.com\/rstudio\/shinymeta\" target=\"_blank\" rel=\"noopener\">shinymeta<\/a><\/span> package provides tools for auto-generating R code that captures the logic from your Shiny app. In other words, a user can now interact with your app and then generate a bundle containing the code and output behind the results. This is perfect for researchers who want to publish results produced with someone else&#8217;s app, as it lets them create reproducible artifacts from their analysis in Shiny. As a researcher myself, this was one of the coolest things I saw at the conference!<\/p>\n<h2>R + Microsoft = \u2764<\/h2>\n<p>In previous years, you may have noticed Microsoft&#8217;s focus on the use of Python in Azure Machine Learning Service (AMLS). However, as of November 2019, Microsoft has released an R SDK for AMLS. 
For more information about the R AMLS SDK, see: <a href=\"https:\/\/azure.github.io\/azureml-sdk-for-r\/\">https:\/\/azure.github.io\/azureml-sdk-for-r\/<\/a>.<\/p>\n<p>Also, if you&#8217;re an Azure Databricks aficionado, you already know you can take advantage of the R API for Spark (SparkR) or you can use RStudio&#8217;s <span style=\"font-family: 'courier new', courier;\"><a href=\"https:\/\/spark.rstudio.com\/\" target=\"_blank\" rel=\"noopener\">sparklyr<\/a><\/span> package for a more <span style=\"font-family: 'courier new', courier;\">dplyr<\/span>-like experience. For more information about R in Azure Databricks, click <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/databricks\/spark\/latest\/sparkr\/?toc=https%3A%2F%2Fdocs.microsoft.com%2Fen-us%2Fazure%2Fazure-databricks%2FTOC.json&amp;bc=https%3A%2F%2Fdocs.microsoft.com%2Fen-us%2Fazure%2Fbread%2Ftoc.json\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/p>\n<h2>Resources<\/h2>\n<p>You can find the recordings of all the conference sessions here: <a href=\"https:\/\/resources.rstudio.com\/rstudio-conf-2020\">https:\/\/resources.rstudio.com\/rstudio-conf-2020<\/a><\/p>\n<p>&#8230;and Emil Hvitfeldt has curated a list of the slide links, etc. 
here: <a href=\"https:\/\/github.com\/EmilHvitfeldt\/RStudioConf2020Slides\">https:\/\/github.com\/EmilHvitfeldt\/RStudioConf2020Slides<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recap of rstudio::conf(2020) conference with highlights of the top announcements, new packages, and the Shiny development framework.<\/p>\n","protected":false},"author":21,"featured_media":13087,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[260],"tags":[319],"class_list":["post-15760","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai","tag-machine-learning-ai","topics-blog"],"acf":[],"_links":{"self":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/15760","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/comments?post=15760"}],"version-history":[{"count":0,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/15760\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media\/13087"}],"wp:attachment":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media?parent=15760"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/categories?post=15760"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/tags?post=15760"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}