{"id":15667,"date":"2021-08-24T13:30:00","date_gmt":"2021-08-24T20:30:00","guid":{"rendered":"https:\/\/devwww.3cloudsolutions.com\/post\/azure-purview-series-part-3-scanning-and-classification-3\/"},"modified":"2023-08-15T15:00:30","modified_gmt":"2023-08-15T22:00:30","slug":"azure-purview-series-part-scanning-and-classification","status":"publish","type":"post","link":"https:\/\/3cloudsolutions.com\/resources\/azure-purview-series-part-scanning-and-classification\/","title":{"rendered":"Azure Purview Series \u2013 Part 3: Scanning and Classification"},"content":{"rendered":"<p style=\"margin-left: .5in !msorm;\">In our previous article for this series, <span style=\"color: #007cba;\"><a style=\"color: #007cba;\" href=\"\/resources\/azure-purview-series-part-2-data-catalog\" target=\"_blank\" rel=\"noopener\">Purview Part 2: Data Catalog<\/a><\/span>, we examined the portion of the end user experience where people will spend the majority of their time. But the question is, how does that Data Catalog get populated? The Data Catalog is populated by the Scanning and Classification features of Purview, which is the focus of this article.<\/p>\n<p><img decoding=\"async\" style=\"width: 1000px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/10\/iStock-1172252198-1.jpg\" alt=\"iStock-1172252198-1\" width=\"1000\" \/><\/p>\n<p>There are some prerequisites that need to be mentioned before starting with the Scanning and Classification features. 
You will need an <span style=\"color: #007cba;\"><a style=\"color: #007cba;\" href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/cost-management-billing\/manage\/switch-azure-offer\" target=\"_blank\" rel=\"noopener\">Azure subscription<\/a><\/span>, an <span style=\"color: #007cba;\"><a style=\"color: #007cba;\" href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/purview\/create-catalog-portal\" target=\"_blank\" rel=\"noopener\">Azure Purview account<\/a><\/span>, <span style=\"color: #007cba;\"><a style=\"color: #007cba;\" href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/key-vault\/general\/basic-concepts\" target=\"_blank\" rel=\"noopener\">Azure Key Vault<\/a><\/span> (for managing data source credentials), and the appropriate <span style=\"color: #007cba;\"><a style=\"color: #007cba;\" href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/purview\/catalog-permissions\" target=\"_blank\" rel=\"noopener\">Azure Purview roles<\/a><\/span>.<\/p>\n<h2 style=\"font-size: 45px;\"><span style=\"color: #007cba;\">Prerequisites<\/span><\/h2>\n<p style=\"font-weight: bold;\"><span style=\"color: #000000;\">Azure Subscription<br \/>\n<\/span><span style=\"font-weight: normal;\">Within your Azure Subscription, you will need administrative access permissions and the ability to create resources. The administrative access is required because you will have to register some <span style=\"color: #007cba;\"><a style=\"color: #007cba;\" href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/azure-resource-manager\/management\/resource-providers-and-types\" target=\"_blank\" rel=\"noopener\">Resource Providers<\/a><\/span><\/span><span style=\"font-weight: normal;\"> if they do not already exist. 
Those resource providers are:<\/span><\/p>\n<ul>\n<li>Microsoft.Purview<\/li>\n<li>Microsoft.Storage<\/li>\n<li>Microsoft.EventHub<\/li>\n<\/ul>\n<p><span style=\"color: #000000;\"><span style=\"font-weight: bold;\">Azure Purview Account<br \/>\n<\/span><\/span>Once your Azure subscription has been configured, you will need a Purview account. While you can have multiple Purview accounts (three max per tenant), you can add only one Purview account at a time. Part of creating the Purview account is selecting its location (Azure region). Your location will depend on your situation, but usually you want the region closest to where your data resides or, if you are primarily an on-premises organization, closest to your users. All Purview accounts are created with a default Data Map size of 1 capacity unit (CU), where 1 CU supports up to 25 data map operations per second and includes up to 2 GB of storage for your metadata. The data map is elastic, which means it will automatically scale with the request load up to a maximum of 100 CUs. To control costs, scaling is capped by default at 10 times the steady-state capacity. For more detailed information about the cost and <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/purview\/concept-elastic-data-map\">Elastic <span style=\"color: #007cba;\">Data<\/span> Map<\/a>, see the <a href=\"https:\/\/azure.microsoft.com\/en-us\/pricing\/details\/azure-purview\/\" target=\"_blank\" rel=\"noopener\"><span style=\"color: #007cba;\">Purview<\/span> Pricing page<\/a> and the <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/purview\/how-to-manage-quotas\" target=\"_blank\" rel=\"noopener\"><span style=\"color: #007cba;\">quotas for resources<\/span><\/a> from Microsoft. 
Azure Purview accounts can be created using the Azure portal interface via your browser, but if you prefer to do it programmatically, they can also be created using <span style=\"color: #007cba;\"><a style=\"color: #007cba;\" href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/purview\/create-catalog-powershell?tabs=azure-powershell\" target=\"_blank\" rel=\"noopener\">Azure PowerShell<\/a><\/span>, the <span style=\"color: #007cba;\"><a style=\"color: #007cba;\" href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/purview\/create-purview-dotnet\" target=\"_blank\" rel=\"noopener\">.NET SDK<\/a><\/span>, or <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/purview\/create-purview-python\" target=\"_blank\" rel=\"noopener\"><span style=\"color: #007cba;\">Python<\/span><\/a>.<\/p>\n<p><span style=\"color: #000000;\"><span style=\"font-weight: bold;\">Azure Key Vault<br \/>\n<\/span><\/span><a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/key-vault\/general\/\"><span style=\"color: #007cba;\">Azure Key Vault<\/span><\/a> is Microsoft\u2019s cloud service for securely storing keys, secrets, and certificates. Purview uses Azure Key Vault to securely store your data source credentials. There are currently three supported authentication methods that use Azure Key Vault: Account Key, SQL Authentication, and Service Principal. You can also use <span style=\"color: #007cba;\"><a style=\"color: #007cba;\" href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/purview\/manage-credentials#use-purview-managed-identity-to-set-up-scans\" target=\"_blank\" rel=\"noopener\">Azure Purview Managed Identity<\/a><\/span>, which does not require creating credentials in Azure Key Vault. Your authentication method will be determined by data source type and networking requirements. 
Microsoft wrote a great article on <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/purview\/manage-credentials\">Credentials for source <span style=\"color: #007cba;\">Authentication<\/span> in Azure Purview<\/a> to help you decide what authentication method to use.<\/p>\n<p><span style=\"color: #000000;\"><span style=\"font-weight: bold;\">Azure Purview Roles<br \/>\n<\/span><\/span><span style=\"color: #007cba;\"><a style=\"color: #007cba;\" href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/purview\/catalog-permissions\" target=\"_blank\" rel=\"noopener\">Azure Purview Roles <\/a><\/span> determine who can do what in Purview. In order to scan your data sources, one or more security principals need to be added to one of the predefined Data Plane roles: Purview Data Reader, Purview Data Curator, or Purview Data Source Administrator. Azure Purview roles support individual users, Azure Active Directory Groups, and Service Principals. By default, the creator of the Azure Purview Account will be treated as if they are in both the Purview Data Curator and Purview Data Source Administrator roles.<\/p>\n<div style=\"overflow-x: auto; max-width: 100%; width: 60%; margin-left: auto; margin-right: auto;\" data-hs-responsive-table=\"true\">\n<table style=\"width: 100%; border-collapse: collapse; table-layout: fixed; border: 1px solid #99acc2; height: 437.433px;\">\n<tbody>\n<tr style=\"height: 38.0333px;\">\n<td style=\"width: 43.8749%; padding: 4px; height: 38px; vertical-align: middle; background-color: #e6e7e8; border: 1px solid #007cba;\"><strong><span style=\"color: #4c4c51;\">Role<\/span><\/strong><\/td>\n<td style=\"width: 81.0513%; padding: 4px; height: 38px; vertical-align: middle; background-color: #e6e7e8; border: 1px solid #007cba;\"><strong>Activities<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 63.0667px;\">\n<td style=\"width: 43.8749%; padding: 4px; height: 63px; vertical-align: middle; background-color: #ffffff; border: 1px solid 
#007cba;\"><strong><span style=\"color: #4c4c51;\">Purview<br \/>\nData Reader<\/span><\/strong><\/td>\n<td style=\"width: 81.0513%; padding: 4px; height: 63px; vertical-align: middle; background-color: #ffffff; border: 1px solid #007cba;\">Access to Purview Portal<br \/>\nRead all content except scan bindings<\/td>\n<\/tr>\n<tr style=\"height: 225.233px;\">\n<td style=\"width: 43.8749%; padding: 4px; height: 225px; vertical-align: middle; background-color: #ffffff; border: 1px solid #007cba;\"><strong><span style=\"color: #4c4c51;\">Purview<br \/>\nData Curator<br \/>\n<\/span><\/strong><\/td>\n<td style=\"width: 81.0513%; padding: 4px; height: 225px; vertical-align: middle; background-color: #ffffff; border: 1px solid #007cba;\">\n<p style=\"font-size: 14px;\"><span style=\"color: #4c4c51; font-size: 17px;\">Access to Purview Portal<br \/>\n<\/span><span style=\"color: #4c4c51; font-size: 17px;\">Read all content except scan bindings<br \/>\n<\/span><span style=\"color: #4c4c51; font-size: 17px;\">Edit Asset information<br \/>\n<\/span><span style=\"font-size: 17px; color: #4c4c51;\">Edit Classification definitions<br \/>\n<\/span><span style=\"font-size: 17px; color: #4c4c51;\">Edit Glossary terms<br \/>\n<\/span><span style=\"font-size: 17px; color: #4c4c51;\">Assign Classification definitions<br \/>\nAssign Glossary terms<br \/>\n<\/span><\/p>\n<\/td>\n<\/tr>\n<tr style=\"height: 110.1px;\">\n<td style=\"width: 43.8749%; padding: 4px; height: 110px; vertical-align: middle; background-color: #ffffff; border: 1px solid #007cba;\"><strong><span style=\"color: #4c4c51;\">Purview<br \/>\nData Source Administrator<br \/>\n<\/span><\/strong><\/td>\n<td style=\"border: 1px solid #007cba; width: 81.0513%; padding: 4px; height: 110px; vertical-align: top; background-color: #ffffff;\" width=\"300\">\n<p style=\"font-weight: normal; font-size: 17px;\"><span style=\"color: #4c4c51;\">No access to Purview Portal<br \/>\n<\/span><span style=\"color: 
#4c4c51;\">Manage scan bindings information only<br \/>\n<\/span><span style=\"color: #4c4c51;\">No access to non-scan-related content<\/span><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<h2><span style=\"color: #007cba; font-size: 30px;\"><span style=\"font-size: 45px;\">Register your Data Sources<\/span><br \/>\n<\/span><\/h2>\n<p>Now that the prerequisites have been met, the final step before scanning is to register your data sources. Registering can be done manually via the <span style=\"color: #007cba;\"><a style=\"color: #007cba;\" href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/purview\/manage-data-sources\" target=\"_blank\" rel=\"noopener\">Purview portal<\/a><\/span> or programmatically via the REST API. Currently, supported data sources range from on-premises systems like SQL Server and Oracle DB, to SaaS sources like Power BI and SAP HANA, to cloud services like Azure and Amazon S3. Check out the <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/purview\/purview-connector-overview\"><span style=\"color: #007cba;\">Supported<\/span> Data Stores<\/a> page from Microsoft for a complete and up-to-date list. If you have multiple data sources in Azure, Amazon, or Azure Synapse Analytics, you can register them in a single step using the <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/purview\/register-scan-azure-multiple-sources\" target=\"_blank\" rel=\"noopener\"><span style=\"color: #007cba;\">Multiple<\/span> feature<\/a>.<\/p>\n<h2 style=\"font-size: 45px;\"><span style=\"color: #007cba;\"><span style=\"color: #007cba;\">Scanning<\/span><\/span><\/h2>\n<p>Scanning is when the Purview engine connects to your data source and starts collecting its metadata. What metadata is it collecting? Well, that depends on your data source type. For example, a scan of a SQL Server database collects schema names, table names, view names, column names, and column data types. 
In addition to collecting the metadata about your data source, you can specify Scan Rules, which will assist with classification efforts. There are some out-of-the-box scan rules, called System Scan Rules, to get you started. However, if those don\u2019t provide all the information you need, you can always <span style=\"color: #007cba;\"><a style=\"color: #007cba;\" href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/purview\/create-a-scan-rule-set\" target=\"_blank\" rel=\"noopener\">create custom Scan Rule Sets<\/a><\/span> specific to your organization.<\/p>\n<p>For example, the System Scan Rule Set for AzureStorage scans the following file types: CSV, JSON, PSV, SSV, TSV, GZIP, TXT, XML, PARQUET, AVRO, ORC, DOC, DOCM, DOCX, DOT, ODP, ODS, ODT, PDF, POT, PPS, PPSX, PPT, PPTM, PPTX, XLC, XLS, XLSB, XLSM, XLSX, XLT, and includes 206 classification rules. If you want to exclude a file type or modify the classification rules, you can create a Custom Scan Rule Set that will only scan the file types you specify and use the classification rules you select.<\/p>\n<p><span style=\"color: #000000;\"><span style=\"font-weight: bold;\">On-Premises Scanning<br \/>\n<\/span><\/span><\/p>\n<p>Scanning your on-premises data sources deserves a special callout. To scan your on-premises data sources, you will need to install the latest <a href=\"https:\/\/www.microsoft.com\/download\/details.aspx?id=39717\" target=\"_blank\" rel=\"noopener\">self-hosted <span style=\"color: #007cba;\">integration<\/span> runtime<\/a> (IR). The self-hosted integration runtime is the compute infrastructure that Azure Data Factory uses to provide data collection capabilities across different network environments. This is how Purview will communicate with your data source and import its metadata. It is best practice to install the self-hosted integration runtime on its own machine, which can be either a physical or virtual machine. 
Depending on your data source type, you may also need to install <a href=\"https:\/\/www.oracle.com\/java\/technologies\/javase-jdk11-downloads.html\" target=\"_blank\" rel=\"noopener\"><span style=\"text-decoration: none;\">JDK 11<\/span><\/a>, the <a href=\"https:\/\/www.microsoft.com\/download\/details.aspx?id=30679\" target=\"_blank\" rel=\"noopener\"><span style=\"color: #007cba;\">Visual<\/span> C++ Redistributable 2012 Update 4<\/a>, and any necessary data source drivers on the same machine where the self-hosted integration runtime is running.<\/p>\n<h2><span style=\"color: #007cba;\"><span style=\"color: #007cba;\">Classification<\/span><\/span><\/h2>\n<p>Classifications help you identify what types of data you have in your data estate. Currently there are five categories of classifications: Government, Financial, Personal, Security, and Miscellaneous. A few examples of classifications, by category, include:<\/p>\n<ul>\n<li style=\"margin-left: 0in !msorm;\">various national identification numbers, passport numbers, and taxpayer identification numbers for the Government category<\/li>\n<li style=\"margin-left: 0in !msorm;\">ABA routing numbers, credit card numbers, and various national bank account numbers for the Financial category<\/li>\n<li style=\"margin-left: 0in !msorm;\">email address, date of birth, and phone number for the Personal category<\/li>\n<\/ul>\n<p>There are currently over 200 classifications available in Purview, but if they don\u2019t meet your needs, you have the option of <span style=\"color: #007cba;\"><a style=\"color: #007cba;\" href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/purview\/create-a-custom-classification-and-classification-rule\" target=\"_blank\" rel=\"noopener\">creating custom classifications<\/a><\/span> based on regular expressions or a dictionary of values.<\/p>\n<p>Classifications can be applied <span style=\"color: #007cba;\"><a style=\"color: #007cba;\" 
href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/purview\/apply-classifications\" target=\"_blank\" rel=\"noopener\">manually<\/a><\/span> or automatically through the <span style=\"color: #007cba;\"><a style=\"color: #007cba;\" href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/purview\/create-sensitivity-label\" target=\"_blank\" rel=\"noopener\">scan rule sets<\/a><\/span>. Classifications can be applied at the resource set level (manually), table level (manually) or column level (automatically). Once classifications have been applied, re-scanning will not overwrite the assigned classifications, but new classifications will be added if they are detected. You can only remove classifications manually via the Purview portal or programmatically via the REST API.<\/p>\n<h2><span style=\"color: #007cba; font-size: 48px;\">In Summary<\/span><\/h2>\n<p><span style=\"font-size: 17px;\">That\u2019s it for the Scanning and Classification installment of our Azure Purview series. <\/span><span style=\"font-size: 17px;\">If you missed the other posts in this series, they can be found here:<br \/>\n<span style=\"color: #007cba;\"><a style=\"color: #007cba;\" href=\"\/blog\/azure-purview-serires-part-1-an-overview\" target=\"_blank\" rel=\"noopener\">Azure Purview Series &#8211; Part 1: An Overview<\/a><\/span><br \/>\n<span style=\"color: #007cba;\"><a style=\"color: #007cba;\" href=\"\/blog\/azure-purview-series-part-2-data-catalog\" target=\"_blank\" rel=\"noopener\">Azure Purview Series &#8211; Part 2: Data Catalog<\/a><br \/>\n<a style=\"color: #007cba;\" href=\"\/blog\/azure-purview-series-part-4-data-map\" rel=\"noopener\">Azure Purview Series &#8211; Part 4: Data Map<\/a><\/span><\/span><\/p>\n<p><span style=\"font-size: 17px;\"><span style=\"color: #007cba;\"><span style=\"color: #58595b;\">Note: Azure Purview is now <a style=\"color: #58595b;\" href=\"https:\/\/azure.microsoft.com\/en-us\/updates\/azure-purview-is-now-generally-available\/\" target=\"_blank\" 
rel=\"noopener\"><span style=\"color: #007cba;\">Generally Available<\/span><\/a> as of 9\/28\/21.<\/span><\/span><\/span><\/p>\n<p><span style=\"color: #000000;\"><span style=\"font-weight: bold;\">More Information<br \/>\n<\/span><\/span>3Cloud offers a variety of <a href=\"\/resources\/\" target=\"_blank\" rel=\"noopener\"><span style=\"color: #007cba;\">resources<\/span><\/a> to help you learn how you can leverage Modern Data Analytics.<\/p>\n<p>Please\u00a0<a href=\"\/get-started\/\" target=\"_blank\" rel=\"noopener\"><span style=\"color: #007cba;\">contact us<\/span><\/a> directly to see how we can help you explore your modern data analytics options and accelerate your business value.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This series will take a deep dive into each of the building blocks that make up Azure Purview. This third article is meant to give a detailed view into the Scanning and Classification capabilities.<\/p>\n","protected":false},"author":21,"featured_media":12524,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[260],"tags":[307,304],"class_list":["post-15667","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-ai","tag-digital-transformation","tag-modern-data-platform","topics-blog"],"acf":[],"_links":{"self":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/15667","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/comments?post=15667"}],"version-history":[{"count":0,"href":"https:\/\/3cloudsolutions.com\
/wp-json\/wp\/v2\/posts\/15667\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media\/12524"}],"wp:attachment":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media?parent=15667"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/categories?post=15667"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/tags?post=15667"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}