{"id":15861,"date":"2018-06-28T14:27:00","date_gmt":"2018-06-28T21:27:00","guid":{"rendered":"https:\/\/devwww.3cloudsolutions.com\/post\/cognitive-services-showcase-api-speech-tools-2\/"},"modified":"2024-01-03T14:34:23","modified_gmt":"2024-01-03T22:34:23","slug":"cognitive-services-showcase-api-speech-tools","status":"publish","type":"post","link":"https:\/\/3cloudsolutions.com\/resources\/cognitive-services-showcase-api-speech-tools\/","title":{"rendered":"Cognitive Services Showcase: API Speech Tools"},"content":{"rendered":"<p>In this next installment in 3Cloud&#8217;s <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/\">series<\/a> on Microsoft\u2019s <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/\">Cognitive Services<\/a>, the Speech APIs will be considered. For app and website developers, these Speech APIs provide natural language processing capabilities that add functionality and value to the customer experience.<\/p>\n<p><!--more--><\/p>\n<p>In addition to an enhanced customer experience, the Speech APIs can be used by organizations to improve business practices and\/or increase customer security. To improve business performance, an organization may use Speech and Language APIs to transcribe call-center recordings to develop deeper understanding of product performance and customers\u2019 concerns. 
To provide increased customer security, an organization may use the Speaker Recognition API to add a second layer of security by verifying customers\u2019 identities via voice recognition.<\/p>\n<p><img decoding=\"async\" style=\"width: 773px; display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/Banner_Speech.png\" alt=\"Banner_Speech\" width=\"773\" \/><\/p>\n<h2>All About Speech<\/h2>\n<p>First an overview of the Speech APIs \u2013 these pre-trained AI Speech models can hear and speak to your customers with personal and convenient voice-based interactions.<\/p>\n<table style=\"margin-left: auto; margin-right: auto; height: 400px;\" width=\"737\">\n<tbody>\n<tr>\n<td style=\"width: 78px;\"><img decoding=\"async\" style=\"width: 61px; display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/speech_to_text_icon2.jpg\" alt=\"speech_to_text_icon2\" width=\"61\" \/><\/td>\n<td style=\"text-align: center; width: 194px;\"><a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/speech-to-text\/\"><span style=\"font-size: 20px;\"><strong><span style=\"color: #007cba;\">Speech to Text<\/span><\/strong><\/span><\/a><\/td>\n<td style=\"width: 455px; vertical-align: middle; text-align: left;\">Transcribes spoken audio to text with standard or custom models. 
A custom model can be trained for specific vocabulary or unique speaking styles.<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 78px;\"><img decoding=\"async\" style=\"width: 61px; display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/text_to_speech_icon2.jpg\" alt=\"text_to_speech_icon2\" width=\"61\" \/><\/td>\n<td style=\"text-align: center; width: 194px;\"><a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/text-to-speech\/\"><span style=\"font-size: 20px;\"><strong><span style=\"color: #007cba;\">Text to Speech<\/span><\/strong><\/span><\/a><\/td>\n<td style=\"width: 455px; vertical-align: middle; text-align: left;\">Bring voice to any app by converting text to audio in near real-time with the choice of over 75 default voices.<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 78px;\"><img decoding=\"async\" style=\"width: 52px; display: block; margin-left: auto; margin-right: auto;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/speaker_recognition_icon2.jpg\" alt=\"speaker_recognition_icon2\" width=\"52\" \/><\/td>\n<td style=\"text-align: center; width: 194px;\"><span style=\"font-size: 20px;\"><strong><span style=\"color: #007cba;\">\u00a0<a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/speaker-recognition\/?v=18.05\">Speaker Recognition<\/a><\/span><\/strong><\/span><\/td>\n<td style=\"width: 455px; vertical-align: middle; text-align: left;\">Voice verification and speaker identification can identify who is speaking; providing increased security in authentication experiences for customers.<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align: center; width: 78px;\"><img decoding=\"async\" style=\"width: 48px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/speech_translation_icon2.jpg\" alt=\"speech_translation_icon2\" width=\"48\" \/><\/td>\n<td style=\"text-align: center; width: 194px;\"><span style=\"font-size: 
20px;\"><strong><span style=\"color: #007cba;\">\u00a0<a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/speech-translation\/\">Speech Translation<\/a><\/span><\/strong><\/span><\/td>\n<td style=\"width: 455px; vertical-align: middle; text-align: left;\">Provides speech-to-speech or speech-to-text translation in 10 different languages.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p style=\"text-align: center;\">For more information, see <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/directory\/speech\/\">Microsoft\u2019s Cognitive Services Speech Directory<\/a>.<\/p>\n<h2>Develop Business Insight with Text Analytics using Speech to Text and Text Analytics APIs<\/h2>\n<p>Organizations with call centers can use the Custom Speech to Text API to transcribe call center recordings that then could be explored with the Language Text Analytics API. Text analysis of call center interactions could lead to answers for questions such as \u2018<em>what are our top three product-related issues<\/em>\u2019 or \u2018<em>what issues are of most concern to our customers<\/em>\u2019? While analysis of call center recordings is not a new data analytic practice, use of Microsoft\u2019s Cognitive Services pre-trained AI models can make the development of the data analytic pipeline faster and more robust.<\/p>\n<h3><strong>Speech to Text API<\/strong><\/h3>\n<p>Microsoft\u2019s Speech to Text API is the powerful speech recognition technology used by Cortana and several other Microsoft products. The Custom Speech to Text API allows an organization to build on this technology by training the model to accurately ingest terminology that is unique to the organization\u2019s business practices; for example, distinct sounding terms for products or product functionality. The steps necessary to train the Custom Speech to Text API may be repeated until the desired level of accuracy is reached. 
Once this has been achieved, the organization\u2019s call center recordings can be transcribed and readied for text analysis.<\/p>\n<p><a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/speech-to-text\/\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" style=\"width: 929px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/speech-to-text-API-demo-1.jpg\" alt=\"speech to text API demo\" width=\"929\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><em>Click on the image to try the demo with your own recording.<\/em><\/p>\n<h3><strong>Text Analytics API<\/strong><\/h3>\n<p>With accurately transcribed call center recordings, Microsoft\u2019s Language Cognitive Services Text Analytics API provides in-depth text analysis. Text analytics is an umbrella term that can encompass a wide range of practices for analyzing transcribed speech. Practices can include the identification of themes, entities, and sentiment. Themes represent the general \u2018gist\u2019 within the recording; frequently occurring patterns found within the communication can be identified. Identification of entities within the text is the primary reason for use of the Custom Speech to Text API \u2013 that is, an organization may now learn what specific products or services are being discussed in the call center recordings. 
Finally, there is sentiment analysis, sometimes referred to as Emotion AI; this analysis identifies the positive and negative language used within the interaction.<\/p>\n<p><a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/text-analytics\/?v=18.05\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" style=\"width: 1197px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/text-to-analytics-API-demo-1.jpg\" alt=\"text to analytics API demo\" width=\"1197\" \/><\/a><\/p>\n<p style=\"text-align: center;\"><em>Click on the image to try the demo with your own transcription.<\/em><\/p>\n<p>In combination, these services provide businesses with several options for the development of a highly customized and flexible data analytic pipeline for the analysis of their call center recordings.<\/p>\n<h2>Enhance Customer Authentication Processes with the Speaker Verification API<\/h2>\n<p>Identity and data privacy threats are as rampant as ever and have revealed that one-factor authentication processes (e.g. basic username and password usage) are quite vulnerable to theft. Two-factor authentication (2FA) allows customers and organizations to increase the security of their data and identities. One form of two-factor authentication is the biometric authentication known as voice recognition. The Speaker Verification API can be used to create a voice recognition system that identifies customers by their voices. Such two-factor authentication can take place during a routine call center interaction. Voice recognition can be accomplished by modeling a customer\u2019s unique voiceprint using the Speaker Verification API. Once a customer\u2019s voiceprint has been created, it can be saved and used whenever the customer calls again: the call center system compares the caller\u2019s voice to the voiceprint on file.
There are several options for two-factor authentication processes; however, voiceprint authentication has the distinct advantage that a customer does not need to know or provide additional information (e.g. confirmation codes sent via text or answers to secret questions). The Speaker Recognition API can create the voiceprint files for authentication and security processes and is robust enough to operate in quite complex acoustic environments.<\/p>\n<h3><strong>Speaker Recognition API<\/strong><\/h3>\n<p><a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/speaker-recognition\/?v=18.05\" target=\"_blank\" rel=\"noopener\"><strong><img decoding=\"async\" style=\"width: 1200px;\" src=\"https:\/\/3cloudsolutions.com\/wp-content\/uploads\/2022\/11\/speaker-recognition-API-demo-1.jpg\" alt=\"speaker recognition API demo\" width=\"1200\" \/><\/strong><\/a><\/p>\n<p style=\"text-align: center;\"><em>Click on the image to try the demo with your own recordings.<\/em><\/p>\n<p>Currently, Microsoft\u2019s Cognitive Services has more than 20 pre-built APIs; many of those allow for customization. Individually or in combination, these APIs enable software developers and data scientists to implement powerful AI solutions that can significantly transform and improve business processes.<\/p>\n<h2>More to Come<\/h2>\n<p>3Cloud has surveyed several Microsoft Cognitive Services APIs, including search and vision. We will continue to explore the remaining Cognitive Services categories: knowledge and language. Subscribe to our blog so that you don&#8217;t miss out and <a href=\"\/get-started\/\">contact us<\/a> if you would like to learn more about incorporating Cognitive Services into your organization\u2019s business practices.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this next installment in 3Cloud&#8217;s series on Microsoft\u2019s Cognitive Services, the Speech APIs will be reviewed.
For app and website developers, these Speech APIs provide natural language processing capabilities that add functionality and value to the customer experience.<\/p>\n","protected":false},"author":21,"featured_media":14321,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[395,260],"tags":[331,319],"class_list":["post-15861","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science-ai","category-data-ai","tag-cognitive-services","tag-machine-learning-ai","topics-blog"],"acf":[],"_links":{"self":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/15861","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/comments?post=15861"}],"version-history":[{"count":0,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/posts\/15861\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media\/14321"}],"wp:attachment":[{"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/media?parent=15861"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/categories?post=15861"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/3cloudsolutions.com\/wp-json\/wp\/v2\/tags?post=15861"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}