From e077619f882e32c5fa2b7bdaf6521546750eb51b Mon Sep 17 00:00:00 2001 From: SeokJin Han <4353157+dem108@users.noreply.github.com> Date: Fri, 6 Sep 2024 18:07:53 -0700 Subject: [PATCH 1/4] remove torchserve HF textgen example --- articles/machine-learning/how-to-deploy-custom-container.md | 1 - 1 file changed, 1 deletion(-) diff --git a/articles/machine-learning/how-to-deploy-custom-container.md b/articles/machine-learning/how-to-deploy-custom-container.md index 17eb5050c43..7a9e9224378 100644 --- a/articles/machine-learning/how-to-deploy-custom-container.md +++ b/articles/machine-learning/how-to-deploy-custom-container.md @@ -33,7 +33,6 @@ The following table lists various [deployment examples](https://github.com/Azure |[tfserving/half-plus-two](https://github.com/Azure/azureml-examples/blob/main/cli/endpoints/online/custom-container/tfserving/half-plus-two)|[deploy-custom-container-tfserving-half-plus-two](https://github.com/Azure/azureml-examples/blob/main/cli/deploy-custom-container-tfserving-half-plus-two.sh)|Deploy a Half Plus Two model using a TensorFlow Serving custom container using the standard model registration process.| |[tfserving/half-plus-two-integrated](https://github.com/Azure/azureml-examples/blob/main/cli/endpoints/online/custom-container/tfserving/half-plus-two-integrated)|[deploy-custom-container-tfserving-half-plus-two-integrated](https://github.com/Azure/azureml-examples/blob/main/cli/deploy-custom-container-tfserving-half-plus-two-integrated.sh)|Deploy a Half Plus Two model using a TensorFlow Serving custom container with the model integrated into the image.| |[torchserve/densenet](https://github.com/Azure/azureml-examples/blob/main/cli/endpoints/online/custom-container/torchserve/densenet)|[deploy-custom-container-torchserve-densenet](https://github.com/Azure/azureml-examples/blob/main/cli/deploy-custom-container-torchserve-densenet.sh)|Deploy a single model using a TorchServe custom container.| -|[torchserve/huggingface-textgen](https://github.com/Azure/azureml-examples/blob/main/cli/endpoints/online/custom-container/torchserve/huggingface-textgen)|[deploy-custom-container-torchserve-huggingface-textgen](https://github.com/Azure/azureml-examples/blob/main/cli/deploy-custom-container-torchserve-huggingface-textgen.sh)|Deploy Hugging Face models to an online endpoint and follow along with the Hugging Face Transformers TorchServe example.| |[triton/single-model](https://github.com/Azure/azureml-examples/blob/main/cli/endpoints/online/custom-container/triton/single-model)|[deploy-custom-container-triton-single-model](https://github.com/Azure/azureml-examples/blob/main/cli/deploy-custom-container-triton-single-model.sh)|Deploy a Triton model using a custom container| This article focuses on serving a TensorFlow model with TensorFlow (TF) Serving. 
From e758124a398eb93fd577aac10c156f5be2ccf914 Mon Sep 17 00:00:00 2001 From: Eric Urban Date: Fri, 6 Sep 2024 19:20:47 -0700 Subject: [PATCH 2/4] cts is retiring --- .../how-to-async-meeting-transcription.md | 22 ++++--- .../how-to-use-meeting-transcription.md | 24 +++++--- .../meeting-transcription/real-time-csharp.md | 2 +- .../real-time-javascript.md | 2 +- .../meeting-transcription/real-time-python.md | 2 +- .../speech-service/meeting-transcription.md | 59 +++++++++++-------- articles/ai-services/speech-service/toc.yml | 8 +-- 7 files changed, 69 insertions(+), 50 deletions(-) diff --git a/articles/ai-services/speech-service/how-to-async-meeting-transcription.md b/articles/ai-services/speech-service/how-to-async-meeting-transcription.md index cba50011947..160db9136b2 100644 --- a/articles/ai-services/speech-service/how-to-async-meeting-transcription.md +++ b/articles/ai-services/speech-service/how-to-async-meeting-transcription.md @@ -1,19 +1,25 @@ --- -title: Asynchronous meeting transcription - Speech service +title: Asynchronous conversation transcription - Speech service titleSuffix: Azure AI services -description: Learn how to use asynchronous meeting transcription using the Speech service. Available for Java and C# only. +description: Learn how to use asynchronous conversation transcription using the Speech service. Available for Java and C# only. manager: nitinme ms.service: azure-ai-speech ms.topic: how-to -ms.date: 1/18/2024 +ms.date: 9/9/2024 ms.devlang: csharp ms.custom: cogserv-non-critical-speech, devx-track-csharp, devx-track-extended-java zone_pivot_groups: programming-languages-set-twenty-one --- -# Asynchronous meeting transcription +# Asynchronous conversation transcription -In this article, asynchronous meeting transcription is demonstrated using the **RemoteMeetingTranscriptionClient** API. If you have configured meeting transcription to do asynchronous transcription and have a `meetingId`, you can obtain the transcription associated with that `meetingId` using the **RemoteMeetingTranscriptionClient** API. +> [!NOTE] +> This feature is currently in public preview. This preview is provided without a service-level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). + +In this article, asynchronous conversation transcription is demonstrated using the **RemoteMeetingTranscriptionClient** API. If you have configured conversation transcription to do asynchronous transcription and have a `meetingId`, you can obtain the transcription associated with that `meetingId` using the **RemoteMeetingTranscriptionClient** API. + +> [!IMPORTANT] +> Conversation transcription multichannel diarization (preview) is retiring on March 28, 2025. For more information about migrating to other speech to text features, see [Migrate away from conversation transcription multichannel diarization](#migrate-away-from-conversation-transcription-multichannel-diarization). ## Asynchronous vs. real-time + asynchronous @@ -32,7 +38,7 @@ Two steps are required to accomplish asynchronous transcription. 
The first step ::: zone-end -## Next steps +## Related content -> [!div class="nextstepaction"] -> [Explore our samples on GitHub](https://aka.ms/csspeech/samples) +- [Try the real-time diarization quickstart](get-started-stt-diarization.md) +- [Try batch transcription with diarization](batch-transcription.md) diff --git a/articles/ai-services/speech-service/how-to-use-meeting-transcription.md b/articles/ai-services/speech-service/how-to-use-meeting-transcription.md index bddd4e9641a..35a76423e7b 100644 --- a/articles/ai-services/speech-service/how-to-use-meeting-transcription.md +++ b/articles/ai-services/speech-service/how-to-use-meeting-transcription.md @@ -1,28 +1,34 @@ --- -title: Real-time meeting transcription quickstart - Speech service +title: Real-time conversation transcription quickstart - Speech service titleSuffix: Azure AI services description: In this quickstart, learn how to transcribe meetings. You can add, remove, and identify multiple participants by streaming audio to the Speech service. author: eric-urban manager: nitinme ms.service: azure-ai-speech ms.topic: quickstart -ms.date: 1/21/2024 +ms.date: 9/9/2024 ms.author: eur zone_pivot_groups: acs-js-csharp-python ms.custom: cogserv-non-critical-speech, references_regions, devx-track-extended-java, devx-track-js, devx-track-python --- -# Quickstart: Real-time meeting transcription +# Quickstart: Real-time conversation transcription (preview) -You can transcribe meetings with the ability to add, remove, and identify multiple participants by streaming audio to the Speech service. You first create voice signatures for each participant using the REST API, and then use the voice signatures with the Speech SDK to transcribe meetings. See the meeting transcription [overview](meeting-transcription.md) for more information. +> [!NOTE] +> This feature is currently in public preview. This preview is provided without a service-level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). + +You can transcribe meetings with the ability to add, remove, and identify multiple participants by streaming audio to the Speech service. You first create voice signatures for each participant using the REST API, and then use the voice signatures with the Speech SDK to transcribe meetings. See the conversation transcription [overview](meeting-transcription.md) for more information. + +> [!IMPORTANT] +> Conversation transcription multichannel diarization (preview) is retiring on March 28, 2025. For more information about migrating to other speech to text features, see [Migrate away from conversation transcription multichannel diarization](#migrate-away-from-conversation-transcription-multichannel-diarization). ## Limitations * Only available in the following subscription regions: `centralus`, `eastasia`, `eastus`, `westeurope` * Requires a 7-mic circular multi-microphone array. The microphone array should meet [our specification](./speech-sdk-microphone.md). -> [!NOTE] -> The Speech SDK for C++, Java, Objective-C, and Swift support meeting transcription, but we haven't yet included a guide here. +> [!IMPORTANT] +> For the conversation transcription multichannel diarization feature, use `MeetingTranscriber` instead of `ConversationTranscriber`, and use `CreateMeetingAsync` instead of `CreateConversationAsync`. 
A new "conversation transcription" feature is released without the use of user profiles and voice signatures. For more information, see the [release notes](releasenotes.md?tabs=speech-sdk). ::: zone pivot="programming-language-javascript" [!INCLUDE [JavaScript Basics include](includes/how-to/meeting-transcription/real-time-javascript.md)] @@ -36,7 +42,7 @@ You can transcribe meetings with the ability to add, remove, and identify multip [!INCLUDE [Python Basics include](includes/how-to/meeting-transcription/real-time-python.md)] ::: zone-end -## Next steps +## Related content -> [!div class="nextstepaction"] -> [Asynchronous meeting transcription](how-to-async-meeting-transcription.md) +- [Try the real-time diarization quickstart](get-started-stt-diarization.md) +- [Try batch transcription with diarization](batch-transcription.md) \ No newline at end of file diff --git a/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-csharp.md b/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-csharp.md index 23a3ac4f91c..aa0049c8ece 100644 --- a/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-csharp.md +++ b/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-csharp.md @@ -2,7 +2,7 @@ author: eric-urban ms.service: azure-ai-speech ms.topic: include -ms.date: 01/24/2022 +ms.date: 9/9/2024 ms.author: eur --- diff --git a/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-javascript.md b/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-javascript.md index 9bb336c214c..8c490b62280 100644 --- a/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-javascript.md +++ b/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-javascript.md @@ -2,7 +2,7 @@ author: eric-urban ms.service: azure-ai-speech ms.topic: include -ms.date: 01/24/2022 +ms.date: 9/9/2024 ms.author: eur --- diff --git a/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-python.md b/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-python.md index 9642d5ecd50..5834b9c8649 100644 --- a/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-python.md +++ b/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-python.md @@ -2,7 +2,7 @@ author: jyotsna-ravi ms.service: azure-ai-speech ms.topic: include -ms.date: 11/11/2022 +ms.date: 9/9/2024 ms.author: jyravi --- diff --git a/articles/ai-services/speech-service/meeting-transcription.md b/articles/ai-services/speech-service/meeting-transcription.md index 08353d6e53f..2522266e1bd 100644 --- a/articles/ai-services/speech-service/meeting-transcription.md +++ b/articles/ai-services/speech-service/meeting-transcription.md @@ -1,26 +1,40 @@ --- -title: Meeting transcription overview - Speech service +title: Conversation transcription overview (preview) titleSuffix: Azure AI services -description: You use the meeting transcription feature for meetings. It combines recognition, speaker ID, and diarization to provide transcription of any meeting. +description: You use the conversation transcription feature for meetings. It combines recognition, speaker ID, and diarization to provide transcription of any meeting. 
author: eric-urban manager: nitinme ms.service: azure-ai-speech ms.topic: overview -ms.date: 1/21/2024 +ms.date: 9/9/2024 ms.author: eur ms.custom: cogserv-non-critical-speech, references_regions --- -# What is meeting transcription? (Preview) +# What is conversation transcription multichannel diarization? (preview) -Meeting transcription is a [speech to text](speech-to-text.md) solution that provides real-time or asynchronous transcription of any meeting. This feature, which is currently in preview, combines speech recognition, speaker identification, and sentence attribution to determine who said what, and when, in a meeting. +> [!NOTE] +> This feature is currently in public preview. This preview is provided without a service-level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). + +Conversation transcription multichannel diarization is a [speech to text](speech-to-text.md) solution that provides real-time or asynchronous transcription of any meeting. This feature combines speech recognition, speaker identification, and sentence attribution to determine who said what, and when, in a meeting. > [!IMPORTANT] -> The former "conversation transcription" scenario is renamed to "meeting transcription." For example, use `MeetingTranscriber` instead of `ConversationTranscriber`, and use `CreateMeetingAsync` instead of `CreateConversationAsync`. A new "conversation transcription" feature is released without the use of user profiles and voice signatures. For more information, see the [release notes](releasenotes.md?tabs=speech-sdk). +> Conversation transcription multichannel diarization (preview) is retiring on March 28, 2025. For more information about migrating to other speech to text features, see [Migrate away from conversation transcription multichannel diarization](#migrate-away-from-conversation-transcription-multichannel-diarization). + +## Migrate away from conversation transcription multichannel diarization + +Conversation transcription multichannel diarization (preview) is retiring on March 28, 2025. + +To continue using speech to text with diarization, use the following features instead: + +- [Real-time speech to text with diarization](get-started-stt-diarization.md) +- [Batch transcription with diarization](batch-transcription.md) + +These speech to text features only support diarization for single-channel audio. Multichannel audio that you used with conversation transcription multichannel diarization isn't supported. ## Key features -You might find the following features of meeting transcription useful: +You might find the following features of conversation transcription useful: - **Timestamps:** Each speaker utterance has a timestamp, so that you can easily find when a phrase was said. - **Readable transcripts:** Transcripts have formatting and punctuation added automatically to ensure the text closely matches what was being said. @@ -31,33 +45,26 @@ You might find the following features of meeting transcription useful: - **Asynchronous transcription:** Provide transcripts with higher accuracy by using a multichannel audio stream. > [!NOTE] -> Although meeting transcription doesn't put a limit on the number of speakers in the room, it's optimized for 2-10 speakers per session. 
- -## Get started - -See the real-time meeting transcription [quickstart](how-to-use-meeting-transcription.md) to get started. +> Although conversation transcription doesn't put a limit on the number of speakers in the room, it's optimized for 2-10 speakers per session. ## Use cases -To make meetings inclusive for everyone, such as participants who are deaf and hard of hearing, it's important to have transcription in real-time. Meeting transcription in real-time mode takes meeting audio and determines who is saying what, allowing all meeting participants to follow the transcript and participate in the meeting, without a delay. +To make meetings inclusive for everyone, such as participants who are deaf and hard of hearing, it's important to have transcription in real-time. Conversation transcription in real-time mode takes meeting audio and determines who is saying what, allowing all meeting participants to follow the transcript and participate in the meeting, without a delay. -Meeting participants can focus on the meeting and leave note-taking to meeting transcription. Participants can actively engage in the meeting and quickly follow up on next steps, using the transcript instead of taking notes and potentially missing something during the meeting. +Meeting participants can focus on the meeting and leave note-taking to conversation transcription. Participants can actively engage in the meeting and quickly follow up on next steps, using the transcript instead of taking notes and potentially missing something during the meeting. ## How it works The following diagram shows a high-level overview of how the feature works. -![Diagram that shows the relationships among different pieces of the meeting transcription solution.](media/scenarios/meeting-transcription-service.png) +![Diagram that shows the relationships among different pieces of the conversation transcription solution.](media/scenarios/meeting-transcription-service.png) ## Expected inputs -Meeting transcription uses two types of inputs: +Conversation transcription uses two types of inputs: - **Multi-channel audio stream:** For specification and design details, see [Microphone array recommendations](./speech-sdk-microphone.md). -- **User voice samples:** Meeting transcription needs user profiles in advance of the conversation for speaker identification. Collect audio recordings from each user, and then send the recordings to the [signature generation service](https://aka.ms/cts/signaturegenservice) to validate the audio and generate user profiles. - -> [!NOTE] -> Single channel audio configuration for meeting transcription is currently only available in private preview. +- **User voice samples:** Conversation transcription needs user profiles in advance of the conversation for speaker identification. Collect audio recordings from each user, and then send the recordings to the [signature generation service](https://aka.ms/cts/signaturegenservice) to validate the audio and generate user profiles. User voice samples for voice signatures are required for speaker identification. Speakers who don't have voice samples are recognized as *unidentified*. Unidentified speakers can still be differentiated when the `DifferentiateGuestSpeakers` property is enabled (see the following example). The transcription output then shows speakers as, for example, *Guest_0* and *Guest_1*, instead of recognizing them as pre-enrolled specific speaker names. @@ -65,7 +72,7 @@ User voice samples for voice signatures are required for speaker identification. 
config.SetProperty("DifferentiateGuestSpeakers", "true"); ``` -## Real-time vs. asynchronous +## Real-time or asynchronous The following sections provide more detail about transcription modes you can choose. @@ -81,11 +88,11 @@ Audio data is batch processed to return the speaker identifier and transcript. S Audio data is processed live to return the speaker identifier and transcript, and, in addition, requests a high-accuracy transcript through asynchronous processing. Select this mode if your application has a need for real-time transcription, and also requires a higher accuracy transcript for use after the meeting occurred. -## Language support +## Language and region support -Currently, meeting transcription supports [all speech to text languages](language-support.md?tabs=stt) in the following regions: `centralus`, `eastasia`, `eastus`, `westeurope`. +Currently, conversation transcription supports [all speech to text languages](language-support.md?tabs=stt) in the following regions: `centralus`, `eastasia`, `eastus`, `westeurope`. -## Next steps +## Related content -> [!div class="nextstepaction"] -> [Quickstart: Real-time meeting transcription](how-to-use-meeting-transcription.md) +- [Try the real-time diarization quickstart](get-started-stt-diarization.md) +- [Try batch transcription with diarization](batch-transcription.md) diff --git a/articles/ai-services/speech-service/toc.yml b/articles/ai-services/speech-service/toc.yml index 926cdde0bae..a9fff82f750 100644 --- a/articles/ai-services/speech-service/toc.yml +++ b/articles/ai-services/speech-service/toc.yml @@ -321,13 +321,13 @@ items: displayName: pronounce, learn language, assess pron, chatting - name: Azure OpenAI speech to speech chat href: openai-speech.md - - name: Meeting transcription + - name: Conversation transcription multichannel diarization (preview) items: - - name: Meeting transcription overview + - name: Conversation transcription overview href: meeting-transcription.md - - name: Real-time Meeting transcription quickstart + - name: Real-time conversation transcription quickstart href: how-to-use-meeting-transcription.md - - name: Asynchronous Meeting transcription + - name: Asynchronous conversation transcription href: how-to-async-meeting-transcription.md - name: Multi-device conversation items: From 5ca66b17278578ef56863c26909560d5a87d13af Mon Sep 17 00:00:00 2001 From: Eric Urban Date: Sat, 7 Sep 2024 07:31:25 -0700 Subject: [PATCH 3/4] cts is retiring --- .../how-to-async-meeting-transcription.md | 6 +++--- .../how-to-use-meeting-transcription.md | 12 ++++++------ .../includes/common/azure-prerequisites.md | 2 +- .../how-to/meeting-transcription/real-time-csharp.md | 5 +++-- .../meeting-transcription/real-time-javascript.md | 4 ++-- .../how-to/meeting-transcription/real-time-python.md | 4 ++-- .../how-to/remote-meeting/csharp/examples.md | 6 +++--- .../includes/how-to/remote-meeting/java/examples.md | 8 ++++---- .../speech-service/meeting-transcription.md | 2 +- 9 files changed, 25 insertions(+), 24 deletions(-) diff --git a/articles/ai-services/speech-service/how-to-async-meeting-transcription.md b/articles/ai-services/speech-service/how-to-async-meeting-transcription.md index 160db9136b2..a4f1fe8cf3d 100644 --- a/articles/ai-services/speech-service/how-to-async-meeting-transcription.md +++ b/articles/ai-services/speech-service/how-to-async-meeting-transcription.md @@ -11,15 +11,15 @@ ms.custom: cogserv-non-critical-speech, devx-track-csharp, devx-track-extended-j zone_pivot_groups: 
programming-languages-set-twenty-one --- -# Asynchronous conversation transcription +# Asynchronous conversation transcription multichannel diarization > [!NOTE] > This feature is currently in public preview. This preview is provided without a service-level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). -In this article, asynchronous conversation transcription is demonstrated using the **RemoteMeetingTranscriptionClient** API. If you have configured conversation transcription to do asynchronous transcription and have a `meetingId`, you can obtain the transcription associated with that `meetingId` using the **RemoteMeetingTranscriptionClient** API. +In this article, asynchronous conversation transcription multichannel diarization is demonstrated using the **RemoteMeetingTranscriptionClient** API. If you configured conversation transcription to do asynchronous transcription and have a `meetingId`, you can obtain the transcription associated with that `meetingId` using the **RemoteMeetingTranscriptionClient** API. > [!IMPORTANT] -> Conversation transcription multichannel diarization (preview) is retiring on March 28, 2025. For more information about migrating to other speech to text features, see [Migrate away from conversation transcription multichannel diarization](#migrate-away-from-conversation-transcription-multichannel-diarization). +> Conversation transcription multichannel diarization (preview) is retiring on March 28, 2025. For more information about migrating to other speech to text features, see [Migrate away from conversation transcription multichannel diarization](meeting-transcription.md#migrate-away-from-conversation-transcription-multichannel-diarization). ## Asynchronous vs. real-time + asynchronous diff --git a/articles/ai-services/speech-service/how-to-use-meeting-transcription.md b/articles/ai-services/speech-service/how-to-use-meeting-transcription.md index 35a76423e7b..023bcb4a6f9 100644 --- a/articles/ai-services/speech-service/how-to-use-meeting-transcription.md +++ b/articles/ai-services/speech-service/how-to-use-meeting-transcription.md @@ -1,5 +1,5 @@ --- -title: Real-time conversation transcription quickstart - Speech service +title: Real-time conversation transcription multichannel diarization quickstart - Speech service titleSuffix: Azure AI services description: In this quickstart, learn how to transcribe meetings. You can add, remove, and identify multiple participants by streaming audio to the Speech service. author: eric-urban @@ -12,23 +12,23 @@ zone_pivot_groups: acs-js-csharp-python ms.custom: cogserv-non-critical-speech, references_regions, devx-track-extended-java, devx-track-js, devx-track-python --- -# Quickstart: Real-time conversation transcription (preview) +# Quickstart: Real-time conversation transcription multichannel diarization (preview) > [!NOTE] > This feature is currently in public preview. This preview is provided without a service-level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). 
-You can transcribe meetings with the ability to add, remove, and identify multiple participants by streaming audio to the Speech service. You first create voice signatures for each participant using the REST API, and then use the voice signatures with the Speech SDK to transcribe meetings. See the conversation transcription [overview](meeting-transcription.md) for more information. +With conversation transcription multichannel diarization, you can transcribe meetings with the ability to add, remove, and identify multiple participants by streaming audio to the Speech service. You first create voice signatures for each participant using the REST API, and then use the voice signatures with the Speech SDK to transcribe meetings. See the conversation transcription [overview](meeting-transcription.md) for more information. > [!IMPORTANT] -> Conversation transcription multichannel diarization (preview) is retiring on March 28, 2025. For more information about migrating to other speech to text features, see [Migrate away from conversation transcription multichannel diarization](#migrate-away-from-conversation-transcription-multichannel-diarization). +> Conversation transcription multichannel diarization (preview) is retiring on March 28, 2025. For more information about migrating to other speech to text features, see [Migrate away from conversation transcription multichannel diarization](meeting-transcription.md#migrate-away-from-conversation-transcription-multichannel-diarization). ## Limitations * Only available in the following subscription regions: `centralus`, `eastasia`, `eastus`, `westeurope` * Requires a 7-mic circular multi-microphone array. The microphone array should meet [our specification](./speech-sdk-microphone.md). -> [!IMPORTANT] -> For the conversation transcription multichannel diarization feature, use `MeetingTranscriber` instead of `ConversationTranscriber`, and use `CreateMeetingAsync` instead of `CreateConversationAsync`. A new "conversation transcription" feature is released without the use of user profiles and voice signatures. For more information, see the [release notes](releasenotes.md?tabs=speech-sdk). +> [!NOTE] +> For the conversation transcription multichannel diarization feature, use `MeetingTranscriber` instead of `ConversationTranscriber`, and use `CreateMeetingAsync` instead of `CreateConversationAsync`. 
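
As noted earlier, you first create voice signatures for each participant by posting a short audio sample to the signature generation service. The following condensed C# sketch shows the general shape of that REST call. The endpoint pattern, the `Signature` response property, and the key, region, and file path values are assumptions drawn from the existing per-language samples; the complete, authoritative code is in the language-specific sections that follow.

```csharp
using System.IO;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

class VoiceSignatureSketch
{
    // Posts a mono, 16-kHz, 16-bit WAV sample of a single speaker and returns
    // the voice signature string to pass to Participant.From(...) later.
    static async Task<string> CreateVoiceSignatureAsync(string subscriptionKey, string region, string waveFilePath)
    {
        using var client = new HttpClient();
        client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);

        // Endpoint shape follows the existing samples; confirm it in the language-specific sections.
        var uri = $"https://signature.{region}.cts.speech.microsoft.com/api/v1/Signature/GenerateVoiceSignatureFromFormData";
        var content = new ByteArrayContent(await File.ReadAllBytesAsync(waveFilePath));

        var response = await client.PostAsync(uri, content);
        response.EnsureSuccessStatusCode();

        var json = await response.Content.ReadAsStringAsync();
        using var doc = JsonDocument.Parse(json);
        // The Signature element of the response is the value the Speech SDK expects.
        return doc.RootElement.GetProperty("Signature").GetRawText();
    }
}
```

Repeat the call once per participant, and store each returned string for use when you add participants to the meeting.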
::: zone pivot="programming-language-javascript" [!INCLUDE [JavaScript Basics include](includes/how-to/meeting-transcription/real-time-javascript.md)] diff --git a/articles/ai-services/speech-service/includes/common/azure-prerequisites.md b/articles/ai-services/speech-service/includes/common/azure-prerequisites.md index 7fd7a36539c..91ba19c2fdd 100644 --- a/articles/ai-services/speech-service/includes/common/azure-prerequisites.md +++ b/articles/ai-services/speech-service/includes/common/azure-prerequisites.md @@ -1,7 +1,7 @@ --- author: eric-urban ms.service: azure-ai-speech -ms.date: 08/07/2024 +ms.date: 9/9/2024 ms.topic: include ms.author: eur --- diff --git a/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-csharp.md b/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-csharp.md index aa0049c8ece..31798964e99 100644 --- a/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-csharp.md +++ b/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-csharp.md @@ -11,11 +11,12 @@ ms.author: eur [!INCLUDE [Prerequisites](../../common/azure-prerequisites.md)] ## Set up the environment + The Speech SDK is available as a [NuGet package](https://www.nuget.org/packages/Microsoft.CognitiveServices.Speech) and implements .NET Standard 2.0. You install the Speech SDK later in this guide, but first check the [platform-specific installation instructions](../../../quickstarts/setup-platform.md?pivots=programming-language-csharp) for any more requirements. ## Create voice signatures -If you want to enroll user profiles, the first step is to create voice signatures for the meeting participants so that they can be identified as unique speakers. This isn't required if you don't want to use pre-enrolled user profiles to identify specific participants. +If you want to enroll user profiles, the first step is to create voice signatures for the meeting participants so that they can be identified as unique speakers. This isn't required if you don't want to use preenrolled user profiles to identify specific participants. The input `.wav` audio file for creating voice signatures must be 16-bit, 16-kHz sample rate, in single channel (mono) format. The recommended length for each audio sample is between 30 seconds and two minutes. An audio sample that is too short results in reduced accuracy when recognizing the speaker. The `.wav` file should be a sample of one person's voice so that a unique voice profile is created. @@ -89,7 +90,7 @@ Running the function `GetVoiceSignatureString()` returns a voice signature strin ## Transcribe meetings -The following sample code demonstrates how to transcribe meetings in real-time for two speakers. It assumes you've already created voice signature strings for each speaker as shown above. Substitute real information for `subscriptionKey`, `region`, and the path `filepath` for the audio you want to transcribe. +The following sample code demonstrates how to transcribe meetings in real-time for two speakers. It assumes that you created voice signature strings for each speaker as shown above. Substitute real information for `subscriptionKey`, `region`, and the path `filepath` for the audio you want to transcribe. If you don't use pre-enrolled user profiles, it takes a few more seconds to complete the first recognition of unknown users as speaker1, speaker2, etc. 
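
For orientation, here's a condensed C# sketch of the enrollment and transcription flow described above. The key, region, audio file, user IDs, and voice signature strings are placeholders, and the member names reflect the meeting transcription surface of recent Speech SDK releases; treat this as an outline and rely on the complete sample in this include for production use.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Transcription;

class MeetingTranscriptionSketch
{
    static async Task TranscribeMeetingAsync(string voiceSignatureKatie, string voiceSignatureSteve)
    {
        var config = SpeechConfig.FromSubscription("YourSubscriptionKey", "centralus");
        // Differentiate speakers without enrolled signatures as Guest_0, Guest_1, and so on.
        config.SetProperty("DifferentiateGuestSpeakers", "true");

        using var audioInput = AudioConfig.FromWavFileInput("katiesteve.wav");
        var meeting = await Meeting.CreateMeetingAsync(config, Guid.NewGuid().ToString());
        using var transcriber = new MeetingTranscriber(audioInput);

        // Pre-enrolled participants are identified through the voice signatures created earlier.
        await meeting.AddParticipantAsync(Participant.From("katie@example.com", "en-US", voiceSignatureKatie));
        await meeting.AddParticipantAsync(Participant.From("steve@example.com", "en-US", voiceSignatureSteve));

        transcriber.Transcribed += (s, e) =>
            Console.WriteLine($"{e.Result.UserId}: {e.Result.Text}");

        await transcriber.JoinMeetingAsync(meeting);
        await transcriber.StartTranscribingAsync();

        // In a real application, wait until the audio finishes processing,
        // then stop transcribing and release the transcriber.
        await transcriber.StopTranscribingAsync();
    }
}
```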
diff --git a/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-javascript.md b/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-javascript.md index 8c490b62280..fee420eeb49 100644 --- a/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-javascript.md +++ b/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-javascript.md @@ -58,7 +58,7 @@ Running this script returns a voice signature string in the variable `voiceSigna ## Transcribe meetings -The following sample code demonstrates how to transcribe meetings in real-time for two speakers. It assumes you've already created voice signature strings for each speaker as shown above. Substitute real information for `subscriptionKey`, `region`, and the path `filepath` for the audio you want to transcribe. +The following sample code demonstrates how to transcribe meetings in real-time for two speakers. It assumes that you created voice signature strings for each speaker as shown above. Substitute real information for `subscriptionKey`, `region`, and the path `filepath` for the audio you want to transcribe. If you don't use pre-enrolled user profiles, it takes a few more seconds to complete the first recognition of unknown users as speaker1, speaker2, etc. @@ -72,7 +72,7 @@ This sample code does the following: * Creates a `MeetingTranscriber` using the constructor. * Adds participants to the meeting. The strings `voiceSignatureStringUser1` and `voiceSignatureStringUser2` should come as output from the steps above. * Registers to events and begins transcription. -* If you want to differentiate speakers without providing voice samples, please enable `DifferentiateGuestSpeakers` feature as in [Meeting Transcription Overview](../../../meeting-transcription.md). +* If you want to differentiate speakers without providing voice samples, enable `DifferentiateGuestSpeakers` feature as in [Meeting Transcription Overview](../../../meeting-transcription.md). If speaker identification or differentiate is enabled, then even if you have already received `transcribed` results, the service is still evaluating them by accumulated audio information. If the service finds that any previous result was assigned an incorrect `speakerId`, then a nearly identical `Transcribed` result is sent again, where only the `speakerId` and `UtteranceId` are different. Since the `UtteranceId` format is `{index}_{speakerId}_{Offset}`, when you receive a `transcribed` result, you could use `UtteranceId` to determine if the current `transcribed` result is going to correct a previous one. Your client or UI logic could decide behaviors, like overwriting previous output, or to ignore the latest result. diff --git a/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-python.md b/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-python.md index 5834b9c8649..22f00d9df7e 100644 --- a/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-python.md +++ b/articles/ai-services/speech-service/includes/how-to/meeting-transcription/real-time-python.md @@ -58,7 +58,7 @@ You can use these two voice_signature_string as input to the variables `voice_si ## Transcribe meetings -The following sample code demonstrates how to transcribe meetings in real-time for two speakers. It assumes you've already created voice signature strings for each speaker as shown previously. 
Substitute real information for `subscriptionKey`, `region`, and the path `filepath` for the audio you want to transcribe. +The following sample code demonstrates how to transcribe meetings in real-time for two speakers. It assumes that you created voice signature strings for each speaker as shown previously. Substitute real information for `subscriptionKey`, `region`, and the path `filepath` for the audio you want to transcribe. If you don't use pre-enrolled user profiles, it takes a few more seconds to complete the first recognition of unknown users as speaker1, speaker2, etc. @@ -75,7 +75,7 @@ Here's what the sample does: * Read the whole wave files at once and stream it to SDK and begins transcription. * If you want to differentiate speakers without providing voice samples, you enable the `DifferentiateGuestSpeakers` feature as in [Meeting Transcription Overview](../../../meeting-transcription.md). -If speaker identification or differentiate is enabled, then even if you have already received `transcribed` results, the service is still evaluating them by accumulated audio information. If the service finds that any previous result was assigned an incorrect `speakerId`, then a nearly identical `Transcribed` result is sent again, where only the `speakerId` and `UtteranceId` are different. Since the `UtteranceId` format is `{index}_{speakerId}_{Offset}`, when you receive a `transcribed` result, you could use `UtteranceId` to determine if the current `transcribed` result is going to correct a previous one. Your client or UI logic could decide behaviors, like overwriting previous output, or to ignore the latest result. +If speaker identification or differentiate is enabled, then even if you received `transcribed` results, the service is still evaluating them by accumulated audio information. If the service finds that any previous result was assigned an incorrect `speakerId`, then a nearly identical `Transcribed` result is sent again, where only the `speakerId` and `UtteranceId` are different. Since the `UtteranceId` format is `{index}_{speakerId}_{Offset}`, when you receive a `transcribed` result, you could use `UtteranceId` to determine if the current `transcribed` result is going to correct a previous one. Your client or UI logic could decide behaviors, like overwriting previous output, or to ignore the latest result. ```python import azure.cognitiveservices.speech as speechsdk diff --git a/articles/ai-services/speech-service/includes/how-to/remote-meeting/csharp/examples.md b/articles/ai-services/speech-service/includes/how-to/remote-meeting/csharp/examples.md index 9ee22478d77..512a1a5f7ed 100644 --- a/articles/ai-services/speech-service/includes/how-to/remote-meeting/csharp/examples.md +++ b/articles/ai-services/speech-service/includes/how-to/remote-meeting/csharp/examples.md @@ -2,16 +2,16 @@ author: eric-urban ms.service: azure-ai-speech ms.topic: include -ms.date: 07/26/2022 +ms.date: 9/9/2024 ms.author: eur ms.custom: devx-track-csharp --- ## Upload the audio -The first step for asynchronous transcription is to send the audio to the Meeting Transcription Service using the Speech SDK. +The first step for asynchronous transcription is to send the audio to the conversation transcription service using the Speech SDK. -This example code shows how to create a `MeetingTranscriber` for asynchronous-only mode. 
In order to stream audio to the transcriber, you add audio streaming code derived from [Transcribe meetings in real-time with the Speech SDK](../../../../how-to-use-meeting-transcription.md). +This example code shows how to use conversation transcription in asynchronous-only mode. In order to stream audio to the transcriber, you need to add audio streaming code derived from the [real-time conversation transcription quickstart](../../../../how-to-use-meeting-transcription.md). ```csharp async Task CompleteContinuousRecognition(MeetingTranscriber recognizer, string meetingId) diff --git a/articles/ai-services/speech-service/includes/how-to/remote-meeting/java/examples.md b/articles/ai-services/speech-service/includes/how-to/remote-meeting/java/examples.md index 95e1d317ea4..1fb14127629 100644 --- a/articles/ai-services/speech-service/includes/how-to/remote-meeting/java/examples.md +++ b/articles/ai-services/speech-service/includes/how-to/remote-meeting/java/examples.md @@ -2,15 +2,15 @@ author: eric-urban ms.service: azure-ai-speech ms.topic: include -ms.date: 04/25/2022 +ms.date: 9/9/2024 ms.author: eur --- ## Upload the audio -Before asynchronous transcription can be performed, you need to send the audio to Meeting Transcription Service using the Speech SDK. +Before asynchronous conversation transcription can be performed, you need to send the audio to the conversation transcription service using the Speech SDK. -This example code shows how to create meeting transcriber for asynchronous-only mode. In order to stream audio to the transcriber, you will need to add audio streaming code derived from [Transcribe meetings in real-time with the Speech SDK](../../../../how-to-use-meeting-transcription.md). Refer to the **Limitations** section of that topic to see the supported platforms and languages APIs. +This example code shows how to use conversation transcription in asynchronous-only mode. In order to stream audio to the transcriber, you need to add audio streaming code derived from the [real-time conversation transcription quickstart](../../../../how-to-use-meeting-transcription.md). Refer to the **Limitations** section of that topic to see the supported platforms and languages APIs. ```java // Create the speech config object @@ -124,7 +124,7 @@ You can obtain **remote-meeting** by editing your pom.xml file as follows. ### Sample transcription code -After you have the `meetingId`, create a remote meeting transcription client **RemoteMeetingTranscriptionClient** at the client application to query the status of the asynchronous transcription. Use **GetTranscriptionOperation** method in **RemoteMeetingTranscriptionClient** to get a [PollerFlux](https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/core/azure-core/src/main/java/com/azure/core/util/polling/PollerFlux.java) object. The PollerFlux object will have information about the remote operation status **RemoteMeetingTranscriptionOperation** and the final result **RemoteMeetingTranscriptionResult**. Once the operation has finished, get **RemoteMeetingTranscriptionResult** by calling **getFinalResult** on a [SyncPoller](https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/core/azure-core/src/main/java/com/azure/core/util/polling/SyncPoller.java). In this code we simply print the result contents to system output. +After you have the `meetingId`, create a remote meeting transcription client **RemoteMeetingTranscriptionClient** at the client application to query the status of the asynchronous transcription. 
Use **GetTranscriptionOperation** method in **RemoteMeetingTranscriptionClient** to get a [PollerFlux](https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/core/azure-core/src/main/java/com/azure/core/util/polling/PollerFlux.java) object. The PollerFlux object has information about the remote operation status **RemoteMeetingTranscriptionOperation** and the final result **RemoteMeetingTranscriptionResult**. Once the operation is finished, get **RemoteMeetingTranscriptionResult** by calling **getFinalResult** on a [SyncPoller](https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/core/azure-core/src/main/java/com/azure/core/util/polling/SyncPoller.java). In this code, we print the result contents to system output. ```java // Create the speech config object diff --git a/articles/ai-services/speech-service/meeting-transcription.md b/articles/ai-services/speech-service/meeting-transcription.md index 2522266e1bd..29a2d7aed46 100644 --- a/articles/ai-services/speech-service/meeting-transcription.md +++ b/articles/ai-services/speech-service/meeting-transcription.md @@ -66,7 +66,7 @@ Conversation transcription uses two types of inputs: - **Multi-channel audio stream:** For specification and design details, see [Microphone array recommendations](./speech-sdk-microphone.md). - **User voice samples:** Conversation transcription needs user profiles in advance of the conversation for speaker identification. Collect audio recordings from each user, and then send the recordings to the [signature generation service](https://aka.ms/cts/signaturegenservice) to validate the audio and generate user profiles. -User voice samples for voice signatures are required for speaker identification. Speakers who don't have voice samples are recognized as *unidentified*. Unidentified speakers can still be differentiated when the `DifferentiateGuestSpeakers` property is enabled (see the following example). The transcription output then shows speakers as, for example, *Guest_0* and *Guest_1*, instead of recognizing them as pre-enrolled specific speaker names. +User voice samples for voice signatures are required for speaker identification. Speakers who don't have voice samples are recognized as *unidentified*. Unidentified speakers can still be differentiated when the `DifferentiateGuestSpeakers` property is enabled (see the following example). The transcription output then shows speakers as, for example, *Guest_0* and *Guest_1*, instead of recognizing them as preenrolled specific speaker names. ```csharp config.SetProperty("DifferentiateGuestSpeakers", "true"); From 6a754ccec0158713042f07c4d9e240d688a4fe01 Mon Sep 17 00:00:00 2001 From: Eric Urban Date: Sat, 7 Sep 2024 07:39:42 -0700 Subject: [PATCH 4/4] Update translator-overview.md --- articles/ai-services/translator/translator-overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/articles/ai-services/translator/translator-overview.md b/articles/ai-services/translator/translator-overview.md index a94d5e148d1..4328285d275 100644 --- a/articles/ai-services/translator/translator-overview.md +++ b/articles/ai-services/translator/translator-overview.md @@ -12,7 +12,7 @@ ms.author: lajanuar # What is Azure AI Translator? -Translator Service is a cloud-based neural machine translation service that is part of the [Azure AI services](../what-are-ai-services.md) family and can be used with any operating system. 
Translator powers many Microsoft products and services used by thousands of businesses worldwide for language translation and other language-related operations. In this overview, you learn how Translator can enable you to build intelligent, multi-language solutions for your applications across all [supported languages](./language-support.md). +Azure AI Translator is a cloud-based neural machine translation service that is part of the [Azure AI services](../what-are-ai-services.md) family and can be used with any operating system. Translator powers many Microsoft products and services used by thousands of businesses worldwide for language translation and other language-related operations. In this overview, you learn how Translator can enable you to build intelligent, multi-language solutions for your applications across all [supported languages](./language-support.md). ## Translator features and development options
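
To make the overview concrete, the following minimal C# sketch calls the Text Translation REST API (v3.0) to translate a string into Spanish. The key and region values are placeholders for your own Translator resource, and error handling is omitted; this is an illustrative sketch rather than the full quickstart code.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class TranslatorSketch
{
    static async Task Main()
    {
        // Placeholders: use the key and region of your own Translator resource.
        const string key = "YOUR_TRANSLATOR_KEY";
        const string region = "YOUR_RESOURCE_REGION";
        const string route = "https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&from=en&to=es";

        using var client = new HttpClient();
        using var request = new HttpRequestMessage(HttpMethod.Post, route);
        request.Headers.Add("Ocp-Apim-Subscription-Key", key);
        request.Headers.Add("Ocp-Apim-Subscription-Region", region);
        request.Content = new StringContent("[{\"Text\":\"Hello, world\"}]", Encoding.UTF8, "application/json");

        var response = await client.SendAsync(request);
        response.EnsureSuccessStatusCode();
        Console.WriteLine(await response.Content.ReadAsStringAsync());
    }
}
```

A successful call returns a JSON array with one entry per input text, and each entry lists the requested translations.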