Commit
Merge pull request #180 from MicrosoftDocs/main
Publish to live, Sunday 4 AM PST, 9/8
ttorble authored Sep 8, 2024
2 parents c6d7eeb + 164b65b commit f07b06a
Showing 12 changed files with 85 additions and 66 deletions.
@@ -1,19 +1,25 @@
---
-title: Asynchronous meeting transcription - Speech service
+title: Asynchronous conversation transcription - Speech service
titleSuffix: Azure AI services
-description: Learn how to use asynchronous meeting transcription using the Speech service. Available for Java and C# only.
+description: Learn how to use asynchronous conversation transcription using the Speech service. Available for Java and C# only.
manager: nitinme
ms.service: azure-ai-speech
ms.topic: how-to
-ms.date: 1/18/2024
+ms.date: 9/9/2024
ms.devlang: csharp
ms.custom: cogserv-non-critical-speech, devx-track-csharp, devx-track-extended-java
zone_pivot_groups: programming-languages-set-twenty-one
---

-# Asynchronous meeting transcription
+# Asynchronous conversation transcription multichannel diarization

-In this article, asynchronous meeting transcription is demonstrated using the **RemoteMeetingTranscriptionClient** API. If you have configured meeting transcription to do asynchronous transcription and have a `meetingId`, you can obtain the transcription associated with that `meetingId` using the **RemoteMeetingTranscriptionClient** API.
+> [!NOTE]
+> This feature is currently in public preview. This preview is provided without a service-level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).
+This article demonstrates asynchronous conversation transcription multichannel diarization using the **RemoteMeetingTranscriptionClient** API. If you configured conversation transcription to do asynchronous transcription and have a `meetingId`, you can use the **RemoteMeetingTranscriptionClient** API to obtain the transcription associated with that `meetingId`.

+> [!IMPORTANT]
+> Conversation transcription multichannel diarization (preview) is retiring on March 28, 2025. For more information about migrating to other speech to text features, see [Migrate away from conversation transcription multichannel diarization](meeting-transcription.md#migrate-away-from-conversation-transcription-multichannel-diarization).
## Asynchronous vs. real-time + asynchronous

@@ -32,7 +38,7 @@ Two steps are required to accomplish asynchronous transcription. The first step
::: zone-end


-## Next steps
+## Related content

-> [!div class="nextstepaction"]
-> [Explore our samples on GitHub](https://aka.ms/csspeech/samples)
+- [Try the real-time diarization quickstart](get-started-stt-diarization.md)
+- [Try batch transcription with diarization](batch-transcription.md)
@@ -1,28 +1,34 @@
---
-title: Real-time meeting transcription quickstart - Speech service
+title: Real-time conversation transcription multichannel diarization quickstart - Speech service
titleSuffix: Azure AI services
description: In this quickstart, learn how to transcribe meetings. You can add, remove, and identify multiple participants by streaming audio to the Speech service.
author: eric-urban
manager: nitinme
ms.service: azure-ai-speech
ms.topic: quickstart
-ms.date: 1/21/2024
+ms.date: 9/9/2024
ms.author: eur
zone_pivot_groups: acs-js-csharp-python
ms.custom: cogserv-non-critical-speech, references_regions, devx-track-extended-java, devx-track-js, devx-track-python
---

-# Quickstart: Real-time meeting transcription
+# Quickstart: Real-time conversation transcription multichannel diarization (preview)

-You can transcribe meetings with the ability to add, remove, and identify multiple participants by streaming audio to the Speech service. You first create voice signatures for each participant using the REST API, and then use the voice signatures with the Speech SDK to transcribe meetings. See the meeting transcription [overview](meeting-transcription.md) for more information.
+> [!NOTE]
+> This feature is currently in public preview. This preview is provided without a service-level agreement, and is not recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).
+With conversation transcription multichannel diarization, you can transcribe meetings with the ability to add, remove, and identify multiple participants by streaming audio to the Speech service. You first create voice signatures for each participant using the REST API, and then use the voice signatures with the Speech SDK to transcribe meetings. See the conversation transcription [overview](meeting-transcription.md) for more information.

+> [!IMPORTANT]
+> Conversation transcription multichannel diarization (preview) is retiring on March 28, 2025. For more information about migrating to other speech to text features, see [Migrate away from conversation transcription multichannel diarization](meeting-transcription.md#migrate-away-from-conversation-transcription-multichannel-diarization).
## Limitations

* Only available in the following subscription regions: `centralus`, `eastasia`, `eastus`, `westeurope`
* Requires a 7-mic circular multi-microphone array. The microphone array should meet [our specification](./speech-sdk-microphone.md).

> [!NOTE]
-> The Speech SDK for C++, Java, Objective-C, and Swift support meeting transcription, but we haven't yet included a guide here.
+> For the conversation transcription multichannel diarization feature, use `MeetingTranscriber` instead of `ConversationTranscriber`, and use `CreateMeetingAsync` instead of `CreateConversationAsync`.
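
To make the renaming in that note concrete, here's a minimal C# sketch of joining a meeting with the `Meeting` and `MeetingTranscriber` types. The key, region, and file name are placeholder values, and the exact API shapes should be confirmed against the Speech SDK reference; this isn't the article's own sample.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Transcription;

class MeetingNamingSketch
{
    static async Task Main()
    {
        // Placeholder key, region, and audio file.
        var speechConfig = SpeechConfig.FromSubscription("YourSubscriptionKey", "centralus");
        var audioConfig = AudioConfig.FromWavFileInput("meeting.wav");

        // Meeting.CreateMeetingAsync and MeetingTranscriber replace the
        // CreateConversationAsync and ConversationTranscriber names mentioned above.
        var meeting = await Meeting.CreateMeetingAsync(speechConfig, Guid.NewGuid().ToString());
        var transcriber = new MeetingTranscriber(audioConfig);
        await transcriber.JoinMeetingAsync(meeting);
        await transcriber.StartTranscribingAsync();
    }
}
```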
::: zone pivot="programming-language-javascript"
[!INCLUDE [JavaScript Basics include](includes/how-to/meeting-transcription/real-time-javascript.md)]
@@ -36,7 +42,7 @@ You can transcribe meetings with the ability to add, remove, and identify multip
[!INCLUDE [Python Basics include](includes/how-to/meeting-transcription/real-time-python.md)]
::: zone-end

-## Next steps
+## Related content

-> [!div class="nextstepaction"]
-> [Asynchronous meeting transcription](how-to-async-meeting-transcription.md)
+- [Try the real-time diarization quickstart](get-started-stt-diarization.md)
+- [Try batch transcription with diarization](batch-transcription.md)
@@ -1,7 +1,7 @@
---
author: eric-urban
ms.service: azure-ai-speech
-ms.date: 08/07/2024
+ms.date: 9/9/2024
ms.topic: include
ms.author: eur
---
@@ -2,7 +2,7 @@
author: eric-urban
ms.service: azure-ai-speech
ms.topic: include
-ms.date: 01/24/2022
+ms.date: 9/9/2024
ms.author: eur
---

@@ -11,11 +11,12 @@ ms.author: eur
[!INCLUDE [Prerequisites](../../common/azure-prerequisites.md)]

## Set up the environment

The Speech SDK is available as a [NuGet package](https://www.nuget.org/packages/Microsoft.CognitiveServices.Speech) and implements .NET Standard 2.0. You install the Speech SDK later in this guide, but first check the [platform-specific installation instructions](../../../quickstarts/setup-platform.md?pivots=programming-language-csharp) for any more requirements.

## Create voice signatures

-If you want to enroll user profiles, the first step is to create voice signatures for the meeting participants so that they can be identified as unique speakers. This isn't required if you don't want to use pre-enrolled user profiles to identify specific participants.
+If you want to enroll user profiles, the first step is to create voice signatures for the meeting participants so that they can be identified as unique speakers. This isn't required if you don't want to use preenrolled user profiles to identify specific participants.

The input `.wav` audio file for creating voice signatures must be 16-bit, 16-kHz sample rate, in single channel (mono) format. The recommended length for each audio sample is between 30 seconds and two minutes. An audio sample that is too short results in reduced accuracy when recognizing the speaker. The `.wav` file should be a sample of one person's voice so that a unique voice profile is created.
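
The article's full sample wraps this call in a `GetVoiceSignatureString()` helper. Here's a rough sketch of what such a helper might look like; the endpoint URL and header name are assumptions rather than text from this commit, and the real sample parses the signature field out of the JSON response instead of returning the raw body.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

class VoiceSignatureSketch
{
    // Assumed endpoint shape for the voice signature REST API; substitute your own key and region.
    static async Task<string> GetVoiceSignatureString(string subscriptionKey, string region, string waveFilePath)
    {
        using var client = new HttpClient();
        client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);

        // Post the 16-bit, 16-kHz, mono .wav sample described above.
        var content = new ByteArrayContent(await File.ReadAllBytesAsync(waveFilePath));
        var response = await client.PostAsync(
            $"https://signature.{region}.cts.speech.microsoft.com/api/v1/Signature/GenerateVoiceSignatureFromFormData",
            content);
        response.EnsureSuccessStatusCode();

        // The response body contains the voice signature data that the Speech SDK expects.
        return await response.Content.ReadAsStringAsync();
    }
}
```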

@@ -89,7 +90,7 @@ Running the function `GetVoiceSignatureString()` returns a voice signature strin
## Transcribe meetings

-The following sample code demonstrates how to transcribe meetings in real-time for two speakers. It assumes you've already created voice signature strings for each speaker as shown above. Substitute real information for `subscriptionKey`, `region`, and the path `filepath` for the audio you want to transcribe.
+The following sample code demonstrates how to transcribe meetings in real time for two speakers. It assumes that you created voice signature strings for each speaker as shown above. Substitute real information for `subscriptionKey`, `region`, and the path `filepath` for the audio you want to transcribe.

If you don't use pre-enrolled user profiles, it takes a few more seconds to complete the first recognition of unknown users as speaker1, speaker2, etc.
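
For orientation before the full sample, here's a condensed sketch of the flow described above: create a meeting, join a `MeetingTranscriber`, add participants from their voice signature strings, and handle `Transcribed` events. The key, region, file path, user IDs, and the property name used to enable the feature are placeholders or assumptions, not the article's exact code.

```csharp
using System;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Transcription;

// Placeholders: substitute your key, region, audio file, and the signature strings created earlier.
string subscriptionKey = "YourSubscriptionKey";
string region = "centralus";
string filepath = "meetingAudio.wav";
string voiceSignatureStringUser1 = "<voice signature JSON for user 1>";
string voiceSignatureStringUser2 = "<voice signature JSON for user 2>";

var config = SpeechConfig.FromSubscription(subscriptionKey, region);
// Assumed property name for enabling meeting (conversation) transcription; confirm against the full sample.
config.SetProperty("ConversationTranscriptionInRoomAndOnline", "true");

var audioInput = AudioConfig.FromWavFileInput(filepath);
var meeting = await Meeting.CreateMeetingAsync(config, Guid.NewGuid().ToString());
var transcriber = new MeetingTranscriber(audioInput);
await transcriber.JoinMeetingAsync(meeting);

// Pre-enrolled participants are matched to speakers by their voice signatures.
await meeting.AddParticipantAsync(Participant.From("user1@example.com", "en-US", voiceSignatureStringUser1));
await meeting.AddParticipantAsync(Participant.From("user2@example.com", "en-US", voiceSignatureStringUser2));

// Each recognized utterance is attributed to a speaker via UserId.
transcriber.Transcribed += (s, e) =>
    Console.WriteLine($"{e.Result.UserId}: {e.Result.Text}");

await transcriber.StartTranscribingAsync();
Console.ReadLine(); // Let the audio play through, then stop.
await transcriber.StopTranscribingAsync();
```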

@@ -2,7 +2,7 @@
author: eric-urban
ms.service: azure-ai-speech
ms.topic: include
-ms.date: 01/24/2022
+ms.date: 9/9/2024
ms.author: eur
---

@@ -58,7 +58,7 @@ Running this script returns a voice signature string in the variable `voiceSigna
## Transcribe meetings

-The following sample code demonstrates how to transcribe meetings in real-time for two speakers. It assumes you've already created voice signature strings for each speaker as shown above. Substitute real information for `subscriptionKey`, `region`, and the path `filepath` for the audio you want to transcribe.
+The following sample code demonstrates how to transcribe meetings in real time for two speakers. It assumes that you created voice signature strings for each speaker as shown above. Substitute real information for `subscriptionKey`, `region`, and the path `filepath` for the audio you want to transcribe.

If you don't use pre-enrolled user profiles, it takes a few more seconds to complete the first recognition of unknown users as speaker1, speaker2, etc.

@@ -72,7 +72,7 @@ This sample code does the following:
* Creates a `MeetingTranscriber` using the constructor.
* Adds participants to the meeting. The strings `voiceSignatureStringUser1` and `voiceSignatureStringUser2` are the output of the previous steps.
* Registers for events and begins transcription.
-* If you want to differentiate speakers without providing voice samples, please enable `DifferentiateGuestSpeakers` feature as in [Meeting Transcription Overview](../../../meeting-transcription.md).
+* If you want to differentiate speakers without providing voice samples, enable the `DifferentiateGuestSpeakers` feature as described in the [Meeting Transcription Overview](../../../meeting-transcription.md).

If speaker identification or differentiation is enabled, then even if you already received `transcribed` results, the service is still evaluating them against the accumulated audio. If the service finds that a previous result was assigned an incorrect `speakerId`, it sends a nearly identical `Transcribed` result again, in which only the `speakerId` and `UtteranceId` differ. Because the `UtteranceId` format is `{index}_{speakerId}_{Offset}`, when you receive a `transcribed` result you can use `UtteranceId` to determine whether the current result corrects a previous one. Your client or UI logic can then decide how to respond, for example by overwriting the previous output or ignoring the latest result.
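
As an illustration of that correction rule, a client could key its output on the utterance index parsed from `UtteranceId`. The sketch below is written in C# for brevity, but the tracking logic is language neutral, and the class is hypothetical rather than part of the SDK:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical client-side tracker: decides whether a new "transcribed" result
// corrects an earlier one, based on the documented UtteranceId format {index}_{speakerId}_{Offset}.
class UtteranceTracker
{
    private readonly Dictionary<string, (string SpeakerId, string Text)> _byIndex = new();

    public void OnTranscribed(string utteranceId, string speakerId, string text)
    {
        var index = utteranceId.Split('_')[0];

        if (_byIndex.TryGetValue(index, out var previous) && previous.SpeakerId != speakerId)
        {
            // Same utterance index, different speaker: the service reassigned the speaker,
            // so overwrite the earlier output (or choose to ignore the update instead).
            Console.WriteLine($"Correcting utterance {index}: {previous.SpeakerId} -> {speakerId}");
        }

        _byIndex[index] = (speakerId, text);
        Console.WriteLine($"[{speakerId}] {text}");
    }
}
```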

@@ -2,7 +2,7 @@
author: jyotsna-ravi
ms.service: azure-ai-speech
ms.topic: include
-ms.date: 11/11/2022
+ms.date: 9/9/2024
ms.author: jyravi
---

@@ -58,7 +58,7 @@ You can use these two voice_signature_string as input to the variables `voice_si
## Transcribe meetings

-The following sample code demonstrates how to transcribe meetings in real-time for two speakers. It assumes you've already created voice signature strings for each speaker as shown previously. Substitute real information for `subscriptionKey`, `region`, and the path `filepath` for the audio you want to transcribe.
+The following sample code demonstrates how to transcribe meetings in real time for two speakers. It assumes that you created voice signature strings for each speaker as shown previously. Substitute real information for `subscriptionKey`, `region`, and the path `filepath` for the audio you want to transcribe.

If you don't use pre-enrolled user profiles, it takes a few more seconds to complete the first recognition of unknown users as speaker1, speaker2, etc.

@@ -75,7 +75,7 @@ Here's what the sample does:
* Reads the whole wave file at once, streams it to the SDK, and begins transcription.
* If you want to differentiate speakers without providing voice samples, you enable the `DifferentiateGuestSpeakers` feature as in [Meeting Transcription Overview](../../../meeting-transcription.md).

-If speaker identification or differentiate is enabled, then even if you have already received `transcribed` results, the service is still evaluating them by accumulated audio information. If the service finds that any previous result was assigned an incorrect `speakerId`, then a nearly identical `Transcribed` result is sent again, where only the `speakerId` and `UtteranceId` are different. Since the `UtteranceId` format is `{index}_{speakerId}_{Offset}`, when you receive a `transcribed` result, you could use `UtteranceId` to determine if the current `transcribed` result is going to correct a previous one. Your client or UI logic could decide behaviors, like overwriting previous output, or to ignore the latest result.
+If speaker identification or differentiation is enabled, then even if you already received `transcribed` results, the service is still evaluating them against the accumulated audio. If the service finds that a previous result was assigned an incorrect `speakerId`, it sends a nearly identical `Transcribed` result again, in which only the `speakerId` and `UtteranceId` differ. Because the `UtteranceId` format is `{index}_{speakerId}_{Offset}`, when you receive a `transcribed` result you can use `UtteranceId` to determine whether the current result corrects a previous one. Your client or UI logic can then decide how to respond, for example by overwriting the previous output or ignoring the latest result.

```python
import azure.cognitiveservices.speech as speechsdk
@@ -2,16 +2,16 @@
author: eric-urban
ms.service: azure-ai-speech
ms.topic: include
-ms.date: 07/26/2022
+ms.date: 9/9/2024
ms.author: eur
ms.custom: devx-track-csharp
---

## Upload the audio

-The first step for asynchronous transcription is to send the audio to the Meeting Transcription Service using the Speech SDK.
+The first step for asynchronous transcription is to send the audio to the conversation transcription service using the Speech SDK.

-This example code shows how to create a `MeetingTranscriber` for asynchronous-only mode. In order to stream audio to the transcriber, you add audio streaming code derived from [Transcribe meetings in real-time with the Speech SDK](../../../../how-to-use-meeting-transcription.md).
+This example code shows how to use conversation transcription in asynchronous-only mode. To stream audio to the transcriber, you need to add audio streaming code derived from the [real-time conversation transcription quickstart](../../../../how-to-use-meeting-transcription.md).

```csharp
async Task CompleteContinuousRecognition(MeetingTranscriber recognizer, string meetingId)
@@ -2,15 +2,15 @@
author: eric-urban
ms.service: azure-ai-speech
ms.topic: include
-ms.date: 04/25/2022
+ms.date: 9/9/2024
ms.author: eur
---

## Upload the audio

-Before asynchronous transcription can be performed, you need to send the audio to Meeting Transcription Service using the Speech SDK.
+Before asynchronous conversation transcription can be performed, you need to send the audio to the conversation transcription service using the Speech SDK.

-This example code shows how to create meeting transcriber for asynchronous-only mode. In order to stream audio to the transcriber, you will need to add audio streaming code derived from [Transcribe meetings in real-time with the Speech SDK](../../../../how-to-use-meeting-transcription.md). Refer to the **Limitations** section of that topic to see the supported platforms and languages APIs.
+This example code shows how to use conversation transcription in asynchronous-only mode. To stream audio to the transcriber, you need to add audio streaming code derived from the [real-time conversation transcription quickstart](../../../../how-to-use-meeting-transcription.md). Refer to the **Limitations** section of that topic to see the supported platforms and language APIs.

```java
// Create the speech config object
@@ -124,7 +124,7 @@ You can obtain **remote-meeting** by editing your pom.xml file as follows.

### Sample transcription code

-After you have the `meetingId`, create a remote meeting transcription client **RemoteMeetingTranscriptionClient** at the client application to query the status of the asynchronous transcription. Use **GetTranscriptionOperation** method in **RemoteMeetingTranscriptionClient** to get a [PollerFlux](https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/core/azure-core/src/main/java/com/azure/core/util/polling/PollerFlux.java) object. The PollerFlux object will have information about the remote operation status **RemoteMeetingTranscriptionOperation** and the final result **RemoteMeetingTranscriptionResult**. Once the operation has finished, get **RemoteMeetingTranscriptionResult** by calling **getFinalResult** on a [SyncPoller](https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/core/azure-core/src/main/java/com/azure/core/util/polling/SyncPoller.java). In this code we simply print the result contents to system output.
+After you have the `meetingId`, create a remote meeting transcription client **RemoteMeetingTranscriptionClient** at the client application to query the status of the asynchronous transcription. Use the **GetTranscriptionOperation** method in **RemoteMeetingTranscriptionClient** to get a [PollerFlux](https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/core/azure-core/src/main/java/com/azure/core/util/polling/PollerFlux.java) object. The PollerFlux object has information about the remote operation status **RemoteMeetingTranscriptionOperation** and the final result **RemoteMeetingTranscriptionResult**. Once the operation is finished, get the **RemoteMeetingTranscriptionResult** by calling **getFinalResult** on a [SyncPoller](https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/core/azure-core/src/main/java/com/azure/core/util/polling/SyncPoller.java). In this code, we print the result contents to system output.

```java
// Create the speech config object