feat: cache tts audio 缓存tts语音 #5650

Open
wants to merge 7 commits into
base: main
Changes from 4 commits
1 change: 1 addition & 0 deletions app/client/api.ts
@@ -39,6 +39,7 @@ export interface MultimodalContent {
export interface RequestMessage {
role: MessageRole;
content: string | MultimodalContent[];
audio_url?: string;
Review comment (Contributor):
This needs more testing with different models. The message-handling logic in the various client/platform/xxxx.ts files may not be consistent, so several models should be tested to see whether adding a field here has any effect.

Review comment (Member):
This should extend the ChatMessage type rather than RequestMessage. RequestMessage holds the model parameters, whereas audio_url is an extension property.

}
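The Member's suggestion, keeping `audio_url` off the request type, could look roughly like the sketch below. The type shapes are simplified stand-ins (for instance, `content` is narrowed to `string`), and `toRequestMessage` is a hypothetical helper, not something taken from the project.

```typescript
// Sketch only: simplified stand-ins, not the project's real definitions.
type MessageRole = "system" | "user" | "assistant";

// Payload sent to the model: kept free of UI-only fields.
interface RequestMessage {
  role: MessageRole;
  content: string;
}

// Client-side message: extends the request shape with extension properties.
interface ChatMessage extends RequestMessage {
  id: string;
  audio_url?: string; // cached TTS audio, never sent to the model
}

// Hypothetical helper: project a stored message down to the request payload.
function toRequestMessage(message: ChatMessage): RequestMessage {
  return { role: message.role, content: message.content };
}

const stored: ChatMessage = {
  id: "m1",
  role: "assistant",
  content: "Hello",
  audio_url: "https://example.com/audio/m1.wav",
};
const payload = toRequestMessage(stored);
console.log(Object.keys(payload)); // only role and content remain
```

With this split, provider-specific clients never see the extra field, which would also address the Contributor's concern about differing message handling per platform.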

export interface LLMConfig {
14 changes: 11 additions & 3 deletions app/components/chat.module.scss
@@ -430,7 +430,7 @@

.chat-message-item {
box-sizing: border-box;
- max-width: 100%;
+ max-width: 300px;
margin-top: 10px;
border-radius: 10px;
background-color: rgba(0, 0, 0, 0.05);
@@ -443,6 +443,10 @@
transition: all ease 0.3s;
}

+ .audio-message {
+ min-width: 350px;
+ }
+
.chat-message-item-image {
width: 100%;
margin-top: 10px;
@@ -471,6 +475,10 @@
border: rgba($color: #888, $alpha: 0.2) 1px solid;
}

+ .chat-message-item-audio {
+ margin-top: 10px;
+ width: 100%;
+ }

@media only screen and (max-width: 600px) {
$calc-image-width: calc(100vw/3*2/var(--image-count));
@@ -519,7 +527,7 @@
background-color: var(--second);

&:hover {
- min-width: 0;
+ //min-width: 350px;
}
}

@@ -693,4 +701,4 @@
.shortcut-key span {
font-size: 12px;
color: var(--black);
- }
+ }
63 changes: 48 additions & 15 deletions app/components/chat.tsx
@@ -117,7 +117,7 @@ import { MultimodalContent } from "../client/api";

const localStorage = safeLocalStorage();
import { ClientApi } from "../client/api";
- import { createTTSPlayer } from "../utils/audio";
+ import { createTTSPlayer, arrayBufferToWav } from "../utils/audio";
import { MsEdgeTTS, OUTPUT_FORMAT } from "../utils/ms_edge_tts";

const ttsPlayer = createTTSPlayer();
@@ -1121,6 +1121,15 @@ function _Chat() {
);
};

+ const updateMessageAudio = (msgId?: string, audio_url?: string) => {
+ chatStore.updateCurrentSession(
+ (session) =>
+ (session.messages = session.messages.map((m) =>
+ m.id === msgId ? { ...m, audio_url } : m,
+ )),
Dakai marked this conversation as resolved.
+ );
+ };
+
const onDelete = (msgId: string) => {
deleteMessage(msgId);
};
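Later in this diff, `openaiSpeech` is declared to return `Promise<string | undefined>`, so a failed request would pass `undefined` into `updateMessageAudio` and overwrite any URL already stored on the message. A minimal sketch of a pure update that skips empty results follows; `Message` and `withMessageAudio` are illustrative names, not the project's.

```typescript
// Illustrative shape; the project's message type has more fields.
interface Message {
  id: string;
  audio_url?: string;
}

// Returns a new array with the URL attached, or the input unchanged
// when there is no message id or no URL to store.
function withMessageAudio(
  messages: Message[],
  msgId?: string,
  audioUrl?: string,
): Message[] {
  if (!msgId || !audioUrl) return messages;
  return messages.map((m) =>
    m.id === msgId ? { ...m, audio_url: audioUrl } : m,
  );
}

const stored: Message[] = [
  { id: "a" },
  { id: "b", audio_url: "https://example.com/b.wav" },
];
const afterFailure = withMessageAudio(stored, "b", undefined);
const afterSuccess = withMessageAudio(stored, "a", "https://example.com/a.wav");
console.log(afterFailure === stored, afterSuccess[0].audio_url);
```

Keeping the update pure also avoids the assignment inside the arrow-function expression, which can be easy to misread.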
@@ -1197,7 +1206,7 @@ function _Chat() {
const accessStore = useAccessStore();
const [speechStatus, setSpeechStatus] = useState(false);
const [speechLoading, setSpeechLoading] = useState(false);
- async function openaiSpeech(text: string) {
+ async function openaiSpeech(text: string): Promise<string | undefined> {
if (speechStatus) {
ttsPlayer.stop();
setSpeechStatus(false);
@@ -1227,16 +1236,22 @@ function _Chat() {
});
}
setSpeechStatus(true);
- ttsPlayer
- .play(audioBuffer, () => {
- setSpeechStatus(false);
- })
- .catch((e) => {
- console.error("[OpenAI Speech]", e);
- showToast(prettyObject(e));
+ try {
+ const waveFile = arrayBufferToWav(audioBuffer);
lloydzhou marked this conversation as resolved.
+ const audioFile = new Blob([waveFile], { type: "audio/wav" });
Review comment (Contributor):
WAV files are fairly large. Could MP3 be added later as a format for saving?

+
+ const audioUrl: string = await uploadImageRemote(audioFile);
+ await ttsPlayer.play(audioBuffer, () => {
Review comment (Contributor):
Playback should not wait until the audio has been saved. That adds latency and makes for a poor experience.

setSpeechStatus(false);
- })
- .finally(() => setSpeechLoading(false));
+ });
+ return audioUrl;
+ } catch (e) {
+ console.error("[OpenAI Speech]", e);
Dakai marked this conversation as resolved.
+ showToast(prettyObject(e));
+ setSpeechStatus(false);
+ } finally {
+ setSpeechLoading(false);
+ }
}
}
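One way to address the reviewer's latency concern is to start playback immediately and let the upload run in the background. The sketch below assumes injected `play` and `upload` functions standing in for `ttsPlayer.play` and `uploadImageRemote`; their real signatures may differ, and `playThenCache` is a hypothetical name.

```typescript
// Hypothetical stand-ins for ttsPlayer.play and uploadImageRemote.
type Play = (audio: ArrayBuffer) => Promise<void>;
type Upload = (audio: ArrayBuffer) => Promise<string>;

async function playThenCache(
  audio: ArrayBuffer,
  play: Play,
  upload: Upload,
): Promise<string | undefined> {
  // Start playback first so the listener is not kept waiting on the network.
  const playing = play(audio);
  // Upload concurrently; a failed upload only means nothing gets cached.
  const uploaded = upload(audio).catch(() => undefined);
  await playing;
  return uploaded;
}

// Usage with fakes that record the order in which work starts.
const events: string[] = [];
const fakePlay: Play = async () => {
  events.push("play started");
};
const fakeUpload: Upload = async () => {
  events.push("upload started");
  return "https://example.com/tts/demo.wav";
};
const result = playThenCache(new ArrayBuffer(8), fakePlay, fakeUpload);
result.then((url) => console.log(events, url));
```

In this arrangement an upload error no longer prevents playback; whether a failed upload should still surface a toast is a separate design choice.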

@@ -1793,9 +1808,12 @@ function _Chat() {
<SpeakIcon />
)
}
- onClick={() =>
- openaiSpeech(getMessageTextContent(message))
- }
+ onClick={async () => {
+ const url = await openaiSpeech(
+ getMessageTextContent(message),
+ );
+ updateMessageAudio(message.id, url);
+ }}
/>
)}
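In the hunks shown here, the click handler calls `openaiSpeech` regardless of whether `message.audio_url` is already set. If the goal is to avoid repeated TTS requests, a lookup along the following lines might help. This is a sketch under assumptions: `SpokenMessage`, `Synthesize`, and `resolveAudioUrl` are hypothetical names, and `synthesize` stands in for a function like `openaiSpeech`.

```typescript
// Illustrative shape; the project's message type has more fields.
interface SpokenMessage {
  id: string;
  text: string;
  audio_url?: string;
}

// Stand-in for a function like openaiSpeech that resolves to the uploaded URL.
type Synthesize = (text: string) => Promise<string | undefined>;

async function resolveAudioUrl(
  message: SpokenMessage,
  synthesize: Synthesize,
): Promise<string | undefined> {
  // Cache hit: reuse the stored URL and skip the TTS request entirely.
  if (message.audio_url) return message.audio_url;
  return synthesize(message.text);
}

// Usage: the second call is served from the cached URL.
let requests = 0;
const fakeSynthesize: Synthesize = async (text) => {
  requests += 1;
  return `https://example.com/tts/${text.length}.wav`;
};

const demoMessage: SpokenMessage = { id: "m1", text: "Hello" };
const demo = resolveAudioUrl(demoMessage, fakeSynthesize).then(async (url) => {
  demoMessage.audio_url = url;
  const again = await resolveAudioUrl(demoMessage, fakeSynthesize);
  console.log(requests, again === url);
  return { url, again };
});
```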
</>
@@ -1830,7 +1848,11 @@ function _Chat() {
))}
</div>
)}
- <div className={styles["chat-message-item"]}>
+ <div
+ className={`${styles["chat-message-item"]} ${
+ message.audio_url ? styles["audio-message"] : ""
+ }`}
+ >
<Markdown
key={message.streaming ? "loading" : "done"}
content={getMessageTextContent(message)}
@@ -1879,6 +1901,17 @@ function _Chat() {
})}
</div>
)}
+ {message.audio_url && (
+ <audio
+ id="audio"
+ preload="auto"
+ controls
+ className={styles["chat-message-item-audio"]}
+ >
+ <source type="audio/mp3" src={message.audio_url} />
+ Sorry, your browser does not support HTML5 audio.
+ </audio>
Dakai marked this conversation as resolved.
+ )}
</div>

<div className={styles["chat-message-action-date"]}>
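The upload hunk builds a Blob typed `audio/wav`, while the `<source>` element in this hunk declares `audio/mp3`. A hypothetical helper such as `sniffAudioMime` below could derive the type from the first bytes of the audio data instead of hard-coding it. This is only a heuristic sketch: it recognizes RIFF/WAVE headers, ID3 tags, and a bare MPEG frame sync, and nothing else.

```typescript
// True when `bytes` contains the ASCII `text` starting at `offset`.
function hasAscii(bytes: Uint8Array, offset: number, text: string): boolean {
  if (bytes.length < offset + text.length) return false;
  for (let i = 0; i < text.length; i++) {
    if (bytes[offset + i] !== text.charCodeAt(i)) return false;
  }
  return true;
}

// Hypothetical helper: guess a MIME type from the leading bytes of audio data.
function sniffAudioMime(bytes: Uint8Array): string | undefined {
  if (hasAscii(bytes, 0, "RIFF") && hasAscii(bytes, 8, "WAVE")) {
    return "audio/wav";
  }
  if (hasAscii(bytes, 0, "ID3")) {
    return "audio/mpeg"; // MP3 that starts with an ID3v2 tag
  }
  // MPEG audio frames begin with 11 set sync bits.
  if (bytes.length >= 2 && bytes[0] === 0xff && (bytes[1] & 0xe0) === 0xe0) {
    return "audio/mpeg";
  }
  return undefined;
}

// "RIFF", four size bytes, then "WAVE".
const wavHeader = new Uint8Array([
  0x52, 0x49, 0x46, 0x46, 0, 0, 0, 0, 0x57, 0x41, 0x56, 0x45,
]);
const mp3Frame = new Uint8Array([0xff, 0xfb, 0x90, 0x00]);
console.log(sniffAudioMime(wavHeader), sniffAudioMime(mp3Frame));
```

The same check could be run on the buffer before it is wrapped in a WAV header, since a header is only meaningful if the payload really is raw PCM.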
1 change: 1 addition & 0 deletions app/icons/play.svg
1 change: 1 addition & 0 deletions app/icons/stop.svg
9 changes: 9 additions & 0 deletions app/styles/globals.scss
@@ -399,3 +399,12 @@ pre {
.copyable {
user-select: text;
}

audio {
height: 35px;
}

audio::-webkit-media-controls-play-button,
audio::-webkit-media-controls-panel {
background-color: none;
}
Dakai marked this conversation as resolved.
13 changes: 13 additions & 0 deletions app/utils.ts
@@ -251,6 +251,19 @@ export function getMessageImages(message: RequestMessage): string[] {
return urls;
}

//export function getMessageAudio(message: RequestMessage): string[] {
// if (typeof message.content === "string") {
// return [];
// }
// const urls: string[] = [];
// for (const c of message.content) {
// if (c.type === "image_url") {
// urls.push(c.image_url?.url ?? "");
// }
// }
// return urls;
//}
Dakai marked this conversation as resolved.

export function isVisionModel(model: string) {
// Note: This is a better way using the TypeScript feature instead of `&&` or `||` (ts v5.5.0-dev.20240314 I've been using)

51 changes: 51 additions & 0 deletions app/utils/audio.ts
@@ -43,3 +43,54 @@ export function createTTSPlayer(): TTSPlayer {

return { init, play, stop };
}

export function arrayBufferToWav(buffer: ArrayBuffer): ArrayBuffer {
const numOfChannels = 1; // Mono
const sampleRate = 24000; // 24kHz
const bitsPerSample = 16;

const bytesPerSample = bitsPerSample / 8;
const blockAlign = numOfChannels * bytesPerSample;
const byteRate = sampleRate * blockAlign;

// WAV header size is 44 bytes
const wavHeaderSize = 44;
const dataSize = buffer.byteLength;
const totalSize = wavHeaderSize + dataSize;

const wavBuffer = new ArrayBuffer(totalSize);
const view = new DataView(wavBuffer);

// RIFF chunk descriptor
writeString(view, 0, "RIFF");
view.setUint32(4, totalSize - 8, true); // File size minus RIFF header
writeString(view, 8, "WAVE");

// FMT sub-chunk
writeString(view, 12, "fmt ");
view.setUint32(16, 16, true); // Sub-chunk size (16 for PCM)
view.setUint16(20, 1, true); // Audio format (1 for PCM)
view.setUint16(22, numOfChannels, true); // Number of channels
view.setUint32(24, sampleRate, true); // Sample rate
view.setUint32(28, byteRate, true); // Byte rate
view.setUint16(32, blockAlign, true); // Block align
view.setUint16(34, bitsPerSample, true); // Bits per sample

// Data sub-chunk
writeString(view, 36, "data");
view.setUint32(40, dataSize, true); // Data size

// Write the PCM samples
const audioData = new Uint8Array(buffer);
const wavData = new Uint8Array(wavBuffer);
wavData.set(audioData, wavHeaderSize);

return wavBuffer;
}

// Helper function to write a string to the DataView
function writeString(view: DataView, offset: number, string: string) {
for (let i = 0; i < string.length; i++) {
view.setUint8(offset + i, string.charCodeAt(i));
}
}
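To put the reviewer's size concern in numbers, the header constants in `arrayBufferToWav` (24 kHz, mono, 16-bit) imply 48,000 bytes per second of PCM audio. The sketch below computes the resulting file size; the helper names are illustrative, and the 96 kbit/s MP3 bitrate used for comparison is an assumption, not a value taken from this PR.

```typescript
const WAV_HEADER_BYTES = 44;

// Size in bytes of a PCM WAV file with the given duration and format.
function wavByteLength(
  seconds: number,
  sampleRate = 24000,
  channels = 1,
  bitsPerSample = 16,
): number {
  const byteRate = sampleRate * channels * (bitsPerSample / 8);
  return WAV_HEADER_BYTES + Math.round(seconds * byteRate);
}

// Approximate size in bytes of a constant-bitrate stream (headers ignored).
function cbrByteLength(seconds: number, bitsPerSecond: number): number {
  return Math.round((seconds * bitsPerSecond) / 8);
}

const wavSize = wavByteLength(60); // 2880044 bytes for one minute
const mp3Size = cbrByteLength(60, 96000); // 720000 bytes at the assumed bitrate
console.log(wavSize, mp3Size);
```

Under these assumptions one minute of speech is roughly four times larger as WAV, which is the trade-off the reviewer raises.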
Dakai marked this conversation as resolved.
2 changes: 1 addition & 1 deletion package.json
@@ -33,8 +33,8 @@
"html-to-image": "^1.11.11",
"idb-keyval": "^6.2.1",
"lodash-es": "^4.17.21",
- "mermaid": "^10.6.1",
"markdown-to-txt": "^2.0.1",
+ "mermaid": "^10.6.1",
"nanoid": "^5.0.3",
"next": "^14.1.1",
"node-fetch": "^3.3.1",