Add rate boost support for SAPI5 voices #17610

gexgd0419 · 2025-01-11T15:14:37Z

Link to issue number:

Summary of the issue:

SAPI5 voices do not support rate boost, and some of them are not fast enough for experienced users even at the highest rate.

Description of user facing changes

The "rate boost" option will be available to users when using SAPI5 voices. With rate boost enabled, rates from 0.5x to 6x are supported.

If rate boost is disabled, the behavior will be the same as before.

Description of development approach

The Sonic library, which is also used by eSpeak NG, is used to change the speed when rate boost is enabled.

When rate boost is enabled, to preserve quality, the SAPI5 voice is set to output at its original speed (1x), and then Sonic is used to change the speed of the original audio. When rate boost is disabled, Sonic is no longer used to change the speed, and the rate of the SAPI5 voice itself is set instead to preserve the previous behavior.
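A minimal sketch of the split described above, not the code in this PR. The function name `plan_speed`, its parameters, and the clamping are assumptions for illustration. The only outside fact used is that a SAPI5 voice rate of 0 means the voice's normal speed. A Sonic factor of 1.0 stands in here for "Sonic is not used".

```python
def plan_speed(rate_boost: bool, voice_rate: int, boost_factor: float) -> tuple[int, float]:
    """Return (rate to set on the SAPI5 voice, speed factor for Sonic).

    Hypothetical helper: voice_rate uses the SAPI5 scale, where 0 is the
    voice's normal speed. A factor of 1.0 means the audio is left unchanged.
    """
    if rate_boost:
        # Assumption: clamp to the 0.5x-6x range quoted in the description.
        factor = min(max(boost_factor, 0.5), 6.0)
        # The voice speaks at its original speed; Sonic does all the speed-up.
        return 0, factor
    # Rate boost off: the voice's own rate is used, as before this change.
    return voice_rate, 1.0
```

For example, `plan_speed(True, 7, 3.0)` ignores the voice rate and asks Sonic for 3x, while `plan_speed(False, 7, 3.0)` keeps the voice rate of 7 and leaves the audio untouched.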

As Sonic is used by eSpeak NG, it is already included in the NVDA repo as a submodule (/include/sonic/). However, for eSpeak NG it is compiled as a static library, which cannot easily be reused. So some build steps are changed to build Sonic as a DLL, sonic.dll, which is installed in the synthDrivers folder. eSpeak NG is also changed to link dynamically to sonic.dll. As functions imported from a DLL should be declared with __declspec(dllimport), the header file sonic.h is copied to nvdaHelper/eSpeak with __declspec(dllimport) added to the function declarations, and this copy replaces the original sonic.h when compiling eSpeak.

A new file, _sonic.py, is created inside synthDrivers to handle interoperation with sonic.dll. It provides initialize(), which loads the Sonic DLL and is called from speech.initialize(), and a wrapper class, SonicStream, for Sonic's stream-mode functions.
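A rough sketch of what such a ctypes wrapper could look like. This is not the contents of _sonic.py. The Sonic function names and signatures are the ones declared in sonic.h; `declare_sonic_prototypes`, `load_sonic`, and the method names on the class are assumptions made for illustration. The library object is passed in explicitly so the wrapper can be exercised without the DLL present.

```python
import ctypes
from ctypes import POINTER, c_float, c_int, c_short, c_void_p


def declare_sonic_prototypes(lib) -> None:
    """Attach argument and return types for the Sonic stream functions used below."""
    lib.sonicCreateStream.restype = c_void_p
    lib.sonicCreateStream.argtypes = [c_int, c_int]
    lib.sonicDestroyStream.restype = None
    lib.sonicDestroyStream.argtypes = [c_void_p]
    lib.sonicSetSpeed.restype = None
    lib.sonicSetSpeed.argtypes = [c_void_p, c_float]
    lib.sonicWriteShortToStream.restype = c_int
    lib.sonicWriteShortToStream.argtypes = [c_void_p, POINTER(c_short), c_int]
    lib.sonicReadShortFromStream.restype = c_int
    lib.sonicReadShortFromStream.argtypes = [c_void_p, POINTER(c_short), c_int]
    lib.sonicFlushStream.restype = c_int
    lib.sonicFlushStream.argtypes = [c_void_p]
    lib.sonicSamplesAvailable.restype = c_int
    lib.sonicSamplesAvailable.argtypes = [c_void_p]


def load_sonic(path: str):
    """Load sonic.dll from `path` (hypothetical helper; Sonic's C API is cdecl)."""
    lib = ctypes.CDLL(path)
    declare_sonic_prototypes(lib)
    return lib


class SonicStream:
    """Thin wrapper over a Sonic stream carrying 16-bit PCM audio."""

    BYTES_PER_SAMPLE = 2

    def __init__(self, lib, sample_rate: int, channels: int) -> None:
        self._lib = lib
        self._channels = channels
        self._stream = lib.sonicCreateStream(sample_rate, channels)
        if not self._stream:
            raise MemoryError("sonicCreateStream failed")

    def set_speed(self, speed: float) -> None:
        self._lib.sonicSetSpeed(self._stream, speed)

    def write(self, data: bytes) -> None:
        """Feed interleaved 16-bit samples; Sonic counts samples per channel."""
        frames = len(data) // (self.BYTES_PER_SAMPLE * self._channels)
        if frames == 0:
            return
        buf = (c_short * (frames * self._channels)).from_buffer_copy(data)
        if not self._lib.sonicWriteShortToStream(self._stream, buf, frames):
            raise MemoryError("sonicWriteShortToStream failed")

    def flush(self) -> None:
        """Ask Sonic to process whatever input it is still holding back."""
        if not self._lib.sonicFlushStream(self._stream):
            raise MemoryError("sonicFlushStream failed")

    def read(self) -> bytes:
        """Return all processed audio currently available."""
        frames = self._lib.sonicSamplesAvailable(self._stream)
        buf = (c_short * (frames * self._channels))()
        got = self._lib.sonicReadShortFromStream(self._stream, buf, frames)
        return bytes(buf)[: got * self._channels * self.BYTES_PER_SAMPLE]

    def close(self) -> None:
        if self._stream:
            self._lib.sonicDestroyStream(self._stream)
            self._stream = None
```

With the real DLL, a caller would create the stream once per voice, call `set_speed`, then alternate `write` and `read` as audio arrives, and call `flush` followed by a final `read` at the end of an utterance.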

The SAPI5 synthesizer now passes the audio through a SonicStream before sending it to the WavePlayer. Sonic uses a 16-bit integer wave format internally, so to speed up audio processing we explicitly choose a 16-bit wave format for both the SAPI5 voice and the WavePlayer, which avoids unnecessary format conversion.
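To make the "16-bit wave format" concrete, here is a small sketch of the derived fields of a PCM format description, using the standard WAVEFORMATEX relationships (block align = channels × bytes per sample, average bytes per second = sample rate × block align). The class name `PcmFormat` and the 22050 Hz mono example are assumptions; the PR does not state which sample rate is chosen.

```python
from dataclasses import dataclass

WAVE_FORMAT_PCM = 1  # format tag for uncompressed integer PCM


@dataclass(frozen=True)
class PcmFormat:
    """Hypothetical stand-in for the fields of a WAVEFORMATEX describing PCM audio."""

    samples_per_sec: int
    channels: int = 1
    bits_per_sample: int = 16
    format_tag: int = WAVE_FORMAT_PCM

    @property
    def block_align(self) -> int:
        # Bytes occupied by one sample across all channels.
        return self.channels * self.bits_per_sample // 8

    @property
    def avg_bytes_per_sec(self) -> int:
        return self.samples_per_sec * self.block_align


fmt = PcmFormat(samples_per_sec=22050)  # assumed example rate
print(fmt.block_align, fmt.avg_bytes_per_sec)  # 2 44100
```

If the voice, Sonic, and the player all agree on such a format, the bytes coming out of the voice can be handed to Sonic and then to the player without resampling or changing the sample width.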

This is the approach I have chosen for now. The implementation details are open for discussion. Other approaches I have considered:

- Move the Sonic library inside nvdaHelperLocal, and process the audio with Sonic in WasapiPlayer before feeding the data to the device. Then add functions such as getRate and setRate to the WavePlayer.
- Implement some kind of "audio plugin" system to allow easy modification of audio streams.

Testing strategy:

Seemed to work on my system.

Known issues with pull request:

None

Code Review Checklist:

Documentation:
- Change log entry
- User Documentation
- Developer / Technical Documentation
- Context sensitive help for GUI changes
Testing:
- Unit tests
- System (end to end) tests
- Manual testing
UX of all users considered:
- Speech
- Braille
- Low Vision
- Different web browsers
- Localization in other languages / culture than English
API is compatible with existing add-ons.
Security precautions taken.

@coderabbitai summary

SaschaCowley · 2025-01-13T01:04:06Z

@gexgd0419 do you know if it would be possible to switch eSpeak-ng to use the dynamically linked Sonic as well?

gexgd0419 · 2025-01-13T03:41:20Z

Sonic itself is not prepared to be exported as a DLL. There isn't any __declspec(dllexport) or __declspec(dllimport) used in the header file, so I added a .def file to export its functions.

When importing functions from a DLL, typically __declspec(dllimport) should be used, which is missing in sonic.h. We can choose to use our own copy of the header that has __declspec(dllimport) added to the functions, but then we'll have to maintain it every time Sonic is updated.

We can also just use the original header file without __declspec(dllimport). It still works, but the compiler won't apply some optimizations for imported functions. See also this SO question and this blog post.

Should we change the header file or not?

gexgd0419 · 2025-01-13T06:35:17Z

Here eSpeak is changed to use sonic.dll, and the sonic.dll is moved to the synthDrivers directory.

To add __declspec(dllimport) to the functions, I put a modified copy of sonic.h inside nvdaHelper/eSpeak.

@SaschaCowley Is this a better way? If so, I will change the development approach above.

You can use the original header file without __declspec(dllimport) (reverting the last commit), and it can still work. It's just that calls to Sonic functions go through an extra jump.

AppVeyorBot · 2025-01-13T08:08:40Z

PASS: Translation comments check.
PASS: License check.
PASS: Unit tests.
FAIL: Lint check. See test results and lint artifacts for more information.
PASS: System tests (tags: installer NVDA).
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/5q97ocj1r6kysejl/artifacts/output/nvda_snapshot_pr17610-35016,c7d0ad8f.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 1.3,
INSTALL_END 0.9,
BUILD_START 0.0,
BUILD_END 28.3,
TESTSETUP_START 0.0,
TESTSETUP_END 0.4,
TEST_START 0.0,
TEST_END 19.1,
FINISH_END 0.2

See test results for failed build of commit c7d0ad8f83

LeonarddeR · 2025-01-13T15:02:32Z

I really like your contributions @gexgd0419!
The only concern I have is how indexing is supposed to work when Sonic is active. Is that still accurate?

SaschaCowley · 2025-01-13T22:22:04Z

@gexgd0419 my main concern was having 2 copies of Sonic in NVDA: a statically linked one for eSpeak, and a dynamically linked one for SAPI5 (and potentially other synths in future). It sounds like maintaining the dllexport and dllimport headers for eSpeak on our side will be hard to sustain, and dynamically linking eSpeak without them will come at a performance penalty, which is the opposite of what we're trying to achieve here. Apart from an increase in build size, is there any disadvantage to having 2 copies of Sonic? Is creating the dllexport and dllimports upstream in Sonic something you'd be willing to either do, or open an issue for? @seanbudd what are your thoughts here?

gexgd0419 · 2025-01-14T01:11:09Z

There is a PR in sonic waywardgeek/sonic#27 that does something similar, but it hasn't been merged for years.

gexgd0419 · 2025-01-14T15:10:56Z

sonic.dll is 87.5 KiB. The eSpeak DLL is 637 KiB with Sonic statically linked and 632 KiB when it uses the Sonic DLL, so splitting out the Sonic part saves only 5 KiB there. The total will therefore be larger if we decide to build Sonic as a standalone DLL.
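The size comparison can be worked through with the figures quoted in this comment. The variable names are illustrative, and the "two copies" total assumes the SAPI5 path ships sonic.dll either way.

```python
SONIC_DLL_KIB = 87.5       # standalone sonic.dll
ESPEAK_STATIC_KIB = 637.0  # eSpeak DLL with Sonic statically linked
ESPEAK_DYNAMIC_KIB = 632.0 # eSpeak DLL that imports from sonic.dll

# What eSpeak itself saves by not embedding Sonic.
espeak_saving = ESPEAK_STATIC_KIB - ESPEAK_DYNAMIC_KIB

# One shared copy: eSpeak links to the same sonic.dll that SAPI5 uses.
shared_total = ESPEAK_DYNAMIC_KIB + SONIC_DLL_KIB

# Two copies: eSpeak keeps its static Sonic, SAPI5 still needs sonic.dll.
two_copies_total = ESPEAK_STATIC_KIB + SONIC_DLL_KIB

# Growth over the pre-PR state, where only the static eSpeak build existed.
growth_over_baseline = shared_total - ESPEAK_STATIC_KIB

print(espeak_saving, shared_total, two_copies_total, growth_over_baseline)
# 5.0 719.5 724.5 82.5
```

On these numbers, sharing one Sonic DLL instead of keeping two copies saves about 5 KiB, while adding sonic.dll at all costs roughly 82.5 KiB over the previous build.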

The "performance penalty" for not using dllimport is that every call to a Sonic function goes to a thunk first, which then jumps to the actual target stored in the import table. If dllimport is used, the compiler calls through the import table directly and the thunk is skipped. I would guess that the penalty is small compared to the cost of transferring and processing the audio data.

If whole-program optimization is enabled, the linker will be able to figure out that the Sonic functions are actually imported from a DLL, and apply the optimization even without dllimport, so there would be no "performance penalty". However, whole-program optimization is intentionally turned off for eSpeak.

# Whole-program optimization causes eSpeak to distort and warble with its Klatt4 voice
# Therefore specifically force it off
"/GL-",

I'm not sure exactly what causes that distortion, but turning on whole-program optimization would bring real benefits. For example, calls to statically linked Sonic functions can be inlined, and the eSpeak DLL size drops to 624 KiB even with Sonic statically linked inside. It's unfortunate that some bugs prevent us from enabling whole-program optimization.

Edit: The "disable whole-program optimization" part was introduced by #11928 in 2020. It is possible that the underlying issue has been fixed since then.

gexgd0419 added 2 commits January 11, 2025 21:12

Build Sonic as a DLL and add wrapper class for SonicStream

9d7da35

Add rate boost support in SAPI5

79b51b9

gexgd0419 changed the title ~~Sapi5 rate boost~~ Add rate boost support for SAPI5 voices Jan 11, 2025

Add changelog entry

6c43198

gexgd0419 marked this pull request as ready for review January 12, 2025 12:48

gexgd0419 requested a review from a team as a code owner January 12, 2025 12:48

gexgd0419 requested a review from SaschaCowley January 12, 2025 12:48

gexgd0419 added 3 commits January 13, 2025 12:15

Merge branch 'master' into sapi5-rate-boost

a895d0f

Make eSpeak dynamically link to Sonic

d141591

Change sonic.h to include __declspec(dllimport)

7b1c22f

Lint fix

91f0031

seanbudd added the blocked/needs-product-decision label (a product decision needs to be made about NVDA UX or supported use-cases) Jan 13, 2025
