Upgrade to whisper.cpp 1.5.0 and use grammar parser #15

Merged
merged 1 commit into from
Nov 19, 2023
27 changes: 21 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

A JNI wrapper for [whisper.cpp](https://github.com/ggerganov/whisper.cpp) that allows transcribing speech to text in Java.

## Platform support
## Platform support

This library aims to support the following platforms:

Expand Down Expand Up @@ -44,9 +44,7 @@ On `Linux/macOs` you need to provide the library path to the `loadLibrary` metho

On `windows` it's automatically used if `whisper.dll` exists in some of the directories in the $env:PATH variable.

## Example

A basic example extracted from the tests.
## Basic Example

```java
...
Expand All @@ -64,10 +62,27 @@ A basic example extracted from the tests.
assertEquals(1, numSegments);
String text = whisper.fullGetSegmentText(ctx,0);
assertEquals(" And so my fellow Americans ask not what your country can do for you ask what you can do for your country.", text);
ctx.close();
ctx.close(); // free native memory, should be called when we don't need the context anymore.
...
```

## Grammar usage

This wonderful functionality added in whisper.cpp v1.5.0 was integrated into the wrapper.
It makes use of the grammar parser implementation provided among the whisper.cpp examples,
so you can use the [gbnf grammar](https://github.com/ggerganov/whisper.cpp/blob/master/grammars/) to improve the transcription results.
```java
...
try (WhisperGrammar grammar = whisper.parseGrammar(Paths.of("/my_grammar.gbnf"))) {
var params = new WhisperFullParams();
params.grammar = grammar;
params.grammarPenalty = 100f;
...
int result = whisper.full(ctx, params, samples, samples.length);
...
}
...
```
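
The snippet above expects a `.gbnf` file on disk. As a minimal sketch, the following shows one way to produce such a file from Java. The grammar text, the class name `GrammarFileSketch` and the helper `looksUsable` are hypothetical and only illustrate the format; the grammar has not been checked against whisper.cpp, and the non-blank check simply mirrors the blank-text check in `parseGrammar(String)`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class GrammarFileSketch {
    // Hypothetical GBNF grammar (illustrative only): matches " yes." or " no."
    static final String GRAMMAR = String.join("\n",
            "root ::= \" \" answer \".\"",
            "answer ::= \"yes\" | \"no\"",
            "");

    /** Mirrors the wrapper's blank-text check: a grammar must contain some text. */
    static boolean looksUsable(String text) {
        return text != null && !text.isBlank();
    }

    public static void main(String[] args) throws IOException {
        // Write the grammar to a temporary file, as parseGrammar(Path) reads from disk.
        Path file = Files.createTempFile("answers", ".gbnf");
        Files.writeString(file, GRAMMAR);
        // Read it back the same way the wrapper does (Files.readString).
        System.out.println(looksUsable(Files.readString(file)));
        Files.delete(file);
    }
}
```

The resulting path could then be handed to `whisper.parseGrammar(...)` as in the README example.
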
## Building and testing the project.

You need Java and Cpp setup.
Expand All @@ -82,7 +97,7 @@ Then you need to download the model used in the tests using the script 'download

Run the appropriate build script for your platform (build_debian.sh, build_macos.sh or build_win.ps1), it will place the native library file on the resources directory.

Finally you can run the project tests to confirm it works:
Finally, you can run the project tests to confirm it works:

```sh
mvn test
Expand Down
6 changes: 4 additions & 2 deletions build_debian.sh
Original file line number Diff line number Diff line change
Expand Up @@ -11,12 +11,14 @@ build_lib() {
cc $CFLAGS -c ./src/main/native/whisper/ggml-quants.c -o ./src/main/native/ggml-quants.o
# build whisper object
g++ -c -I src/main/native/whisper/ $CXXFLAGS src/main/native/whisper/whisper.cpp -o src/main/native/whisper.o
# build grammar-parser object
g++ -c -I src/main/native/whisper/ $CXXFLAGS src/main/native/whisper/examples/grammar-parser.cpp -o src/main/native/grammar-parser.o
# build whisper jni wrapper object
g++ -c -I src/main/native -I src/main/native/whisper $INCLUDE_JAVA $CXXFLAGS src/main/native/io_github_givimad_whisperjni_WhisperJNI.cpp -o src/main/native/io_github_givimad_whisperjni_WhisperJNI.o
g++ -c -I src/main/native -I src/main/native/whisper -I src/main/native/whisper/examples $INCLUDE_JAVA $CXXFLAGS src/main/native/io_github_givimad_whisperjni_WhisperJNI.cpp -o src/main/native/io_github_givimad_whisperjni_WhisperJNI.o
# link whisper shared object
g++ -shared -I src/main/native/whisper/ src/main/native/ggml.o src/main/native/ggml-alloc.o src/main/native/ggml-backend.o src/main/native/ggml-quants.o src/main/native/whisper.o -o libwhisper.so
# link whisper jni wrapper shared object
g++ -shared -I src/main/native/whisper/ -Wl,-rpath='${ORIGIN}' src/main/native/io_github_givimad_whisperjni_WhisperJNI.o -L. -lwhisper -o libwhisperjni.so
g++ -shared -I src/main/native/whisper/ -Wl,-rpath='${ORIGIN}' src/main/native/grammar-parser.o src/main/native/io_github_givimad_whisperjni_WhisperJNI.o -L. -lwhisper -o libwhisperjni.so
# clean
mv libwhisper.so src/main/resources/debian-$AARCH/libwhisper$LIB_VARIANT.so
mv libwhisperjni.so src/main/resources/debian-$AARCH/libwhisperjni.so
Expand Down
10 changes: 7 additions & 3 deletions build_macos.sh
Original file line number Diff line number Diff line change
Expand Up @@ -32,17 +32,21 @@ cc --target="$TARGET" -arch "$AARCH" $CFLAGS -c ./src/main/native/whisper/ggml-a
cc --target="$TARGET" -arch "$AARCH" $CFLAGS -c ./src/main/native/whisper/ggml-backend.c -o ./src/main/native/ggml-backend.o
cc --target="$TARGET" -arch "$AARCH" $CFLAGS -c ./src/main/native/whisper/ggml-quants.c -o ./src/main/native/ggml-quants.o
# build whisper object
g++ -c -arch "$AARCH" $INCLUDE_JAVA \
g++ -c -arch "$AARCH" \
-I src/main/native/whisper/ $CXXFLAGS --target="$TARGET" \
src/main/native/whisper/whisper.cpp -o src/main/native/whisper.o
# build whisper grammar parser object
g++ -c -arch "$AARCH" \
-I src/main/native/whisper/ $CXXFLAGS --target="$TARGET" \
src/main/native/whisper/examples/grammar-parser.cpp -o src/main/native/grammar-parser.o
# build whisper jni wrapper object
g++ -c -arch "$AARCH" $INCLUDE_JAVA \
-I src/main/native/ -I src/main/native/whisper/ $CXXFLAGS --target="$TARGET" \
-I src/main/native/ -I src/main/native/whisper/ -I src/main/native/whisper/examples/ $CXXFLAGS --target="$TARGET" \
src/main/native/io_github_givimad_whisperjni_WhisperJNI.cpp -o src/main/native/io_github_givimad_whisperjni_WhisperJNI.o
# link whisper shared object
g++ -arch "$AARCH" --target="$TARGET" -dynamiclib -I src/main/native/whisper/ -o libwhisper.dylib src/main/native/ggml.o src/main/native/ggml-alloc.o src/main/native/ggml-backend.o src/main/native/ggml-quants.o src/main/native/whisper.o -lc $LDFLAGS
# link whisper jni wrapper shared object
g++ -arch "$AARCH" --target="$TARGET" -dynamiclib -I src/main/native/ -I src/main/native/whisper/ -L. -lwhisper -o libwhisperjni.dylib src/main/native/io_github_givimad_whisperjni_WhisperJNI.o -lc $LDFLAGS
g++ -arch "$AARCH" --target="$TARGET" -dynamiclib -I src/main/native/ -I src/main/native/whisper/ -L. -lwhisper -o libwhisperjni.dylib src/main/native/grammar-parser.o src/main/native/io_github_givimad_whisperjni_WhisperJNI.o -lc $LDFLAGS
# force search for libwhisper.dylib on same dir
install_name_tool -change libwhisper.dylib @loader_path/libwhisper.dylib libwhisperjni.dylib
# clean
Expand Down
17 changes: 12 additions & 5 deletions build_win.ps1
Original file line number Diff line number Diff line change
Expand Up @@ -6,14 +6,21 @@ gcc -c -DNDEBUG -O3 -std=c11 -fPIC -pthread -mf16c -mfma -mavx -mavx2 -D_XOPEN_S
gcc -c -DNDEBUG -O3 -std=c11 -fPIC -pthread -mf16c -mfma -mavx -mavx2 -D_XOPEN_SOURCE=600 .\src\main\native\whisper\ggml-alloc.c -o .\src\main\native\ggml-alloc.o
# build whisper object
g++ -c -DNDEBUG -O3 -std=c++11 -fPIC -pthread -mf16c -mfma -mavx -mavx2 -D_XOPEN_SOURCE=600 -I src\main\native\whisper -c .\src\main\native\whisper\whisper.cpp -o .\src\main\native\whisper.o
# build gbnf grammar-parser object
g++ -c -DNDEBUG -O3 -std=c++11 -fPIC -pthread -D_XOPEN_SOURCE=600 -I src\main\native\whisper -c .\src\main\native\whisper\examples\grammar-parser.cpp -o .\src\main\native\grammar-parser.o
# build whisper jni wrapper object
g++ -c -DNDEBUG -O3 -std=c++11 -fPIC -pthread -mf16c -mfma -mavx -mavx2 -D_XOPEN_SOURCE=600 -I $env:JAVA_HOME\include -I $env:JAVA_HOME\include\win32 -I src\main\native\whisper src\main\native\io_github_givimad_whisperjni_WhisperJNI.cpp -o src\main\native\io_github_givimad_whisperjni_WhisperJNI.o
# build tmp whisper dll to link non full jni wrapper version agains it
g++ -c -DNDEBUG -O3 -std=c++11 -fPIC -pthread -mf16c -mfma -mavx -mavx2 -D_XOPEN_SOURCE=600 -I $env:JAVA_HOME\include -I $env:JAVA_HOME\include\win32 -I src\main\native\whisper -I src\main\native\whisper\examples src\main\native\io_github_givimad_whisperjni_WhisperJNI.cpp -o src\main\native\io_github_givimad_whisperjni_WhisperJNI.o
# build tmp whisper dll to link non full jni wrapper version against it
g++ -shared -static -I src\main\native -I src\main\native\whisper -o whisper.dll src\main\native\whisper.o src\main\native\ggml.o src\main\native\ggml-alloc.o src\main\native\ggml-quants.o src\main\native\ggml-backend.o
# link full whisper jni shared object
g++ -shared -static -I src\main\native -I src\main\native\whisper -o src\main\resources\win-amd64\whisperjni_full.dll src\main\native\whisper.o src\main\native\ggml.o src\main\native\ggml-alloc.o src\main\native\ggml-quants.o src\main\native\ggml-backend.o src\main\native\io_github_givimad_whisperjni_WhisperJNI.o
# link whisper jni wrapper shared object, forcing whisper.dll depencency to be dynamic
g++ "-Wl,-Bdynamic,-lwhisper" "-Wl,-Bstatic" -shared -static -I src\main\native -I src\main\native\whisper -L. -o src\main\resources\win-amd64\whisperjni.dll src\main\native\io_github_givimad_whisperjni_WhisperJNI.o
g++ -shared -static -I src\main\native -I src\main\native\whisper -o src\main\resources\win-amd64\whisperjni_full.dll src\main\native\grammar-parser.o src\main\native\whisper.o src\main\native\ggml.o src\main\native\ggml-alloc.o src\main\native\ggml-quants.o src\main\native\ggml-backend.o src\main\native\io_github_givimad_whisperjni_WhisperJNI.o
# abort on error
if ($LastExitCode -ne 0) {
Write-Error "Unable to build library"
Exit 1
}
# link whisper jni wrapper shared object, forcing whisper.dll dependency to be dynamic
g++ "-Wl,-Bdynamic,-lwhisper" "-Wl,-Bstatic" -shared -static -I src\main\native -I src\main\native\whisper -L. -o src\main\resources\win-amd64\whisperjni.dll src\main\native\grammar-parser.o src\main\native\io_github_givimad_whisperjni_WhisperJNI.o
# abort on error
if ($LastExitCode -ne 0) {
Write-Error "Unable to build library"
Expand Down
1 change: 1 addition & 0 deletions gen_header.sh
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ javac -h src/main/native \
$LIB_SRC/internal/LibraryUtils.java \
$LIB_SRC/WhisperContextParams.java \
$LIB_SRC/WhisperContext.java \
$LIB_SRC/WhisperGrammar.java \
$LIB_SRC/WhisperSamplingStrategy.java \
$LIB_SRC/WhisperFullParams.java \
$LIB_SRC/WhisperState.java \
Expand Down
2 changes: 1 addition & 1 deletion pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
<artifactId>whisper-jni</artifactId>
<name>whisper-jni</name>
<url>https://github.com/GiviMAD/whisper-jni</url>
<version>1.4.3-4</version>
<version>1.5.0</version>
<description>A JNI wrapper for [whisper.cpp](https://github.com/ggerganov/whisper.cpp), allows to transcribe speech to text in Java</description>

<licenses>
Expand Down
19 changes: 16 additions & 3 deletions src/main/java/io/github/givimad/whisperjni/WhisperFullParams.java
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,9 @@
* @author Miguel Álvarez Díez - Initial contribution
*/
public class WhisperFullParams {

/**
* Whisper search strategy.
*/
private final int strategy;

/**
Expand All @@ -33,6 +35,10 @@ public class WhisperFullParams {
* Translate
*/
public boolean translate;
/**
* Do not generate timestamps
*/
public boolean noTimestamps;
/**
* Detect language
*/
Expand Down Expand Up @@ -121,7 +127,14 @@ public class WhisperFullParams {
* Specific to beam search sampling strategy
*/
public float beamSearchPatience = -1.0f;

/**
* Parsed grammar used to guide decoding; leave as null to transcribe without a grammar.
*/
public WhisperGrammar grammar;
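/**
* Penalty applied to tokens that do not match the grammar; only relevant when {@link #grammar} is set.
*/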
public float grammarPenalty = 100f;
/**
* Creates a new {@link WhisperFullParams} instance using the provided {@link WhisperSamplingStrategy}
*
Expand All @@ -135,6 +148,6 @@ public WhisperFullParams(WhisperSamplingStrategy strategy) {
* Creates a new {@link WhisperFullParams} instance using the beam search {@link WhisperSamplingStrategy}
*/
public WhisperFullParams() {
this(WhisperSamplingStrategy.GREEDY);
this(WhisperSamplingStrategy.BEAN_SEARCH);
}
}
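
To give an intuition for `grammarPenalty`, here is a simplified model, not whisper.cpp's actual code. It assumes that the logit of every token the grammar rejects is lowered by the penalty value, which is my understanding of the native behaviour but should be treated as an assumption. The class `GrammarPenaltySketch` and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;

class GrammarPenaltySketch {
    /** Returns a copy of logits where every token id not in `allowed` is lowered by `penalty`. */
    static float[] penalize(float[] logits, Set<Integer> allowed, float penalty) {
        float[] out = Arrays.copyOf(logits, logits.length);
        for (int id = 0; id < out.length; id++) {
            if (!allowed.contains(id)) {
                out[id] -= penalty; // assumed behaviour: subtract the penalty from rejected tokens
            }
        }
        return out;
    }

    /** Index of the largest value; the first one wins on ties. */
    static int argmax(float[] v) {
        int best = 0;
        for (int i = 1; i < v.length; i++) {
            if (v[i] > v[best]) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        float[] logits = {2.0f, 5.0f, 1.0f};
        // Token 1 scores highest but is not allowed by the (hypothetical) grammar.
        float[] constrained = penalize(logits, Set.of(0, 2), 100f);
        System.out.println(argmax(logits) + " -> " + argmax(constrained));
    }
}
```

Under this model a large penalty such as the default `100f` makes rejected tokens very unlikely, while a small value only nudges the decoder towards the grammar.
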
28 changes: 28 additions & 0 deletions src/main/java/io/github/givimad/whisperjni/WhisperGrammar.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
package io.github.givimad.whisperjni;

/**
* The {@link WhisperGrammar} class represents a native whisper.cpp parsed grammar.
*
* You need to dispose the native memory for its instances by calling {@link #close}
*
* @author Miguel Álvarez Díez - Initial contribution
*/
public class WhisperGrammar extends WhisperJNI.WhisperJNIPointer {
private final WhisperJNI whisper;
private final String grammarText;

/**
* Internal grammar constructor
* @param whisper library instance
* @param ref native pointer identifier
* @param text grammar source text
*/
protected WhisperGrammar(WhisperJNI whisper, int ref, String text) {
super(ref);
this.whisper = whisper;
this.grammarText = text;
}
@Override
public void close() {
whisper.free(this);
}
}
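
The close-once behaviour that `WhisperGrammar` relies on (check `isReleased()`, free the native memory, then mark the pointer as released) can be sketched without the native library. `NativeHandleSketch` is a hypothetical stand-in, and the counter replaces the real native free call.

```java
class NativeHandleSketch implements AutoCloseable {
    private final int ref;          // stand-in for the native pointer identifier
    private boolean released = false;
    int freeCalls = 0;              // counts simulated native frees

    NativeHandleSketch(int ref) {
        this.ref = ref;
    }

    int ref() {
        return ref;
    }

    boolean isReleased() {
        return released;
    }

    @Override
    public void close() {
        if (released) {
            return;                 // a second close is a no-op
        }
        freeCalls++;                // stand-in for the native free of `ref`
        released = true;            // mark as released only after the free
    }

    public static void main(String[] args) {
        NativeHandleSketch handle = new NativeHandleSketch(1);
        try (NativeHandleSketch h = handle) {
            // use the handle here; close() runs automatically on exit
        }
        handle.close();             // explicit second close does nothing
        System.out.println(handle.freeCalls);
    }
}
```

Making `close()` idempotent is what lets callers combine try-with-resources with an explicit `close()` without freeing the same native memory twice.
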
46 changes: 44 additions & 2 deletions src/main/java/io/github/givimad/whisperjni/WhisperJNI.java
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

import io.github.givimad.whisperjni.internal.LibraryUtils;

import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
Expand All @@ -22,6 +23,8 @@ public class WhisperJNI {

private native int initState(int model);

private native int loadGrammar(String text);

private native void initOpenVINOEncoder(int model, String device);

private native boolean isMultilingual(int model);
Expand Down Expand Up @@ -50,6 +53,8 @@ public class WhisperJNI {

private native void freeState(int state);

private native void freeGrammar(int grammar);

private native String printSystemInfo();

private native static void setLogger(boolean enabled);
Expand Down Expand Up @@ -133,6 +138,24 @@ public WhisperState initState(WhisperContext context) {
return new WhisperState(this, ref, context);
}

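/**
* Parses the GBNF grammar stored in the provided file.
*
* @param grammarPath path to the grammar file
* @return the parsed {@link WhisperGrammar}, or null if the native call returns no reference
* @throws IOException if the file is missing, is a directory, or cannot be read
*/
public WhisperGrammar parseGrammar(Path grammarPath) throws IOException {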
if(!Files.exists(grammarPath) || Files.isDirectory(grammarPath)){
throw new FileNotFoundException("Grammar file not found");
}
return parseGrammar(Files.readString(grammarPath));
}

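/**
* Parses the provided GBNF grammar text.
*
* @param text grammar source text
* @return the parsed {@link WhisperGrammar}, or null if the native call returns no reference
* @throws IOException if the text is blank
*/
public WhisperGrammar parseGrammar(String text) throws IOException {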
if(text.isBlank()) {
throw new IOException("Grammar text is blank");
}
int ref = loadGrammar(text);
if(ref == -1) {
return null;
}
return new WhisperGrammar(this, ref, text);
}

/**
* Initializes OpenVino encoder.
*
Expand Down Expand Up @@ -166,6 +189,9 @@ public boolean isMultilingual(WhisperContext context) {
*/
public int full(WhisperContext context, WhisperFullParams params, float[] samples, int numSamples) {
WhisperJNIPointer.assertAvailable(context);
if(params.grammar != null) {
WhisperJNIPointer.assertAvailable(params.grammar);
}
return full(context.ref, params, samples, numSamples);
}

Expand All @@ -182,6 +208,9 @@ public int full(WhisperContext context, WhisperFullParams params, float[] sample
public int fullWithState(WhisperContext context, WhisperState state, WhisperFullParams params, float[] samples, int numSamples) {
WhisperJNIPointer.assertAvailable(context);
WhisperJNIPointer.assertAvailable(state);
if(params.grammar != null) {
WhisperJNIPointer.assertAvailable(params.grammar);
}
return fullWithState(context.ref, state.ref, params, samples, numSamples);
}

Expand Down Expand Up @@ -288,8 +317,8 @@ public void free(WhisperContext context) {
if (context.isReleased()) {
return;
}
context.release();
freeContext(context.ref);
context.release();
}

/**
Expand All @@ -301,8 +330,21 @@ public void free(WhisperState state) {
if (state.isReleased()) {
return;
}
state.release();
freeState(state.ref);
state.release();
}

/**
* Release grammar memory in native implementation.
*
* @param grammar the {@link WhisperGrammar} to release
*/
public void free(WhisperGrammar grammar) {
if (grammar.isReleased()) {
return;
}
freeGrammar(grammar.ref);
grammar.release();
}

/**
Expand Down