[SEP] Token in the QA model response #2607
Comments
I would recommend you use the HuggingFace model: it uses the HuggingFace tokenizer and a more consistent post-processing translator.
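(For readers of this thread: loading a question-answering model from DJL's HuggingFace model zoo looks roughly like the sketch below. The model id `deepset/bert-base-cased-squad2` is an illustrative choice, not necessarily the model meant above.)

```java
import ai.djl.inference.Predictor;
import ai.djl.modality.nlp.qa.QAInput;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;

public class HuggingFaceQaExample {
    public static void main(String[] args) throws Exception {
        // The djl:// URL pulls the converted model together with its
        // HuggingFace tokenizer and translator; the model id is an example.
        Criteria<QAInput, String> criteria =
                Criteria.builder()
                        .setTypes(QAInput.class, String.class)
                        .optModelUrls(
                                "djl://ai.djl.huggingface.pytorch/deepset/bert-base-cased-squad2")
                        .optEngine("PyTorch")
                        .build();

        try (ZooModel<QAInput, String> model = criteria.loadModel();
                Predictor<QAInput, String> predictor = model.newPredictor()) {
            QAInput input = new QAInput(
                    "BBC Japan broadcasting",
                    "BBC Japan was a general entertainment Channel. It operated"
                            + " between December 2004 and April 2006.");
            System.out.println(predictor.predict(input));
        }
    }
}
```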
Thanks @frankfliu!
Another question: I have a custom translator implemented in order to use this HuggingFace model (the code itself was lost in formatting; a sketch of its typical shape follows below). It has the same problem reported in this issue: separator tokens are returned in the response. From what I could see, the problem seems to be related more to the model than to the translator, because what is wrong are the returned indexes. I do not know if changing this helps.
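(A sketch of what such a hand-rolled translator typically looks like, since the original code did not survive; this is not @aruggero's code. It assumes the traced model returns start and end logits as two NDArrays, and names like `CustomQaTranslator` are made up:)

```java
import ai.djl.huggingface.tokenizers.Encoding;
import ai.djl.huggingface.tokenizers.HuggingFaceTokenizer;
import ai.djl.modality.nlp.qa.QAInput;
import ai.djl.ndarray.NDList;
import ai.djl.translate.Translator;
import ai.djl.translate.TranslatorContext;

public class CustomQaTranslator implements Translator<QAInput, String> {

    private final HuggingFaceTokenizer tokenizer =
            HuggingFaceTokenizer.newInstance("distilbert-base-cased-distilled-squad");

    @Override
    public NDList processInput(TranslatorContext ctx, QAInput input) {
        // Encode question and context as one pair: [CLS] question [SEP] context [SEP]
        Encoding encoding = tokenizer.encode(input.getQuestion(), input.getParagraph());
        ctx.setAttachment("encoding", encoding);
        return new NDList(
                ctx.getNDManager().create(encoding.getIds()),
                ctx.getNDManager().create(encoding.getAttentionMask()));
    }

    @Override
    public String processOutput(TranslatorContext ctx, NDList list) {
        Encoding encoding = (Encoding) ctx.getAttachment("encoding");
        // Naive argmax over ALL positions -- this is exactly what lets
        // [SEP] (or the question span) end up inside the answer.
        int start = (int) list.get(0).argMax().getLong();
        int end = (int) list.get(1).argMax().getLong();
        long[] ids = java.util.Arrays.copyOfRange(encoding.getIds(), start, end + 1);
        return tokenizer.decode(ids);
    }
}
```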
It seems your model didn't return the correct result. Did you try running inference with Python?
Yes, in Python it returns the first part of the context as the answer: "BBC Japan".
A few things to check: (the checklist itself was not preserved)
Hi @frankfliu, sorry for the delay.
The points you mentioned are, in this case, managed internally; I have no control over them.
@aruggero We have a unit test that runs against this model nightly and we haven't seen the issue you mentioned.
Hi @frankfliu, I see that your example is a bit different.
Will take a look.
You can use … (the rest of this comment was not preserved)
Hi @frankfliu
You can use the Encoding::getSpecialTokenMask() from ai.djl.huggingface.tokenizers to limit the prediction to the original "context" and exclude the special tokens in your custom translator's processOutput().

In general, I would recommend following @frankfliu's suggestion to run all the inference steps in Python with a traced model of interest on a simple input and compare the raw outputs/logits to those you get in DJL. Then you would know whether it is an issue with the model itself behaving differently under DJL (highly unlikely) or whether the pre/post-processing steps differ (and you can look up the HF source code for the details).

As to why the model predicts the special tokens to be part of the answer: I don't think any strict mask is applied to the output to prevent it, and the fine-tuning objective might not penalize it (enough).
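(A sketch of that masking idea, written as a drop-in replacement for the naive `processOutput()` in the translator sketch earlier in the thread; the additive-bias trick and the variable names are illustrative, not DJL built-in behavior:)

```java
@Override
public String processOutput(TranslatorContext ctx, NDList list) {
    Encoding encoding = (Encoding) ctx.getAttachment("encoding");
    long[] special = encoding.getSpecialTokenMask(); // 1 for [CLS]/[SEP]/padding
    String[] tokens = encoding.getTokens();

    // Everything up to and including the first [SEP] belongs to the question.
    int firstSep = 0;
    for (int i = 1; i < tokens.length; i++) {
        if (special[i] == 1) {
            firstSep = i;
            break;
        }
    }

    // Additive bias: a large negative number outside the context span, so
    // argmax can never pick [CLS], [SEP] or a question token.
    float[] bias = new float[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
        boolean inContext = i > firstSep && special[i] == 0;
        bias[i] = inContext ? 0f : -1e9f;
    }
    NDArray biasArray = ctx.getNDManager().create(bias);

    // A robust version would also ensure start <= end.
    int start = (int) list.get(0).add(biasArray).argMax().getLong();
    int end = (int) list.get(1).add(biasArray).argMax().getLong();
    long[] ids = java.util.Arrays.copyOfRange(encoding.getIds(), start, end + 1);
    return tokenizer.decode(ids);
}
```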
Description
The [SEP] token used in the input of DJL's "distilbert" question-answering model is returned as part of the answer extracted for the question.
Shouldn't the answer be extracted from the context only, and not from the concatenation of query + context?
Expected Behavior
I would expect the answer to belong only to the context, without any delimiter token.
Error Message
With an input like:
var question = "BBC Japan broadcasting";
var resourceDocument = "BBC Japan was a general entertainment Channel.\nWhich operated between December 2004 and April 2006.\nIt ceased operations after its Japanese distributor folded.";
The following answer is returned: "bbc japan broadcasting [SEP] bbc japan"
How to Reproduce?
Answer: "bbc japan broadcasting [SEP] bbc japan"
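(A minimal reproduction sketch, assuming the model is loaded from the DJL model zoo; the `backbone` filter value "distilbert" is an assumption patterned on the stock BertQA example, so adjust it to the artifact actually used:)

```java
import ai.djl.Application;
import ai.djl.inference.Predictor;
import ai.djl.modality.nlp.qa.QAInput;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;

public class SepTokenRepro {
    public static void main(String[] args) throws Exception {
        var question = "BBC Japan broadcasting";
        var resourceDocument = "BBC Japan was a general entertainment Channel.\n"
                + "Which operated between December 2004 and April 2006.\n"
                + "It ceased operations after its Japanese distributor folded.";

        Criteria<QAInput, String> criteria =
                Criteria.builder()
                        .optApplication(Application.NLP.QUESTION_ANSWER)
                        .setTypes(QAInput.class, String.class)
                        .optFilter("backbone", "distilbert") // assumption, see above
                        .optEngine("PyTorch")
                        .build();

        try (ZooModel<QAInput, String> model = criteria.loadModel();
                Predictor<QAInput, String> predictor = model.newPredictor()) {
            // Prints "bbc japan broadcasting [SEP] bbc japan" instead of "BBC Japan"
            System.out.println(predictor.predict(new QAInput(question, resourceDocument)));
        }
    }
}
```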
What have you tried to solve it?
Environment Info