Can this model be used for Generative Question Answering? #197
It sounds like you are looking for something like https://github.com/sanjeevanahilan/nanoChatGPT
Prompt / completion tasks are usually trained with a 0.01 weight in the loss for the prompt tokens. At least, this is the default in the OpenAI API. I do not see such a parameter in the fine-tuning here.
@arivero so you mean this cannot be fine-tuned for Question Answering?
@AayushSameerShah my guess is that the loss function must be customised to decide how to evaluate the prediction of prompt tokens.
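In case it helps, a minimal sketch of what that customised loss could look like in PyTorch: per-token cross-entropy with prompt tokens down-weighted by the 0.01 factor mentioned above. The function name, tensor shapes, and the `prompt_mask` argument are my own illustration, not something that exists in this repo.

```python
import torch
import torch.nn.functional as F

def weighted_lm_loss(logits, labels, prompt_mask, prompt_weight=0.01):
    """Cross-entropy with completion tokens at full weight and prompt tokens down-weighted.

    logits: (batch, seq, vocab); labels: (batch, seq);
    prompt_mask: (batch, seq) bool, True where the token belongs to the prompt.
    """
    # Standard causal-LM shift: tokens < n predict token n
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    shift_mask = prompt_mask[:, 1:].contiguous()

    # Per-token loss, no reduction yet
    per_token = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        reduction="none",
    ).view(shift_labels.shape)

    # 0.01 weight on prompt tokens, 1.0 on completion tokens
    weights = torch.where(
        shift_mask,
        torch.full_like(per_token, prompt_weight),
        torch.ones_like(per_token),
    )
    return (per_token * weights).sum() / weights.sum()
```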
@arivero Thanks for the response. I have followed some of the threads in this library, but now I am thinking of shifting to Hugging Face. I realise I am taking the discussion out of the context of this thread, so please pardon me. I am less experienced with Hugging Face Transformers, but from my findings I have observed that it provides 2 types of pipelines which could be helpful in my case:
The text-generation pipeline simply continues the text from the prompt, like "When I was 12 I went to", and the rest is filled in by the model. There is no question answering; even if I put a question in the prompt, it continues the question instead of answering it. That makes sense.

Then there is the question-answering pipeline, which takes 2 inputs: 1st the question and 2nd the context. Based on these two it "extracts" the answer from the context. This seems to work, but it fails to generalise because it "extracts" the answer and does not generate it. Additionally, we need to supply the context along with the question to get the answer, which is not intuitive.

What I am asking for is: are there models which can be fine-tuned on some specific dataset (say medical) and can then answer questions by themselves? I am not sure whether, once fine-tuned, such a model still requires the context to be given, but either way it should return some response to the question by generating it, like how "DaVinci" does, for example, in the notebooks. Do you have any idea how I can move forward with this? I have found these models on Hugging Face to work with because they seem promising:
These could be my open-source starting points, but I don't know how to solve my problem with these models or how to get the training done. It would really be a huge help from you, buddy.
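For reference, a small sketch of the two pipelines being compared above, using the Hugging Face `pipeline` API. The model names (`gpt2`, `distilbert-base-cased-distilled-squad`) are just common public checkpoints used for illustration:

```python
from transformers import pipeline

# Free-form generation: continues the prompt, does not "answer" it
generator = pipeline("text-generation", model="gpt2")
print(generator("When I was 12 I went to", max_new_tokens=30)[0]["generated_text"])

# Extractive QA: needs a context passage and returns a span copied from it
extractive_qa = pipeline(
    "question-answering",
    model="distilbert-base-cased-distilled-squad",
)
result = extractive_qa(
    question="Which country won the most medals?",
    context="The USA won the highest number of medals at the 2022 games.",
)
print(result["answer"])  # a span extracted from the context, not generated
```

The second call only ever copies a span out of `context`, which is exactly the "extractive" behaviour described above.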
Hi @AayushSameerShah, it seems like you want to replicate what BioGPT is doing. You can check out their paper:
Hello @timothylimyl So it is the generative QA approach, and the way to do it is to use Haystack. This framework has many QA pipelines, and I am interested in 2 of them:
I have found that RAG generates short (one-liner) answers, while LFQA gives answers as a passage, which is really what I was looking for. So, after storing a bunch of different documents in the document store, the pipeline retrieves documents based on the question embeddings, the reader reads those documents, and the answers are then generated. Though the answer quality isn't on the level of GPT-3, it can work pretty well.

Now I have another query. Along with my unstructured data (wiki pages, blogs, ...), I also want to feed in structured data in a tabular fashion. But there the LFQA pipeline fails, because it is unable to find meaning in the tables! For that, Haystack has a "TableReader" which can take tables as inputs, so my hopes rose! But when I tried it, it returned a single-word, "extractive" response. For example, on asking "Which country won the highest number of medals in the Olympics 2022?" it returns "USA". I am looking for a generative response here, with some explanation. Is it possible? Please direct me.

To summarise my whole situation: I am looking for a "generative way of answering" where I can supply unstructured + structured data as the context, and then, on querying, the model should generate an answer. Thanks 🙏
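For context, the LFQA-style setup described above looks roughly like this in Haystack 1.x. The class names follow Haystack's generative QA components, but the embedding model and generator checkpoint (`vblagoje/bart_lfqa`) are recalled from the LFQA tutorial and should be treated as placeholders:

```python
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import EmbeddingRetriever, Seq2SeqGenerator
from haystack.pipelines import GenerativeQAPipeline

# Store the unstructured documents (wiki pages, blogs, ...)
document_store = InMemoryDocumentStore(embedding_dim=768)
document_store.write_documents([
    {"content": "Long-form question answering generates free-text answers ..."},
])

# The retriever fetches relevant passages from the question embedding
retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1",
)
document_store.update_embeddings(retriever)

# The generator writes a long-form answer instead of extracting a span
generator = Seq2SeqGenerator(model_name_or_path="vblagoje/bart_lfqa")

pipe = GenerativeQAPipeline(generator=generator, retriever=retriever)
result = pipe.run(
    query="What is long-form question answering?",
    params={"Retriever": {"top_k": 3}},
)
print(result["answers"][0].answer)
```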
seems like a prompt engineering problem, you can try out different instruction prompts for starters
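As one illustration of that suggestion: a way to get a generative, explained answer from tabular data is to linearise the table into the prompt and ask an instruction-tuned model to answer in full sentences. Everything below, including the model choice (`google/flan-t5-base`), the prompt wording, and the medal numbers (placeholders, not real figures), is only an example of the idea, not a recommendation from this thread:

```python
from transformers import pipeline

# Linearise the table into plain text (numbers are placeholder values)
table_as_text = (
    "Country | Gold | Silver | Bronze | Total\n"
    "USA | 39 | 41 | 33 | 113\n"
    "China | 38 | 32 | 18 | 88\n"
)
question = "Which country won the highest number of medals, and by how much?"

# Instruction prompt that explicitly asks for an explained, full-sentence answer
prompt = (
    "Answer the question using the table below. "
    "Explain your answer in two or three full sentences.\n\n"
    f"{table_as_text}\nQuestion: {question}\nAnswer:"
)

generator = pipeline("text2text-generation", model="google/flan-t5-base")
print(generator(prompt, max_new_tokens=100)[0]["generated_text"])
```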
Hi @AayushSameerShah: I've just stumbled upon this page, since I am trying to do exactly what you're describing :) We're now 5 months ahead:
thanks very much for your help ;)
Hi @laurentm255 👋, luckily I have already provided a comprehensive response, which you can find by clicking my response link, where I tried to explore the options we currently have and gave examples. Hopefully that gives you a bit of direction.
I am looking to fine-tune this model on my own data (such as medical science), and after the training I want it to be able to answer questions. I am not looking for "extractive answers" where it returns a start and end sequence (which is very much tied to the given context), but a "generative" case: I train the model with my data, then ask it questions, and from its own understanding of my data it should be able to give me the answers.
Please let me know if anybody knows how to achieve that with this model!
Thank you so much 🤗
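For anyone landing here later, a minimal sketch of the "generative" fine-tuning idea described in this issue, using Hugging Face Transformers (as discussed in the comments) rather than this repo's own training loop. The toy QA pair, model choice (`gpt2`), and hyperparameters are all illustrative:

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

# Toy QA pair standing in for a real medical dataset (illustrative only)
pairs = [
    {"question": "What is hypertension?",
     "answer": "Hypertension is chronically elevated blood pressure."},
]

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

def to_text(example):
    # Concatenate question and answer into one training sequence
    return {"text": f"Question: {example['question']}\n"
                    f"Answer: {example['answer']}{tokenizer.eos_token}"}

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)

dataset = (Dataset.from_list(pairs)
           .map(to_text)
           .map(tokenize, remove_columns=["question", "answer", "text"]))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="qa-finetune", num_train_epochs=3,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# After training, ask a question with no context passage
prompt = "Question: What is hypertension?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=60)[0]))
```

Note that this sketch trains on the full sequence; combining it with a prompt-weighted loss like the one earlier in this thread would focus the training signal on the answer tokens.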