
Generation with custom data & evaluator #98

Open
amitshermann opened this issue Sep 15, 2024 · 3 comments

@amitshermann

Hi,

We'd like to use AutoPrompt for a generation task where both the input and output are text. We've also developed an evaluator that scores the input-output pairs (e.g., a float between 0 and 1).

Our goal is to optimize the output using our dataset and evaluator, but we're unsure how to set this up with AutoPrompt. Could you provide guidance on how to achieve this?

Thanks in advance,

@Eladlev
Owner

Eladlev commented Sep 15, 2024

Hi,
Yes, it is relatively simple to tweak the system for this use case. These are the steps you should follow:

  1. Remove the first step in the optimization (the ranker optimization): lines 40-54 in run_generation_pipeline.py.
  2. Prepare a csv with your dataset inputs, following the instructions in this comment; the only difference is that in your case you can also leave the annotation field empty.
  3. Put the csv from step 2 at <base_folder>/generator/dataset.csv, and add the flag --load_dump <base_folder>.
  4. In this if statement, add a custom option and set it to:
    return utils.set_function_from_iterrow(lambda record: custom_score('###User input:\n' + record['text'] + '\n####model prediction:\n' + record['prediction']))
    where custom_score is your score function (adapt the format according to your function's input); a minimal sketch is given after this list.
  5. In the config file, change this value to custom and set the error_threshold to 0.5.
  6. Here, change the scale to 0-1.

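For reference, here is a minimal, self-contained sketch of steps 2-4 (this is not the repository's code: the base folder name, the dummy custom_score body, and anything beyond the text, prediction and annotation columns are assumptions you should adapt to your own setup):

import os
import pandas as pd

# Hypothetical stand-in for your evaluator: any callable that maps the formatted
# input/output pair to a float in [0, 1]. Replace the body with your real scorer.
def custom_score(pair_text: str) -> float:
    return min(1.0, len(pair_text) / 1000.0)

# Steps 2-3: a dataset csv with your inputs; the annotation column may stay empty.
base_folder = "my_dump"  # placeholder for the folder passed via --load_dump <base_folder>
os.makedirs(os.path.join(base_folder, "generator"), exist_ok=True)

dataset = pd.DataFrame({
    "text": ["Summarize the following article ...",
             "Write a product description for ..."],
    "annotation": ["", ""],   # can be left empty for this use case
    "prediction": ["", ""],   # filled in by the pipeline after generation
})
dataset.to_csv(os.path.join(base_folder, "generator", "dataset.csv"), index=False)

# Step 4: the per-row score, matching the lambda passed to
# utils.set_function_from_iterrow in the snippet above.
def score_record(record) -> float:
    return custom_score(
        "###User input:\n" + record["text"] +
        "\n####model prediction:\n" + record["prediction"]
    )

dataset["score"] = dataset.apply(score_record, axis=1)
print(dataset[["text", "score"]])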
That's all! It should work with all these changes. If there are any issues, I can also help with the integration on the Discord server.

@amitshermann
Author

amitshermann commented Sep 16, 2024

Thank you,
What does the error_threshold mean? Will it make the score Boolean? If so, my custom_eval function kind of loses its meaning; for example, I want the model to understand the difference between a score of 0.8 and a score of 0.6.

Thanks in advance,

@Eladlev
Owner

Eladlev commented Sep 17, 2024

The error_threshold determines which examples are put on the list that is provided to the analyzer (we take the worst ones from this list). These samples are considered candidates that could potentially be improved.
You can set a very high threshold here (for example 0.9); if there are too many samples, it simply takes the worst ones from the list.
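
As a toy illustration only (not the repository's code; the number of samples taken is an assumption), the threshold just decides which samples become candidates for the analyzer, while the continuous scores still determine which of those are the worst:

scores = [0.95, 0.82, 0.61, 0.40, 0.33]   # your custom 0-1 scores per sample
error_threshold = 0.9

candidates = [s for s in scores if s < error_threshold]  # potentially improvable samples
worst_first = sorted(candidates)                          # the analyzer gets the worst ones
print(worst_first[:3])   # -> [0.33, 0.4, 0.61]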
