Custom Prompts via YAML #37

Merged
merged 44 commits into manubot:main from issue-31-customprompts-yaml on Jul 9, 2024

Conversation


@falquaddoomi falquaddoomi commented Dec 13, 2023

This PR addresses #31. Specifically, it adds support for parsing two new configuration files, ai_revision-config.yaml and ai_revision-prompts.yaml, which specify how filenames are mapped to prompts and what the text of those prompts is.

In the following, I refer to determining the prompt for a given piece of text as "prompt resolution". The current code supports the following mechanisms for prompt resolution:

  1. explicit prompt specification via the AI_EDITOR_CUSTOM_PROMPT environment variable; if specified, this text is used as-is as the prompt.
  2. inferring the "section name" of the text (e.g., "abstract", "introduction", etc.) from the filename, then using that to retrieve a fixed prompt, which occurs in models.GPT3CompletionModel.get_prompt()
  3. failing the above, a generic prompt is used.

This PR adds a YAML-based filename-to-prompt resolution mechanism, detailed in issue #31. The mechanism is implemented in the class prompt_config.ManuscriptPromptConfig.
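
To make the two files concrete, here is a minimal, hypothetical sketch of what the pair could contain and how they could be loaded. The key names used below (prompts, files, matchings, ignore) are illustrative assumptions drawn from this discussion, not the authoritative schema; see docs/custom-prompts.md in this PR for the actual format.

# Hypothetical sketch only: the YAML key names are assumptions, not the real schema.
# Requires PyYAML (pip install pyyaml).
import yaml

AI_REVISION_PROMPTS = """
prompts:
  intro_prompt: |
    Revise this introduction paragraph of "{title}" so that it reads clearly.
"""

AI_REVISION_CONFIG = """
files:
  matchings:
    - files: ["*introduction*"]
      prompt: intro_prompt
  ignore:
    - "*front-matter*"
"""

prompts = yaml.safe_load(AI_REVISION_PROMPTS)
config = yaml.safe_load(AI_REVISION_CONFIG)

# look up the prompt text referenced by the first matching rule
first_rule = config["files"]["matchings"][0]
print(prompts["prompts"][first_rule["prompt"]])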

Usage:

Once instantiated, a ManuscriptPromptConfig instance can be used to resolve a prompt for a given filename via the ManuscriptPromptConfig.get_prompt_for_filename() method. The method takes a filename as an argument and consults its configuration for a matching prompt. It returns a tuple of the form (prompt: str | None, match: re.Match | None), where:

  • prompt is the prompt text matching the filename, or None if the file was explicitly ignored in the config or no match was found, and
  • match is the re.Match instance produced when the filename was matched against a regular expression. In cases where no regex was applied (e.g., when the default prompt is returned), this value is None.

The optional use_default argument (True by default) allows the method to return the default prompt specified in the config files when no match is found. If it is False and no match is found, (None, None) is returned.
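
As a usage illustration, here is a minimal sketch. The constructor argument is an assumption (a directory containing the manuscript content and the ai_revision-*.yaml files); check the class itself for the real signature.

from manubot_ai_editor.prompt_config import ManuscriptPromptConfig

# Assumption: the constructor takes the content directory that holds the
# manuscript files and the ai_revision-*.yaml configuration files.
config = ManuscriptPromptConfig("content/")

prompt, match = config.get_prompt_for_filename("02.introduction.md")
if prompt is None:
    # the file was explicitly ignored in the config, or nothing matched
    # and no default prompt was available
    print("no prompt resolved")
else:
    print(prompt)  # the resolved prompt text
    print(match)   # an re.Match if a regex matched, otherwise None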

Integration into the existing code:

An instance of this class is created in ManuscriptEditor's constructor as self.prompt_config. This instance is used in ManuscriptEditor.revise_manuscript() to resolve a prompt for each filename in the manuscript.

The resolved prompt is passed all the way down to each model's get_prompt() method. In GPT3CompletionModel.get_prompt(), after checking for a custom prompt, the resolved prompt is used. If the resolved prompt is falsy, prompt resolution occurs as it did before (i.e., the section name is inferred and used to find a prompt and, failing that, a default prompt is used).
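
Schematically, the resolution order ends up looking roughly like the sketch below. The constants and the function name resolve_prompt are illustrative stand-ins, not the actual implementation in models.GPT3CompletionModel.

import os

# Hypothetical stand-ins for the per-section and generic prompts; the real
# values live inside models.GPT3CompletionModel.
SECTION_PROMPTS = {
    "abstract": "Revise this abstract ...",
    "introduction": "Revise this introduction ...",
}
GENERIC_PROMPT = "Revise the following paragraph ..."

def resolve_prompt(section_name, resolved_prompt=None):
    """Illustrates the resolution order only."""
    # 1. explicit override via the AI_EDITOR_CUSTOM_PROMPT env var, used as-is
    custom = os.environ.get("AI_EDITOR_CUSTOM_PROMPT")
    if custom:
        return custom
    # 2. prompt resolved from the ai_revision-*.yaml files, if any
    if resolved_prompt:
        return resolved_prompt
    # 3. legacy behavior: fixed prompt keyed by the inferred section name,
    #    falling back to a generic default prompt
    return SECTION_PROMPTS.get(section_name, GENERIC_PROMPT)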


History of changes

  • 4/17: performed first round of testing on PhenoPLIER and found some issues.
  • 4/29: MP reported a testing issue with DebuggingManuscriptRevisionModel requiring a title
    and keyword list, but code outside of manubot-ai-editor wasn't supplying them, leading
    to an exception. As of commit 3ae73de, a default title and keyword
    list are now provided, which should resolve the exception.
  • 5/3: the missing title/keyword issue appeared to come from upstream in manubot; fixed as of this (merged) PR: Fixes AI revision title, keyword propagation for models inheriting from GPT3CompletionModel manubot#370,
    but there is no GitHub/PyPI release yet.

Testing Procedure

Here, I describe the live testing procedure for this feature using GitHub Actions and the ai-revision workflow.
This is the typical use case for the Manubot AI Editor.

Setting up the Manubot AI Editor

These steps create the conda environment, install Manubot, and override the Manubot AI Editor package using this branch's version.

gh pr checkout 37
conda env create -f environment.yml -n manubot-ai-editor-testing
conda activate manubot-ai-editor-testing
pip install --upgrade manubot[ai-rev]
pip install -e .

Export your OpenAI API key:

export OPENAI_API_KEY="<api_key>"

Unit tests

Run all the unit tests:

pytest --runcost tests/

All unit tests should pass.

Manuscripts used for testing

I've tested this new version of the Manubot AI Editor using the original three manuscripts we used in the preprint.
I forked the original manuscript repositories (the forks are listed as clone commands below).

For each of these repositories, I've created different branches that reflect the different test cases present in the folder tests/config_loader_fixtures:

  • pr37_test-both_prompts_config
  • pr37_test-conflicting_promptsfiles_matchings
  • pr37_test-only_revision_prompts
  • pr37_test-single_generic_prompt (proofreading prompt)
  • pr37_test-prompts_are_used (checks that prompts are used in each section; similar to test_prompt_config.py/test_prompts_apply_gpt3 which adds a sentinel value to the paragraph)

For convenience, here are the git commands to clone each manuscript repo:

git clone [email protected]:pivlab/phenoplier-manuscript-test.git
git clone [email protected]:pivlab/ccc-manuscript-test.git
git clone [email protected]:pivlab/manubot-gpt-manuscript-test.git

Also make sure that each repository has a secret named OPENAI_API_KEY containing your API key.

Local tests: DebuggingManuscriptRevisionModel

Before hitting the OpenAI API, I ran a local revision model (DebuggingManuscriptRevisionModel) to ensure the right parameters, such as the prompt, are being used for each paragraph.

First, clone one of the manuscripts above, such as PhenoPLIER, and check out one of the test-case branches.

cd <MANUSCRIPT_REPO_DIR>
git checkout <TESTING_BRANCH>

Run using a local model for debugging:

manubot ai-revision \
  --content-directory content/ \
  --model-type DebuggingManuscriptRevisionModel

Then, I checked the following:

  • All files supposed to be modified according to the configuration file are modified.
  • All files ignored according to the configuration file are not modified.
  • If applicable, the section-specific prompt is used in the section file.
    • Ensure that variables like {section_name}, {title}, and {keywords} are correctly replaced (a quick automated check is sketched after this list).
  • If applicable, the default prompt is used.
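
For the placeholder check, a quick hypothetical helper like the one below can scan the revised content files for any placeholders left unreplaced. The content/ path and the placeholder list are assumptions taken from this PR's discussion.

import pathlib

# Placeholders the revision machinery should have replaced; taken from the
# replacement dict discussed later in this PR conversation.
PLACEHOLDERS = ("{paragraph_text}", "{section_name}", "{title}", "{keywords}")

def find_unreplaced(content_dir="content"):
    """Report any literal placeholders left in the revised .md files."""
    problems = []
    for md_file in sorted(pathlib.Path(content_dir).glob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for placeholder in PLACEHOLDERS:
            if placeholder in text:
                problems.append((md_file.name, placeholder))
    return problems

if __name__ == "__main__":
    for filename, placeholder in find_unreplaced():
        print(f"{filename}: found unreplaced {placeholder}")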

GitHub Actions tests (uses GPT3CompletionModel)

Setting up workflow

Make sure each manuscript's ai-revision workflow is installing the manubot-ai-editor version to be tested.
For this, open .github/workflows/ai-revision.yaml and make sure the last line here is present:

- name: Install Manubot AI revision dependencies
  run: |
    # install using the same URL used for manubot in build/environment.yml
    manubot_line=$(grep "github.com/manubot/manubot" build/environment.yml)
    manubot_url=$(echo "$manubot_line" | awk -F"- " '{print $2}')

    pip install ${manubot_url}#egg=manubot[ai-rev]

    # install manubot-ai-editor for this PR
    pip install -U git+https://github.com/falquaddoomi/manubot-ai-editor@issue-31-customprompts-yaml

Kicking off workflow

For the live tests using the OpenAI models:

  • Go to each manuscript's repository and click on the Actions tab.
  • Click on the ai-revision workflow.
  • Click on the Run workflow combobox, and enter this information in each field:
    • Use workflow from: main
    • Branch to revise: <BRANCH_TO_REVISE> (such as pr37_test-both_prompts_config)
    • File names to revise: leave empty
    • Language model: leave default value
    • Custom prompt: leave empty
    • Output branch: ai-revision-<BRANCH_TO_REVISE>
  • Click on Run workflow and wait for it to finish.

Assertions

  • Click on the new ai-revision workflow run and then on AI Revise:
    • In section Install Manubot AI revision dependencies, make sure manubot-ai-editor-0.5.0 was installed.
  • Ensure a new PR was created from this run.
    • Open the PR and ensure the branch from which it was created matches the one you put on field "Branch to revise" above.
    • Click on "Files changed" tab and check:
      • All files supposed to be modified according to the configuration file are modified.
      • All files ignored according to the configuration file are not modified.
      • If applicable, the section-specific prompt is used in the section file. The only way to check this in GitHub is by using a sentinel value.
      • If applicable, the default prompt is used.

@falquaddoomi falquaddoomi marked this pull request as draft December 13, 2023 21:12
@falquaddoomi falquaddoomi marked this pull request as ready for review January 31, 2024 21:01

@miltondp miltondp left a comment


I finished the review of the code, and it looks great, @falquaddoomi. I've added a couple of minor comments. Now I'm going to run/test it and report back.

Comment on lines 120 to 121
# FIXME: which takes priority, the files collection in ai_revision-config.yaml
# or the prompt_file? we went with config taking precedence for now

I think that only one of the two should be present: either 1) the prompt_files key in ai_revision-prompts.yaml, or 2) a files/matchings key in ai_revision-config.yaml. But the user could add both, and I agree that the config should take precedence. What if we print a warning here? It might be useful for the user.
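
For illustration, the precedence-plus-warning idea could look roughly like this sketch (the function and variable names are assumptions, not the PR's actual code):

import warnings

def resolve_file_matchings(config_matchings, prompt_file_matchings):
    """Sketch: prefer the files/matchings section of ai_revision-config.yaml,
    but warn the user if ai_revision-prompts.yaml also defines prompt_files."""
    if config_matchings and prompt_file_matchings:
        warnings.warn(
            "Both ai_revision-config.yaml (files/matchings) and "
            "ai_revision-prompts.yaml (prompt_files) define filename matchings; "
            "using the config file and ignoring prompt_files."
        )
    return config_matchings or prompt_file_matchings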


miltondp commented Feb 14, 2024

Deprecated: This testing procedure is now part of the PR description


falquaddoomi and others added 24 commits April 4, 2024 12:22
…anuscript(), passed down to model-specific prompt resolution via the 'resolved_prompt' argument.
…t if it's explicitly ignored. revise_manuscript() now checks for this sentinel and ignores the file.
…script folders and config folders temporarily for testing. Adds set_directory() helper to temporarily switch folders.
… on the phenoplier full manuscript for the "both files" scenario.
we are doing prompt testing separately now
…ed as a literal string rather than an index into prompts
…fied. Adds tests to verify the warning is shown.
…erence as a reference into prompts, not a literal.
… adds a title so we can test the placeholder replacement of {title}.
…t be literal text; added the default prompt where it was missing.
…ManuscriptRevisionModel and then looks for the prompts' text in the resulting .md files
…r keyword args, since we typically want to customize them
@miltondp

I'm pretty sure this is because it's explicitly ignored in https://github.com/manubot/manubot-ai-editor/blob/main/libs/manubot_ai_editor/editor.py#L449. I tried not to make any possibly destructive edits to things that weren't directly involved with the custom prompts system, but I'm happy to remove that code that skips front-matter if it's no longer needed. I can also look for other exclusions of specific files if you want the custom prompts system to be the source of truth.

Ah, good catch! Yes, let's remove that code and keep the configuration in one place. Thank you!


falquaddoomi commented Apr 18, 2024

Hey @miltondp, I made a few changes that should check for the correct title and keyword replacements in the prompts. Feel free to run your tests again, and thank you.

FYI, the code that does the replacements is here, in case there's something missing that should be replaced:

# use the resolved prompt from the ai_revision config files, if available
# replace placeholders with their actual values
replacements = {
    "paragraph_text": paragraph_text.strip(),
    "section_name": section_name,
    "title": self.title,
    "keywords": ", ".join(self.keywords),
}
prompt = resolved_prompt.format(**replacements)

It's pretty much identical to the other parts, and probably should be refactored to just use that replacement dictionary now that I'm looking at it again. In fact, I'm going to go ahead and change that, since there's no reason to compute the replacement dict twice.

EDIT: the commit after this comment factors it into a shared dict.
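
A rough sketch of that shared-dict refactor (illustrative only; the actual commit may structure it differently):

def build_replacements(paragraph_text, section_name, title, keywords):
    """Compute the placeholder replacement dict once so that both the resolved
    prompt and the built-in prompts can reuse it (sketch only)."""
    return {
        "paragraph_text": paragraph_text.strip(),
        "section_name": section_name,
        "title": title,
        "keywords": ", ".join(keywords),
    }

# usage, roughly:
#   replacements = build_replacements(paragraph_text, section_name, self.title, self.keywords)
#   prompt = resolved_prompt.format(**replacements)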


falquaddoomi commented May 6, 2024

Testing results

Current status (5/6/2024):

I followed the procedure described in this comment.

History of changes

  • 4/17: performed first round of testing on PhenoPLIER and found some issues.
  • 4/29: MP reported a testing issue with DebuggingManuscriptRevisionModel requiring a title
    and keyword list, but code outside of manubot-ai-editor wasn't supplying them, leading
    to an exception. As of commit 3ae73de, a default title and keyword
    list are now provided, which should resolve the exception.
  • 5/3: the missing title/keyword issue appeared to come from upstream in manubot; fixed as of this (merged) PR: Fixes AI revision title, keyword propagation for models inheriting from GPT3CompletionModel manubot#370,
    but there is no GitHub/PyPI release yet.

Tests

  • PhenoPLIER:

    • pr37_test-prompts_are_used (uses the sentinel values that @falquaddoomi wrote for the unit tests).

      • Local tests: 🟩 PASSED

      • GitHub Actions tests: 🟩 PASSED

        FA: Note that this test may not always pass since it involves telling the LLM to include a specific word in the text, and since LLMs are stochastic it may just not do what we ask. Anyway, this time it worked! All the words we asked for were added to the text. Also note that 04.00.results.md and 04.05.00.results_framework.md are not updated since they're listed in the ignore section in ai_revision-config.yaml.

    • pr37_test-both_prompts_config

      • Local tests: 🟥 FAILED

        • Keyword {title} is incorrectly replaced by "Debugging Title"
        • Keyword {keywords} is incorrectly replaced by "debugging, keywords".
        • Keyword {section_name} is correctly replaced by introduction in the introduction and discussion in the discussion

        FA: This issue occurs because manubot doesn't pass the title/keywords from the manuscript unless the model is exactly GPT3CompletionModel; since this test uses DebuggingManuscriptRevisionModel, it doesn't match and thus never gets passed the correct title/keywords for the manuscript. This is an issue with manubot, not manubot-ai-editor, and it's fixed in PR Fixes AI revision title, keyword propagation for models inheriting from GPT3CompletionModel manubot#370. While that PR has been merged, a new release hasn't been made yet, so running pip install --upgrade manubot[ai-rev] per the testing directions will still get you the old, broken version. (A minimal illustration of this exact-type-check pitfall appears after this test list.)

      • GitHub Actions tests: 🟨 PASSED?

        FA: The action runs to completion, which is why I marked it 'PASSED?', but I can't verify from the output that it's using the title and keywords from the manuscript manifest. I assume it is getting the right arguments, since it's using GPT3CompletionModel and manubot is hardcoded to pass the title and keywords to that model, but I can't be sure.

  • CCC:

    • TODO (@miltondp, you mentioned you might run this one, right?)
  • Manubot AI Editor:

    • TODO (@miltondp, you mentioned you might run this one, right?)
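
To illustrate the exact-type-check pitfall noted in the pr37_test-both_prompts_config local-test result above: an exact type comparison excludes subclasses such as DebuggingManuscriptRevisionModel, while an isinstance() check (the direction manubot#370 takes for models inheriting from GPT3CompletionModel) includes them. The classes below are stand-ins, not manubot's actual code, and the inheritance relationship is assumed from the manubot#370 title.

# Stand-in classes to show why an exact-type check drops subclasses.
class GPT3CompletionModel:
    pass

class DebuggingManuscriptRevisionModel(GPT3CompletionModel):
    pass

model = DebuggingManuscriptRevisionModel()

# exact-type check: False for the debugging model, so title/keywords
# would never be passed along to it
print(type(model) is GPT3CompletionModel)      # False

# inheritance-aware check: True, so subclasses receive the same arguments
print(isinstance(model, GPT3CompletionModel))  # True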


miltondp commented May 7, 2024

Thank you for the testing results, @falquaddoomi. You followed this testing procedure, right? (Your link to the comment does not work).

Some comments for the tests on the pr37_test-both_prompts_config branch:

  • Local tests: does it work if you upgrade Manubot by changing this line? If it does, we can upgrade to the new Manubot version in all manuscripts.
  • GitHub Actions tests: I see you created a new branch for the PhenoPLIER manuscript (ending with -v2). Sorry if I'm missing something: is there any reason to use it? Can we use the original branch (pr37_test-both_prompts_config)? When we run the action again, the PR gets overwritten.

@falquaddoomi

Thank you for the testing results, @falquaddoomi. You followed this testing procedure, right? (Your link to the comment does not work).

Sorry about that; somehow the PR number got dropped from the link, but I updated it and verified that the link is now working. Anyway, yes, I did follow your testing procedure from the now corrected link.

Some comments for the tests on the pr37_test-both_prompts_config branch:

  • Local tests: does it work if you upgrade Manubot by changing this line? If it does, we can upgrade to the new Manubot version in all manuscripts.

No, that line you linked to references a commit that's older than the fix I introduced at commit manubot/manubot@06fcecb. You'd have to update that line to the following:

    - git+https://github.com/manubot/manubot@06fcecb4f9eaf9c62f804e0213e9a905c82725e4

I've tested the phenoplier-manuscript-test repo on the branch pr37_test-both_prompts_config and verified that, with the commit you'd mentioned, it produces the default "Debugging Title" and "debugging, keywords" values in the text. With the later commit I mentioned, it produces the correct title and keywords as taken from the manuscript manifest.

  • GitHub Actions tests: I see you created a new branch for the PhenoPLIER manuscript (ending with -v2). Sorry if I'm missing something: is there any reason to use it? Can we use the original branch (pr37_test-both_prompts_config)? When we run the action again, the PR gets overwritten.

Well, I didn't want to overwrite your existing PRs so that you could compare the two. Sure, though, I'll just go ahead and run it without the suffix, no problem.

@miltondp
Copy link
Collaborator

miltondp commented May 7, 2024

I've tested the phenoplier-manuscript-test repo on the branch pr37_test-both_prompts_config and verified that, with the commit you'd mentioned, it produces the default "Debugging Title" and "debugging, keywords" values in the text. With the later commit I mentioned, it produces the correct title and keywords as taken from the manuscript manifest.

Perfect! Thank you for checking that. Do you want to create a PR on the three manuscript repos so we update the Manubot version on them? Once we merge this change to use the fixed Manubot version, we can update the branches (with the tests) on each manuscript repository so all these local tests would pass.

Well, I didn't want to overwrite your existing PRs so that you could compare the two. Sure, though, I'll just go ahead and run it without the suffix, no problem.

Ah, of course, I see. No problem with overriding these PRs, we can keep overriding them until the tests pass.

Do you want to finish the other GitHub Actions tests on the PhenoPLIER manuscript test repository? Maybe we can first fix the manubot version (comment above), and then create one branch per each of the unit tests that you wrote in the file tests/config_loader_fixtures. I tried to document this on the testing procedure comment where I put a list of branch names based on your unit tests, and also on other ideas where we use a simple proofreading prompt. I think we should first get a polished set of evaluation criteria for one manuscript, and when we are clear on that, we can move to the rest of the manuscripts for testing. We should also write the testing procedure and the testing results in the wiki or something more convenient than the comments here. What do you think?

@falquaddoomi

Perfect! Thank you for checking that. Do you want to create a PR on the three manuscript repos so we update the Manubot version on them? Once we merge this change to use the fixed Manubot version, we can update the branches (with the tests) on each manuscript repository so all these local tests would pass.

Sure thing. I created the PRs (below), but then I realized that you didn't specify which branch/repo you wanted them created against; I assumed it was the *-test repos and against the main branch, so that you could then rebase the test branches onto main. If that's not correct, let me know and I can change them to target a different repo/branch.

Ah, of course, I see. No problem with overriding these PRs, we can keep overriding them until the tests pass.

Actually, I see your point about not creating new branches per test run; otherwise the repo will be littered with these branches, and you can always see the history in the single branch per test. I'm going to go ahead and delete the -v2 branches, and we'll just reuse the existing ones as you suggested.

Do you want to finish the other GitHub Actions tests on the PhenoPLIER manuscript test repository? Maybe we can first fix the manubot version (comment above), and then create one branch per each of the unit tests that you wrote in the file tests/config_loader_fixtures. I tried to document this on the testing procedure comment where I put a list of branch names based on your unit tests, and also on other ideas where we use a simple proofreading prompt.

Sure, I can run the remaining GitHub Actions. I'll wait for those PRs above to get merged, then I'll rebase each of the testing branches onto main when that's done, then run the actions.

On a side note, I noticed that the phenoplier manuscript test repo only has two of the branches you mentioned, pr37_test-prompts_are_used and pr37_test-both_prompts_config, and the other two manuscript repos don't have any of them. Perhaps I'm misunderstanding that list; I assumed it was a list of branches that existed, not ones you were considering creating. If it's the latter, I can go ahead and create them from the unit tests you referenced. I'm unclear on what the test would look like for a "simple proofreading prompt", so if you could supply that I'd appreciate it.

I think we should first get a polished set of evaluation criteria for one manuscript, and when we are clear on that, we can move to the rest of the manuscripts for testing.

I think that sounds reasonable, although if it's ok with you I'd prefer that you create these evaluation criteria, since I assume you have a better idea of what you intended with the spec and the larger context in which manubot-ai-editor is used. (For example, I had no idea that manubot was checking explicitly for GPT3CompletionModel and not passing the title and keyword arguments if it wasn't exactly that.) If it's not feasible for you to do it, just let me know and I can write them up, no worries.

We should also write the testing procedure and the testing results in the wiki or something more convenient than the comments here. What do you think?

I totally agree on moving the testing procedure from a comment in the middle of our discussion to either the wiki, or perhaps to the description of this PR so we don't have to search for it. Regarding the testing results, my personal opinion is that they're part of the discussion relating to this PR so I wouldn't mind continuing to post them as comments, but I'm open to moving them somewhere more convenient for you if this is getting unwieldy.


miltondp commented May 8, 2024

Sure thing. I created the PRs (below), but then I realized that you didn't specify which branch/repo you wanted them created against; I assumed it was the *-test repos and against the main branch, so that you could then rebase the test branches onto main. If that's not correct, let me know and I can change them to target a different repo/branch.

Perfect. I approved all the PRs on these manuscripts, so you can merge them and then rebase the test branches.

Ah, of course, I see. No problem with overriding these PRs, we can keep overriding them until the tests pass.

Actually, I see your point about not creating new branches per test run; otherwise the repo will be littered with these branches, and you can always see the history in the single branch per test.

Exactly.

I'm going to go ahead and delete the -v2 branches, and we'll just reuse the existing ones as you suggested.

Perfect.

Do you want to finish the other GitHub Actions tests on the PhenoPLIER manuscript test repository? Maybe we can first fix the manubot version (comment above), and then create one branch per each of the unit tests that you wrote in the file tests/config_loader_fixtures. I tried to document this on the testing procedure comment where I put a list of branch names based on your unit tests, and also on other ideas where we use a simple proofreading prompt.

Sure, I can run the remaining GitHub Actions. I'll wait for those PRs above to get merged, then I'll rebase each of the testing branches onto main when that's done, then run the actions.

👍🏻

On a side note, I noticed that the phenoplier manuscript test repo only has two of the branches you mentioned, pr37_test-prompts_are_used and pr37_test-both_prompts_config, and the other two manuscript repos don't have any of them. Perhaps I'm misunderstanding that list; I assumed it was a list of branches that existed, not ones you were considering creating. If it's the latter, I can go ahead and create them from the unit tests you referenced.

Yes, that's what I meant: if you can create those branches based on your unit tests. Sorry for the confusion. Feel free to assess whether we want to add all the unit tests or just a subset. Since the testing on GitHub Actions is more time-consuming for us, it would make sense to include only tests relevant to this context (GitHub Actions) since they are already being tested via unit tests.

I'm unclear on what the test would look like for a "simple proofreading prompt", so if you could supply that I'd appreciate it.

Yes, absolutely. I'll add these after you merge your PRs (with the manubot version update) into the manuscripts.

I think we should first get a polished set of evaluation criteria for one manuscript, and when we are clear on that, we can move to the rest of the manuscripts for testing.

I think that sounds reasonable, although if it's ok with you I'd prefer that you create these evaluation criteria, since I assume you have a better idea of what you intended with the spec and the larger context in which manubot-ai-editor is used. (For example, I had no idea that manubot was checking explicitly for GPT3CompletionModel and not passing the title and keyword arguments if it wasn't exactly that.) If it's not feasible for you to do it, just let me know and I can write them up, no worries.

Yes, I can work on the criteria for testing and then we can review it together. Once we feel we are covering all the functionality that we want to cover, we can move on to the other manuscripts.

We should also write the testing procedure and the testing results in the wiki or something more convenient than the comments here. What do you think?

I totally agree on moving the testing procedure from a comment in the middle of our discussion to either the wiki, or perhaps to the description of this PR so we don't have to search for it. Regarding the testing results, my personal opinion is that they're part of the discussion relating to this PR so I wouldn't mind continuing to post them as comments, but I'm open to moving them somewhere more convenient for you if this is getting unwieldy.

Perfect, I like that. Do you want to move the testing procedure to the PR description? Looks like everyone can edit the PR description, so that's great. I would add, at the top, a "history of changes" (including dates) that gets updated when major changes are included. Regarding the testing results, I also like the idea of having them posted as comments, but I would probably keep a single one, also with a history of changes (including dates).

@miltondp miltondp self-requested a review July 3, 2024 20:15

@miltondp miltondp left a comment


Looks good to me, and I think it's ready for merging and a new release.

…RRIDE is specified, added test for when DEFAULT_PROMPT_OVERRIDE is specified.

@miltondp miltondp left a comment


Looks great. I left some edits and comments for you to revise. But this is good to merge already.

Comment on lines 124 to 126
- `{section_name}`: the name of the section (e.g., "introduction", "conclusion"), derived from the filename*

*(\* The mechanism that produces `section_name` is out of the scope of this document, but you can find the implementation in [editor.py](https://github.com/falquaddoomi/manubot-ai-editor/blob/issue-31-customprompts-yaml/libs/manubot_ai_editor/editor.py#L178-L211))*

I think we have a set of fixed, defined sections, right? If the file name indicates any of them, we return the corresponding value from this fixed set. I know you put a link to the implementation, but it might be useful to mention this fixed set of section values here (I left a suggestion above). Also, the link points to your personal repository instead of the official one.


I committed your suggestion, thanks for that. Regarding the link, I linked to my PR (which is in my personal repo) because once this gets merged the line references will be out of date. I think what I'm going to do instead is just describe the mechanism by which it's resolved here, since it's not all that complicated, and forgo linking to the source.
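
For illustration, that kind of filename-based section inference might look like the sketch below, assuming the numbered content-file naming seen in the test manuscripts (e.g., 02.introduction.md, 04.00.results.md). The set of recognized section names and the parsing details are assumptions; the real logic lives in editor.py and may differ.

import re

# Assumed set of recognized sections; not the authoritative list from editor.py.
KNOWN_SECTIONS = {"abstract", "introduction", "results", "discussion", "methods", "conclusion"}

def infer_section_name(filename):
    """Return a known section name mentioned in the filename, or None."""
    # strip the extension and numeric prefixes, e.g. "04.00.results.md" -> "results"
    stem = re.sub(r"\.md$", "", filename)
    parts = [p for p in stem.split(".") if not p.isdigit()]
    for part in parts:
        if part.lower() in KNOWN_SECTIONS:
            return part.lower()
    return None

print(infer_section_name("02.introduction.md"))  # introduction
print(infer_section_name("04.00.results.md"))    # results
print(infer_section_name("00.front-matter.md"))  # None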

@falquaddoomi falquaddoomi merged commit 7a33085 into manubot:main Jul 9, 2024
@falquaddoomi falquaddoomi deleted the issue-31-customprompts-yaml branch July 9, 2024 17:07