Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

evaluate extraction progress in separate call #110

Merged
merged 18 commits into from
Oct 24, 2024
Merged

Conversation

anishk23733
Copy link
Contributor

@anishk23733 anishk23733 commented Oct 16, 2024

why

Changes for improvements to completion checking and progress updating.

The original issue was that in a single extraction step, we provide previously extracted content, progress so far, completed/not completed, DOM elements, etc. all at once and ask the model to give us newly extracted content from DOM elements, new progress, completed/not completed. The issue here is that the model needs to simultaneously decide what is important in the DOM and evaluate its progress based on the information it extracts.

what changed

This PR turns this into three steps: 1. extracting the content from the DOM (without previous progress, previous extraction content, etc.), 2. Refining content to ensure that we do not add redundant/duplicate data and updating the total response, and 3. deciding based on the total extracted content what the new progress is and whether the job is completed. The main idea here is that breaking this up rather than doing it all at once shows improvements and a reduction in hallucinations.

test plan

Running evals. Shows improvement with awareness of the number of outputs in arxiv eval required since progress tracking is improved. Is able to do both of the Github evals.

This shows improvement on both the GitHub tests.

@anishk23733 anishk23733 marked this pull request as ready for review October 16, 2024 18:31
@anishk23733 anishk23733 requested review from pkiv and kamath October 22, 2024 22:28
@@ -281,7 +281,9 @@ const extract_last_twenty_github_commits = async () => {

try {
await stagehand.page.goto("https://github.com/facebook/react");
await stagehand.waitForSettledDom();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be wrong here, but i thought we're deprecating this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, wasn't aware—removed. Just included since I saw it in some other tests—I figure we need to remove it there as well?

lib/prompt.ts Outdated Show resolved Hide resolved
@anishk23733 anishk23733 requested a review from kamath October 23, 2024 17:02
Copy link
Contributor

@kamath kamath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, if you don't mind also cleaning up other places where we wait for settled dom that would be awesome

@anishk23733 anishk23733 merged commit 12f1b35 into main Oct 24, 2024
1 check failed
@pkiv pkiv deleted the complete-cond-extract branch October 29, 2024 11:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants