evaluate extraction progress in separate call #110

anishk23733 · 2024-10-16T04:44:55Z

why

This PR improves completion checking and progress updating during extraction.

Originally, a single extraction step handed the model everything at once: the previously extracted content, the progress so far, the completed/not-completed flag, the DOM elements, and so on. From that, it had to return the newly extracted content, the new progress, and an updated completed/not-completed flag. The problem is that the model has to decide what is important in the DOM and, at the same time, evaluate its progress based on the information it extracts.

what changed

This PR splits that single call into three steps: 1. extracting the content from the DOM (without previous progress, previously extracted content, etc.), 2. refining the content so that we do not add redundant/duplicate data, and updating the total response, and 3. deciding, based on the total extracted content, what the new progress is and whether the job is completed. The main idea is that breaking the work up, rather than doing it all at once, improves results and reduces hallucinations.
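The three steps above can be sketched roughly as follows. This is only an illustration of the control flow, not the code in this PR: `Extracted`, `Progress`, `ExtractionSteps`, `extractOverChunks`, and `fakeSteps` are all hypothetical names, and the stub steps stand in for what would be three separate LLM calls.

```typescript
// Hypothetical shapes for illustration; none of these names come from the PR's diff.
type Extracted = { items: string[] };

interface Progress {
  progress: string;
  completed: boolean;
}

// Each method stands in for one separate model call.
interface ExtractionSteps {
  // Step 1: sees only the instruction and the current DOM chunk.
  extract(instruction: string, domChunk: string): Promise<Extracted>;
  // Step 2: merges new content into the running total, dropping duplicates.
  refine(instruction: string, previous: Extracted, fresh: Extracted): Promise<Extracted>;
  // Step 3: judges progress and completion from the refined total only.
  evaluate(instruction: string, total: Extracted): Promise<Progress>;
}

async function extractOverChunks(
  steps: ExtractionSteps,
  instruction: string,
  domChunks: string[],
): Promise<{ total: Extracted; progress: Progress }> {
  let total: Extracted = { items: [] };
  let progress: Progress = { progress: "", completed: false };

  for (const chunk of domChunks) {
    const fresh = await steps.extract(instruction, chunk);
    total = await steps.refine(instruction, total, fresh);
    progress = await steps.evaluate(instruction, total);
    if (progress.completed) break; // stop as soon as the evaluator says we are done
  }
  return { total, progress };
}

// Stub steps (assumptions, no model involved): "extract" splits a chunk on commas,
// "refine" skips items already in the total, "evaluate" stops at three items.
const fakeSteps: ExtractionSteps = {
  async extract(_instruction, domChunk) {
    return { items: domChunk.split(",") };
  },
  async refine(_instruction, previous, fresh) {
    const added = fresh.items.filter((item) => previous.items.indexOf(item) === -1);
    return { items: previous.items.concat(added) };
  },
  async evaluate(_instruction, total) {
    const count = total.items.length;
    return { progress: `${count} of 3 items`, completed: count >= 3 };
  },
};
```

Keeping the extraction step free of prior state means it only has to answer "what is in this chunk?", while the evaluation step only has to answer "is the total enough?".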

test plan

Ran the evals. The arxiv eval shows improved awareness of the number of outputs required, since progress tracking is improved. The change is also able to complete both of the GitHub evals.

This shows improvement on both the GitHub tests.

kamath · 2024-10-23T05:43:37Z

evals/index.eval.ts

@@ -281,7 +281,9 @@ const extract_last_twenty_github_commits = async () => {

  try {
    await stagehand.page.goto("https://github.com/facebook/react");
+    await stagehand.waitForSettledDom();


could be wrong here, but i thought we're deprecating this?

Got it, wasn't aware, so I removed it. I only included it because I saw it in some other tests. I figure we need to remove it there as well?

lib/prompt.ts

kamath

lgtm, if you don't mind also cleaning up other places where we wait for settled dom that would be awesome

Anish Kachinthaya and others added 5 commits October 15, 2024 21:43

add an additional complete condition for extract, and evaluate progre…

514aaf3

…ss separately from main extraction prompt

use original playground

66c2d21

move merge to utils.ts

d16c19e

use tests

4559ab6

revert other evals back to original state

a5c1bd6

anishk23733 marked this pull request as ready for review October 16, 2024 18:31

Anish Kachinthaya and others added 10 commits October 16, 2024 11:57

complete condition not required for number accuracy

22cdf8f

remove completeCondition from evals

f82b220

add duplicate filtering

cfa5afd

formatting updates

e3f5c69

Merge branch 'main' into complete-cond-extract

b7b7ca2

remove console logging

b00388d

Merge branch 'complete-cond-extract' of github.com:browserbase/stageh…

7486c8f

…and into complete-cond-extract

make fields that may be overwritten nullable

1b77a86

refine prompt instead of filter prompt

d3c4461

remove merge, using llm merge

9e052c1

anishk23733 requested review from pkiv and kamath October 22, 2024 22:28

kamath reviewed Oct 23, 2024

anishk23733 added 2 commits October 23, 2024 09:58

remove commented filter code

7ef3755

remove waitforsettled

38fd912

anishk23733 requested a review from kamath October 23, 2024 17:02

kamath approved these changes Oct 24, 2024

shorten prompts

e27f6ae

anishk23733 merged commit 12f1b35 into main Oct 24, 2024
1 check failed

pkiv deleted the complete-cond-extract branch October 29, 2024 11:22
