Jean/non hf db models #48

Closed
wants to merge 10 commits into from

Conversation

jmercat (Collaborator) commented on Jan 9, 2025

Change some DB functions to handle non-Hugging Face models.
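
For context, the sketch below shows one common way to "handle non-Hugging Face models" in a results-database helper: try to resolve the model on the Hugging Face Hub and fall back to a plain record when the lookup fails. This is a hypothetical illustration, not this repository's actual code; `ModelRecord`, `build_model_record`, and the field names are invented for the example.

```python
# Hypothetical sketch (not the actual DB code in this repo): build a model
# record for the eval-results database even when the model is not hosted on
# the Hugging Face Hub (API models, local checkpoints, ...).
from dataclasses import dataclass
from typing import Optional

from huggingface_hub import HfApi
from huggingface_hub.utils import RepositoryNotFoundError


@dataclass
class ModelRecord:
    name: str
    source: str                        # "huggingface" or "external"
    hub_revision: Optional[str] = None


def build_model_record(model_name: str) -> ModelRecord:
    """Return a DB-ready record, degrading gracefully for non-HF models."""
    try:
        # Raises RepositoryNotFoundError if the repo does not exist on the Hub.
        info = HfApi().model_info(model_name)
        return ModelRecord(name=model_name, source="huggingface", hub_revision=info.sha)
    except RepositoryNotFoundError:
        # Not on the Hub: record it as an external (API or local) model
        # instead of failing the database insert.
        return ModelRecord(name=model_name, source="external")
```

The point mirrored from the PR title is simply that the database path should not assume every evaluated model has a Hugging Face repo.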

git-subtree-dir: eval/chat_benchmarks/LiveBench
git-subtree-split: 48b283e
ab834ea update livebench releases in download_questions
3055d6a Merge pull request #68 from dylanjcastillo/main
8edea2b update readme
a58e552 Update README.md
a0fc4fe Update changelog.md
1f37ca3 update changelog
fb1380a add changelog
1c779c2 Merge remote-tracking branch 'upstream/main'
85cae69 Update README.md
f050f8c Merge pull request #48 from codepharmer/patch-1
3953c6c fix bug in download_leaderboard
0678e4e bump version
e84e548 add debugging option for AMPS_Hard, math_comp, and web_of_lies_v2
f774d19 handle edge case in tablejoin judging
ee0b887 handle edge case in math olympiad judging
7104134 update api model lists
4de3327 add --livebench-release-option to all runner files
b87e779 Add line separator
174d07c update figure
37b5027 update model_adapter.py
0870422 Update README.md
850d09b update leaderboard
5702272 Merge pull request #30 from johnfelipe/main
ed36110 allow boxed output format for spatial reasoning task
8f7aade add support for nvidia models and remove hard-coded model in vertexai
ffc2364 Create README-es_CO.md
6f3f1e3 add livebench_releases param in download_questions
1fd5165 Remove superfluous arg.
ecc86ce add gpt-4o-2024-08-06
3e8328c Update README.md
4f2dfff add gemini-1.5-pro-exp-0801
2fc42c5 update leaderboard
0ca7c40 Merge pull request #20 from LiveBench/july_release
94fa419 README updates for july release
135ef7c Files changes for release 2
e424f7b add scoring for new spatial reasoning task
8df6cd5 update tablereformat scoring to allow for leading phrase
989b5b9 add other Llama-3.1 models
9226afe fixed mistral-large-2407 categorization from api to open
5963499 add mistral
4a0719c add Llama-3.1 70B and 8B
e921527 add llama-3.1-405b to leaderboard
40c0805 update math_comp scoring
4dc9304 fix bug in gen_model_answer.py
8316b5a update leaderboard and readme
93e3a7d add gpt-4o-mini
1264bb0 Merge pull request #16 from LiveBench/usability_updates_july
95dbca5 fix bug in gen_ground_truth_judgment
9640696 July usability updates
428e1e6 Update README.md
b98e0b6 make web_of_lies_v2 parsing less strict
f6ddc97 remove house_traversal because of ambiguous parsing
ca35a04 allow boxed output format for math_comp
028266e Merge pull request #10 from eltociear/patch-1
7503f15 update leaderboard
5556d06 docs: update DATASHEET.md
fb10035 update leaderboard
24e6c89 add claude-3-5 to anthropic model list
d008154 Merge pull request #8 from LiveBench/dev/colin
ecd8530 update leaderboard
977a06c make house_traversal judging stricter
991124c Update pyproject.toml
fa94aaf Update CONTRIBUTING.md
192baf2 Update README.md
ca372cf Merge pull request #4 from sanjay920/fix_typo
905b71f inside the `tablejoin` task processing utility, handle the case where the `ground_truth` variable is not a dict but a string, using `ast.literal_eval` to convert it to a dict, which is necessary for processing and comparing the ground truth to the LLM output (a hedged sketch of this pattern follows the commit list)
19b94ab Merge pull request #1 from superoo7/patch-1
e896ff2 renamed livebench/show_livebench_result.py -> livebench/show_livebench_results.py
ab063ff feat: add wheel and langdetect to dependencies
03156c0 add documentation
5dd3657 Merge pull request #3 from LiveBench/dev/task_cat_update
2347dec update LiveBench to: unify the nomenclature of "category" (a meta-group, of which there are currently 6) and "task" (a specific question type, of which there are currently 18), including removing the `grouping` key from various files (as was done for the HF datasets); improve the `download_leaderboard.py` script to separate each model's answers into its own jsonl file per task; update the processing of `instruction_following` tasks to manage this change (updates to `livebench/if_runner/instruction_following_eval/evaluation_main.py` and `livebench/process_results/instruction_following/utils.py`); permit a trailing `/` when running a command like `python gen_ground_truth_judgment.py --bench-name live_bench/`
14872ce README asset fix
d4ecb47 Update README.md
9ac5f85 Update README.md
0fb5e4a Update README.md
c01596b Delete livebench/if_runner/live_data.py
262cfc1 Delete livebench/if_runner/README.md
05da363 Delete livebench/if_runner/Number_of_Instructions_(IFEval_Live_Bench).pdf
9d45efd Delete livebench/if_runner/utils.py
6b7e52c Update README.md
cc71a6c Update pyproject.toml
113b307 Update README.md
686be1e LiveBench file drop
5edcd77 Create README.md
REVERT: 48b283e [wip] docker build image
REVERT: 326788c Merge pull request #35 from mlfoundations/etashg/check_db_results
REVERT: 6af9066 linted
REVERT: 1e1ba0d edited readme
REVERT: 2603d19 checks if task is already done
REVERT: 4878904 checks if task is already done
REVERT: f8481f5 Merge pull request #34 from sedrick-keh-tri/eval-config-yaml
REVERT: ff7bc85 support reading from config yamls
REVERT: fae070e remove unused arguments
REVERT: 563a312 Merge pull request #33 from mlfoundations/etashg/bibtex
REVERT: 922170d Update README.md
REVERT: e8531fe Merge pull request #32 from mlfoundations/etashg/bs_debug
REVERT: a9d57a7 added bibtex
REVERT: 0bd3a24 added citation
REVERT: 42206b1 reverted some files to main
REVERT: b332fcf added pyproject toml fix
REVERT: f6ae3d9 Merge pull request #30 from SGEthan/main
REVERT: cb19994 fix results missed when uploading to db
REVERT: 5d6ba5a debug successful
REVERT: 639af29 Merge pull request #26 from mlfoundations/jean/parallel_wildbench
REVERT: d0a6946 Update pyproject.toml
REVERT: 591c023 Update pyproject.toml
REVERT: 4b1ddb0 Merge pull request #27 from mlfoundations/negin/clean_mtbench
REVERT: 8d1be1a removed extra files
REVERT: b7b27bb Multithread api call in Wildbench
REVERT: 757bf3b Merge pull request #24 from mlfoundations/etashg/eval_database_fix
REVERT: 9673ee2 added database fix
REVERT: dcfc11d Initial commit

git-subtree-dir: eval/chat_benchmarks/LiveBench
git-subtree-split: ab834ea6532aa042943afc3b228372f88415473d
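
Commit 905b71f in the list above describes the `tablejoin` edge case where `ground_truth` arrives as a string rather than a dict. Below is a minimal sketch of that coercion pattern, assuming a hypothetical helper name; the actual LiveBench utility is organized differently.

```python
import ast
from typing import Union


def ensure_ground_truth_dict(ground_truth: Union[dict, str]) -> dict:
    """Coerce a tablejoin ground-truth value to a dict.

    Hypothetical helper illustrating commit 905b71f: when the ground truth is
    serialized as a string (e.g. "{'col_a': 'col_b'}"), parse it with
    ast.literal_eval so it can be compared against the parsed LLM output.
    """
    if isinstance(ground_truth, dict):
        return ground_truth
    # ast.literal_eval safely evaluates Python literals (dicts, lists, numbers)
    # without executing arbitrary code, unlike eval().
    parsed = ast.literal_eval(ground_truth)
    if not isinstance(parsed, dict):
        raise ValueError(f"ground_truth did not parse to a dict: {ground_truth!r}")
    return parsed
```
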
jmercat requested a review from neginraoof on January 9, 2025 at 02:10
jmercat (Collaborator, Author) commented on Jan 9, 2025

some code should not have been included

jmercat closed this on Jan 9, 2025