ValueError: cannot reindex on an axis with duplicate labels #103

Open
2024WY opened this issue Nov 13, 2024 · 3 comments

Comments


2024WY commented Nov 13, 2024

I am running with local models, but the line below raises "ValueError: cannot reindex on an axis with duplicate labels".

base_dataset.py:78
self.records.update(records)
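
For context, `DataFrame.update` fails with a `ValueError` when the frame passed to it has repeated index labels. The sketch below reproduces that with made-up data and shows one possible workaround (keeping the last row per label). The column name, the sample values, and the helper `drop_duplicate_index` are assumptions for illustration and are not taken from `base_dataset.py`. Whether duplicate ids are the actual cause in this issue is not confirmed.

```python
import pandas as pd


def drop_duplicate_index(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the last row for each repeated index label (hypothetical helper)."""
    return df[~df.index.duplicated(keep="last")]


# Existing records: two rows, no predictions yet (made-up data).
records = pd.DataFrame(
    {"prediction": [None, None]},
    index=pd.Index([0, 1], name="id"),
)

# Incoming batch where id 1 appears twice, e.g. a model returned a repeated id.
new_records = pd.DataFrame(
    {"prediction": ["Yes", "No", "Yes"]},
    index=pd.Index([0, 1, 1], name="id"),
)

try:
    records.update(new_records)
    update_failed = False
except ValueError as err:
    # The exact message depends on the pandas version.
    update_failed = True
    print(f"update failed: {err}")

# Deduplicating the incoming index first lets the update go through.
deduplicated = drop_duplicate_index(new_records)
records.update(deduplicated)
print(records)
```

Checking `new_records.index.has_duplicates` before calling `update` is a cheap way to confirm or rule out this cause.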

Owner

Eladlev commented Nov 13, 2024

What exactly is your setting?
There are local models that are good as predictors for some tasks, but as optimizers, I strongly recommend using a strong model.

Author

2024WY commented Dec 4, 2024

Thank you for your reply. I switched to google-gemini and this error no longer occurs. My current settings are below:

use_wandb: False
dataset:
    name: 'dataset'
    records_path: null
    initial_dataset: ''
    label_schema: ["Yes", "No"]
    max_samples: 50
    semantic_sampling: False # Change to True in case you don't have M1. Currently there is an issue with faiss and M1

annotator:
    method: 'llm'
    config:
        llm:
            type: 'google'
            name: 'google-gemini'
        instruction:
            'Assess whether the text contains a harmful topic. Answer Yes if it does and No otherwise.'
        num_workers: 1
        prompt: 'prompts/predictor_completion/prediction.prompt'
        mini_batch_size: 1
        mode: 'annotation'

predictor:
    method : 'llm'
    config:
        llm:
            type: 'google'
            name: 'google-gemini'
            model_kwargs: {"seed": 220}
        num_workers: 1
        prompt: 'prompts/predictor_completion/prediction.prompt'
        mini_batch_size: 1 #change to >1 if you want to include multiple samples in the one prompt
        mode: 'prediction'

meta_prompts:
    folder: 'prompts/meta_prompts_completion'
    num_err_prompt: 1 # Number of error examples per sample in the prompt generation
    num_err_samples: 2 # Number of error examples per sample in the sample generation
    history_length: 4 # Number of sample in the meta-prompt history
    num_generated_samples: 10 # Number of generated samples at each iteration
    num_initialize_samples: 10 # Number of generated samples at iteration 0, in zero-shot case
    samples_generation_batch: 10 # Number of samples generated in one call to the LLM
    num_workers: 1 #Number of parallel workers
    warmup: 4 # Number of warmup steps

eval:
    function_name: 'accuracy'
    num_large_errors: 4
    num_boundary_predictions : 0
    error_threshold: 0.5

llm:
    type: 'google'
    name: 'google-gemini'
    temperature: 0.8

stop_criteria:
    max_usage: 2
    patience: 10 # Number of patience steps
    min_delta: 0.01

But I find that the prediction and annotation are all NaN:
id  text                                               prediction  annotation  metadata  score  batch_id
0   "This film is a rollercoaster of emotions, wit...  Yes         NaN         NaN       NaN    0
1   "The plot twists and turns, but the real surpr...  Yes         NaN         NaN       NaN    0
2   "The movie is a visual spectacle, but the most...  Yes         NaN         NaN       NaN    0
3   "Throughout the film, the tension builds, and ...  No          NaN         NaN       NaN    0
4   "The narrative is well-crafted, and the way it...  Yes         NaN         NaN       NaN    0
5   "The movie is a thrilling ride, and the climax...  Yes         NaN         NaN       NaN    0
6   "The film is a mix of humor and drama, and the...  Yes         NaN         NaN       NaN    0
7   "The pacing is excellent, and the most shockin...  Yes         NaN         NaN       NaN    0
8   "The movie is a masterpiece, and the way it ha...  Yes         NaN         NaN       NaN    0
9   "The storytelling is top-notch, and the most e...  Yes         NaN         NaN       NaN    0

As a result, the following line in evaluator.py raises "ValueError: unknown is not supported":
conf_matrix = confusion_matrix(self.dataset['annotation'], self.dataset['prediction'], labels=self.label
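
A confusion matrix cannot be computed from label columns that contain missing values, so a guard before the evaluation step makes the underlying problem visible earlier. The sketch below uses made-up rows and plain pandas (`dropna` and `pd.crosstab`) rather than the project's evaluator. The helper `valid_rows` and the sample data are assumptions for illustration only.

```python
import pandas as pd


def valid_rows(dataset: pd.DataFrame) -> pd.DataFrame:
    """Return only rows where both annotation and prediction are present (hypothetical helper)."""
    return dataset.dropna(subset=["annotation", "prediction"])


# Made-up records: the first row is missing its annotation.
dataset = pd.DataFrame(
    {
        "text": ["sample a", "sample b", "sample c"],
        "prediction": ["Yes", "No", "Yes"],
        "annotation": [float("nan"), "No", "Yes"],
    }
)

# Count missing labels per column so a failed LLM call shows up here first.
missing_counts = dataset[["annotation", "prediction"]].isna().sum()
print(missing_counts)

clean = valid_rows(dataset)
if clean.empty:
    raise RuntimeError("No rows have both an annotation and a prediction; check the LLM calls.")

# Cross-tabulate annotation (rows) against prediction (columns) on complete rows only.
table = pd.crosstab(clean["annotation"], clean["prediction"])
print(table)
```

If every row is dropped, as with the all-NaN annotations shown above, the evaluation has nothing to score and the fix belongs in the annotation step rather than in the metric.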

Owner

Eladlev commented Dec 4, 2024

Can you attach the logs?
There is probably a failure in the call to the LLM (possibly a credentials problem).
In the prediction/annotation phase, we print the errors to the log files and continue.
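
One way to find those logged errors quickly is to scan the log files for lines that look like failures. The sketch below is a generic helper using only the standard library. The function name `find_error_lines`, the `*.log` pattern, and the marker words are all assumptions, since the thread does not state where the logs live or how their lines are formatted.

```python
from pathlib import Path

# Assumed markers; adjust to whatever the real log lines contain.
ERROR_MARKERS = ("error", "exception", "traceback")


def find_error_lines(log_dir, pattern="*.log"):
    """Return (file name, line number, text) for each line containing an error marker."""
    hits = []
    for path in sorted(Path(log_dir).glob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                lowered = line.lower()
                if any(marker in lowered for marker in ERROR_MARKERS):
                    hits.append((path.name, number, line.rstrip()))
    return hits


if __name__ == "__main__":
    # "logs" is a placeholder directory name, not a path confirmed by the project.
    for name, number, text in find_error_lines("logs"):
        print(f"{name}:{number}: {text}")
```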
