ValueError: cannot reindex on an axis with duplicate labels #103

2024WY · 2024-11-13T13:14:39Z

I am running with local models, but the line below raises "ValueError: cannot reindex on an axis with duplicate labels":

base_dataset.py:78
self.records.update(records)
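The thread does not establish the root cause. One way pandas raises this error from `DataFrame.update` is when the incoming frame (`records` here) contains repeated index labels, for example the same sample id returned twice. The sketch below assumes `self.records` and `records` are pandas DataFrames indexed by sample id; `update_records` is a hypothetical helper, not part of the project. It keeps only the last row for each repeated id before updating.

```python
import pandas as pd


def update_records(records_store: pd.DataFrame, new_records: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: update records_store in place, ignoring repeated ids in new_records."""
    # DataFrame.update aligns new_records on the index; repeated labels there make the
    # alignment ambiguous, and pandas raises a ValueError mentioning duplicate labels.
    deduped = new_records[~new_records.index.duplicated(keep="last")]
    records_store.update(deduped)
    return records_store


store = pd.DataFrame({"annotation": ["a", "b"]}, index=[0, 1])
incoming = pd.DataFrame({"annotation": ["x", "y"]}, index=[1, 1])  # id 1 appears twice
print(update_records(store, incoming))
```

Keeping the last occurrence is an arbitrary choice; if the repeats come from a bug upstream, fixing the source of the duplicate ids is the better long-term fix.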

Eladlev · 2024-11-13T13:27:45Z

What exactly is your setup?
Some local models work well as predictors for certain tasks, but for the optimizer I strongly recommend using a strong model.

2024WY · 2024-12-04T08:24:41Z

Thank you for your reply. I switched to google-gemini and this error no longer occurs. My current configuration is below:

use_wandb: False
dataset:
    name: 'dataset'
    records_path: null
    initial_dataset: ''
    label_schema: ["Yes", "No"]
    max_samples: 50
    semantic_sampling: False # Change to True in case you don't have M1. Currently there is an issue with faiss and M1

annotator:
    method: 'llm'
    config:
        llm:
            type: 'google'
            name: 'google-gemini'
        instruction:
            'Assess whether the text contains a harmful topic. Answer Yes if it does and No otherwise.'
        num_workers: 1
        prompt: 'prompts/predictor_completion/prediction.prompt'
        mini_batch_size: 1
        mode: 'annotation'

predictor:
    method : 'llm'
    config:
        llm:
            type: 'google'
            name: 'google-gemini'
            model_kwargs: {"seed": 220}
        num_workers: 1
        prompt: 'prompts/predictor_completion/prediction.prompt'
        mini_batch_size: 1 #change to >1 if you want to include multiple samples in the one prompt
        mode: 'prediction'

meta_prompts:
    folder: 'prompts/meta_prompts_completion'
    num_err_prompt: 1 # Number of error examples per sample in the prompt generation
    num_err_samples: 2 # Number of error examples per sample in the sample generation
    history_length: 4 # Number of sample in the meta-prompt history
    num_generated_samples: 10 # Number of generated samples at each iteration
    num_initialize_samples: 10 # Number of generated samples at iteration 0, in zero-shot case
    samples_generation_batch: 10 # Number of samples generated in one call to the LLM
    num_workers: 1 #Number of parallel workers
    warmup: 4 # Number of warmup steps

eval:
    function_name: 'accuracy'
    num_large_errors: 4
    num_boundary_predictions : 0
    error_threshold: 0.5

llm:
    type: 'google'
    name: 'google-gemini'
    temperature: 0.8

stop_criteria:
    max_usage: 2
    patience: 10 # Number of patience steps
    min_delta: 0.01

However, I found that the prediction and annotation are all NaN:
text prediction annotation metadata score batch_id
id
0 "This film is a rollercoaster of emotions, wit... Yes NaN NaN NaN 0
1 "The plot twists and turns, but the real surpr... Yes NaN NaN NaN 0
2 "The movie is a visual spectacle, but the most... Yes NaN NaN NaN 0
3 "Throughout the film, the tension builds, and ... No NaN NaN NaN 0
4 "The narrative is well-crafted, and the way it... Yes NaN NaN NaN 0
5 "The movie is a thrilling ride, and the climax... Yes NaN NaN NaN 0
6 "The film is a mix of humor and drama, and the... Yes NaN NaN NaN 0
7 "The pacing is excellent, and the most shockin... Yes NaN NaN NaN 0
8 "The movie is a masterpiece, and the way it ha... Yes NaN NaN NaN 0
9 "The storytelling is top-notch, and the most e... Yes NaN NaN NaN 0
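To see exactly which samples are affected (and match them against the log files), a quick check is to list the ids of rows where either label column is missing. This is a minimal sketch assuming the dataset is a pandas DataFrame with the `prediction` and `annotation` columns shown above; `rows_missing_labels` is a hypothetical helper name.

```python
import pandas as pd


def rows_missing_labels(df: pd.DataFrame) -> list:
    """Hypothetical helper: ids of rows whose prediction or annotation is missing."""
    # isna() treats both float NaN and None as missing
    mask = df[["prediction", "annotation"]].isna().any(axis=1)
    return df.index[mask].tolist()


# Small frame shaped like the output above (made-up values)
sample = pd.DataFrame(
    {"text": ["a", "b", "c"], "prediction": ["Yes", "Yes", None], "annotation": [None, "No", "Yes"]},
    index=pd.Index([0, 1, 2], name="id"),
)
print(rows_missing_labels(sample))  # [0, 2]
```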

Because of these NaN values, the following line in evaluator.py raises "ValueError: unknown is not supported":
conf_matrix = confusion_matrix(self.dataset['annotation'], self.dataset['prediction'], labels=self.label
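The "unknown is not supported" message most likely means scikit-learn could not infer a label type because the annotation column holds NaN instead of label strings. A defensive sketch, assuming the same DataFrame columns and a label list such as `["Yes", "No"]`, is to drop incomplete rows first and fail with a clearer message when nothing is left. `safe_confusion_matrix` is hypothetical, and it uses `pd.crosstab` rather than scikit-learn so the example stays self-contained.

```python
import pandas as pd


def safe_confusion_matrix(df: pd.DataFrame, labels: list) -> pd.DataFrame:
    """Hypothetical helper: confusion matrix (rows = annotation, columns = prediction)."""
    complete = df.dropna(subset=["annotation", "prediction"])
    if complete.empty:
        raise ValueError(
            "No rows have both an annotation and a prediction; "
            "check the log files for failed LLM calls."
        )
    counts = pd.crosstab(complete["annotation"], complete["prediction"])
    # Put every schema label in a fixed order; values outside the schema are dropped
    return counts.reindex(index=labels, columns=labels, fill_value=0)


frame = pd.DataFrame(
    {"annotation": ["Yes", "No", None, "Yes"], "prediction": ["Yes", "Yes", "No", None]}
)
print(safe_confusion_matrix(frame, ["Yes", "No"]))
```

Silently dropping rows can inflate the reported accuracy, so in practice it may be worth logging how many rows were excluded.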

Eladlev · 2024-12-04T14:30:28Z

Can you attach the logs?
There is probably a failure in the call to the LLM (possibly credentials).
In the prediction/annotation phase, we print the errors to the log files and continue.
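The "log and continue" behaviour described above explains how a credentials problem can end up as a column of NaN rather than a crash. The sketch below illustrates that pattern only; it is not the project's code, and `annotate_all` and `call_llm` are hypothetical names.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger("annotation")


def annotate_all(texts: List[str], call_llm: Callable[[str], str]) -> List[Optional[str]]:
    """Hypothetical: label each text; on failure, log the error and record None."""
    results: List[Optional[str]] = []
    for i, text in enumerate(texts):
        try:
            results.append(call_llm(text))
        except Exception as exc:  # e.g. bad credentials, rate limits, network errors
            logger.error("LLM call failed for sample %d: %s", i, exc)
            results.append(None)  # shows up as NaN once written into a DataFrame
    return results


def failing_call(text: str) -> str:
    raise PermissionError("invalid API key")  # stands in for a credentials problem


print(annotate_all(["text 1", "text 2"], failing_call))  # [None, None]
```

Because every failure is swallowed, the only trace of the problem is in the log output, which is why the logs are the first thing to check.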
