Hello, I am encountering an issue when using the Character Detection Matching (CDM) metric as described in your paper "CDM: A Reliable Metric for Fair and Accurate Formula Recognition Evaluation".
The evaluation runs smoothly for small datasets (e.g., 1,000 LaTeX formulas). However, when processing larger datasets (>10,000 formulas), the evaluation fails to complete the `vis_match` step, preventing proper comparison of results. Below is the error output:
```text
2024-11-25 11:24:37 extract bbox done, time cost: 645.438 s
100%|██████████████████████████████████████████████████████████████████| 24600/24600 [00:00<00:00, 189073.52it/s]
/home/user1/miniconda3/envs/cdm_1120/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/home/user1/miniconda3/envs/cdm_1120/lib/python3.9/site-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
2024-11-25 11:24:38 calculate metrics done, time cost: 0.151 s
=> process done, mean f1 score: nan
```
Steps to Reproduce:
1. Use the provided CDM implementation.
2. Prepare a dataset of over 10,000 LaTeX formulas.
3. Run the evaluation pipeline, ensuring the `vis_match` step is included.
Expected Behavior:
The vis_match step should complete successfully, and the evaluation metrics (e.g., mean F1 score) should be computed without errors.
Actual Behavior:
The evaluation halts during the `vis_match` step, resulting in a NaN (not a number) mean F1 score.
I suspect the issue might be related to memory usage or internal handling of bounding boxes for large datasets. Any guidance or suggestions on addressing this issue would be greatly appreciated.
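For what it's worth, the two RuntimeWarnings in the log are exactly what NumPy emits when `mean()` is taken over an empty array, which would happen if `vis_match` returned no per-formula scores at all. A minimal sketch of that failure mode and a defensive aggregation (hypothetical illustration, not taken from the CDM codebase):

```python
import numpy as np

# If vis_match collects no per-formula F1 scores, averaging the empty
# array triggers "Mean of empty slice" and "invalid value encountered
# in scalar divide" RuntimeWarnings and returns nan -- matching the log.
scores = np.array([])       # no results collected for any formula
mean_f1 = scores.mean()     # nan, with the two RuntimeWarnings above

# A guard that distinguishes "no results" from a genuinely low score:
mean_f1_safe = float(scores.mean()) if scores.size else None
print(mean_f1, mean_f1_safe)  # nan None
```

If that diagnosis is right, the NaN is a symptom: the interesting question is why the per-formula results list ends up empty on large datasets (e.g., silent worker failures or exhausted memory during bbox matching) rather than the averaging itself.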
Environment:
Operating System: Ubuntu 20.04
Python Version: 3.9