
Unable to Complete vis_match on Large Datasets (>10,000 LaTeX Formulas) #43

Open
BFlameSwift opened this issue Nov 25, 2024 · 0 comments

Hello, I am encountering an issue when using the Character Detection Matching (CDM) metric as described in your paper "CDM: A Reliable Metric for Fair and Accurate Formula Recognition Evaluation".

The evaluation runs smoothly for small datasets (e.g., 1,000 LaTeX formulas). However, when processing larger datasets (>10,000 formulas), the evaluation fails to complete the vis_match step, preventing proper comparison of results. Below is the error output:

2024-11-25 11:24:37 extract bbox done, time cost: 645.438 s
100%|██████████████████████████████████████████████████████████████████| 24600/24600 [00:00<00:00, 189073.52it/s]
/home/user1/miniconda3/envs/cdm_1120/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/home/user1/miniconda3/envs/cdm_1120/lib/python3.9/site-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
2024-11-25 11:24:38 calculate metrics done, time cost: 0.151 s
=> process done, mean f1 score: nan
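For context, the two RuntimeWarnings in the log are exactly what NumPy emits when np.mean is taken over an empty array, and the resulting nan then propagates into the final score. A minimal reproduction, independent of the CDM code itself:

```python
import warnings
import numpy as np

# Taking the mean of an empty array emits "Mean of empty slice"
# (a RuntimeWarning) and returns nan, matching the log output above.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", RuntimeWarning)
    result = np.mean(np.array([]))

print(np.isnan(result))  # True
print(any("empty slice" in str(w.message) for w in caught))  # True
```

This suggests that for the large run, the list of per-sample scores being averaged ended up empty (or all-nan), rather than the averaging itself being broken.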

Steps to Reproduce:

  1. Use the provided CDM implementation.
  2. Prepare a dataset of over 10,000 LaTeX formulas.
  3. Run the evaluation pipeline, ensuring the vis_match step is included.

Expected Behavior:

The vis_match step should complete successfully, and the evaluation metrics (e.g., mean F1 score) should be computed without errors.

Actual Behavior:

The evaluation fails during the vis_match step, and the reported mean F1 score is nan (not a number).

Environment:

Operating System: Ubuntu 20.04
Python Version: 3.9

I suspect the issue might be related to memory usage or internal handling of bounding boxes for large datasets. Any guidance or suggestions on addressing this issue would be greatly appreciated.
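Not a fix for the underlying vis_match failure, but until the root cause is found, a defensive aggregation step could keep a few empty or failed samples from turning the whole run's mean into nan. This is only a sketch; mean_f1 and per_sample_scores are hypothetical names, not part of the actual CDM code:

```python
import math
import numpy as np

def mean_f1(per_sample_scores):
    """Average per-sample F1 scores, skipping samples whose
    matching step produced no result (None or nan)."""
    valid = [s for s in per_sample_scores
             if s is not None and not math.isnan(s)]
    if not valid:
        # Every sample failed; surface that instead of averaging to nan.
        raise ValueError("no valid F1 scores to average")
    return float(np.mean(valid))
```

For example, mean_f1([0.9, float("nan"), 0.7]) averages only the two valid scores (roughly 0.8), while logging or counting the skipped samples would show how many formulas actually failed matching on the large dataset.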
