Improve main source detection (AutoTeX) #468

Open
dginev opened this issue May 7, 2024 · 0 comments
Labels
arxiv-integration Issues related to integrating closer with the core arXiv services bug Something isn't working conversion stability
Comments

@dginev
Owner

dginev commented May 7, 2024

This issue contains a bulk report of 147 articles in the no_problems category that come out with empty HTML. A cursory look at a handful showed that the cause was picking up the wrong TeX source as the main input.

The file list is attached as a ZIP here:
no-content-no-problem.filelist.txt

With gratitude to Tianning Zhang, who provided the list of empty results, distilled from the ar5iv-04.2024 dataset.

A successful resolution could use them as a mini testbench where we at least check that each produced HTML file has HTML content with a document body.
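As a rough illustration of the check described above, here is a minimal Python sketch. It is not part of this project's tooling: the names `BodyTextProbe`, `has_document_body`, and `find_empty_results` are hypothetical, and treating "has content" as "the `<body>` element contains some visible text outside `<script>` and `<style>`" is an assumption about what the testbench should verify.

```python
from html.parser import HTMLParser
from pathlib import Path


class BodyTextProbe(HTMLParser):
    """Records whether a <body> element exists and contains visible text."""

    SKIPPED = {"script", "style"}  # text inside these is not visible content

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.saw_body = False
        self.skip_depth = 0
        self.has_text = False

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
            self.saw_body = True
        elif tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Count only non-whitespace text that sits inside <body>.
        if self.in_body and self.skip_depth == 0 and data.strip():
            self.has_text = True


def has_document_body(html_text: str) -> bool:
    """True if the HTML has a <body> with at least some visible text."""
    probe = BodyTextProbe()
    probe.feed(html_text)
    probe.close()
    return probe.saw_body and probe.has_text


def find_empty_results(html_dir):
    """Return sorted names of .html files in html_dir that fail the check."""
    failures = []
    for path in sorted(Path(html_dir).glob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if not has_document_body(text):
            failures.append(path.name)
    return failures
```

Running `find_empty_results` over a directory of converted outputs (the directory layout is also an assumption) would list the files that still come out empty, so a fix could be judged by that list shrinking to nothing.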

@dginev added bug Something isn't working arxiv-integration Issues related to integrating closer with the core arXiv services conversion stability labels May 7, 2024
@dginev added this to the Coverage 90% milestone May 7, 2024