Improve main source detection (AutoTeX) #468
Labels
arxiv-integration
Issues related to integrating closer with the core arXiv services
bug
Something isn't working
conversion stability
This issue contains a bulk report of 147 articles in the no_problems category which come out with empty HTML. A cursory look at a handful showed that this was due to picking up the wrong TeX source as the main input.
The file list is attached as a ZIP here:
no-content-no-problem.filelist.txt
With gratitude to Tianning Zhang, who provided the list of empty results, distilled from the ar5iv-04.2024 dataset.
A successful resolution could use them as a mini testbench where we at least check that each produced HTML file has HTML content with a document body.
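The check described above could be sketched roughly as follows. This is only an illustration using the Python standard library, not existing project code: the names `BodyTextProbe`, `has_body_content` and `find_empty_outputs` are hypothetical, and it assumes the converted articles are available as `.html` files under some output directory.

```python
from html.parser import HTMLParser
from pathlib import Path


class BodyTextProbe(HTMLParser):
    """Hypothetical helper: counts visible text characters inside <body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.has_body = False
        self.skip_depth = 0  # > 0 while inside <script> or <style>
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
            self.has_body = True
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Only count non-whitespace text that a reader would actually see.
        if self.in_body and not self.skip_depth:
            self.text_chars += len(data.strip())


def has_body_content(html_text: str) -> bool:
    """True if the document has a <body> with some non-whitespace text."""
    probe = BodyTextProbe()
    probe.feed(html_text)
    probe.close()
    return probe.has_body and probe.text_chars > 0


def find_empty_outputs(html_dir):
    """Return paths of .html files under html_dir that fail the check."""
    return sorted(
        str(path)
        for path in Path(html_dir).rglob("*.html")
        if not has_body_content(path.read_text(encoding="utf-8", errors="replace"))
    )
```

Running `find_empty_outputs` over the converted outputs for the attached file list would give the subset that still comes out empty; a passing testbench would expect that list to be empty. The check is deliberately weak (any text in the body counts), matching the "at least" bar set above.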