Python code for splitting the official AO3 data dump into smaller files, filtered by year (2008-2021).
Check out the official post "Selective data dump for fan statisticians" on the AO3 website.
The data released by AO3 comes in two CSV files.
The first includes information about works:
- creation date
- language
- word count
- restricted or not
- complete or not
- associated tag IDs
The second provides the key to the tag IDs:
- tag ID
- tag type (e.g. Warning, Fandom, Relationship)
- tag name (unless the tag has fewer than 5 uses)
- canonical or not
- an approximate number of uses
- merger ID (i.e. the tag's canonical version, if it has one)
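To show how the two files fit together, here is a minimal sketch that turns a work's tag IDs into tag names, following the merger ID to the canonical version when there is one. It is an illustration rather than code from this repo. The column names (`id`, `type`, `name`, `canonical`, `cached_count`, `merger_id`) and the `+` separator between tag IDs are assumptions about the dump's layout, the function names are hypothetical, and the rows in `TOY_TAGS_CSV` are made-up toy data.

```python
import csv
import io

# Made-up toy rows, NOT real AO3 data. Column names are an assumption.
TOY_TAGS_CSV = """id,type,name,canonical,cached_count,merger_id
1,Fandom,Example Fandom,true,100,
2,Fandom,Ex. Fandom,false,10,1
3,Character,Example Character,true,50,
"""


def load_tags(tags_file):
    """Return {tag_id: row_dict} from an open tags CSV file."""
    return {row["id"]: row for row in csv.DictReader(tags_file)}


def canonical_name(tag_id, tags):
    """Return the tag's name, following merger_id one step if it is set."""
    tag = tags[tag_id]
    merger_id = tag.get("merger_id")
    if merger_id and merger_id in tags:
        tag = tags[merger_id]
    return tag["name"]


def work_tag_names(tag_field, tags, sep="+"):
    """Map a work's tag ID field (assumed "+"-separated) to tag names.

    IDs that are missing from the tags table are skipped.
    """
    return [canonical_name(t, tags) for t in tag_field.split(sep) if t in tags]


tags = load_tags(io.StringIO(TOY_TAGS_CSV))
names = work_tag_names("2+3", tags)
```

With the toy rows above, tag 2 resolves to the name of tag 1 because its merger ID points there, while tag 3 has no merger ID and keeps its own name.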
The Python code Works filters the first file by year and saves each year to its own smaller CSV file.
The Python code Tags filters the second file by type and saves each type to its own smaller CSV file.
You can download these smaller CSV files from MEGA.
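Both splits follow the same pattern: read the big file once and route each row to an output file chosen by one column. The sketch below shows that pattern with only the standard library; it is not the repo's actual code, which may work differently. The function names `split_csv` and `year_of`, the output naming (`works_2008.csv`, `tags_Fandom.csv`), and the column names `creation date` and `type` are assumptions, as is the ISO date format.

```python
import csv
from pathlib import Path


def split_csv(src_path, out_dir, column, prefix, key=lambda value: value):
    """Stream src_path and write rows to one CSV per key(row[column]).

    Returns a dict mapping each group to the number of rows written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    handles, writers, counts = {}, {}, {}
    try:
        with open(src_path, newline="", encoding="utf-8") as src:
            reader = csv.reader(src)
            header = next(reader)
            col = header.index(column)
            for row in reader:
                if not row:  # skip blank lines
                    continue
                group = key(row[col])
                if group not in writers:
                    # Hypothetical naming: <prefix>_<group>.csv
                    fh = open(out_dir / f"{prefix}_{group}.csv", "w",
                              newline="", encoding="utf-8")
                    handles[group] = fh
                    writers[group] = csv.writer(fh)
                    writers[group].writerow(header)
                    counts[group] = 0
                writers[group].writerow(row)
                counts[group] += 1
    finally:
        # Close every output file, even if reading failed part-way.
        for fh in handles.values():
            fh.close()
    return counts


def year_of(date_text):
    """Assumption: dates are ISO-formatted, e.g. "2021-02-26"."""
    return date_text[:4]
```

Under those assumptions, the works split would be `split_csv("works.csv", "out", "creation date", "works", key=year_of)` and the tags split would be `split_csv("tags.csv", "out", "type", "tags")`, where the input file names are placeholders. Streaming row by row keeps memory use low, which matters for files of this size.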
More analysis and data visualization can be found in my repo AO3-Data-Vis and on my website A Look Into AO3 Data.