Release Version 2.2.0 · gwu-libraries/TweetSets

Upgrades Python to 3.8 (#126, #131)
Upgrades Spark & pyspark to 3.1 (#128, #117)
Uses the Spark DataFrame API to create full extracts at time of load: Tweet IDs, full Tweet CSV, Tweet mentions, Tweet users (#128)
Re-purposes the original (gzipped) JSONL from SFM to create the full Tweet JSON extract, concatenating the files by date of harvest (#152)
Adds an environment variable for specifying maximum file size for full extracts (#128)
Updates the TweetSets data model to align with twarc v. 1.12 (#150, #128)
Improves the indexing and extraction of full text and hashtags from extended Tweets (#150, #128)
Updates tests to test the Spark schema for creating extracts (#135)
Prevents access to full dataset files by those not authorized (#148)
Clarifies the installation documentation and docker-compose.yml (#119, #95, #90)
Updates pinning of Elasticsearch dependencies (#141)
Fixes bugs in the flask create-extract command (#125) and in checking whether a user should be directed to the full dataset (#120)
Prevents incorrectly formatted dates from being submitted in the form (#87)
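
The JSON extract described in #152 reuses the gzipped JSONL files as they are, concatenated by date of harvest. The following is a minimal sketch of that general technique, not the TweetSets implementation. It relies on the fact that gzip members appended back to back form a valid gzip file, so nothing has to be decompressed or re-compressed. The file-naming convention and the function names harvest_date and concatenate_by_date are assumptions made for illustration.

```python
import gzip
import shutil
from collections import defaultdict
from pathlib import Path


def harvest_date(path):
    # Hypothetical naming convention: "<YYYYMMDD>-<anything>.jsonl.gz".
    return Path(path).name.split("-", 1)[0]


def concatenate_by_date(source_files, out_dir):
    """Write one <date>.jsonl.gz per harvest date and return the output paths."""
    groups = defaultdict(list)
    for path in source_files:
        groups[harvest_date(path)].append(Path(path))

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for date, paths in sorted(groups.items()):
        target = out_dir / f"{date}.jsonl.gz"
        with open(target, "wb") as dst:
            for src_path in sorted(paths):
                # Raw byte copy: each source file becomes one gzip member.
                # Assumes every source file ends with a newline, so that
                # JSONL records from adjacent files stay on separate lines.
                with open(src_path, "rb") as src:
                    shutil.copyfileobj(src, dst)
        outputs.append(target)
    return outputs
```

Reading a concatenated file back with gzip.open yields the records of all source files for that date in order, because Python's gzip module reads multi-member files transparently.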
