Upgrades Python to 3.8 (#126, #131)
Upgrades Spark and pyspark to 3.1 (#128, #117)
Uses the Spark DataFrame API to create full extracts at time of load: Tweet IDs, full Tweet CSV, Tweet mentions, Tweet users (#128)
Repurposes the original (gzipped) JSONL from SFM to create the full Tweet JSON extract, concatenating the files by date of harvest (#152)
Adds an environment variable for specifying the maximum file size for full extracts (#128)
Updates the TweetSets data model to align with twarc v. 1.12 (#150, #128)
Improves the indexing and extraction of full text and hashtags from extended Tweets (#150, #128)
Updates tests to cover the Spark schema used for creating extracts (#135)
Prevents unauthorized access to full dataset files (#148)
Clarifies the installation documentation and docker-compose.yml (#119, #95, #90)
Updates pinning of Elasticsearch dependencies (#141)
Fixes bugs in the flask create-extract command (#125) and in checking whether a user should be directed to the full dataset (#120)
Prevents an incorrect date format from being submitted in the form (#87)
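
As a rough illustration of the JSONL concatenation item above (#152): the gzip format allows compressed members to be appended byte for byte, so files can be grouped by harvest date and joined without decompressing them. The sketch below is not the TweetSets implementation. The function name, the (path, date) input pairs, and the output file naming are all assumptions made for illustration.

```python
import gzip
import shutil
from collections import defaultdict
from pathlib import Path


def concatenate_by_date(files_with_dates, out_dir):
    """Join gzipped JSONL files into one .jsonl.gz per harvest date.

    files_with_dates: iterable of (path, date_string) pairs. This input
    shape is hypothetical; a real loader would derive the harvest date
    from its own metadata.
    Returns a dict mapping each date string to the output Path.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group the source files by harvest date.
    groups = defaultdict(list)
    for path, date in files_with_dates:
        groups[date].append(Path(path))

    outputs = {}
    for date, paths in sorted(groups.items()):
        # Hypothetical naming scheme for the per-date extract.
        target = out_dir / f"tweets-{date}.jsonl.gz"
        with open(target, "wb") as dst:
            for src_path in sorted(paths):
                # Copy the raw compressed bytes; a file made of several
                # gzip members back to back is itself a valid gzip file.
                with open(src_path, "rb") as src:
                    shutil.copyfileobj(src, dst)
        outputs[date] = target
    return outputs
```

Reading a result back with `gzip.open(path, "rt")` yields the lines of every source file in order, because Python's gzip module reads multi-member files transparently. This assumes each source file ends with a newline, so records from adjacent files do not run together.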