Releases · gwu-libraries/TweetSets · GitHub

22 Sep 12:41

dolsysmith

Version 2.2.0 Latest


Upgrades Python to 3.8 (#126, #131)
Upgrades Spark & pyspark to 3.1 (#128, #117)
Uses the Spark DataFrame API to create full extracts at time of load: Tweet IDs, full Tweet CSV, Tweet mentions, Tweet users (#128)
Re-purposes the original (gzipped) JSONL from SFM to create the full Tweet JSON extract, concatenating the files by date of harvest (#152)
Adds an environment variable for specifying maximum file size for full extracts (#128)
Updates the TweetSets data model to align with twarc v. 1.12 (#150, #128)
Improves the indexing and extraction of full text and hashtags from extended Tweets (#150, #128)
Updates tests to test the Spark schema for creating extracts (#135)
Prevents access to full dataset files by those not authorized (#148)
Installation documentation and docker-compose.yml clarifications (#119, #95, #90)
Updates pinning of Elasticsearch dependencies (#141)
Bugfixes for using flask create-extract command (#125) and checking whether user should be directed to full dataset (#120)
Prevents an incorrectly formatted date from being submitted in the form (#87)
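
The re-use of the gzipped JSONL from SFM (#152) relies on a property of the gzip format: files concatenated byte for byte remain a valid multi-member gzip stream. The following is a minimal sketch of that idea, not the TweetSets implementation. The function names, the `<YYYYMMDD>-<name>.jsonl.gz` input naming scheme, and the output file names are all assumptions made for illustration.

```python
import gzip
import shutil
from collections import defaultdict
from pathlib import Path


def harvest_date(path: Path) -> str:
    # Hypothetical naming scheme: "<YYYYMMDD>-<anything>.jsonl.gz".
    return path.name.split("-", 1)[0]


def concat_by_harvest_date(sources, out_dir):
    """Write one gzip file per harvest date by appending the source files
    byte for byte, without decompressing them."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group the source files by their (assumed) harvest date.
    groups = defaultdict(list)
    for src in map(Path, sources):
        groups[harvest_date(src)].append(src)

    outputs = []
    for date, members in sorted(groups.items()):
        target = out_dir / f"tweets-{date}.jsonl.gz"  # hypothetical name
        with open(target, "wb") as out:
            for member in sorted(members):
                with open(member, "rb") as fh:
                    shutil.copyfileobj(fh, out)
        outputs.append(target)
    return outputs
```

Reading a result back with `gzip.open` yields the lines of every member in order, since Python's `gzip` module reads multi-member files transparently.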


17 Mar 19:00

lwrubel

Version 2.1.0

Changes in this release:

Remediates accessibility issues in the UI (#9, #91)
Changes source dataset selection to allow only one dataset to be selected (#85)
Updates help guide to reflect UI changes and improve accessibility (#93)
Set up Google Analytics for usage tracking (#55)
User email address required for custom extract; user notified upon extract completion (#94)
Extract options for top users, top mentions, and mentions disabled (#89)
"Top 1000" analytics provided as CSV from dataset statistics page (#88)
Requesting a full extract now redirects to download prepared files (#83)
Prepared full extracts may be created by command-line utility or by requesting them in the UI (if they don't already exist) (#83)
Added an update loader command, which re-reads dataset.json to update the dataset's descriptive metadata (and statistics) (#41)
Added GW footer and cookie consent popup (#82)
Wording improvements in the UI (#80, #81)
Added lang field to indexing of newly loaded datasets (#39). UI changes (#114) were not done in this release and should only be done after (most?) datasets have been reloaded.
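
As an illustration of the "Top 1000" CSV output (#88), the sketch below counts values and writes the most frequent ones as CSV text. It is an assumption about the general shape of such a report, not the TweetSets code: the function name and the `value`/`count` column headers are hypothetical.

```python
import csv
import io
from collections import Counter


def top_values_csv(values, limit=1000):
    """Return CSV text listing the `limit` most frequent values and their counts."""
    counts = Counter(values)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["value", "count"])  # hypothetical column names
    writer.writerows(counts.most_common(limit))
    return buffer.getvalue()
```

Calling `top_values_csv(hashtags)` on an iterable of hashtag strings returns at most 1000 data rows, ordered from most to least frequent.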


24 Nov 19:31

lwrubel

Version 2.0

Changes in this release:

Major version upgrade of Elasticsearch from 6.2.2 to 7.9.2, with matching upgrades to the elasticsearch-dsl and elasticsearch-py Python dependencies (#47, #52, #63)
Upgrades to other dependencies, including Flask and its dependencies, pyspark, and requests (#54, #60, #31)
Alerts user and prevents dataset request if their dataset parameters would produce a dataset of zero tweets (#5)
Replaces date picker with jQuery UI date picker for more complete browser support (#4)
Makes link to dataset zip files more visible by moving to top of page (#3)
Adds notice for GW users to use VPN for enhanced access (#48)
Clarifies terminology in statistics page (#51)
Indexes tweet language (#39) but only for tweet dataset loads that occur using v2.1. Datasets loaded previously would need to be reloaded. UI and subsetting functionality will be added later in #114.
spark-submit update command to update an existing dataset now also updates metadata (from dataset.json) and statistics (#41)
Compliance: Added GW footer including cookie consent (#82)
Clarified labels for date/time fields to specify UTC (#80)
Clarified wording when subset yields no tweets (#81)
Bug fix for statistics showing zero uses instead of actual number (#49)
Bug fix for error when pressing Enter on dataset name modal input (#35)
Bug fix for unittests for stats (#58)


24 Mar 18:27

lwrubel

Version 1.1.1

Updates to dependencies.


05 Dec 14:44

lwrubel

Version 1.1.0

Updates to dependencies, Docker base images, and tests.


14 Jun 12:58

Version 1.0.0

An initial release of TweetSets, for the purposes of registering with Zenodo.
