Releases: gwu-libraries/TweetSets
Version 2.2.0
- Upgrades Python to 3.8 (#126, #131)
- Upgrades Spark & pyspark to 3.1 (#128, #117)
- Uses the Spark DataFrame API to create full extracts at time of load: Tweet IDs, full Tweet CSV, Tweet mentions, Tweet users (#128)
- Re-purposes the original (gzipped) JSONL from SFM to create the full Tweet JSON extract, concatenating the files by date of harvest (#152)
- Adds an environment variable for specifying maximum file size for full extracts (#128)
- Updates the TweetSets data model to align with twarc v. 1.12 (#150, #128)
- Improves the indexing and extraction of full text and hashtags from extended Tweets (#150, #128)
- Updates tests to test the Spark schema for creating extracts (#135)
- Prevents access to full dataset files by those not authorized (#148)
- Installation documentation and docker-compose.yml clarifications (#119, #95, #90)
- Updates pinning of Elasticsearch dependencies (#141)
- Bugfixes for using flask create-extract command (#125) and checking whether user should be directed to full dataset (#120)
- Prevents incorrect date formats from being submitted in the form (#87)
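The JSON extract change (#152) and the maximum file size setting (#128) can be illustrated with a minimal sketch. It is not the TweetSets implementation. It relies only on the fact that gzip files concatenated byte-for-byte remain a valid multi-member gzip stream. The function name `concatenate_gzip_files`, the output naming scheme, and the `max_bytes` parameter (standing in for the environment variable, whose name these notes do not give) are all hypothetical, and the caller is assumed to pass the source files already sorted by harvest date.

```python
import gzip
import shutil
from pathlib import Path


def concatenate_gzip_files(sources, dest_dir, prefix, max_bytes):
    """Concatenate gzipped files, in the order given, into numbered outputs.

    Hypothetical helper: a new output file is started whenever adding the
    next source would push the current output past max_bytes. A single
    source larger than max_bytes still gets an output file of its own.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    out = None
    written = 0
    try:
        for src in sources:
            src = Path(src)
            size = src.stat().st_size
            if out is None or (written > 0 and written + size > max_bytes):
                if out is not None:
                    out.close()
                path = dest_dir / f"{prefix}-{len(outputs) + 1:03d}.jsonl.gz"
                out = open(path, "wb")
                outputs.append(path)
                written = 0
            # Copy the compressed bytes unchanged; no decompression is needed.
            with open(src, "rb") as f:
                shutil.copyfileobj(f, out)
            written += size
    finally:
        if out is not None:
            out.close()
    return outputs
```

Each resulting file can be read back with `gzip.open(path, "rt")`, which handles multi-member gzip streams transparently, so the lines of every source file appear in sequence.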
Version 2.1.0
Changes in this release:
- Remediates accessibility issues in the UI (#9, #91)
- Changes source dataset selection to allow only one dataset to be selected (#85)
- Update help guide to reflect UI changes and improve accessibility (#93)
- Set up Google Analytics for usage tracking (#55)
- User email address required for custom extract; user notified upon extract completion (#94)
- Extract options for top users, top mentions, and mentions disabled (#89)
- "Top 1000" analytics provided as CSV from dataset statistics page (#88)
- Requesting a full extract now redirects to download prepared files (#83)
- Prepared full extracts may be created by command-line utility or by requesting them in the UI (if they don't already exist) (#83)
- Added an update loader command, which re-reads dataset.json to update the dataset's descriptive metadata (and statistics) (#41)
- Added GW footer and cookie consent popup (#82)
- Wording improvements in the UI (#80, #81)
- Added lang field to indexing of newly-loaded datasets (#39). UI changes (#114) were not done in this release and should only be done after (most?) data sets have been reloaded.
Version 2.0
Changes in this release:
- Major version upgrade of Elasticsearch from 6.2.2 to 7.9.2, with matching upgrades to the elasticsearch-dsl and elasticsearch-py Python dependencies (#47, #52, #63)
- Upgrades to other dependencies, including Flask and its dependencies, pyspark, and requests (#54, #60, #31)
- Alerts user and prevents dataset request if their dataset parameters would produce a dataset of zero tweets (#5)
- Replaces date picker with jQuery UI date picker for more complete browser support (#4)
- Makes link to dataset zip files more visible by moving to top of page (#3)
- Add notice for GW users to use VPN for enhanced access (#48)
- Clarify terminology in statistics page (#51)
- Indexes tweet language (#39) but only for tweet dataset loads that occur using v2.1. Datasets loaded previously would need to be reloaded. UI and subsetting functionality will be added later in #114.
- spark-submit update command to update an existing dataset now also updates metadata (from dataset.json) and statistics (#41)
- Compliance: Added GW footer including cookie consent (#82)
- Clarified labels for date/time fields to specify UTC (#80)
- Clarified wording when subset yields no tweets (#81)
- Bug fix for statistics showing zero uses instead of actual number (#49)
- Bug fix for error when pressing Enter on dataset name modal input (#35)
- Bug fix for unittests for stats (#58)
Version 1.1.1
Updates to dependencies.
Version 1.1.0
Updates to dependencies, Docker base images, and tests.
Version 1.0.0
An initial release of TweetSets, for the purposes of registering with Zenodo.