Releases: CPSSD/LUCAS
Sprint 3 (MVP)
Backend:
- Investigated SVMs and logistic regression classifiers in much greater depth
- Finalised our statistical model, at 74.5% accuracy, using reviewer features and SVC with grid search
- Investigated three neural network architectures (FFNN, CNN and RNN) as proofs of concept
- Cross-compared the performance of these three architectures using BOW and word2vec
- Created custom word embeddings over our datasets using Google's word2vec (attached) and Facebook's fastText
- Ran an experiment on FFNN architectures with BOW and word2vec
- Created and hosted our first neural network model (attached), an FFNN running alongside our SVM returning feature weights
- Remodelled and revamped the wiki for documentation
- Toyed around and read up on Grove, DCU's GPU instances that we will use next semester to train models
- Researched deep learning and neural networks extensively and documented our research in the wiki
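For readers unfamiliar with the BOW features mentioned above, here is a minimal sketch of bag-of-words vectorisation. It uses only the Python standard library. The helper names (`tokenize`, `build_vocabulary`, `vectorize`) and the sample reviews are hypothetical and not taken from the LUCAS codebase, which may well use a library vectoriser instead.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Return a count vector for `text`; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Hypothetical sample reviews, for illustration only.
reviews = ["Great food, great service", "Terrible service"]
vocab = build_vocabulary(reviews)
vectors = [vectorize(r, vocab) for r in reviews]
```

Each review becomes a fixed-length vector of word counts, which is the kind of input a statistical classifier or an FFNN can consume directly. Word order is discarded, which is the main trade-off against embeddings such as word2vec.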
Frontend:
- Got rejected by the Yelp API; however...
- Integrated with Google Places API, and used an ensemble of Yelp Fusion and Google Places to return Google reviews
- Set up a NoSQL OO database on our Yelp dataset to make our data queryable, allowing us pseudo-Yelp access as a backup
- Did extensive research on data visualisation and colour theory, documented in the wiki
- Implemented a word cloud showing the words most important to a particular classification
- Grouped the best and worst classified reviews to make the results easier to read
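The word cloud and review grouping described above can be sketched roughly as follows. This assumes the classifier exposes a signed weight per term and a confidence score per review, and it interprets "best" and "worst" classified as highest and lowest confidence. The function names, the data shapes, and the sample values are all assumptions for illustration, not the webapp's actual code.

```python
def top_terms(weights, k=3):
    """Return the k terms with the largest absolute weight, strongest first."""
    ranked = sorted(weights.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


def group_by_confidence(reviews, k=2):
    """Return the k most and the k least confidently classified reviews."""
    ranked = sorted(reviews, key=lambda r: r["confidence"], reverse=True)
    # max(..., 0) keeps the slice correct when k is 0 or exceeds the list length.
    return {"best": ranked[:k], "worst": ranked[max(len(ranked) - k, 0):]}


# Hypothetical term weights: the sign indicates which class a term pushes towards.
term_weights = {"amazing": 0.9, "the": 0.01, "refund": -0.7, "ok": 0.2}
cloud = top_terms(term_weights, k=2)

# Hypothetical classified reviews with a confidence score each.
sample = [
    {"id": 1, "confidence": 0.95},
    {"id": 2, "confidence": 0.51},
    {"id": 3, "confidence": 0.80},
    {"id": 4, "confidence": 0.60},
]
groups = group_by_confidence(sample, k=1)
```

Ranking by absolute weight keeps strongly negative terms in the cloud as well as strongly positive ones; a renderer could then scale font size by magnitude and pick colour by sign.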
Sprint 2
Added term weight visualisations to the webapp, along with integrated Yelp search.
Ran many more experiments on the data with multiple statistical classifiers, and cross-compared their accuracies.
Researched the most significant papers in the opinion spam detection field and proposed a novel hypothesis for improving on the state of the art.
Sprint 1
Niall & Stefan:
- Conda environment for managing Python dependencies and versions
- Docker environment for replication on any machine
- Jupyter Notebook integration detailing experiments and classifier results and metrics
- Dataset conversion into Protocol Buffers
- Cross comparison and metrics of 4 classifiers: Naive Bayes, Logistic Regression, k-NN and Linear SVM
- Feature extraction to replicate the Stanford paper: POS, structural, sentiment
- Understanding of Naive Bayes
- Unit testing with Pytest and linting with Pylint
- Python API serving a /classify endpoint using a pickled classification model to serve webapp results
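As a rough illustration of the last item, a /classify endpoint backed by a pickled model might look like the sketch below. It uses only the standard library. The stand-in "model" (a pickled dict of term weights), the `review` request field, the `genuine`/`deceptive` labels, and every function name are hypothetical; the real API serves a pickled classifier and may use a web framework instead.

```python
import json
import pickle
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in "model": a pickled dict of term weights (hypothetical values).
MODEL_BYTES = pickle.dumps({"amazing": 1.0, "best": 0.5, "refund": -1.0})


def load_model(blob):
    """Unpickle the model. Only unpickle data from a trusted source."""
    return pickle.loads(blob)


def classify(model, text):
    """Sum the weights of known terms; a non-negative score means 'genuine'."""
    score = sum(model.get(tok, 0.0) for tok in text.lower().split())
    return {"label": "genuine" if score >= 0 else "deceptive", "score": score}


def handle_classify(model, body):
    """Turn a raw JSON request body into (status code, JSON response bytes)."""
    try:
        text = json.loads(body)["review"]
        if not isinstance(text, str):
            raise TypeError("review must be a string")
    except (ValueError, KeyError, TypeError):
        error = {"error": "expected JSON with a string 'review' field"}
        return 400, json.dumps(error).encode()
    return 200, json.dumps(classify(model, text)).encode()


class ClassifyHandler(BaseHTTPRequestHandler):
    model = load_model(MODEL_BYTES)  # loaded once, shared by all requests

    def do_POST(self):
        if self.path != "/classify":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_classify(self.model, self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Start the server on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), ClassifyHandler).serve_forever()
```

Calling `serve()` starts the server, after which a POST to `/classify` with a body such as `{"review": "..."}` returns a JSON label and score. Keeping the request logic in `handle_classify` means it can be unit tested without opening a socket.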
Kirill:
- Continuous Integration / Continuous Delivery pipeline via CircleCI
- Node backend server hosted on Redbrick machine
- ReactJS web application using Webpack, Bulma.io and Redux
- Unit testing and integration testing using Jest and Chai
- End-to-end communication between the frontend and the backend classification API, with results displayed in the webapp