The content contained in this project iPython notebook is for informational purposes only. You should not construe any such information or other material (the “Materials”) as investment, financial, tax, legal or other advice. The Materials are not investment advice and any observations concerning any security, trading algorithm or investment strategy provided in the project is not a recommendation to buy, sell or hold such investment or security or to make any other investment decisions. The contributors in this project do not provide any advice regarding the nature, potential value, risk or suitability of any particular investment strategy, trading algorithm, transaction, security or investment. Any use of the Materials, and any decisions made in reliance thereon, including any trading or investment decisions or strategies, are made at your own risk. You should seek the advice of a competent, licensed professional if you require investment, trading or other advice. The contributors of this project is not authorized to provide any professional advice in connection with this project and any such advice. Nothing contained in the Materials constitutes a solicitation, recommendation, endorsement, or any third party service provider to buy or sell any securities or other financial instruments in this or in any other jurisdiction whether or not such solicitation or offer would be unlawful under the securities laws of such jurisdiction.
With the use of NLP Sentiment Analysis, we will analyze the tweets regarding Activision Blizzard, CD Projekt Red and NASDAQ 100, and correlate the tweets to their respective stock prices and how it affects it, then score those tweets as good, bad, or neutral. These tweets are from a major negative event in the life of the companies mentioned.
We use this tool to create a prediction signal to be use in our stock price predictor if it actually tracked the rise and fall of its stock prices accurately.
This project leverages python version 3.8.5 with the following packages and modules:
-
pandas - This was used to be able to easily manipulate dataframes.
-
Scikit Learn - version 0.24.2 - This package has a lot of different tools and model that could be use to create a machine learning model.
- linear model - This allows the model to call the Logisitic Regression model to run our machine learning.
-
Numpy - This provides the ability to work with arrays and use different mathematical calculations on arrays.
-
TensorFlow - version 2.6.0 - This is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML-powered applications.
-
Text Vectorization - This transforms a batch of strings (one example = one string) into either a list of token indices (one example = 1D tensor of integer token indices) or a dense representation (one example = 1D tensor of float values representing data about the example's tokens).
-
Keras - version 2.6.0 - This is an API designed for human beings, not machines. Keras follows best practices for reducing cognitive load: it offers consistent & simple APIs, it minimizes the number of user actions required for common use cases, and it provides clear & actionable error messages. It also has a free open source Python library for developing and evaluating deep learning models.
-
models Sequential - This is used appropriately for a plain stack of layers where each layer has exactly one input tensor and one output tensor.
-
ReLU - The rectified linear unit (ReLU) function returns a value from 0 to infinity. This activation function transforms any negative input to 0. It is the most commonly used activation function in neural networks due to its faster learning and simplified output. However, it is not always appropriate for simpler models.
-
Sigmoid - Used to classify the tweets into 0s and 1s.
-
Recurrent Neural Network (RNN) - This is a class of neural networks that is powerful for modeling sequence data such as time series or natural language.
-
-
ALPACA Trade API - This is used to get API trading data for this project. To have access to Alpaca's API key and API secret keys, the user needs to register for personal account and save their own keys.
-
OS module - This provides a portable way of using operating system dependent functionality.
-
python-dotenv Library - This enables the user to read key-value pairs from an .env file and set them as an environment variables.
-
Twint - It is an advanced Twitter scraping tool written in Python that allows for scraping Tweets from Twitter profiles without using Twitter's API. utilizes Twitter's search operators to let you scrape Tweets from specific users, scrape Tweets relating to certain topics, hashtags & trends, or sort out sensitive information from Tweets like e-mail and phone numbers.
On your web browser (use incognito), copy and paste this link below:
https://colab.research.google.com/notebooks/intro.ipynb?utm_source=scs-index#recent=true
After pulling the ipynb file from github and saving it on your local machine or Google Drive, this link should've a pop-up allowing you to upload your ipynb file.
Since we are using Google Colaboratory, we are installing each our libraries while on the platform. This will allow the user to temporarily use the libraries while on the cloud. As you can see on gif below, it shows you that the installs are run, it will run during your Colab session.
The libraries that we will be installing are the following:
-
To Install Twint:
!pip3 install --user --upgrade git+https://github.com/twintproject/twint.git@origin/master#egg=twint
Note: Twint sometimes fails to load even after installing it on the notebook, so best way to get it to work is to restart the runtime and refresh the Google Colab.
-
To install Alpaca API:
!pip install alpaca-trade-api
-
To install python dotenv:
!pip3 install python-dotenv
- Base - Sequential model
- Recurrent Neural Network (RNN) - Bidirectional LSTM (BiLSTM)
- Recurrent Neural Network 2 (RNN 2)- Bidirectional LSTM (BiLSTM)
Base Model Summary
Base Model Epoch
Base Model Accuracy
RNN2 Model
Prediction Graph
CDPT Trades
After installing the libraries, we have to import the libraries below:
1. Starts with mounting your Google Drive, but since you will have a copy of zip file of Twitter-sentiment, you don't need this mount anymore. But to make sure there aren't any errors, just save the twitter sentiment zip on your Google Drive.
import os
import re
import io
import string
import datetime
import numpy as np
import twint
import nest_asyncio
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow import keras
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Embedding, GlobalAveragePooling1D
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization
from keras.backend import manual_variable_initialization
from dotenv import load_dotenv, find_dotenv
import alpaca_trade_api as tradeapi
3. Once running each model, when the user gets to Tensorboard, which sometimes does not show, please use Incognito browser in this case. Also, all the models both take awhile to load their epochs. So try to use your GPUs when running this notebook.
Click HERE to learn more from our presentation.
Copyright (c) [2021] [UW Fintech Bootcamp: Colin Benjamin | Justine Cho | Chris Henderson | Nathan Patterson]
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.