Global Terrorism Attacks - predicting the group responsible using Machine Learning
The Global Terrorism Database (http://www.start.umd.edu/gtd/) is an open-source database of terrorist events around the world from 1970 through 2014.
This project attempts to predict which group may have been responsible for a terrorist incident, based on information such as the weapons used, the attack type and the country of the incident.
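As a rough illustration of the idea, the sketch below trains a classifier on a handful of made-up rows and predicts the group for a new incident. The column names (`country`, `attacktype1`, `weaptype1`, `gname`) are assumed to match the GTD export, and the random forest is only one possible model; the notebooks may use different features and classifiers.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy rows invented purely for illustration (not real GTD records).
# Values stand in for the numeric codes used by the GTD.
data = pd.DataFrame({
    "country":     [1, 1, 2, 2, 1, 2],
    "attacktype1": [3, 3, 2, 2, 3, 2],
    "weaptype1":   [6, 6, 5, 5, 6, 5],
    "gname":       ["Group A", "Group A", "Group B",
                    "Group B", "Group A", "Group B"],
})

feature_columns = ["country", "attacktype1", "weaptype1"]
features = data[feature_columns]
labels = data["gname"]

# Fixed random_state so repeated runs give the same model.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(features, labels)

# Predict the group for a new, unseen incident.
new_incident = pd.DataFrame([[1, 3, 6]], columns=feature_columns)
predicted_group = model.predict(new_incident)[0]
print(predicted_group)
```

On real data the categorical codes would usually need encoding (for example one-hot) and the model should be evaluated on a held-out test set.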
- Written in Python 2.7; some alterations might be required for Python 3+
- numpy
- pandas
- scikit-learn
- matplotlib
- Jupyter Notebook
- Global Terrorism Database - to avoid hosting the files on GitHub (file size limits apply), download the latest version of the Global Terrorism Database yourself. If the database has been updated, change the version in the config.py file to match.
The easiest way to get the requirements for this project is the Anaconda/Miniconda Python distribution, which simplifies the setup of scientific computing libraries such as numpy and scikit-learn.
If you use an Anaconda- or Miniconda-based Python environment, first install the required packages with the conda command:
$ conda install numpy pillow scipy pandas scikit-learn matplotlib pip
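To confirm the installation worked, a small check like the following can be run in the same environment. It is only a sketch: the helper name `check_packages` is made up for this example. Note that scikit-learn and pillow are imported as `sklearn` and `PIL`.

```python
import importlib

# Import names for the packages installed above.
packages = ["numpy", "scipy", "pandas", "sklearn", "matplotlib", "PIL"]


def check_packages(names):
    """Return a dict mapping each name to its version, or None if the import fails."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            found[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            found[name] = None
    return found


for name, version in sorted(check_packages(packages).items()):
    print("%-12s %s" % (name, version if version else "NOT INSTALLED"))
```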
Jupyter Notebook can be installed with the Conda installer if you have Anaconda or Miniconda installed:
$ conda install jupyter notebook
Alternatively you can use pip:
$ pip install jupyter notebook
To open the notebook, cd to the directory that contains your code examples, e.g.
$ cd ~/directory/GTA
and launch jupyter notebook by executing
$ jupyter notebook
Jupyter will start in your default browser (typically running at http://localhost:8888/), and you can explore the notebooks from there.
- Run through the steps in 'Data Processing.ipynb' - the CSV files will be created as you step through the notebook
- Run through the steps in FeatureExtraction.ipynb
- GTD Codebook - Further information on the GTD data
- Improve accuracy!
- Plot visualisation of groups classification after PCA/SVD has been applied
- Better feature selection
- Classifier confidence
- Breakdown which features contributed the most to the classification
- Interactive controls for entering prediction data - dropdown list and explanations for WeaponType, TargetType etc., text box for Year etc.
- Split out common functions into separate file
- Automated tests!
- Twitter bot - tweet the bot attack details and it returns the probable group responsible
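
Two of the items above, classifier confidence and a breakdown of feature contributions, could be approached along these lines. This is a minimal sketch, not the project's implementation: it assumes a random forest, GTD-style column names, and invented toy data in which only `weaptype1` varies, so the outcome is easy to verify by eye.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

feature_names = ["country", "attacktype1", "weaptype1"]  # assumed columns

# Invented toy data: weaptype1 alone separates the two groups;
# the other two columns are constant and carry no signal.
X = np.array([
    [1, 3, 6],
    [1, 3, 6],
    [1, 3, 5],
    [1, 3, 5],
    [1, 3, 6],
    [1, 3, 5],
])
y = np.array(["Group A", "Group A", "Group B",
              "Group B", "Group A", "Group B"])

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Classifier confidence: one probability per known group, summing to 1.
probabilities = model.predict_proba(np.array([[1, 3, 6]]))[0]
confidence = dict(zip(model.classes_, probabilities))

# Feature contribution: impurity-based importances, highest first.
ranked = sorted(zip(feature_names, model.feature_importances_),
                key=lambda pair: pair[1], reverse=True)

for group, probability in sorted(confidence.items()):
    print("%s: %.2f" % (group, probability))
for name, importance in ranked:
    print("%s: %.2f" % (name, importance))
```

Impurity-based importances can be misleading when features have very different numbers of categories, so permutation importance may be worth comparing on the real data.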