CityScout: Identifying Safe Cities Based on Crime Statistics

CityScout is a tool designed to help users identify safe cities based on specific demographic factors and crime statistics. Whether you're planning a vacation or considering a move, CityScout provides tailored safety information to meet your unique needs.

Group members: Abhay Sastry, Jessica Eggers, Kunal Mohindra, Rameen Gauher, Sai Karyekar, Suraksha Vinod

Contribution:

Abhay and Suraksha: Data Processing
Kunal and Rameen: Machine Learning
Jessica and Sai: Visualization

Introduction

Safety is a top priority when choosing a place to live or visit. While general crime statistics are widely available, they often fail to address specific demographic concerns. CityScout bridges this gap by providing tailored safety information based on user-selected criteria, such as crime rates against women, children, or tourists.

Dashboard

Explore our interactive dashboard here: CityScout Dashboard

How to Use the Dashboard

Open the CityScout link provided above.
Select your demographic category of interest (e.g., women traveling solo, children, tourists).
View the interactive choropleth map, which shades each state by its risk score.
Adjust the population filter using the sliding bar to refine your results.
Explore the list of the top 10 safest cities, ranked by calculated crime risk score.
Switch to State view to narrow your results to a particular state and view its county-level choropleth map; the city list updates to reflect the state filter.
Analyze the top crimes for the selected category or states to make an informed decision.
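
For readers who prefer to work with the data directly, the population filter and top-10 ranking described above can be approximated in Pandas. This is a minimal sketch, not the dashboard's actual logic: the column names (city, state, population, risk_score), the sample rows, and the assumption that a lower risk score means a safer city are all illustrative.

```python
from typing import Optional

import pandas as pd


def top_safe_cities(scores: pd.DataFrame, min_population: int = 0,
                    state: Optional[str] = None, n: int = 10) -> pd.DataFrame:
    """Return the n cities with the lowest risk score after filtering.

    Column names are hypothetical; adjust them to match your data.
    """
    filtered = scores[scores["population"] >= min_population]
    if state is not None:
        # Equivalent of switching the dashboard to State view.
        filtered = filtered[filtered["state"] == state]
    # Assumption: a lower risk score means a safer city.
    return filtered.nsmallest(n, "risk_score").reset_index(drop=True)


# Made-up rows for illustration only.
cities = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "state": ["GA", "GA", "TX", "TX"],
    "population": [50_000, 500_000, 120_000, 900_000],
    "risk_score": [0.12, 0.40, 0.08, 0.25],
})
safest = top_safe_cities(cities, min_population=100_000, n=2)
print(safest[["city", "risk_score"]])
```

Passing state="TX" would restrict the ranking to one state, mirroring the State view step.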

Technology Stack

Python
Pandas
PySpark
Google Cloud Platform (GCP)
DoltHub (for database management)
Tableau

Project Structure

The project consists of three main components:

Data Processing:
- dolt_test.py: Queries the database hosted on DoltHub (for reference)
- doltSQLCommands.txt: Contains the SQL commands used to extract the relevant tables
- data_merge_and_clean.ipynb: Cleans and merges the tables to produce the final file
- merge.py: Merges the datasets extracted for each year or feature. It is needed only if you stop data_merge_and_clean.ipynb at the PySpark step that writes the merged files, final_all_df.write.csv('gs://6242filteringbucket/final_all.csv', header=True); if you run the notebook to the end, merge.py is not needed
Machine Learning:
- Risk_Score_Calculation.ipynb: Uses the processed data to calculate risk scores for each person and each city based on the person's demographic attributes
- Offense_By_Categories.ipynb: Uses the processed data to find the crimes most associated with victims of a specific demographic
Tableau Dashboard:
- Interactive visualization of processed data
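
The kind of merge that merge.py performs can be sketched with Pandas as follows. This is an illustration under assumptions, not the contents of merge.py: the function name merge_csv_parts, the part_*.csv naming, and the sample columns are hypothetical.

```python
import glob
import os
import tempfile

import pandas as pd


def merge_csv_parts(pattern: str) -> pd.DataFrame:
    """Concatenate every CSV matching the glob pattern into one DataFrame."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    frames = [pd.read_csv(path) for path in paths]
    # ignore_index gives the merged frame a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)


# Demo with two tiny, made-up part files standing in for per-year extracts.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"year": [2017], "offense_name": ["Robbery"]}).to_csv(
        os.path.join(tmp, "part_2017.csv"), index=False)
    pd.DataFrame({"year": [2018], "offense_name": ["Arson"]}).to_csv(
        os.path.join(tmp, "part_2018.csv"), index=False)
    merged = merge_csv_parts(os.path.join(tmp, "part_*.csv"))

print(merged)
```

For files too large to fit in memory, the same idea applies with PySpark, which the notebook already uses.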

Data Source

The final dataset (~2GB) used for analysis is available on Google Drive: Merged Dataset Link

Development

Data processing was performed with PySpark and Pandas on the Google Cloud Platform to handle the dataset's size and computational requirements. The original dataset is available here: Dataset Link

Install Dolt by building from source.
Clone the FBI NIBRS data repository hosted on DoltHub with the command dolt clone Liquidata/fbi-nibrs. This is a 1.1 TB database, so ensure you have sufficient storage space.
Extract the relevant information for the tables using the commands in the doltSQLCommands.txt file.
Run the data_merge_and_clean.ipynb notebook to select the necessary columns and merge the files into the final output file for further analysis. The notebook uses PySpark; ensure the file paths are correct before running it.
The final2017_2021_merged.csv file was used in further analyses.

Alternatively, please feel free to access the data used for our tool, which is linked under Data Source.
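
If you do clone the full database, it may help to check free disk space first. The sketch below wraps the dolt clone command from the steps above in a small Python helper; the helper names are hypothetical, and the size threshold simply reuses the 1.1 TB figure quoted above.

```python
import shutil
import subprocess

REPO = "Liquidata/fbi-nibrs"          # repository name from the clone step
REQUIRED_BYTES = 1_100_000_000_000    # roughly 1.1 TB, the size quoted above


def has_enough_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has enough free bytes."""
    return shutil.disk_usage(path).free >= required_bytes


def clone_command(repo: str = REPO) -> list:
    """Build the argument list for `dolt clone <repo>`."""
    return ["dolt", "clone", repo]


def clone_if_space(path: str = ".") -> None:
    """Clone the repository into `path`, refusing if space looks insufficient."""
    if not has_enough_space(path, REQUIRED_BYTES):
        raise RuntimeError(f"less than {REQUIRED_BYTES} bytes free at {path!r}")
    # Requires the dolt binary to be installed and on PATH.
    subprocess.run(clone_command(), cwd=path, check=True)


# clone_if_space("/path/with/enough/space")  # uncomment to run the clone
```

The clone call is left commented out so that running the snippet never starts a multi-terabyte download by accident.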

Machine Learning and Regression Analysis

We use TF-IDF and a random forest for two tasks:

Filter the offenses relevant to each demographic category. To classify offenses into categories:

Download the merged dataset (Merged Dataset Link) to your Google Drive.
Run the Offense_By_Categories.ipynb notebook to categorize offenses into relevant categories.
Download the offense_classifications_by_category.csv file produced by the script
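
To show how TF-IDF features and a random forest fit together, here is a minimal scikit-learn sketch. It is not the notebook's code: the training texts, the category labels, and the choice to classify free-text offense descriptions are assumptions made for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Made-up examples: offense text paired with a demographic category label.
offense_text = [
    "kidnapping abduction of a minor",
    "child neglect endangerment",
    "pocket picking at tourist attraction",
    "purse snatching from visitor",
    "kidnapping of a child",
    "theft from visitor at hotel",
]
labels = ["children", "children", "tourists",
          "tourists", "children", "tourists"]

# TF-IDF turns each text into a weighted term vector; the random forest
# then learns which terms indicate which category.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(offense_text, labels)

predicted = model.predict(["kidnapping of a minor"])
print(predicted[0])
```

Fixing random_state makes repeated runs reproducible, which is useful when comparing category assignments across dataset versions.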

Calculate risk scores from key features in the dataset (e.g., population, age_num, sex_code, race, resident_status_code, offense_name). To calculate the risk scores for the dataset:

Download the merged dataset (Merged Dataset Link) to your Google Drive.
Run the Risk_Score_Calculation.ipynb notebook to find the risk scores for each person in each city based on their demographic attributes.
Download the FinalCityDataRiskScores.csv file produced by the script
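
As a rough illustration of a city-level risk score, the sketch below counts incidents whose victims match a demographic filter and normalizes by population. This per-100,000 rate is a stand-in chosen for clarity, not the method used in Risk_Score_Calculation.ipynb; the city column, the sample rows, and the function name are hypothetical, while population, age_num, sex_code, and offense_name are the feature names listed above.

```python
import pandas as pd

# Made-up incident rows, one per victim record.
incidents = pd.DataFrame({
    "city": ["A", "A", "A", "B", "B"],
    "population": [100_000, 100_000, 100_000, 50_000, 50_000],
    "sex_code": ["F", "M", "F", "F", "M"],
    "age_num": [30, 45, 12, 25, 60],
    "offense_name": ["Robbery", "Arson", "Kidnapping", "Robbery", "Fraud"],
})


def risk_per_100k(df: pd.DataFrame, mask: pd.Series) -> pd.Series:
    """Incidents matching `mask` per 100,000 residents, indexed by city."""
    population = df.groupby("city")["population"].first()
    counts = df[mask].groupby("city").size()
    # Cities with no matching incidents get a count of zero.
    counts = counts.reindex(population.index, fill_value=0)
    return (counts * 100_000 / population).rename("risk_score")


# Example filter: adult female victims.
adult_women = (incidents["sex_code"] == "F") & (incidents["age_num"] >= 18)
scores = risk_per_100k(incidents, adult_women)
print(scores)
```

Normalizing by population keeps large cities from looking riskier only because they report more incidents in absolute terms.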

Limitations

The current dataset is static and does not include real-time updates.
Some states are not included in the dataset due to data availability constraints.
The dashboard is currently available only as a Tableau visualization.

Future Enhancements

Incorporate real-time data updates
Expand demographic categories
Incorporate missing states
Develop mobile applications for on-the-go access
