This code gathers basic statistics, career statistics and game logs provided by the NFL on their website for all players past and present. The dataset can be found at https://www.kaggle.com/kendallgillies/nflstatistics.
Because not every player is assigned a position, the code first uses the “find player by name” page to scrape each player’s name, unique URL identifier and years played (if retired). It then scrapes three main groups of statistics (basic statistics, career statistics and game logs) independently of each other and stores the data in separate CSV files.
- Base_File_NFL_Stats.py
- Player_Class.py
- Website_to_CSV_Functions (Folder)
  - Basic_Stats_Class.py
  - Career_Stats_Class.py
  - Functions_Needed_For_All_Stats.py
  - Game_Logs_Class.py
  - NFL_Glossary.py
  - Obtain_Players_from_Website.py
- The base file (Base_File_NFL_Stats.py), the player class file (Player_Class.py) and Functions_Needed_For_All_Stats.py are all required to run any other part of the code.
- Obtain_Players_from_Website.py gathers the player names and URL identifiers from the NFL website. For retired players, the years played appear on the same page as their names and identifiers, so they are collected and stored at this stage. For active players, the years played are not as easily accessible, but they can be gathered later from the basic statistics, career statistics or game logs.
- The first main group of statistics is the basic statistics provided for each player and can be obtained through the basic statistics class. The variables are stored in a CSV file titled Basic_Stats.csv along with the player’s name and URL identifier. The variables pulled for each player are as follows:
  - Number
  - Position
  - Current Team
  - Height
  - Weight
  - Age
  - Birthday
  - Birth Place
  - College Attended
  - High School Attended
  - High School Location
  - Experience
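
The layout of Basic_Stats.csv described above can be sketched as follows. This is only a minimal illustration, not the code in Basic_Stats_Class.py: the function name `write_basic_stats`, the header labels `Name` and `URL Identifier`, and the example record are all assumptions made for the sketch.

```python
import csv
import io

# Columns assumed from the variable list above. "Name" and "URL Identifier"
# are hypothetical header labels; the real file may spell them differently.
BASIC_STATS_COLUMNS = [
    "Name", "URL Identifier", "Number", "Position", "Current Team",
    "Height", "Weight", "Age", "Birthday", "Birth Place",
    "College Attended", "High School Attended", "High School Location",
    "Experience",
]


def write_basic_stats(rows, handle):
    """Write player dicts as CSV rows; fields a player lacks become empty cells."""
    writer = csv.DictWriter(
        handle,
        fieldnames=BASIC_STATS_COLUMNS,
        restval="",             # fill missing variables with an empty string
        extrasaction="ignore",  # silently drop keys outside the column list
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Hypothetical record with most variables unavailable.
example = {"Name": "Doe, John", "URL Identifier": "johndoe/0000000", "Position": "QB"}
buffer = io.StringIO()
write_basic_stats([example], buffer)
```

Writing to an in-memory buffer keeps the sketch self-contained; passing a file opened with `open("Basic_Stats.csv", "w", newline="")` would write to disk instead.
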
- NFL_Glossary.py is used to translate the NFL shorthand found in the table headers. It is needed for both Career_Stats_Class.py and Game_Logs_Class.py.
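
As a rough sketch of the kind of translation NFL_Glossary.py performs, a dictionary lookup over the table headers could look like the following. The entries and the function name `translate_headers` are illustrative assumptions; the actual glossary contents are not reproduced here.

```python
# Illustrative entries only; the real glossary in NFL_Glossary.py is not shown here.
GLOSSARY = {
    "Att": "Attempts",
    "Yds": "Yards",
    "TD": "Touchdowns",
    "Int": "Interceptions",
}


def translate_headers(headers, glossary=GLOSSARY):
    """Replace known shorthand with its long form; leave unknown headers as they are."""
    return [glossary.get(header.strip(), header.strip()) for header in headers]


translated = translate_headers(["Att", " Yds ", "Lng"])
```

Falling back to the original header for unknown shorthand means a new abbreviation on the website would pass through untranslated rather than raise an error.
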
- The second main group of statistics gathered for each player is their career statistics, which can be obtained through the career statistics class. Although each player has a main position, they can also have statistics in other areas; therefore, the career statistics are divided by statistic type. The statistics are then stored in CSV files by statistic type, along with the player name, URL identifier and position (if available). The following are the career statistic types and accompanying CSV file names:
  - Defensive Statistics - Career_Stats_Defensive.csv
  - Field Goal Kickers - Career_Stats_Field_Goal_Kickers.csv
  - Fumbles - Career_Stats_Fumbles.csv
  - Kick Return - Career_Stats_Kick_Return.csv
  - Kickoff - Career_Stats_Kickoff.csv
  - Offensive Line - Career_Stats_Offensive_Line.csv
  - Passing - Career_Stats_Passing.csv
  - Punt Return - Career_Stats_Punt_Return.csv
  - Punting - Career_Stats_Punting.csv
  - Receiving - Career_Stats_Receiving.csv
  - Rushing - Career_Stats_Rushing.csv
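
Because one player can appear under several statistic types, each row has to be routed to the file for its type. The sketch below shows one way that routing might work, using the file names listed above; the function name `group_rows_by_file` and the example rows are hypothetical and not taken from Career_Stats_Class.py.

```python
from collections import defaultdict

# Statistic type -> CSV file name, copied from the list above.
CAREER_STATS_FILES = {
    "Defensive Statistics": "Career_Stats_Defensive.csv",
    "Field Goal Kickers": "Career_Stats_Field_Goal_Kickers.csv",
    "Fumbles": "Career_Stats_Fumbles.csv",
    "Kick Return": "Career_Stats_Kick_Return.csv",
    "Kickoff": "Career_Stats_Kickoff.csv",
    "Offensive Line": "Career_Stats_Offensive_Line.csv",
    "Passing": "Career_Stats_Passing.csv",
    "Punt Return": "Career_Stats_Punt_Return.csv",
    "Punting": "Career_Stats_Punting.csv",
    "Receiving": "Career_Stats_Receiving.csv",
    "Rushing": "Career_Stats_Rushing.csv",
}


def group_rows_by_file(rows):
    """Group (stat_type, row) pairs by the CSV file each row belongs in."""
    grouped = defaultdict(list)
    for stat_type, row in rows:
        grouped[CAREER_STATS_FILES[stat_type]].append(row)
    return dict(grouped)


# Hypothetical quarterback with both passing and rushing rows.
rows = [
    ("Passing", {"Name": "Doe, John", "Position": "QB", "Year": "2015"}),
    ("Rushing", {"Name": "Doe, John", "Position": "QB", "Year": "2015"}),
    ("Passing", {"Name": "Doe, John", "Position": "QB", "Year": "2016"}),
]
grouped = group_rows_by_file(rows)
```
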
- The final group of statistics is the game logs for each player, which can be obtained through the game logs class. The game logs are stored by position and include the player name, URL identifier and position (if available). The following are the game log types and accompanying CSV file names:
  - Quarterback – Game_Logs_Quarterback.csv
  - Running Back – Game_Logs_Runningback.csv
  - Wide Receiver and Tight End – Game_Logs_Wide_Receiver_and_Tight_End.csv
  - Offensive Line – Game_Logs_Offensive_Line.csv
  - Defensive Lineman – Game_Logs_Defensive_Lineman.csv
  - Kickers – Game_Logs_Kickers.csv
  - Punters – Game_Logs_Punters.csv
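
Since game logs are stored by position and a position is not always available, choosing the output file might look something like the sketch below. The position abbreviations (`QB`, `RB`, `WR`, `TE`, `K`, `P`) and the function name `game_log_file` are assumptions; the offensive and defensive line files are left out because the abbreviations the website uses for those positions are not given in this document.

```python
# Position abbreviation -> game log file. The abbreviations are assumed;
# only the file names come from the list above.
GAME_LOG_FILES = {
    "QB": "Game_Logs_Quarterback.csv",
    "RB": "Game_Logs_Runningback.csv",
    "WR": "Game_Logs_Wide_Receiver_and_Tight_End.csv",
    "TE": "Game_Logs_Wide_Receiver_and_Tight_End.csv",
    "K": "Game_Logs_Kickers.csv",
    "P": "Game_Logs_Punters.csv",
}


def game_log_file(position):
    """Return the CSV file for a position, or None if it is missing or unmapped."""
    if not position:
        return None
    return GAME_LOG_FILES.get(position.strip().upper())


wide_receiver_file = game_log_file("WR")
no_position_file = game_log_file(None)
```

Returning `None` leaves the decision about players without a position to the caller; how the actual code handles that case is not described here.
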