View final project code here!
Research Question:
Is the combination of an MLB free agent's age and batting performance, measured by batting average and on-base plus slugging percentage (OPS), indicative of their yearly salary, and if so, can an algorithm predict a player's future contract based on these factors?
Background and Prior Work:
Although many statistics attempt to quantify the offensive ability of Major League Baseball (MLB) players, on-base plus slugging (OPS) is often regarded as the most telling[^1]. OPS captures a player’s overall offensive capabilities because it combines two relevant offensive statistics: on-base percentage, the likelihood that a player gets on base, and slugging percentage, a measure of the quality of a player’s hits[^2]. Understanding the factors that influence a player’s OPS is therefore valuable, because it gives insight into how well a player will perform in the future. While physical attributes like arm strength, speed, and agility certainly affect a batter’s ability to hit a baseball, professional MLB players often regard baseball as a mental game. Yogi Berra, formerly a catcher for the New York Yankees, claimed that “baseball is 90 percent mental; the other half is physical”[^3]. In this project we will attempt to identify whether a player’s contract with an MLB team is a mental factor that affects their batting ability.
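As a minimal sketch of how the statistics described above are computed, the snippet below uses the standard definitions of batting average, on-base percentage, and slugging percentage. The function names and the season line are hypothetical and chosen only for illustration; they do not come from our dataset.

```python
def batting_average(hits, at_bats):
    """Hits per at-bat."""
    return hits / at_bats


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """Times on base divided by the plate appearances that count toward OBP."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """Total bases per at-bat; extra-base hits are weighted by bases gained."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


# Hypothetical season line (made-up numbers, for illustration only).
at_bats, hits = 500, 150
doubles, triples, home_runs = 30, 5, 20
walks, hit_by_pitch, sac_flies = 50, 5, 5
singles = hits - doubles - triples - home_runs

avg = batting_average(hits, at_bats)
obp = on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies)
slg = slugging_percentage(singles, doubles, triples, home_runs, at_bats)
ops = obp + slg  # OPS is simply the sum of the two components

print(f"AVG={avg:.3f} OBP={obp:.3f} SLG={slg:.3f} OPS={ops:.3f}")
```

Because OPS adds a rate based on plate appearances to a rate based on at-bats, it is a convenient summary rather than a true percentage.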
MLB players generally want to sign long, high-paying contracts. Although a long contract caps the player's potential earnings, it provides job security and a guaranteed payday[^4]. At the same time, teams focus on signing the best players for the lowest possible cost. Long-term contracts can also be desirable for teams because they guarantee that the players will remain on the roster for the specified period[^4]. However, there is no guarantee that a player will maintain their current production for the length of the contract. In some cases players exceed expectations, but in others they drastically underperform. Whether an increase in performance is due to job security or a drop in performance is due to complacency, both players and teams benefit from being informed about the contract salaries and lengths that best suit their interests. For optimal contracts to be written, all relevant factors that give insight into the player's future performance must be considered, including how the contract itself may influence the player's future contributions.
Projects similar to ours have had mixed results. In the article “New Evidence in the Study of Shirking in Major League Baseball”, Richard J. Paulson found a negative relationship between the years remaining on a baseball player’s contract and performance[^5]. These findings suggest that players generally underperform on long contracts due to complacency. While this study is similar to ours, its results are not fully applicable because contract salary was not taken into account. Another similar study, by Matthew J. Cahill, concluded that there is no relationship between contract salary and length and player performance[^6]. It must be noted, however, that Cahill’s sample size was too small to generalize these conclusions to the league as a whole.
Hypothesis:
We predict that a higher on-base plus slugging (OPS) and a higher batting average in the years before free agency will increase the average annual value of a player's contract. A higher OPS and batting average mean that, overall, a player is getting on base often through hits and walks and is also hitting for extra bases and home runs. The two statistics work together: any stat that feeds into OPS affects it as a whole, and a higher batting average both correlates with a higher OPS and shows that the player is consistently getting base hits. Conversely, a player with a lower OPS and batting average before signing a new contract will receive a lower-valued contract. Position players are trying to prove their value to a team, and we track that value through these two statistics. To keep the statistics meaningful, we will clean the data so that it includes only players with a significant number of at-bats. The other main factor we will examine is how age affects contract value. We predict that older players, aged 33 or older, will not receive as high a contract value, since at this age performance starts to decline and injuries become more frequent as the body deteriorates. Players at this age have generally reached their maximum potential, so younger players can command higher values because of their potential and because they are signing in their prime years. For these reasons, we believe an algorithm can predict a player's future salary based on OPS, batting average, and age.
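To make the hypothesis concrete, here is a rough sketch of one way such an algorithm could look: filter out players with few at-bats, then fit an ordinary least-squares linear model of average annual value on age, batting average, and OPS. This is only an illustration under stated assumptions, not the model the project ultimately uses. The rows are synthetic, the `MIN_AT_BATS` cutoff of 300 is an arbitrary placeholder, and `predict_aav` is a hypothetical helper.

```python
import numpy as np

# Synthetic player-seasons, made up for illustration only:
# (at_bats, age, batting_avg, ops, aav_in_millions)
rows = [
    (550, 27, 0.300, 0.900, 25.0),
    (480, 29, 0.280, 0.820, 18.0),
    (510, 31, 0.265, 0.780, 12.0),
    (120, 26, 0.310, 0.950, 2.0),   # too few at-bats; removed by the filter
    (430, 34, 0.250, 0.700, 5.0),
    (390, 35, 0.240, 0.690, 3.0),
    (600, 28, 0.290, 0.860, 22.0),
    (350, 33, 0.270, 0.750, 8.0),
]

MIN_AT_BATS = 300  # assumed cutoff for "a significant number of at-bats"

# Keep only players with enough at-bats for their rate stats to be meaningful.
data = np.array([r for r in rows if r[0] >= MIN_AT_BATS], dtype=float)

# Design matrix: intercept column plus age, batting average, and OPS.
X = np.column_stack([np.ones(len(data)), data[:, 1:4]])
y = data[:, 4]

# Ordinary least squares fit of AAV on the three predictors.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)


def predict_aav(age, batting_avg, ops):
    """Predicted average annual value (millions) from the fitted linear model."""
    return float(coef @ np.array([1.0, age, batting_avg, ops]))


print(predict_aav(age=30, batting_avg=0.275, ops=0.800))
```

A linear model is the simplest choice here; because batting average and OPS are correlated, the individual coefficients should be interpreted with caution even when the predictions are reasonable.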
Data: The ideal dataset would include observations describing MLB players during a particular year. Each observation should have the following variables: year, on-base plus slugging (OPS), remaining contract length, and current salary. Because we will be comparing players’ performance year by year, the dataset needs to contain multiple years' worth of observations, with as many observations per year as possible. Ideally, we would have an observation for every active MLB player over roughly the past ten years. Most of this information is easily accessible on baseball websites, often with an option to download it as a CSV file. We will likely download pieces of the data and then combine and store them as a dataframe using the Pandas library. Our search for data has been successful. On Spotrac (https://www.spotrac.com/mlb/rankings/2022/contract-length/), we found an organized list of all MLB player contracts from 2011-2023, from which we can extract each player's contract length and salary. On Rotowire (https://www.rotowire.com/baseball/stats.php), we found an organized list of all MLB player batting statistics (including OPS) from 2010-2022, from which we can extract OPS. Combining these two datasets should give us the ideal dataset, which we can then begin to analyze.
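The combination step described above could be sketched with Pandas as follows. The column names (`player`, `year`, `contract_years`, `salary`, `ops`) and the rows are hypothetical stand-ins for the downloaded Spotrac and Rotowire files, whose actual layouts may differ; in practice each frame would come from `pd.read_csv` on the downloaded file.

```python
import pandas as pd

# Hypothetical contract rows standing in for the Spotrac download (made-up values).
contracts = pd.DataFrame({
    "player": ["Player A", "Player B", "Player C"],
    "year": [2021, 2021, 2022],
    "contract_years": [5, 2, 1],
    "salary": [20_000_000, 8_000_000, 3_000_000],
})

# Hypothetical batting rows standing in for the Rotowire download (made-up values).
batting = pd.DataFrame({
    "player": ["Player A", "Player B", "Player D"],
    "year": [2021, 2021, 2022],
    "ops": [0.850, 0.720, 0.690],
})

# Inner join on player and year keeps only player-seasons present in both sources.
# validate="one_to_one" raises an error if either side has duplicate keys.
merged = contracts.merge(
    batting, on=["player", "year"], how="inner", validate="one_to_one"
)

print(merged)
```

Joining on player name assumes the two sites spell names identically; real data will probably need some name cleanup before the merge.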
Footnotes
[^1]: Pate, A. (24 Jan 2020) 3 Basic MLB Hitting Stats that Define a Great Hitter. The Hitting Vault. https://thehittingvault.com/3-basic-mlb-hitting-stats-that-define-a-great-hitter/
[^2]: Fields, B. (28 Aug 2022) How to Calculate OPS in Baseball. SportsRec. https://www.sportsrec.com/calculate-ops-baseball-2063754.html
[^3]: Thompson, A. (26 Oct 2007) Mind Games: What Makes a Great Baseball Player Great. Livescience. https://www.livescience.com/4685-mind-games-great-baseball-player-great.html
[^4]: Turvey, J. (Fall 2003) The Future of Baseball Contracts: A Look at the Growing Trend in Long-Term Contracts. SABR. https://www.researchgate.net/publication/347239317_New_Evidence_in_the_Study_of_Shirking_in_Major_League_Baseball
[^5]: Paulson, R. (12 Dec 2020) New Evidence in the Study of Shirking in Major League Baseball. Human Kinetics Journal. https://www.researchgate.net/publication/347239317_New_Evidence_in_the_Study_of_Shirking_in_Major_League_Baseball
[^6]: Cahill, M. (Summer 2014) Change in Major League Baseball player performance after signing a Long-Term Deal. Fisher Digital Publications. https://www.researchgate.net/publication/347239317_New_Evidence_in_the_Study_of_Shirking_in_Major_League_Baseball