Predicting Baseball Player Performance with OLS Regression and Out-of-Sample Forecasting

Treiman, Lauren (Spring 2020)

Permanent URL: https://etd.library.emory.edu/concern/etds/1c18dg883?locale=it
Published

Abstract

Objective: To assist Major League Baseball (MLB) teams in contract negotiations by better predicting players’ future performance.

Methods: Players from 1871–2018 with at least 7 years of MLB experience were analyzed to determine the most important factors affecting their performance. I used wins above replacement (WAR) as my dependent variable to measure players’ value and Ordinary Least Squares (OLS) regression to predict players’ future WAR. Initially, players from the 2010s were analyzed with out-of-sample forecasting by comparing players with similar WAR. Multiple regression models of comparable players were then developed from different decades with 1–6 years of past experience. Future performance for multiple seasons was then predicted for players competing in the early 2010s by using comparable players who played in the last 3 decades (1990s–2010s) with 6 years of past experience. To best reflect the contract negotiation process, only the sample’s actual WAR from their first 6 years in MLB was considered to predict the rest of their career. Thus, WAR predictions for their 7th, 8th, ... years were used to predict performance towards the end of their career.
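As a rough illustration of the approach described in the Methods, the sketch below regresses a later season's WAR on a player's first six seasons of WAR and fits one OLS model per forecast year, so that only actual WAR from years 1 to 6 enters each prediction. The data are synthetic, and the function names, the sample size, and the choice of a separate model per horizon are assumptions made for this example; the abstract does not specify the thesis's exact model specification.

```python
import numpy as np


def fit_ols(X, y):
    """Return OLS coefficients (intercept first) for the regression of y on X."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients (intercept first) to new predictor rows."""
    design = np.column_stack([np.ones(len(X)), X])
    return design @ coef


# Hypothetical comparable players (synthetic, not thesis data):
# rows are players, columns are WAR in seasons 1 through 8.
rng = np.random.default_rng(0)
talent = rng.normal(2.0, 1.5, size=200)
careers = talent[:, None] + rng.normal(0.0, 1.0, size=(200, 8))

# Predictors are always the first six seasons of actual WAR.
past = careers[:, :6]

# One model per future season (an assumption of this sketch).
models = {year: fit_ols(past, careers[:, year - 1]) for year in (7, 8)}

# Forecast seasons 7 and 8 for a hypothetical player with six seasons of WAR.
new_player = np.array([[1.8, 2.5, 3.1, 3.4, 2.9, 3.0]])
forecast = {year: float(predict(coef, new_player)[0]) for year, coef in models.items()}

for year, value in forecast.items():
    print(f"Projected WAR in season {year}: {value:.2f}")
```

Restricting `careers` to players from a chosen span of decades before fitting would mirror the comparable-player selection the Methods describe.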

Results: The model was most accurate when analyzing only the 3 most recent decades of past players (players since the 1990s for batters in the 2010s) in conjunction with the past 6 WAR values. The regression model's predictions were within 2 WAR of the actual WAR, and the model accurately predicted players’ performance trends throughout their careers. My model should help teams by providing additional information that will improve evaluation of a player’s performance for the next four years after seven years in MLB.
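One way to check a claim such as "within 2 WAR" out of sample is to fit the regression on one group of players and score it on players it never saw. The sketch below does this on synthetic data; the helper `within_tolerance`, the 200/100 split, and the data-generating process are assumptions for illustration and are not taken from the thesis, which may have measured accuracy differently.

```python
import numpy as np


def within_tolerance(actual, predicted, tol=2.0):
    """Share of predictions whose absolute error is at most `tol` WAR."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted) <= tol))


# Hypothetical players (synthetic, not thesis data):
# columns 0-5 are WAR in seasons 1-6, column 6 is WAR in season 7.
rng = np.random.default_rng(1)
talent = rng.normal(2.0, 1.5, size=300)
war = talent[:, None] + rng.normal(0.0, 1.0, size=(300, 7))

# Fit on the first 200 players, hold out the remaining 100.
train, test = war[:200], war[200:]

design_train = np.column_stack([np.ones(len(train)), train[:, :6]])
coef, *_ = np.linalg.lstsq(design_train, train[:, 6], rcond=None)

# Out-of-sample predictions for season 7 of the held-out players.
design_test = np.column_stack([np.ones(len(test)), test[:, :6]])
predicted = design_test @ coef

share = within_tolerance(test[:, 6], predicted)
mean_abs_error = float(np.mean(np.abs(test[:, 6] - predicted)))

print(f"Held-out share within 2 WAR: {share:.2f}")
print(f"Held-out mean absolute error: {mean_abs_error:.2f} WAR")
```

Scoring only held-out players keeps the error estimate from being flattered by the same data used to fit the coefficients.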

Table of Contents

1 Introduction

1.1 Contracts

1.2 Alternative Projection Methods

2 Methods

2.1 Datasets

2.2 Variables

2.3 Predictive Metric Analysis

2.4 Player Classification

2.5 The Model

3 Results

3.1 Groups

3.2 Handedness

3.3 Predictive Metrics

3.4 Single Year Predictions

3.5 Multi-Year Predictions

4 Discussion

4.1 Background

4.2 Approach

4.3 Rate of Improvement

4.4 Handedness

4.5 Predictive Metrics

4.6 Implications

4.7 Limitations

4.8 Future Approaches

5 Conclusion

References

Appendix

About this Honors Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
School
Department
Degree
Submission
Language
  • English
Research Field
Keyword
Committee Chair / Thesis Advisor
Committee Members
Last modified
