Friday, May 22, 2009

Welcome to AggPro

Welcome to the home of AggPro: The Aggregate Projection System. I have tried to anticipate commonly asked questions about AggPro and answer them below. As AggPro is developed, status updates and results will be posted on this blog. If you have any questions regarding AggPro, feel free to contact me. Cameron Snapp will be assisting me with the development of AggPro.

What is AggPro?

AggPro is a proposed Major League Baseball (MLB) player projection system. Many projection systems already predict the performance of MLB players across a variety of statistical categories. The goal of AggPro is to aggregate the player predictions from these existing systems into a single, more accurate prediction. AggPro will use the projections from five systems (Bill James Handbook, CHONE, Marcel, PECOTA, and ZiPS) for the years 2006, 2007, and 2008 to form the aggregated projections.

Why do we need yet another projection system?

It is important to note that AggPro is not just another projection system. Instead, it is a methodology for aggregating effective projections from different systems into a single, more accurate projection. That said, we probably don't need another projection system. It appears we are reaching the limit of the accuracy that can be expected from projection systems. While AggPro may improve on the accuracy of existing systems, I do not expect it to improve the state of the art significantly. However, Greg Rybarczyk believes paradigm shifts that will improve the accuracy of projection systems are on the horizon. If paradigm-shifting projection systems are developed, the AggPro methodology will be applicable to improve these systems as well.

My interest in aggregating predictions from effective projection systems to form a better prediction stems from BellKor, the leading solution to the Netflix Prize. In October 2006, Netflix released a dataset of anonymous movie ratings and challenged researchers to develop systems that could beat the accuracy of its recommendation system, Cinematch. A grand prize of $1,000,000, known as the Netflix Prize, will be awarded to the first system to beat Cinematch by 10%. The BellKor prediction system, with an 8.26% improvement over Cinematch, is the leading solution. BellKor employs 107 different models of varying approaches and uses mathematical optimization methods to weight each model's prediction. From the weighted predictions of the 107 models, BellKor forms its aggregate recommendation. I was curious how well this strategy would work when applied to MLB player projection systems, and thus AggPro was born.

How will AggPro work and how will it be evaluated?

AggPro will employ a combination of mathematical optimization methods, including hill climbing, genetic algorithms, and simulated annealing, to determine a weight for each projected statistic in each system. The weights sought are those that yield AggPro projections with the lowest root mean square error (RMSE) against the MLB players' actual performance in the 2006, 2007, and 2008 seasons. Using the same weights, AggPro will provide projections for the 2009 MLB season.
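
To make the idea concrete, here is a minimal Python sketch of the hill climbing step for a single statistic. It is only an illustration under stated assumptions, not the AggPro implementation: the function names, the step size, the equal-weight starting point, and the toy numbers are all hypothetical, and the real system would repeat this for every statistic and may combine it with the other optimization methods.

```python
import math
import random


def rmse(predicted, actual):
    """Root mean square error between two equal-length lists of numbers."""
    total = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(total / len(actual))


def aggregate(weights, projections):
    """Weighted sum of the systems' projections for each player.

    projections[s][i] is system s's projection for player i.
    """
    n_players = len(projections[0])
    return [
        sum(w * system[i] for w, system in zip(weights, projections))
        for i in range(n_players)
    ]


def hill_climb(projections, actual, steps=5000, step_size=0.05, seed=0):
    """Simple stochastic hill climbing over the system weights.

    Assumption: start from equal weights, nudge one weight at a time,
    and keep the change only if it lowers the RMSE.
    """
    rng = random.Random(seed)
    n_systems = len(projections)
    weights = [1.0 / n_systems] * n_systems
    best = rmse(aggregate(weights, projections), actual)
    for _ in range(steps):
        candidate = list(weights)
        k = rng.randrange(n_systems)
        candidate[k] += rng.uniform(-step_size, step_size)
        err = rmse(aggregate(candidate, projections), actual)
        if err < best:
            weights, best = candidate, err
    return weights, best


# Hypothetical toy data: three systems projecting one statistic
# (say, home runs) for four players. These are not real projections.
toy_projections = [
    [20.0, 10.0, 31.0, 5.0],
    [24.0, 14.0, 27.0, 9.0],
    [18.0, 12.0, 35.0, 6.0],
]
toy_actual = [22.0, 11.0, 30.0, 7.0]

weights, error = hill_climb(toy_projections, toy_actual)
print("weights:", weights)
print("RMSE:", error)
```

Because a change is kept only when it lowers the error, the result can never be worse than the equal-weight average it starts from. It can, however, stop at a local optimum, which is one reason to pair hill climbing with genetic algorithms and simulated annealing.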

All the projection systems (AggPro, Bill James Handbook, CHONE, Marcel, PECOTA, and ZiPS) will be evaluated by their root mean square error against actual player performance for the 2006, 2007, 2008, and 2009 MLB seasons.
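
As a rough illustration of this evaluation, the Python sketch below scores each system's projections for one statistic in one season and ranks the systems from lowest to highest RMSE. The function names and the data layout (a dictionary mapping a system name to its list of projections) are assumptions made for the example, not a description of how AggPro stores its data.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between projections and actual results."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    total = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(total / len(actual))


def rank_systems(projections_by_system, actual):
    """Return (system name, RMSE) pairs sorted from most to least accurate."""
    scores = {
        name: rmse(projected, actual)
        for name, projected in projections_by_system.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical toy numbers for two unnamed systems and three players.
example = rank_systems(
    {"System A": [21.0, 12.0, 29.0], "System B": [25.0, 8.0, 33.0]},
    [22.0, 11.0, 30.0],
)
for name, score in example:
    print(name, score)
```

Running the same ranking separately for each season and each statistic would show whether the aggregate holds up on 2009, the one season that was not used to fit the weights.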

When will AggPro be presented?

AggPro has been selected for presentation at the 2009 Society for American Baseball Research Convention (SABR 39), held in Washington, D.C. from July 29 to August 2. The abstract proposing AggPro is available here.