Monday, June 15, 2009

Projected Categories, Players and Measures of Success

Cameron and I have spent the last several weeks acquiring and formatting the projections from Bill James, CHONE, Marcel, PECOTA and ZiPS. We are going to be using the 2007 and 2008 projections from these systems, which I will refer to as the constituent projections. Originally, we had planned on using the 2006 projections too, but as Chone Smith nicely explained in this comment, the 2006 projections from CHONE were more experimental than anything else and probably would not be particularly useful. Since we need projections from every system for every year included in the AggPro analysis, we had to eliminate the 2006 projections.

Projected Categories

We will be using the following player performance categories:
Hitters: At Bats, Hits, Runs, Doubles, Triples, Home Runs, RBIs, Strikeouts, Walks and Stolen Bases.
Pitchers: Innings Pitched, Earned Runs, Strikeouts, Walks and Hits.

This is the set of hitter and pitcher performance categories that is common to all the constituent projection systems.

Projected Players

The following lists contain the players included in the AggPro analysis: 2007, 2008.

The player list for a given year is the list of players that are common to all the constituent projection systems for the year.
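The "common to all systems" rule amounts to a set intersection. Here is a minimal sketch of that idea in Python; the function name, the player IDs and the toy numbers are all made up for illustration and are not the actual AggPro data or code.

```python
def common_players(projections_by_system):
    """Return the sorted player IDs that appear in every projection system.

    projections_by_system maps a system name to a dict of
    {player_id: projected_value}. Both the structure and the IDs
    used below are hypothetical.
    """
    player_sets = [set(players) for players in projections_by_system.values()]
    if not player_sets:
        return []
    return sorted(set.intersection(*player_sets))


# Hypothetical toy data: system name -> {player id -> projected home runs}
toy = {
    "Bill James": {"player_a": 30, "player_b": 12, "player_c": 5},
    "CHONE": {"player_a": 28, "player_b": 10},
    "Marcel": {"player_a": 27, "player_b": 11, "player_d": 3},
    "PECOTA": {"player_a": 31, "player_b": 9, "player_c": 6},
    "ZiPS": {"player_a": 29, "player_b": 13},
}

shared = common_players(toy)
print(shared)
```

In this toy example only player_a and player_b survive, because the other players are missing from at least one system. The same approach would work for finding the performance categories common to all systems.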

AggPro Measures of Success

I have also determined the standard error of the 2007 and 2008 constituent projections relative to the actual Major League Baseball player performance data in each of the categories. The standard error, as a percentage of the actual population of each category, for each system, is listed in the table below.
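The exact formula is not spelled out here, so the following is only one plausible reading: take the root-mean-square error between projected and actual values across players, then express it as a percentage of the mean actual value for the category. The function name and the numbers are assumptions for illustration.

```python
import math


def percent_standard_error(projected, actual):
    """Root-mean-square error as a percentage of the mean actual value.

    This is an assumed interpretation of "standard error as a percentage
    of the actual population", not a confirmed AggPro formula.
    """
    if len(projected) != len(actual) or not actual:
        raise ValueError("projected and actual must be non-empty and equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(projected, actual)]
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    mean_actual = sum(actual) / len(actual)
    return 100.0 * rmse / mean_actual


# Hypothetical home run projections and actual totals for two players
example = percent_standard_error([30, 10], [26, 14])
print(example)
```

Scaling by the mean actual value makes categories with very different magnitudes, such as At Bats and Triples, comparable on one table.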

AggPro will be successful if it identifies a single weight to apply to each projection system such that the resulting AggPro projections have less standard error, for a given statistical category in a given year, than the best constituent system projection for that category and year. For example, each category in the AggPro projections for 2007 must have less standard error than the corresponding value for that category in the rightmost column in the top half of the table. Based on the early results from the simulated annealing optimization, it looks like an AggPro cocktail consisting of 2 parts Bill James, 1 part Marcel and 1 part PECOTA comes very close to meeting this goal.
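To make the success test concrete, here is a rough sketch for a single category in a single year. It turns the "2 parts, 1 part, 1 part" cocktail into normalized weights, blends the constituent projections, and checks the blend against the best constituent. The numbers are invented toy values, plain root-mean-square error stands in for the standard error, and the simulated annealing search itself is not shown.

```python
import math


def rmse(projected, actual):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(projected, actual)) / len(actual))


def blend(projections, parts):
    """Weighted average of each player's projections, weights given in 'parts'."""
    total = sum(parts.values())
    n_players = len(next(iter(projections.values())))
    return [
        sum(parts[name] * values[i] for name, values in projections.items()) / total
        for i in range(n_players)
    ]


# Hypothetical projections for two players in one category (not real data)
projections = {
    "Bill James": [24, 12],
    "CHONE": [23, 11],
    "Marcel": [16, 8],
    "PECOTA": [18, 9],
    "ZiPS": [17, 12],
}
actual = [20, 10]

# The cocktail from the post: 2 parts Bill James, 1 part Marcel, 1 part PECOTA
parts = {"Bill James": 2, "CHONE": 0, "Marcel": 1, "PECOTA": 1, "ZiPS": 0}

agg = blend(projections, parts)
agg_error = rmse(agg, actual)
best_constituent_error = min(rmse(values, actual) for values in projections.values())

# Success for this category: the blend must beat the best single system
success = agg_error < best_constituent_error
print(agg, agg_error, best_constituent_error, success)
```

The full criterion would repeat this check for every hitter and pitcher category in both 2007 and 2008, using the same set of weights throughout.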
