Thursday, November 25, 2010

Analysis of the hire of Orioles' new hitting coach: Jim Presley

I have a piece up on Baltimore Sports Report which takes a look at how Jim Presley fits into the development of Adam Jones and Matt Wieters. Feel free to check it out or ignore.

Tuesday, November 16, 2010

Analysis of the hire of Orioles' new pitching coach Mark Connor

I took a closer look at the Orioles' new pitching coach Mark Connor over at Baltimore Sports Report. Please mock, enjoy or ignore if you get a chance.

Thursday, October 7, 2010

Juan Samuel Piece on BSR

I wrote a retrospective piece on BSR about the 2010 Juan Samuel Era in Baltimore. It's available here. I tried to have fun with it but noticed at least one glaring typo afterwards. Regardless, I enjoyed taking a break from the more technical writing I have previously done. Per usual, feel free to mock or ignore.

Wednesday, September 15, 2010

Is this the new and improved Chris Tillman?

I have a piece up on Baltimore Sports Report about the Orioles' Chris Tillman. The article examines Tillman's two good starts since his Sept. 1 call-up to see if his recent success is due to improved skills or to good fortune over a small sample.

Wednesday, July 28, 2010

2010 Orioles Starting Rotation - Part 2

Part 2 of my examination of the Orioles' starting rotation in pitcher-friendly 2010 is up at baltimoresportsreport.com. If you get a chance, head on over and check it out.

Monday, July 26, 2010

The 2010 Orioles Starting Rotation

I've started writing over at BaltimoreSportsReport.com. My first article is a two-part series on how the Orioles' starting rotation has performed in the year of the pitcher. It's one of my first forays into non-academic writing, so bear with me. I promise it will get better.

Tuesday, April 20, 2010

Fantasy Valuation

Abstract— Currently when drafting fantasy players, managers are reliant upon the rankings as they exist in their platform (ex: Yahoo, ESPN, CBSSports), regardless of the league’s actual format or statistic categories. Managers must then use a variety of other information to pre-rank players for a customized league. As an additional analysis step to “AggPro: The Aggregate Projection System” we set out to find a way to rank hitters and pitchers across any league format, with a strict mathematical basis. The following article proposes a valuation metric applicable to any Rotisserie style league.

I. INTRODUCTION
Pre-ranking fantasy players for an annual draft is an overwhelming task. Managers can consume a wealth of information, such as ESPN’s annual draft kit[1], Yahoo’s position primers[2], CBSSports’ fantasy depth charts[3], MockDraftCentral’s Average Draft Pick (ADP) data[4], Athlon Sports’ annual baseball preview magazine[5] and countless other sources of fantasy material. These resources provide analysis and rankings of players based on projected performance. In addition, managers also consider the projection systems for the statistics themselves. As discussed in “AggPro: The Aggregate Projection System”[6], there are several projection systems available, which leaves managers with the task of choosing which system to follow. In our development of the AggPro category weights, we expanded our research, creating a standard way to turn these projections into a single value for each player, which can then be used to pre-rank players for a draft.
There are three core problems with the existing valuation systems: 1) Ranking systems are inconsistent across platforms. 2) Projections themselves need to be balanced against valuation information. 3) Subjective projections (source data systems discussed in AggPro) should be objectively weighted for valuation. Using AggPro to produce more accurate 2010 projections, it is possible to evaluate all players objectively for any league format. Our solution as discussed in this paper is also threefold: 1) Apply a valuation metric, based on standard deviation, to determine hitter and pitcher value in each projection category. 2) Determine the “overall” value by using the customized league scoring settings. 3) Apply ADP information to suggest draft value coupled with statistical value.
With this valuation formula, managers can use any projections in tandem with the specific categories for the league, to objectively define how valuable players are for that league format. These values when ordered descending produce a pre-ranked list of players, but not necessarily in the best order to draft them. Leveraging ADP data in conjunction with these values will best prepare managers for draft day. This analysis is not new; many have contributed content across the web with articles discussing this process. The remainder of this paper focuses on how to 1) gather the information 2) calculate the valuation metric and 3) organize these values in conjunction with ADP data to have a more precise list of pre-ranked players for their specific league.

II. COLLECTING PROJECTION DATA
This research uses the AggPro projections, but any source can be used for this process, provided that it projects enough players for the draft and stats for each of the categories. Some systems neglect to project ERA, but do provide ERs and IPs, so ERA can be easily calculated. For more unique leagues that use categories such as fielding errors or grounded into double plays, finding a projection system that has this data may be more difficult. The analysis uses the following 8x8 rotisserie categories:
Hitters: R, HR, RBI, SB, K, AVG, OBP, SLG
Pitchers: IP, W, L, Sv, K, ERA, WHIP, K/BB
Because this 8x8 league uses additional categories, the generic rankings that Yahoo! or ESPN provide do not provide a full picture of player value.
After finding a usable projection data set, store the values in a spreadsheet or database. The next section will detail the additional mathematical steps to creating the valuation metric. The valuation metric will change depending on the field of players considered, so it is best to create the valuation for all players regardless of league size. For this analysis, the minimum benchmarks were hitters projecting to more than 200 at-bats and pitchers with at least 100 innings pitched or at least 4 saves recorded. This provided ample data for most league formats, while also focusing on players who are fantasy relevant.
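The data preparation described in this section can be sketched in a few lines of Python. This is only an illustration: the player names and numbers are hypothetical, the function names are ours, and the ERA helper uses the standard definition (earned runs per nine innings) for sources that list only ER and IP.

```python
def era(earned_runs, innings_pitched):
    """ERA is earned runs allowed per nine innings pitched."""
    return 9.0 * earned_runs / innings_pitched


def is_relevant_hitter(at_bats):
    """Benchmark from the text: more than 200 projected at-bats."""
    return at_bats > 200


def is_relevant_pitcher(innings_pitched, saves):
    """Benchmark from the text: at least 100 IP or at least 4 saves."""
    return innings_pitched >= 100 or saves >= 4


# Hypothetical pitcher projections (not real data).
pitchers = [
    {"name": "Starter A", "ip": 210.0, "er": 70, "sv": 0},
    {"name": "Closer B", "ip": 65.0, "er": 20, "sv": 35},
    {"name": "Mop-up C", "ip": 40.0, "er": 25, "sv": 1},
]

# Keep only fantasy-relevant pitchers and attach a derived ERA.
pool = [
    dict(p, era=round(era(p["er"], p["ip"]), 2))
    for p in pitchers
    if is_relevant_pitcher(p["ip"], p["sv"])
]
```

Filtering before computing any averages matters because, as noted above, the valuation metric depends on the field of players considered.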

III. DETERMINING EACH PLAYER/CATEGORY VALUE
For each category, create an output column that calculates the following:
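Value = (Player’s Projected Stat - Average(All Players’ Projections)) / Standard Deviation(All Players’ Projections)

In Excel: =(cell - AVERAGE($col$1:$col$n)) / STDEV($col$1:$col$n), where cell is the player’s projected stat, col is that stat’s column letter, and n is the last player row.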
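The same calculation can be sketched in Python. The function name and the lower_is_better flag are our own additions for illustration; the flag is an assumption about how categories such as hitter strikeouts would be handled. The sample standard deviation is used because that is what Excel's STDEV computes. The home run figures are the five ESPN projections quoted in this article, so the resulting values will not match the article's, which were computed over the full player pool.

```python
from statistics import mean, stdev


def category_values(projections, lower_is_better=False):
    """Return {player: value} for one stat category.

    value = (projected stat - average) / standard deviation.
    lower_is_better flips the sign (an assumption for stats like K).
    """
    stats = list(projections.values())
    avg = mean(stats)
    sd = stdev(stats)  # sample standard deviation, like Excel's STDEV
    sign = -1.0 if lower_is_better else 1.0
    return {player: sign * (stat - avg) / sd
            for player, stat in projections.items()}


# Projected home runs for five hitters (a tiny pool, for illustration only).
hr = {
    "Albert Pujols": 43,
    "Hanley Ramirez": 29,
    "Alex Rodriguez": 39,
    "Ryan Braun": 37,
    "Chase Utley": 30,
}
hr_values = category_values(hr)
```

Repeating this for every category in the league gives one value column per category.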

The result measures how far a player’s stat sits above or below the average, in units of standard deviation. This positive or negative distance from zero is the player’s value for that statistic: the larger the value, the greater the player’s impact on that rotisserie category. For categories where a lower number is better, such as hitter strikeouts, the sign is reversed, as the K values below show. Next, create this stat value for each player and stat category in the league. Summing these creates a single value that translates into a player’s value in the league. ESPN’s top five 2010 ranked players are shown below[7]:
Projections:
Player           R    HR   RBI  K    SB   AVG    OBP    SLG
Albert Pujols    114  43   122  62   11   0.337  0.452  0.647
Hanley Ramirez   113  29   91   106  34   0.324  0.400  0.546
Alex Rodriguez   110  39   125  124  21   0.300  0.402  0.563
Ryan Braun       110  37   116  125  17   0.315  0.380  0.576
Chase Utley      114  30   97   111  18   0.290  0.396  0.520

Values:
Player           R Val  HR Val  RBI Val  K Val  SB Val  AVG Val  OBP Val  SLG Val
Albert Pujols    1.95   2.20    1.98     1.32   -0.17   2.82     3.75     3.37
Hanley Ramirez   1.88   0.71    0.41     -0.12  1.61    2.09     1.57     1.31
Alex Rodriguez   1.67   1.78    2.13     -0.70  0.60    0.75     1.66     1.66
Ryan Braun       1.67   1.56    1.67     -0.74  0.29    1.59     0.74     1.92
Chase Utley      1.95   0.82    0.71     -0.28  0.37    0.18     1.41     0.78

Totals:
Player           8x8 Value  5x5 Value
Albert Pujols    17.22      8.78
Hanley Ramirez   9.47       6.70
Alex Rodriguez   9.54       6.93
Ryan Braun       8.71       6.79
Chase Utley      5.94       4.03

In looking at the resulting data set, for the 8x8 (and a standard 5x5) league, Alex Rodriguez is slightly better than Hanley Ramirez, but four other players actually come in higher than Utley (data not shown). The most notable finding is that in the 8x8, Albert Pujols is almost twice as valuable as the next best player.
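As a quick check of how the totals are built, the sketch below sums the per-category values from the table above for two of the players. The 5x5 total uses the standard hitter categories (R, HR, RBI, SB, AVG); the variable and function names are ours. Because the published per-category values are rounded, sums for some of the other players can differ from the published totals by 0.01.

```python
CATS_8X8 = ["R", "HR", "RBI", "K", "SB", "AVG", "OBP", "SLG"]
CATS_5X5 = ["R", "HR", "RBI", "SB", "AVG"]  # standard 5x5 hitter categories

# Per-category values copied from the Values table.
values = {
    "Albert Pujols": {"R": 1.95, "HR": 2.20, "RBI": 1.98, "K": 1.32,
                      "SB": -0.17, "AVG": 2.82, "OBP": 3.75, "SLG": 3.37},
    "Chase Utley": {"R": 1.95, "HR": 0.82, "RBI": 0.71, "K": -0.28,
                    "SB": 0.37, "AVG": 0.18, "OBP": 1.41, "SLG": 0.78},
}


def total_value(player_values, categories):
    """Sum a player's category values for the league's categories."""
    return round(sum(player_values[c] for c in categories), 2)


totals = {
    name: (total_value(v, CATS_8X8), total_value(v, CATS_5X5))
    for name, v in values.items()
}
```

Swapping in a different category list is all it takes to re-value the same players for another league format.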

IV. RANKING THE VALUE
When evaluating this valuation process, we found that the value of the projections varies in accuracy depending on 1) the size of the dataset, 2) the degree of range between projections for the same stat, and 3) a source system’s favoring of a certain statistic. While these factors affected comparisons of the calculated value, ranking players 1 to n based on this value proved to be more accurate than any system in any year. The resulting new column of data represents the order of value, which can now be used in conjunction with the ADP data to provide the final pre-ranked list.
We also chose to add a “rank round” column which is simply the round in which that player should go according to the rank of the value. The formula to calculate a round based on some rank:

((Rank - (Rank MOD x)) / x) + 1
In Excel: =(rank - MOD(rank, x)) / x + 1, where rank is the cell holding the player’s rank.

In the above, x represents the number of teams drafting.
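A Python transcription of this formula is sketched below. The function name is ours, and floor division is used because it equals (Rank - (Rank MOD x)) / x for positive ranks. The 10-team league in the example is an assumption. Note that, as the formula is written, a rank that is an exact multiple of x (for example, rank 10 with 10 teams) lands in the following round.

```python
def rank_round(rank, teams):
    """Round implied by a rank, per ((rank - (rank MOD teams)) / teams) + 1.

    Works for fractional inputs such as an averaged ADP.
    """
    return int(rank // teams) + 1


# Example with an assumed 10-team league.
example_rounds = {rank: rank_round(rank, 10) for rank in (9, 18, 26, 81)}
adp_round = rank_round(50.2, 10)
```

The same function can be applied to ADP figures, which is how the ADP round is obtained in the next section.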

V. ADDING IN ADP
Like projection data, ADP data is readily available online from a variety of sources. For our research we combined the 2010 ADP from Yahoo[8], ESPN[9], and KFFL[10] to obtain an average ADP. This ADP data is most useful when compared against the ranked value. To compare the ranked rounds and the ADP rounds, calculate the ADP round using the same formula as in section IV. With the addition of this information, the full data set output includes seven columns:
Player, Position, Value, Rank, Rank Round, ADP, ADP Round

With this dataset, it is then easy to highlight players whose ADP round and projected rank round differ by more than 1. Those with a positive difference (ADP round minus rank round) are good values for drafting (they have a value higher than that of the ADP). Those with a negative difference are not as valuable (they are being drafted sooner in ADP than their value projects).
With this indicator in place, it is thus easier to rank players in an order based on 1) his value to the league 2) how soon he will commonly be drafted in a league. For our 2010 data, it’s obvious that Albert Pujols and Ryan Braun will go in the first round. However, the interesting comparisons come in for those that meet the criteria above:
Player     Position  Value  Rank  Rank Round  ADP   ADP Round
Haren      SP        7.85   9     1           50.2  6
Sandoval   3B/1B     5.415  18    2           33    4
Kemp       OF        4.331  26    3           8     1
Roberts    2B        0.911  81    9           43    5

This table shows that Dan Haren and Pablo Sandoval are good draft values, especially at their ADP levels, but Matt Kemp and Brian Roberts are commonly taken before their value dictates. Obviously, this is due to the custom league format. In a standard 5x5, this value data will be different (stolen bases are not as valuable in an 8x8). However, this difference is the purpose of this research: to break down the barriers of generic value rankings and create a more quantitatively sound and accurate valuation applicable to any league.
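The comparison of rank round and ADP round can be sketched as follows, using the four rows from the table above. The function name, the labels, and the threshold parameter are ours; the threshold of 1 follows the "difference greater than 1" rule described in this section.

```python
# (player, position, value, rank, rank round, ADP, ADP round)
players = [
    ("Haren", "SP", 7.85, 9, 1, 50.2, 6),
    ("Sandoval", "3B/1B", 5.415, 18, 2, 33, 4),
    ("Kemp", "OF", 4.331, 26, 3, 8, 1),
    ("Roberts", "2B", 0.911, 81, 9, 43, 5),
]


def draft_signal(rank_round, adp_round, threshold=1):
    """Label a player by the gap between ADP round and rank round."""
    gap = adp_round - rank_round
    if gap > threshold:
        return "good value"      # drafted later than value suggests
    if gap < -threshold:
        return "drafted early"   # drafted sooner than value suggests
    return "fair"


signals = {
    name: draft_signal(rank_rnd, adp_rnd)
    for name, _pos, _value, _rank, rank_rnd, _adp, adp_rnd in players
}
```

With this data, Haren and Sandoval come out as good values, while Kemp and Roberts are flagged as drafted early, matching the reading of the table above.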

VI. LIMITATIONS
There are a few important limitations of this process.
This process does not consider health concerns, supporting teammates, or potential upside (“breakout”) players. Such measures are usually applied to the projection data source.
This process does not consider position scarcity. Other systems, in which analysts rank players manually, likely take this into account. Admittedly, this is likely a factor as to why Utley is ranked 4th in section III but projects slightly lower.
This process does not necessarily dictate a manager’s draft. The use of this indicator still requires subjective choices. For example, managers should not draft Dan Haren 9th overall just because he projects that high in terms of value. However, using this process, a manager can make an educated decision to draft him in the fifth round, just before most draft him in the sixth. Thus, that manager gets first round value in the fifth round. Likewise, waiting until the ninth round to select Brian Roberts will result in him not being available. Managers may choose to draft him sooner because of position scarcity, but this research provides an indicator of what his true value is in this league format, rather than assuming it is high because common platforms say so. Even though the process thus far has focused on custom formats, it can be repeated for a standard 5x5 league to provide a sanity check against the default rankings of a drafting platform.

VII. CONCLUSION
By starting with a set of projection data, managers can evaluate players based on the statistics relevant to any custom league using a series of valuation metrics. Ranking players according to a common value metric provides useful insight for fantasy drafts. This ranking is further enhanced by including a check against ADP data to highlight players who project above or below their ADP.

VIII. REFERENCES
[1] http://games.espn.go.com/frontpage/flbdraftkit.
[2] http://sports.yahoo.com/fantasy/mlb/news?slug=be-posprimer08-rp
[3] http://cbssports.com/mlb/depth-charts
[4] http://www.mockdraftcentral.com/report_adp.jsp?period=0&sport=1&type=218&color=1
[5] http://www.athlonsports.com/store/index.php?cpath=33_274
[6] http://aggpro.blogspot.com
[7] At time of authoring, AggPro data for 2010 was not yet available, so the available ESPN projections are shown. Valuation calculations may differ due to the size of the sample set.
[8] http://baseball.fantasysports.yahoo.com/b1//draftanalysis
[9] http://games.espn.go.com/flb/livedraftresults
[10] http://www.kffl.com/static/programs/baseball/compilations/mlb_adp_report.php

Monday, March 8, 2010

2010 AggPro Weights

For those of you playing at home, your AggPro weights for 2010 are as follows:

Bill James = .4266
CHONE = 0.0
Marcel = .2851
PECOTA = .2833
ZiPS = 0.0

To review, in 2009 the weights were:

Bill James = .37
CHONE = 0.0
Marcel = .35
PECOTA = .28
ZiPS = 0.0

And in 2008 the weights were:

Bill James = .56
CHONE = 0.0
Marcel = .15
PECOTA = .29
ZiPS = 0.0

We adapted a polynomial-time matrix least-squares minimization algorithm to solve the AggPro problem this year instead of simply performing a brute-force search across the search space. The new algorithm has allowed for more precise weight sets (4 digits vs. 2 digits) and a two-orders-of-magnitude improvement in speed (seconds vs. hours).
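For readers who want to apply the weights listed above, here is a minimal sketch. It assumes the aggregate projection is a weighted sum of the individual systems' projections, with the published weights used as-is. The function name is ours and the home run projections are hypothetical, not real data.

```python
# 2010 AggPro weights as listed in this post.
WEIGHTS_2010 = {
    "Bill James": 0.4266,
    "CHONE": 0.0,
    "Marcel": 0.2851,
    "PECOTA": 0.2833,
    "ZiPS": 0.0,
}


def aggregate(projections, weights):
    """Weighted sum of each system's projection for one player and stat."""
    return sum(weights[system] * projections[system] for system in weights)


# Hypothetical home run projections for a single player.
hr_projections = {
    "Bill James": 34,
    "CHONE": 30,
    "Marcel": 28,
    "PECOTA": 31,
    "ZiPS": 29,
}
agg_hr = aggregate(hr_projections, WEIGHTS_2010)
```

Systems with a weight of 0.0 simply drop out of the aggregate.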

We are experimenting with finer-grained weight sets this year. For example, instead of only allowing one weight for all the categories, what if we allowed a separate weight set for each category? For each player? For each category for each player? Our instinct is that the most accurate results will come from allowing each system to have a separate weight set for each category for the upcoming year, but at this point we don't have any empirical evidence to support it. Results will be posted here as they come in.