I’m always a little bit behind the crowd when it comes to the early pair of annual publications that brighten baseball’s offseason, because I wait and ask for them as Christmas gifts. This year, I was even later than that, because I didn’t actually ask for the books as gifts, so I didn’t get them. I went out and bought The Bill James Handbook 2015 on Dec. 29, though, and I’ve already rifled through it once, so I now offer my review. (I’ll review The Hardball Times 2015 Baseball Annual in short order, once Amazon gets my copy out to me. That should happen this week.)
The Bill James Handbook is colossally misunderstood, and so, underrated. It’s a data dump, that’s true, and because it’s a data dump, people tend to sneer at it. That’s misguided. The way most people use the book is also wrong; this is not a reference book. It should be read as completely and as thoroughly as possible, preferably from beginning to end, and it should be read with a pen in hand.
That’s because, while the Handbook is loaded with data and has precious little in the way of analytical writing or prose, it’s an expository work. The book kicks off with its three new sections (they try to add something every year, and each addition is an attempt to make the book a bit more representative of what, exactly, we know about baseball), each authored by James himself. (One thing you should know is that much of the book, to the extent that it’s written by anyone at all, is written by people other than James. It’s a cooperative work with Baseball Info Solutions.) This year, those sections cover Starting Pitcher Rankings, a James concoction that has gained traction on his website over the last year; pitcher velocity trends by season; and Defensive Runs Saved by season.
The pitcher rankings are fascinating, and a fun way to parse the thin differences between so many pitchers and pitching staffs. The system, which James details within the essay, blends recency and career accomplishment to really craft an estimate of who the best pitchers in the game are at any given moment. It’s imperfect and imprecise, of course, but I find it effective, as a blunt instrument.
Travis Wood saw his ERA balloon nearly two runs in 2014, relative to his breakout 2013 campaign:
| Year | Age | Tm | W | L | ERA | GS | IP | H | R | ER | HR | BB | SO | BF | ERA+ | FIP | WHIP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2013 ★ | 26 | CHC | 9 | 12 | 3.11 | 32 | 200.0 | 163 | 73 | 69 | 18 | 66 | 144 | 821 | 124 | 3.89 | 1.145 |
| 2014 | 27 | CHC | 8 | 13 | 5.03 | 31 | 173.2 | 190 | 110 | 97 | 20 | 76 | 146 | 781 | 76 | 4.38 | 1.532 |
How much should the off year color one’s evaluation of Wood (who has always been tough to peg, given his size and skill set)? There are many answers, but the rankings offer one: Wood began the season 45th, and finished it 85th, implying that he fell from a comfortable place among the league’s better second-tier arms to a soft number-three profile.
A few critiques of the method:
- It’s entirely predicated on results, using James’s old invention, the Game Score. I love Game Scores, for their granularity, their simplicity and the vital nature of that which they capture, but they are, themselves, a blunt instrument. Adjustments are made, but only to even out contextual imbalances and punish players for missing time with injury—not for obvious cases where results don’t match performance, or where some non-statistical data could and should change the player’s standing.
- Pitchers begin from an arbitrary base of 300 points, and are credited 30 percent of each Game Score. However, they’re also debited three percent of their previous score when they make each start. This is brilliant, in my view; it seems to balance new information against existing track records almost perfectly. A pitcher who had never made a start before would gain about six points from a league-average outing. An established back-end guy would gain only three. Near-aces gain nothing. True aces lose points. I’m just eyeballing this, though, and as with any system built the way James’s so often are (via credits and debits against an artificial baseline, arrived at by trial and error), its mathematical legitimacy is hard to verify, and therefore hard to trust. A quick sketch of the update rule follows this list.
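To make that concrete, here is a minimal sketch of the update rule in code form. This is my own reconstruction from the essay, not James’s actual implementation, and it leaves out the contextual and injury adjustments he applies:

```python
# A minimal sketch of the ranking update as described above: every pitcher
# starts from an arbitrary base of 300 points, and each start credits 30
# percent of the Game Score while debiting 3 percent of the current score.
# (The contextual and injury adjustments mentioned above are omitted.)

def update_ranking_score(current_score: float, game_score: float) -> float:
    """Apply one start to a pitcher's ranking score."""
    return current_score - 0.03 * current_score + 0.30 * game_score

# A newcomer at 300 who posts a league-average Game Score of about 50 gains
# roughly six points (credit 15, debit 9), which matches the eyeballing above.
score = 300.0
for _ in range(100):
    score = update_ranking_score(score, 50.0)

# Scores converge toward ten times a pitcher's average Game Score, since the
# fixed point satisfies 0.03 * score == 0.30 * game_score.
print(f"{score:.1f}")  # roughly 490 after 100 simulated starts, approaching 500
```

That fixed point is what makes the “true aces lose points” observation work: once a pitcher’s score climbs above ten times his typical Game Score, the three-percent debit takes back more than each new start adds.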
The comprehensive listing of the rankings is meant to give the reader enough data to get comfortable with a framework they haven’t seen before. That’s akin to the aim of the pitcher velocity section, which lists pitchers by birth date and gives their average fastball velocity for every season since 2007. It’s not information of which the average reader can easily make sense; that’s the point. The idea is to read through the listings, finding patterns, similar players, divergences and outliers. It’s about building, gradually, the mental frames that allow one to contextualize a velocity drop. Of course, it also works in conjunction with the previous section (and with others) to permit a step-by-step construction of the big picture when it comes to a given pitcher. Wood, for instance, sat at 89 miles per hour with his fastball in his first two MLB seasons. He then averaged 88 MPH in 2012 and in 2013. In 2014, the number slid down to 87 for the first time. Maybe that step down the ladder is unrelated to his concomitant performance drop; the book does not draw a conclusion. It only lays out the data, to encourage this sort of cross-referencing and follow-up.
The data itself is explained a bit too thinly in this section. Many pitchers use something other than a standard four-seamer as their primary fastball, and some change which pitch serves as their primary weapon from one year to the next; it’s not clear whether the listed figures are only for four-seam fastballs, or for whichever heater a pitcher threw most often. The section is also a trap if used improperly, inviting an uneducated reader to handle the numbers with ham fists. The values are entered as integers, which is admirable in its embrace of the real error level around such things, but which might disguise mere blips as dips in velocity, or hide a real change that stays within the rounding margin. Still, it makes for interesting data, and it takes another small step toward accomplishing the book’s miniature mission statement (found in the introduction, presumably penned by Baseball Info Solutions head John Dewan): “we absorb far more about the season after the fact [through the vetting and verification processes involved in culling the data] than we did during the preceding six months.”
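To show why the integer formatting worries me, here is a toy illustration, with velocities I made up purely for demonstration, of how rounding can both manufacture an apparent lost tick and hide a real one:

```python
# Toy illustration (invented velocities) of the rounding concern above:
# whole-mph figures can make a trivial change look like a lost tick,
# or hide a real one entirely.

examples = [
    ("trivial change, shows as a dip", 89.6, 89.4),   # real drop: 0.2 mph
    ("real change, hidden by rounding", 90.4, 89.6),  # real drop: 0.8 mph
]

for label, last_year, this_year in examples:
    shown = f"{round(last_year)} -> {round(this_year)}"
    print(f"{label}: book shows {shown}, actual drop {last_year - this_year:.1f} mph")
```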
The Defensive Runs Saved summary is still James’s writing, but he begins to really hand off the baton by discussing Dewan’s (and BIS’s, generally) work to create the new defensive stat. James writes about the way we encounter these numbers, and draws a parallel to our understanding of statistics like batting average, home runs, strikeouts and RBI. He points out a couple of crucial differences between the two.
- Unlike with homers, RBI, etc., we don’t have the level of familiarity and comfort with DRS to know what kind of year-to-year swing is normal. We don’t know, by looking, the difference between a career year and a statistical anomaly, or the difference between a real skill change and expected, random variance.
- (This one is implicit in his writing. I’m drawing it out and making the point myself.) Because people thus far mistrust DRS (and its cousin, UZR), they too often write off strange fluctuations or sharp changes as statistical volatility, a sign that the numbers aren’t capturing the real skills on the field. They should probably be taking the numbers more seriously and asking, instead, whether a given change was the result of random chance, a permanent change in true talent level, or a temporary performance change that, while real, won’t hold up in the long term.
- Teams themselves are uncomfortable with the fluctuations in DRS, and hesitate to make decisions based thereupon, because they don’t have a sense of what a normal career arc for a defender looks like, or how to explain changes in the numbers from one year to the next.
Again, the data itself is a list: a player’s name, their positional home (or homes, in some cases) over the past several seasons, and a row of integers showing their DRS by year. I won’t spoil it, but going through the list with a fine-toothed comb, patterns do emerge. A familiarity with it all settles over the reader, and it becomes possible to feel out DRS the same way one might feel out the same chart with OBPs listed instead. Some players decline precipitously, in a way that feels like a very real trip over the cliff. Others hold extraordinarily steady at a certain level, and the number who do so suggests there is some repeatable skill that can be expressed in the field, and that DRS captures it well. Many, though, see a non-linear progression: high one year, low another, middling for two years in between. Maybe they make a position change five years into their career. That’s the most typical thing, just as the most typical readout for a player’s OBP would be fluctuation. They might be making real progress in true talent over a given period, but get unlucky one year, or struggle through an adjustment to their approach, and therefore show a step back in the season after an apparent breakthrough.
Let me tie this up nicely, because it’s my most important point, and the single thing the Handbook has done best this year. Consider Andrew McCutchen. In the six seasons shown on this readout (which happen to constitute his whole career), McCutchen’s DRS figures are: -10, -8, 5, -5, 7, -11. One reading of this might be that DRS is simply a flawed system that doesn’t measure defense adequately. I think the data refute that reading, and that we should throw it out.
Another reading of it could be that McCutchen was bad when he first entered the league, then figured something out in 2011, faltered in 2012, got his groove back in 2013 and dropped the ball once more in 2014. That’s possible, though it sure feels unlikely, right? That would be a lot of twists and turns in true talent level for a player who’s otherwise regarded as consistent and hard-working, and who has neither shown obvious signs of age-related decline nor been seriously hurt at any point in the period under study.
The third way to read it is exactly the way a clear-eyed observer might read McCutchen’s seasonal OPS figures (.836, .814, .820, .953, .911, .952), which is: though an underlying skill change has gradually taken place, the numbers have moved somewhat unpredictably from one year to the next, for three reasons:
- The sample size of one season is insufficient to fully capture true talent, and to wash out noise in the data.
- The statistic at hand is a holistic expression of many smaller skills, some of which may be more purely skill-based and accurately measured, but which, taken as a group, are hard to precisely evaluate.
- Seasons are arbitrary endpoints. A single data point for a whole season might not fairly measure the performance level demonstrated over the majority of the season, especially if the sample size is relatively small.
Remember that defense is a much smaller part of a position player’s game than offense, that Andrew McCutchen sees maybe 400 balls he might make a play on in center field in a season, and the picture should crystallize. I don’t think DRS is failing to measure defensive performance. I think defensive performance is just more susceptible to variance than we would like to believe, whether it be because any marginal play made or not made has an outsized impact on the eventual numbers, or because the skill set itself is every bit as fragile as, say, the skill of hitting. The Defensive Runs Saved summary is a really good opportunity to plumb the depths of our ignorance about that element of the game.
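To put a very rough number on that fragility, here is a crude simulation. Every parameter in it is an invented assumption, chosen only to illustrate the shape of the problem, not to mimic how Baseball Info Solutions actually computes DRS:

```python
# Back-of-the-envelope simulation (all parameters are invented assumptions):
# of the ~400 balls a center fielder might play on, suppose 60 are genuinely
# difficult chances worth about 0.8 runs apiece, and he converts them at a
# fixed 50 percent true-talent rate. How much do seasons vary on luck alone?
import random

TOUGH_CHANCES = 60   # hypothetical difficult plays per season
RUN_VALUE = 0.8      # hypothetical runs separating a catch from a hit
TRUE_TALENT = 0.50   # fixed conversion rate on those plays

def simulate_season() -> float:
    """Runs saved relative to expectation over one season of tough chances."""
    made = sum(random.random() < TRUE_TALENT for _ in range(TOUGH_CHANCES))
    return (made - TOUGH_CHANCES * TRUE_TALENT) * RUN_VALUE

print([f"{simulate_season():+.0f}" for _ in range(6)])
# Typically prints a six-season line that swings by several runs in either
# direction, despite zero change in the underlying skill.
```

None of those numbers is real, but the shape of the result is the point: season-sized samples of defensive chances produce multi-run swings before any change in skill enters the picture.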
With that, I’ve guided you through the first 23 of the book’s 585 pages. Don’t worry, the rest won’t get such rough treatment. I’ve made my essential points. There’s a wealth of good data in the remaining 560-plus pages, from full career stat capsules on every 2014 MLB player to granular baserunning credits and debits. There are the Fielding Bible Awards, three-year park factors I trust more than any others, managerial breakdowns, hitter and pitcher analysis sections that suggest the reader seek certain patterns by selecting an interesting cross-section of stats, and on and on. Again, the book encourages active readership, with the idea that an intensive consumer can come away understanding not only what a given player has done in every facet of the game, but how they’ve done it. Reading each section thoroughly helps one get a sense of where there might be a lot of hidden value on the field, and where there is definitely very little. It creates a familiarity with the data that becomes genuine understanding. A smart user of the Handbook will be able to return to it for reference, but will use a given statistic only in a context that makes it truly meaningful.
I’ve never read the old Baseball Abstracts, the things that first made James famous. I would love to, but I have neither the time nor the money to do it. From what I’ve heard, they were data dumps themselves: long lists of numbers and dense tables, with James’s words terse and (especially early on) quite sparse. That didn’t make the books any less useful, and even though we now have Baseball-Reference and FanGraphs and Baseball Prospectus and Brooks Baseball and Baseball Heat Maps and Baseball Savant and StatCorner to consult when we need data, the Handbook is invaluable. There are no rabbit holes down which to dive. The data at hand, if one takes the time to consider it all, allows one to gain a firm grasp on what every number means.
There’s a grid used to describe the aims of scientific research, sorting it into quadrants based on the mentality behind the work. Research concerned purely with utility, applying scientific and technological knowledge to the goal of producing something very practical, falls into what has been dubbed Edison’s quadrant. There is no thought given, in that quadrant, to the fundamental inner workings of the science in question.
Another quadrant, named for Niels Bohr, is the reverse. Unconcerned with real-world utility, this research seeks only to understand something, how it works, what it is, down to its core. The final quadrant, though, named for Louis Pasteur, blends the immediate interest of society with a sincere desire to understand a subject deeply. That quadrant is where the Handbook lives, and that’s what makes it a pleasurable experience. This book isn’t about winning a fantasy league, but nor is it a trivia almanac. It delivers information, not just data. So call it an information dump, if you must. Just make sure that, if you read it, you read it right, and no tome published this winter will be more informative.