Well, as promised I wrote about FanGraphs.com and the Win Expectancy framework introduced (to me at least) in "The Book" in my initial Stat Guy piece. If you don't have a hard copy of the KC Star handy,
A public thanks goes out to David Appelman, the brains behind FanGraphs, for taking the time to do an e-mail interview over the weekend. I've arranged our exchanges into a Q&A, posted below for your perusal.
For more on Win Expectancy or Wins Probability Added, buy "The Book" or visit
I would love it if FanGraphs could incorporate WPA data using zone ratings for contextualized defensive stats that, as far as I know, have never been compiled. That would be awesome.
Q&A with David Appelman of FanGraphs.com
BD: What is your "day job", so to speak?
DA: I'm working on FanGraphs full time right now. I had worked for AOL for about 4 years in their operations analysis department and ended up leaving about 8 months ago to devote my full attention to the site.
BD: How did Fangraphs come about and when? What inspired you to want to create it? How much time do you spend on it on, say, a weekly basis?
DA: I was (and still am) really into fantasy baseball and as my fantasy baseball decision making became more stats oriented, I felt that graphing players' stats would be more helpful than just seeing stats in a tabular format. I find trends are much easier to spot this way on a season by season basis, and the rolling daily graphs really point out some of the finer details about what players are doing and if they've actually made a breakthrough. Trying to figure that out from a regular game log can be pretty difficult.
The choice of stats to use was heavily influenced by both Bill James' work and Ron Shandler's baseball forecaster.
Originally I spent nights and weekends working on it. It took me about 3 months to put the site together part time, but working on it full time has allowed me to add things like the Live Win Probability. I don't think I would have been able to do that with another full time job.
BD: I know Tangotiger has at least had input into some of the things on the site (and, of course, WPA is from "The Book"). Is Fangraphs pretty much your baby or are there others that help out?
DA: Tangotiger has been an incredible influence on the site's development. I originally started doing the after-the-game Win Probability graphs when I had seen Dave Studeman's work on the Hardball Times. Tangotiger had just co-written "The Book", which has a lot of analysis based on Win Expectancy (or Win Probability, same thing), and he suggested I incorporate some of the work he's already done.
As far as the actual FanGraphs staff goes, it's just me. I've done all the development and everything on the business side of things.
BD: Also, can you throw some site stats my way? Such as typical number of visitors, how many graphs get created per day, etc.? And some demographics: Who is Fangraphs aimed at?
DA: It's been around about a year and a half now, and it's currently pulling around 40,000 unique visitors a month. I don't really collect information on my visitors, so I'm not really sure of the exact demographics. FanGraphs was originally aimed at the Fantasy Baseball player, but I find it resonates more with just baseball fans in general who don't necessarily play fantasy baseball.
BD: You've got a couple of team-themed spinoffs linked to Fangraphs this season. Are you hoping to build on this trend?
DA: The two team blogs on FanGraphs which
BD: There are a lot of advanced metrics in use on Fangraphs but the centerpiece metric is WPA, as it is featured in the live game charting and in your box scores, along with leverage index. Can you talk a little about WPA - what it measures and why you think it is a valuable metric, i.e., what is the best use for it?
DA: Win Probability Added (WPA) is the difference between the Win Probability at the beginning of the play and the end of the play. That difference is then credited/debited to the offensive player and the pitcher. I say the offensive player because it's not always the batter that gets credit. On stolen bases and other base advancing plays, the base runner will be credited instead. The pitcher is always credited on defense.
What's great about WPA is that after the game you can quantify a player's contribution to the win or loss. And over the course of the season you can quantify how much a particular player contributed to his team's win and loss totals. WPA has most often been used in MVP discussions because you can quantify a player's contribution to the team. What makes this different from other stats is that WPA is not context neutral. Most stats will give a home run the same value, but in real life, some home runs are more valuable than others. WPA captures which home runs are valuable and which aren't.
WPA can also be used to calculate Clutch statistics, and while FanGraphs is currently calculating Clutch as WPA - WPA/LI (WPA/LI is WPA divided by Leverage Index on a per-play basis, not at the season aggregate level), there's still heavy debate about what exactly is the meaning of clutch and the best way to calculate it.
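(An aside from me, not from David: to make the arithmetic above concrete, here is a rough Python sketch of WPA per play and of Clutch as WPA - WPA/LI. The Play class, the function names, and the sample win expectancies and Leverage Index values are all made up for illustration. This is not FanGraphs' actual code or data.)

```python
from dataclasses import dataclass


@dataclass
class Play:
    # All three fields are hypothetical inputs for illustration.
    we_before: float       # batting team's win expectancy before the play
    we_after: float        # batting team's win expectancy after the play
    leverage_index: float  # Leverage Index for the situation


def wpa(play):
    """Credit for the offensive player; the pitcher gets the opposite sign."""
    return play.we_after - play.we_before


def clutch(plays):
    """Clutch as described in the interview: WPA - WPA/LI.

    WPA/LI divides each play's WPA by that play's Leverage Index
    before summing, rather than dividing the season totals.
    """
    total_wpa = sum(wpa(p) for p in plays)
    total_wpa_li = sum(wpa(p) / p.leverage_index for p in plays)
    return total_wpa - total_wpa_li


# Two invented plays for one hitter: a big hit in a tight spot
# and a harmless out in a low-leverage spot.
sample_plays = [
    Play(we_before=0.50, we_after=0.60, leverage_index=2.0),
    Play(we_before=0.40, we_after=0.38, leverage_index=0.5),
]

print(round(sum(wpa(p) for p in sample_plays), 3))
print(round(clutch(sample_plays), 3))
```

With those invented numbers the hitter nets about 0.08 wins of WPA, and his Clutch works out to roughly 0.07, because most of his value came in the high-leverage plate appearance.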
BD: Along those lines, what do you see as the limitations of WPA? How should one not use it?
DA: A lot of people think WPA is not a predictive stat and that's one of the "outs" people use to discredit it. The truth is, while WPA is not as predictive as, say, OPS, OBP, or SLG, it is more predictive than batting average and, depending on the year, it's not too far off any of the "big three". WPA/LI, which is used in calculating Clutch and is a more context-neutral version of WPA, is just as predictive as "the big three", so I don't think this is an argument which can really be used any longer.
I still wouldn't use it solely to try and determine how "good" a player is, but I probably wouldn't use any of those other stats on their own either.
Relief pitchers can tend to get more value than starting pitchers because of the high leverage situations they're put in. In some sense this puts starters and relievers on equal footing so you can compare them to each other in terms of how valuable they are to their team. This shouldn't be mistaken for assessing a player's skill level. It's easier to relieve than it is to start and WPA isn't trying to suggest B.J. Ryan is a better pitcher than Johan Santana.
BD: Do you see any possible improvements that could be integrated into the WPA framework?
DA: There are a few things that would make calculating WPA more accurate. Currently it's not ballpark adjusted, only league adjusted. This is something I plan on correcting this season. Also in the way the plays are divvied up, occasionally a player will be mis-credited slightly. Over the course of a season, these things don't make much of a difference, but it'd be best to have them corrected anyway.
BD: The live WPA graphs are mesmerizing. How did that come about and what hurdles did you face in putting it together? How has the response been to the live scoreboard?
DA: I had wanted to do live win probability graphs only a few weeks after I launched the static ones last year. I just thought it'd be a very fun way to follow the game. Now that they're up, I find it's a great companion to watching the game on TV, and even at the game I occasionally look at the graph on my cell phone (it's a little slow but it does work!).
The plan was to get live data up and running by early August last year, but I encountered a number of obstacles in getting the correct live data to actually do it. Needless to say, it didn't happen last year. This off-season I worked for quite a while with my stats provider (Stats Inc) to get the data feeds correct for the live data. The after-the-game graphs are actually Baseball Info Solutions feeds and contain slightly more detail than the Stats feeds I receive. Both companies have been great to work with, but it took a bit of work on both my end and the providers' end to get the data feeds looking the way I wanted them.
BD: Do you see any possibility of using a version of WPA to generate some contextualized defensive stats? This occurred to me since your data comes from Baseball Info Solutions, who also put out the version of zone rating that currently is in favor. As an example, say a batter pops out to short, a play that is made 97 percent of the time according to the BIS database. Then 97 percent of the defensive WPA on that play could go to the pitcher and the other 3 percent to the fielder. What do you think?
DA: Doing defensive metrics based on WPA is something which would be pretty interesting in my opinion, especially using the zone data Baseball Info Solutions provides. This would be the only way I would feel comfortable doing defensive win probability metrics. Looking at just errors, assists, etc. is just way too subjective.
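(Another aside from me: the split I floated in my question can be sketched in a few lines of Python. The function name, the 0.02 WPA value, and the idea of passing in a single "make rate" are my own placeholders, not anything BIS or FanGraphs publishes. It also only covers the case where the play is made. How to debit a fielder who misses a routine ball is a separate question this sketch doesn't try to answer.)

```python
def split_defensive_wpa(defensive_wpa, make_rate):
    """Split the defense's WPA on a converted out between pitcher and fielder.

    defensive_wpa: WPA credited to the defense on the play (hypothetical value).
    make_rate: share of the time this kind of ball is turned into an out,
               e.g. 0.97 for the routine pop-up in the example.
    The pitcher gets the routine share; the fielder gets the remainder.
    """
    if not 0.0 <= make_rate <= 1.0:
        raise ValueError("make_rate must be between 0 and 1")
    pitcher_share = defensive_wpa * make_rate
    fielder_share = defensive_wpa * (1.0 - make_rate)
    return pitcher_share, fielder_share


# Pop-up to short that is caught 97 percent of the time,
# on a play worth an invented 0.02 wins to the defense.
pitcher, fielder = split_defensive_wpa(0.02, 0.97)
print(round(pitcher, 4), round(fielder, 4))
```

On a routine play nearly all of the credit lands on the pitcher. Drop the make rate to 0.30 for a diving stop and the fielder takes 70 percent instead, which is the whole point of bringing zone data into it.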
BD: Lastly, you've mentioned a couple of things but can you highlight what else is on your docket for Fangraphs?
DA: I have a project currently in the works which I personally think is pretty exciting. It has to do with re-vamping the news system which I put together early 2006 and eventually got rid of because I wasn't happy with how news was categorized. I'm hoping that will be up and running in the next month or so. I have yet to see how well it's going to work, but it should be one of the more unique baseball news aggregators around.
As far as stats go, there's a lot of stuff I'd like to do. Doing Leverage Index splits (high/medium/low) and batted ball splits are high on the priority list. Basic 2007 minor league stats should start showing up next weekend. Back-filling the Minor League stats database is something I'd definitely like to do.
Back to regular blogging: some leftovers
Some reaction to announcer comments I've heard recently:
- The Angels' announcers (Hud and that other guy) were talking about how Shea Hillenbrand lockers next to Howie Kendrick. Apparently the veteran has taken the phenom under his wing and they spend long hours talking hitting. The Angels need to move Kendrick to a different locker.
- The Reds' broadcasts are, quite frankly, intolerable. I've not liked loud-mouthed Thom Brennaman since he was calling Cubs games when I lived in Chicago. Now he's teamed up with Jeff Brantley and, well, you can guess the results. Last night, Brennaman is wailing about Ryan Freel going after a 1-0 pitch while the Reds were down 4-0. (Freel reached base on the play.) Brantley chirps in by saying that since Freel isn't a good breaking ball hitter, he's always going to be hacking at the first fastball he sees. He's never going to be a hitter who draws a lot of walks. Well, Mr. Brantley, Freel has walked in almost 11 percent of his career plate appearances, 24 percent better than the league average. It's style and substance with those two.
- Never really been a big fan of the Cubs' Len Kasper, either. Not sure why. Probably because he's not Harry Caray. And everything about the Cubs since the 2003 NLCS ended has left me unable to root for them. If I were to move back to Wrigleyville, I'd probably hop right back on board but this whole Andy MacPhail, Ed Lynch, Jim Hendry and, especially, Dusty Baker era has really spoiled the Cubbies for me. Throw the Chip Caray/Steve Stone affair in there as well. Anyway, last week I heard Kasper explain the defensive spectrum to Bob Brenly, using the simplest language he could muster. Did a good job with it, too, though Brenly just sat there in stone silence. Now I like Len Kasper.