Sunday, May 5, 2013

Thoughts on the misunderstanding of sabermetrics

A deceptively simple title for what seems to be a complex issue in baseball these days.  Wikipedia's definition of sabermetrics is just as straightforward: "Sabermetrics is the specialized analysis of baseball through objective evidence, especially baseball statistics that measure in-game activity."

I was watching the Braves-Mets game today on TBS, and couldn't help but be a little disappointed by the ongoing discussion of the Mets' philosophy of patient at-bats among commentators Brian Anderson, John Smoltz, and Tom Verducci, which ran from the top of the game into the 8th inning.  It seems like fans and the media have conflated sabermetrics with "taking pitches," a la the classic Moneyball A's.  But as the textbook definition states, sabermetrics is much more.  Its association with walks and on-base percentage is largely due to the underrating of those attributes, which the A's were able to take advantage of in the early 2000s to "win an unfair game."

The broadcast showed a list of the teams seeing the most pitches per plate appearance, a number of which have been successful (the Red Sox, for one).  Their argument was that the Mets have been taking the 2nd most pitches per PA and are still losing, fundamentally showing the failure of sabermetrics (and resulting in a consequent rant from John Smoltz, which saddens me as he was a remarkable pitcher; though he is the same guy who injured himself while ironing a shirt that he was wearing).  The big issue here is that pitches per PA, while associated with OBP, is not equivalent to it: the Mets are 22nd in baseball in team OBP (a paltry .310).  This makes intuitive sense: OBP is a combination of patience and the ability to put the ball in play to get hits.
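
To make the distinction concrete, here is a rough sketch in Python using the standard OBP formula.  The team totals are made up for illustration and are not the Mets' or anyone else's actual numbers; the two hypothetical teams see exactly the same number of pitches per PA but reach base at very different rates.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    chances = at_bats + walks + hit_by_pitch + sac_flies
    if chances == 0:
        return 0.0
    return (hits + walks + hit_by_pitch) / chances


def pitches_per_pa(pitches_seen, plate_appearances):
    """Average number of pitches seen per plate appearance."""
    return pitches_seen / plate_appearances


# Hypothetical team totals: identical patience, different contact results.
patient_but_hitless = {"hits": 230, "walks": 100, "hit_by_pitch": 8,
                       "at_bats": 1000, "sac_flies": 8}
patient_and_hitting = dict(patient_but_hitless, hits=280)

for label, team in [("fewer hits", patient_but_hitless),
                    ("more hits", patient_and_hitting)]:
    obp = on_base_percentage(**team)
    # Both teams are assumed to see 4,400 pitches over 1,116 plate appearances.
    print(f"{label}: P/PA {pitches_per_pa(4400, 1116):.2f}, OBP {obp:.3f}")
```

Same patience, very different OBP: the hits term in the numerator does not care how many pitches it took to get there.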

The top teams in OBP coming into play today: Rockies (.354), Tigers (.350), Red Sox (.347), A's (.346), and Indians (.342).  These are five of the top six scoring teams in baseball (the Orioles rank 4th with their 13th place OBP of .321).  I also want to take the opportunity to say that while sabermetricians have pointed to OBP as an important part of the discussion, sabermetrics also makes the point that building a productive and winning baseball team can require more than that.  All pieces have a place in the game (including the stolen base and bunt) - where sabermetricians differ from conventional thinking is in how much value each of these has given its success rate.

Just for some fun, it's worth noting that their complaints didn't really work out in practice today (not to say that a single result should ever be used to validate a process). Lucas Duda was the target of their complaints, as he has been drawing media attention for his 8 RBIs entering play today despite protecting David Wright in the lineup.  When he hit a 3-1 RBI single, they applauded his aggressiveness, which was somewhat silly given that he had seen 5 pitches in the at-bat (higher than his average pitches per plate appearance) and had worked himself into the favorable 3-1 count to get a meatball fastball.

Friday, November 9, 2012

The election, fivethirtyeight, and baseball

The election brought a lot of controversy swirling around PECOTA creator Nate Silver and his fivethirtyeight projections.  I followed the projections back in the last election, before the site's incorporation into the New York Times, and have long been a fan of PECOTA and Silver's work.  The discussion around analytics in predicting the election in comparison with traditional methods has been eerily similar to the introduction of sabermetrics to the baseball community.  I won't belabor that discussion - I think that resistance to new metrics is to be expected from those at the top.  More interestingly, what inspired me to write this post was a more nuanced discussion of the complexity and proprietary nature of Silver's analytical models.  Rather than arguing about the result, Colby Cosh discusses flaws in the execution of analytics.  On some points, I do agree - but really it all comes down to what you believe these projection systems should be about.

It is true that PECOTA and fivethirtyeight both use complex and likely poorly documented (at least in the case of PECOTA) methods.  For us, each is a black box that lacks transparency.  Other methods are far more transparent - in baseball, Marcel is simple enough for a monkey to use, hence its Friends-inspired name.  Many other projection systems are freely available.  I think simplicity, documentation, and transparency are fine to expect in certain situations, but a lack of the latter in particular is hardly a reason to decry the scientific method involved.  Cosh states, "That last feature makes it unwise to use Silver’s model as a straw stand-in for “science”, as if the model had been fully specified in a peer-reviewed journal."  Indeed, blindly defending Silver is dangerous.  However, peer-reviewed academic papers are often shrouded in a fair amount of mystery - though they are supposed to detail the steps necessary to execute the same research, it is often impossible to duplicate the work from the manuscript.  Furthermore, I don't think that most people value projection systems because of their openness, but rather believe that the ends justify the means.  To argue that Silver's methods are obfuscated when he is able to make accurate projections is to debate a different philosophical approach to what the analytical model is for.  Clearly, fivethirtyeight's intention is to provide accurate results without overwhelming a mainstream audience.

Anecdotally, Matt Wieters' rookie season is a clear example of PECOTA gone wrong.  But really, the numbers have to be studied to definitively say that PECOTA is equivalent to other simpler models.  I have neither the time nor the resources to do that.  Neither, I would hazard a guess, do the people offering criticisms.  My own anecdotal experience has been very supportive of PECOTA against many of the freely available resources.  Intuitively, that makes sense.  Predictions are about variables - the more independent predictors you have, the more accurate your prediction should be, assuming you've done your math right.  If handedness and weight matter, then they matter.  If hat size is predictive, no matter how little it makes sense, it benefits your model.  Simplicity is the counterbalance to this, and seemingly what many argue for with regards to Silver's methods - if you want to be able to make back-of-the-envelope calculations, simpler models like medical nomograms are more practical.  But I think if we're waiting four years, or even just a baseball offseason, then running computers through simulations for the sake of accuracy is pretty fair too.

In a way, fivethirtyeight's success this Election Day comes as some validation for Silver's methods.  He correctly called all of the states in the electoral college (including a last minute Obama surge in Virginia).  None of the other major projections were identical to his call or the end result.  Though the projections all came to a close consensus as far as who would win the election, fivethirtyeight was more accurate.  Furthermore, I would argue that Silver's methods are far from unique - analytics are a widespread tool used everywhere.  What differs is the application - baseball has adopted analytics as a mainstream tool - as well as the choice of inputs and how they are adjusted.  I would hardly say that Nate Silver is a better analyst than others - he has had his mistakes, such as the British election.  However, I do think that at the very least, fivethirtyeight has shown itself to be an accurate model.  I would make a similar argument about PECOTA (especially at the time of its introduction), but I believe that evaluating its effectiveness requires data comparisons.

Update: An article from Wired about the evolution of making data analysis more open right after I made my post.

Saturday, October 20, 2012

I take it all back...

Barry Zito, yes the Barry Zito I mentioned in the post last night, was "filthy," tossing 7.2 innings of 6K/1BB shutout ball, rescuing the Giants' season for one more day and sending the series back to the Bay for Ryan Vogelsong and Matt Cain to pitch games 6 and 7.  I actually posted "Cy Zito" as my Facebook status during the second inning, and never thought it would stick.  I've never thought terribly much of Zito even in his Cy Young campaigns, but the results were great last night.

Zito accumulated a 13.3% whiff rate on his fastball last night, which sat at 84.5 mph, around his normal velocity (maybe a mph faster - he averaged 83.8 mph this season).  The PITCHf/x data (courtesy of texasleaguers.com) suggests that all of his pitches showed superior movement last night, relative to his regular season data.  Anecdotally, Zito had his old curveball back, which agrees with the data.  I think the most interesting question that comes out of this performance is how much variability there is in pitch movement from game to game.  I think most would suggest that there is a fair amount of variation, but I don't have the data in front of me to say definitively that's the case.

Whether or not we could have predicted such a performance, the Giants are going back to San Francisco with their two best pitchers in the postseason lined up thanks to Barry Zito.

Friday, October 19, 2012

Zits to the hill

Barry Zito will start Game 5 tonight, with the Giants facing elimination.  Yes, the Barry Zito of the 4.15 ERA, 4.49 FIP, and 4.92 xFIP in 2012.  Certainly Madison Bumgarner has appeared tired in his recent starts, but Boch is dipping into the veteran well again.  Against Cincy, he had the option of going to Tim Lincecum, who was lights out after Zito had to make an early exit.  Tonight, the option will be to go to Bumgarner.  Zito was left off the postseason roster in 2010 when the Giants won it all.  Interestingly enough, his campaign that year was worse than his 2012 run, and the pitching staff has continued to add quality arms (like Bumgarner) since.  Yes, I know that the Giants have won Zito's previous 12 starts (I certainly heard it enough times watching Tim McCarver and Joe Buck on Fox).  But really, there should be no reason to go to Zito over Bumgarner - the Giants have continued to win games despite Zito rather than because of him.

More relevant to the discussion is why Bumgarner has been bumped from the rotation.  He has struggled thus far in the postseason to the tune of an 11.25 ERA in 8 innings.  However, Bumgarner has recorded 6 Ks and 2 BBs in those 8 innings.  Obviously, the K% is down, but I'm not totally convinced that Bumgarner has been getting knocked around.  Indeed, he has given up 3 homers in those innings, but we all know that those numbers should regress to something more palatable.  Against Cincinnati, he was what a friend called "BABIPed to death."  Regardless, the decision seems to be due to his appearance of being tired.  Mad Bum tossed 208.1 innings this year.
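
For anyone who wants to check the arithmetic behind that tiny sample, here is a quick Python sketch.  It uses only the numbers quoted above (11.25 ERA, 8 innings, 6 Ks, 2 BBs) and the standard definitions of ERA and per-nine rates; the function names are my own, and since batters faced isn't quoted here, it works with K/9 rather than K%.

```python
def innings_to_float(ip_notation):
    """Convert box-score innings (e.g. '208.1' = 208 and 1/3) to true innings."""
    whole, _, outs = str(ip_notation).partition(".")
    outs = int(outs or 0)
    if outs not in (0, 1, 2):
        raise ValueError("the digit after the decimal must be 0, 1, or 2 outs")
    return int(whole) + outs / 3


def earned_runs_from_era(era, innings):
    """Invert ERA = 9 * ER / IP to recover earned runs."""
    return era * innings / 9


def per_nine(count, innings):
    """Scale a counting stat (Ks, BBs, HRs) to a nine-inning rate."""
    return 9 * count / innings


postseason_ip = innings_to_float("8")
print("earned runs:", earned_runs_from_era(11.25, postseason_ip))
print("K/9:", per_nine(6, postseason_ip))
print("BB/9:", per_nine(2, postseason_ip))
print("K/BB:", 6 / 2)
print("regular season innings:", round(innings_to_float("208.1"), 2))
```

An 11.25 ERA over 8 innings works out to 10 earned runs, which is ugly, but a 3-to-1 strikeout-to-walk ratio over the same stretch is not what a pitcher who has completely lost it usually looks like.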

Either way, the discussion of the starter may end up being trivial if the Giants can keep it close for at least a couple of innings.  Bochy will be ready to pull the plug on Zito in an elimination game if he struggles.  While not an argument for either Zito or Bumgarner (both lefties), the already potent STL club OPSed 787 vs 747 against lefties this year (it is a fairly right-handed mix of guys).

Saturday, July 7, 2012

Fantasy Pitch: Tyler Colvin is mashing

Tyler Colvin has been getting a lot of fantasy press lately, and not because of a freak accident.  I personally own him in a couple of deeper leagues.  Right now, I'm not sure what to think of him - he's hot, hitting 311/339/644, but with a 333 ISO, he is easily playing at a career best for any level (he made it up to 222 in AAA last year).  Though he likely represents an upgrade for the Rockies over Todd Helton at this stage of Helton's career, there is good reason to expect regression.  Colvin is still striking out a quarter of the time, so his average will come down.  As an overall player, his ~5% walk rate will keep his value modest.  However, as long as he elevates the ball over 40% of the time, he stands to give decent power numbers.  ZiPS projects him to hit a more realistic 260/301/498 the rest of the way with 10 dingers.  I think that's pretty reasonable to expect - though maybe I'm a little more bearish on the power.  If that's useful to you, go add him in your league.  I recently (and questionably) dropped Justin Morneau for him, but the swap made sense for me.  That's probably a good reference point for the break even on the kind of gamble I would make for Colvin.
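
For reference, ISO is just slugging minus batting average, so it can be read straight off a slash line.  A small Python sketch (the helper names are mine, and the slash lines are the ones quoted above):

```python
def parse_slash_line(line):
    """Turn 'AVG/OBP/SLG' written without decimals (e.g. '311/339/644') into floats."""
    avg, obp, slg = (int(part) / 1000 for part in line.split("/"))
    return avg, obp, slg


def isolated_power(line):
    """ISO = SLG - AVG, i.e. extra bases per at-bat."""
    avg, _, slg = parse_slash_line(line)
    return round(slg - avg, 3)


print("current ISO:", isolated_power("311/339/644"))
print("projected rest-of-season ISO:", isolated_power("260/301/498"))
```

The first line reproduces the 333 ISO above; the projected line comes out to 238, which is a big step down but still healthy power.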

In short, don't buy into the hype, but he's worth a speculative add in case he keeps it up.

Thursday, June 28, 2012

And here comes Bauer...

Our first post in a while.  Sadly done in haste, but hopefully the start of a stream of posts as we enter the dog days of summer...

Tonight, Trevor Bauer will make his first MLB start for the Arizona Diamondbacks.  His widely anticipated debut as one of the top, if not the top, pitching prospects in the game comes against the Atlanta Braves.  We've seen the hype machine roll plenty - with the emergence of Mike Trout and Bryce Harper this year.

So what do we expect from Bauer tonight and beyond?  Dave Cameron of Fangraphs wrote an excellent piece today comparing Bauer and Andrew Cashner of the Padres and the hype that surrounds Bauer.  Known for his dedication to his unique warm-up routine and delivery, Bauer brings excellent minor league credentials in addition to his personality.  He sported 11.28 K/9 against 4.43 BB/9 in AAA, relatively comparable with his numbers across the prior levels.  This came along with a likely unsustainable HR/9 below 1 at each stop, given his historical penchant for the fly ball.  This could be concerning since he pitches in a relatively hitter-friendly park.  The walk rate is a bit high for my taste, but with such a high K rate, Bauer could easily wash it out.  The key for Bauer will be maintaining the K rate - he relies on it to minimize the damage done by his other deficiencies.  It seems like a relatively obvious point, but I feel like the emphasis will be on Bauer maintaining his strengths more so than on fixing his weaknesses.

I'm a little too busy to dig too far into the numbers before tonight's game, but here are a few shallow things to look for before opening pitch:
1. The Braves rank in the middle of the pack in strike-out rate, 20.5%.
2. The Braves rank in the middle of the pack in walk rate too, 8.2%.
3. The Braves rank in the middle of the pack in homers, 73.
4. But they are 6th in the league in BABIP (Michael Bourn anyone?), and consequently in the top 10 in runs scored.

Matchup to watch: Bauer vs Dan Uggla.  Uggla strikes out a ton, walks a ton, and hits the ball far.  He plays into a three true outcomes matchup nicely.  In a truly poetic world, he would hit a homer, walk once, and strike out twice.  Probably won't happen though.

Of course, tonight's game will merely be a small sample and will depend heavily on match-ups and the looks that ATL gives Bauer.  However, look for tonight to be a solid game that might give us a good metric of what we should be looking for out of Bauer over the rest of the season.

Fantasy-wise, Bauer is worth a pick-up in all leagues based on potential alone.  He probably shouldn't be owned if you don't have the flexibility to deal with him being shut down before the fantasy playoffs, but if you have room, go get him.  This goes without saying, but he should be owned in all keeper leagues.

Saturday, May 26, 2012

We're back!

Hi everyone,

We're back in business now with our new URL.  Point your browsers to www.pay-offpitch.com (we had to move the hyphen since someone acquired our old domain due to an administrative oversight).  EP and I will be bringing you some new fresh material as the season progresses, so keep following!