Introducing BP's New Arsenal Metrics

Image credit: © Rick Scuteri-Imagn Images

Introduction

Pitch models, including our own StuffPro and PitchPro, have taken baseball analytics by storm in recent years. Their ability to distill our visceral reaction to a filthy breaking ball down to a specific value draws us in, and their ability to capture that value so accurately year after year holds us in place. But however well they perform, they still have a glaring weakness: they consider each individual pitch (mostly) in isolation. Yes, much of what makes a pitcher good is simply throwing good pitches, but baseball fans know that some pitchers consistently get more out of their arsenals than the individual values of their pitches suggest. After an immense amount of study and research, we believe we’ve found a way to quantify that skill and incorporate it into a pitch model.

Approach

Our approach focuses on two causal pathways through which having a “deep” arsenal improves pitchers’ outcomes:

Having multiple pitches reduces the Times Through The Order penalty, since that penalty stems in part from the batter becoming familiar with a specific pitch from a specific pitcher.
Having multiple pitches that look similar to the batter early in flight while varying in movement and velocity makes it difficult for the batter to anticipate when and where the pitch will cross the plate. This forces the batter into worse decisions about when and where to swing, and it leaves their swing farther from the actual location of the pitch more often.

Measuring the first pathway is as simple as logging the number of times the batter has previously seen that specific pitch from that specific pitcher in that game, a value we can feed directly into a pitch model. Addressing the second pathway is more complicated, as we’re attempting to measure the subconscious process that occurs as the batter watches the release of a pitch and tracks its flight up until the point when they’re forced to decide whether, and if so where, to swing. Our approach borrows heavily from our previous work on pitch tunneling, which sought to understand how two consecutive pitches appeared to a batter and how they varied in flight time and location at the plate. We highly recommend reading those pieces in their entirety, as they provide in-depth background on the conceptual framework for how batters perceive pitches and for how to evaluate pitch trajectory data to match that perceptive process.
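The first-pathway feature described above can be sketched in a few lines. This is a minimal illustration, not our production code: the record keys (game_id, pitcher_id, batter_id, pitch_type) and the function name are hypothetical, and the pitches are assumed to arrive in chronological order.

```python
from collections import defaultdict


def add_times_seen(pitches):
    """Return copies of the pitch records with a `times_seen` field added.

    `pitches` is assumed to be a chronologically ordered list of dicts with
    the hypothetical keys game_id, pitcher_id, batter_id and pitch_type.
    """
    seen = defaultdict(int)
    annotated = []
    for pitch in pitches:
        key = (pitch["game_id"], pitch["pitcher_id"],
               pitch["batter_id"], pitch["pitch_type"])
        # Count only the pitches of this type seen BEFORE the current one.
        annotated.append({**pitch, "times_seen": seen[key]})
        seen[key] += 1
    return annotated


# Illustrative data: one batter sees FA, SL, FA, FA, SL from one pitcher.
sequence = ["FA", "SL", "FA", "FA", "SL"]
game = [{"game_id": 1, "pitcher_id": "P1", "batter_id": "B1", "pitch_type": t}
        for t in sequence]
times_seen = [p["times_seen"] for p in add_times_seen(game)]
print(times_seen)  # prints [0, 0, 1, 2, 1]
```

Keying the counter on the game, pitcher, batter, and pitch type together means the count resets for every new matchup and every new game, which matches the "that specific pitch from that specific pitcher in that game" wording.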

Our updated approach here applies a similar methodology, but instead of looking solely at two back-to-back pitches we consider a pitcher’s entire arsenal. This results in four new metrics: Pitch Type Probability, Movement Spread, Velocity Spread, and Surprise Factor. We’ll provide a brief definition of each before diving into how we calculate them (and the assumptions made when doing so), how they impact pitch results, and the path we see toward continual improvement of this methodology.

Pitch Type Probability: The probability the batter would be able to correctly identify the incoming pitch type given the release point, the pitch’s trajectory up to the batter’s decision point, and the count in which it was thrown.
Movement Spread: The size of the distribution of possible pitch movements given a) the probabilities the pitch is any one of a pitcher’s offerings and b) the movement distributions of each of those offerings.
Velocity Spread: Same as Movement Spread but for velocity rather than movement.
Surprise Factor: How surprising the observed pitch movement was based on the distribution of possible pitch movements estimated for Movement Spread.

As implied by Pitch Type Probability, we begin by taking each pitch’s trajectory from release to decision point and comparing it to the typical trajectories of each of that pitcher’s offerings, which gives us a Pitch Type Probability for each of those pitch types. Remember that we’re not concerned with how the trajectories compare in true space, but with how they compare from the batter’s point of view. This means we must make two important modifications to the trajectories. First, instead of using a pitch’s actual location in space, we use its location in the batter’s estimated field of view, based on an estimated location for the batter’s head and an assumption that they are looking toward the pitcher’s average release point. As we explain in the aforementioned tunneling work, this is important generally but especially so for pitchers with extreme release points, whose pitches look substantially different to righties than to lefties. Second, we apply additional uncertainty to the batter’s estimate of the pitch’s location at each point in time, based on an estimate of the human eye’s ability to see differences in objects from a distance. In effect, this means we use less precision in the measurement of the release point than in the pitch’s location at the decision point, and substantially less than in the pitch’s location at the plate. Finally, translating this estimated visual data and uncertainty into a pitch-type probability is a matter of comparing the observed trajectory with the typical trajectory of each of that pitcher’s distinct pitch types, and then multiplying the result by the pitcher’s usage rate of that pitch in the given count.
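To make the final step above concrete, here is a rough sketch of how a trajectory comparison and a usage rate could combine into a probability. Several things here are assumptions made for illustration rather than a description of our model: the comparison is an independent Gaussian likelihood at two checkpoints (release and decision point), the visual uncertainty is added to each pitch type's own spread in quadrature, the result is normalized so the probabilities sum to one, and every number and name is made up.

```python
import math


def gaussian_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def pitch_type_probabilities(observed, typical, usage, visual_sd):
    """Usage-weighted likelihood of each pitch type, normalized to sum to 1.

    observed:  [(x, y), ...] batter's-view location at each checkpoint
    typical:   {pitch_type: [((mean_x, mean_y), (sd_x, sd_y)), ...]}
    usage:     {pitch_type: usage rate in the current count}
    visual_sd: [sd, ...] extra visual uncertainty at each checkpoint
    """
    weights = {}
    for pitch_type, checkpoints in typical.items():
        likelihood = 1.0
        for (ox, oy), ((mx, my), (sx, sy)), vis in zip(observed, checkpoints, visual_sd):
            # Assumed: pitch-to-pitch spread and visual noise add in quadrature.
            likelihood *= gaussian_pdf(ox, mx, math.hypot(sx, vis))
            likelihood *= gaussian_pdf(oy, my, math.hypot(sy, vis))
        weights[pitch_type] = likelihood * usage[pitch_type]
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Made-up view-angle data: FA and SL share a tunnel, CU pops out early.
typical = {
    "FA": [((0.0, 0.0), (0.3, 0.3)), ((1.0, -1.0), (0.5, 0.5))],
    "SL": [((0.1, 0.0), (0.3, 0.3)), ((1.2, -1.3), (0.5, 0.5))],
    "CU": [((0.5, 0.6), (0.3, 0.3)), ((2.5, -3.5), (0.5, 0.5))],
}
usage = {"FA": 0.5, "SL": 0.3, "CU": 0.2}
visual_sd = [0.6, 0.3]  # vision is assumed noisier at release than at decision
observed = [(0.05, 0.0), (1.1, -1.2)]
probs = pitch_type_probabilities(observed, typical, usage, visual_sd)
print({k: round(v, 3) for k, v in probs.items()})
```

In this toy case the observed trajectory sits between the fastball and slider tunnels, so the usage rate does most of the work in separating them, while the curveball is effectively ruled out by the decision point.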

Consider the example below of Tobias Myers, who does an exceptional job of disguising his pitches. Figure 1 shows the average pitch trajectory of his four-seam fastball, his slider, and his cutter from the perspective of a right-handed hitter, with ellipses shown at the release point and at the decision point to indicate the distribution of each pitch’s location at that point along with the visual uncertainty of the batter. The large amount of overlap among the ellipses suggests that righties will have a very difficult time distinguishing one of these pitches from another, so any given FA, SL, or FC thrown by him will likely have a very low Pitch Type Probability. These low probabilities are shown in Figure 2, which plots his distribution of Pitch Type Probabilities to righties for each pitch he throws. Note that his slider in particular is almost never more detectable than a league-average slider.

Figure 1. Pitch Trajectories from Tobias Myers from RHH perspective

Figure 2. Pitch Detectability Distributions for Tobias Myers vs RHH

Making all of one’s pitches look similar is important, but the batter’s job is not to tag pitch types for analysts. The batter’s job is instead to predict where the pitch is headed. To create as much confusion as possible, pitchers need to combine those similar releases with a broad range of final movements and velocities. That brings us to our final three metrics: Movement Spread, Velocity Spread, and Surprise Factor.

We start by weighting the movement and velocity distributions of each pitch in that pitcher’s arsenal by the pitch type probabilities calculated above, yielding a single mixture of distributions. The size of this combined distribution of movements is Movement Spread, and the size of the combined distribution of velocities is, of course, Velocity Spread. Surprise Factor is effectively a measure of the density of this mixture at the given pitch’s observed movement. To make this a little more concrete, let’s return to Tobias Myers and consider a slider he throws to a right-handed hitter. Figure 3 shows the final movement distribution mixture for that slider. This looks similar to a standard movement chart, but here the density of each pitch’s distribution is determined by the probability that the average slider thrown by Tobias looks to the batter like a slider, a cutter, or a four-seamer. In his case, the probability is spread almost evenly across the three pitches, suggesting hitters are no more confident the slider is a slider than they are that it’s actually the fastball. This results in large Movement and Velocity Spread values, along with a high Surprise Factor for a given pitch.

Figure 3. Expected movement distribution for Tobias Myers’ slider vs RHH

Contrast that with the movement distribution plot for José Ureña’s slider to lefties, which he struggles to tunnel with his changeup and sinker. Here we see that almost all of the distribution’s density is focused on the slider specifically, indicating that batters have an easy time guessing both what’s coming and where it’s headed, resulting in much lower Movement and Velocity Spread values along with a lower Surprise Factor.

Figure 4. Expected movement distribution for José Ureña’s slider vs LHH
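The contrast between a well-disguised slider and an easily read one can be reproduced with a toy mixture. The sketch below is one-dimensional (think horizontal break in inches) and rests on assumptions that are ours for illustration only: each pitch type's movement is Gaussian, "size" is taken to be the standard deviation of the mixture, and Surprise Factor is taken to be the negative log of the mixture density at the observed movement. All numbers are invented.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_spread(weights, means, sds):
    """Standard deviation of a 1-D Gaussian mixture (law of total variance)."""
    mean = sum(w * m for w, m in zip(weights, means))
    second_moment = sum(w * (s * s + m * m)
                        for w, m, s in zip(weights, means, sds))
    return math.sqrt(second_moment - mean * mean)


def surprise(observed, weights, means, sds):
    """Negative log density of the mixture at the observed value (assumed form)."""
    density = sum(w * normal_pdf(observed, m, s)
                  for w, m, s in zip(weights, means, sds))
    return -math.log(density)


# Invented horizontal-break distributions for a slider, cutter and four-seamer.
means = [-4.0, 1.0, 8.0]
sds = [2.0, 2.0, 2.0]

# Pitch type probabilities for one slider: evenly split vs. easily identified.
disguised = [1 / 3, 1 / 3, 1 / 3]
detectable = [0.90, 0.05, 0.05]

observed_break = -4.0  # the slider moves like a typical slider
spread_disguised = mixture_spread(disguised, means, sds)
spread_detectable = mixture_spread(detectable, means, sds)
surprise_disguised = surprise(observed_break, disguised, means, sds)
surprise_detectable = surprise(observed_break, detectable, means, sds)
print(round(spread_disguised, 2), round(spread_detectable, 2))
print(round(surprise_disguised, 2), round(surprise_detectable, 2))
```

Even though the slider itself moves identically in both cases, the evenly weighted mixture is wider and puts less density on the slider's actual movement, so both the spread and the surprise come out larger for the disguised pitch.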

Performance

Our confidence in these metrics lies partly in the fact that we’re not really covering new ground, but are instead creating novel methods for measuring things we already know. We’ve made it a point to keep our approach as close as possible to how the effect plays out in the mind of the hitter. But our confidence also lies in how well these metrics perform when predicting pitch outcomes. First, we found that each of our three compiled metrics is associated with a decrease in batters’ ability to make correct decisions about whether to swing or take. Figure 5 below shows the correct decision rate as a function of the number of times the batter has previously seen that pitch that game, with a correct decision defined as a swing on a pitch with a greater than 50% likelihood of being called a strike, or a take on a pitch with a greater than 50% likelihood of being called a ball. As batters see a pitch more and more throughout the game, they gain familiarity with it and make better and better swing decisions against it. However, pitches with above-average values for each of our metrics soften this effect, showing worse decision rates for batters and a muted familiarity impact.

Figure 5. Correct Decision Rate as a function of number of times batter has seen a pitcher for all pitches and for those with above average arsenal metrics
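The correct-decision definition behind Figure 5 is simple enough to write out. This is a small sketch with hypothetical inputs: each record is a (times_seen, swung, called_strike_prob) tuple, and the called-strike probability is assumed to come from a separate model. A pitch at exactly 50% satisfies neither half of the definition, so the sketch counts it as incorrect.

```python
from collections import defaultdict


def is_correct_decision(swung, called_strike_prob):
    """Swing at a likely strike, or take a likely ball."""
    if swung:
        return called_strike_prob > 0.5
    return called_strike_prob < 0.5


def correct_decision_rate_by_times_seen(pitches):
    """Map each times-seen bucket to the share of correct swing/take decisions."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for times_seen, swung, called_strike_prob in pitches:
        totals[times_seen] += 1
        correct[times_seen] += int(is_correct_decision(swung, called_strike_prob))
    return {k: correct[k] / totals[k] for k in sorted(totals)}


# Made-up sample: (times_seen, swung, called_strike_prob)
sample = [
    (0, True, 0.8),   # swing at a likely strike: correct
    (0, False, 0.7),  # take a likely strike: incorrect
    (1, False, 0.2),  # take a likely ball: correct
    (1, True, 0.9),   # swing at a likely strike: correct
]
rates = correct_decision_rate_by_times_seen(sample)
print(rates)  # prints {0: 0.5, 1: 1.0}
```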

The same is true for the probability that a batter will whiff on a pitch they swing at. The more familiar the batter is, the less likely they are to whiff; on the other hand, the more surprising or uncertain the pitch’s movement and velocity are, the more likely they are to swing through the pitch.

Figure 6. Whiff Rate as a function of number of times batter has seen a pitcher for all pitches and for those with above average arsenal metrics

Leaders

Now that we know how they work, let’s look at which pitchers top our lists for each of the metrics. For this we’ll focus on starting pitchers who threw at least 1,500 total pitches in the 2024 season, and we’ll present each metric as a percentile, with a larger percentile being better for the pitcher.
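For readers who want to build similar leaderboards, the qualification filter and percentile conversion described above might look like the following. The percentile formula is one of several reasonable conventions and is an assumption here, as are the field names and the placeholder pitchers A through D. For a metric like Pitch Type Probability, where lower is better for the pitcher, the values are flipped so that a larger percentile is still better.

```python
def percentile_ranks(values, higher_is_better=True):
    """Map {name: value} to {name: percentile in [0, 100]}, best = 100."""
    signed = {name: (v if higher_is_better else -v) for name, v in values.items()}
    n = len(signed)
    ranks = {}
    for name, v in signed.items():
        # Share of the other qualified pitchers this pitcher beats outright.
        below = sum(1 for other in signed.values() if other < v)
        ranks[name] = 100.0 * below / (n - 1) if n > 1 else 100.0
    return ranks


# Placeholder season lines; D falls short of the 1,500-pitch cutoff.
season = [
    {"name": "A", "pitches": 2000, "avg_pitch_type_prob": 0.40},
    {"name": "B", "pitches": 1800, "avg_pitch_type_prob": 0.55},
    {"name": "C", "pitches": 1600, "avg_pitch_type_prob": 0.70},
    {"name": "D", "pitches": 900, "avg_pitch_type_prob": 0.30},
]
qualified = {p["name"]: p["avg_pitch_type_prob"]
             for p in season if p["pitches"] >= 1500}
ranks = percentile_ranks(qualified, higher_is_better=False)
print(ranks)  # prints {'A': 100.0, 'B': 50.0, 'C': 0.0}
```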

The pitcher with the lowest average Pitch Type Probability across all of his pitches was Michael Lorenzen. This is perhaps unsurprising for a pitcher who relies so heavily on fastballs and a changeup, but Lorenzen pushes his deception even further by commanding each pitch well to areas that play perfectly off one another. Next on the list is another unsurprising name in Carlos Carrasco, who has a broad array of offerings, each with similar movement patterns.

For Surprise Factor, the top of the list is knuckleballer Matt Waldron. Matt is an interesting case in that he doesn’t throw a lot of different pitches; instead, the variability of his knuckleball movement alone makes any individual one relatively surprising. Perhaps these metrics could open the door to pitch models finally understanding what makes knuckleballs so valuable.

Next on the list are Logan Gilbert and Max Fried, two guys known for their craftiness and broad arsenals. Michael Rosen of FanGraphs recently wrote about how Fried stands out in Driveline Baseball’s own arsenal metrics, and the $218 million the Yankees handed out to him this past off-season suggests they value this skill as well.

The top starter in MLB for both Movement Spread and Velocity Spread is also Matt Waldron, but after him are Bowden Francis and Chris Bassitt, respectively. Bassitt’s entire approach is centered around what these metrics are attempting to measure, so it’s encouraging to see him rated highly. Francis excels by carefully tweaking his pitch mix against lefties and righties, featuring the splitter much more heavily to lefties and the slider more to righties. Each tunnels perfectly against his fastball while varying in total movement and velocity, keeping batters on their toes and helping him consistently outperform the quality of his stuff.

Next Steps

Though we would love to say this work means we have arsenal interactions and pitch deception figured out, there’s still a lot of work left to do. One area is continuing to find ways to validate our estimates of what pitch the batter is expecting. Ideally, one would have data on where the barrel of the bat crossed the plate during the swing, as this should align with where the batter thought the pitch was going. Absent that information, we’re still making educated guesses using swing decisions and whiff rates as above. Related to this, there is also value in knowing the batter’s preferences. If a batter is looking for a specific pitch in a specific spot, based either on his strengths or on the pitcher’s weaknesses, then how he evaluates the incoming pitch may change. For example, it doesn’t matter if your slider out of the zone looks like a sinker in the zone if the batter doesn’t want to swing at the sinker either way. If we had more data on the batter’s swing, then maybe we could extract enough signal to learn what these preferences are and thus to quantify how a pitcher can influence them.

Another area of exploration is incorporating information about what pitch type or movement the batter might expect if they had no knowledge of the current pitcher’s repertoire. For example, the very first time a batter faces a pitcher, they may not be thinking primarily about what that guy throws but rather what pitches and movements they typically see from that arm slot. Max Bay, now of the Dodgers, did some work on this publicly before getting scooped back behind the curtain. In his Dynamic Dead Zone app you can see what fastball movements a batter might be expecting based on the pitcher’s arm angle. We’ve done something similar, but expanded it to all pitch types and included information about the pitch’s trajectory up to the decision point. The figure below shows a movement distribution plot similar to the one above for Tobias Myers, but this time, instead of the distributions and their weights being based on his own pitches, they are based on what the batter would expect having zero knowledge of Tobias’ own arsenal. Note that not only does his slider look like it could be a fastball or a cutter to the batter, but it also has somewhat unique movement relative to the average slider from his arm slot.

Figure 7. League-Expected movement distribution for Tobias Myers’ slider vs RHH

This work holds a lot of promise, though we have not yet found the best way to incorporate it so that it improves modeling outcomes. We hope to create a model that properly weights both league information and pitcher-specific information based on how often the batter has seen that pitcher, but that work is still ongoing.

Finally, some pitching coaches have spoken about the value of being able to cover different areas of the plate and have multiple tools for a given situation. For example, a pitcher’s sinker may not be a great pitch in isolation, but if he can command it well when runners are on base it could be valuable specifically for generating double plays. We explored a few different options for quantifying this effect, but none of them showed any ability to consistently predict pitch outcomes better than our current models. Maybe the variation in this skill is too small across pitchers to matter much, or maybe we’re looking in the wrong places. Time will tell, and we look forward to seeing what other researchers find along with us.

Conclusion

We’re thrilled to present this work, to have our readers explore the new metrics, and to follow the new research it leads to or inspires. We’d be remiss if we did not mention the others who are working in this area as well, and we’re grateful for our ongoing conversations with them as we work toward a shared goal. It’s a difficult area of inquiry, but we’ve collectively made considerable progress, and we know that with all of the bright minds working on it, we will continue to progress even further. Keep an eye on our player pages and leaderboards, and watch for an update to our pitch models that will, in part, incorporate this work.
