Tuesday, July 1, 2008

Baseball Shapes

Today in our first Fathom lab, we explored statistics for all "regular" batters in the Major Leagues last year (2007). We saw ...

1. A collection of batting averages tends to be symmetric in shape, with an average of about .280. The lowest batting average last year belonged to David Ross of the Reds. Ross is a catcher, one of the more important defensive positions, so a low batting average is OK if he is a good catcher.

2. In contrast, a collection of home run counts tends to be right-skewed. A lot of players have low or moderate home run counts, and the big home run hitters stand out.

3. We looked at the collection of on-base percentages (OBP) for a single player. Generally, if you plot a player's OBPs against the year, you'll see an interesting shape that looks something like this.

Players generally peak around the ages of 28-30.

4. We briefly looked at a piece of exciting new PITCHf/x data. Specifically, we looked at the distribution of the speeds of Cole Hamels' pitches in the first game he pitched in the 2008 season. Here's a dotplot of the pitch speeds:

To help understand the bimodal distribution, we plotted the pitch speed against the pitch type (FAstball, CUrveball, or CHange).

We see that Cole's fastballs are in the mid 80s, his changeups are in the mid 70s, and his curveballs are somewhat slower.
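If you'd like to try the same idea outside of Fathom, here is a rough Python sketch of splitting pitch speeds by pitch type and summarizing each group. We did this graphically in Fathom; the Python version is only a stand-in. The speeds below are made-up numbers chosen to echo the pattern described above, not Hamels' actual PITCHf/x readings, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (pitch type, speed in mph) pairs, invented for illustration only.
# These are NOT Cole Hamels' actual PITCHf/x readings.
pitches = [
    ("FA", 86.0), ("FA", 85.0), ("FA", 87.0), ("FA", 84.0),
    ("CH", 76.0), ("CH", 75.0), ("CH", 77.0),
    ("CU", 72.0), ("CU", 71.0), ("CU", 73.0),
]


def speeds_by_type(pitches):
    """Collect the speeds into one list per pitch type."""
    groups = defaultdict(list)
    for pitch_type, speed in pitches:
        groups[pitch_type].append(speed)
    return dict(groups)


def median_speed_by_type(pitches):
    """Return the median speed for each pitch type."""
    return {t: median(s) for t, s in speeds_by_type(pitches).items()}


medians = median_speed_by_type(pitches)

# Print the pitch types from fastest to slowest median speed.
for pitch_type in sorted(medians, key=medians.get, reverse=True):
    print(pitch_type, medians[pitch_type])
```

When all the speeds are lumped together, the separate clusters show up as separate humps in the dotplot. Once the speeds are split by type, each group can be summarized on its own.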

During class time, we talked about the progression of the single-season home run record from Ruth (1927) to Maris (1961) to McGwire (1998) to Bonds (2001).

Tomorrow, we'll talk about summaries for a single batch of data.

