Thursday, July 31, 2008

Pudge to the Yankees

Yesterday Ivan Rodriguez was traded from the Tigers to the Yankees. This made sense from one perspective since the Yankees were in desperate need of a catcher and Pudge was available. But this trade raises several questions:

1. Is Pudge really a good hitter?
2. What about his hitting ability at this point (age 36) in his career?

First, Pudge is noted for his high career batting average of .302. But this is a bit misleading, since AVG ignores other ways for a batter to get on base, such as a walk. From an OBP perspective, Pudge has a mediocre batting record, and he hasn't displayed much power in recent years. Pudge will be an OK hitter towards the bottom of the Yankees lineup since he is good at hitting singles. But Pudge is a good example of a hitter who looks better than he is based on his batting average.

To understand how Pudge's performance has changed over time, we plotted his batting averages as a function of age. We see that his AVG showed steady improvement in his early years, hit a peak in mid-career, and has displayed a lot of variability (up and down movement) in recent years. How do we make sense of these large changes in AVG in the latter part of Pudge's career? This illustrates the difficulty in learning about a batter's ABILITY based on his PERFORMANCE in a single season.

We talked about ways of learning about a player's ability -- that is, his hitting probability -- based on x successes in n at-bats. This discussion showed how little we learn about ability based on a single season. We get a better understanding of a player's ability by looking at his performance over several seasons.
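To see how much more several seasons tell us than one, here is a minimal sketch that puts a rough 95% interval around a hitting probability based on x successes in n at-bats. It uses the usual normal approximation, which is one common approach and not necessarily the method we used in class. The function name `ability_interval` and the hit totals (a .300 hitter with 500 at-bats per season) are made up for illustration.

```python
import math

def ability_interval(x, n, z=1.96):
    """Rough 95% interval for a hitting probability p, given x hits in n at-bats.

    Uses the normal approximation: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
    """
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# Hypothetical totals: a .300 hitter over one season and over five seasons.
one_season = ability_interval(150, 500)
five_seasons = ability_interval(750, 2500)

print("one season:   %.3f to %.3f" % one_season)
print("five seasons: %.3f to %.3f" % five_seasons)
```

With these made-up numbers, the one-season interval runs from roughly .260 to .340, so a single season cannot really separate a mediocre hitter from a very good one. Five seasons narrow the interval to roughly .282 to .318.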

Wednesday, July 30, 2008

The Pre-Game Show

Today we were getting ready for tonight's Mud Hens game. I passed out scoresheets and we talked about how to keep score. I'll be sitting in section 115 with a Phillies cap -- I can help you if you need help scoring particular plays.

In Chapter 7, we are talking about statistical inference. Specifically, how can we learn about a player's ability given his performance?

We did a silly exercise to illustrate the distinction between ability and performance. Suppose the manager Casey has seven types of hitters in his dugout -- two hitters are great at getting on base and their ability is p=.5; two other players have p=.45; two other players have p=.4, ..., and two crummy players are weak at getting on base and have p=.2. Suppose Casey chooses a player at random, has that player bat 20 times, and records X = number of times on base.

We repeat this exercise 10,000 times and we categorize all hitters by their ability (value of p) and their performance (value of X). I passed out the table of counts. Given a particular batting performance, say 8 out of 20 on-base, I showed how you can compute the probabilities of the player having different batting abilities.
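The exercise can be sketched in a few lines of Python. This is not the program that produced the table I passed out; the function names and the random seed are my own choices for illustration. It picks one of the seven ability values at random, simulates 20 plate appearances, tallies the (ability, performance) pairs, and then looks only at the players who got on base 8 times.

```python
import random
from collections import Counter

# Two players at each ability value, so each value is equally likely to be chosen.
ABILITIES = [0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20]

def simulate_dugout(n_sims=10_000, n_pa=20, seed=2008):
    """Return counts of (ability p, times on base X) over n_sims repetitions."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_sims):
        p = rng.choice(ABILITIES)                        # Casey picks a player
        x = sum(rng.random() < p for _ in range(n_pa))   # 20 plate appearances
        counts[(p, x)] += 1
    return counts

def ability_given_performance(counts, x):
    """Proportion of each ability among simulated players with performance x."""
    column = {p: c for (p, xx), c in counts.items() if xx == x}
    total = sum(column.values())
    if total == 0:
        raise ValueError("no simulated player had this performance")
    return {p: c / total for p, c in sorted(column.items())}

counts = simulate_dugout()
posterior = ability_given_performance(counts, 8)
for p, prob in posterior.items():
    print(f"p = {p:.2f}: {prob:.3f}")
```

Reading down the printed column is the same calculation we did by hand with the table: divide each count in the X = 8 column by the column total.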

Tonight we'll keep score, compute a couple of OBPs and SLGs and be on the lookout for lucky plays.

Tuesday, July 29, 2008

The M&M Boys

Today we saluted the M&M boys, Mickey Mantle and Roger Maris, who were featured in the movie 61*. Yankee Stadium was the "house that Ruth built" and Mantle and Maris challenged Ruth's single season home run record of 60 during the 1961 season. Mantle and Maris had different personalities. Mantle was the leader of the 1961 team and enjoyed life, especially late in the evening. In contrast, Maris was shy and did not like all of the media pressure during the chase to break Ruth's record.

In class, we compared batting statistics for the two players. From a career perspective, Mantle was clearly the better offensive player -- his average OPS value was significantly higher than Maris' average OPS. Also, Maris' home run total of 61 in 1961 was unusual relative to his home run totals in his other years. Many people thought that Maris was not deserving of breaking Ruth's record, but this assessment is a bit unfair. Maris had a few great seasons, winning the MVP (most valuable player) award in two consecutive seasons.

We concluded with a dice-rolling activity. We considered two hypothetical players, Jeff and Bobby, who we know have mediocre and good probabilities of getting on base, respectively. By rolling 10-sided dice, we looked at the performances of these two players in 10 plate appearances. By writing down the results of the rolls in a table, we can look at the relationship between a player's ability (value of p) and his performance (number of times on base).
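The dice activity is easy to mimic on a computer. In the sketch below, the abilities are assumptions for illustration only: Jeff gets on base when the 10-sided die shows 1 to 3 (p = .3) and Bobby when it shows 1 to 4 (p = .4). The function name and the seed are also my own.

```python
import random

def times_on_base(on_base_faces, n_pa=10, rng=random):
    """Roll a 10-sided die n_pa times; a roll <= on_base_faces counts as on base."""
    rolls = [rng.randint(1, 10) for _ in range(n_pa)]
    return sum(roll <= on_base_faces for roll in rolls)

rng = random.Random(61)
# Assumed abilities (not the values from class): Jeff p = .3, Bobby p = .4.
players = {"Jeff": 3, "Bobby": 4}

# Five sets of 10 plate appearances for each player.
results = {
    name: [times_on_base(faces, rng=rng) for _ in range(5)]
    for name, faces in players.items()
}
for name, row in results.items():
    print(name, row)
```

Because the two abilities are close, the weaker player will sometimes out-perform the better one over only 10 plate appearances.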

Monday, July 28, 2008

The Spinner Game

Today we played our spinner game -- quite a high-scoring affair -- the American League all-stars won over the National League all-stars 25-18.

Why did the AL team win? There are several explanations:

1. The AL was the better team with the better ballplayers.

2. There was no pitching. (Actually in this game, the pitcher isn't a part of the game, so the game has literally no pitching.)

3. The AL had more luck -- their spinners went the right way.

4. The people controlling the AL spinners were cheating. (Although some of your spinners didn't work that well, I don't believe you were cheating.)

This relates to the main subject of the next chapter. In statistics, we see a lot of variation in data and we have to figure out how much of the variation is due to skill and how much is attributable to luck or chance variation.

We distinguished between a player's ABILITY and his PERFORMANCE. We observe how players perform in games, and from the performance data, we want to learn about players' abilities.

Luck is a dirty word in baseball -- no player wants to say that he did well or that the team did well due to some lucky breaks. But luck or chance variation plays a big role in the variability of player and team performances. Statisticians want to understand this variability, so we can draw conclusions about ability.

Thursday, July 24, 2008

Binomial experiments

Today we talked about binomial experiments and the associated probabilities. Suppose we have an experiment where (1) you have a sequence of n trials, (2) on each trial, there are two outcomes, Success or Failure, (3) the chance of a Success (p) is the same from trial to trial and (4) the results of different trials are independent. Then the number of successes X has a binomial distribution.

Once you identify a binomial experiment, you have to figure out n, the number of trials, and p, the chance of a success. Then you can find probabilities of different outcomes by a table of binomial probabilities.

This can be applied to chance outcomes in baseball. In our Fathom lab, we suppose that Suzuki has five plate appearances and we're interested in the number of times he gets on base. His OBP is approximately .400. This is (approximately) a binomial experiment with n = 5 and p = .4, where we define a success as getting on base. On Fathom, we can display the binomial probabilities -- we can find the chances that Suzuki gets on base 0, 1, 2, 3, 4, or 5 times.

What is interesting is that actual baseball data (that is, the number of games in which Suzuki gets on base 0 times, 1 time, 2 times, and so on) matches up well with the binomial distribution. The underlying assumptions aren't quite true. For example, the chance that Suzuki gets on base likely changes depending on the pitcher and team that he faces. But this model gives reasonable answers and helps us understand the variation in the hitting data.
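If you don't have Fathom handy, the same binomial probabilities can be computed directly from the formula P(X = x) = C(n, x) p^x (1 - p)^(n - x). Here is a small sketch using n = 5 plate appearances and p = .4, the success chance implied by an OBP of about 40%. The function name `binomial_pmf` is my own.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials with success chance p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 5, 0.4
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
for x, prob in enumerate(probs):
    print(f"P(X = {x}) = {prob:.4f}")
# P(X = 0) = 0.0778
# P(X = 1) = 0.2592
# P(X = 2) = 0.3456
# P(X = 3) = 0.2304
# P(X = 4) = 0.0768
# P(X = 5) = 0.0102
```

So getting on base exactly twice in five plate appearances is the single most likely outcome, but it happens only about a third of the time.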

Wednesday, July 23, 2008

Matching birthdays

Today we illustrated a famous problem in probability called the birthday problem. Suppose you look at the active roster of a Major League team -- what's the chance that at least two people will have matching birthdays (month and day)? The answer is surprisingly high -- over 50%. Since I wasn't sure if you believed this, I had each of you find the active roster for one MLB team. Thirteen of us found rosters, and in 9 of these rosters, we found a matching birthday. So

Prob(match) is approximately 9/13.
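We can compare our classroom estimate with the exact answer. The sketch below assumes a 25-man active roster, 365 equally likely birthdays, and no leap-day birthdays. These are my simplifying assumptions, and the function name is made up for illustration.

```python
def prob_birthday_match(n_people, n_days=365):
    """Chance that at least two of n_people share a birthday (equally likely days)."""
    p_no_match = 1.0
    for i in range(n_people):
        # Person i must avoid the i birthdays already taken.
        p_no_match *= (n_days - i) / n_days
    return 1 - p_no_match

p25 = prob_birthday_match(25)
print(f"Prob(match) for a 25-man roster: {p25:.3f}")
```

Under these assumptions the exact chance is about .57. With only 13 rosters, our observed 9/13 (about .69) is well within what chance variation would produce.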

We also talked about several more sophisticated baseball simulation games. You will be making spinners for the All-Star Baseball game. Here each batter is assumed to have a different ability, and the chances that he will get a walk, out, single, etc. are represented by slices of the spinner. A more sophisticated game, Strat-O-Matic, is based on the use of player cards (both batters and pitchers) and three dice. Strat-O-Matic is actually a very realistic game in that the games resemble actual baseball games.
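A spinner is just a random draw where each outcome's chance equals the size of its slice. Here is a minimal sketch of one spin. The slice sizes are hypothetical numbers for a made-up batter, not taken from any real All-Star Baseball disk, and the function name `spin` is my own.

```python
import random

# Hypothetical slice sizes for one batter (fractions of the circle; they sum to 1).
spinner = {
    "out": 0.66,
    "walk": 0.09,
    "single": 0.16,
    "double": 0.05,
    "triple": 0.01,
    "home run": 0.03,
}

def spin(spinner, rng=random):
    """Return one outcome, chosen with probability equal to its slice size."""
    outcomes = list(spinner)
    weights = list(spinner.values())
    return rng.choices(outcomes, weights=weights, k=1)[0]

rng = random.Random(25)
plate_appearances = [spin(spinner, rng) for _ in range(10)]
print(plate_appearances)
```

A better hitter simply gets a spinner with a smaller "out" slice and bigger slices for the good outcomes.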

We'll be covering Chapter 5, Probability distributions, tomorrow.

The Phillies had a great victory over the Mets last night -- hopefully Brett Myers will come through for the Phils tonight.

Tuesday, July 22, 2008

Determining probabilities

Today I gave you a little practice in specifying probabilities. What did we learn?

1. If the Mud Hens played the Tigers, there are two possible outcomes (Mud Hens win, Tigers win), but they wouldn't be equally likely. I think the Tigers are the stronger team, so the probability that the Tigers win would be larger than 0.5.

2. Surprisingly, the chance that there are two matching birthdays on a baseball roster is over 50% -- we'll check this out soon.

3. Coins have no memory. So if you flip five consecutive heads, the chance that the next flip is heads is still 0.5.
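If you are skeptical about the third point, a quick simulation can check it. The sketch below flips a fair coin many times, finds every flip that comes right after five or more consecutive heads, and reports how often that next flip is heads. The function name, the number of flips, and the seed are my own choices.

```python
import random

def next_flip_after_streak(n_flips=200_000, streak=5, seed=5):
    """Fraction of heads among flips that immediately follow `streak` or more heads."""
    rng = random.Random(seed)
    run = 0          # current run of consecutive heads
    followups = []   # flips that came right after a long enough run
    for _ in range(n_flips):
        flip = rng.random() < 0.5   # True = heads
        if run >= streak:
            followups.append(flip)
        run = run + 1 if flip else 0
    return sum(followups) / len(followups)

frac = next_flip_after_streak()
print(f"Chance of heads after five straight heads: {frac:.3f}")
```

The fraction should land close to 0.5, which is what we expect if the coin has no memory.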

The rest of the class was devoted to the Big League Baseball dice game. In the Fathom lab, we played both parts of the game many times. We were able to compute the probability that the red die will result in a strikeout, and compute the probability that an "in-play event" is a home run. Tomorrow, we'll see how these game probabilities match up with the probabilities of these events in real baseball.