Math 138 - Statistics
Thinking about Averages and Medians on Histograms
It's important to understand how the concepts of average and median relate to histograms. Here's a review with an example that highlights the difference:
The average of a data set can be viewed as the point at which the histogram balances, and by balances, we mean as if the blocks of the histogram were blocks of concrete on an actual see-saw.
That is, the average is the point where we would put the fulcrum -- the pivot of the see-saw -- to get the two sides to balance.
For example, let's estimate the average of the histogram below:
![[example histogram]](avg-med-histogram_files/2-block-histogram.gif)
It's hard to find the balance point exactly, but it is easy to see that 20 is too low.
The two blocks do in fact have the same area (the block on the right is
half as tall, but twice as wide, as the block on the left), but equal
area is not how we find the average, or balance point.
If we try to make the histogram balance (as with a see-saw) at 20,
we have the same amount of "weight" on either side of our
proposed fulcrum, but the weight on the right is spread out farther from
the fulcrum, giving it more leverage.
Thus the right side of the histogram would fall:
![[sample histogram tipping to right]](avg-med-histogram_files/2-block-histogramtip.GIF)
"Clunk!"
So we need to move the fulcrum to the right to get it to balance, something like this:
![[a histogram balanced on a fulcrum]](avg-med-histogram_files/2-block-histogrambalanced.GIF)
We say "something like" because, again, it's hard to visually
estimate the exact point where the average, or balance point, is.
The average here is somewhere around 25, definitely more than 20.
Any estimate from around 22 to 30 is probably reasonable, but do read
what follows as well.
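To check the see-saw reasoning with numbers, here is a minimal Python sketch. The block edges and heights are assumptions (the text does not state the picture's axis values); they are chosen only to match the description: equal areas, with the right block half as tall and twice as wide, meeting at 20. The function names `net_torque` and `balance_point` are made up for this illustration.

```python
# Assumed block positions (left edge, right edge, height); the picture's actual
# axis values are not stated in the text, so these numbers are hypothetical.
blocks = [
    (0, 20, 2.0),   # left block
    (20, 60, 1.0),  # right block: half as tall, twice as wide, same area
]

def net_torque(blocks, fulcrum):
    """Total turning effect about the fulcrum; positive means the right side falls."""
    total = 0.0
    for left, right, height in blocks:
        area = (right - left) * height      # the block's "weight"
        center = (left + right) / 2         # where that weight effectively sits
        total += area * (center - fulcrum)  # weight times signed distance
    return total

def balance_point(blocks):
    """Fulcrum position where the net torque is zero (area-weighted mean of centers)."""
    total_area = sum((r - l) * h for l, r, h in blocks)
    moment = sum((r - l) * h * (l + r) / 2 for l, r, h in blocks)
    return moment / total_area

print(net_torque(blocks, 20))   # positive, so the right side falls at 20
print(balance_point(blocks))    # 25.0 for these assumed blocks
```

Different assumed edges give a different balance point. For instance, blocks running from 10 to 20 and from 20 to 40 would balance at 22.5, which is still above 20.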
The median of a data set is the point at which half the area is to the right, and half the area is to the left.
For our sample histogram above, the median is exactly 20, because we
said the two blocks have the same area, so exactly 50% of the
histogram's area is above 20, and 50% is below.
![[histogram showing median]](avg-med-histogram_files/2-block-histogrammedian.GIF)
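The half-the-area rule can also be sketched in Python. As before, the block edges and heights are assumed values that merely fit the description of the two-block histogram, and `histogram_median` is a hypothetical helper name, not something from the course materials.

```python
# Assumed block positions (left edge, right edge, height); hypothetical values
# chosen so the two blocks have equal area and meet at 20.
blocks = [
    (0, 20, 2.0),   # left block
    (20, 60, 1.0),  # right block: half as tall, twice as wide
]

def histogram_median(blocks):
    """Point with half the total area on each side.

    Expects non-overlapping blocks listed from left to right.
    """
    total_area = sum((r - l) * h for l, r, h in blocks)
    target = total_area / 2
    accumulated = 0.0
    for left, right, height in blocks:
        area = (right - left) * height
        if area > 0 and accumulated + area >= target:
            # Step into this block just far enough to reach half the area.
            return left + (target - accumulated) / height
        accumulated += area
    raise ValueError("histogram has no area")

print(histogram_median(blocks))  # 20.0 for these assumed blocks
```

Because the left block alone holds exactly half the area, the running total reaches the halfway mark right at the block's right edge, 20.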
Now the last example, by chance, made the median easy to find, but that's usually not the case. Usually the median is also hard to visually estimate, though perhaps not as hard as estimating the average.
For example, consider the histogram below:
![[another histogram]](avg-med-histogram_files/anotherhistogramexample.gif)
It's hard to say precisely where the average or the median is for this histogram, though we can say some things:
- The average is definitely not 38 (the midpoint of the scale)
because the histogram won't balance there: there's more weight (area,
actually) on the left, and it's just as far from 38 as the weight on the
right, so it has just as much leverage. ("Clunk!") So the average is lower than 38.
- The median is definitely not 38 because there is more area to the
left of 38 than to the right. The median is lower than 38.
- Because of the tail-pulling-the-average principle from class (a long tail pulls the average more than it pulls the median),
we can say that the right tail will pull the average to the right farther than it pulls the median (but not past 38!).
- So we can definitely say that the average is higher than the median, but both are below 38.
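The conclusions in the list above can be tried out on made-up numbers. The sketch below uses a hypothetical right-tailed histogram on a scale from 18 to 58 (midpoint 38); these blocks are not read off the picture, they are assumptions chosen only to have more area on the left and a tail on the right. The helper names are likewise invented for illustration.

```python
# Hypothetical right-tailed histogram: (left edge, right edge, height).
# The scale runs from 18 to 58, so its midpoint is 38. These values are
# assumptions for illustration, not measurements from the figure.
blocks = [
    (18, 28, 4.0),
    (28, 38, 3.0),
    (38, 48, 2.0),
    (48, 58, 1.0),
]

def histogram_average(blocks):
    """Balance point: area-weighted mean of the block centers."""
    total_area = sum((r - l) * h for l, r, h in blocks)
    moment = sum((r - l) * h * (l + r) / 2 for l, r, h in blocks)
    return moment / total_area

def histogram_median(blocks):
    """Point with half the total area on each side (blocks sorted left to right)."""
    total_area = sum((r - l) * h for l, r, h in blocks)
    target = total_area / 2
    accumulated = 0.0
    for left, right, height in blocks:
        area = (right - left) * height
        if area > 0 and accumulated + area >= target:
            return left + (target - accumulated) / height
        accumulated += area
    raise ValueError("histogram has no area")

average = histogram_average(blocks)
median = histogram_median(blocks)
print(average, median)
print(median < average < 38)  # the ordering argued for in the list above
```

For these assumed blocks the average comes out to 33 and the median to about 31.3, so the average is higher than the median and both are below 38.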
Last Modified Sept. 14, 2011.
Prof. Janeba's Home Page | Send comments or questions to: mjaneba willamette.edu | Department of Mathematics | Willamette University Home Page