-
PROFESSOR: In this
video, we're going
-
to learn about measures
of variability,
-
another form of
descriptive statistics
-
that people often want to
know in addition to measures
-
of central tendency.
-
But before we get to any of
the nitty gritty details,
-
I want to motivate why we
need measures of variability
-
with two examples.
-
So here are two different
data sets, one on the top
-
and one on the bottom.
-
I'll just go ahead and tell you
that the mean for both data sets
-
is 87.
-
Now if I were to just tell
you the mean of these data,
-
I would be misleading
you a little bit,
-
because in reality, the
situation in each data set
-
is quite different.
-
If I were to plot
it out, for example,
-
you would see this
difference clearly.
-
In the top data
set, all the scores
-
are very clustered together.
-
Everything is close.
-
But in the bottom data set,
scores are very spread out.
-
So again, I need some way to
quantify these differences.
-
And a measure of central
tendency, like the mean,
-
simply can't capture that alone.
-
Here's another example.
-
Let's say you're working for
a pharmaceutical company,
-
something like that.
-
And you need to decide between
two different medications
-
for depression.
-
We'll call them medication
A and medication B. So
-
let's say you did a
study where you measured
-
how much improvement
happened when people took one
-
over the other, and
this is what you got.
-
So let's say over here
that higher scores mean
-
more improvement
and lower scores
-
mean little to no improvement.
-
Well, let's compare.
-
The means in this
case are the same.
-
In both cases, people improved
by about 10-ish points or so.
-
But the variability
is very different.
-
On the left, some people
benefited very greatly,
-
whereas others really
didn't benefit at all.
-
But on the right, everyone
benefits a good amount.
-
In this case, I would
personally pick medication B
-
because it's more consistent.
-
And so this is an example of
why knowing the variability
-
might help us to make
some real-life decisions.
-
So in general in statistics,
measures of variability
-
are ways to describe these
differences statistically.
-
They describe how scores
in a given data set
-
differ from one another.
-
And they capture things
like how spread out
-
or how clustered
together the points
-
are, things we've been
looking at already.
-
So there are three that
we're going to talk about.
-
We have the range, standard
deviation, and variance.
-
Let's start with the range.
-
The range is nice because
it's a really simple measure
-
of variability, of dispersion,
of how spread out points are.
-
It can often be calculated
in 5 or 10 seconds.
-
Here's the formula.
-
So we have the range, r.
-
Don't get confused
later on when we
-
learn about correlations, which
are often also described by r.
-
We'll use some
different subscripts
-
to make that difference
clear when the time comes.
-
But for now, range is r.
-
And then we have r
equals h minus l.
-
h means the highest
score in the data set,
-
l means the lowest
score in the data set.
-
So you can see that this is
a very simple calculation.
-
And if we go back
to the example we
-
were working with
a minute ago, we
-
can calculate the
range very quickly.
-
So for the first data set,
we have 95 minus 80.
-
So the range is 15.
-
And in the second data set,
we have 150 minus 25,
-
giving us a much
larger range of 125.
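
The range calculation above can be sketched in a few lines of Python. The individual scores below are hypothetical, since the lecture only shows the data sets on screen. Only the highest score, the lowest score, and the mean of 87 are taken from the lecture, and the function name `score_range` is an illustrative choice.

```python
def score_range(scores):
    """Return the range: highest score minus lowest score (r = h - l)."""
    return max(scores) - min(scores)

# Hypothetical scores: only the highest value, lowest value, and the
# mean of 87 match the lecture. Everything in between is made up.
clustered = [80, 84, 86, 87, 88, 89, 95]
spread_out = [25, 60, 80, 87, 95, 112, 150]

range_clustered = score_range(clustered)   # 95 - 80
range_spread = score_range(spread_out)     # 150 - 25

print(range_clustered, range_spread)
```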
-
So in this case, I would do
well to report both to you.
-
I'll tell you the mean and
this measure of variability,
-
because that gives you a
fuller picture of what's going on.
-
So a mean of 87
and a range of 15
-
describes a very
different situation
-
compared to a mean of
87 and a range of 125.
-
So again, it's a great
idea for me to report both.
-
And this is what's often done.
-
A big limitation of
the range, though,
-
is that by using it,
even though it's simple
-
and it's pretty
effective, you might
-
miss a little bit of the data,
a little bit of the information
-
in your data set.
-
Let me show you an
example to illustrate.
-
Here's a data set here.
-
Although these bars
are quite high,
-
there's really just one
sort of value in each bar.
-
So we have one person who
scored a 30, one person who
-
scored a 40, and so on.
-
Now the range here is 120.
-
It's 150 minus 30.
-
But let's look at
a second data set.
-
In this case, the
range is still 120
-
because our highest and
lowest values are the same.
-
But everybody is
kind of over here,
-
and there's just a couple
outliers beyond that.
-
So again, if I were to
just tell you the range,
-
I might be misleading you a
little bit because you're not
-
sure if it looks
like this on the left
-
or if the data looks
like this on the right.
-
And this is where standard
deviation and variance
-
come into play.
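
To make that limitation concrete, here is a minimal sketch with two hypothetical data sets that share a range of 120 but are shaped very differently. The second data set is invented for illustration (the lecture only shows it as a plot), and `statistics.pstdev` is used on the assumption that the population formula is the one wanted here.

```python
import statistics

# One person at each of 30, 40, ..., 150, as in the first plot.
evenly_spread = list(range(30, 151, 10))

# Hypothetical second data set: almost everyone at the same score,
# plus one low and one high outlier.
clustered_middle = [30] + [90] * 11 + [150]

for name, data in [("even", evenly_spread), ("clustered", clustered_middle)]:
    data_range = max(data) - min(data)
    sd = statistics.pstdev(data)  # population standard deviation
    print(f"{name}: range = {data_range}, standard deviation = {sd:.1f}")
```

Both data sets report the same range, but the standard deviation is smaller for the one where most scores sit close to the mean.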
-
Standard deviation, just
like the name suggests,
-
describes the standard
or typical amount
-
that scores deviate from the
mean, hence standard deviation.
-
Now we'll get into exactly
what this looks like
-
once we learn to calculate
standard deviation.
-
But I just want to show
you some symbols for now.
-
So like with means, we
have different symbols
-
to describe population standard
deviation versus sample standard
-
deviation.
-
Population standard deviation
is described by sigma.
-
It's this sort of O with Elvis
hair, I like to think of it as,
-
not to be confused
with this sigma, which
-
is a capital S and
means take the sum of.
-
Unfortunately, they're
named the same thing.
-
We learned about
that previously.
-
This is sigma with a little s.
-
So for a sample,
standard deviation
-
is simply described by s.
-
So I want to take a
step back and talk
-
about why standard
deviations are really useful.
-
Whenever you have a normal
curve, a normally distributed
-
set of data, which is very
common in the world, things
-
like height, weight, and so on
are all normally distributed,
-
standard deviations have this
really interesting property
-
of telling you a lot of
information about what's common
-
and what's uncommon.
-
So if we have 0, this is right
at the mean of whatever we're
-
talking about.
-
This is the mean.
-
0 standard deviations away
from the mean is right here.
-
You're right at the mean.
-
We can look at one standard
deviation above the mean and one
-
standard deviation below,
and we automatically know, just
-
because of how standard
deviations work,
-
that 68% of people will
fall within this range.
-
We can go beyond that.
-
We know that between
two standard deviations
-
in either direction of the mean,
95% of people will be contained.
-
And at 3, you're getting really
extreme, really far out, really
-
rare: 99.7% of the
data will be contained
-
within three standard
deviations in either direction
-
from the mean.
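
The 68%, 95%, and 99.7% figures can be checked numerically. This is a small sketch that assumes a perfectly normal distribution and uses `NormalDist` from Python's standard library. The helper name `share_within` is an illustrative choice.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

The exact proportions are slightly different from the rounded figures, roughly 68.3%, 95.4%, and 99.7%.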
-
To illustrate this a little bit
more, let's talk some specifics.
-
So let's say I'm
looking at IQ scores.
-
We know a lot about IQ scores.
-
We know, for example, that the
population mean of IQ is 100.
-
And we know that the population
standard deviation, sigma,
-
is 15.
-
So let's go ahead and draw
that same sort of normal curve.
-
We know that intelligence
is normally distributed.
-
And let's take a look
at what information
-
we have just by knowing
standard deviation.
-
So average IQ is
right here at 100.
-
One standard deviation
above the mean would be 115.
-
Two standard deviations
above the mean would be 130.
-
And three standard
deviations would be 145.
-
And we could do the same
in the opposite direction.
-
One standard deviation below
the mean of intelligence is 85.
-
Two standard
deviations below is 70.
-
And three standard deviations
below the mean of intelligence
-
is 55.
-
So again, I automatically
know 68% of people
-
will fall between
an IQ of 85 and 115.
-
I also know that 95%
of people will fall
-
between an IQ of 70 and 130.
-
And finally, that
99.7% or so will fall
-
between an IQ of 55 and 145.
-
So this is great to know
because if you tell me
-
you have an IQ of 146,
I'm really impressed.
-
This is rare.
-
This is very extreme.
-
But if you tell me you have
an IQ of, say, 106, something
-
like that, that's fine.
-
Good for you.
-
Not very impressed.
-
So knowing standard
deviations helps
-
you to get this extra
information about a data set.
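
The IQ bands and the two example scores can be reproduced with simple arithmetic. This sketch uses the population mean of 100 and standard deviation of 15 from the lecture. The helper names `band` and `sds_from_mean` are hypothetical, chosen only for illustration.

```python
IQ_MEAN = 100  # population mean from the lecture
IQ_SD = 15     # population standard deviation (sigma) from the lecture

def band(k, mean=IQ_MEAN, sd=IQ_SD):
    """Return the (low, high) scores k standard deviations from the mean."""
    return (mean - k * sd, mean + k * sd)

def sds_from_mean(score, mean=IQ_MEAN, sd=IQ_SD):
    """How many standard deviations a score sits above (+) or below (-) the mean."""
    return (score - mean) / sd

for k in (1, 2, 3):
    low, high = band(k)
    print(f"{k} SD: {low} to {high}")

# The two example scores from the lecture.
print(sds_from_mean(146))  # a little more than 3 SDs above the mean
print(sds_from_mean(106))  # less than half an SD above the mean
```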
-
So finally, we have variance.
-
Variance is very simple.
-
It's just the square
of standard deviation.
-
So it's the average squared
deviation from the mean.
-
Unfortunately, variance
doesn't get its own symbols.
-
We just take the
symbols we already
-
have for standard deviation.
-
And we put a squared
because it's just
-
squared standard deviation.
-
So here, for a population,
we would call the variance
-
in a population sigma squared.
-
And for a sample, we would call
the sample variance, s squared.
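
The relationship between variance and standard deviation can be checked directly. This is a minimal sketch using Python's `statistics` module on hypothetical scores. The `p`-prefixed functions are the population versions and the others are the sample versions.

```python
import math
import statistics

# Hypothetical scores, used only to illustrate the relationship.
scores = [80, 84, 86, 87, 88, 89, 95]

pop_sd = statistics.pstdev(scores)      # sigma
pop_var = statistics.pvariance(scores)  # sigma squared
samp_sd = statistics.stdev(scores)      # s
samp_var = statistics.variance(scores)  # s squared

# Variance is the square of the matching standard deviation.
print(math.isclose(pop_sd ** 2, pop_var))
print(math.isclose(samp_sd ** 2, samp_var))
```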
-
So in the next
video, we'll learn
-
how to calculate
some of these things.
-
But I want to at least highlight
some of the formulas you're
-
going to see.
-
So we have four
different formulas
-
because we have standard
deviation and variance.
-
And we have the population
versions and the sample
-
statistic versions.
-
So for standard deviation
in the population,
-
this is our formula.
-
Notice we have
sigma on the left.
-
And we have all this mess,
which I'll get into next time.
-
One thing I'll mention is that
for all of these formulas,
-
the numerator is called
the sums of squares, SS.
-
And we're going to learn
about what the sums of squares
-
really means in the next video.
-
But for now, just
keep that in mind.
-
So for our sample
statistic, we have this.
-
You're going to see
an s on the left here.
-
And it's going to have
some similarities,
-
but you're going to notice a
difference or two that we'll
-
talk about in the next video.
-
For variance, we
have sigma squared.
-
And for the sample statistic version
of variance, we have s squared.
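
For reference, the four formulas are usually written as below. The slides themselves are not part of this transcript, so the notation is an assumption: X for a score, mu and N for the population mean and size, M and n for the sample mean and size, and SS for the sums of squares in the numerator.

```latex
% Sums of squares: the numerator shared by all four formulas
SS_{\text{population}} = \sum (X - \mu)^2
\qquad
SS_{\text{sample}} = \sum (X - M)^2

% Standard deviation
\sigma = \sqrt{\frac{SS}{N}}
\qquad
s = \sqrt{\frac{SS}{n - 1}}

% Variance
\sigma^2 = \frac{SS}{N}
\qquad
s^2 = \frac{SS}{n - 1}
```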