Measures of Variability (Range, Standard Deviation, Variance)

  • 0:00 - 0:02
    PROFESSOR: In this
    video, we're going
  • 0:02 - 0:04
    to learn about measures
    of variability,
  • 0:04 - 0:06
    another form of
    descriptive statistics
  • 0:06 - 0:08
    that people often want to
    know in addition to measures
  • 0:08 - 0:09
    of central tendency.
  • 0:09 - 0:12
    But before we get to any of
    the nitty gritty details,
  • 0:12 - 0:16
    I want to motivate why we
    need measures of variability
  • 0:16 - 0:17
    with two examples.
  • 0:17 - 0:20
    So here's two different
    data sets, one on the top
  • 0:20 - 0:21
    and one on the bottom.
  • 0:21 - 0:24
    I'll just go ahead and tell you
    that the mean for both data sets
  • 0:24 - 0:25
    is 87.
  • 0:25 - 0:28
    Now if I were to just tell
    you the mean of these data,
  • 0:28 - 0:30
    I would be misleading
    you a little bit,
  • 0:30 - 0:33
    because in reality, the
    situation in each data set
  • 0:33 - 0:34
    is quite different.
  • 0:34 - 0:36
    If I were to plot
    it out, for example,
  • 0:36 - 0:37
    you would see this
    difference clearly.
  • 0:37 - 0:39
    In the top data
    set, all the scores
  • 0:39 - 0:41
    are very clustered together.
  • 0:41 - 0:42
    Everything is close.
  • 0:42 - 0:45
    But in the bottom data set,
    scores are very spread out.
  • 0:45 - 0:48
    So again, I need some way to
    quantify these differences.
  • 0:48 - 0:51
    And a measure of central
    tendency, like the mean,
  • 0:51 - 0:53
    simply can't capture that alone.
  • 0:53 - 0:55
    Here's another example.
  • 0:55 - 0:57
    Let's say you're working for
    a pharmaceutical company,
  • 0:57 - 0:58
    something like that.
  • 0:58 - 1:01
    And you need to decide between
    two different medications
  • 1:01 - 1:02
    for depression.
  • 1:02 - 1:06
    We'll call them medication
    A and medication B. So
  • 1:06 - 1:09
    let's say you did a
    study where you measured
  • 1:09 - 1:11
    how much improvement
    happened when people took one
  • 1:11 - 1:14
    over the other, and
    this is what you got.
  • 1:14 - 1:16
    So let's say over here
    that higher scores mean
  • 1:16 - 1:18
    more improvement
    and lower scores
  • 1:18 - 1:20
    mean little to no improvement.
  • 1:20 - 1:21
    Well, let's compare.
  • 1:21 - 1:23
    The means in this
    case are the same.
  • 1:23 - 1:27
    In both cases, people improved
    by about 10-ish points or so.
  • 1:27 - 1:29
    But the variability
    is very different.
  • 1:29 - 1:32
    On the left, some people
    benefited very greatly,
  • 1:32 - 1:35
    whereas others really
    didn't benefit at all.
  • 1:35 - 1:37
    But on the right, everyone
    benefits a good amount.
  • 1:37 - 1:40
    In this case, I would
    personally pick medication B
  • 1:40 - 1:42
    because it's more consistent.
  • 1:42 - 1:45
    And so this is an example of
    why knowing the variability
  • 1:45 - 1:50
    might help us to make
    some real-life decisions.
  • 1:50 - 1:54
    So in general in statistics,
    measures of variability
  • 1:54 - 1:57
    are ways to describe these
    differences statistically.
  • 1:57 - 2:00
    They describe how scores
    in a given data set
  • 2:00 - 2:02
    differ from one another.
  • 2:02 - 2:04
    And they capture things
    like how spread out
  • 2:04 - 2:05
    or how clustered
    together the points
  • 2:05 - 2:07
    are, things we've been
    looking at already.
  • 2:07 - 2:10
    So there are three that
    we're going to talk about.
  • 2:10 - 2:13
    We have the range, standard
    deviation, and variance.
  • 2:13 - 2:15
    Let's start with the range.
  • 2:15 - 2:18
    The range is nice because
    it's a really simple measure
  • 2:18 - 2:22
    of variability, of dispersion,
    of how spread out points are.
  • 2:22 - 2:25
    It can often be calculated
    in 5 or 10 seconds.
  • 2:25 - 2:27
    Here's the formula.
  • 2:27 - 2:29
    So we have the range, r.
  • 2:29 - 2:30
    Don't get confused
    later on when we
  • 2:30 - 2:35
    learn about correlations, which
    are often also described by r.
  • 2:35 - 2:36
    We'll use some
    different subscripts
  • 2:36 - 2:39
    to make that difference
    clear when the time comes.
  • 2:39 - 2:40
    But for now, range is r.
  • 2:40 - 2:44
    And then we have r
    equals h minus l.
  • 2:44 - 2:46
    h means the highest
    score in the data set,
  • 2:46 - 2:49
    l means the lowest
    score in the data set.
  • 2:49 - 2:51
    So you can see that this is
    a very simple calculation.
  • 2:51 - 2:53
    And if we go back
    to the example we
  • 2:53 - 2:55
    were working with
    a minute ago, we
  • 2:55 - 2:58
    can calculate the
    range very quickly.
  • 2:58 - 3:01
    So for the first data set,
    we have 95 minus 80.
  • 3:01 - 3:02
    So the range is 15.
  • 3:02 - 3:06
    And in the second data set,
    we have 150 minus 25,
  • 3:06 - 3:09
    giving us a much
    larger range of 125.
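The range calculation above can be sketched in Python. The two lists are hypothetical scores, invented here so that their highest and lowest values (95 and 80, 150 and 25) and their mean of 87 match the numbers quoted in the video; the actual data on the slide is not reproduced in this transcript.

```python
from statistics import mean


def value_range(scores):
    """Return r = h - l: the highest score minus the lowest score."""
    return max(scores) - min(scores)


# Hypothetical data: only the extremes and the mean come from the video.
top_scores = [80, 84, 86, 87, 88, 89, 95]
bottom_scores = [25, 60, 87, 113, 150]

for name, scores in [("top", top_scores), ("bottom", bottom_scores)]:
    print(name, "mean =", mean(scores), "range =", value_range(scores))
```

Both lists share a mean of 87, so only the range tells them apart in this summary.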
  • 3:09 - 3:13
    So in this case, I would do
    well to report both to you.
  • 3:13 - 3:16
    I'll tell you the mean and
    this measure of variability,
  • 3:16 - 3:19
    because that gives you a more
    full picture of what's going on.
  • 3:19 - 3:22
    So a mean of 87
    and a range of 15
  • 3:22 - 3:25
    describes a very
    different situation
  • 3:25 - 3:29
    compared to a mean of
    87 and a range of 125.
  • 3:29 - 3:31
    So again, it's a great
    idea for me to report both.
  • 3:31 - 3:33
    And this is what's often done.
  • 3:33 - 3:36
    A big limitation of
    the range, though,
  • 3:36 - 3:38
    is that by using it,
    even though it's simple
  • 3:38 - 3:40
    and it's pretty
    effective, you might
  • 3:40 - 3:43
    miss a little bit of the data,
    a little bit of the information
  • 3:43 - 3:44
    in your data set.
  • 3:44 - 3:46
    Let me show you an
    example to illustrate.
  • 3:46 - 3:48
    Here's a data set here.
  • 3:48 - 3:50
    Although these bars
    are quite high,
  • 3:50 - 3:54
    there's really just one
    sort of value in each bar.
  • 3:54 - 3:56
    So we have one person who
    scored a 30, one person who
  • 3:56 - 3:58
    scored a 40, and so on.
  • 3:58 - 4:00
    Now the range here is 120.
  • 4:00 - 4:03
    It's 150 minus 30.
  • 4:03 - 4:05
    But let's look at
    a second data set.
  • 4:05 - 4:07
    In this case, the
    range is still 120
  • 4:07 - 4:10
    because our highest and
    lowest values are the same.
  • 4:10 - 4:13
    But everybody is
    kind of over here,
  • 4:13 - 4:15
    and there's just a couple
    outliers beyond that.
  • 4:15 - 4:17
    So again, if I were to
    just tell you the range,
  • 4:17 - 4:19
    I might be misleading you a
    little bit because you're not
  • 4:19 - 4:22
    sure if it looks
    like this on the left
  • 4:22 - 4:24
    or if the data looks
    like this on the right.
  • 4:24 - 4:26
    And this is where standard
    deviation and variance
  • 4:26 - 4:28
    come into play.
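A rough sketch of this limitation, assuming the first plot holds one score at every multiple of 10 from 30 to 150 and using made-up values for the clustered data set (the real values are only visible in the video). It uses `statistics.pstdev` from the Python standard library, which computes a population standard deviation.

```python
from statistics import pstdev


def value_range(scores):
    """Return the highest score minus the lowest score."""
    return max(scores) - min(scores)


# Assumed from the description: one score at 30, 40, ..., 150.
evenly_spread = list(range(30, 151, 10))

# Hypothetical: most scores sit near 90, with outliers at 30 and 150.
mostly_clustered = [30, 88, 89, 90, 90, 90, 91, 92, 150]

for name, scores in [("evenly spread", evenly_spread),
                     ("mostly clustered", mostly_clustered)]:
    print(f"{name}: range = {value_range(scores)}, "
          f"standard deviation = {pstdev(scores):.1f}")
```

The two ranges are identical, while the standard deviation is smaller for the clustered data, which is the extra information the range cannot give.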
  • 4:28 - 4:31
    Standard deviation, just
    like the name suggests,
  • 4:31 - 4:34
    describes the standard
    or typical amount
  • 4:34 - 4:38
    that scores deviate from the
    mean, hence standard deviation.
  • 4:38 - 4:40
    Now we'll get into exactly
    what this looks like
  • 4:40 - 4:43
    once we learn to calculate
    standard deviation.
  • 4:43 - 4:46
    But I just want to show
    you some symbols for now.
  • 4:46 - 4:49
    So like with means, we
    have different symbols
  • 4:49 - 4:53
    to describe population standard
    deviation versus sample standard
  • 4:53 - 4:54
    deviation.
  • 4:54 - 4:57
    Population standard deviation
    is described by sigma.
  • 4:57 - 5:01
    It's this sort of O with Elvis hair, I like to think of it as,
    hair, I like to think of it as,
  • 5:01 - 5:03
    not to be confused
    with this sigma, which
  • 5:03 - 5:06
    is a capital S (unfortunately,
    they're
  • 5:06 - 5:09
    named the same thing), which
    means take the sum of.
  • 5:09 - 5:11
    We learned about
    that previously.
  • 5:11 - 5:14
    This is sigma with a little s.
  • 5:14 - 5:17
    So for a sample,
    standard deviation
  • 5:17 - 5:19
    is simply described by s.
  • 5:19 - 5:21
    So I want to take a
    step back and talk
  • 5:21 - 5:24
    about why standard
    deviations are really useful.
  • 5:24 - 5:28
    Whenever you have a normal
    curve, a normally distributed
  • 5:28 - 5:31
    set of data, which is very
    common in the world, things
  • 5:31 - 5:34
    like height, weight, and so on
    are all normally distributed,
  • 5:34 - 5:37
    standard deviations have this
    really interesting property
  • 5:37 - 5:40
    of telling you a lot of
    information about what's common
  • 5:40 - 5:42
    and what's uncommon.
  • 5:42 - 5:45
    So if we have 0, this is right
    at the mean of whatever we're
  • 5:45 - 5:46
    talking about.
  • 5:46 - 5:47
    This is the mean.
  • 5:47 - 5:50
    0 standard deviations away
    from the mean is right here.
  • 5:50 - 5:51
    You're right at the mean.
  • 5:51 - 5:54
    We can look at one standard
    deviation above the mean and one
  • 5:54 - 5:57
    standard deviation below,
    and we automatically know, just
  • 5:57 - 6:00
    because of how standard
    deviations work,
  • 6:00 - 6:03
    that 68% of people will
    fall within this range.
  • 6:03 - 6:04
    We can go beyond that.
  • 6:04 - 6:06
    We know that between
    two standard deviations
  • 6:06 - 6:10
    in either direction of the mean,
    95% of people will be contained.
  • 6:10 - 6:14
    And 3, you're getting really
    extreme, really far out, really
  • 6:14 - 6:18
    rare, 99.7% of the
    data will be contained
  • 6:18 - 6:20
    within three standard
    deviations in either direction
  • 6:20 - 6:22
    from the mean.
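The 68, 95, and 99.7 figures can be checked numerically. This is a minimal sketch that assumes a perfectly normal distribution and uses the standard relationship between the normal curve and the error function, `math.erf`; the function name `share_within` is made up for illustration.

```python
import math


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The commonly quoted 68% and 95% are rounded; the computed values are closer to 68.3% and 95.4%.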
  • 6:22 - 6:25
    To illustrate this a little bit
    more, let's talk some specifics.
  • 6:25 - 6:28
    So let's say I'm
    looking at IQ scores.
  • 6:28 - 6:30
    We know a lot about IQ scores.
  • 6:30 - 6:33
    We know, for example, the
    population mean of IQ is 100.
  • 6:33 - 6:37
    And we know that the population
    standard deviation, sigma,
  • 6:37 - 6:38
    is 15.
  • 6:38 - 6:41
    So let's go ahead and draw
    that same sort of normal curve.
  • 6:41 - 6:44
    We know that intelligence
    is normally distributed.
  • 6:44 - 6:46
    And let's take a look
    at what information
  • 6:46 - 6:48
    we have just by knowing
    standard deviation.
  • 6:48 - 6:51
    So average IQ is
    right here at 100.
  • 6:51 - 6:55
    One standard deviation
    above the mean would be 115.
  • 6:55 - 6:59
    Two standard deviations
    above the mean would be 130.
  • 6:59 - 7:02
    And three standard
    deviations would be 145.
  • 7:02 - 7:04
    And we could do the same
    in the opposite direction.
  • 7:04 - 7:08
    One standard deviation below
    the mean of intelligence is 85.
  • 7:08 - 7:10
    Two standard
    deviations below is 70.
  • 7:10 - 7:13
    And three standard deviations
    below the mean of intelligence
  • 7:13 - 7:14
    is 55.
  • 7:14 - 7:18
    So again, I automatically
    know 68% of people
  • 7:18 - 7:22
    will fall between
    an IQ of 85 and 115.
  • 7:22 - 7:26
    I also know that 95%
    of people will fall
  • 7:26 - 7:28
    between an IQ of 70 and 130.
  • 7:28 - 7:33
    And finally, that
    99.7% or so will fall
  • 7:33 - 7:36
    between an IQ of 55 and 145.
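The IQ cutoffs above are just the mean plus or minus a whole number of standard deviations. A small sketch of that arithmetic, using the mean of 100 and standard deviation of 15 from the video (the helper name `interval` is hypothetical):

```python
iq_mean = 100
iq_sd = 15

# Approximate share of a normal distribution within k standard deviations.
coverage = {1: "68%", 2: "95%", 3: "99.7%"}


def interval(k, mean=iq_mean, sd=iq_sd):
    """Return the (low, high) scores lying k standard deviations from the mean."""
    return (mean - k * sd, mean + k * sd)


for k, share in coverage.items():
    low, high = interval(k)
    print(f"about {share} of people: IQ between {low} and {high}")
```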
  • 7:36 - 7:39
    So this is great to know
    because if you tell me
  • 7:39 - 7:42
    you have an IQ of 146,
    I'm really impressed.
  • 7:42 - 7:44
    This is rare.
  • 7:44 - 7:45
    This is very extreme.
  • 7:45 - 7:50
    But if you tell me you have
    an IQ of, say, 106, something
  • 7:50 - 7:52
    like that, that's fine.
  • 7:52 - 7:53
    Good for you.
  • 7:53 - 7:54
    Not very impressed.
  • 7:54 - 7:57
    So knowing standard
    deviations helps
  • 7:57 - 8:01
    you to get this extra
    information about a data set.
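To put rough numbers on "rare" versus "not very impressive", the sketch below uses `statistics.NormalDist` (Python 3.8 or later) and assumes IQ follows an exact normal distribution with mean 100 and standard deviation 15. The helper `describe` is made up for illustration.

```python
from statistics import NormalDist

# Assumption: IQ is exactly normal with the mean and sigma quoted in the video.
iq = NormalDist(mu=100, sigma=15)


def describe(score):
    """Return (standard deviations from the mean, share of people scoring higher)."""
    sd_from_mean = (score - iq.mean) / iq.stdev
    share_above = 1 - iq.cdf(score)
    return sd_from_mean, share_above


for score in (146, 106):
    sd_from_mean, share_above = describe(score)
    print(f"IQ {score}: {sd_from_mean:.2f} standard deviations above the mean, "
          f"{share_above:.2%} of people score higher")
```

A score of 146 sits just past three standard deviations, while 106 is well inside the first one.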
  • 8:01 - 8:03
    So finally, we have variance.
  • 8:03 - 8:05
    Variance is very simple.
  • 8:05 - 8:08
    It's just the square
    of standard deviation.
  • 8:08 - 8:13
    So it's the average squared
    deviation from the mean.
  • 8:13 - 8:16
    Unfortunately, for variance,
    it doesn't get its own symbols.
  • 8:16 - 8:18
    We just take the
    symbols we already
  • 8:18 - 8:19
    have for standard deviation.
  • 8:19 - 8:21
    And we put a squared
    because it's just
  • 8:21 - 8:23
    squared standard deviation.
  • 8:23 - 8:26
    So here, for a population,
    we would call the variance
  • 8:26 - 8:28
    in a population sigma squared.
  • 8:28 - 8:34
    And for a sample, we would call
    the sample variance, s squared.
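The relationship between the two measures can be checked with Python's `statistics` module, where `pstdev` and `pvariance` are the population versions and `stdev` and `variance` are the sample versions. The scores are hypothetical.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical scores, used only to show the relationship.
scores = [80, 84, 86, 87, 88, 89, 95]

# Population: variance (sigma squared) equals the standard deviation (sigma) squared.
print(math.isclose(pvariance(scores), pstdev(scores) ** 2))

# Sample: variance (s squared) equals the standard deviation (s) squared.
print(math.isclose(variance(scores), stdev(scores) ** 2))
```

`math.isclose` is used instead of `==` because squaring a square root can introduce tiny floating-point rounding differences.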
  • 8:34 - 8:36
    So in the next
    video, we'll learn
  • 8:36 - 8:38
    how to calculate
    some of these things.
  • 8:38 - 8:40
    But I want to at least highlight
    some of the formulas you're
  • 8:40 - 8:41
    going to see.
  • 8:41 - 8:43
    So we have four
    different formulas
  • 8:43 - 8:46
    because we have standard
    deviation and variance.
  • 8:46 - 8:48
    And we have the population
    versions and the statistic
  • 8:48 - 8:50
    or sample versions.
  • 8:50 - 8:53
    So for standard deviation
    in the population,
  • 8:53 - 8:54
    this is our formula.
  • 8:54 - 8:56
    Notice we have
    sigma on the left.
  • 8:56 - 8:59
    And we have all this mess,
    which I'll get into next time.
  • 8:59 - 9:01
    One thing I'll mention is that
    for all of these formulas,
  • 9:01 - 9:06
    the numerator is called
    the sums of squares, SS.
  • 9:06 - 9:08
    And we're going to learn
    about what the sums of squares
  • 9:08 - 9:10
    really means in the next video.
  • 9:10 - 9:13
    But for now, just
    keep that in mind.
  • 9:13 - 9:15
    So for our sample
    statistic, we have this.
  • 9:15 - 9:17
    You're going to see
    an s on the left here.
  • 9:17 - 9:19
    And it's going to have
    some similarities,
  • 9:19 - 9:21
    but you're going to notice a
    difference or two that we'll
  • 9:21 - 9:22
    talk about in the next video.
  • 9:22 - 9:24
    For variance, we
    have sigma squared.
  • 9:24 - 9:30
    And for the sample statistic version
    of variance, we have s squared.
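The formulas themselves appear only on screen, so they are not captured in this transcript. For reference, the standard textbook forms are written out below; the exact notation on the slides may differ, and the symbols N (population size), n (sample size), and X-bar (sample mean) are assumptions here. In each column, SS is the sum of squared deviations from that column's own mean.

```latex
\begin{align*}
\text{Population: } SS &= \sum (X - \mu)^2, &
\text{Sample: } SS &= \sum (X - \bar{X})^2 \\
\sigma &= \sqrt{\frac{SS}{N}}, &
s &= \sqrt{\frac{SS}{n - 1}} \\
\sigma^2 &= \frac{SS}{N}, &
s^2 &= \frac{SS}{n - 1}
\end{align*}
```

In the standard-deviation formulas, SS is the numerator of the fraction under the square root.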
Title:
Measures of Variability (Range, Standard Deviation, Variance)
Video Language:
English
Duration:
09:30
