
Visual Guide to Random Forests

  • 0:00 - 0:04
  • 0:04 - 0:06
    TEACHER: One of the most
    deceptively obvious questions
  • 0:06 - 0:11
    in machine learning is: are more
    models better than fewer models?
  • 0:11 - 0:13
    The science that
    answers this question
  • 0:13 - 0:16
    is called model ensembling.
  • 0:16 - 0:19
    Model ensembling asks how
    to construct aggregations
  • 0:19 - 0:23
    of models that improve test
    accuracy while reducing
  • 0:23 - 0:27
    the costs associated with
    storing, training, and getting
  • 0:27 - 0:30
    inference from multiple models.
  • 0:30 - 0:32
    We'll explore a popular
    ensembling method
  • 0:32 - 0:34
    applied to decision trees:
  • 0:34 - 0:36
    random forests.
  • 0:36 - 0:39
    In order to illustrate random
    forests, let's take an example.
  • 0:39 - 0:43
    Imagine we're trying to predict
    what caused a wildfire given
  • 0:43 - 0:47
    its size, location, and date.
  • 0:47 - 0:49
    The basic building blocks
    of the random forest model
  • 0:49 - 0:51
    are decision trees.
  • 0:51 - 0:53
    So if you want to
    learn how they work,
  • 0:53 - 0:56
    I recommend checking out our
    previous video linked here.
  • 0:56 - 0:58
    As a quick refresher,
    decision trees
  • 0:58 - 1:02
    perform the task of
    classification or regression
  • 1:02 - 1:05
    by recursively asking simple
    true-or-false questions that
  • 1:05 - 1:09
    split the data into the
    purest possible subgroups.
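
As a rough illustration of that refresher, here is a tiny scikit-learn decision tree fit on made-up numbers; the feature names echo the wildfire example, but the data, library, and settings are illustrative assumptions rather than anything from the video.

```python
from sklearn.tree import DecisionTreeClassifier, export_text
import numpy as np

# Made-up stand-in data: two numeric features and a binary class label.
X = np.array([[1.0, 3.0], [2.0, 1.0], [3.0, 2.0], [4.0, 4.0]])
y = np.array([0, 0, 1, 1])

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

# Each printed rule is one true-or-false question that splits the data.
print(export_text(tree, feature_names=["size", "location"]))
```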
  • 1:09 - 1:11
    Now, back to random forests.
  • 1:11 - 1:14
    In this method of
    ensembling, we train
  • 1:14 - 1:17
    a bunch of decision trees,
    hence the name forest,
  • 1:17 - 1:20
    and then take a vote among
    the different trees--
  • 1:20 - 1:22
    one tree, one vote.
  • 1:22 - 1:25
    In the case of
    classification, each tree
  • 1:25 - 1:27
    spits out a class
    prediction, and then
  • 1:27 - 1:29
    the class with the most
    votes becomes the output
  • 1:29 - 1:31
    of the random forest.
  • 1:31 - 1:34
    In the case of regression,
    a simple average
  • 1:34 - 1:35
    of each individual
    tree's prediction
  • 1:35 - 1:38
    becomes the output
    of the random forest.
  • 1:38 - 1:40
    The key idea behind
    random forests
  • 1:40 - 1:42
    is that there's
    wisdom in crowds.
  • 1:42 - 1:45
    Insight drawn from a
    large group of models
  • 1:45 - 1:48
    is likely to be more accurate
    than the prediction from any one
  • 1:48 - 1:50
    model alone.
  • 1:50 - 1:52
    Sounds simple enough, right?
  • 1:52 - 1:52
    Sure.
  • 1:52 - 1:54
    But why does this even work?
  • 1:54 - 1:58
    What if all of our models
    learn the exact same thing
  • 1:58 - 2:00
    and vote for the same answer?
  • 2:00 - 2:02
    Isn't that equivalent
    to just having
  • 2:02 - 2:04
    one model make the prediction?
  • 2:04 - 2:07
    Yes, but there's
    a way to fix that.
  • 2:07 - 2:11
    First, we need to define a
    word that will help explain:
  • 2:11 - 2:12
    uncorrelatedness.
  • 2:12 - 2:16
    We need our decision trees to
    be different from each other.
  • 2:16 - 2:18
    We want them to disagree
    on what the splits are
  • 2:18 - 2:20
    and what the predictions are.
  • 2:20 - 2:24
    Uncorrelatedness is
    important for random forests.
  • 2:24 - 2:26
    A large group of
    uncorrelated trees,
  • 2:26 - 2:29
    working together in an
    ensemble, will outperform
  • 2:29 - 2:32
    any of the constituent trees.
  • 2:32 - 2:35
    In other words, the forest
    is shielded from the errors
  • 2:35 - 2:37
    of individual trees.
  • 2:37 - 2:41
    So how do we ensure our
    trees are uncorrelated?
  • 2:41 - 2:43
    There are a few different
    methods to do this.
  • 2:43 - 2:45
    As you learn these
    methods, try and see
  • 2:45 - 2:49
    if you understand what makes
    a random forest random.
  • 2:49 - 2:51
    The first method to
    ensure uncorrelatedness
  • 2:51 - 2:53
    is called bootstrapping.
  • 2:53 - 2:56
    Bootstrapping means creating
    smaller data
  • 2:56 - 2:59
    sets out of our training
    data set through sampling.
  • 2:59 - 3:02
    Now, with normal
    decision trees, we
  • 3:02 - 3:05
    feed the entire training
    data set to the tree
  • 3:05 - 3:07
    and allow it to
    generate its prediction.
  • 3:07 - 3:10
    However, with bootstrapping,
    we allow each tree
  • 3:10 - 3:13
    to randomly sample a
    subset of the training
  • 3:13 - 3:17
    data with replacement,
    resulting in different trees.
  • 3:17 - 3:20
    When we allow replacement,
    some observations
  • 3:20 - 3:24
    may be repeated in the sample.
  • 3:24 - 3:28
    In our data set, we have
    1.88 million wildfires,
  • 3:28 - 3:32
    but we're only going to show,
    say, a random 25% subset
  • 3:32 - 3:35
    of those to each of our trees.
  • 3:35 - 3:38
    As a result, these two trees
    sampled from the same data
  • 3:38 - 3:42
    set, but ended up with two
    very different training sets.
  • 3:42 - 3:45
    Using bootstrapping to
    create uncorrelated models
  • 3:45 - 3:48
    and then aggregating
    their results
  • 3:48 - 3:51
    is called bootstrap aggregating,
    or bagging for short.
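
Here is a minimal sketch of that bootstrap sampling step, using the figures from the example (1.88 million rows, a 25% sample per tree); the NumPy calls and the seed are illustrative choices, not the video's code.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_rows = 1_880_000                  # roughly the 1.88 million wildfires in the data set
sample_size = int(0.25 * n_rows)    # show each tree a random 25% subset

# Sampling *with replacement*: some rows appear more than once, others not at all,
# so each tree ends up with its own, different training set.
tree_1_rows = rng.choice(n_rows, size=sample_size, replace=True)
tree_2_rows = rng.choice(n_rows, size=sample_size, replace=True)
```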
  • 3:51 - 3:54
    The second way to introduce
    variation in our trees
  • 3:54 - 3:58
    is by shuffling which features
    each tree can split on.
  • 3:58 - 4:01
    This method is called
    feature randomness.
  • 4:01 - 4:03
    Remember, with basic
    decision trees,
  • 4:03 - 4:06
    when it's time to split
    the data on a node,
  • 4:06 - 4:08
    the tree considers
    each possible feature
  • 4:08 - 4:11
    and picks the one that leads
    to the purest subgroups.
  • 4:11 - 4:15
    However, with random forests,
    we limit the number of features
  • 4:15 - 4:18
    that each tree can even
    consider splitting on.
  • 4:18 - 4:22
    For example, consider
    the two trees shown here.
  • 4:22 - 4:25
    The first one only sees
    the location and size
  • 4:25 - 4:27
    of the wildfire, while
    the second one only
  • 4:27 - 4:30
    sees size and date.
  • 4:30 - 4:34
    As a result, the two trees
    learn very different splits.
  • 4:34 - 4:38
    Feature randomness
    encourages diverse trees.
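
A small sketch of feature randomness, assuming the three wildfire features from the example; the helper name and the subset size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
all_features = ["size", "location", "date"]   # the wildfire features from the example

# Hand each tree a random subset of the features (2 of 3 here).
# Library implementations often re-draw a subset like this at every split,
# but the effect is the same: trees see different features and learn different splits.
def random_feature_subset(features, k=2):
    return list(rng.choice(features, size=k, replace=False))

print(random_feature_subset(all_features))    # e.g. ['location', 'size']
print(random_feature_subset(all_features))    # e.g. ['size', 'date']
```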
  • 4:38 - 4:41
    Because the individual
    trees are very simple
  • 4:41 - 4:43
    and they're only trained on
    a subset of the training data
  • 4:43 - 4:46
    and feature set, training
    time is very low,
  • 4:46 - 4:50
    so we can afford to
    train thousands of trees.
  • 4:50 - 4:53
    Random forests are widely
    used in academia and industry.
  • 4:53 - 4:56
    Now that you
    understand the concept,
  • 4:56 - 4:58
    you're almost ready to
    implement a random forest model
  • 4:58 - 5:01
    to use with your own projects.
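
One possible starting point, ahead of the coding tutorial teased just below, is scikit-learn's RandomForestClassifier; the placeholder data and every hyperparameter value here are assumptions, not prescriptions from the video.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder data: swap in your own features (e.g. size, location, date)
# and labels (e.g. wildfire cause).
X = np.random.rand(1000, 3)
y = np.random.randint(0, 4, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(
    n_estimators=500,        # many cheap trees, as discussed above
    bootstrap=True,          # bagging: each tree sees a bootstrap sample
    max_samples=0.25,        # roughly the 25% subsets from the example
    max_features="sqrt",     # feature randomness at each split
    random_state=0,
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```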
  • 5:01 - 5:04
    Stay tuned to Econoscent for the
    random forest coding tutorial
  • 5:04 - 5:08
    and for a new video on yet
    another ensembling method,
  • 5:08 - 5:10
    gradient boosted trees.
  • 5:10 - 5:11