-
-
TEACHER: One of the most
deceptively obvious questions
-
in machine learning is: are more
models better than fewer models?
-
The science that
answers this question
-
is called model ensembling.
-
Model ensembling asks how
to construct aggregations
-
of models that improve test
accuracy while reducing
-
the costs of storing,
training, and running
-
inference on multiple models.
-
We'll explore a popular
ensembling method
-
applied to decision trees.
-
Random forests.
-
In order to illustrate random
forests, let's take an example.
-
Imagine we're trying to predict
what caused a wildfire given
-
its size, location, and date.
-
The basic building blocks
of the random forest model
-
are decision trees.
-
So if you want to
learn how they work,
-
I recommend checking out our
previous video linked here.
-
As a quick refresher,
decision trees
-
perform the task of
classification or regression
-
by recursively asking simple
true-or-false questions that
-
split the data into the
purest possible subgroups.
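-
To make that concrete, here's a minimal Python sketch of a single
decision tree; the wildfire rows and cause labels below are made up
purely for illustration.
    from sklearn.tree import DecisionTreeClassifier

    # Tiny, hypothetical wildfire table.
    # Columns: [size_in_acres, encoded_location, day_of_year].
    X = [[120.0, 3, 192],
         [0.5, 1, 45],
         [800.0, 3, 210],
         [2.0, 2, 60]]
    y = ["lightning", "campfire", "lightning", "debris burning"]

    tree = DecisionTreeClassifier(max_depth=2)   # a few true/false splits
    tree.fit(X, y)
    print(tree.predict([[300.0, 3, 200]]))       # predicted cause for a new fire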
-
Now, back to random forests.
-
In this method of
ensembling, we train
-
a bunch of decision trees,
hence the name forest,
-
and then take a vote among
the different trees--
-
one tree, one vote.
-
In the case of
classification, each tree
-
spits out a class
prediction, and then
-
the class with the most
votes becomes the output
-
of the random forest.
-
In the case of regression,
a simple average
-
of each individual
tree's prediction
-
becomes the output
of the random forest.
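-
Here's a rough Python sketch of that aggregation step; the tree
predictions below are invented just to show the vote and the average.
    from collections import Counter

    # Classification: one tree, one vote; the majority class wins.
    votes = ["lightning", "campfire", "lightning", "lightning", "arson"]
    print(Counter(votes).most_common(1)[0][0])    # "lightning"

    # Regression: the forest's output is the mean of the tree outputs.
    tree_outputs = [12.1, 9.8, 11.4, 10.0, 10.7]
    print(sum(tree_outputs) / len(tree_outputs))  # 10.8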
-
The key idea behind
random forests
-
is that there's
wisdom in crowds.
-
Insight drawn from a
large group of models
-
is likely to be more accurate
than the prediction from any one
-
model alone.
-
Sounds simple enough, right?
-
Sure.
-
But why does this even work?
-
What if all of our models
learn the exact same thing
-
and vote for the same answer?
-
Isn't that equivalent
to just having
-
one model make the prediction?
-
Yes, but there's
a way to fix that.
-
First, we need to define a
word that will help explain:
-
uncorrelatedness.
-
We need our decision trees to
be different from each other.
-
We want them to disagree
on what the splits are
-
and what the predictions are.
-
Uncorrelatedness is
important for random forests.
-
A large group of
uncorrelated trees,
-
working together in an
ensemble, will outperform
-
any of the constituent trees.
-
In other words, the forest
is shielded from the errors
-
of individual trees.
-
So how do we ensure our
trees are uncorrelated?
-
There are a few different
methods to do this.
-
As you learn these
methods, see if you can
-
figure out what makes
a random forest random.
-
The first method to
ensure uncorrelatedness
-
is called bootstrapping.
-
Bootstrapping means creating
smaller data sets
-
out of our training
data set through sampling.
-
Now, with normal
decision trees, we
-
feed the entire training
data set to the tree
-
and allow it to
learn its splits.
-
However, with bootstrapping,
we allow each tree
-
to randomly sample a
subset of the training
-
data with replacement,
resulting in different trees.
-
When we allow replacement,
some observations
-
may be repeated in the sample.
-
In our data set, we have
1.88 million wildfires,
-
but we're only going to show,
say, a random 25% subset
-
of those to each of our trees.
-
As a result, these two trees
sampled from the same data
-
set, but ended up with two
very different training sets.
-
Using bootstrapping to
create uncorrelated models
-
and then aggregating
their results
-
is called bootstrap aggregating,
or bagging for short.
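-
As a rough Python sketch of bootstrapping, assuming the 1.88 million
rows from our wildfire example and a 25% sample per tree:
    import numpy as np

    rng = np.random.default_rng(seed=0)
    n_rows = 1_880_000          # full wildfire training set
    sample_size = n_rows // 4   # roughly 25% shown to each tree

    def bootstrap_indices():
        # Sampling WITH replacement: some rows appear more than once,
        # others not at all, so every tree sees a different training set.
        return rng.integers(0, n_rows, size=sample_size)

    tree_1_rows = bootstrap_indices()
    tree_2_rows = bootstrap_indices()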
-
The second way to introduce
variation in our trees
-
is by randomizing which features
each tree can split on.
-
This method is called
feature randomness.
-
Remember, with basic
decision trees,
-
when it's time to split
the data at a node,
-
the tree considers
each possible feature
-
and picks the one that leads
to the purest subgroups.
-
However, with random forests,
we limit the number of features
-
that each tree can even
consider splitting on.
-
For example, consider
the two trees shown here.
-
The first one only sees
the location and size
-
of the wildfire, while
the second one only
-
sees size and date.
-
As a result, the two trees
learn very different splits.
-
Feature randomness
encourages diverse trees.
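-
Here's a small Python sketch of that idea, with the size, location,
and date features standing in as hypothetical column names.
    import random

    features = ["size", "location", "date"]

    def features_for_tree(k=2):
        # Each tree may only consider splitting on k randomly chosen features.
        return random.sample(features, k)

    print(features_for_tree())   # e.g. ['location', 'size']
    print(features_for_tree())   # e.g. ['size', 'date']
In libraries like scikit-learn, the same idea is usually applied at
every split rather than once per tree, via the max_features parameter.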
-
Because the individual
trees are very simple,
-
and they're only trained on
a subset of the training data
-
and feature set, training
time is very low,
-
so we can afford to
train thousands of trees.
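-
Putting the pieces together, here's a rough sketch using scikit-learn's
RandomForestClassifier on stand-in data; a real run would use the
encoded size, location, and date of each fire and its recorded cause.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(seed=0)
    X = rng.random((1000, 3))            # stand-in for [size, location, date]
    y = rng.integers(0, 4, size=1000)    # stand-in for four cause labels

    forest = RandomForestClassifier(
        n_estimators=500,       # lots of cheap trees
        bootstrap=True,         # bagging: sample rows with replacement
        max_samples=0.25,       # roughly 25% of the rows per tree
        max_features="sqrt",    # feature randomness at every split
    )
    forest.fit(X, y)
    print(forest.predict(X[:5]))         # majority vote across the 500 trees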
-
Random forests are widely
used in academia and industry.
-
Now that you
understand the concept,
-
you're almost ready to
implement a random forest model
-
to use with your own projects.
-
Stay tuned to Econoscent for the
random forest coding tutorial
-
and for a new video on yet
another ensembling method,
-
gradient boosted trees.
-