Okay. In this video, we'll discuss how to implement linear regression in Splunk MLTK. In my previous video, we saw how to install Splunk MLTK and its related packages. And if you remember, when I was discussing the core machine learning algorithm, I also introduced the data set we'll be using for our linear regression modeling.
That's the graduate admission dataset, where for various students we have their GRE score, TOEFL score, university rating, statement of purpose rating, reference rating, CGPA, and whether they have done research or not. Based on all these fields, we will try to predict the chance of admit. So we will implement linear regression for this data and see how well the model fits it.

To implement linear regression, you have to go to the Splunk Machine Learning Toolkit. As I stated before, the landing page of the Machine Learning Toolkit app is this Showcase dashboard, which has a lot of examples based on the different machine learning algorithms Splunk supports. To implement machine learning on your own data set, you need to come to the Experiments tab. If you do not have any other models, or if it is the first time you are coming to this dashboard, this will be the default view. If you have already experimented with different models, the view will be slightly different, which we'll see later. In linear regression we are trying to predict a numeric field, so we will go over here, to Predict Numeric Fields.
We're clicking over here. Now it is asking me for an experiment title and a description. I'll give the experiment title as graduate admission prediction. You should give some meaningful description as well. Then I'll click on Create. Now this view comes up. If you see, we have two tabs here, Experiment Settings and Experiment History. Initially the experiment history will be blank; there is nothing over here. It will be updated based on the experiment settings, which we will see later.
Now, the first thing it is asking me for is a search. So let me show you the data. I have already indexed this data in my main index, so I'll just write the query index=main and table all my different features and the chance of admit. So this is my data set. I will be using this data set for training, but not the full data set, not all the 500 records. Some of the data I will use for training, and the rest I will use for prediction, just to see how my model is working. So I'll give this query over here and click on search. By default, it is showing me the initial data preview.

Now, let's go to the next one. If you see, there are a lot of preprocessing steps over here. In machine learning, when you train a model, there may be some need to preprocess the data so that you reduce a lot of noise in it. There are a lot of preprocessing algorithms present here. When we discuss those algorithms, we'll come back to this page and work on it.
For now, I will not be doing any kind of preprocessing because this data is clean enough. Now, the algorithm I will be choosing is linear regression. There are a lot of regression algorithms, but so far we have studied only linear regression, and that is what we'll implement in this video, so I will choose Linear Regression over here.

Next is the field to predict, that is, which field you want to predict. As I am predicting my chance of admit, I will choose that. Then the fields to use for predicting. Here you are basically choosing your features, and I will choose all my columns. This is where the concept of simple linear regression and multiple linear regression comes up. If I choose a single feature, it becomes a simple linear regression. If I choose multiple features, it becomes a multiple linear regression. For now, I will choose all of them.

Now here you see the split for training. What is happening here is that you are splitting the whole data set between a training and a test data set. Currently it is 50 percent, 50 percent. That means the first 50 percent of the data will be used for training and the remaining 50 percent will be used for testing. I'll slide this one and keep 70 and 30.

Now, fit intercept. If you remember from my machine learning video, we not only have the slope value for each and every feature, we also have an intercept, the y-axis intercept. With this option, you are choosing whether your model should include an implicit intercept term or not.

In the notes, you can give some meaningful notes, maybe which fields you are using for prediction, something that will be useful later when we look at the history of the model. So I will say, using all the features.

After all that is done, you need to click on Fit Model. Behind the scenes, it runs a Splunk custom command which basically implemented [inaudible]. Using that command, it is trying to come up with the equation of the line, which we discussed before. And if you remember from my multiple linear regression video, we came up with a linear algebra solution there, with matrix inversion and matrix transpose. Behind the scenes it is doing the same thing over here.
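The settings described above (a 70/30 split, an optional intercept, and the matrix solution recalled from the earlier video) can be sketched in plain Python. This is only an illustration under stated assumptions: the function names, the `seed` parameter, and the ten data rows are made up for the example, and the code does not show how the MLTK command is actually implemented. With `seed=None` the split keeps the row order, matching the "first 70 percent" description in the video.

```python
import random

def split_rows(rows, train_fraction, seed=None):
    """Split rows into (train, test). With seed=None the order is kept,
    so the first `train_fraction` of the rows becomes the training set."""
    rows = list(rows)
    if seed is not None:
        random.Random(seed).shuffle(rows)  # optional shuffled split (an assumption)
    cut = int(round(len(rows) * train_fraction))
    return rows[:cut], rows[cut:]

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.
    Assumes the matrix is non-singular (no duplicated or constant features)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_linear_regression(features, targets, fit_intercept=True):
    """Least-squares fit via the normal equations (X^T X) beta = X^T y.
    Returns [beta_0, beta_1, ..., beta_p]; beta_0 is 0.0 without an intercept."""
    x = [([1.0] if fit_intercept else []) + list(row) for row in features]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, targets)) for i in range(k)]
    beta = solve(xtx, xty)
    return beta if fit_intercept else [0.0] + beta

def predict(beta, row):
    """Intercept plus the sum of coefficient * feature value."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))

# Made-up rows (cgpa, research, chance_of_admit), generated from
# chance = -0.2 + 0.1 * cgpa + 0.05 * research so the fit is easy to check.
data = [
    (7.0, 0, 0.50), (7.5, 1, 0.60), (8.0, 0, 0.60), (8.5, 1, 0.70),
    (9.0, 1, 0.75), (9.5, 0, 0.75), (8.0, 1, 0.65), (9.0, 0, 0.70),
    (7.0, 1, 0.55), (9.5, 1, 0.80),
]
train, test = split_rows(data, 0.7)
beta = fit_linear_regression([(c, r) for c, r, _ in train], [y for _, _, y in train])
for cgpa, research, actual in test:
    print(actual, round(predict(beta, (cgpa, research)), 3))
```

Passing a single feature column gives simple linear regression; passing several gives multiple linear regression, with the same code path.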
So now, if you see, the result came up after clicking on Fit Model. Apart from our own data, it has added two new columns over here: the predicted chance of admit and the residual. The predicted chance of admit is the actual prediction made on the data. If you see the first row, the actual chance of admit is 0.73, that means 73 percent, and the predicted value was 0.70, that means 70 percent. The residual column is the difference between the actual chance of admit and the predicted chance of admit. So this is the kind of visualization it came up with after fitting the model.
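As a small worked example of the residual column, here is a sketch that computes actual minus predicted for each row. The field names and the second row are hypothetical; only the 0.73 and 0.70 values come from the video.

```python
def add_residuals(rows, actual_key, predicted_key):
    """Return copies of the rows with an extra 'residual' = actual - predicted."""
    out = []
    for row in rows:
        new_row = dict(row)
        new_row["residual"] = row[actual_key] - row[predicted_key]
        out.append(new_row)
    return out

rows = [
    {"chance_of_admit": 0.73, "predicted": 0.70},  # the first row read out in the video
    {"chance_of_admit": 0.65, "predicted": 0.68},  # made-up row where the model over-predicts
]
with_residuals = add_residuals(rows, "chance_of_admit", "predicted")
for row in with_residuals:
    print(row["chance_of_admit"], row["predicted"], round(row["residual"], 2))
```

A positive residual means the model predicted too low for that student, and a negative one means it predicted too high.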
It also shows five to six other charts over here. Let us discuss them one by one. The first chart shows me the actual versus predicted line chart. The chance of admit, the blue graph, is the actual one, and the predicted chance of admit, the yellow one, is the prediction. Looking at this, we can at least see that this model is an okay fit for this data. Somewhere it is lagging over here, if you see, but overall it is fitting well. Now the residual chart: whatever residuals you are seeing over here, it is showing them as a line chart. The closer this chart is to zero, the better the model is fitting. But if you look at the latter part of this one, the residuals are larger, because the points are more spread out, further from the zero line. And the same thing is reflected over here as well: the model has some kind of lag over here, right?
So this is the kind of analysis you can do from there, of how the model is fitting your data. And this graph is showing me the scatter plot of the actual and the predicted values. Here you can basically see how the line is fitting your data. Now, it also provides a residual histogram, so let us understand this one as well. We have the zero line over here, if you see. It basically shows, for each and every residual value, how many counts there are. If you just think about it, if the residual is zero for all my data points, that's the ideal scenario, right? That means I am predicting the [inaudible], right?
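The residual histogram described here is a count of residuals per bin. A minimal sketch follows, assuming a bin width of 0.05 and made-up residual values; the dashboard's actual binning is not stated in the video.

```python
from collections import Counter

def residual_histogram(residuals, bin_width=0.05):
    """Count residuals per bin; each bin is labelled by its centre value."""
    counts = Counter()
    for r in residuals:
        centre = round(r / bin_width) * bin_width  # snap to the nearest bin centre
        counts[round(centre, 10)] += 1             # tidy floating-point noise in the label
    return dict(sorted(counts.items()))

# Made-up residuals for illustration
residuals = [0.01, -0.02, 0.00, 0.03, -0.01, 0.06, -0.12, 0.02]
hist = residual_histogram(residuals)
for centre, count in hist.items():
    print(f"{centre:+.2f} {'#' * count}")
```

A tall bar at the zero bin with short bars elsewhere is the pattern to look for when judging the fit.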
So from this histogram, if you see where the residual error equals zero, the sample count is 24 [inaudible], right? If more and more samples are very close to this zero, that means my model is doing well; it is a good fit. And if it is more spread out, that means if we have more tall bars over here, away from zero, then the model is not a good fit for that data. This is the kind of interpretation you can do from this diagram. Now there are another two things over here, called the R squared statistic and the root mean square error. These two are measures of how accurate the model is. I'll be discussing these measurements in detail in a separate video.
There we will discuss the R squared statistic, the root mean square error, and also some other ways to determine how accurate the model is. Like bias and variance, there are a lot of other measurements as well, which we'll discuss in detail over there. But for now, just try to remember that this is a measurement of fit. For R squared, I'd just say we [inaudible] can think of it like this: the closer it is to 1, the better the fit. Something like this, okay?
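Until the dedicated video, the two measures can be sketched from their standard textbook definitions: R squared is one minus the ratio of the residual sum of squares to the total sum of squares, and RMSE is the square root of the mean squared residual. The numbers below are made up for illustration.

```python
import math

def r_squared(actual, predicted):
    """1 - SS_res / SS_tot; equals 1.0 when every prediction is exact."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def rmse(actual, predicted):
    """Root mean square error, in the same units as the target field."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up values for illustration
actual = [0.5, 0.6, 0.7, 0.8, 0.9]
predicted = [0.52, 0.58, 0.71, 0.83, 0.86]
print(round(r_squared(actual, predicted), 3), round(rmse(actual, predicted), 4))
```

R squared closer to 1 and RMSE closer to 0 both indicate a tighter fit on the rows they were computed on.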
So we will see how best to judge a model based on that. But even for the R squared statistic, it all depends on the context, the field in which you are solving the problem and implementing linear regression. We'll discuss that as well in future. And now, if you see the last one, it is showing me the model parameters. If you remember the big equation we wrote over there [inaudible], right? So let me open Bamboo Paper here. If you remember, when we talked about multiple linear regression, we started our discussion with a big equation. So let me go back over there.
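The equation shown on screen is not captured in the transcript. In its standard form, and assuming the usual notation (p features x_1 to x_p, a design matrix X with a leading column of ones, and a target vector y), the multiple linear regression model and its least-squares solution are:

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p

\hat{\beta} = (X^{\top} X)^{-1} X^{\top} y,
\qquad
\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_p)^{\top}
```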
Yes, so this one, right? So [inaudible] beta 1, beta 2, up to beta p are the slope values, the coefficients of each and every feature, and beta 0 is my intercept. And what we did at the end of the day was come up with a big equation to determine this whole beta vector. It is the same thing that is represented over here. It is basically giving me the coefficient value for each and every feature, and the intercept value as well. If you see, this is my beta 0, and my beta 1 to beta p are these other guys. Now, if you look closely, some of the coefficients have very large values and some have very small values. The way to interpret a coefficient is how much it is influencing the end result.
To understand that, let us look at this. Let's say I have a variable called 'x' and I write something like x = 0.9y. What do I mean by this equation, 0.9 into 'y'? If I give y equal to 1, my 'x' becomes 0.9. So what do we mean by that? A one-unit change in 'y' is basically a 0.9-unit change in 'x'. That is how 'y' is influencing 'x'. This is how we interpret the coefficients in linear regression as well. That means we will know from the coefficients themselves which feature is influencing the result the most. And now if you see it over here, I think CGPA is the most influential factor in determining whether my chance of admit is higher or not. Considering that we are implementing a linear regression, there could be a better fit for this data, which we would need to experiment and see.
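The one-unit interpretation can be checked numerically: raise one feature by one unit, hold the others fixed, and the prediction moves by exactly that feature's coefficient. The intercept and coefficients below are hypothetical, not the values shown in the video.

```python
def predict(intercept, coefficients, row):
    """Intercept plus the sum of coefficient * feature value."""
    return intercept + sum(coefficients[name] * row[name] for name in coefficients)

# Hypothetical model parameters for illustration only
intercept = -1.2
coefficients = {"GRE Score": 0.002, "CGPA": 0.12, "Research": 0.02}

student = {"GRE Score": 320, "CGPA": 8.5, "Research": 1}
bumped = dict(student, CGPA=student["CGPA"] + 1)  # one more CGPA point, nothing else changes

change = predict(intercept, coefficients, bumped) - predict(intercept, coefficients, student)
print(round(change, 4))
```

One caution when comparing coefficients across features: each one is expressed per unit of its own feature, and a GRE point and a CGPA point are very different step sizes, so ranking features by raw coefficient size is only a rough guide.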
But for the current linear regression implementation, we can conclude this kind of thing over here. So this is how the model parameters summary table is telling me those details. Now, if you see, we have fit our model, but our model is still not created [inaudible] until and unless we save it. That's why it is showing the status of the model as draft. And you can now go to Experiment History to see what you have done till now. It maintains a history over there, so now I can see that using all these features my R squared statistic is somewhere around 78 percent, and these are my coefficients, and I am coming to the conclusion that maybe CGPA is the most influential factor over here.

So let us do another experiment, just to see whether that is actually true or not. What I will do here is keep CGPA, keep the LOR, keep the research one, and keep the GRE score. So I'll click over here again. I will keep the GRE score, I will remove the TOEFL score, I will remove the university rating, I will remove the SOP, and CGPA, research, and LOR I will keep. So now I am trying this experiment with four features which I think may be the most influential ones, so the other features may not have much impact on this prediction.
So now I'll keep a note: using only four features. This is how this note comes in handy. When I look at the history, I will know what I did over there. I will click on Fit Model again; let's see how it works now. Similar stuff is happening: it's running that custom command. In later videos we will discuss this custom command in detail as well. Now, if you see, it has predicted again. If you look at the actual versus predicted line chart, it stays more or less the same even though I removed three features. Even this one as well, more or less.
Now if you see, my R squared statistic has improved a lot, to 82 percent. So by this, at least I am confident that those three features are really not impacting it much. And if you see from the residuals histogram, more and more residuals are very, very close to zero, right?
So by this kind of analysis we can say this model is better than my previous model. So now I will save this model. I will give the experiment title as graduate date predictor, and I will click on Save. So now a model will be created. After you save the model, we have two options over here: you can go to the listing page, or you can continue editing. Let us continue editing to see how the experiment history looks now. The experiment history now has two rows.
The first row is my current experiment with my four features, with an R squared value of 82 percent. The second row is my older one. At any point in time you can load the corresponding settings and experiment with them. It will also show you the data corresponding to each and every experiment.
So now let's go back to our Experiments tab and see what is happening over there. Now if you see, my Experiments tab is not showing those big blocks. It is showing this kind of view, where under Predict Numeric Fields I have a single experiment that I have done. I have given the experiment name like this, and the algorithm I have chosen is linear regression. There are a lot of actions you can do on this model, so before publishing, let us talk about those.

You can create an alert from this model. Suppose the model is predicting data. You can create an alert for something like when my predicted chance of admit is greater than 90 percent, that means 0.9, or maybe 0.99, which means the model is really working well over there. So this is the kind of alert you can create. Next, you can edit the title and description; that's simple enough. Then you can see Schedule Training. This is an interesting feature. Whatever we have done till now was manual training. With the scheduled training feature, you can create a schedule which will run the training on the data. If you see, there is a time range over there, so you can choose the time range of the data you want to use for training. That's a really interesting feature to have. It means that as more and more data comes into your system, you can use that data for training as well, automatically, using this scheduled training.
Similarly, you can set up the other scheduling options, the schedule priority and schedule window. You can even trigger an action when the scheduled run happens: you can log an event, or you can send the output to a lookup. This is all normal scheduling functionality, and you can do that over here too. So this is a very versatile feature you can use with the model. You can delete the experiment as well. So now we will publish this model.
Let's say chances of admit model; this is the model name. And the destination app you choose over here is where the model will be saved. I will choose my Search & Reporting app and click on Submit. So the model is created now. How is the model created in the background? It's basically a lookup file, so let us see that. From the Splunk home, I go to etc/apps/search/lookups. Currently, if you see, it is not here. By default the model is saved in the user context, which is why it is not coming up under the search app. So what I need to do is go to etc, then users; currently I'm the admin user, so admin; then I'll go to the search app, and here in the lookups folder is how the model is stored. I think this lookup is in a read-only format, so if I just open it in Notepad, this is how it looks. It is basically saving a lot of information, the metadata about the model: what the feature variables are, whatever columns I have in my data, all of these things. Apart from that, there are other details which we do not have any control over that it is saving over there.
So now we have created our own model. Now, how are we going to apply it? There is a command called apply in Splunk MLTK. Using that command you can apply the model to any data set, but specifically we'll be applying it to this data set itself; if you apply the model to some unrelated data set, it is not going to give you proper results anyway. So this is how you will apply the model. This is my base data set, and I'll just take, say, the last hundred records; okay, let's say the last 200 records. Now I will use the apply command. Don't worry about it; I will be discussing these MLTK commands in detail in my next video. Here we will just see how to apply the model. So I write my apply command and then my model name. We gave our model name as chances of admit model, so I'll just copy it and run it. What it should do is apply this model to the data. Okay, so it is saying permission denied. For that, what I need to do is go to Settings, Lookups, Lookup table files, and choose Search & Reporting. This is my chances of admit model. Currently it is private, which is why I am not able to apply it from the Search app. So I'll choose this app only, with read and write, and click on Save.
Okay, it shows some internal error [inaudible]. So let me see what's going on over there. Okay, so I think there was some technical glitch. I just set the permissions again, and this time I chose all apps, and I think it works now. So now let us see whether our search is working or not.
So I have taken the last 200 records and I'm applying the machine learning model. If you see, it is applying the model to these two hundred records, two hundred events, and it has created a new column called predicted chance of admit. So this is how we apply the model. You can even create your own alert using this command, so that whenever you want something like the chance of admit being more than 90 percent, 80 percent, or anything else you want, you can use this command to achieve the same thing.
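Conceptually, applying a saved model means reusing the stored coefficients on new rows, and an alert is a threshold on the resulting column. The sketch below illustrates that idea in plain Python. It is not the MLTK apply command; the model dictionary, the output field name, the student rows, and the 0.9 threshold are all hypothetical.

```python
def apply_model(model, rows, output_field="predicted_chance_of_admit"):
    """Return copies of the rows with a prediction column added."""
    results = []
    for row in rows:
        value = model["intercept"] + sum(
            coef * row[name] for name, coef in model["coefficients"].items()
        )
        results.append({**row, output_field: value})
    return results

# Hypothetical stored parameters for the four-feature experiment
model = {
    "intercept": -1.0,
    "coefficients": {"GRE Score": 0.002, "CGPA": 0.12, "Research": 0.02, "LOR": 0.02},
}
new_rows = [
    {"id": "A", "GRE Score": 340, "CGPA": 9.8, "Research": 1, "LOR": 4.5},
    {"id": "B", "GRE Score": 300, "CGPA": 7.5, "Research": 0, "LOR": 3.0},
]
scored = apply_model(model, new_rows)

# Alert-style condition: keep only rows whose prediction exceeds the threshold
THRESHOLD = 0.9
flagged = [row for row in scored if row["predicted_chance_of_admit"] > THRESHOLD]
for row in flagged:
    print(row["id"], round(row["predicted_chance_of_admit"], 3))
```

Keeping fitting and applying as separate steps is what lets a published model be reused on new events without retraining.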
This is how you can experiment with machine learning, specifically linear regression, in Splunk MLTK, and we saw a lot of the experiments you can do with it. You can experiment with your own data in the same way and see how well it fits, and you can achieve a lot of other things, like automatic training and creating alerts. In the next video we will go into more detail. We will deep dive into what is happening internally over here: the different Splunk commands and the custom commands running internally. Whatever we have done in this experiment from the UI can also be achieved from the search command, from Splunk SPL. See you in the next video.