Okay. In this video, we'll discuss how to implement linear regression in Splunk MLTK, the Machine Learning Toolkit. In my previous video, we saw how to install Splunk MLTK and its related packages. And if you remember, when I was discussing the core machine learning algorithm, I also introduced the dataset we'll be using for our linear regression modeling.
That's the graduate admission dataset, where for various students we have their GRE score, TOEFL score, university rating, statement of purpose (SOP) rating, letter of recommendation (LOR) rating, CGPA, and whether or not they have done research. Based on all these fields, we'll try to predict the chance of admit. So we'll implement linear regression on this data and see how well the model fits it.
To implement linear regression, go to the Splunk Machine Learning Toolkit. As I said before, the landing page of the Machine Learning Toolkit app is this Showcase tab, which has a lot of examples for the different machine learning algorithms Splunk supports. To apply machine learning to your own dataset, I'd suggest coming to the Experiments tab. If you don't have any models yet, or this is the first time you're coming to this dashboard, this will be the default view; if you have already experimented with different models, the view will be slightly different, which we'll see later. Since in linear regression we are predicting a numeric field, we'll go here, to Predict Numeric Fields, and click. Now it's asking me for an experiment title and a description. I'll give the experiment title as graduate admission prediction. You can also give a meaningful description. Then I'll click Create.
Now this view comes up. Here we have two tabs: Experiment Settings and Experiment History. Initially the experiment history is blank; there's nothing here. As we work with the experiment settings, the experiment history will be updated accordingly, which we'll see later. The first thing it asks me for is a search, so let me show you the data. I've already indexed this data in my main index, so I'll just write the query index=main, and it lists all my different features and the chance of admit. This is my dataset. I'll use it for training, but not the full dataset, not all 500 records: I'll use some of the data for training and the rest for prediction, just to see how my model is working. So I'll give this query here and click Search. By default, it shows me the initial data preview. Now let's go to the next step.
Here you can see there are a lot of preprocessing steps. In machine learning, when you train a model, you may need to preprocess the data to reduce noise. There are a lot of preprocessing algorithms available here; when we discuss those algorithms, we'll come back to this page and work with it. For now I won't do any preprocessing, because this data is clean enough. For the algorithm, I'll choose Linear Regression. There are a lot of regression algorithms, but so far we've only studied linear regression, so that's the only one we'll implement in this video.
Next is Field to predict, meaning which field you want to predict. Since I'm predicting the chance of admit, I'll choose that. Then Fields to use for predicting: here you are choosing your features, and I'll choose all my columns. This is where the concept of simple versus multiple linear regression comes up: if I choose a single feature, it becomes simple linear regression; if I choose multiple features, it becomes multiple linear regression. For now, I'll choose all of them. Next is Split for training. Here you are splitting the whole dataset into a training set and a test set. Currently it's 50/50, which means 50 percent of the data will be used for training and the other 50 percent for testing. I'll move the slider and set it to 70/30.
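A minimal Python sketch of what a 70/30 split does to the 500 records, assuming the rows are held in a plain list. The video doesn't show which rows the toolkit puts in each part, so the hypothetical split_for_training helper below keeps the rows in order by default and can optionally shuffle them with a fixed seed first.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_for_training(
    rows: Sequence[T], train_fraction: float, seed: Optional[int] = None
) -> Tuple[List[T], List[T]]:
    """Split rows into (train, test) by the given training fraction.

    seed=None keeps the original row order; any integer seed shuffles a
    copy of the rows reproducibly before splitting. The input is never
    modified.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    ordered = list(rows)
    if seed is not None:
        random.Random(seed).shuffle(ordered)
    cut = round(len(ordered) * train_fraction)  # e.g. 500 * 0.7 -> 350
    return ordered[:cut], ordered[cut:]
```

With 500 rows and train_fraction=0.7, this gives 350 training rows and 150 test rows.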
Next is Fit intercept. If you remember from my machine learning video, we don't only have a slope value for each feature; we also have a y-axis intercept value. With this option you choose whether your model should include an implicit intercept term or not.
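In the beta notation used later in the video (beta 0 for the intercept, beta 1 to beta p for the slopes of p features), the Fit intercept option chooses between these two model forms:

```latex
\begin{aligned}
\text{with intercept:} \quad & \hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p \\
\text{without intercept:} \quad & \hat{y} = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p
\end{aligned}
```

Without the intercept, beta 0 is fixed at zero, so the fitted line (or hyperplane) is forced through the origin.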
Under Notes, you can write something meaningful, for example which fields you're using for prediction; this will be useful later when we look at the model's history. I'll write "using all the features". Once everything is set, click Fit Model. Behind the scenes, it runs a Splunk custom command that is implemented, I think, with scikit-learn. Using that command, it comes up with the equation of the line we discussed before. If you remember from my multiple linear regression video, we came up with a linear algebra solution using matrix inversion and matrix transpose; behind the scenes, it's computing the same solution.
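For readers who want to see that linear algebra run, here is a minimal pure-Python sketch of ordinary least squares via the normal equations (X^T X) beta = X^T y. It solves that system by Gauss-Jordan elimination instead of forming the inverse explicitly, which gives the same beta whenever X^T X is invertible. This is an illustration, not MLTK's code: scikit-learn's LinearRegression uses a least-squares solver internally. All function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; some features may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_linear_regression(
    features: Sequence[Sequence[float]],
    target: Sequence[float],
    fit_intercept: bool = True,
) -> List[float]:
    """Return beta from the normal equations (X^T X) beta = X^T y.

    With fit_intercept=True a column of ones is prepended, so beta[0] is
    the intercept beta_0 and beta[1:] are the slopes beta_1..beta_p.
    """
    x = [([1.0] if fit_intercept else []) + [float(v) for v in row] for row in features]
    xt = transpose(x)
    xtx = matmul(xt, x)
    xty = [sum(a * b for a, b in zip(col, target)) for col in xt]
    return solve(xtx, xty)
```

For example, fit_linear_regression([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], [1, 3, 4, 6, 8]) recovers an intercept of 1 and slopes of 2 and 3 (up to floating-point rounding), because those targets were generated from y = 1 + 2x1 + 3x2.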
Now the results have come up after clicking Fit Model. Apart from our own data, it has added two new columns: the predicted chance of admit and the residual. The predicted chance of admit is the model's prediction for each row. For the first row, the actual chance of admit is 0.73, that is 73 percent, and the prediction was 0.70, that is 70 percent. The residual column is the difference between the actual chance of admit and the predicted chance of admit, so here it's 0.03.
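A small sketch of the residual column as described here, residual = actual minus predicted, using the first row's values from the video. The helper name is hypothetical.

```python
from typing import List, Sequence


def residuals(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Residual for each row: actual value minus predicted value."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [a - p for a, p in zip(actual, predicted)]


# First row from the video: actual 0.73, predicted 0.70.
first_row = residuals([0.73], [0.70])
```

first_row holds a value of about 0.03 (floating-point arithmetic may add a tiny rounding error). A positive residual means the model under-predicted that row; a negative one means it over-predicted.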
So that's what it shows after fitting the model, and there are five or six other charts here as well. Let's discuss them one by one. The first chart is the Actual vs. Predicted line chart: the blue line is the actual chance of admit and the yellow one is the predicted chance of admit. Just by looking at it, we can see the model fits this data reasonably well. It lags in some places, as you can see, but overall it fits well. Next is the residuals line chart. The closer this chart stays to zero, the better the model fits. But in the latter part of it, the residuals are larger: the points are more spread out and farther from the zero line, and the same thing shows up in the previous chart, where the model lags. So you can use these charts to analyze how the model fits your data. This graph is a scatter plot of actual versus predicted values, and through it you can see how the line fits your data.
It also provides a residuals histogram, so let's understand that one too. We have the zero line here, and for each residual value the histogram shows how many samples fall there. If you think about it, if the residual were zero for all my data points, that would be the ideal scenario: I'd be predicting perfectly. In this histogram, at a residual error of zero, the sample count is 24. The more samples sit close to zero, the better the model is doing, meaning it's a good fit. If they're more spread out, with more tall bars away from zero, the model isn't a good fit for that data. That's the kind of interpretation you can do with this diagram.
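The histogram simply counts residuals per bin. A minimal sketch, assuming a fixed bin width; how MLTK bins this chart isn't shown in the video, so residual_histogram and its bin_width parameter are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def residual_histogram(residuals: Sequence[float], bin_width: float) -> Dict[float, int]:
    """Count residuals per bin of width bin_width.

    Each key is the lower edge of a bin, so a residual r lands in the bin
    [k * bin_width, (k + 1) * bin_width) with k = floor(r / bin_width).
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = Counter(math.floor(r / bin_width) for r in residuals)
    # Rounding the edges keeps keys like 0.1 free of floating-point noise.
    return {round(k * bin_width, 10): n for k, n in sorted(counts.items())}
```

For example, residual_histogram([-0.02, 0.01, 0.015, 0.03, 0.12], 0.05) puts three residuals in the bin starting at 0.0, one in the bin starting at -0.05, and one in the bin starting at 0.1. A well-fitting model concentrates its counts in the bins around zero.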
There are two more things here: the R-squared statistic and the root mean squared error (RMSE). Both are measures of how accurate the model is. I'll discuss these measurements in detail in a separate video, covering R squared, RMSE, and other ways to judge a model's accuracy, such as bias and variance; there are a lot of other measurements as well. For now, just remember that these are measures of fit: for R squared, the closer it is to 1, the better the fit. We'll see later how best to judge a model with them. But even R squared depends on the context, the field in which you're applying linear regression, and we'll discuss that in the future too.
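For reference, here are the standard definitions of the two fit measures as a short Python sketch (function names are illustrative): R squared is 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean, and RMSE is the square root of the mean squared residual.

```python
import math
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R squared is undefined when every actual value is the same")
    return 1.0 - ss_res / ss_tot


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the target."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

R squared is 1 for perfect predictions and 0 for a model that always predicts the mean; on held-out test data it can even go negative. RMSE is 0 for perfect predictions and grows with the typical size of the residuals.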
Now, the last chart shows me the model parameters. Remember the big equation we wrote earlier? Let me open Bamboo Paper. When we talked about multiple linear regression, we started the discussion with a big equation, so let me go back to it. Yes, this one: beta 1, beta 2, through beta p are the slope values, the coefficients of each feature, and beta 0 is the intercept. At the end of the day, we came up with an equation to determine this whole beta vector, and that's what is represented here: for each feature it gives me the coefficient value, plus the intercept. This is my beta 0, and these other values are beta 1 to beta p.
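Written in matrix form, with X the n by (p+1) design matrix whose first column is all ones (so that beta 0 acts as the intercept), the big equation and its least-squares solution for the whole beta vector are:

```latex
\begin{aligned}
\mathbf{y} &= X\boldsymbol{\beta} + \boldsymbol{\varepsilon},
  \qquad \boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)^\top \\
\hat{\boldsymbol{\beta}} &= (X^\top X)^{-1} X^\top \mathbf{y}
\end{aligned}
```

The closed form assumes X^T X is invertible, that is, no feature is an exact linear combination of the others.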
If you look closely, some coefficients have quite large values and some have very small ones. The way to interpret a coefficient is how much it influences the end result. To understand that, say I have a variable x and I write x = 0.9y. What does this equation mean? If I set y = 1, x becomes 0.9. In other words, a one-unit change in y produces a 0.9-unit change in x, which tells us how y influences x. That's how we interpret coefficients in linear regression.
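A minimal sketch of the x = 0.9y reading applied to any linear model: raise one feature by one unit, hold the others fixed, and the prediction changes by exactly that feature's coefficient. Function names are hypothetical. One caution when ranking features this way: a raw coefficient is measured per unit of its feature, so features on very different scales (a GRE score versus a CGPA, for example) are only directly comparable after standardizing them.

```python
from typing import Sequence


def predict(intercept: float, coefs: Sequence[float], row: Sequence[float]) -> float:
    """Linear model output: intercept + sum of coefficient * feature."""
    return intercept + sum(c * x for c, x in zip(coefs, row))


def one_unit_effect(
    intercept: float, coefs: Sequence[float], row: Sequence[float], j: int
) -> float:
    """Change in the prediction when feature j rises by one unit, others fixed."""
    bumped = list(row)
    bumped[j] += 1
    return predict(intercept, coefs, bumped) - predict(intercept, coefs, row)


# The video's example x = 0.9y: intercept 0, one feature y with coefficient 0.9.
x_at_y_equals_1 = predict(0.0, [0.9], [1.0])
```

x_at_y_equals_1 is 0.9, and one_unit_effect returns the coefficient of whichever feature you bump (up to floating-point rounding), regardless of the starting row.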
So the coefficients themselves tell us which feature influences the result the most, and here, I think, CGPA is the most influential factor in whether my chance of admit is high or not. Another model might fit this data better, which we'd need to experiment with and see, but for the current linear regression model, that's the conclusion we can draw. So that's what the Model Parameters Summary table is telling me. Now, we have fit our model, but we haven't saved it yet; that's why the model shows a Draft status. You can now go to Experiment History to see what you've done so far; it keeps a history there. I can see that using all these features, my R-squared statistic is around 78 percent, these are my coefficients, and I'm concluding that CGPA may be the most influential factor.
So let's do another experiment, keeping CGPA, just to see whether that's actually true. I'll keep CGPA, LOR, Research, and GRE score, and remove TOEFL score, university rating, and SOP. Now I'm running this experiment with the four features I think may be the most influential; maybe the other features don't have much impact on this prediction. I'll add the note "using only four features". This is where the note comes in handy: when I look at the history, I'll know what I did. I'll click Fit Model again and see how it works now.
The same thing happens behind the scenes: it runs that custom command, which we'll discuss in detail in later videos. Now it has produced new predictions. From the Actual vs. Predicted line chart, the fit stays more or less the same even though I removed three features, and this chart is more or less the same too. But now my R-squared statistic has improved to 82 percent. So at least I'm confident that those three features weren't having much impact. And in the residuals histogram, more of the residuals are very close to zero. From this kind of analysis, we can say this model is better than the previous one.
Now I'll save this model. I'll give the experiment title as graduate data predictor and click Save, and the model is created. After you save the model, you have two options: go to the listing page or continue editing. Let's continue editing to see how the experiment history looks now. The experiment history now has two rows: the first row is the current experiment with my four features and an R-squared value of 82 percent, and the second row is the older one. At any point, you can load the corresponding settings and experiment with them. It also shows you the data corresponding to each experiment.
Now let's go back to the Experiments tab and see what's there. The Experiments tab no longer shows those big blocks. Instead, it shows this view: under Predict Numeric Fields there's the single experiment I've done, with the experiment name I gave it and the algorithm I chose, linear regression. There are a lot of actions you can take on this model, so before publishing, let's talk about them. You can create an alert from this model: since the model is predicting data, you could create an alert for when the predicted chance of admit is above 90 percent, that is 0.9, or maybe 99 percent. You can set up that kind of alert.
Next, you can edit the title and description; that's simple enough. Then there's Schedule Training, which is an interesting feature. Everything we've done so far has been manual training. With scheduled training, you can create a schedule that reruns the training on the data. There's a time range here, so you can choose the time range of the data you want to use for training. That's a really useful feature: as more and more data comes into your system, that data can be used for training automatically. You can also set the other scheduling options, schedule priority and schedule window, and even trigger an action when the scheduled run happens, such as logging an event or outputting the results to a lookup. These are the normal scheduling options, and they're available here too, so it's a very versatile feature. You can also delete the model. Now we'll publish this model.
I'll name it chances of admit model; that's the model name. Then you choose the destination app, where the model will be saved. I'll choose my Search & Reporting app and click Submit. The model is created now. In the background, the model is basically a lookup file, so let's look at it. From the Splunk home directory, I go to etc/apps/search/lookups. It isn't there: by default, the model is saved in the user context, which is why it doesn't show up under the search app. Instead, I need to go to etc, then users, then admin, since I'm currently the admin user, then the search app, and here in the lookups folder is where the model is stored. I think this lookup is in a read-only format, so I'll just open it in Notepad. This is what it looks like: it stores a lot of metadata about the model, such as the feature variables (whatever columns I have in my data) and other settings we have no control over.
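A small pathlib sketch of the two folder locations walked through above. The app-level location is an assumption based on where the video looks first, and the file-name pattern (an "__mlspl_" prefix and ".mlmodel" extension) is an assumed MLTK convention that the video doesn't show, so check your own lookups folder. Both function names are hypothetical.

```python
from pathlib import Path


def model_lookup_dir(splunk_home: str, app: str, owner: str, private: bool = True) -> Path:
    """Folder where a saved model's lookup file is expected to live."""
    etc = Path(splunk_home) / "etc"
    if private:
        # User context (the default seen in the video): etc/users/<owner>/<app>/lookups
        return etc / "users" / owner / app / "lookups"
    # Assumption: an app-level model sits in etc/apps/<app>/lookups,
    # the folder the video checks first.
    return etc / "apps" / app / "lookups"


def model_file_name(model_name: str) -> str:
    # Assumption: MLTK model lookups use an "__mlspl_" prefix and a
    # ".mlmodel" extension; verify against your installation.
    return f"__mlspl_{model_name}.mlmodel"
```

For example, model_lookup_dir("/opt/splunk", "search", "admin") points at /opt/splunk/etc/users/admin/search/lookups; /opt/splunk is only a placeholder install path.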
So we've created our own model. Now, how do we apply it? There's a command called apply in Splunk MLTK, and with it you can apply the model to a dataset, specifically to data like the data it was trained on; if you apply the model to an unrelated dataset, it won't give you proper results. Here's how to apply the model. I have my base dataset, and I'll take the last 200 records. Then I'll use the apply command. Don't worry about it for now; I'll discuss these MLTK commands in detail in my next video, and here we'll just see how to apply the model. So I write apply followed by my model name. We named the model chances of admit model, so I'll copy that and run it. It should apply this model to the data, but it says permission denied. To fix that, go to Settings, Lookups, Lookup table files, and choose Search & Reporting. Here's my chances of admit model. It's currently private, which is why I'm not able to apply it from the Search app. I'll share it with this app only, with read/write, and click Save. Now it shows an internal error, so let me see what's going on. I think there was some technical glitch, so I set the permission again, this time choosing all apps, and I think it works now. Let's see whether our search works.
I've taken the last 200 records, and I'm applying the machine learning model. As you can see, it applies the model to these 200 records, or events, and creates a new column called predicted chance of admit. That's how we apply the model. You can even create your own alert using this command: whenever you want something like a chance of admit above 90 percent, 80 percent, or anything else, you can use this command to achieve that.
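Conceptually, applying a saved linear model means evaluating the stored intercept and coefficients on new rows without refitting, and the alert is a threshold check on the result. The Python sketch below illustrates only that idea; it is not how the apply command is implemented. The predicted(<field>) column name follows what is assumed to be MLTK's usual naming, and apply_model, alert_rows, and the coefficients in the test are hypothetical.

```python
from typing import Dict, List, Sequence


def apply_model(
    records: Sequence[Dict[str, float]],
    intercept: float,
    coefficients: Dict[str, float],
    target: str,
) -> List[Dict[str, float]]:
    """Return copies of the records with a predicted(<target>) field added.

    The coefficients are fixed, so nothing is refit here.
    """
    scored = []
    for rec in records:
        prediction = intercept + sum(coef * rec[name] for name, coef in coefficients.items())
        enriched = dict(rec)  # leave the input record untouched
        enriched[f"predicted({target})"] = prediction
        scored.append(enriched)
    return scored


def alert_rows(
    rows: Sequence[Dict[str, float]], field: str, threshold: float
) -> List[Dict[str, float]]:
    """Rows whose value in field exceeds threshold (e.g. 0.9 for 90 percent)."""
    return [r for r in rows if r[field] > threshold]
```

With made-up coefficients (intercept -1.0, CGPA 0.2, Research 0.05), a row with CGPA 9.0 and Research 1 scores about 0.85 and is returned by alert_rows at a 0.8 threshold, while a row with CGPA 7.0 and Research 0 scores about 0.4 and is not.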
So this is how you can experiment with machine learning, specifically linear regression, in Splunk MLTK, and we've run a few experiments with it. This is how you experiment with your data and see how well the model fits it, and you can do a lot of other things too, like automatic training and creating alerts. In the next video, we'll deep-dive into what's happening internally: the different Splunk commands and custom commands running behind the scenes. Everything we did in this experiment from the UI can also be achieved with search commands, in Splunk SPL. See you in the next video.