
Splunk MLTK : Implementation Of Linear Regression In Splunk MLTK

  • 0:00 - 0:02
    Okay. In this video, we'll be discussing
  • 0:02 - 0:05
    how we can implement linear
  • 0:05 - 0:08
    regression in Splunk MLTK, okay? So
  • 0:08 - 0:10
    in my previous video, we saw how we
  • 0:10 - 0:13
    can install Splunk MLTK and its
  • 0:13 - 0:15
    related packages, right? And also, if you
  • 0:15 - 0:18
    remember, when I was discussing the
  • 0:18 - 0:21
    core machine learning algorithms, I
  • 0:21 - 0:26
    also introduced the data set we'll
  • 0:26 - 0:27
    be using for our linear regression
  • 0:27 - 0:29
    modeling, okay?
  • 0:29 - 0:31
    That's the graduate admission dataset,
  • 0:31 - 0:35
    where, for various students, we
  • 0:35 - 0:37
    have their GRE score, TOEFL score,
  • 0:37 - 0:40
    university rating, statement of purpose
  • 0:40 - 0:41
    rating, okay,
  • 0:41 - 0:45
    letter of recommendation rating, CGPA, and whether
  • 0:45 - 0:47
    they have done research or not. Based on
  • 0:47 - 0:51
    all these fields, we will try to
  • 0:51 - 0:54
    predict the chance of admit, okay? So now,
  • 0:54 - 0:59
    to implement linear regression, we
  • 0:59 - 1:02
    will be implementing linear
  • 1:02 - 1:05
    regression for this one and see how well
  • 1:05 - 1:07
    the model fits this particular data,
  • 1:07 - 1:11
    okay? So, to implement linear
  • 1:11 - 1:13
    regression, what you have to do is you have
  • 1:13 - 1:15
    to go to the Splunk Machine Learning
  • 1:15 - 1:20
    Toolkit, okay? As I stated before, the
  • 1:20 - 1:21
    landing page of the Machine Learning
  • 1:21 - 1:24
    Toolkit app is this Showcase tab,
  • 1:24 - 1:27
    right, where it basically has a lot of
  • 1:27 - 1:30
    examples based on the different
  • 1:30 - 1:32
    machine learning
  • 1:32 - 1:35
    algorithms Splunk supports, okay? Now, to
  • 1:35 - 1:39
    implement machine learning on your
  • 1:39 - 1:42
    own data set, what I'd introduce you to is to come
  • 1:42 - 1:47
    to the Experiments tab, okay? So now, if you do
  • 1:47 - 1:50
    not have any other models, or if it is
  • 1:50 - 1:52
    the first time you are coming to this
  • 1:52 - 1:54
    particular dashboard, this will be the
  • 1:54 - 1:56
    default view, okay? But if you have
  • 1:56 - 1:58
    already experimented with different models,
  • 1:58 - 2:00
    the view will be slightly different,
  • 2:00 - 2:03
    which we'll see later, okay? So now, since in
  • 2:03 - 2:06
    linear regression we are trying to do a
  • 2:06 - 2:08
    prediction on a numeric field, right,
  • 2:08 - 2:11
    we will go over here, okay,
  • 2:11 - 2:13
    to the Predict Numeric Fields option.
  • 2:13 - 2:15
    We'll click over here. Now it is
  • 2:15 - 2:18
    asking me for an experiment title and a
  • 2:18 - 2:22
    description, so I will say graduate
  • 2:23 - 2:30
    admission prediction. Let's give the
  • 2:30 - 2:33
    experiment title like this one,
  • 2:33 - 2:37
    prediction, okay? Now you can give some
  • 2:37 - 2:38
    description as well, a meaningful
  • 2:38 - 2:44
    description. So I'll click on Create, okay?
  • 2:44 - 2:47
    So now this particular view comes up
  • 2:47 - 2:50
    over here. Now, if you see, here we
  • 2:50 - 2:52
    have two tabs, Experiment Settings and
  • 2:52 - 2:55
    Experiment History. Initially the
  • 2:55 - 2:56
    experiment history will be blank;
  • 2:56 - 2:59
    there is nothing over here, okay? Now,
  • 2:59 - 3:01
    based on the experiment settings, the
  • 3:01 - 3:03
    experiment history will be updated
  • 3:03 - 3:06
    accordingly, which we'll see later, okay?
  • 3:06 - 3:08
    Now, the first thing is, it is asking me
  • 3:08 - 3:13
    for a search, right? So now, let me
  • 3:13 - 3:16
    show you the data. This
  • 3:16 - 3:20
    particular data I already indexed in my
  • 3:20 - 3:24
    main index, okay? So I'll just write the
  • 3:24 - 3:30
    query index=main and just
  • 3:30 - 3:33
    table out all my different
  • 3:33 - 3:39
    features and the chance of admit, okay.
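As a sketch, that base search might look like the following SPL (the field names here follow the common form of the graduate admission dataset; your extracted field names may differ):

```spl
index=main
| table "GRE Score" "TOEFL Score" "University Rating" SOP LOR CGPA Research "Chance of Admit"
```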
  • 3:39 - 3:42
    So this is my data set. This data set I will
  • 3:42 - 3:45
    be using for training purposes, though not
  • 3:45 - 3:47
    the full data set, not all the 500
  • 3:47 - 3:50
    records. Some of the data I will be
  • 3:50 - 3:53
    using for training, and the rest
  • 3:53 - 3:55
    of the data I will be using for
  • 3:55 - 3:57
    prediction, just to see how my
  • 3:57 - 3:59
    model is working, okay? So I'll give this
  • 3:59 - 4:06
    query over here and then I'll click on
  • 4:06 - 4:11
    Search, okay? By default it is showing me
  • 4:11 - 4:14
    the initial preview of my data,
  • 4:14 - 4:18
    right? Now let's go to the next one. So
  • 4:18 - 4:20
    here, if you see, there are a lot of
  • 4:20 - 4:23
    pre-processing steps over here, right? So
  • 4:23 - 4:27
    now, in machine learning, when you train
  • 4:27 - 4:30
    a particular model, right,
  • 4:30 - 4:32
    there may be some need to pre-process
  • 4:32 - 4:35
    the data so that you reduce a lot
  • 4:35 - 4:37
    of the noise in the data. Now, there are a
  • 4:37 - 4:38
    lot of pre-processing algorithms
  • 4:38 - 4:41
    present over there, so when we
  • 4:41 - 4:44
    discuss those algorithms, we'll come back
  • 4:44 - 4:47
    to this page again and work on it, okay?
  • 4:47 - 4:50
    So for now, I will not be doing any kind
  • 4:50 - 4:52
    of pre-processing, because this data is
  • 4:52 - 4:56
    clean enough, okay? So now, the
  • 4:56 - 4:59
    algorithm I will be choosing is linear
  • 4:59 - 5:00
    regression. Now, there are a lot of
  • 5:00 - 5:02
    regression algorithms, but currently we have
  • 5:02 - 5:05
    studied only regression, and we'll
  • 5:05 - 5:07
    be implementing linear regression only
  • 5:07 - 5:09
    in this video, so I will be choosing
  • 5:09 - 5:11
    linear regression over here, okay? So now,
  • 5:11 - 5:14
    fields to predict, that means which field
  • 5:14 - 5:16
    you want to predict. As I will be
  • 5:16 - 5:18
    predicting my chance of admit, I will
  • 5:18 - 5:22
    be choosing that. And then, fields to use
  • 5:22 - 5:24
    for predicting, that means here basically
  • 5:24 - 5:27
    you are choosing your features, right? So
  • 5:27 - 5:30
    I will be choosing all my columns. So
  • 5:30 - 5:34
    here, if you see, the concept of simple
  • 5:34 - 5:36
    linear regression versus multiple linear
  • 5:36 - 5:38
    regression comes up, right? If I choose a
  • 5:38 - 5:41
    single feature, it will become simple
  • 5:41 - 5:43
    linear regression; if I choose multiple
  • 5:43 - 5:45
    features, it will become multiple linear
  • 5:45 - 5:48
    regression. So for now I will be choosing
  • 5:48 - 5:52
    all of them, okay? Now, here, if you see, there's the
  • 5:52 - 5:55
    split for training, right? So here,
  • 5:55 - 5:57
    basically, what is happening is you
  • 5:57 - 6:00
    are splitting the whole dataset between
  • 6:00 - 6:03
    a training and a test data set. Here,
  • 6:03 - 6:04
    currently it is 50 percent / 50 percent;
  • 6:04 - 6:07
    that means the first 50 percent of the data
  • 6:07 - 6:09
    will be used for training and the remaining
  • 6:09 - 6:11
    50 percent will be used for testing
  • 6:11 - 6:15
    purposes. I'll slide this one, it goes like
  • 6:15 - 6:19
    this, and I'll keep 70 and 30, okay? Now,
  • 6:19 - 6:23
    fit intercept, okay? That means, if you
  • 6:23 - 6:25
    remember from my machine learning video,
  • 6:25 - 6:30
    not only do we have a slope value for
  • 6:30 - 6:33
    each and every feature, we also have an
  • 6:33 - 6:36
    intercept, a y-axis intercept value,
  • 6:36 - 6:39
    right? So with this option you are
  • 6:39 - 6:41
    basically choosing whether
  • 6:41 - 6:44
    your model should include an
  • 6:44 - 6:47
    intercept term or not, okay? Now, notes:
  • 6:47 - 6:49
    you can give some meaningful
  • 6:49 - 6:52
    notes. Maybe the notes could be what
  • 6:52 - 6:53
    fields you are using for
  • 6:53 - 6:56
    prediction, some
  • 6:56 - 6:58
    meaningful note which will be useful
  • 6:58 - 7:00
    later, when we see the history of the
  • 7:00 - 7:05
    model, okay? So I will say using all the
  • 7:05 - 7:11
    features, "using all the features", okay? Now,
  • 7:11 - 7:13
    after all this is done, you need to click on
  • 7:13 - 7:17
    Fit Model. So, basically, behind the
  • 7:17 - 7:20
    scenes what it does is run a Splunk
  • 7:20 - 7:22
    custom command which is basically
  • 7:22 - 7:25
    implemented, I think, using scikit-learn. So,
  • 7:25 - 7:28
    using that particular command, it is
  • 7:28 - 7:30
    trying to come up with the equation of
  • 7:30 - 7:32
    that line, right, which we discussed
  • 7:32 - 7:37
    before. And if you remember, in my
  • 7:37 - 7:39
    multiple linear regression video we came
  • 7:39 - 7:42
    up with a linear algebra solution over
  • 7:42 - 7:46
    there, right, with matrix inversion
  • 7:46 - 7:49
    and matrix transpose, right? So behind the
  • 7:49 - 7:50
    scenes it is doing the same thing over
  • 7:50 - 7:51
    there, okay.
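Behind the Fit Model button, the toolkit runs the MLTK fit command. A rough SPL equivalent of this experiment might look like the following (a sketch, not the exact command the UI generates; the field and model names are illustrative):

```spl
index=main
| fit LinearRegression "Chance of Admit" from "GRE Score" "TOEFL Score" "University Rating" SOP LOR CGPA Research fit_intercept=true into admission_model
```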
  • 7:51 - 7:54
    So now, if you see, the result came up
  • 7:54 - 7:56
    right after clicking on Fit Model.
  • 7:56 - 8:00
    Now, if you see, apart from our own data,
  • 8:00 - 8:04
    it has actually added two new columns over
  • 8:04 - 8:06
    here: one is the predicted chance of
  • 8:06 - 8:09
    admit, and the other is the residual column. Now, the
  • 8:09 - 8:11
    predicted chance of admit is actually
  • 8:11 - 8:13
    the prediction made on the data,
  • 8:13 - 8:17
    right? So if you see, for the first row
  • 8:17 - 8:20
    the actual chance of admit is 0.73,
  • 8:20 - 8:23
    that means 73%, while the predicted
  • 8:23 - 8:26
    value was 0.70, that means 70 percent. So
  • 8:26 - 8:28
    the residual column is the difference
  • 8:28 - 8:30
    between the actual chance of admit and
  • 8:30 - 8:34
    the predicted chance of admit, okay? So
  • 8:34 - 8:37
    this is how, after fitting the model,
  • 8:37 - 8:40
    it came up with this kind of
  • 8:40 - 8:40
    visualization.
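In equation form, the residual for each row is simply the actual value minus the predicted value:

```latex
r_i = y_i - \hat{y}_i
```

So for the first row here, 0.73 - 0.70 = 0.03.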
  • 8:40 - 8:44
    It also shows another five or
  • 8:44 - 8:46
    six charts over here, okay? Now let us
  • 8:46 - 8:49
    discuss them one by one. The first
  • 8:49 - 8:52
    chart shows me the actual versus
  • 8:52 - 8:54
    predicted line chart. That means,
  • 8:54 - 8:57
    if you see, the chance of admit, the blue
  • 8:57 - 9:00
    colored graph, is the actual one, and the
  • 9:00 - 9:02
    predicted chance of admit, the yellow
  • 9:02 - 9:04
    colored one, is the predicted one, right?
  • 9:04 - 9:07
    And by looking at this one we can
  • 9:07 - 9:10
    at least see this particular model is an
  • 9:10 - 9:13
    okay fit to this particular data.
  • 9:13 - 9:15
    Somewhere it is lagging over here, if you
  • 9:15 - 9:18
    see it, right, but overall it's
  • 9:18 - 9:22
    actually fitting well over there. Now, the
  • 9:22 - 9:24
    residuals chart, whatever you are seeing
  • 9:24 - 9:26
    over here, the line chart, it is
  • 9:26 - 9:29
    showing up over here, okay? So now, the
  • 9:29 - 9:32
    closer that particular chart is
  • 9:32 - 9:35
    to zero, the better the model is
  • 9:35 - 9:37
    fitting. But over here,
  • 9:37 - 9:40
    if you see the latter part of this one,
  • 9:40 - 9:43
    the residuals are larger, right, because it
  • 9:43 - 9:46
    is more sparse, more distant from the
  • 9:46 - 9:48
    zero line, and the same thing is
  • 9:48 - 9:50
    reflected over here as well: the model
  • 9:50 - 9:53
    is lagging somewhat over here, right?
  • 9:53 - 9:56
    So this kind of analysis you can do
  • 9:56 - 9:59
    from there, of how the model is fitting
  • 9:59 - 10:02
    your data. And this particular graph is
  • 10:02 - 10:04
    showing me the scatter plot of the
  • 10:04 - 10:06
    actual versus the predicted values, and here
  • 10:06 - 10:09
    basically you can see how the line
  • 10:09 - 10:12
    is fitting your data through
  • 10:12 - 10:16
    this chart, okay? Now, it also provides a
  • 10:16 - 10:20
    residuals histogram, so let us
  • 10:20 - 10:22
    understand this one as well. So, what we
  • 10:22 - 10:24
    have is the zero line over here; if you
  • 10:24 - 10:28
    see, it basically shows, for each and
  • 10:28 - 10:30
    every residual value, how many counts are
  • 10:30 - 10:33
    there, if you see. So if you just
  • 10:33 - 10:37
    think about it, if for all my data points
  • 10:37 - 10:41
    the residual is zero, that's the
  • 10:41 - 10:43
    ideal scenario, right? That means I am
  • 10:43 - 10:45
    predicting at the exact level, right?
  • 10:45 - 10:49
    So from this histogram, if you see,
  • 10:49 - 10:52
    where the residual error
  • 10:52 - 10:54
    equals zero, the sample count is 24
  • 10:54 - 10:57
    over here, right? If more and more
  • 10:57 - 11:00
    samples are very close to this zero, that
  • 11:00 - 11:04
    means my model is doing well; it's
  • 11:04 - 11:06
    actually a good-fit model. And if it is
  • 11:06 - 11:08
    more sparse,
  • 11:08 - 11:12
    that means if we have a larger number of big
  • 11:12 - 11:14
    bars over here, then somehow the
  • 11:14 - 11:16
    model is not a good fit for
  • 11:16 - 11:18
    that particular data. So this kind of
  • 11:18 - 11:20
    interpretation you can do from this
  • 11:20 - 11:24
    particular diagram, okay? So now, there are
  • 11:24 - 11:27
    another two things over here, called the R
  • 11:27 - 11:30
    squared statistic and the root mean squared error,
  • 11:30 - 11:33
    okay? These two are actually measures
  • 11:33 - 11:37
    of how accurate the model is, okay? So
  • 11:37 - 11:41
    I'll be discussing these measurements in
  • 11:41 - 11:43
    detail in a separate video,
  • 11:43 - 11:45
    where we'll be discussing the R squared
  • 11:45 - 11:47
    statistic, root mean squared error, and also some
  • 11:47 - 11:51
    other ways to determine how
  • 11:51 - 11:53
    accurate the model is, such as bias and
  • 11:53 - 11:55
    variance; there are a lot of other
  • 11:55 - 11:56
    measurements as well.
  • 11:56 - 11:59
    We'll discuss them in detail over there, okay?
  • 11:59 - 12:00
    But for now, just try to remember
  • 12:00 - 12:03
    that these are measurements of fit.
  • 12:03 - 12:05
    For the R squared statistic, say, we
  • 12:05 - 12:09
    can think of it as: the closer it is to 1, the
  • 12:09 - 12:11
    better the fit, something like this, okay?
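For reference, these two measures are conventionally defined as follows (standard definitions; the video defers the details to a separate episode):

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_i (y_i - \hat{y}_i)^2}
```

The closer R squared is to 1, and the closer RMSE is to 0, the better the fit.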
  • 12:11 - 12:15
    Hmm, so we will see how to best
  • 12:15 - 12:18
    judge a model based on that, okay? But
  • 12:18 - 12:20
    still, even for the R squared statistic,
  • 12:20 - 12:25
    it all depends on the context, the field
  • 12:25 - 12:28
    in which you are
  • 12:28 - 12:30
    implementing linear regression. We'll discuss
  • 12:30 - 12:32
    that stuff as well in the future, okay. And now,
  • 12:32 - 12:35
    if you see the last graph, it is showing
  • 12:35 - 12:38
    me the model parameters. If you remember,
  • 12:38 - 12:40
    we had written the big equation over
  • 12:40 - 12:43
    there, right? So let me open the Bamboo
  • 12:43 - 12:56
    Paper here. If you remember, when we
  • 12:56 - 13:00
    talked about multiple linear regression,
  • 13:02 - 13:05
    we started our discussion
  • 13:05 - 13:07
    with a big equation, right? So let me go
  • 13:07 - 13:11
    back over there.
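The "big equation" being referred to is the multiple linear regression model, together with its least-squares solution via matrix transpose and inversion:

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \epsilon,
\qquad
\hat{\boldsymbol{\beta}} = (X^{\top} X)^{-1} X^{\top} \mathbf{y}
```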
  • 13:21 - 13:26
    Yes, so this one, right? So here beta 1,
  • 13:26 - 13:30
    beta 2, up to beta p are our slope values, the
  • 13:30 - 13:31
    coefficients of each and every feature,
  • 13:31 - 13:35
    and beta 0 is my intercept, right? And
  • 13:35 - 13:37
    what we did, basically, at the
  • 13:37 - 13:39
    end of the day, was come up with a big
  • 13:39 - 13:42
    equation to determine this whole beta
  • 13:42 - 13:46
    vector, right? So this is the same stuff
  • 13:46 - 13:49
    it is representing over here. It is
  • 13:49 - 13:51
    basically giving me, for each and
  • 13:51 - 13:53
    every feature, what the coefficient
  • 13:53 - 13:57
    value is, okay, and the intercept value as
  • 13:57 - 14:00
    well. If you see, this is my beta 0, and my
  • 14:00 - 14:02
    beta 1 to beta p are these guys,
  • 14:02 - 14:04
    the other guys. Now, if you look closely,
  • 14:04 - 14:07
    there are some coefficients which
  • 14:07 - 14:10
    have a much greater value and some
  • 14:10 - 14:11
    coefficients which have a much smaller value
  • 14:11 - 14:14
    over here. The way to interpret a
  • 14:14 - 14:19
    coefficient is how much it is
  • 14:19 - 14:22
    influencing the end result.
  • 14:22 - 14:25
    So to understand that, let us see this
  • 14:25 - 14:29
    one. Let's say I have a variable called x,
  • 14:29 - 14:34
    and I am writing something like x = 0.9y.
  • 14:34 - 14:36
    Now, what do I mean by this particular
  • 14:36 - 14:41
    equation, by 0.9 into y, right? That means,
  • 14:41 - 14:45
    if I give y equals 1, then
  • 14:45 - 14:50
    my x will become 0.9, right? So what do
  • 14:50 - 14:52
    we mean by that? It means one unit of
  • 14:52 - 14:53
    change in y
  • 14:53 - 14:56
    is basically 0.9 units of
  • 14:56 - 15:01
    change in x, right? So this kind of
  • 15:01 - 15:03
    interpretation you can do; that
  • 15:03 - 15:09
    tells you how y is influencing x, right? So
  • 15:09 - 15:11
    this is how we interpret these
  • 15:11 - 15:14
    kinds of coefficients as well in linear
  • 15:14 - 15:17
    regression. So that means we will know,
  • 15:17 - 15:19
    from the coefficients themselves, which
  • 15:19 - 15:22
    particular feature is most influencing
  • 15:22 - 15:24
    the result. And now, if you see it over here,
  • 15:24 - 15:24
    I think
  • 15:24 - 15:27
    CGPA is the most influential factor in
  • 15:27 - 15:31
    determining whether my chance of
  • 15:31 - 15:32
    admit is higher
  • 15:32 - 15:36
    or not, right? Considering we are
  • 15:36 - 15:38
    implementing linear regression, there
  • 15:38 - 15:40
    could be a better fit for this particular
  • 15:40 - 15:43
    data, which we would need to experiment with and see,
  • 15:43 - 15:45
    but for the current linear
  • 15:45 - 15:47
    regression implementation we can
  • 15:47 - 15:50
    draw this kind of conclusion over here,
  • 15:50 - 15:58
    right? Okay, so this is how the model
  • 15:58 - 16:02
    parameters summary, the table
  • 16:02 - 16:03
    visualization, is telling me those
  • 16:03 - 16:06
    different details, right? So now, if
  • 16:06 - 16:09
    you see, we have actually fit our model, right?
  • 16:09 - 16:12
    But we still have not saved our model;
  • 16:12 - 16:14
    unless and until we are saving it, that's
  • 16:14 - 16:17
    why it is showing me a draft status for
  • 16:17 - 16:22
    the model, right? And you can now go to
  • 16:22 - 16:27
    the experiment history to see what you have
  • 16:27 - 16:29
    done till now; it will be maintaining
  • 16:29 - 16:32
    a history over there. So now I can see that,
  • 16:32 - 16:36
    using all these features, my R
  • 16:36 - 16:39
    squared statistic is somewhere around 78%,
  • 16:39 - 16:41
    these are my coefficients, and I am
  • 16:41 - 16:43
    coming to the conclusion that maybe
  • 16:43 - 16:46
    CGPA is the most influential factor over
  • 16:46 - 16:49
    here, okay? So let us do another
  • 16:49 - 16:53
    experiment, okay? So in here I'll keep my
  • 16:53 - 16:57
    CGPA over here, just to see whether it is
  • 16:57 - 17:00
    actually true or not, okay? So now, what I
  • 17:00 - 17:03
    will do here is, I will keep CGPA,
  • 17:03 - 17:06
    I will keep
  • 17:06 - 17:09
    the LOR, okay, I'll
  • 17:09 - 17:14
    keep the Research one, and I will keep the
  • 17:14 - 17:19
    GRE score, okay? So I'll click over here
  • 17:19 - 17:22
    again: I will keep the GRE score, I will
  • 17:22 - 17:24
    remove the TOEFL score, I will remove the
  • 17:24 - 17:26
    university rating, I will remove the SOP;
  • 17:26 - 17:29
    CGPA, Research, and LOR I will keep. So
  • 17:29 - 17:32
    now I am trying to do this experiment
  • 17:32 - 17:35
    with the four features which I am thinking
  • 17:35 - 17:40
    are maybe the most influential ones, so maybe the
  • 17:40 - 17:43
    other features may not have much impact
  • 17:43 - 17:46
    on this particular prediction, okay.
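Restricting the features like this corresponds, roughly, to a fit over just the four chosen fields (again a sketch with illustrative names):

```spl
index=main
| fit LinearRegression "Chance of Admit" from "GRE Score" LOR CGPA Research into admission_model
```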
  • 17:46 - 17:50
    So now, I'll keep a note: "using
  • 17:50 - 17:55
    only four features". So this is how this
  • 17:55 - 17:58
    particular note comes in handy
  • 17:58 - 18:01
    over here, right? When I see
  • 18:01 - 18:03
    the history, I will come to know what I
  • 18:03 - 18:05
    have done over there, okay? So I will
  • 18:05 - 18:09
    click on Fit Model again; let's see how
  • 18:09 - 18:14
    it's working now. So similar
  • 18:14 - 18:15
    stuff is happening over there, it's
  • 18:15 - 18:19
    running that custom command. In
  • 18:19 - 18:22
    later videos we will discuss in
  • 18:22 - 18:24
    detail this custom
  • 18:24 - 18:34
    command as well, okay. So now, if
  • 18:34 - 18:38
    you see, it has again predicted that one. Now,
  • 18:38 - 18:41
    if you see, from the actual versus predicted line
  • 18:41 - 18:43
    chart, it's staying more or less the same
  • 18:43 - 18:45
    even though I removed three features,
  • 18:45 - 18:48
    right? Even this one as well, more or
  • 18:48 - 18:48
    less,
  • 18:48 - 18:52
    okay? Now, if you see, my R squared
  • 18:52 - 18:54
    statistic has improved a lot, to 82%,
  • 18:54 - 18:58
    right? So from this one, at least, I am
  • 18:58 - 19:01
    confident that really those three
  • 19:01 - 19:04
    features are not impacting much of
  • 19:04 - 19:08
    it. And if you see, from this
  • 19:08 - 19:11
    residuals histogram, more
  • 19:11 - 19:13
    and more samples are very close to zero,
  • 19:13 - 19:16
    right? More
  • 19:16 - 19:18
    and more residuals are very, very
  • 19:18 - 19:20
    close to zero, right?
  • 19:20 - 19:23
    So from this kind of analysis we can say
  • 19:23 - 19:25
    this particular model is better
  • 19:25 - 19:28
    compared to my previous model, right?
  • 19:28 - 19:31
    So now, what I will do is save this
  • 19:31 - 19:33
    particular model, okay? So I will say, I
  • 19:33 - 19:38
    will give the title as
  • 19:39 - 19:47
    "graduate admit predictor", okay? I will
  • 19:47 - 19:51
    click on Save. So now a model will
  • 19:51 - 19:55
    be created, okay? So now, we have
  • 19:55 - 19:56
    two options over here after you save the
  • 19:56 - 19:58
    model: you can go to
  • 19:58 - 20:00
    the listing page,
  • 20:00 - 20:02
    or you can continue editing, okay? Let us
  • 20:02 - 20:04
    continue editing to see how the experiment
  • 20:04 - 20:06
    history is looking now. Now the experiment
  • 20:06 - 20:09
    history has two rows over there, okay?
  • 20:09 - 20:12
    The first row is my current
  • 20:12 - 20:14
    experiment with my four features, right,
  • 20:14 - 20:18
    with an R squared value of 82%; the second
  • 20:18 - 20:20
    row is showing my older one, right?
  • 20:20 - 20:22
    So at any point of time you can load
  • 20:22 - 20:24
    the corresponding settings and
  • 20:24 - 20:26
    experiment with them, okay?
  • 20:26 - 20:27
    It will also show you the data
  • 20:27 - 20:30
    corresponding to each and every experiment,
  • 20:30 - 20:32
    okay? So now, let's go back to our
  • 20:32 - 20:35
    Experiments tab and see what is happening
  • 20:35 - 20:37
    over there, okay? Now, if you see my
  • 20:37 - 20:40
    Experiments tab, it's no longer showing
  • 20:40 - 20:43
    those big blocks, right? It is
  • 20:43 - 20:44
    showing this kind of view, where I
  • 20:44 - 20:47
    have, under Predict Numeric Fields, a single
  • 20:47 - 20:50
    experiment I have done. I have given the
  • 20:50 - 20:53
    experiment name like this one, right, and
  • 20:53 - 20:54
    the algorithm I have chosen is linear
  • 20:54 - 20:56
    regression. There are a lot of actions you
  • 20:56 - 20:59
    can do on this particular model, so
  • 20:59 - 21:01
    before publishing, let us talk about
  • 21:01 - 21:03
    them, okay? You can create an alert
  • 21:03 - 21:07
    from this model. So suppose
  • 21:07 - 21:10
    the model is predicting data, right? So
  • 21:10 - 21:12
    you can create an alert
  • 21:12 - 21:15
    something like: when my predicted chance
  • 21:15 - 21:16
    of admit is at 90 percent, that
  • 21:16 - 21:20
    means 0.9, okay, or 0.99 maybe; that means
  • 21:20 - 21:23
    the model is really working well
  • 21:23 - 21:25
    over there, right? So this kind of alert
  • 21:25 - 21:30
    you can create, okay? Next, you can edit the
  • 21:30 - 21:32
    title and description; that's simple
  • 21:32 - 21:36
    enough. Now, you can see Scheduled
  • 21:36 - 21:37
    Training; this is an interesting feature.
  • 21:37 - 21:41
    Whatever we have done till now,
  • 21:41 - 21:43
    we have done manual training over
  • 21:43 - 21:45
    there, right? Now, with the scheduled
  • 21:45 - 21:47
    training feature, you can create a
  • 21:47 - 21:49
    schedule which will run the training
  • 21:49 - 21:52
    based on the data. Now, if you see, there
  • 21:52 - 21:54
    is a time range over there, so you can
  • 21:54 - 21:56
    choose the time range of the data you
  • 21:56 - 21:59
    want to use for training, okay?
  • 21:59 - 22:01
    It's a really interesting feature
  • 22:01 - 22:03
    to have, because that means, as more and more
  • 22:03 - 22:06
    data comes into your system, you can use
  • 22:06 - 22:09
    that particular data for training
  • 22:09 - 22:11
    purposes as well, automatically, using
  • 22:11 - 22:13
    this scheduled training, okay?
  • 22:13 - 22:17
    Similarly, the other scheduling stuff, the
  • 22:17 - 22:18
    schedule priority and schedule window,
  • 22:18 - 22:21
    you can set up as well. You can even
  • 22:21 - 22:23
    trigger an action as well when the
  • 22:23 - 22:24
    scheduled run is happening: you
  • 22:24 - 22:27
    can log an event, you can send the
  • 22:27 - 22:30
    output to a lookup, everything from
  • 22:30 - 22:32
    normal scheduling, okay? That
  • 22:32 - 22:34
    you can also do over here, so this is a
  • 22:34 - 22:36
    very versatile feature you can use with the
  • 22:36 - 22:39
    model. And now, you can delete
  • 22:39 - 22:43
    it as well; that's fine. So now we will
  • 22:43 - 22:45
    publish this model, okay?
  • 22:45 - 22:56
    Let's say "chances of admit model", okay?
  • 22:56 - 22:58
    This is the model name, and the
  • 22:58 - 23:00
    destination app you will be choosing
  • 23:00 - 23:02
    over here; the model will be saved over
  • 23:02 - 23:04
    there, okay? I will be choosing my Search
  • 23:04 - 23:08
    and Reporting app, and I will click on Submit,
  • 23:08 - 23:13
    okay? So the model is created now. So how
  • 23:13 - 23:15
    is the model created in the background?
  • 23:15 - 23:17
    It's basically a lookup file, so let us
  • 23:17 - 23:23
    see that, okay? So from the Splunk home:
  • 23:23 - 23:28
    etc, apps, search, okay,
  • 23:28 - 23:31
    lookups, okay. So currently, if you see
  • 23:31 - 23:33
    over here,
  • 23:33 - 23:36
    by default the model is saved
  • 23:36 - 23:39
    in the user context, and that's why it
  • 23:39 - 23:41
    is not coming up under search. So
  • 23:41 - 23:43
    what I need to do is go to etc,
  • 23:43 - 23:47
    then I need to go to users. Currently I'm
  • 23:47 - 23:50
    the admin user, so I'll go to admin, and
  • 23:50 - 23:53
    I'll go to the Search app, and here in
  • 23:53 - 23:55
    the lookups folder, this is how the model
  • 23:55 - 23:57
    is getting stored over there, okay? So I
  • 23:57 - 24:00
    think this lookup is in read-only format,
  • 24:00 - 24:04
    so if I just open it in Notepad, this is
  • 24:04 - 24:06
    how it looks. It is
  • 24:06 - 24:09
    basically saving a lot of information,
  • 24:09 - 24:11
    the metadata-related information about
  • 24:11 - 24:13
    the model, over here, okay: what the
  • 24:13 - 24:16
    feature variables are, whatever columns I
  • 24:16 - 24:19
    have in my data, okay, all of these things.
  • 24:19 - 24:22
    Apart from those, there are other
  • 24:22 - 24:24
    attributes, which we do not have any
  • 24:24 - 24:26
    control over, that it is saving over there,
  • 24:26 - 24:32
    okay. So now we have created our own model,
  • 24:32 - 24:34
    right? Now, to apply this, how are we
  • 24:34 - 24:36
    going to apply this? There is a
  • 24:36 - 24:40
    command called apply in Splunk MLTK,
  • 24:40 - 24:43
    okay? So by using that command you can
  • 24:43 - 24:46
    apply that particular model on any data
  • 24:46 - 24:49
    set, okay, or, more specifically, we'll be
  • 24:49 - 24:53
    doing it on this data set itself; otherwise, if you apply
  • 24:53 - 24:56
    that model on just any event data set,
  • 24:56 - 24:59
    it is anyhow not
  • 24:59 - 25:01
    going to give you proper results. So
  • 25:01 - 25:04
    this is how you will be applying the
  • 25:04 - 25:08
    model. So I'll have my data
  • 25:08 - 25:10
    set, the base data set, right? I'll just
  • 25:10 - 25:13
    choose, say, the last hundred records,
  • 25:13 - 25:20
    okay, let's say the last 200 records, okay? Now I
  • 25:20 - 25:23
    will be using the apply command. Don't
  • 25:23 - 25:24
    worry about it, I will be discussing these
  • 25:24 - 25:27
    Splunk MLTK commands in detail in
  • 25:27 - 25:31
    my next video; here we will just see
  • 25:31 - 25:34
    how we apply the model. So
  • 25:34 - 25:37
    now I will use my apply command, then my
  • 25:37 - 25:41
    model name, right? We have given our
  • 25:41 - 25:45
    model name as "chances of admit model".
  • 25:45 - 25:55
    I'll just copy it and I will just run it.
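Put together, the apply search is roughly the following (a sketch: head 200 stands in for the "last 200 records" selection, and underscores are assumed in the saved model name, since MLTK model names typically avoid spaces):

```spl
index=main
| head 200
| apply chances_of_admit_model
```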
  • 25:55 - 25:58
    So what it should do, basically, is
  • 25:58 - 26:01
    apply this particular model on that data,
  • 26:01 - 26:05
    whatever it is, okay? But it is saying
  • 26:05 - 26:08
    permission denied. So, for that, what I need
  • 26:08 - 26:15
    to do is: Settings, Lookups, okay, Lookup
  • 26:15 - 26:24
    table files. I'll choose this one, Search
  • 26:24 - 26:27
    and Reporting, okay. This is my chances of
  • 26:27 - 26:29
    admit model; currently it is in private
  • 26:29 - 26:32
    mode, and that's why I am not able to apply
  • 26:32 - 26:35
    it from the Search app. So I choose
  • 26:35 - 26:38
    this app only, I will give read/write
  • 26:38 - 26:40
    permission, and I'll click on Save.
  • 26:40 - 26:47
    Okay, an internal error has been detected;
  • 26:47 - 26:50
    let's retry, okay? So let me see what's
  • 26:50 - 26:56
    going on over there. Okay, so I think
  • 26:56 - 26:58
    there was some technical glitch, so I
  • 26:58 - 27:02
    just set the permission again, and I
  • 27:02 - 27:05
    chose all apps. I think it works
  • 27:05 - 27:09
    now, so now let us see whether our search
  • 27:09 - 27:11
    is working or not.
  • 27:11 - 27:15
    Okay, so I have taken the last 200
  • 27:15 - 27:18
    records and I'm just clicking on Apply,
  • 27:18 - 27:20
    applying the machine
  • 27:20 - 27:23
    learning model. So, if you see,
  • 27:23 - 27:24
    it is applying that model on these
  • 27:24 - 27:27
    particular two hundred records, two
  • 27:27 - 27:29
    hundred events, over there, and it has
  • 27:29 - 27:31
    created a new column called predicted
  • 27:31 - 27:34
    chance of admit, okay? So this is how we
  • 27:34 - 27:36
    apply that model. You can even
  • 27:36 - 27:39
    create your own alert using this
  • 27:39 - 27:41
    particular command as well, so that
  • 27:41 - 27:43
    whenever you want something
  • 27:43 - 27:46
    like the chance of admit is more than 90
  • 27:46 - 27:48
    percent, 80 percent, or anything else
  • 27:48 - 27:50
    you want, you can use this
  • 27:50 - 27:53
    particular command to achieve that
  • 27:53 - 27:55
    same thing over there, okay? So this is
  • 27:55 - 28:00
    how you can experiment with machine
  • 28:00 - 28:02
    learning, specifically linear
  • 28:02 - 28:07
    regression, in Splunk MLTK, and we
  • 28:07 - 28:09
    saw a lot of the experiments we have
  • 28:09 - 28:12
    done regarding this one, right? So this
  • 28:12 - 28:13
    is how you experiment with your data as
  • 28:13 - 28:17
    well and see how it best fits
  • 28:17 - 28:19
    your data, and you can achieve a lot of
  • 28:19 - 28:22
    other stuff, like automatic training and
  • 28:22 - 28:24
    creating alerts from these things as
  • 28:24 - 28:27
    well, okay? In the next video we will go into
  • 28:27 - 28:29
    more detail; we will basically deep dive
  • 28:29 - 28:31
    into what is internally happening
  • 28:31 - 28:34
    over here. We will talk about the different
  • 28:34 - 28:36
    Splunk commands internally running, the
  • 28:36 - 28:37
    custom commands internally running, and
  • 28:37 - 28:40
    whatever experiments we have
  • 28:40 - 28:42
    done from the UI, the same thing can
  • 28:42 - 28:45
    be achieved from the search
  • 28:45 - 28:46
    bar,
  • 28:46 - 28:49
    from Splunk SPL as well, okay? See
  • 28:49 - 28:52
    you in the next video.
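As mentioned, the same apply command can back an alert. A sketch of an alert-style search (threshold and names illustrative; MLTK's apply writes the prediction into a predicted(<field>) column):

```spl
index=main
| head 200
| apply chances_of_admit_model
| where 'predicted(Chance of Admit)' > 0.9
```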
Video Language: English
Duration: 28:51