
Splunk MLTK : Implementation Of Linear Regression In Splunk MLTK

  • 0:00 - 0:02
    Okay. In this video, we'll discuss
  • 0:02 - 0:05
    how we can implement linear
  • 0:05 - 0:08
    regression in Splunk MLTK, okay? So
  • 0:08 - 0:10
    in my previous video, we saw how we
  • 0:10 - 0:13
    can install Splunk MLTK and its
  • 0:13 - 0:15
    related packages, right? And also, if you
  • 0:15 - 0:18
    remember, when I was discussing the
  • 0:18 - 0:21
    core machine learning algorithms, I
  • 0:21 - 0:26
    also introduced the dataset we'll
  • 0:26 - 0:27
    be using for our linear regression
  • 0:27 - 0:29
    modeling, okay?
  • 0:29 - 0:31
    That's the graduate admission dataset,
  • 0:31 - 0:35
    where, for various students, we
  • 0:35 - 0:37
    have their GRE score, TOEFL score,
  • 0:37 - 0:40
    university rating, statement of purpose
  • 0:40 - 0:41
    rating, okay,
  • 0:41 - 0:45
    recommendation letter rating, CGPA, and whether
  • 0:45 - 0:47
    they have done research or not. Based on
  • 0:47 - 0:51
    all these fields, we will try to
  • 0:51 - 0:54
    predict the chance of admit, okay? So now,
  • 0:54 - 0:59
    we will be implementing linear
  • 0:59 - 1:02
    regression for this dataset
  • 1:02 - 1:05
    and seeing how well
  • 1:05 - 1:07
    the model fits this particular data,
  • 1:07 - 1:11
    okay? So to implement linear
  • 1:11 - 1:13
    regression, what you have to do is
  • 1:13 - 1:15
    go to the Splunk machine learning
  • 1:15 - 1:20
    toolkit, okay? As I stated before, the
  • 1:20 - 1:21
    landing page of the machine learning
  • 1:21 - 1:24
    toolkit app is this showcase dashboard,
  • 1:24 - 1:27
    right? It basically has a lot of
  • 1:27 - 1:30
    examples covering the
  • 1:30 - 1:32
    different machine learning
  • 1:32 - 1:35
    algorithms Splunk supports, okay? Now, to
  • 1:35 - 1:39
    implement machine learning on your
  • 1:39 - 1:42
    own dataset, what you need to do is come
  • 1:42 - 1:47
    to the Experiments tab, okay? So now, if you do
  • 1:47 - 1:50
    not have any other models, or if it is
  • 1:50 - 1:52
    the first time you are coming to this
  • 1:52 - 1:54
    particular dashboard, this will be the
  • 1:54 - 1:56
    default view, okay? But if you have
  • 1:56 - 1:58
    already experimented with different models,
  • 1:58 - 2:00
    the view will be slightly different,
  • 2:00 - 2:03
    which we'll see later, okay? So now, since in
  • 2:03 - 2:06
    linear regression we are trying to do a
  • 2:06 - 2:08
    prediction on a numeric field, right?
  • 2:08 - 2:11
    So we will go over here, okay?
  • 2:11 - 2:13
    The "Predict Numeric Fields" one.
  • 2:13 - 2:15
    We're clicking over here. Now it is
  • 2:15 - 2:18
    asking me for an experiment title and a
  • 2:18 - 2:22
    description. So I will say "graduate
  • 2:23 - 2:30
    admission prediction". Let's give the
  • 2:30 - 2:33
    experiment title like this,
  • 2:33 - 2:37
    okay? Now, you have to give some
  • 2:37 - 2:38
    description as well, a meaningful
  • 2:38 - 2:44
    description. Then I'll click on create, okay?
  • 2:44 - 2:47
    So now this particular view comes up
  • 2:47 - 2:50
    over here. Now, if you see, here we
  • 2:50 - 2:52
    have two tabs, experiment settings and
  • 2:52 - 2:55
    experiment history. Initially the
  • 2:55 - 2:56
    experiment history will be blank;
  • 2:56 - 2:59
    there is nothing over here, okay? Now,
  • 2:59 - 3:01
    based on the experiment settings, the
  • 3:01 - 3:03
    experiment history will be updated
  • 3:03 - 3:06
    accordingly, which we will see later, okay?
  • 3:06 - 3:08
    Now, the first thing is it is asking me
  • 3:08 - 3:13
    for a search, right? So now, let me
  • 3:13 - 3:16
    show you the data. This
  • 3:16 - 3:20
    particular data I have already indexed in my
  • 3:20 - 3:24
    main index. Okay, so I'll just write the
  • 3:24 - 3:30
    query index equals main and just
  • 3:30 - 3:33
    put all my different
  • 3:33 - 3:39
    features and the chance of admit into a table, okay? So this
  • 3:39 - 3:42
    is my dataset. This dataset I will
  • 3:42 - 3:45
    be using for training purposes, not
  • 3:45 - 3:47
    the full dataset, not all 500
  • 3:47 - 3:50
    records. Some of the data I will be
  • 3:50 - 3:53
    using for training purposes, and the rest
  • 3:53 - 3:55
    of the data I will be using for
  • 3:55 - 3:57
    prediction purposes, just to see how my
  • 3:57 - 3:59
    model is working, okay? So I'll give this
  • 3:59 - 4:06
    query over here, and then I'll click on
  • 4:06 - 4:11
    search, okay? So by default, it is showing me
  • 4:11 - 4:14
    my data, an initial data preview,
  • 4:14 - 4:18
    right? Now, let's go to the next one. So
  • 4:18 - 4:20
    here, if you see, there are a lot of
  • 4:20 - 4:23
    pre-processing steps over here, right? So
  • 4:23 - 4:27
    now, in machine learning, when you train a
  • 4:27 - 4:30
    particular model, right, there
  • 4:30 - 4:32
    may be some need to pre-process
  • 4:32 - 4:35
    the data so that you reduce a lot
  • 4:35 - 4:37
    of noise in the data. Now, there are a
  • 4:37 - 4:38
    lot of pre-processing algorithms
  • 4:38 - 4:41
    present over there. So when we
  • 4:41 - 4:44
    discuss those algorithms, we'll come back
  • 4:44 - 4:47
    to this page again and work on it, okay?
  • 4:47 - 4:50
    So for now, I will not be doing any kind
  • 4:50 - 4:52
    of pre-processing, because this data is
  • 4:52 - 4:56
    clean enough, okay? So now, the
  • 4:56 - 4:59
    algorithm I will be choosing is linear
  • 4:59 - 5:00
    regression. Now, there are a lot of
  • 5:00 - 5:02
    regression algorithms, but currently we have
  • 5:02 - 5:05
    studied only linear regression, and we'll
  • 5:05 - 5:07
    be implementing linear regression only
  • 5:07 - 5:09
    in this video, so I will be choosing
  • 5:09 - 5:11
    linear regression over here, okay? So now,
  • 5:11 - 5:14
    fields to predict: that means which field
  • 5:14 - 5:16
    you want to predict. As I will be
  • 5:16 - 5:18
    predicting my chance of admit, I will
  • 5:18 - 5:22
    be choosing that. And then fields used
  • 5:22 - 5:24
    for predicting: that means here, basically,
  • 5:24 - 5:27
    you are choosing your features, right? So
  • 5:27 - 5:30
    I will be choosing all my columns. So
  • 5:30 - 5:34
    here, if you see, the concept of simple
  • 5:34 - 5:36
    linear regression and multiple linear
  • 5:36 - 5:38
    regression comes up, right? If I choose a
  • 5:38 - 5:41
    single feature, it will be simple
  • 5:41 - 5:43
    linear regression. If I choose multiple
  • 5:43 - 5:45
    features, it will be multiple linear
  • 5:45 - 5:48
    regression. So for now, I will be choosing
  • 5:48 - 5:52
    them all, okay? Now here, if you see, the
  • 5:52 - 5:55
    split for training, right? So here,
  • 5:55 - 5:57
    basically, what is happening is you
  • 5:57 - 6:00
    are splitting the whole dataset between
  • 6:00 - 6:03
    a training and a test dataset.
  • 6:03 - 6:04
    Here, currently, it is 50 percent, 50 percent.
  • 6:04 - 6:07
    That means the first 50 percent of the data
  • 6:07 - 6:09
    will be used for training and the remaining
  • 6:09 - 6:11
    50 percent will be used for testing
  • 6:11 - 6:15
    purposes. I'll slide this one, it moves like
  • 6:15 - 6:19
    this, and I'll keep 70 and 30, okay? Now,
  • 6:19 - 6:23
    fit intercept, okay? That means, if you
  • 6:23 - 6:25
    remember from my machine learning video,
  • 6:25 - 6:30
    not only do we have a slope value for
  • 6:30 - 6:33
    each and every feature, we also have an
  • 6:33 - 6:36
    intercept, the y-axis intercept, by the
  • 6:36 - 6:39
    way. So by this option, you are
  • 6:39 - 6:41
    basically choosing whether
  • 6:41 - 6:44
    your model should include an
  • 6:44 - 6:47
    intercept term or not, okay? Now, notes:
  • 6:47 - 6:49
    you can give some meaningful
  • 6:49 - 6:52
    notes. Maybe the note could be
  • 6:52 - 6:53
    which fields you are using for
  • 6:53 - 6:56
    prediction purposes, some
  • 6:56 - 6:58
    meaningful note which will be useful
  • 6:58 - 7:00
    later, when we see the history of the
  • 7:00 - 7:05
    model, okay? So I will say "using all the
  • 7:05 - 7:11
    features", okay? Now,
  • 7:11 - 7:13
    after all this is done, you need to click on
  • 7:13 - 7:17
    "Fit Model". Basically, behind the
  • 7:17 - 7:20
    scenes, what it does is run a Splunk
  • 7:20 - 7:22
    custom command which basically
  • 7:22 - 7:25
    implements [inaudible]. So
  • 7:25 - 7:28
    using that particular command, it is
  • 7:28 - 7:30
    trying to come up with the equation of
  • 7:30 - 7:32
    that line, right? Which we discussed
  • 7:32 - 7:37
    before. And if you remember, in my
  • 7:37 - 7:39
    multiple linear regression video, we came
  • 7:39 - 7:42
    up with a linear algebra solution over
  • 7:42 - 7:46
    there, right? With matrix inversion
  • 7:46 - 7:49
    and matrix transpose, right? So behind the
  • 7:49 - 7:50
    scenes it is doing the same thing over
  • 7:50 - 7:51
    there, okay?
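As a rough sketch, the SPL that the toolkit runs behind the "Fit Model" button is of this shape. The field names are taken from the graduate admission dataset, the model name is a placeholder, and the random train/test split clauses the UI prepends are omitted:

```spl
index="main"
| table "GRE Score" "TOEFL Score" "University Rating" "SOP" "LOR" "CGPA" "Research" "Chance of Admit"
| fit LinearRegression "Chance of Admit" from "GRE Score" "TOEFL Score" "University Rating" "SOP" "LOR" "CGPA" "Research" fit_intercept=true into "graduate_admission_model"
```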
  • 7:51 - 7:54
    So now, if you see, the result came up,
  • 7:54 - 7:56
    right, after clicking on "Fit Model".
  • 7:56 - 8:00
    Now, if you see, apart from our own data,
  • 8:00 - 8:04
    it has actually added two new columns over
  • 8:04 - 8:06
    here. One is the predicted chance of
  • 8:06 - 8:09
    admit, and the other is the residual column, right? Now,
  • 8:09 - 8:11
    predicted chance of admit is
  • 8:11 - 8:13
    the actual prediction made on the data,
  • 8:13 - 8:17
    right? So if you see, for the first row,
  • 8:17 - 8:21
    the actual chance of admit is 0.73,
  • 8:21 - 8:23
    that means 73 percent. The predicted one
  • 8:23 - 8:26
    was 0.70, that means 70 percent. Now,
  • 8:26 - 8:28
    the residual column is the difference
  • 8:28 - 8:30
    between the actual chance of admit and
  • 8:30 - 8:34
    the predicted chance of admit, okay?
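In symbols, the residual for row $i$ is just the gap between the actual and predicted value; for the first row above:

```latex
r_i = y_i - \hat{y}_i, \qquad r_1 = 0.73 - 0.70 = 0.03
```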
  • 8:34 - 8:37
    So this is how, after fitting the model,
  • 8:37 - 8:40
    it came up with this kind of
  • 8:40 - 8:40
    visualization.
  • 8:40 - 8:44
    It also shows five or
  • 8:44 - 8:46
    six other charts over here, okay? Now, let us
  • 8:46 - 8:49
    discuss them one by one. The first
  • 8:49 - 8:52
    chart shows me the actual versus
  • 8:52 - 8:54
    predicted line chart. That means,
  • 8:54 - 8:57
    if you see, the chance of admit, the blue
  • 8:57 - 9:00
    colored graph, is the actual one, and the
  • 9:00 - 9:02
    predicted chance of admit, the yellow
  • 9:02 - 9:04
    colored one, is the predicted one, right?
  • 9:04 - 9:07
    And by seeing this one, we can
  • 9:07 - 9:10
    at least see this particular model is
  • 9:10 - 9:13
    an okay fit to this particular data.
  • 9:13 - 9:15
    Somewhere it is lagging over here, if you
  • 9:15 - 9:18
    see it, right? But overall it's
  • 9:18 - 9:22
    actually fitting well over there. Now, the
  • 9:22 - 9:24
    residual chart is what you are seeing
  • 9:24 - 9:26
    over here, the line chart it is
  • 9:26 - 9:29
    showing over here, okay? So now, the
  • 9:29 - 9:32
    closer this particular chart is
  • 9:32 - 9:35
    to zero, the better the model is
  • 9:35 - 9:37
    fitting. But over here,
  • 9:37 - 9:40
    if you see the latter part of this one,
  • 9:40 - 9:43
    the residuals are larger, right? Because it
  • 9:43 - 9:46
    is more sparse, more distant from the
  • 9:46 - 9:48
    zero line. And the same thing is
  • 9:48 - 9:50
    reflected over here as well. The model
  • 9:50 - 9:53
    is lagging somewhat over here, right?
  • 9:53 - 9:56
    So this kind of analysis you can do
  • 9:56 - 9:59
    from there: how the model is fitting
  • 9:59 - 10:02
    your data. And this particular graph is
  • 10:02 - 10:04
    showing me the scatter plot of the
  • 10:04 - 10:06
    actual and the predicted one. And here,
  • 10:06 - 10:09
    basically, you can see how the line
  • 10:09 - 10:12
    is fitting your data
  • 10:12 - 10:16
    through this chart, okay? Now, it also provides a
  • 10:16 - 10:20
    residual histogram. Let us
  • 10:20 - 10:22
    understand this one as well. So we
  • 10:22 - 10:24
    have the zero line over here, if you
  • 10:24 - 10:28
    see. It basically shows, for each and
  • 10:28 - 10:30
    every residual value, how many counts are
  • 10:30 - 10:33
    there, if you see. So if you just
  • 10:33 - 10:37
    think about it, if for all my data points
  • 10:37 - 10:41
    this residual is zero, that's the
  • 10:41 - 10:43
    ideal scenario, right? That means I am
  • 10:43 - 10:45
    predicting the [inaudible], right?
  • 10:45 - 10:49
    So from this histogram, if you see
  • 10:49 - 10:52
    where the residual error
  • 10:52 - 10:54
    equals zero, the sample count is 24
  • 10:54 - 10:57
    [inaudible], right? If more and more
  • 10:57 - 11:00
    samples are very close to this zero, that
  • 11:00 - 11:04
    means my model is doing well; it's
  • 11:04 - 11:06
    actually a good-fit model. And if it is
  • 11:06 - 11:08
    more sparse,
  • 11:08 - 11:12
    that means if we have a larger number of big
  • 11:12 - 11:14
    bars far from zero, then somehow the
  • 11:14 - 11:16
    model is not a good fit for
  • 11:16 - 11:18
    that particular data. So this kind of
  • 11:18 - 11:20
    interpretation you can do from this
  • 11:20 - 11:24
    particular diagram, okay? So now, there are
  • 11:24 - 11:28
    two other things over here, called the R squared
  • 11:28 - 11:30
    statistic and the root mean square
  • 11:30 - 11:33
    error, okay? These two are actually measures
  • 11:33 - 11:37
    of how accurate the model is, okay? So
  • 11:37 - 11:41
    I'll be discussing these measurements in
  • 11:41 - 11:43
    great detail in a separate video.
  • 11:43 - 11:45
    There we will be discussing the R squared
  • 11:45 - 11:47
    statistic, root mean square error, and also some
  • 11:47 - 11:51
    other ways to determine how
  • 11:51 - 11:53
    accurate the model is. Just like bias and
  • 11:53 - 11:55
    variance, there are a lot of other
  • 11:55 - 11:56
    measurements as well;
  • 11:56 - 11:59
    we'll discuss them in detail over there, okay?
  • 11:59 - 12:00
    But for now, just try to remember
  • 12:00 - 12:03
    that this is a measurement of fit:
  • 12:03 - 12:05
    for the R squared statistic, we
  • 12:05 - 12:09
    can think of it as, the closer it is to 1, the
  • 12:09 - 12:11
    better the fit. Something like this, okay?
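For reference, the usual definitions of these two measures, with $y_i$ the actual values, $\hat{y}_i$ the predictions, and $\bar{y}$ the mean of the actuals, are:

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
```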
  • 12:11 - 12:15
    Mmm, so we will see how to best
  • 12:15 - 12:18
    judge a model based on that, okay? But
  • 12:18 - 12:20
    still, even for the R squared statistic,
  • 12:20 - 12:25
    it all depends on the context, the field
  • 12:25 - 12:28
    in which you're implementing
  • 12:28 - 12:30
    linear regression as well. We'll discuss
  • 12:30 - 12:32
    that stuff in the future, okay? And now,
  • 12:32 - 12:35
    if you see the last graph, it is showing
  • 12:35 - 12:38
    me the model parameters. If you remember
  • 12:38 - 12:40
    the big equation we wrote over
  • 12:40 - 12:43
    there, right? So let me open the Bamboo
  • 12:43 - 12:56
    Paper app here. If you remember, when we
  • 12:56 - 13:00
    talked about multiple linear regression,
  • 13:02 - 13:05
    we started our discussion
  • 13:05 - 13:07
    with a big equation, right? So let me go
  • 13:07 - 13:11
    back over there.
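The equation in question is the multiple linear regression model, together with the least-squares solution for the coefficient vector derived in that earlier video:

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \epsilon,
\qquad
\hat{\beta} = (X^{\top}X)^{-1} X^{\top} y
```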
  • 13:21 - 13:26
    Yes, so this one, right? So here, beta 1,
  • 13:26 - 13:30
    beta 2, up to beta p are our slope values, the
  • 13:30 - 13:31
    coefficients of each and every feature.
  • 13:31 - 13:35
    And beta 0 is my intercept, right? And
  • 13:35 - 13:37
    what we did, basically, at the
  • 13:37 - 13:39
    end of the day, is come up with a big
  • 13:39 - 13:42
    equation to determine this whole beta
  • 13:42 - 13:46
    vector, right? So this is the same stuff
  • 13:46 - 13:49
    it is representing over here. It is
  • 13:49 - 13:51
    basically giving me, for each and
  • 13:51 - 13:53
    every feature, the coefficient
  • 13:53 - 13:57
    value, okay? And the intercept value as
  • 13:57 - 14:00
    well. If you see, this is my beta 0, and
  • 14:00 - 14:02
    beta 1 to beta p are these
  • 14:02 - 14:04
    other values. Now, if you look closely,
  • 14:04 - 14:07
    there are some coefficients which
  • 14:07 - 14:10
    have much greater values, and some
  • 14:10 - 14:11
    coefficients which have very small values
  • 14:11 - 14:14
    over here. The way to interpret a
  • 14:14 - 14:19
    coefficient is as how much it is
  • 14:19 - 14:22
    influencing the end result.
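In general, for a fitted model, holding the other features fixed, a one-unit change in feature $x_j$ changes the prediction by its coefficient:

```latex
\hat{y} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j
\quad\Longrightarrow\quad
\Delta \hat{y} = \beta_j \, \Delta x_j
```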
  • 14:22 - 14:25
    So to understand that, let us see this
  • 14:25 - 14:29
    example. Let's say I have a variable called 'x'
  • 14:29 - 14:34
    and I write something like x = 0.9 'y'.
  • 14:34 - 14:36
    Now, what do I mean by this particular
  • 14:36 - 14:41
    equation, 0.9 times 'y', right? It means that
  • 14:41 - 14:45
    if I set 'y' equal to 1, then
  • 14:45 - 14:50
    my 'x' will become 0.9, right? So what do
  • 14:50 - 14:52
    we mean by that? That for one unit of
  • 14:52 - 14:53
    change in 'y',
  • 14:53 - 14:56
    we are basically changing 'x' by 0.9
  • 14:56 - 15:01
    units, right? So this kind of
  • 15:01 - 15:03
    interpretation you can do. It tells you
  • 15:03 - 15:09
    how 'y' is influencing 'x', right? So
  • 15:09 - 15:11
    this is how we interpret these
  • 15:11 - 15:14
    kinds of coefficients in linear
  • 15:14 - 15:17
    regression as well. That means we will know
  • 15:17 - 15:19
    from the coefficients themselves which
  • 15:19 - 15:22
    particular feature is most influencing
  • 15:22 - 15:24
    the result. And now, if you see it over here,
  • 15:24 - 15:24
    I think
  • 15:24 - 15:27
    CGPA is the most influential factor in
  • 15:27 - 15:31
    determining whether my chance of admit
  • 15:31 - 15:32
    is higher
  • 15:32 - 15:36
    or not, right? Considering we are
  • 15:36 - 15:38
    implementing linear regression, there
  • 15:38 - 15:40
    could be a better fit for this particular
  • 15:40 - 15:43
    data, which we would need to experiment with and see.
  • 15:43 - 15:45
    But for the current linear
  • 15:45 - 15:47
    regression implementation, we can
  • 15:47 - 15:50
    draw this kind of conclusion over here,
  • 15:50 - 15:58
    right? Correct? So this is how the model
  • 15:58 - 16:02
    parameters summary table
  • 16:02 - 16:03
    visualization is telling me those
  • 16:03 - 16:06
    different details, right? So now, if
  • 16:06 - 16:09
    you see, we actually fit our model, right?
  • 16:09 - 16:12
    But we still haven't saved our model;
  • 16:12 - 16:14
    until we save it, that's
  • 16:14 - 16:17
    why it is showing a draft status for
  • 16:17 - 16:22
    your model, right? And you can now go to
  • 16:22 - 16:27
    the experiment history to see what you have
  • 16:27 - 16:29
    done until now. It will maintain
  • 16:29 - 16:32
    a history over there. So now I can see that,
  • 16:32 - 16:36
    using all these features, my R
  • 16:36 - 16:39
    squared statistic is somewhere around 78%,
  • 16:39 - 16:41
    these are my coefficients, and I am
  • 16:41 - 16:43
    coming to the conclusion that maybe
  • 16:43 - 16:46
    CGPA is the most influential factor over
  • 16:46 - 16:49
    here, okay? So let us do another
  • 16:49 - 16:53
    experiment, okay? In here, I'll keep
  • 16:53 - 16:57
    CGPA just to see whether it is
  • 16:57 - 17:00
    actually true or not, okay? So now, what I
  • 17:00 - 17:03
    will do here is keep CGPA,
  • 17:03 - 17:06
    I'll keep the [inaudible], and I will keep
  • 17:06 - 17:09
    the LOR, okay? I'll
  • 17:09 - 17:14
    keep the research one, and I will keep the
  • 17:14 - 17:19
    GRE score, okay? So I'll click over here
  • 17:19 - 17:22
    again. I will keep the GRE score. I will
  • 17:22 - 17:24
    remove the TOEFL score. I will remove the
  • 17:24 - 17:26
    university rating. I will remove the SOP.
  • 17:26 - 17:29
    CGPA, Research, and LOR I will keep. So
  • 17:29 - 17:32
    now I am trying to do this experiment
  • 17:32 - 17:35
    with four features which I think are
  • 17:35 - 17:40
    maybe the most influential ones. Maybe the
  • 17:40 - 17:43
    other features do not have much impact
  • 17:43 - 17:46
    on this particular prediction, okay?
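The reduced experiment corresponds to fitting over just those four features. As a hedged sketch, with a hypothetical model name:

```spl
index="main"
| table "GRE Score" "LOR" "CGPA" "Research" "Chance of Admit"
| fit LinearRegression "Chance of Admit" from "GRE Score" "LOR" "CGPA" "Research" fit_intercept=true into "graduate_admission_model_4f"
```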
  • 17:46 - 17:50
    So now I'll keep a note: "using
  • 17:50 - 17:55
    only four features". This is how this
  • 17:55 - 17:58
    particular note comes in handy
  • 17:58 - 18:01
    over here, right? When I see
  • 18:01 - 18:03
    the history, I will know what I
  • 18:03 - 18:05
    have done over there, okay? So I will
  • 18:05 - 18:09
    click on 'Fit Model' again. Let's see
  • 18:09 - 18:14
    how it's working now. So similar
  • 18:14 - 18:15
    stuff is happening over there. It's
  • 18:15 - 18:19
    running the custom commands.
  • 18:19 - 18:22
    In later videos, we will discuss
  • 18:22 - 18:23
    those
  • 18:23 - 18:34
    custom commands in detail as well, okay? Okay, so now, if
  • 18:34 - 18:38
    you see, it again made its predictions. Now,
  • 18:38 - 18:41
    if you see, the actual versus predicted line
  • 18:41 - 18:43
    chart is staying more or less the same
  • 18:43 - 18:45
    even though I removed three features,
  • 18:45 - 18:48
    right? This one as well, more or less,
  • 18:48 - 18:52
    okay? Now, if you see, my R squared
  • 18:52 - 18:54
    statistic has improved a lot, to 82%,
  • 18:54 - 18:58
    right? So by this, at least, I am
  • 18:58 - 19:01
    confident that those three
  • 19:01 - 19:04
    features really are not impacting it
  • 19:04 - 19:08
    much. And if you see from this residual
  • 19:08 - 19:11
    histogram, more
  • 19:11 - 19:13
    and more samples are very close to zero,
  • 19:13 - 19:16
    right? The
  • 19:16 - 19:18
    residual errors are very, very
  • 19:18 - 19:20
    close to zero, right?
  • 19:20 - 19:23
    So by this kind of analysis, we can say
  • 19:23 - 19:25
    this particular model is better
  • 19:25 - 19:28
    than my previous model, right? So
  • 19:28 - 19:31
    now, what I will do is save this
  • 19:31 - 19:33
    particular model, okay? So I will save it; I
  • 19:33 - 19:38
    will give the experiment title as
  • 19:39 - 19:47
    'graduate_date_predictor', okay? I will
  • 19:47 - 19:51
    click on save. So now a model will
  • 19:51 - 19:55
    be created, okay? Now, we have
  • 19:55 - 19:56
    two options over here after you save the
  • 19:56 - 19:58
    model. Either you can go to
  • 19:58 - 20:00
    the listing page
  • 20:00 - 20:02
    or you can continue editing, okay? Let us
  • 20:02 - 20:04
    continue editing to see how the experiment
  • 20:04 - 20:06
    history is looking now. Now the experiment
  • 20:06 - 20:09
    history has two rows over there, okay?
  • 20:09 - 20:12
    The first row is the current
  • 20:12 - 20:14
    experiment with my four features, right?
  • 20:14 - 20:18
    With an R squared value of 82%. The second
  • 20:18 - 20:20
    row is showing me my older one, right?
  • 20:20 - 20:22
    So at any point in time, you can load
  • 20:22 - 20:24
    the corresponding settings and
  • 20:24 - 20:26
    experiment with them, okay?
  • 20:26 - 20:27
    It will also show you the data
  • 20:27 - 20:30
    corresponding to each experiment,
  • 20:30 - 20:32
    okay? So now, let's go back to our
  • 20:32 - 20:35
    Experiments tab and see what is happening
  • 20:35 - 20:37
    over there, okay? Now, if you see my
  • 20:37 - 20:40
    Experiments tab, it's not showing me
  • 20:40 - 20:43
    those big blocks, right? Mmm, it is
  • 20:43 - 20:44
    showing this kind of view, where I
  • 20:44 - 20:47
    have a predict numeric fields
  • 20:47 - 20:50
    experiment, a single one I have done. I have given the
  • 20:50 - 20:53
    experiment name like this, right?
  • 20:53 - 20:54
    The algorithm I have chosen is linear
  • 20:54 - 20:56
    regression. There are a lot of actions you
  • 20:56 - 20:59
    can do on this particular model, so
  • 20:59 - 21:01
    before publishing, let us talk about
  • 21:01 - 21:03
    those, okay? You can create an alert
  • 21:03 - 21:07
    from this model. So suppose
  • 21:07 - 21:10
    the model is predicting data, right? You
  • 21:10 - 21:12
    can create an alert,
  • 21:12 - 21:15
    something like: when my predicted chance
  • 21:15 - 21:16
    of admit is greater than 90 percent, that
  • 21:16 - 21:20
    means 0.9, okay? Fine, maybe 99. That means
  • 21:20 - 21:23
    the model is working really, really well
  • 21:23 - 21:25
    over there, right? So this kind of alert
  • 21:25 - 21:30
    you can create, okay? Next, you can edit the
  • 21:30 - 21:32
    title and description. That's simple
  • 21:32 - 21:36
    enough. Now, you can see "schedule
  • 21:36 - 21:37
    training". This is an interesting feature.
  • 21:37 - 21:41
    Whatever we have done until now,
  • 21:41 - 21:43
    we have done as manual training over
  • 21:43 - 21:45
    there, right? Now, with the scheduled
  • 21:45 - 21:47
    training feature, you can create a
  • 21:47 - 21:49
    scheduler which will run the training
  • 21:49 - 21:52
    based on the data. Now, if you see, there
  • 21:52 - 21:54
    is a time range over there. So you can
  • 21:54 - 21:56
    choose the time range of the data you
  • 21:56 - 21:59
    want to use for training purposes, okay?
  • 21:59 - 22:01
    That's a really interesting feature,
  • 22:01 - 22:03
    because it means that as more and more
  • 22:03 - 22:06
    data comes into your system, you can use
  • 22:06 - 22:09
    that particular data for training
  • 22:09 - 22:11
    purposes as well, automatically, using
  • 22:11 - 22:13
    the scheduled training, okay? And
  • 22:13 - 22:17
    similarly, for the other scheduling options,
  • 22:17 - 22:18
    the schedule priority and schedule window,
  • 22:18 - 22:21
    you can set those up as well. You can even
  • 22:21 - 22:23
    trigger an action when the
  • 22:23 - 22:24
    scheduled training is happening: you
  • 22:24 - 22:27
    can write a log, you can send the
  • 22:27 - 22:30
    output to a lookup, everything. These are
  • 22:30 - 22:32
    normal scheduling options, okay? That
  • 22:32 - 22:34
    you can also do over here. So this is a
  • 22:34 - 22:36
    very versatile feature you can use with the
  • 22:36 - 22:39
    model as well. And you can delete
  • 22:39 - 22:43
    the model too; that's fine. So now, we will
  • 22:43 - 22:45
    publish this model, okay?
  • 22:45 - 22:56
    Let's say 'chances_of_admit_model', okay?
  • 22:56 - 22:58
    This is the model name, and the
  • 22:58 - 23:00
    destination app you will be choosing
  • 23:00 - 23:02
    over here, so the model will be saved over
  • 23:02 - 23:04
    there, okay? I will be choosing my Search
  • 23:04 - 23:08
    and Reporting app, and I will click on submit,
  • 23:08 - 23:13
    okay? So the model is created now. So how
  • 23:13 - 23:15
    is the model created in the background?
  • 23:15 - 23:17
    It's basically a lookup file, so let us
  • 23:17 - 23:23
    see that, okay? So from the Splunk home:
  • 23:23 - 23:28
    etc, apps, search, okay,
  • 23:28 - 23:31
    lookups. Okay, so currently, if you see it
  • 23:31 - 23:33
    over here, mmm,
  • 23:33 - 23:36
    by default the model is saved
  • 23:36 - 23:39
    in the user context, so that's why it
  • 23:39 - 23:41
    is not showing up under search. So
  • 23:41 - 23:43
    what I need to do is go to etc,
  • 23:43 - 23:47
    then I need to go to users. Currently I'm
  • 23:47 - 23:50
    the admin user, so go to admin, and
  • 23:50 - 23:53
    I'll go to the search app. And here, in
  • 23:53 - 23:55
    the lookup folder, this is how the model
  • 23:55 - 23:57
    is getting stored over there, okay? So I
  • 23:57 - 24:00
    think this lookup is in a read-only format,
  • 24:00 - 24:04
    so if I just open it in Notepad, this is
  • 24:04 - 24:06
    how it looks. It is
  • 24:06 - 24:09
    basically saving a lot of information,
  • 24:09 - 24:11
    the metadata-related information about
  • 24:11 - 24:13
    the model, over here, okay? What the
  • 24:13 - 24:16
    feature variables are, whatever columns I
  • 24:16 - 24:19
    have in my data, okay? All of these things,
  • 24:19 - 24:22
    [inaudible] other
  • 24:22 - 24:24
    attributes which we do not have any
  • 24:24 - 24:26
    control over, it is saving over there,
  • 24:26 - 24:32
    okay? So now, we have created our own model,
  • 24:32 - 24:34
    right? Now we need to apply it, right? How are we
  • 24:34 - 24:36
    going to apply it? There is a
  • 24:36 - 24:40
    command called apply in Splunk MLTK,
  • 24:40 - 24:43
    okay? By using that command, you can
  • 24:43 - 24:46
    apply that particular model on any dataset,
  • 24:46 - 24:49
    okay? Though specifically we'll be
  • 24:49 - 24:53
    doing it on this dataset itself; if you apply
  • 24:53 - 24:56
    the model on some [inaudible] dataset,
  • 24:56 - 24:59
    it is anyhow not
  • 24:59 - 25:01
    going to give you proper results. So
  • 25:01 - 25:04
    this is how you will be applying the
  • 25:04 - 25:08
    model. So I'll have my base dataset
  • 25:08 - 25:10
    here, right? I'll just
  • 25:10 - 25:13
    choose [inaudible] the last hundred records,
  • 25:13 - 25:20
    okay? Let's say the last 200 records, okay? Now I
  • 25:20 - 25:23
    will be using the apply command. Don't
  • 25:23 - 25:24
    worry about it; I will be discussing the
  • 25:24 - 25:27
    Splunk MLTK commands in detail in
  • 25:27 - 25:31
    my next video. Here we will just see
  • 25:31 - 25:34
    how we apply the model. So
  • 25:34 - 25:37
    now I will write my apply command, then my
  • 25:37 - 25:41
    model name, right? We have given our
  • 25:41 - 25:45
    model name as 'chances_of_admit_model'.
  • 25:45 - 25:55
    I'll just copy it, okay? And I will just run it.
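Assuming the last 200 events are selected with tail (the exact base search isn't shown on screen), the apply search is roughly of this shape:

```spl
index="main"
| tail 200
| apply chances_of_admit_model
```

For an alert-style search, the same pipeline could be extended with something like `| where 'predicted(Chance of Admit)' > 0.9`; the exact name of the predicted output field here is an assumption.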
  • 25:55 - 25:58
    So what it should do, basically, is
  • 25:58 - 26:01
    apply this particular model to
  • 26:01 - 26:05
    that data, okay. But it is saying
  • 26:05 - 26:08
    permission denied now. So for that, what I need
  • 26:08 - 26:15
    to do is go to settings, lookups, okay, lookup
  • 26:15 - 26:24
    table files, and I'll choose this one, search
  • 26:24 - 26:27
    and reporting, okay? This is my chances of
  • 26:27 - 26:29
    admit model. Currently it is in private
  • 26:29 - 26:32
    mode; that's why I am not able to apply
  • 26:32 - 26:35
    it from the search app. So I'll choose
  • 26:35 - 26:38
    "this app only", I will give read and write
  • 26:38 - 26:40
    for now, and I'll click on save,
  • 26:40 - 26:47
    okay? Internal error, data could not be
  • 26:47 - 26:50
    written to- okay. So let me see what's
  • 26:50 - 26:56
    going on over there. Okay, so I think
  • 26:56 - 26:58
    there was some technical glitch, so I
  • 26:58 - 27:02
    just set the permissions again. And
  • 27:02 - 27:05
    this time I chose all apps. I think it works
  • 27:05 - 27:09
    now. So now, let us see whether our search
  • 27:09 - 27:11
    is working or not.
  • 27:11 - 27:15
    Okay, so I have taken the last 200
  • 27:15 - 27:18
    records and I'm just applying
  • 27:18 - 27:20
    the machine
  • 27:20 - 27:23
    learning model. If you see,
  • 27:23 - 27:24
    it is applying that model to these
  • 27:24 - 27:27
    particular two hundred records, two
  • 27:27 - 27:29
    hundred events over there, and it has
  • 27:29 - 27:31
    created a new column called predicted
  • 27:31 - 27:34
    chance of admit, okay? So this is how we
  • 27:34 - 27:36
    apply the model. You can even
  • 27:36 - 27:39
    create your own alert using this
  • 27:39 - 27:41
    particular command as well, so that
  • 27:41 - 27:43
    whenever you want something
  • 27:43 - 27:46
    like the chance of admit being more than 90
  • 27:46 - 27:48
    percent, 80 percent, or any other
  • 27:48 - 27:50
    [inaudible] you want, you can use this
  • 27:50 - 27:53
    particular command to achieve that
  • 27:53 - 27:55
    same thing over there, okay? So this is
  • 27:55 - 28:00
    how you can experiment with machine
  • 28:00 - 28:02
    learning, specifically linear
  • 28:02 - 28:07
    regression, in Splunk MLTK. And we
  • 28:07 - 28:09
    saw the many experiments we
  • 28:09 - 28:12
    did regarding this one, right? So this
  • 28:12 - 28:13
    is how you experiment with your data as
  • 28:13 - 28:17
    well, to see what best fits
  • 28:17 - 28:19
    your data, and you can achieve a lot of
  • 28:19 - 28:22
    other stuff like automatic training and
  • 28:22 - 28:24
    creating alerts from these things as
  • 28:24 - 28:27
    well, okay? In the next video, we will go into
  • 28:27 - 28:29
    more detail; we will basically deep dive
  • 28:29 - 28:31
    into what is basically happening internally
  • 28:31 - 28:34
    over here. We will talk about the different
  • 28:34 - 28:36
    Splunk commands running internally, the
  • 28:36 - 28:37
    custom commands running internally. And
  • 28:37 - 28:40
    whatever we have done, this experiment we
  • 28:40 - 28:42
    have done from the UI, the same thing can
  • 28:42 - 28:45
    be achieved from the search
  • 28:45 - 28:46
    command,
  • 28:46 - 28:49
    from Splunk SPL, as well, okay? See
  • 28:49 - 28:52
    you in the next video.
Title:
Splunk MLTK : Implementation Of Linear Regression In Splunk MLTK
Description:

Video Language:
English
Duration:
28:51
