Splunk MLTK : Implementation Of Linear Regression In Splunk MLTK

0:00 - 0:02

Okay. In this video, we'll be discussing
0:02 - 0:05

how we can implement linear
0:05 - 0:08

regression in Splunk MLTK, okay? So
0:08 - 0:10

in my previous video, we saw how we
0:10 - 0:13

can install Splunk MLTK and its
0:13 - 0:15

related packages, right? And also if you
0:15 - 0:18

remember when I was discussing the
0:18 - 0:21

machine learning core algorithm, I
0:21 - 0:26

also introduced the core dataset we'll
0:26 - 0:27

be using for our linear regression
0:27 - 0:29

modeling, okay?
0:29 - 0:31

That's the graduate admission dataset
0:31 - 0:35

where, for various students, we
0:35 - 0:37

have their GRE score, TOEFL score,
0:37 - 0:40

university rating, statement of purpose
0:40 - 0:41

rating, okay,
0:41 - 0:45

reference rating, CGPA, whether
0:45 - 0:47

they have done research or not. Based on
0:47 - 0:51

all these fields, we will try to
0:51 - 0:54

predict the chances of admit, okay? So now
0:54 - 0:59

to implement linear regression- so we
0:59 - 1:02

will be implementing linear
1:02 - 1:05

regression for this one and see how best
1:05 - 1:07

the model is fitting the particular data,
1:07 - 1:11

okay? So to implement linear
1:11 - 1:13

regression, what you have to do is you have
1:13 - 1:15

to go to the Splunk Machine Learning
1:15 - 1:20

Toolkit, okay? As I stated before, the
1:20 - 1:21

landing page of the Machine Learning
1:21 - 1:24

Toolkit app is this Showcase dashboard,
1:24 - 1:27

right? Where it has basically a lot of
1:27 - 1:30

examples based on the
1:30 - 1:32

different machine learning
1:32 - 1:35

algorithms Splunk supports, okay? Now to
1:35 - 1:39

implement machine learning on your
1:39 - 1:42

own dataset, what you need to do is you need to come
1:42 - 1:47

to the Experiments tab, okay? So now if you do
1:47 - 1:50

not have any other models or if it is
1:50 - 1:52

the first time you are coming to this
1:52 - 1:54

particular dashboard, this will be the
1:54 - 1:56

default view, okay? But if you have
1:56 - 1:58

already experimented on different models,
1:58 - 2:00

the view will be slightly different
2:00 - 2:03

which we'll see later, okay? So now, as in
2:03 - 2:06

linear regression we are trying to do a
2:06 - 2:08

prediction on a numeric field, right?
2:08 - 2:11

So we will go over here, okay?
2:11 - 2:13

The 'Predict Numeric Fields' option.
2:13 - 2:15

We're clicking over here. Now it is
2:15 - 2:18

asking me for an experiment title and a
2:18 - 2:22

description. So I will say graduate
2:23 - 2:30

admission prediction. Let's give the
2:30 - 2:33

experiment title like this one
2:33 - 2:37

prediction, okay? Now you got to give some
2:37 - 2:38

description as well, meaningful
2:38 - 2:44

description. So I'll click on create, okay?
2:44 - 2:47

So now this particular view comes up
2:47 - 2:50

over here. Now, if you see here, here we
2:50 - 2:52

have two tabs, experiment settings and
2:52 - 2:55

experiment history. Initially the
2:55 - 2:56

experiment history will be blank,
2:56 - 2:59

there is nothing over here, okay? Now
2:59 - 3:01

based on the experiment settings,
3:01 - 3:03

experiment history will be updated
3:03 - 3:06

accordingly, which we will see later, okay?
3:06 - 3:08

Now, the first thing is it is asking me
3:08 - 3:13

for a search, right? So now let me
3:13 - 3:16

show you the data. So this
3:16 - 3:20

particular data I have already indexed in my
3:20 - 3:24

main index. Okay so I'll just write the
3:24 - 3:30

query, index equals main, and just
3:30 - 3:33

tabling all my different
3:33 - 3:39

features and chances of admit, okay? So this
3:39 - 3:42

is my dataset. So this dataset, I will
3:42 - 3:45

be using for training purposes, but not
3:45 - 3:47

the full dataset, not all the 500
3:47 - 3:50

records. Maybe some of the data I will be
3:50 - 3:53

using for training purposes, and the rest
3:53 - 3:55

of the data I will be using for
3:55 - 3:57

prediction purposes, just to see how my
3:57 - 3:59

model is working, okay? So I'll give this
3:59 - 4:06

query over here, and then I'll click on
4:06 - 4:11

search, okay? So by default, it is showing me
4:11 - 4:14

my data, the initial data preview,
4:14 - 4:18

right? Now, let's go to the next one. So
4:18 - 4:20

here if you see, there are a lot of
4:20 - 4:23

pre-processing steps over here, right? So
4:23 - 4:27

now in machine learning when you train a
4:27 - 4:30

particular model, right,
4:30 - 4:32

there may be some need to pre-process
4:32 - 4:35

that data so that you will reduce a lot
4:35 - 4:37

of noise from the data. Now, there are a
4:37 - 4:38

lot of pre-processing algorithms
4:38 - 4:41

present over there. So when we
4:41 - 4:44

discuss those algorithms, we'll come back
4:44 - 4:47

to this page again and work on it, okay?
4:47 - 4:50

So for now, I will not be doing any kind
4:50 - 4:52

of pre-processing because this data is
4:52 - 4:56

clean enough, okay? So now the
4:56 - 4:59

algorithm I will be choosing is linear
4:59 - 5:00

regression. Now, there are a lot of
5:00 - 5:02

regression algorithms, but currently we have
5:02 - 5:05

studied only linear regression, and we'll
5:05 - 5:07

be implementing linear regression only
5:07 - 5:09

in this video, so I will be choosing the
5:09 - 5:11

linear regression over here, okay? So now
5:11 - 5:14

fields to predict, that means which field
5:14 - 5:16

you want to predict. So as I will be
5:16 - 5:18

predicting my chances of admit, I will
5:18 - 5:22

be choosing that. And then field used
5:22 - 5:24

for predicting. That means here basically
5:24 - 5:27

you are choosing your features, right? So
5:27 - 5:30

I will be choosing all my columns. So
5:30 - 5:34

here if you see, the concept of simple
5:34 - 5:36

linear regression and multiple linear
5:36 - 5:38

regression comes up, right? If I choose a
5:38 - 5:41

single feature, it will become a simple
5:41 - 5:43

linear regression. If I choose multiple
5:43 - 5:45

features, it will become a multiple linear
5:45 - 5:48

regression. So for now, I will be choosing
5:48 - 5:52

all of them, okay? Now here if you see, the
5:52 - 5:55

split for training, right? So here
5:55 - 5:57

basically what is happening is you
5:57 - 6:00

are splitting the whole dataset between
6:00 - 6:03

a training and a test dataset.
6:03 - 6:04

Here currently it is 50 percent, 50 percent.
6:04 - 6:07

That means the first 50 percent data
6:07 - 6:09

will be used for training and the rest
6:09 - 6:11

50 percent data will be used for testing
6:11 - 6:15

purpose. I'll slide this one, it goes like
6:15 - 6:19

this one, I'll keep 70 and 30, okay? Now,
6:19 - 6:23

fit intercept, okay? That means, if you
6:23 - 6:25

remember from my machine learning video,
6:25 - 6:30

not only do we have the slope value for
6:30 - 6:33

each and every feature, we also have an
6:33 - 6:36

intercept, the y-axis intercept, by the
6:36 - 6:39

way. So by this option, you are
6:39 - 6:41

basically choosing whether
6:41 - 6:44

your model should include an implicit
6:44 - 6:47

intercept term or not, okay? Now, notes:
6:47 - 6:49

you can give some meaningful
6:49 - 6:52

notes. Maybe the notes could be like what
6:52 - 6:53

are the fields you are using for
6:53 - 6:56

prediction purposes, and some
6:56 - 6:58

meaningful note which will be useful
6:58 - 7:00

later when we see the history of the
7:00 - 7:05

model, okay? So I will say using all the
7:05 - 7:11

features, using all the features, okay? Now
7:11 - 7:13

after all is done, you need to click on
7:13 - 7:17

'Fit Model'. So basically, behind the
7:17 - 7:20

scenes, what it does is it runs a Splunk
7:20 - 7:22

custom command which basically
7:22 - 7:25

implemented [inaudible]. So
7:25 - 7:28

using that particular command, it is
7:28 - 7:30

trying to come up with the equation of
7:30 - 7:32

that line, right? Which we discussed
7:32 - 7:37

before. And if you remember from my
7:37 - 7:39

multiple linear regression video, we came
7:39 - 7:42

up with a linear algebra solution over
7:42 - 7:46

there, right? With matrix inversion
7:46 - 7:49

and matrix transpose, right? So behind the
7:49 - 7:50

scenes, it is doing the same thing over
7:50 - 7:51

there, okay?
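
For reference, the linear algebra solution recalled here is usually written as the normal equation. This is a sketch under assumptions: X stands for the feature matrix with a leading column of ones (so the intercept is estimated together with the slopes), and y for the vector of observed chance-of-admit values. The video does not say whether the command behind 'Fit Model' evaluates this formula literally or uses an equivalent least-squares solver.

```latex
\hat{\beta} = (X^{\top} X)^{-1} X^{\top} y,
\qquad
\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_p)^{\top}
```
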
7:51 - 7:54

So now if you see, the result came up,
7:54 - 7:56

right, after clicking on the fit model.
7:56 - 8:00

Now if you see, apart from our own data,
8:00 - 8:04

it has actually added two new columns over
8:04 - 8:06

here. One is the predicted chances of
8:06 - 8:09

admit and the other is the residual column, right? Now,
8:09 - 8:11

predicted chance of admit is actually
8:11 - 8:13

the actual prediction made on the data,
8:13 - 8:17

right? So if you see for the first row,
8:17 - 8:21

the actual chance of admit is 0.73,
8:21 - 8:23

that means 73%. Now the predicted
8:23 - 8:26

was 0.70, that means 70 percent. Now,
8:26 - 8:28

the residual column is the difference
8:28 - 8:30

between the actual chance of admit and
8:30 - 8:34

the predicted chance of admit, okay?
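
The residual column described above can be reproduced by hand. A minimal sketch: the first pair of values (0.73 actual, 0.70 predicted) is the row read out in the video, while the other two pairs and the variable names are made up for illustration.

```python
# Residual = actual value minus predicted value, computed per row.
# The first pair (0.73, 0.70) is the row read out in the video;
# the other two pairs are made-up illustration values.
actual = [0.73, 0.80, 0.65]
predicted = [0.70, 0.84, 0.65]

# Round to two decimals to hide floating-point noise.
residuals = [round(a - p, 2) for a, p in zip(actual, predicted)]
print(residuals)
```

A positive residual means the model predicted too low for that row, a negative one means it predicted too high.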
8:34 - 8:37

So this is how, after fitting the model,
8:37 - 8:40

it came up with this kind of
8:40 - 8:40

visualization.
8:40 - 8:44

It also shows another five to
8:44 - 8:46

six charts over here, okay? Now let us
8:46 - 8:49

discuss these one by one. The first
8:49 - 8:52

chart shows me the actual versus
8:52 - 8:54

predicted line chart. That means
8:54 - 8:57

if you see the chance of admit, the blue
8:57 - 9:00

colored graph, is the actual one, and the
9:00 - 9:02

predicted chance of admit, the yellow
9:02 - 9:04

color one, is the prediction one, right?
9:04 - 9:07

And by seeing this one, we can
9:07 - 9:10

at least see this particular model is
9:10 - 9:13

an okay fit for this particular data.
9:13 - 9:15

Somewhere it is lagging over here if you
9:15 - 9:18

see it, right? But somehow it's
9:18 - 9:22

actually fitting well over there. Now the
9:22 - 9:24

residual chart, whatever you are seeing
9:24 - 9:26

it over here, the line chart it is
9:26 - 9:29

showing up over here, okay? So now the
9:29 - 9:32

closer this particular chart is
9:32 - 9:35

to zero, the better the model is
9:35 - 9:37

fitting. But over here
9:37 - 9:40

if you see the latter part of this one,
9:40 - 9:43

the residuals are more, right? Because it
9:43 - 9:46

is more spread out, more distant from the
9:46 - 9:48

zero line. And the same thing is
9:48 - 9:50

reflected over here as well. The model
9:50 - 9:53

has some kind of lagging over here, right?
9:53 - 9:56

So this kind of analysis you can do
9:56 - 9:59

from there, how the model is fitting
9:59 - 10:02

your data. And this particular graph is
10:02 - 10:04

showing me the scatter plot of the
10:04 - 10:06

actual and the predicted one. And here
10:06 - 10:09

basically you can see how the line
10:09 - 10:12

is fitting your data over here through
10:12 - 10:16

this chart, okay? Now, it also provides a
10:16 - 10:20

residual histogram, so let us
10:20 - 10:22

understand this one as well. So we
10:22 - 10:24

have the zero line over here if you
10:24 - 10:28

see. It basically shows, for each and
10:28 - 10:30

every residual value, how many counts are
10:30 - 10:33

there if you see. So if you just
10:33 - 10:37

think about it, if for all my data points
10:37 - 10:41

this residual is zero, that's the
10:41 - 10:43

ideal scenario, right? That means I am
10:43 - 10:45

predicting the [inaudible], right?
10:45 - 10:49

So from this histogram, if you see that
10:49 - 10:52

means, if you see the residual error
10:52 - 10:54

equal to zero, the sample count is 24
10:54 - 10:57

[inaudible], right? If more and more
10:57 - 11:00

samples are very close to this zero, that
11:00 - 11:04

means my model is doing well, that it's
11:04 - 11:06

actually a well-fitting model. And if it is
11:06 - 11:08

more spread out,
11:08 - 11:12

that means if we have a larger number of big
11:12 - 11:14

lines over here, that means that somehow the
11:14 - 11:16

model is not a good fit for
11:16 - 11:18

that particular data. So this kind of
11:18 - 11:20

interpretation, you can do from this
11:20 - 11:24

particular diagram, okay? So now there are
11:24 - 11:28

another two things over here. They're called the R squared
11:28 - 11:30

statistic and the root mean squared
11:30 - 11:33

error, okay? So these two are actually measures
11:33 - 11:37

of how accurate the model is, okay? So
11:37 - 11:41

I'll be discussing these measurements in
11:41 - 11:43

great detail in a separate video.
11:43 - 11:45

There we will be discussing the R squared
11:45 - 11:47

statistic, root mean squared error, and also some
11:47 - 11:51

other ways to determine how
11:51 - 11:53

accurate the model is. Like bias and
11:53 - 11:55

variance, there are a lot of other
11:55 - 11:56

measurements as well that
11:56 - 11:59

we'll discuss in detail over there, okay?
11:59 - 12:00

But for now, just try to remember
12:00 - 12:03

that this is the measurement of fit,
12:03 - 12:05

like, for the R squared statistic, we
12:05 - 12:09

can think of it this way: if it is close to 1, it's
12:09 - 12:11

a good fit. Something like this, okay?
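
Until the separate video on these measures, here is a minimal sketch of how the two numbers are commonly computed from actual and predicted values. It uses the textbook definitions; the function names and sample values are made up for illustration, and it is an assumption that the dashboard uses exactly these formulas.

```python
import math


def r_squared(actual, predicted):
    """1 - SS_res / SS_tot; 1.0 means the predictions match exactly."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def rmse(actual, predicted):
    """Square root of the mean squared residual; 0.0 is a perfect fit."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


# Made-up illustration values, not the numbers from the video.
actual = [0.5, 0.6, 0.7, 0.8, 0.9]
predicted = [0.55, 0.6, 0.65, 0.8, 0.9]

print(round(r_squared(actual, predicted), 3))
print(round(rmse(actual, predicted), 3))
```

R squared is unitless, while the root mean squared error is in the same units as the predicted field, which is why a "good" value for either depends on the problem.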
12:11 - 12:15

Mmm, so we will see like how to best
12:15 - 12:18

judge a model based on that, okay? But
12:18 - 12:20

still, even for the R squared statistic,
12:20 - 12:25

it all depends on the context, the field
12:25 - 12:28

in which you're implementing
12:28 - 12:30

linear regression. We'll discuss
12:30 - 12:32

that as well in future, okay? And now,
12:32 - 12:35

if you see the last graph, it is showing
12:35 - 12:38

me the model parameters. If you remember
12:38 - 12:40

the big equation we had written over
12:40 - 12:43

there, right? So let me open the bamboo
12:43 - 12:56

paper here. If you remember, when we
12:56 - 13:00

talked about multiple linear regression,
13:02 - 13:05

we defined- we started our discussion
13:05 - 13:07

with a big equation, right? So let me go
13:07 - 13:11

back over there.
13:21 - 13:26

Yes, so this one, right? So where beta 1,
13:26 - 13:30

beta 2, to beta P are our slope values, the
13:30 - 13:31

coefficients of each and every feature.
13:31 - 13:35

And beta 0 is my intercept, right? And
13:35 - 13:37

what we are doing basically at the
13:37 - 13:39

end of the day, we came up with a big
13:39 - 13:42

equation to determine this whole beta
13:42 - 13:46

vector, right? So this is the same stuff
13:46 - 13:49

it is representing over here. So it is
13:49 - 13:51

basically giving me like for each and
13:51 - 13:53

every feature, what is the coefficient
13:53 - 13:57

value, okay? So- and the intercept value as
13:57 - 14:00

well. If you see, this is my beta 0, and my
14:00 - 14:02

beta 1 to beta P's, these guys,
14:02 - 14:04

other guys. Now, if you see it closely
14:04 - 14:07

there are some coefficients which
14:07 - 14:10

have much greater values. Some of the
14:10 - 14:11

coefficients have very small values
14:11 - 14:14

over here. The way to interpret a
14:14 - 14:19

coefficient is how much it is
14:19 - 14:22

influencing the end result.
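
Written out, the big equation referred to here has the following general form. This is a sketch in standard notation: the symbols x_1 to x_p for the features (GRE score, TOEFL score, and so on) and the hat on y for the predicted chance of admit are assumptions, since the on-screen equation is not part of the transcript. The second expression states the interpretation in one line: with the other features held fixed, a one-unit increase in x_j changes the prediction by beta_j.

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p,
\qquad
\frac{\partial \hat{y}}{\partial x_j} = \beta_j
```
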
14:22 - 14:25

So to understand that, let us see this
14:25 - 14:29

one. Let's say I have a variable called 'x'
14:29 - 14:34

and I am writing it as something like 0.9 'y'.
14:34 - 14:36

Now, what do I mean by this particular
14:36 - 14:41

equation? 0.9 times 'y', right? So that means
14:41 - 14:45

if I give 'y' equal to 1, that means
14:45 - 14:50

my 'x' will become 0.9, right? So what do
14:50 - 14:52

we mean by that? That means for a one-unit
14:52 - 14:53

change in 'y',
14:53 - 14:56

we basically get a 0.9-unit
14:56 - 15:01

change in 'x', right? So this kind of
15:01 - 15:03

interpretation, you can do. So that
15:03 - 15:09

means how 'y' is influencing 'x', right? So
15:09 - 15:11

this is how we interpret these
15:11 - 15:14

kinds of coefficients as well in linear
15:14 - 15:17

regression. So that means we will know
15:17 - 15:19

from the coefficient itself which
15:19 - 15:22

particular feature is most strongly influencing
15:22 - 15:24

that one. And now if you see it over here,
15:24 - 15:24

I think,
15:24 - 15:27

CGPA is the most influential factor to
15:27 - 15:31

determine whether my chance of admit
15:31 - 15:32

is higher
15:32 - 15:36

or not, right? Considering we are
15:36 - 15:38

implementing a linear regression, there
15:38 - 15:40

could be a better fit of this particular
15:40 - 15:43

data which we need to experiment and see.
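
One thing worth keeping in mind when ranking features by coefficient size: a coefficient is the change in the prediction per one unit of that feature, so its size also depends on the scale the feature is measured on. The sketch below illustrates this with made-up coefficients and ranges; the field names, numbers, and the helper function are hypothetical and are not the values shown in the video.

```python
# Hypothetical coefficients and feature ranges, for illustration only.
coefficients = {"CGPA": 0.12, "GRE Score": 0.002}
feature_range = {"CGPA": (6.8, 9.9), "GRE Score": (290, 340)}


def prediction_change(feature, delta):
    """Change in the predicted value when one feature moves by delta
    and every other feature is held fixed."""
    return coefficients[feature] * delta


for name, (low, high) in feature_range.items():
    per_unit = prediction_change(name, 1)
    full_range = prediction_change(name, high - low)
    print(f"{name}: per unit {per_unit:.3f}, across range {full_range:.3f}")
```

With these made-up numbers the per-unit effect of the second field looks tiny, yet across its whole range it still moves the prediction noticeably, so comparing raw coefficients is most meaningful for features on similar scales.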
15:43 - 15:45

But for the current linear
15:45 - 15:47

regression implementation, we can
15:47 - 15:50

conclude this kind of stuff over here,
15:50 - 15:58

right? Correct? So this is how the model
15:58 - 16:02

parameters summary table
16:02 - 16:03

visualization is telling me those
16:03 - 16:06

different details, right? So now if
16:06 - 16:09

you see, we actually fit our model, right?
16:09 - 16:12

So we still [inaudible] that our model and
16:12 - 16:14

until and unless we save it. That's
16:14 - 16:17

why it is showing me a draft status for
16:17 - 16:22

your model, right? And you can now go to
16:22 - 16:27

experiment history to see what you have
16:27 - 16:29

done till now. So it will be maintaining
16:29 - 16:32

a history over there. So now I can see
16:32 - 16:36

using all these features, my R
16:36 - 16:39

squared statistic is somewhere around 78%,
16:39 - 16:41

and these are my coefficients, and I am
16:41 - 16:43

coming up with a conclusion that maybe
16:43 - 16:46

CGPA is the most influential factor over
16:46 - 16:49

here, okay? So let us do another
16:49 - 16:53

experiment, okay? So in here, I'll keep my
16:53 - 16:57

CGPA over here just to see whether it is
16:57 - 17:00

actually true or not, okay? So now what I
17:00 - 17:03

will do here is I will keep CGPA,
17:03 - 17:06

I'll keep the [inaudible],
17:06 - 17:09

I will keep the LOR, okay? I'll
17:09 - 17:14

keep the research one, and I will keep the
17:14 - 17:19

GRE score, okay? So I'll click over here
17:19 - 17:22

again. I will keep the GRE score. I will
17:22 - 17:24

remove the TOEFL score. I will remove the
17:24 - 17:26

university rating. I will remove the SOP.
17:26 - 17:29

CGPA, Research, and LOR I will keep. So
17:29 - 17:32

now I am trying to do this experiment
17:32 - 17:35

with four features which I am thinking
17:35 - 17:40

may be the most influential ones. So maybe the
17:40 - 17:43

other features may not have much impact
17:43 - 17:46

on this particular prediction, okay?
17:46 - 17:50

So now using only- I'll keep a note, using
17:50 - 17:55

only four features. So this is how this
17:55 - 17:58

particular note is coming in handy
17:58 - 18:01

over here, right? So when I see
18:01 - 18:03

the history, I will come to know what I
18:03 - 18:05

have done over there, okay? So I will
18:05 - 18:09

click on 'Fit Model' again. Let's see how
18:09 - 18:14

it's working now. So similar
18:14 - 18:15

stuff is happening over there. It's
18:15 - 18:19

running the custom commands.
18:19 - 18:22

In later videos, we will discuss in
18:22 - 18:23

detail those
18:23 - 18:34

custom commands as well, okay? Okay, so now if
18:34 - 18:38

you see, it again predicted that one. Now
18:38 - 18:41

if you see from the actual versus predicted line
18:41 - 18:43

chart, it's more or less staying the same
18:43 - 18:45

even though I removed three features,
18:45 - 18:48

right? Even this one as well, more or less,
18:48 - 18:52

okay? Now if you see my R squared
18:52 - 18:54

statistic has improved a lot, to 82%,
18:54 - 18:58

right? So by this one, at least I am
18:58 - 19:01

confident that really those three
19:01 - 19:04

features are not impacting much of
19:04 - 19:08

it. And if you see from this
19:08 - 19:11

residuals histogram, more
19:11 - 19:13

and more samples are very close to zero,
19:13 - 19:16

right? With residual error-
19:16 - 19:18

more residual errors are very, very
19:18 - 19:20

close to zero, right?
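
To make the histogram reading concrete, here is a minimal sketch of how residuals can be counted into bins around zero. The bin width, the residual values, and the function name are made up for illustration; how the dashboard chooses its bins is not stated in the video.

```python
from collections import Counter


def residual_histogram(residuals, bin_width=0.05):
    """Count residuals per bin; each bin is labelled by its centre."""
    counts = Counter(
        round(round(r / bin_width) * bin_width, 2) for r in residuals
    )
    return dict(sorted(counts.items()))


# Made-up residuals, not taken from the video.
residuals = [0.01, -0.02, 0.0, 0.04, -0.06, 0.11, -0.01, 0.02]
print(residual_histogram(residuals))
```

The more of the total count that lands in the bin centred on 0.0, the closer the predictions are to the actual values, which is the comparison being made between the two experiments.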
19:20 - 19:23

So by this kind of analysis, we can say
19:23 - 19:25

this particular model is better
19:25 - 19:28

compared to my previous model, right? So
19:28 - 19:31

now what I will do is I will save this
19:31 - 19:33

particular model, okay? So I will save, I
19:33 - 19:38

will give the experiment title as
19:39 - 19:47

'graduate_date_predictor', okay? I will
19:47 - 19:51

click on save. So now a model will
19:51 - 19:55

be created, okay? So now we have
19:55 - 19:56

two options over here after you save the
19:56 - 19:58

model. Either you can go to
19:58 - 20:00

the listing page
20:00 - 20:02

or you can continue editing, okay? Let us
20:02 - 20:04

continue editing to see how experiment
20:04 - 20:06

history is looking now. Now experiment
20:06 - 20:09

history has two rows over there, okay?
20:09 - 20:12

The first row is my current
20:12 - 20:14

experiment with my four features, right?
20:14 - 20:18

With R squared value of 82%. The second
20:18 - 20:20

row is telling me my older one, right?
20:20 - 20:22

So at any point of time, you can load
20:22 - 20:24

the corresponding settings and
20:24 - 20:26

experiment with it, okay?
20:26 - 20:27

It will also show you the data
20:27 - 20:30

corresponding to each experiment,
20:30 - 20:32

okay? So now let's go back to our
20:32 - 20:35

experiment tab and see what is happening
20:35 - 20:37

over there, okay? Now if you see my
20:37 - 20:40

experiment tab, it's not showing me
20:40 - 20:43

those big blocks, right? Mmm, it is
20:43 - 20:44

showing this kind of view where I
20:44 - 20:47

have 'Predict Numeric Fields', a single
20:47 - 20:50

experiment I have done. I have given the
20:50 - 20:53

experiment name like this one, right?
20:53 - 20:54

The algorithm I have chosen is linear
20:54 - 20:56

regression. There are a lot of actions you
20:56 - 20:59

can do on this particular model, so
20:59 - 21:01

before publishing, let us talk about
21:01 - 21:03

that one, okay? You can create an alert
21:03 - 21:07

from this model just to see. So suppose-
21:07 - 21:10

the model is predicting data, right? So
21:10 - 21:12

you can choose an alert, create an alert,
21:12 - 21:15

something like when my predicted chance
21:15 - 21:16

of admit is greater than 90 percent, that
21:16 - 21:20

means 0.9, okay? Fine, 99 maybe. That means
21:20 - 21:23

the model is really, really working well
21:23 - 21:25

over there, right? So this kind of alert
21:25 - 21:30

you can do, okay? Next you can edit the
21:30 - 21:32

title and description. That's simple
21:32 - 21:36

enough. Now you can see schedule a
21:36 - 21:37

training. This is an interesting feature
21:37 - 21:41

where, whatever we have done till now,
21:41 - 21:43

we have done manual training over
21:43 - 21:45

there, right? Now, in the scheduled
21:45 - 21:47

training feature, you can create a
21:47 - 21:49

scheduler which will run a training
21:49 - 21:52

based on the data. Now, if you see, there
21:52 - 21:54

is a time range over there. So you can
21:54 - 21:56

choose the time range of the data you
21:56 - 21:59

want to use for training purpose, okay?
21:59 - 22:01

That's a really interesting feature you
22:01 - 22:03

have, so that means as more and more
22:03 - 22:06

data comes into your system, you can use
22:06 - 22:09

that particular data, right, for training
22:09 - 22:11

purposes as well automatically using
22:11 - 22:13

the scheduled training, okay? And
22:13 - 22:17

similarly for other scheduling stuff, the
22:17 - 22:18

schedule priority and schedule window,
22:18 - 22:21

you can set those up as well. You can even
22:21 - 22:23

trigger an action as well when the
22:23 - 22:24

scheduling is happening: either you
22:24 - 22:27

can run a log, you can send the log file
22:27 - 22:30

output to a lookup, everything. This is
22:30 - 22:32

normal scheduling, okay? That
22:32 - 22:34

you can also do over here. So this is a
22:34 - 22:36

very versatile feature that you can use with the
22:36 - 22:39

model. And you can delete
22:39 - 22:43

it as well, that's fine. So now we will
22:43 - 22:45

publish this model, okay?
22:45 - 22:56

Let's say 'chances_of_admit_model', okay?
22:56 - 22:58

This is the model name, and the
22:58 - 23:00

destination app you will be choosing
23:00 - 23:02

over here, so the model will be saved over
23:02 - 23:04

there, okay? I will be choosing my search
23:04 - 23:08

and reporting app, I will click on submit,
23:08 - 23:13

okay? So the model is created now. So how
23:13 - 23:15

is the model created in the background?
23:15 - 23:17

It's basically a lookup file, so let us
23:17 - 23:23

see that, okay? So from the Splunk home,
23:23 - 23:28

etc, apps, search, okay,
23:28 - 23:31

lookups. Okay so currently if you see it
23:31 - 23:33

over here mmm,
23:33 - 23:36

by default the model is saved
23:36 - 23:39

in the user context, so that's why it
23:39 - 23:41

is not coming up under search. So for that,
23:41 - 23:43

what I need to do, I need to go to etc,
23:43 - 23:47

then I need to go to users. Currently I'm
23:47 - 23:50

the admin user, go to admin, and
23:50 - 23:53

I'll go to the search app. And here in
23:53 - 23:55

the lookups folder, this is how the model
23:55 - 23:57

is getting stored over there, okay? So I
23:57 - 24:00

think this lookup is in read-only format,
24:00 - 24:04

so if I just open it in Notepad, this is
24:04 - 24:06

what it looks like. So this is
24:06 - 24:09

basically saving a lot of information,
24:09 - 24:11

the metadata related information about
24:11 - 24:13

the model over here, okay? What are the
24:13 - 24:16

feature variables, whatever the columns I
24:16 - 24:19

have in my data, okay? All of these things
24:19 - 24:22

[inaudible] others
24:22 - 24:24

features which we do not have any
24:24 - 24:26

control about, it is saving over there,
24:26 - 24:32

okay? So now we created our own model,
24:32 - 24:34

right? Now we need to apply this, right? How are we
24:34 - 24:36

going to apply this? There is a
24:36 - 24:40

command called apply in Splunk MLTK,
24:40 - 24:43

okay? So by using that command, you can
24:43 - 24:46

apply that particular model on any dataset,
24:46 - 24:49

okay? Or specifically we'll be
24:49 - 24:53

doing it on the same dataset itself, otherwise if you apply
24:53 - 24:56

that model on any [inaudible] dataset,
24:56 - 24:59

it is anyhow not
24:59 - 25:01

going to give you proper results. So
25:01 - 25:04

this is how you will be applying the
25:04 - 25:08

model. So I'll have my- this is my dataset,
25:08 - 25:10

base dataset, right? I'll just
25:10 - 25:13

choose [inaudible] last hundred records,
25:13 - 25:20

okay? Let's say the last 200 records, okay? Now I
25:20 - 25:23

will be using the apply command. Don't
25:23 - 25:24

worry about it, I will be discussing these
25:24 - 25:27

Splunk MLTK commands in detail in
25:27 - 25:31

my next video. So here we will just see
25:31 - 25:34

how we are just applying the model. So
25:34 - 25:37

now I will write my apply command, then my
25:37 - 25:41

model name, right? So we have given our
25:41 - 25:45

model name as 'chances_of_admit_model'.
25:45 - 25:55

I'll just copy it, okay? And I will just run it.
25:55 - 25:58

So what it should do basically, it will
25:58 - 26:01

apply this particular model on that, or
26:01 - 26:05

whatever- okay. So 'permission denied'
26:05 - 26:08

it is saying now. So for that, what I need
26:08 - 26:15

to do is settings, lookups, okay, lookup
26:15 - 26:24

table files, I'll choose this one, search
26:24 - 26:27

and reporting, okay? This is my chances of
26:27 - 26:29

admit model. Currently it is in private
26:29 - 26:32

mode, that's why I am not able to apply
26:32 - 26:35

it from the search app. So I'll choose
26:35 - 26:38

this app only, read and write currently I will
26:38 - 26:40

give, I'll click on save,
26:40 - 26:47

okay? Internal error, data could not be
26:47 - 26:50

written on to- okay. So let me see what's
26:50 - 26:56

going on over there. Okay so I think
26:56 - 26:58

there was some technical glitch, so I
26:58 - 27:02

just set the permissions again. And
27:02 - 27:05

this time I chose all apps, I think it works
27:05 - 27:09

now. So now let us see whether our search
27:09 - 27:11

is working or not.
27:11 - 27:15

Okay so I have taken the last 200
27:15 - 27:18

records and I'm just clicking on apply,
27:18 - 27:20

the machine learning one, machine
27:20 - 27:23

learning model. So if you see,
27:23 - 27:24

it is applying that model on this
27:24 - 27:27

particular two hundred records, two
27:27 - 27:29

hundred events over there, and it has
27:29 - 27:31

created a new column called predicted
27:31 - 27:34

chances of admit, okay? So this is how we
27:34 - 27:36

are applying that model. You can even
27:36 - 27:39

create your own alert using this
27:39 - 27:41

particular command as well, so that
27:41 - 27:43

whenever you want something
27:43 - 27:46

like chances of admit is more than 90
27:46 - 27:48

percent, 80 percent, or any other
27:48 - 27:50

[inaudible] you want, you can use this
27:50 - 27:53

particular command to achieve that
27:53 - 27:55

same thing over there, okay? So this is
27:55 - 28:00

how you can experiment with machine
28:00 - 28:02

learning, specifically the linear
28:02 - 28:07

regression in Splunk MLTK. And we
28:07 - 28:09

saw a lot of experiments that we have
28:09 - 28:12

done regarding this one, right? So this
28:12 - 28:13

is how you can experiment with your data as
28:13 - 28:17

well and see how it best fits
28:17 - 28:19

your data, and you can achieve a lot of
28:19 - 28:22

other stuff like automatic training,
28:22 - 28:24

creating alerts from these things as
28:24 - 28:27

well, okay? In the next video, we will talk
28:27 - 28:29

in more detail, we will basically deep dive
28:29 - 28:31

into what is basically happening internally
28:31 - 28:34

over here. We will talk about different
28:34 - 28:36

Splunk commands running internally, the
28:36 - 28:37

custom commands running internally. And
28:37 - 28:40

whatever we have done, this experiment we
28:40 - 28:42

have done from the UI, the same thing can
28:42 - 28:45

be achieved from the search
28:45 - 28:46

command
28:46 - 28:49

as well, from Splunk SPL, okay? See
28:49 - 28:52

you in the next video.

Video Language:: English
Duration:: 28:51
