All right, welcome to another LAME Creations video. This is going to be more or less a revisit of the Splunk .conf address I gave this previous June 2024 in Vegas, so let's go give it. It was entitled "Anomaly Detection So Easy Your Grandma Could Do It: No ML Degree Required." I'm not going to spend much time on this if you don't know who I am: I'm Troy Moore from Log Analysis Made Easy, and here's my contact information.
All right, what we're going to discuss in this conference breakout session: common baselines you might need, what are some things you might need in your company, and how to make those baselines. Then let's give a demo; I'm not a death-by-PowerPoint person, so we're going to go into a live demo on this. Finally, what you should do after seeing this presentation, and some gotchas to baselining.
Let's discuss what baselining is. A baseline is the expected values or conditions against which all performances are compared. I've given a definition, but what is that in practice? Let's go the other way: common baselines. Let's discuss what some are. Maybe a hardware baseline. Can I get a software baseline? Can I get network ports and protocol baselines? User baselines, user behavior baselines.
I know I've been working in the cyber world for many, many years, and as an auditor I will ask, "Do you by chance have an inventory?" And you know what, it's a funny answer. Ask yourself: does your company have a network inventory? How thorough is it? How accurate is that network inventory? Well, if you don't have that, can you give me a baseline of what is on your network? A lot of people will tell you it's really difficult to give a baseline if you don't have a network inventory. And so you'll ask these questions, and often you don't get the answers. I know I've told auditors year after year at the places I've worked, "I don't have an inventory; I don't have those kinds of things."
What we're going to do in this presentation is show how Splunk and statistical models make you that hero. You can be the person who provides that inventory. You can be the person who provides baselines and can show what is normal in our environment.
In order to know what's normal, we need to look to the past. Hopefully this makes sense: if you don't know what happened in the past, you won't be able to know what's normal; the past is what defines normality. So we can look at historical IP logs that have the connections, and that can help us build a baseline. We can track the processes that have been running and build a baseline. We can look at the ports used by systems historically and build a baseline. We can track historical login events and build a login-event baseline. Splunk is a logging system, so if you've been getting those logs, then you now have a tool to build a baseline.
So here is the concept I'll need you to understand to be able to understand everything else. There are two methods for baselining: what I call the rolling window, and allow listing. The rolling window is the easiest way to start a baseline; in my opinion it's the simplest method. The concept uses this little bar here: we've got an X, and we're going to say this full line is a timeline. It might be one day, a week, a month, three months, a year, whatever the case may be. X is the historical part of that time: if the timeline is one day, X could be 23 hours; if it's a week, X could be six days; a month, 29 days; a year, 11 months, as the case may be. Y is the portion of that time we're going to look at. The X portion will be our baseline, the historical events. Y is what we're going to examine: we're going to say, hey, looking at all these events that have occurred, are there any events in Y that were not in the baseline, not in X? If we do that, that's the definition of an anomaly: anomalies are things that are not in your baseline. So we can use a rolling window, and I will actually demo how to do that.

The other method is allow listing, in which case we build a list of our baseline. Again, you've got to figure out how you're going to build that baseline, but once you do, you put that list into a lookup file; we can do that using the outputlookup command. Then we use a lookup command against our logs: we look at all the logs coming in and ask, do any of these logs lack a matching pair in our lookup? That would be an indication that this is a new, anomalous event.
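Before the demo, here's the rolling-window idea as a bare SPL sketch. Everything in it is a placeholder, not from the talk: the index, sourcetype, the pair of fields you tuple on, and the one-day cutoff.

```
index=your_index sourcetype=your_sourcetype earliest=-90d
| stats min(_time) AS earliest_time BY field1 field2
| where earliest_time > now() - 86400
```

Any field1/field2 pair whose first appearance falls inside the last day (the Y window) is, by this definition, an anomaly against the preceding 89 days (the X window).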
All right, I've given the PowerPoint presentation of that; we're now going to go into demo time. We want to demo this and show how it works in actual practice. Again, the queries are at the end of the presentation; I'm going to have to give a link to the PDF so you can just grab them if you want to use them, or just slow down the video and, I'm sorry, you'll have to record them that way. But anyway, that is what we're going to do.
All right, for this demo I wanted to make sure that any of you could go home and use this very same thing, the same data set. So I went and grabbed Splunk's freely available Boss of the SOC data, referred to as BOTS v3. You could go grab v2 or v3; these things are going to work on your own data, but I wanted to make sure you could do these very same scenarios when you went back home after this conference. So I went to Google, typed in "BOTS v3" and "GitHub," and that brings back this little link here. It's just an app you can go download. It's a relatively large app because it contains all the data in an index, already pre-indexed for you, so when you install it you will have all the exact same data that I'm using in these queries, allowing you to easily run the exact same thing in your own environment; and as you learn these queries, you'll be able to use them elsewhere. All the documentation is right here if you want to use any of these source types; they're all available there, along with any of the required software you need, any of the TAs, in order to get your data to parse correctly. We're going to primarily use the stream data and some network and host logs.
I'm going to jump over here into my environment. If I do a head 100 on this little command here, this botsv3 stream:tcp, this is TCP network traffic, and I can see the traffic: I can see connections going through with bytes in and bytes out, destination IP, source IP, dest port, source port. What I want to do is baseline what the normal IP traffic on my network is, so that when I see abnormal IP traffic I can be alerted to it. This has varying levels of success based on how many new machines your systems go out and visit. Workstations browsing the internet are going to hit a lot of new IP addresses on a daily basis. Servers probably aren't going to go out and talk to a whole lot of different devices. Specialized devices, such as OT (operational technology) devices, won't talk to a lot of machines; their communication is pretty standard. So we can actually use that to understand what's going on.
If I come in here, let's run that query. The concept is I'm using an all-time query, but that's only because I'm using this BOTS v3 data; I have notes in my PowerPoint on how to turn this into a 90-day rolling window, but to make this work on the BOTS data I had to set my time a little differently, so I'll show you how that looks. What we're going to do is index=botsv3 sourcetype=stream:tcp, and here is the magic: we're just going to use a stats command. If I just did stats count by source IP, destination IP, that would give me every tuple I've seen during this window. But if I instead take the stats min of _time, it gives me the earliest time, the smallest time value seen for each tuple. So this is giving me the earliest time each tuple popped up: I've got a 90-day rolling window of tuples, each tagged with the earliest value seen. If I run it just like this, we'll see the earliest time come back.
If I undo that, now I'm going to come in here and change it: I want to set a time, and I want to know any time this earliest time is greater than it. In a normal setup I might go back 86400, the number of seconds in a day, so I'd be looking for any new tuple within a day. Here I had to use this specific value to move the boundary to a new day based on the BOTS v3 data: there are only about two and a half days' worth of data in the BOTS data set, so to make it work I had to put this specific timestamp in. Normally you would use something like now() minus 86400, and I'll show that. But we're going to come down here: where earliest time is greater than that cutoff. If the first time a tuple was seen is greater than the cutoff, we get the values back; if not, they won't show up. And if I set the cutoff one day back, it will only show me the new tuples, ones I've never seen in 90 days, that have shown up today. So if I run this, flipping it to fast mode, it comes back with all the brand-new tuples. I'll tell you, this is still too large a list, but part of this list would normally drop out. The fact is, the bigger the window you make, the fewer values you will have. If I'm looking at one day of history against the new values, you're going to have more results; if I go 90 days, the number of new tuples will shrink. The bigger the window over here, the smaller the set of results that comes back, because more of those every-now-and-then destinations will be included in my list.
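Put together, the query from that demo looks roughly like this. The cutoff timestamp is illustrative only: the BOTS v3 events sit in a fixed span in 2018, so you have to pin the "day" boundary yourself rather than using now().

```
index=botsv3 sourcetype=stream:tcp
| stats min(_time) AS earliest_time BY src_ip dest_ip
| eval new_time = strptime("2018-08-20 00:00:00", "%Y-%m-%d %H:%M:%S")
| where earliest_time > new_time
```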
All right, this works. Let's grab something even a little easier to grasp. Now I'm showing processes: when I look at this data I'm looking at processes firing off. This is calculator being run, Application Frame Host, CrashPlan desktop; these are processes on a machine. I want to know if new processes have fired off. We're going back to the exact same query; this time we're grouping by instance, which is like this process name here (sorry, instance is Application Frame Host), and the host. So we're going to group by instance and host, again grab the earliest time each pair was seen, do an eval of the cutoff time, and then run it. We're basically asking: what are the newest instances that have fired up in the last 24 hours that I've not seen over my whole time period? I run that, and you can see, the processes running on a system are going to be far fewer. We get back the new processes that ran on this machine in the last 24 hours, and you can see what happened: scp and ssh are brand-new processes. If I was doing an investigation, and all of a sudden machines that have never done it start invoking scp and ssh, that's probably something I want to be looking at. So by baselining and knowing what your systems run, when new processes fire we can look at them and ask, do I want to look at this? We can build alerts off of it.
Let's jump to another example, this time listening ports. The set of ports your machine is listening on should be very static; it's not going to change a ton. But if someone's opened up new applications, they might be opening up new listening ports, so you want to look at that. We can see the data coming back here: what machine, what ports are being opened, what they're listening on. We use this very same concept: min(_time) as earliest time, and this time we're looking at host and dest port; that's my pairing, that's what I'm looking for anomalies in. Grab the earliest time seen, grab the window I want to track, and say where the earliest time is greater than the cutoff. On this one, make sure I flip it to verbose mode, because ports are really static. What a surprise: I get zero results back, and that's actually what I'm looking for; that works out really well for me. So I've shown three examples of how you can grab any form of data, look for what you want to find, group it by what's normal, grab a big window, and then set your cutoff to say: show me any new tuple that I've seen since that time.
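For the listening-ports example the shape is identical; the sourcetype and field names below are assumptions on my part (the talk doesn't name them on screen), so substitute whatever your port-monitoring data actually uses:

```
index=botsv3 sourcetype=netstat
| stats min(_time) AS earliest_time BY host dest_port
| eval new_time = strptime("2018-08-20 00:00:00", "%Y-%m-%d %H:%M:%S")
| where earliest_time > new_time
```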
So if we jump over here, we can quickly see what this looks like in the real world; this is how we do it at my place: now() minus 86400, over the last 90 days. We do the now() minus 86400, and that says: give me a 90-day window, go back one day. Very simple. We don't change much, and we just have that working.
Now if we come over here, we can do the exact same thing with our Splunk searches but take it to the next level, another way. Instead of saying I want a 90-day window (there's a problem with a 90-day window: as soon as an anomalous event occurs, it's not going to be anomalous tomorrow), what we can do is actually build a lookup. I'm going to do the same concept, tuple things together, but this time I don't need a time: I'm just going to grab all of my tuples and outputlookup them into a CSV, or I could use a KV store, and that becomes my baseline. New anomalous events will not repopulate it unless I rerun this outputlookup. So when I run this, this builds my baseline. Then I'd have a scheduled search, or whatever, that would search, again doing a stats count, then a lookup matching on source IP and dest IP with an output field I'll call "matched," and then a where isnull(matched), meaning I've got a source IP and destination IP pair that is not in this lookup; that's what makes it null, and this will alert me. And if tomorrow the same source IP and destination IP appears, it will also alert me, because as long as I don't update the lookup table it will always be anomalous.
So there are pros and cons. The rolling-window versions I did over here, with the "where earliest greater than," give a dynamic, growing list; over here we're building a lookup list and doing a match. Same principle: over here we can take the perfmon process data, exact same thing, output it to a CSV, then set up a search to run every day that does the lookup on instance and host and checks whether it matched. Or we can go to listening ports: output the lookup and do the same. One thing you could do is actually take an evaluation of the two: take all the alerts that are popping each day and compare them to this list, and see how much variance there is. You could grab a 90-day table and compare it to this outputlookup. There are a lot of ways to evaluate how much is changing in your environment, but the big key is: use your historical data to create a baseline, and search on it.
All right, a basic summary: in that video we showed how we can use the stats command to baseline normal behavior from historical data, and use that baseline to identify new events. We were able to detect anomalous network connections, anomalous processes, and anomalous open ports. We then did those very same things with a CSV: we baselined normal behavior and were able to use that CSV to detect the same things: network connections, processes, hosts.
So, there are some gotchas you need to be aware of. This is a really cool process, but as you start in on it, don't let the gotchas get you, and don't let the quest for perfection get in the way of getting something done or having a good product. The rolling window and allow list will get you a good answer. It's not perfect, and there will be some gotchas along the road, but it will get you most of the way. Now that you've got those baselines, here are some things you want to be careful of.
Rolling window: you're going to be alerted one time that the anomalous connection occurred, and then, if you remember that X and Y, the X being the baseline and Y being the new events, the new events from Y are going to roll into X, and that anomalous event becomes part of your baseline. So you'll detect once, and then your anomaly is part of your baseline; you need to be aware of that, and of how often you run the alert each day, because the frequency interacts with that. Also remember that you can have a small Y window: say I'm going to look at a 90-day window and I just want to look at one second, so Y will be one second and the baseline will be 89 days, 23 hours, 59 minutes, and 59 seconds. The fact is, it's still going to look at 90 days' worth of data. No matter how small your Y window is, the search always takes the time required to scan the entire X and Y window together, so be aware that this alert can take some time to run. It sounds great to run a really long window, all time, a year, two years, but recognize that if you run it every day, you're still running that full query every day. It's going to take time, and you want to make sure it doesn't impact the rest of the stuff you're doing.
Allow listing, on the other hand, runs against whatever window you give it: if you look at the last 10 seconds, it only runs on a 10-second window; if you look at the last hour, it runs on a one-hour window. So it will run faster, but you need to remember a few things. One: how am I going to build that baseline, and how do I get new items into the baseline? You'll need to address that. And remember that a baseline, whether it's a CSV or a KV store, is going to occupy space on your search head, and you can run out of disk space; you only have so much. Typically you build out a lot of space on your indexers; search heads are not huge on disk space. Just be aware as you start to build large baselines: one, you'll have performance issues, because the more values the search has to check against, the slower it will run, and two, it's going to take up physical disk space on your machine. That's something you need to be aware of.
I'm going to recommend a hybrid approach, and that's the ability to combine both: we're going to do a rolling window and allow listing. The basic concept is "your query goes here": you're going to write your query, and then you're going to use this collect command. This is not a talk about all the different syntax in Splunk, but just know that if you use this pipe-collect command, you will write to a summary index. Summary indexes are a form of indexing that do not count against your ingestion license; you can write to a summary index and then query that index just like you could query any other index. So you can save your results in an index. The concept is, if I want to build these, I might run a query every day and write it to the index; it will timestamp it with today's information, tomorrow will have tomorrow's information, yesterday has yesterday's information, and we can query and search it. So you'll basically write your query, then run the collect command: index=summary, source= and give it a name.
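A minimal sketch of that daily job, using the process example; the source name is whatever you choose, and the cutoff again stands in for the demo's fixed timestamp:

```
index=botsv3 sourcetype=PerfmonMk:Process
| stats min(_time) AS earliest_time BY instance host
| where earliest_time > now() - 86400
| collect index=summary source=new_process
```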
Then, now that you've done that (that's building your alert), you're going to come in here, look at that summary data, and append to it the results that fired for that day. You look at the last span of time it ran; maybe you run this once a day, so you look at yesterday's results, put that in here, and then use this append command, which will append the lookup. I said "allow list"; it should really be a disallow list. That was bad writing on my part; gotta be careful with the descriptions you use. This is a list of things you consider to be anomalous, so if you see these, you want to flag on them. It's not like what I did before, where the CSV was my normal baseline; these are bad events, events I don't want. So I'm going to do the inputlookup of the allow-list CSV, and I'm going to do a table on the matching fields, whatever matched from the query over here, and then a stats count by the matching fields. That basically dedups: for those who know it, the dedup command removes duplicates, but stats does the same thing and is more efficient. You can write dedup if you want, but I recommend you learn the power of the stats command; it's fast, and it's just the right command to use. So stats count by the matching fields removes the duplicates: if a tuple was in the summary index and it's in my lookup, we're not going to write it in twice. Then we write it back out to the allow-list CSV, which updates it, meaning all the new things that were found get written into the lookup; you'll have an updated lookup with the results combined.
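That maintenance step, sketched; new_process.csv is an illustrative filename, and inputlookup append=true pulls the existing lookup rows in alongside the summary results before the stats dedups them:

```
index=summary source=new_process
| table instance host
| inputlookup append=true new_process.csv
| stats count BY instance host
| fields instance host
| outputlookup new_process.csv
```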
So an example of that would be: index=botsv3, sourcetype=PerfmonMk:Process, stats min(_time) as earliest time and max(_time). This is all the exact same query; nothing's really changed. The difference is, after I do this eval of the cutoff time, I'm going to do a lookup using the name of that CSV: instance as instance, host as host, output instance as recurring. I need a value that shows, hey, I matched the CSV. And I actually changed this, because I'd forgotten max time: I want a min time and a max time. The reason is that the min time is used to see whether the value falls in the X of the X/Y on my rolling window, and the max time is used to find whether it's in the Y area; we'll explain that. So I still have the same "where earliest time is greater than the cutoff," which says, hey, I've never seen this event before; or "recurring equals star," meaning, hey, I got a match on this value, and the latest time is greater than the cutoff, meaning it's in the Y section. That means this alert, which is on my list of things I don't want to keep seeing, things I don't want to allow, just showed up again. That way you'll be notified again that it occurred. You're updating your lookup file, and you're using a rolling window, so you kind of get the best of both worlds, and you can use this as a method to automate keeping up to date on any of your alerts. And now we're going to demo that; I've got a video of it, we're going to go watch that, and then we'll come back.
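Assembled, that hybrid detection search looks roughly like this. I've written the "recurring equals star" existence test as isnotnull(recurring), which is the where-clause equivalent, and used now() - 86400 where the demo pins a fixed timestamp:

```
index=botsv3 sourcetype=PerfmonMk:Process
| stats min(_time) AS earliest_time max(_time) AS latest_time BY instance host
| lookup new_process.csv instance, host OUTPUT instance AS recurring
| where earliest_time > now() - 86400
    OR (isnotnull(recurring) AND latest_time > now() - 86400)
```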
All right, so this is the hybrid approach we've been talking about. It's going to look exactly like what we did before: you've got yourself the normal query; we're going to make some form of query here that builds our processes, and then we're going to write a collect command. This collect command will write our results into a summary index; that's the index=summary, and then source=new_process, which defines the name of the source for this summary index. So we do that, run the results, and we can see that we got two values coming back. If we jump over here, this is where we take that very same data: we can see the summary index being written, there are my two results, and we can query them just like any other index. You'll notice index=summary, source=new_process; I'm going to use this append command, which adds this lookup, this new_process.csv, and then we put in the table of the instance and the host. I'm just going to do a stats count by instance and host; that's basically going to dedup, taking the summary-index results and the inputlookup and making them one if there are any duplicates, so I don't get them twice. That's what that command does, and it's faster than dedup. Then I'm going to outputlookup to new_process.csv, which writes over the original new_process.csv, updating it with any new values coming from this summary index.
So we can see that being run: if I go over here and run this, what it does is grab the summary-index content, grab what was already in my CSV, and write them together, so now I have four values, and those all got written to the CSV. Now I'm going to write the query I've been doing, the rolling window, all over again. The difference is I'm going to add a max time in there: it's not just a min time, it's also a max time, so I can look at the Y side of the equation. Then I do this lookup: new_process.csv, instance as instance, host as host, output instance as recurring. I need the output instance to show me what matched on this lookup; it's kind of like a join command, joining the CSV to the previous values, and whatever matches gets output. Then I do the same thing I've always been doing, earliest time greater than the cutoff, that's the normal part, and then we add this "recurring equals star": recurring equals star means it matched on something, I have a value, and latest time greater than the cutoff. That checks: is there a value in the Y window, and is it a recurring field? If it is, that's going to alert. We can see that being run, and we're going to see those two fields come back: these two fields occurred during the new window, and they'll keep showing up as often as they occur.
All right, we basically showed how we can combine those two approaches in a hybrid approach. We created our lookup of anomalous behavior so those events don't get excluded, and we used it to keep validating those events when they recur. And there are other things we can do: we can look at the results of one against the other and see whether there are changes in our environment, and how much change is going on. There are a lot of things this gives you the ability to gain more insights into.
All right, so what's next? I've shown you how you can build baselines, I've shown you examples of them, and I've given you multiple methods. What I want you to do now is look at your environment and think, right now: what do I have in my environment that I could baseline? What could I grab, what logs could I use, where could I take that very same approach? I want you to think about it right now, write it down, and let's go do it. This video is great, but if you don't take action on it, this video will not have served its full purpose. So take the time right now to think: what logs do I have that I can use, and what approach can I take to make a baseline and check for anomalous events? Thank you so much for your time, and I now open it up to questions.