
Anomaly Detection So Easy Your Grandma Can Do It: No ML Degree Required (Splunk .conf 2024 Rehearsal)

  • 0:02 - 0:03
    Alright. Welcome to another L.A.M.E.
  • 0:03 - 0:05
    Creations video. This is going to be more
  • 0:05 - 0:08
    or less a revisit of the
  • 0:08 - 0:11
    splunk.conf address I gave this
  • 0:11 - 0:16
    previous June 2024 in Vegas. So, let's
  • 0:16 - 0:18
    get to it. It was entitled "Anomaly
  • 0:18 - 0:20
    Detection: So Easy Your Grandma Could Do
  • 0:20 - 0:22
    It--No ML Degree Required."
  • 0:22 - 0:23
    I'm not going to spend much
  • 0:23 - 0:25
    time introducing myself; if you don't know who I am,
  • 0:25 - 0:27
    I'm Troy Moore from Log Analysis Made
  • 0:27 - 0:31
    Easy. Here's my contact information.
  • 0:31 - 0:33
    Alright, what we're going to discuss in
  • 0:33 - 0:35
    this conference breakout session
  • 0:35 - 0:37
    are common baselines you might need. What
  • 0:37 - 0:39
    are some things you might need in your
  • 0:39 - 0:41
    company? How do you create those baselines?
  • 0:41 - 0:43
    Let’s give a demo. I'm not a "death by
  • 0:43 - 0:45
    PowerPoint" person, so we’ll go
  • 0:45 - 0:47
    into a live demo on this. What should you
  • 0:47 - 0:49
    do after seeing this presentation, and
  • 0:49 - 0:51
    some "gotchas" to baselining?
  • 0:51 - 0:54
    Let’s discuss what baselining
  • 0:54 - 0:57
    is. A baseline is the set of expected values or
  • 0:57 - 0:58
    conditions against which all
  • 0:58 - 1:00
    performance is compared. I’ve given
  • 1:00 - 1:02
    a definition for it. What does that look like in
  • 1:02 - 1:05
    practice? Let’s go the other way. Common
  • 1:05 - 1:07
    baselines--let’s discuss what some are.
  • 1:07 - 1:09
    Maybe a hardware baseline. Can I get a
  • 1:09 - 1:11
    software baseline? Can I get network
  • 1:11 - 1:13
    ports and protocol baselines? User
  • 1:13 - 1:16
    baselines? User behavior baselines?
  • 1:16 - 1:18
    I know that I’ve been working
  • 1:18 - 1:20
    in the cyber world for many, many
  • 1:20 - 1:23
    years, and I will ask, as an auditor, "Do
  • 1:23 - 1:26
    you by chance have an inventory?" And you
  • 1:26 - 1:29
    know what? It's a funny answer. Ask
  • 1:29 - 1:31
    yourself, does your company
  • 1:31 - 1:33
    have a network inventory? How
  • 1:33 - 1:34
    thorough is it? How accurate is that
  • 1:34 - 1:37
    network inventory? Well, if you don’t have
  • 1:37 - 1:39
    that, can you give me a baseline of what's
  • 1:39 - 1:41
    on your network? A lot of people will
  • 1:41 - 1:42
    tell you it's really difficult to give a
  • 1:42 - 1:44
    baseline if you don't have a network
  • 1:44 - 1:47
    inventory. And so, you ask these
  • 1:47 - 1:49
    questions and often don’t get the
  • 1:49 - 1:51
    answers. I know I’ve told auditors year
  • 1:51 - 1:53
    after year at places I’ve worked, "I
  • 1:53 - 1:55
    don't have an inventory. I don't have
  • 1:55 - 1:56
    those kinds of things."
  • 1:56 - 1:58
    What we’re going to do in this
  • 1:58 - 1:59
    presentation is show how
  • 1:59 - 2:01
    Splunk and statistical models can make you
  • 2:01 - 2:03
    that hero. You can be the person who
  • 2:03 - 2:05
    provides that inventory. You can be the
  • 2:05 - 2:08
    person who provides
  • 2:08 - 2:12
    baselines and can show what is normal in your environment.
  • 2:13 - 2:15
    And what I'm going to do is,
  • 2:15 - 2:16
    in order to know what’s normal, we need
  • 2:16 - 2:19
    to look at the past. Hopefully,
  • 2:19 - 2:21
    this makes sense. If you don’t know
  • 2:21 - 2:22
    what happened in the past, you won’t
  • 2:22 - 2:25
    be able to know what’s normal. The
  • 2:25 - 2:28
    past is what defines normalcy.
  • 2:28 - 2:30
    So what we can do is look at
  • 2:30 - 2:32
    historical IP logs to see the connections,
  • 2:32 - 2:34
    and that can help us build a baseline. We
  • 2:34 - 2:36
    can track the processes that have been
  • 2:36 - 2:38
    running, and we can build a baseline. We
  • 2:38 - 2:40
    can look at the ports used by systems
  • 2:40 - 2:42
    historically, and we can build a baseline.
  • 2:42 - 2:45
    We can track historical login events, and
  • 2:45 - 2:48
    we can build a login event baseline.
  • 2:48 - 2:51
    Splunk is a logging system. So, if you've
  • 2:51 - 2:53
    been getting those logs,
  • 2:53 - 2:57
    then you have a tool now to build a baseline.
  • 2:57 - 3:00
    Here is the concept that
  • 3:00 - 3:01
    I'll need you to understand
  • 3:01 - 3:03
    to be able to grasp everything else. There are
  • 3:03 - 3:05
    two methods for baselining:
  • 3:05 - 3:08
    there's what I call the rolling window and the allow listing.
  • 3:08 - 3:10
    The rolling window is the
  • 3:10 - 3:13
    easiest way to start a baseline. In
  • 3:13 - 3:15
    my opinion, it’s the simplest method.
  • 3:15 - 3:17
    The concept is we’re going to use this
  • 3:17 - 3:21
    little bar here. We’ve got an x-axis. We’re
  • 3:21 - 3:23
    going to say this is a full line, and
  • 3:23 - 3:25
    this is a timeline. So, this might be one
  • 3:25 - 3:28
    day, a week, a month, a
  • 3:28 - 3:31
    year, or maybe three months.
  • 3:31 - 3:34
    This is a historical part of
  • 3:34 - 3:36
    that time. Let’s say it’s one day.
  • 3:36 - 3:38
    This could be 23 hours. This could be a
  • 3:38 - 3:40
    week; it could be 6 days. This
  • 3:40 - 3:42
    could be a month; it could be 29 days.
  • 3:42 - 3:44
    This could be a year; it could be 11
  • 3:44 - 3:47
    months. Then Y is a
  • 3:47 - 3:50
    portion of that time that we’re going to
  • 3:50 - 3:52
    look at. The X portion will be our
  • 3:52 - 3:55
    baseline, the historical events.
  • 3:55 - 3:56
    Y is what we’re going to look at.
  • 3:56 - 3:58
    We’re going to say, "Hey, looking at all
  • 3:58 - 4:00
    these events that have occurred, are
  • 4:00 - 4:02
    there any events in y that weren’t
  • 4:02 - 4:05
    in this baseline, that weren’t in x?"
  • 4:05 - 4:06
    If we do that, that’s the definition of
  • 4:06 - 4:09
    an anomaly. Anomalies are things that are
  • 4:09 - 4:11
    not in your baseline. So, we can use a
  • 4:11 - 4:13
    rolling window, and I’ll actually demo
  • 4:13 - 4:15
    how to do that. The other method is allow
  • 4:15 - 4:18
    listing, in which case we build a list of
  • 4:18 - 4:20
    our baseline. Again, you’ve got to figure
  • 4:20 - 4:20
    out how you’re going to build that
  • 4:20 - 4:23
    baseline. But if you do that, you put that
  • 4:23 - 4:25
    list into a lookup file. We can do that
  • 4:25 - 4:26
    by using the outputlookup command, and
  • 4:26 - 4:29
    then we use the lookup command in our logs.
  • 4:29 - 4:31
    We look at all the logs
  • 4:31 - 4:33
    coming in and say, "Do any of these logs
  • 4:33 - 4:36
    not have a matching pair
  • 4:36 - 4:37
    in our lookup?" If so, that would be an
  • 4:37 - 4:41
    indication that this is a new anomalous event.
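    As a rough sketch, those two patterns look something like this in SPL; the index, sourcetype, and field names are placeholders you would swap for your own data. Rolling window:

      index=my_index sourcetype=my_sourcetype earliest=-90d
      | stats min(_time) as earliest_time by key_field_1 key_field_2
      | where earliest_time > now() - 86400

    Allow listing, where you first build the baseline lookup and then check incoming logs against it:

      index=my_index sourcetype=my_sourcetype earliest=-90d
      | stats count by key_field_1 key_field_2
      | outputlookup my_baseline.csv

      index=my_index sourcetype=my_sourcetype earliest=-1h
      | stats count by key_field_1 key_field_2
      | lookup my_baseline.csv key_field_1 key_field_2 OUTPUT count as matched
      | where isnull(matched)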
  • 4:41 - 4:45
    Alright. I’ve given the
  • 4:45 - 4:48
    PowerPoint presentation on that. We’re
  • 4:48 - 4:51
    now going to go into demo time. We want
  • 4:51 - 4:53
    to demo this and show how this works in
  • 4:53 - 4:56
    actual practice. Again, the queries are at
  • 4:56 - 4:58
    the end of the presentation. I’m going to
  • 4:58 - 5:00
    have to give a link to this PDF
  • 5:00 - 5:02
    so you can grab them if you want to
  • 5:02 - 5:04
    use them, or just slow down the video.
  • 5:04 - 5:05
    I’m sorry, I’ll have to record it
  • 5:05 - 5:08
    that way. But anyway, that’s what
  • 5:08 - 5:09
    we’re going to do.
  • 5:09 - 5:12
    Alright. For this demo, I wanted to
  • 5:12 - 5:14
    make sure that any of you could go home
  • 5:14 - 5:17
    and use this very same thing, the same
  • 5:17 - 5:19
    dataset. So, I went and grabbed
  • 5:19 - 5:21
    Splunk’s freely available Boss of the SOC,
  • 5:21 - 5:25
    referred to as BOTS v3. You could grab
  • 5:25 - 5:27
    v2 or v3, but these
  • 5:27 - 5:28
    things will work with your own
  • 5:28 - 5:31
    data. However, I wanted to make sure that you
  • 5:31 - 5:32
    could do these very same scenarios when
  • 5:32 - 5:34
    you went back home after this
  • 5:34 - 5:38
    conference. So, I went to Google, typed in
  • 5:38 - 5:40
    "BOTS v3," and added "GitHub," and that
  • 5:40 - 5:44
    brought me to this little link here.
  • 5:44 - 5:46
    It’s just an app you can go
  • 5:46 - 5:49
    download. It’s a relatively large app
  • 5:49 - 5:51
    because it contains all the data in an
  • 5:51 - 5:53
    index already pre-indexed for you. So,
  • 5:53 - 5:55
    when you put this in, you’ll have
  • 5:55 - 5:57
    all the exact same data that I’m using
  • 5:57 - 5:59
    in these queries, allowing you to
  • 5:59 - 6:02
    easily run the exact same
  • 6:02 - 6:04
    thing in your own environment. As you
  • 6:04 - 6:05
    learn these queries, you’ll be able to
  • 6:05 - 6:07
    use them elsewhere. All the
  • 6:07 - 6:09
    documentation is right here if you want
  • 6:09 - 6:12
    to use any of these source types--
  • 6:12 - 6:14
    they’re available.
  • 6:14 - 6:15
    It also lists any of the required
  • 6:15 - 6:17
    software needed to run any of the
  • 6:17 - 6:19
    TAs in order to get your data
  • 6:19 - 6:22
    to parse correctly. We’re going to
  • 6:22 - 6:25
    primarily use the stream data and some
  • 6:25 - 6:27
    network host logs.
  • 6:27 - 6:30
    I’m going to jump over to my
  • 6:30 - 6:33
    environment. If I do a head 100 on this
  • 6:33 - 6:36
    little command here, the BOTS v3 stream
  • 6:36 - 6:40
    TCP, this is TCP network traffic, and I
  • 6:40 - 6:42
    can see the network traffic. I can see
  • 6:42 - 6:45
    certs going through. I can see
  • 6:45 - 6:47
    connections with bytes in and bytes out,
  • 6:47 - 6:51
    destination IP, source IP, destination port,
  • 6:51 - 6:52
    source port. And what I’m going to want
  • 6:52 - 6:55
    to do is baseline what the
  • 6:55 - 6:57
    normal IP traffic on my network is.
  • 6:57 - 7:00
    Then, when I see abnormal IP traffic, I
  • 7:00 - 7:03
    want to be alerted about it. And this has
  • 7:03 - 7:05
    varying levels of success based on how
  • 7:05 - 7:08
    random your traffic is--that is, how many new machines your
  • 7:08 - 7:11
    systems go out and visit. Workstations
  • 7:11 - 7:13
    browsing the Internet are going to
  • 7:13 - 7:15
    have a lot of new IP addresses on a
  • 7:15 - 7:17
    daily basis. Servers are probably
  • 7:17 - 7:20
    not going to go out and talk to a whole
  • 7:20 - 7:23
    lot of different devices. Specialized
  • 7:23 - 7:25
    devices, such as OT (Operational Technology)
  • 7:25 - 7:28
    devices, they won’t talk to a lot of
  • 7:28 - 7:31
    machines. Their communication is
  • 7:31 - 7:34
    pretty standard. So, we can actually use
  • 7:34 - 7:37
    that to understand what’s going on.
  • 7:37 - 7:39
    If I come in here, let’s run that
  • 7:39 - 7:41
    query. The concept is I’m using an
  • 7:41 - 7:43
    all-time query, but that’s because I’m
  • 7:43 - 7:46
    using this Bot v3 data. I have it in the
  • 7:46 - 7:49
    notes in my PowerPoint on how
  • 7:49 - 7:51
    to turn this into a 90-day rolling window.
  • 7:51 - 7:54
    But to make this work on the BOTS data, I
  • 7:54 - 7:58
    had to actually set my time and do things
  • 7:58 - 8:00
    a little bit differently. I’ll
  • 8:00 - 8:02
    show you how that looks when I’m done.
  • 8:02 - 8:03
    What we’re gonna do is: index equals
  • 8:03 - 8:07
    botsv3, sourcetype equals stream:tcp. And
  • 8:07 - 8:08
    what we’re gonna do here, this is the
  • 8:08 - 8:10
    magic: we’re just gonna use a stats
  • 8:10 - 8:12
    command. If I just did stats count by
  • 8:12 - 8:14
    source IP, destination IP, that would give
  • 8:14 - 8:17
    me every tuple that I’ve seen during
  • 8:17 - 8:21
    this window. But if I put stats min(_time),
  • 8:21 - 8:22
    it’s gonna give me the earliest
  • 8:22 - 8:25
    time, the smallest time value that it has
  • 8:25 - 8:28
    seen in this tuple. And so, this is giving
  • 8:28 - 8:30
    me the earliest time this has popped
  • 8:30 - 8:33
    up. So, I’ve got a 90-day rolling window
  • 8:33 - 8:35
    of tuples, and I will tag it with the
  • 8:35 - 8:39
    earliest value seen. If I do that
  • 8:39 - 8:41
    just like this,
  • 8:45 - 8:48
    we’ll see the earliest time come back.
  • 8:49 - 8:51
    If I undo that, now I’m going to
  • 8:51 - 8:52
    come in here. I’m gonna change it. Now,
  • 8:52 - 8:55
    I want to set a time. I want to know
  • 8:55 - 8:58
    anytime that this earliest time is
  • 8:58 - 9:00
    greater. So, in a normal scenario, I might go
  • 9:00 - 9:02
    back 86400. That’s the number of seconds
  • 9:02 - 9:04
    in a day. So, I might be looking for any
  • 9:04 - 9:07
    new tuples in a day. I had to use this
  • 9:07 - 9:10
    value here to move it to a new day based
  • 9:10 - 9:12
    on this Bot v3 data. There’s only two
  • 9:12 - 9:14
    and a half days’ worth of data in this
  • 9:14 - 9:16
    BOTS dataset. So, to make it work,
  • 9:16 - 9:18
    I had to put this
  • 9:18 - 9:19
    specific timestamp in. Normally, you
  • 9:19 - 9:23
    would be using something like now - 86400,
  • 9:23 - 9:25
    and I’ll show that. But we’re going
  • 9:25 - 9:26
    to come down here. We’ll go where
  • 9:26 - 9:29
    earliest time is greater than now_time. So,
  • 9:29 - 9:32
    if this first time it’s been seen is
  • 9:32 - 9:34
    greater than this, we’re gonna
  • 9:34 - 9:37
    get the values back. If it’s not,
  • 9:37 - 9:39
    this wouldn't show up. If I do it
  • 9:39 - 9:42
    with a one-day cutoff,
  • 9:42 - 9:43
    it’ll only show me any
  • 9:43 - 9:46
    new tuples that I’ve never seen in 90
  • 9:46 - 9:48
    days that have shown up today. So, if I
  • 9:48 - 9:51
    run this, we’re gonna flip this to
  • 9:51 - 9:52
    fast mode.
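    A sketch of that search, assuming the BOTSv3 app's index is named botsv3 and the stream:tcp events carry src_ip and dest_ip fields:

      index=botsv3 sourcetype="stream:tcp"
      | stats min(_time) as earliest_time by src_ip dest_ip
      | eval now_time = now() - 86400
      | where earliest_time > now_time

    On the static BOTSv3 data you would swap now() - 86400 for a fixed epoch that falls inside the dataset's two-and-a-half-day range; on live data, pair the now() - 86400 cutoff with a 90-day search window.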
  • 9:52 - 9:54
    This will come back with all the
  • 9:54 - 9:57
    tuples, the brand-new tuples it has never
  • 9:57 - 9:59
    seen before. I’m gonna tell you this is still
  • 9:59 - 10:01
    too large of a list, but part of
  • 10:01 - 10:04
    this list would normally drop down. The
  • 10:04 - 10:06
    fact is, the bigger the window you make,
  • 10:06 - 10:08
    the fewer values you’ll have. If I’m
  • 10:08 - 10:10
    looking at one day and I’m looking at
  • 10:10 - 10:13
    the new values, you’re gonna have more results. If I
  • 10:13 - 10:16
    go 90 days, the number of new tuples will
  • 10:16 - 10:19
    shrink. The bigger the window you have
  • 10:19 - 10:22
    over here, the smaller the number of
  • 10:22 - 10:24
    results that will come back, because more of
  • 10:24 - 10:26
    the machines I only visit every now and then will
  • 10:26 - 10:30
    already be included in my list.
  • 10:30 - 10:31
    Alright, this works. Let’s grab
  • 10:31 - 10:34
    something even a little easier to grasp.
  • 10:34 - 10:37
    Now, I’m showing this. These are processes.
  • 10:37 - 10:38
    When I look at the processes, I’m looking
  • 10:38 - 10:41
    at processes firing off. This is
  • 10:41 - 10:43
    calculator being run, application frame
  • 10:43 - 10:45
    host, crash plan desktop. These are
  • 10:45 - 10:48
    processes on a machine. I want to know if
  • 10:48 - 10:51
    there are new processes that have fired
  • 10:51 - 10:54
    off. We’re going back to the exact same
  • 10:54 - 10:56
    query. We’re gonna, this time,
  • 10:56 - 10:59
    group by instance, which is like this--
  • 10:59 - 11:00
    sorry, instance is the process name,
  • 11:00 - 11:02
    like application frame host,
  • 11:02 - 11:04
    and then the host here. We're gonna look at
  • 11:04 - 11:06
    instance and host, and we’re gonna
  • 11:06 - 11:09
    again grab the earliest time it was
  • 11:09 - 11:11
    seen. And we're gonna do an eval of the cutoff time
  • 11:09 - 11:11
    and a where earliest is later than it, and then we're gonna
  • 11:12 - 11:15
    run it. So, we’re basically saying, “Hey, did
  • 11:15 - 11:17
    I see this value?
  • 11:17 - 11:20
    What are the newest instances
  • 11:20 - 11:22
    that have fired up in the last 24 hours
  • 11:22 - 11:26
    that I have not seen over my time period?”
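    A sketch of that process search, assuming the Windows process logs are BOTSv3's PerfmonMk:Process events, where instance holds the process name (the same note about the static BOTSv3 timestamps applies):

      index=botsv3 sourcetype="PerfmonMk:Process"
      | stats min(_time) as earliest_time by instance host
      | eval now_time = now() - 86400
      | where earliest_time > now_time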
  • 11:26 - 11:27
    I run that,
  • 11:28 - 11:32
    and you can see that new software
  • 11:32 - 11:34
    processes are going to show up a lot
  • 11:34 - 11:36
    less frequently on your system. And so,
  • 11:36 - 11:37
    we can run that,
  • 11:37 - 11:41
    and we get back the new processes
  • 11:41 - 11:45
    that ran on this machine in the last 24
  • 11:45 - 11:47
    hours. And you can see
  • 11:47 - 11:50
    what happens is SCP and SSH, those
  • 11:50 - 11:52
    are brand new processes. If I was doing
  • 11:52 - 11:55
    an investigation, and all of a sudden machines
  • 11:55 - 11:56
    that have never used them start
  • 11:56 - 11:59
    running SCP and SSH, that’s probably
  • 11:59 - 12:01
    something I want to be looking at.
  • 12:01 - 12:03
    And so, baselining and knowing what your
  • 12:03 - 12:05
    systems run, and then when new processes
  • 12:05 - 12:07
    fire, we can look at them and say, “Do I
  • 12:07 - 12:08
    want to look at this?” We can build alerts
  • 12:08 - 12:09
    off of it.
  • 12:09 - 12:12
    Let’s jump to another example. This
  • 12:12 - 12:14
    time, listening ports. The amount of ports
  • 12:14 - 12:16
    that your machine is listening on should
  • 12:16 - 12:18
    be very static. It’s not going to change
  • 12:18 - 12:20
    a ton. But if someone’s opened up new
  • 12:20 - 12:22
    applications, they might be opening up
  • 12:22 - 12:24
    new listening ports. So, you want to look
  • 12:24 - 12:26
    at that. We can see here kind of the data
  • 12:26 - 12:30
    coming back. We can see which machine,
  • 12:30 - 12:32
    what ports are being opened, and what they’re
  • 12:32 - 12:36
    listening on. If we use this very same concept
  • 12:40 - 12:44
    we can see min(_time) as earliest_time.
  • 12:44 - 12:46
    This time, we're looking at host and
  • 12:46 - 12:48
    dest_port. That’s my pairing. That’s what
  • 12:48 - 12:51
    I’m looking for anomalies in. Grab
  • 12:51 - 12:53
    the earliest time seen.
  • 12:53 - 12:55
    Grab the window that I want to
  • 12:55 - 12:57
    track, and we’re going to say where earliest
  • 12:57 - 12:59
    time is greater than now time. And in
  • 12:59 - 13:02
    this one, make sure I flip it to verbose
  • 13:02 - 13:04
    because ports are really static.
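    A sketch of the listening-ports version; the sourcetype here is only a guess, so substitute whatever input feeds your open-port data, as long as it gives you a host and a dest_port field:

      index=botsv3 sourcetype="Script:ListeningPorts"
      | stats min(_time) as earliest_time by host dest_port
      | eval now_time = now() - 86400
      | where earliest_time > now_time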
  • 13:05 - 13:08
    What a surprise--I’m going to get
  • 13:08 - 13:10
    zero results back. And that’s actually
  • 13:10 - 13:12
    what I’m looking for. That’ll work out
  • 13:12 - 13:14
    really well for me. So, I’ve shown three
  • 13:14 - 13:16
    examples here of how you can just grab
  • 13:16 - 13:18
    any form of data. You look for what you
  • 13:18 - 13:22
    want to find, group it by what’s normal,
  • 13:22 - 13:25
    grab a big window, and then
  • 13:25 - 13:27
    set your time to say anything that’s
  • 13:27 - 13:30
    occurred--any new tuple that I see
  • 13:30 - 13:33
    since this time.
  • 13:33 - 13:36
    So, we jump over here. We can quickly
  • 13:36 - 13:38
    see this is how it looked in real-time.
  • 13:38 - 13:41
    This is how we do it at my place.
  • 13:41 - 13:47
    Now - 86400, last 90 days. We do the
  • 13:49 - 13:52
    now - 86400. This says, “Give me a
  • 13:52 - 13:56
    90-day window, go back one day.”
  • 13:56 - 13:59
    Very simple. We don’t change much, and
  • 13:59 - 14:01
    we just have that
  • 14:01 - 14:03
    working. Now, if we come over here, we can
  • 14:03 - 14:07
    do the exact same thing with our Splunk
  • 14:07 - 14:10
    searches. We can come over here, and we can
  • 14:10 - 14:13
    take this to the next level in another way.
  • 14:13 - 14:15
    Instead of saying, “I want a 90-day window,”
  • 14:15 - 14:17
    there’s a problem with the 90-day window.
  • 14:17 - 14:19
    As soon as this anomalous event occurs,
  • 14:19 - 14:23
    it’s not going to be anomalous tomorrow.
  • 14:23 - 14:24
    So, what we can do is we can actually
  • 14:24 - 14:27
    build a lookup and say, “I’m going to take
  • 14:24 - 14:27
    everything.” I’m gonna use the same
  • 14:27 - 14:29
    concept to pull them together. This time, I
  • 14:32 - 14:34
    don’t need a time. I’m just going to grab
  • 14:34 - 14:37
    all of my tuples, and I’m going to output
  • 14:37 - 14:40
    lookup into a CSV, or I could do a KV
  • 14:40 - 14:44
    store, and that becomes my window,
  • 14:44 - 14:48
    because new anomalous events will not
  • 14:48 - 14:51
    get added to it unless I rerun this output
  • 14:51 - 14:53
    lookup. So, when I run this, I can do
  • 14:53 - 14:56
    this: I build my baseline, and
  • 14:56 - 14:58
    then I’d have a scheduled search
  • 14:58 - 15:00
    that runs over the recent data, and
  • 15:00 - 15:02
    I’m going to do a stats count against it.
  • 15:02 - 15:03
    I'm going to do a lookup, going to match
  • 15:03 - 15:06
    on source IP and dest IP, and output
  • 15:06 - 15:08
    count, say, as matched. And I’m
  • 15:08 - 15:10
    going to do where isnull(matched),
  • 15:10 - 15:12
    meaning I’ve got a source IP and
  • 15:12 - 15:14
    destination IP, but those are not in this
  • 15:14 - 15:17
    lookup. That would make it null, and this
  • 15:17 - 15:19
    will alert me. And if tomorrow the same
  • 15:19 - 15:21
    source IP and destination IP appears, it
  • 15:21 - 15:23
    will also alert me because as long as
  • 15:23 - 15:28
    I don’t update this lookup table, it will
  • 15:28 - 15:30
    always be anomalous.
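    A sketch of that allow-list version for the IP tuples, with known_ip_pairs.csv as a made-up lookup name. First build the baseline:

      index=botsv3 sourcetype="stream:tcp"
      | stats count by src_ip dest_ip
      | outputlookup known_ip_pairs.csv

    Then the scheduled detection search over whatever recent window you choose:

      index=botsv3 sourcetype="stream:tcp"
      | stats count by src_ip dest_ip
      | lookup known_ip_pairs.csv src_ip dest_ip OUTPUT count as matched
      | where isnull(matched)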
  • 15:30 - 15:32
    And so, there are pros and cons. This is a
  • 15:32 - 15:34
    dynamic growing list, like the ones I did
  • 15:34 - 15:37
    over here with where earliest greater
  • 15:37 - 15:39
    than. But over here, we’re building a
  • 15:39 - 15:43
    lookup list, and we’re doing a match. Same
  • 15:43 - 15:44
    principle over here. We can take the
  • 15:44 - 15:47
    perfmon process, exact same thing. We’re
  • 15:47 - 15:50
    going to output it to a CSV, and then we
  • 15:50 - 15:53
    can set up a search to run every day where we
  • 15:53 - 15:56
    do this lookup on instance and host, and
  • 15:56 - 15:59
    where it’s not matched. Or we can go to
  • 15:59 - 16:03
    listening ports. We can output the lookup, and
  • 16:03 - 16:04
    we can do this. One of the things you
  • 16:04 - 16:09
    could do is you could actually take an
  • 16:09 - 16:13
    evaluation of the two. You could actually
  • 16:13 - 16:15
    take all the alerts that are
  • 16:15 - 16:18
    popping up each day and compare them to this
  • 16:18 - 16:21
    lookup list and see how much variance
  • 16:21 - 16:24
    there is. So, you could grab a 90-day
  • 16:24 - 16:26
    table and then compare it to this output
  • 16:26 - 16:28
    lookup. There are a lot of ways to evaluate
  • 16:28 - 16:31
    how much is changing in your environment,
  • 16:31 - 16:34
    but the big key is to use your
  • 16:34 - 16:38
    historical data to create a baseline and search on it.
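    One rough way you might measure that variance, reusing the hypothetical known_ip_pairs.csv baseline from above:

      index=botsv3 sourcetype="stream:tcp" earliest=-90d
      | stats count by src_ip dest_ip
      | lookup known_ip_pairs.csv src_ip dest_ip OUTPUT count as matched
      | eval status = if(isnull(matched), "not_in_baseline", "in_baseline")
      | stats count by status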
  • 16:40 - 16:44
    Alright, basic summary there. In that
  • 16:44 - 16:46
    video, we showed how we can use the stats
  • 16:46 - 16:48
    command to baseline normal behavior from
  • 16:48 - 16:50
    historical data. We used that baseline to
  • 16:50 - 16:51
    determine new events. We’re able to
  • 16:51 - 16:53
    detect anomalous network connections,
  • 16:53 - 16:55
    anomalous processes, and anomalous open
  • 16:55 - 16:57
    ports. We then did those very same things
  • 16:57 - 17:00
    with a CSV baseline of normal
  • 17:00 - 17:02
    behavior, and we were able to use that
  • 17:02 - 17:03
    CSV to detect the same thing--network
  • 17:03 - 17:06
    connections, processes, and
  • 17:06 - 17:09
    open ports. So, there are some gotchas that you
  • 17:09 - 17:11
    need to be aware of. This is a really cool
  • 17:11 - 17:13
    process, but as you start to get into it,
  • 17:13 - 17:14
    don’t let the gotchas get you. Don’t let
  • 17:14 - 17:17
    the quest for perfection get in the way
  • 17:17 - 17:19
    of getting something done or having a
  • 17:19 - 17:22
    good product. The rolling window and
  • 17:22 - 17:25
    allow list will get you a good answer.
  • 17:25 - 17:27
    It’s not perfect, and there will be some
  • 17:27 - 17:29
    gotchas along the road, but it will get
  • 17:29 - 17:31
    you most of the way. But now that you’ve
  • 17:31 - 17:33
    got those baselines, we’re going to
  • 17:33 - 17:34
    tell you some things you want to be
  • 17:34 - 17:36
    careful of. Rolling window: You’re going
  • 17:36 - 17:38
    to be alerted the first time that the
  • 17:38 - 17:40
    anomalous connection occurred. And then,
  • 17:40 - 17:43
    if you remember that X and Y, with X being
  • 17:43 - 17:45
    the baseline and Y being the new events, the
  • 17:45 - 17:48
    new events from Y are going to roll into
  • 17:48 - 17:51
    X. And now that anomalous event will be
  • 17:51 - 17:52
    part of your baseline. So, you’ll detect it
  • 17:52 - 17:55
    once, and then your anomaly is part of
  • 17:55 - 17:57
    your baseline. So, you do need to be aware of
  • 17:57 - 17:59
    that. And the frequency of the times
  • 17:59 - 18:02
    you run the alert is important.
  • 18:02 - 18:03
    You need to think about how often you
  • 18:03 - 18:05
    run this alert. Remember that you can
  • 18:05 - 18:07
    have a small window. Say, “I’m going to
  • 18:07 - 18:09
    look at a 90-day window, and I just want
  • 18:09 - 18:12
    to look at one second.” So, Y will be
  • 18:12 - 18:15
    that one second, and the baseline
  • 18:15 - 18:20
    will be 89 days, 23 hours, 59
  • 18:20 - 18:23
    minutes, and 59 seconds, or whatever. The fact
  • 18:23 - 18:24
    is, it’s still going to look at 90 days'
  • 18:24 - 18:27
    worth of data. And so, no matter how small
  • 18:27 - 18:29
    your Y window is, it’s always going to take the
  • 18:29 - 18:32
    time required to run the entire X
  • 18:32 - 18:35
    and Y window together. So, you need to be
  • 18:35 - 18:37
    aware that it can take some time to run
  • 18:37 - 18:39
    this alert. It sounds great to run a
  • 18:39 - 18:41
    really long query, like “I want an all-time or
  • 18:41 - 18:45
    a year-long query.”
  • 18:45 - 18:46
    Recognize that if you run that every day,
  • 18:46 - 18:48
    you’re still running that query every
  • 18:48 - 18:52
    day. So, it’s going to take some time, and you want
  • 18:52 - 18:53
    to make sure it doesn’t impact the rest
  • 18:53 - 18:55
    of the stuff you’re doing. Allow
  • 18:55 - 18:57
    listing, on the other hand, is
  • 18:57 - 18:59
    going to run against whatever window. So,
  • 18:59 - 19:00
    if you look at the last 10 seconds, it’s
  • 19:00 - 19:02
    only going to run on a 10-second window. If
  • 19:02 - 19:04
    you’re looking at the last hour, it’s going to
  • 19:04 - 19:07
    run on a 1-hour window. So, it will run
  • 19:07 - 19:10
    faster. But, you need to remember that
  • 19:10 - 19:12
    you need to figure out: How am I going to build that
  • 19:12 - 19:14
    baseline? How do we get new items
  • 19:14 - 19:17
    into the baseline? You’ll need to
  • 19:17 - 19:19
    address that. And remember that a
  • 19:19 - 19:21
    baseline, whether it’s a CSV or a KV store, is
  • 19:21 - 19:24
    going to occupy space on your search
  • 19:24 - 19:26
    head, and you can run out of disk space.
  • 19:26 - 19:28
    You only have so much. Typically, we
  • 19:28 - 19:30
    build a lot of space onto our indexers. Our
  • 19:30 - 19:33
    search heads are not huge on disk space.
  • 19:33 - 19:34
    Just be aware that as you start to build
  • 19:34 - 19:36
    large baselines, one, you'll have
  • 19:36 - 19:39
    performance issues. The more values it
  • 19:39 - 19:40
    has to look against, the slower your
  • 19:40 - 19:42
    search will run; and two, it's going to take
  • 19:42 - 19:45
    up physical disk space on your machine.
  • 19:45 - 19:48
    So, that's something you need to just be aware of.
  • 19:48 - 19:50
    I'm going to recommend a hybrid
  • 19:50 - 19:52
    approach, and that's the ability to
  • 19:52 - 19:54
    combine both. We're going to do a rolling
  • 19:54 - 19:58
    window and allow listing. And so, the basic
  • 19:58 - 20:01
    concept is we're going to use--your
  • 20:01 - 20:03
    query goes here. So, you're going to write
  • 20:03 - 20:04
    your query, and then you're going to use
  • 20:04 - 20:07
    this collect command. I'm not going to
  • 20:07 - 20:09
    comment on the different syntax in
  • 20:09 - 20:11
    Splunk, but just know that if you use
  • 20:11 - 20:13
    this pipe collect command, you will write
  • 20:13 - 20:15
    to a summary index. Summary indexes are a
  • 20:15 - 20:18
    form of indexing that do not cost you on
  • 20:18 - 20:20
    ingestion license. You can
  • 20:20 - 20:22
    write to a summary index, and then you
  • 20:22 - 20:24
    can query that index just like you could
  • 20:24 - 20:26
    query any other index. And so, you can
  • 20:26 - 20:28
    save your results in an index. The
  • 20:28 - 20:30
    concept is, if I want to
  • 20:30 - 20:32
    build these, I might run a query every
  • 20:32 - 20:33
    day, and I'm going to write it to the
  • 20:33 - 20:35
    index, and it will timestamp it with
  • 20:35 - 20:37
    today's information. Tomorrow will have
  • 20:37 - 20:38
    tomorrow's information, and yesterday
  • 20:38 - 20:40
    will have yesterday's information. And
  • 20:40 - 20:41
    we can query it and search it. And so,
  • 20:41 - 20:43
    you'll basically write your query. You'll
  • 20:43 - 20:45
    run the collect command index=summary
  • 20:45 - 20:47
    source and give it a name. Then
  • 20:47 - 20:49
    what you want to do is, now that you've
  • 20:49 - 20:51
    done that--that's going to
  • 20:51 - 20:54
    be your alert. Then you're
  • 20:54 - 20:55
    going to come in here, and you're going
  • 20:55 - 20:58
    to look at that summary data, and you're
  • 20:58 - 21:02
    going to append to that those
  • 21:02 - 21:04
    results that fired for that day. So, you
  • 21:04 - 21:06
    look at the last set of time it ran. So
  • 21:06 - 21:08
    maybe you run this once a day. You look
  • 21:08 - 21:10
    at yesterday's results. You put that in
  • 21:10 - 21:11
    here, and then you'll use this append
  • 21:11 - 21:14
    command, which will append the lookup. I
  • 21:14 - 21:16
    said allow list. It should be a disallow
  • 21:16 - 21:18
    list. That was bad wording here. Gotta
  • 21:18 - 21:21
    love it--gotta be careful with the descriptions
  • 21:21 - 21:23
    you use. This is a--you're going to grab a
  • 21:23 - 21:26
    list of things you don’t want to allow--things you still consider
  • 21:26 - 21:28
    anomalous. So, if you see these, you
  • 21:28 - 21:31
    want to flag them. It's not what I've
  • 21:31 - 21:33
    done before with the CSV, where that was my
  • 21:33 - 21:35
    normal baseline. These are bad events. I
  • 21:35 - 21:37
    don’t want these events. And so, I'm going
  • 21:37 - 21:39
    to do the input lookup allowlist.csv,
  • 21:39 - 21:41
    and I'm going to do a table on the
  • 21:41 - 21:44
    matching fields. So, this is matching
  • 21:44 - 21:45
    from this query over here, and then I'm
  • 21:45 - 21:47
    going to stats count by the matching
  • 21:47 - 21:49
    fields. That will basically dedupe.
  • 21:49 - 21:51
    The dedupe command will remove the
  • 21:51 - 21:54
    duplicates, but stats does the same thing,
  • 21:54 - 21:55
    and it's more efficient. So, if you want
  • 21:55 - 21:57
    to write dedupe, you can, but I recommend
  • 21:57 - 21:59
    that you learn the power of the stats
  • 21:59 - 22:02
    command. It is fast. It's
  • 22:02 - 22:04
    just the right command to
  • 22:04 - 22:05
    use. So, stats count by matching is
  • 22:05 - 22:07
    basically removing the duplicates. So, if
  • 22:07 - 22:10
    it was in this summary index and
  • 22:10 - 22:11
    it's in my lookup, we're not going to
  • 22:11 - 22:12
    write it in twice. And then we'll write
  • 22:12 - 22:15
    it to this allow list CSV, which will
  • 22:15 - 22:17
    update it, which means all the new things
  • 22:17 - 22:18
    that were found will then be written
  • 22:18 - 22:21
    into this lookup, and then it will be
  • 22:21 - 22:23
    updated. It'll have a new lookup with the
  • 22:23 - 22:26
    results combined. So, an example that
  • 22:26 - 22:28
    would be: index=botsv3,
  • 22:28 - 22:32
    sourcetype=PerfmonMk:Process, stats min(_time)
  • 22:32 - 22:34
    as earliest_time, max(_time) as latest_time. This is
  • 22:34 - 22:36
    all the exact same query. Nothing’s
  • 22:36 - 22:39
    really changed. The difference is, after I
  • 22:39 - 22:40
    do this eval time, I'm now going to do
  • 22:40 - 22:42
    this lookup. For the lookup, I'm going to use the
  • 22:42 - 22:44
    name of the CSV. I'm going to do
  • 22:44 - 22:47
    instance as instance, host as host, output
  • 22:47 - 22:49
    instance as recurring. I need a value
  • 22:49 - 22:51
    that shows, hey. I matched the CSV. And
  • 22:51 - 22:53
    then I'm going to go where--and I
  • 22:53 - 22:55
    actually changed this for max time.
  • 22:55 - 22:57
    I want a min time and a max time. And
  • 22:57 - 22:59
    the reason is the min time is
  • 22:59 - 23:02
    looking to see if the value falls in the
  • 23:02 - 23:04
    X part of my X-Y rolling window. The max
  • 23:04 - 23:07
    time is used to find if it's in the Y
  • 23:07 - 23:09
    area, and we'll explain that. So, I still
  • 23:09 - 23:10
    have the same where
  • 23:10 - 23:11
    earliest_time > now_time.
  • 23:11 - 23:13
    That’s going to say, “Hey, I’ve never seen this event
  • 23:13 - 23:16
    before.” Or recurring = *, meaning,
  • 23:16 - 23:20
    “Hey, I got a match on this value,” and the
  • 23:20 - 23:22
    latest_time > now_time,
  • 23:22 - 23:25
    meaning it’s in the Y section. That means
  • 23:25 - 23:27
    this alert that is on my list of things
  • 23:27 - 23:30
    I don’t want to keep seeing, I don’t want to allow--
  • 23:30 - 23:33
    it just showed up again. That way,
  • 23:33 - 23:35
    you’ll be notified again that it
  • 23:35 - 23:37
    occurred. You're updating your queue, your
  • 23:37 - 23:40
    lookup file, and you're using a
  • 23:40 - 23:42
    rolling window. And so, you kind of get
  • 23:42 - 23:44
    the best of both worlds, and you can use
  • 23:44 - 23:48
    this as a method to automate
  • 23:48 - 23:51
    keeping up to date on any of your
  • 23:51 - 23:54
    alerts. And now we're going to demo that.
  • 23:54 - 23:56
    I've got a video of it. We're going to go
  • 23:56 - 24:00
    watch that, and then we'll come back.
  • 24:00 - 24:02
    Alright. So, this is the hybrid approach
  • 24:02 - 24:03
    that we've been talking about. It’s going
  • 24:03 - 24:05
    to look exactly like we did before.
  • 24:05 - 24:07
    You've got yourself the normal
  • 24:07 - 24:10
    query. We're going to just make some form
  • 24:10 - 24:12
    of query here. This is going to build
  • 24:12 - 24:13
    our processes, and then we're going to
  • 24:13 - 24:16
    write a collect command. This collect
  • 24:16 - 24:18
    command will write our results
  • 24:18 - 24:22
    into a summary index, and that’s the
  • 24:22 - 24:24
    index=summary. And then we go
  • 24:24 - 24:27
    source=new_process. That’s going
  • 24:27 - 24:30
    to define the name of the source for
  • 24:30 - 24:32
    this summary index. And so, we're going to
  • 24:32 - 24:34
    do that, run the results, and we
  • 24:34 - 24:36
    can see that we got two values coming
  • 24:36 - 24:38
    back. If we jump over here, this is where
  • 24:38 - 24:40
    we're going to take that very same--we
  • 24:40 - 24:42
    can see the summary index being run.
  • 24:42 - 24:44
    There are my two results. We can query them
  • 24:44 - 24:47
    just like any other index, and you'll
  • 24:47 - 24:49
    notice index=summary, source=new_process,
  • 24:49 - 24:49
    and we use this
  • 24:49 - 24:52
    append command. This append is going to
  • 24:52 - 24:55
    add this lookup, this new_process.csv,
  • 24:55 - 24:57
    and then we're going to put in the table
  • 24:57 - 24:59
    of the instance and the host. I'm just
  • 24:59 - 25:00
    going to do a stats count by instance,
  • 25:00 - 25:03
    host. That's basically going to deduplicate so I
  • 25:03 - 25:04
    don’t--it’s going to take the index
  • 25:04 - 25:06
    summary and the input lookup and make
  • 25:06 - 25:08
    them one if there are any duplicates there,
  • 25:08 - 25:10
    so I don't get them twice. That's what
  • 25:10 - 25:12
    that command’s going to do. It's faster
  • 25:12 - 25:14
    than dedupe, but I’m going to output the lookup
  • 25:14 - 25:17
    to new_process.csv. And that's going to
  • 25:17 - 25:20
    write what was the original new_process.csv,
  • 25:20 - 25:21
    and it’s going to update it
  • 25:21 - 25:24
    with any new fields coming from this summary index.
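    As a sketch of those two pieces, assuming the same PerfmonMk:Process data and a lookup named new_process.csv: first, the alert search that writes its hits into the summary index:

      index=botsv3 sourcetype="PerfmonMk:Process"
      | stats min(_time) as earliest_time by instance host
      | eval now_time = now() - 86400
      | where earliest_time > now_time
      | collect index=summary source="new_process"

    Then the maintenance search that folds those hits into the lookup without duplicating what is already there:

      index=summary source="new_process" earliest=-1d
      | table instance host
      | append [| inputlookup new_process.csv | table instance host]
      | stats count by instance host
      | outputlookup new_process.csv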
  • 25:24 - 25:27
    So, we can see that being run if I
  • 25:27 - 25:29
    go over, we're going to get that
  • 25:29 - 25:32
    taken care of. We go run
  • 25:32 - 25:34
    this. What it's going to do is it's going to
  • 25:34 - 25:35
    grab the summary index stuff, and it's
  • 25:35 - 25:36
    going to grab the stuff that was already
  • 25:36 - 25:38
    in my CSV, and it's going to write them
  • 25:38 - 25:42
    in there. And so now I have four values,
  • 25:42 - 25:44
    and those all got written into the
  • 25:44 - 25:46
    CSV. Now I’m going to write my query that
  • 25:46 - 25:48
    I've been doing, the rolling window, all
  • 25:48 - 25:50
    over again. The
  • 25:50 - 25:53
    difference is I'm going to add a max
  • 25:53 - 25:54
    time in there. That's not just going to be
  • 25:54 - 25:56
    a min time. It's also going to be a max
  • 25:56 - 25:57
    time, so I can look at the Y side of the
  • 25:57 - 26:00
    equation. And then I'm going to do this
  • 26:00 - 26:02
    lookup: new_process.csv, instance as instance,
  • 26:02 - 26:05
    host as host, output instance as
  • 26:05 - 26:07
    recurring. I need the output
  • 26:07 - 26:09
    instance to show me what matched on
  • 26:09 - 26:10
    this lookup. It’s kind of like a join
  • 26:10 - 26:12
    command. It’s going to join the CSV to
  • 26:12 - 26:14
    the previous values, and whatever
  • 26:14 - 26:17
    matches is going to be output.
  • 26:17 - 26:18
    And I’m going to do the same like I’ve
  • 26:18 - 26:20
    always been doing: earliest_time greater
  • 26:20 - 26:22
    than now_time, and that’s the normal check. And then
  • 26:22 - 26:24
    we’re going to add this recurring = *.
  • 26:24 - 26:25
    Recurring = * means it
  • 26:25 - 26:27
    matched on something. I have a value, and
  • 26:27 - 26:30
    latest_time > now_time. And
  • 26:30 - 26:32
    that’s going to look--is there a value in
  • 26:32 - 26:35
    the Y window, and is it a recurring value?
  • 26:35 - 26:37
    And if it is, that's going to alert. And
  • 26:37 - 26:40
    so we can see that being run,
  • 26:47 - 26:49
    and we're just going to see the--
  • 26:49 - 26:50
    we're going to go back to those two
  • 26:50 - 26:52
    results. These two new
  • 26:52 - 26:55
    events occurred during the new window,
  • 26:55 - 26:56
    and they'll keep showing up as often as
  • 26:56 - 26:58
    they occur.
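    Putting it together, the hybrid detection search from this demo looks roughly like this; I'm using isnotnull(recurring) as one way to express the "recurring = *" match described above:

      index=botsv3 sourcetype="PerfmonMk:Process"
      | stats min(_time) as earliest_time max(_time) as latest_time by instance host
      | lookup new_process.csv instance AS instance host AS host OUTPUT instance AS recurring
      | eval now_time = now() - 86400
      | where earliest_time > now_time OR (isnotnull(recurring) AND latest_time > now_time)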
  • 27:00 - 27:02
    Alright. We basically showed how we can
  • 27:02 - 27:04
    combine those two approaches in a hybrid
  • 27:04 - 27:07
    approach. We created our lookup of
  • 27:07 - 27:08
    anomalous behaviors so they don't get
  • 27:08 - 27:11
    excluded from future alerts. And
  • 27:11 - 27:12
    then there are other things we can
  • 27:12 - 27:15
    do. We can look at the results of
  • 27:15 - 27:18
    one against the other, and we can see if
  • 27:18 - 27:19
    there are changes in our environment, how
  • 27:19 - 27:21
    much change is going on. This gives you
  • 27:21 - 27:23
    the ability to gain
  • 27:23 - 27:25
    a lot more insight.
  • 27:25 - 27:28
    Alright? So what's next? I've shown
  • 27:28 - 27:30
    you how you can build baselines. I've
  • 27:30 - 27:32
    shown you examples of them. I've given
  • 27:32 - 27:34
    you multiple methods. What I want you to
  • 27:34 - 27:36
    do is now look at your environment. Think
  • 27:36 - 27:39
    right now. What do I have in my
  • 27:39 - 27:41
    environment that I could baseline? What
  • 27:41 - 27:43
    could I grab? What logs could I use? I
  • 27:43 - 27:45
    could take that very same approach, and I
  • 27:45 - 27:46
    want you to think about it right now.
  • 27:46 - 27:50
    Write it down, and let's go do it. This
  • 27:50 - 27:52
    video is great, but if you don't take
  • 27:52 - 27:55
    action on it, this video will not have
  • 27:55 - 27:56
    served its full purpose. So take that
  • 27:56 - 27:58
    time right now to think: What data,
  • 27:58 - 28:03
    what logs do I have that I can
  • 28:03 - 28:05
    use, and what approach can I take to make a
  • 28:05 - 28:09
    baseline and check for anomalous events?
  • 28:09 - 28:11
    Thank you so much for your time,
  • 28:11 - 28:14
    and I now open it up to questions.