Alright, welcome to another L.A.M.E. Creations video. This is going to be more or less a revisit of the splunk.conf talk I gave this previous June 2024 in Vegas, so let's get into it. It was entitled "Anomaly Detection: So Easy Your Grandma Could Do It--No ML Degree Required." I'm not going to spend much time introducing myself; if you don't know who I am, I'm Troy Moore from Log Analysis Made Easy, and here's my contact information.
Alright, what we're going to discuss in this conference breakout session: common baselines you might need--what are some things you might need in your company, and how do you create those baselines? Then we'll do a demo. I'm not a "death by PowerPoint" person, so we'll go into a live demo on this. Finally: what you should do after seeing this presentation, and some "gotchas" to baselining.
Let's discuss what baselining is. A baseline is the set of expected values or conditions against which all performance is compared. That's the definition; what does it look like in practice? Let's come at it from the other direction and talk about some common baselines: maybe a hardware baseline. Can I get a software baseline? Can I get network ports and protocol baselines? User baselines? User behavior baselines?
I've been working in the cyber world for many, many years, and as an auditor I will ask, "Do you by chance have an inventory?" And you know what? It's a funny answer. Ask yourself: does your company have a network inventory? How thorough is it? How accurate is that network inventory? And if you don't have that, can you give me a baseline of what's on your network? A lot of people will tell you it's really difficult to give a baseline if you don't have a network inventory. So you ask these questions and often don't get answers. I know I've told auditors year after year at places I've worked, "I don't have an inventory. I don't have those kinds of things."
What we're going to do in this presentation is show how Splunk and statistical models can make you that hero. You can be the person who provides that inventory; you can be the person who provides baselines and can show what is normal in your environment. To know what's normal, we need to look at the past. Hopefully this makes sense: if you don't know what happened in the past, you won't be able to know what's normal. The past is what defines normalcy.
So we can look at historical IP logs to see the connections, and that can help us build a baseline. We can track the processes that have been running and build a baseline. We can look at the ports used by systems historically and build a baseline. We can track historical login events and build a login-event baseline. Splunk is a logging system, so if you've been collecting those logs, you now have a tool to build a baseline.
Here is the concept you'll need to understand to be able to grasp everything else. There are two methods for baselining: what I call the rolling window, and allow listing.

The rolling window is the easiest way to start a baseline; in my opinion, it's the simplest method. The concept is this: picture a little bar representing a timeline. The full line might be one day, a week, a month, three months, or a year. The X portion is the historical part of that time: if the line is one day, X could be 23 hours; if it's a week, X could be 6 days; a month, 29 days; a year, 11 months. The Y portion is the slice of that time we're going to look at. The X portion will be our baseline, the historical events, and Y is what we're going to examine. We're going to say, "Hey, looking at all these events that have occurred, are there any events in Y that weren't in this baseline, that weren't in X?" If we do that, we have the definition of an anomaly: anomalies are things that are not in your baseline. So we can use a rolling window, and I'll actually demo how to do that.

The other method is allow listing, in which case we build a list of our baseline. Again, you've got to figure out how you're going to build that baseline, but if you do, you put that list into a lookup file. We can do that by using the outputlookup command, and then we use the lookup command against our logs. We look at all the logs coming in and ask, "Do any of these logs not have a matching pair in our lookup?" If so, that would be an indication of a new, anomalous event.
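To make the two patterns concrete before the demo, here is a minimal SPL sketch of each; the index, lookup, and field names are placeholders of mine, not from the talk.

Rolling window--flag pairs whose first appearance falls in the recent Y slice:

  index=my_index earliest=-90d
  | stats min(_time) AS earliest_time count BY field_a field_b
  | where earliest_time > relative_time(now(), "-1d")

Allow listing--flag pairs that have no match in a saved baseline lookup:

  index=my_index earliest=-1d
  | stats count BY field_a field_b
  | lookup my_baseline.csv field_a, field_b OUTPUT count AS matched
  | where isnull(matched)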
Alright, I've given the PowerPoint presentation on that. We're now going into demo time; we want to demo this and show how it works in actual practice. Again, the queries are at the end of the presentation. I'm going to have to give a link to this PDF so you can grab them if you want to use them, or just slow down the video--I'm sorry, I'll have to record it that way. But anyway, that's what we're going to do.
Alright. For this demo, I wanted to make sure that any of you could go home and use this very same dataset. So I went and grabbed Splunk's freely available Boss of the SOC dataset, referred to as BOTS v3. You could grab v2 or v3, and these techniques will work with your own data too, but I wanted to make sure you could run these very same scenarios when you went back home after this conference. So I went to Google, typed in "BOTS v3" plus "GitHub," and that brought me to this little link here. It's just an app you can go download. It's a relatively large app because it contains all the data, already pre-indexed for you, so when you install it, you'll have exactly the same data I'm using in these queries, allowing you to easily run the exact same things in your own environment. As you learn these queries, you'll be able to use them elsewhere. All the documentation is right here if you want to use any of these source types, and so is any of the software needed, such as the TAs required to get your data to parse correctly. We're going to primarily use the stream data and some network host logs.
I'm going to jump over to my environment. If I do a head 100 on this little command here, the BOTS v3 stream TCP source, this is TCP network traffic, and I can see it: certs going through, connections with bytes in and bytes out, destination IP, source IP, destination port, source port. What I'm going to want to do is baseline what normal IP traffic on my network looks like; then, when I see abnormal IP traffic, I want to be alerted about it. This has varying levels of success depending on how many new machines your systems go out and visit. Workstations browsing the Internet are going to see a lot of new IP addresses on a daily basis. Servers are probably not going to go out and talk to a whole lot of different devices. Specialized devices, such as OT (Operational Technology) devices, won't talk to a lot of machines; their communication is pretty standard. So we can actually use that to understand what's going on.
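That exploration step is just a head over the raw stream events, roughly (source type name assumed from the narration):

  index=botsv3 sourcetype=stream:tcp
  | head 100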
If I come in here, let's run that query. The concept is that I'm using an all-time search, but that's only because I'm using this BOTS v3 data; I have notes in my PowerPoint on how to turn this into a 90-day rolling window. To make it work on the BOTS data, I had to set my time a little differently, and I'll show you how that looks when I'm done. What we're going to do is: index equals botsv3, source type equals stream TCP. And here's the magic: we're just going to use a stats command. If I just did stats count by source IP and destination IP, that would give me every tuple I've seen during this window. But if I add min(_time) to the stats, it's going to give me the earliest time, the smallest time value it has seen for that tuple. So this is giving me the earliest time each tuple has popped up: I've got a 90-day rolling window of tuples, each tagged with the earliest time seen. If I run it just like this, we'll see the earliest time come back.
If I undo that, now I'm going to come in here and change it. I want to set a time cutoff: I want to know anytime this earliest time is greater than the cutoff. In a normal scenario, I might go back 86400--that's the number of seconds in a day--so I'd be looking for any new tuples in a day. Here I had to use a specific value to mark the start of a new day, because there's only about two and a half days' worth of data in this BOTS dataset; to make it work, I had to put a specific timestamp in. Normally you would be using something like now() - 86400, and I'll show that. So we come down here: where earliest time is greater than the cutoff. If the first time a tuple has been seen is greater than that, we're going to get it back; if it's not, it won't show up. With a one-day cutoff, it will only show me new tuples that showed up today and were never seen in the previous 90 days. So if I run this--we're going to flip this to fast mode.
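Reconstructed from the narration, the on-screen query is roughly the following sketch; the fixed timestamp used to fit the BOTS data isn't reproduced here, so a one-day relative cutoff stands in for it:

  index=botsv3 sourcetype=stream:tcp
  | stats min(_time) AS earliest_time count BY src_ip dest_ip
  | where earliest_time > relative_time(now(), "-1d")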
This comes back with all the brand-new tuples it has ever seen. I'm going to tell you this is still too large a list, though normally part of this list would drop away. The fact is, the bigger the window you make, the fewer new values you'll have. If I'm looking at one day of baseline and looking for new values, you're going to have more results; if I go 90 days, the number of new tuples will shrink. The bigger the baseline window, the smaller the number of results, because more of those destinations I only visit every now and then will already be included in my list.
Alright, this works. Let's grab something even a little easier to grasp. These are processes; when I look at them, I'm looking at processes firing off: Calculator being run, Application Frame Host, CrashPlan Desktop. These are processes on a machine, and I want to know if there are new processes that have fired off. We're going back to the exact same query. This time we're going to group by instance--which is the process name, like Application Frame Host here--and by host. We're going to again grab the earliest time each pair was seen, do an eval of the cutoff time, and run it. We're basically asking, "What are the newest instances that have fired up in the last 24 hours that I have not seen over my time period?"
I run that, and you can see that the processes running on your system are going to be a lot fewer than the network tuples. We get back the new processes that ran on this machine in the last 24 hours, and you can see SCP and SSH show up as brand-new processes. If I were doing an investigation and machines that had never done it suddenly started invoking SCP and SSH, that would probably be something I'd want to be looking at. So by baselining and knowing what your systems run, when new processes fire we can look at them and decide whether to investigate, and we can build alerts off of it.
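The process version is the same shape, grouping by instance and host; I'm assuming the PerfmonMk:Process source type, which is named later in the talk:

  index=botsv3 sourcetype=PerfmonMk:Process
  | stats min(_time) AS earliest_time count BY instance host
  | where earliest_time > relative_time(now(), "-1d")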
Let's jump to another example, this time listening ports. The set of ports your machine is listening on should be very static; it's not going to change a ton. But if someone has opened up new applications, they might be opening up new listening ports, so you want to look at that. We can see the data coming back here: what machine, what ports are being opened, what they're listening on. Using this very same concept, we take min(_time) as the earliest time; this time we're looking at host and destination port--that's my pairing, that's what I'm looking at for anomalies. We grab the earliest time seen, grab the window we want to track, and say where the earliest time is greater than the cutoff. On this one, make sure I flip it to verbose mode, because ports are really static. And, what a surprise, I get zero results back--and that's actually what I'm looking for. That works out really well for me.
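The ports version, sketched the same way; the source type here is my assumption (the narration doesn't name it), standing in for whatever feed carries your listening-port data:

  index=botsv3 sourcetype=Script:ListeningPorts
  | stats min(_time) AS earliest_time count BY host dest_port
  | where earliest_time > relative_time(now(), "-1d")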
So I've shown three examples of how you can grab any form of data: you look for what you want to find, group it by what defines normal, grab a big window, and then set your cutoff to flag any new tuple seen only since that time.

If we jump over here, we can quickly see what this looks like in real life--this is how we do it at my place: earliest is the last 90 days, and we do now() minus 86400. That says "give me a 90-day window and go back one day." Very simple; we don't change much, and we just have that working.
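In production form, per that "90-day window, go back one day" description, the sketch becomes:

  index=my_index earliest=-90d
  | stats min(_time) AS earliest_time count BY src_ip dest_ip
  | where earliest_time > now() - 86400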
Now, if we come over here, we can do the exact same thing with our Splunk searches and take this to the next level another way. There's a problem with a 90-day window: as soon as an anomalous event occurs, it's not going to be anomalous tomorrow. So what we can do instead is build a lookup. We do the same concept and tuple the fields together, but this time I don't need a time; I just grab all of my tuples and outputlookup them into a CSV (or I could do a KV store), and that becomes my baseline. New anomalous events will not repopulate into it unless I rerun the outputlookup. So I'd build my baseline, and then I'd have a scheduled search that again does a stats count, then does a lookup matching on source IP and destination IP and outputs a field--say, "matched." Then I do where isnull(matched), meaning I've got a source IP and destination IP pair that is not in the lookup, which makes the field null, and this will alert me. And if tomorrow the same source IP and destination IP pair appears, it will also alert me, because as long as I don't update the lookup table, that pair will always be anomalous.
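As a sketch, the two pieces look roughly like this; the lookup file name is a placeholder of mine. First, build the baseline once (or on whatever schedule you choose):

  index=botsv3 sourcetype=stream:tcp
  | stats count BY src_ip dest_ip
  | outputlookup ip_baseline.csv

Then the scheduled alert search:

  index=botsv3 sourcetype=stream:tcp earliest=-1d
  | stats count BY src_ip dest_ip
  | lookup ip_baseline.csv src_ip, dest_ip OUTPUT count AS matched
  | where isnull(matched)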
So there are pros and cons. The searches I did over here, with "where earliest greater than," use a dynamic, growing list; over here we're building a lookup list and doing a match. Same principle: we can take the perfmon process example and do the exact same thing--output it to a CSV, then set up a search to run every day that does the lookup on instance and host and checks where it's matched. Or we can go to listening ports and output that lookup. One thing you could do is an evaluation of the two: take all the alerts that are popping each day and compare them to this list to see how much variance there is. So you could grab a 90-day table and compare it to this outputlookup. There are a lot of ways to evaluate how much is changing in your environment, but the big key is: use your historical data to create a baseline, and search on it.
Alright, a basic summary. In that demo we showed how we can use the stats command to baseline normal behavior from historical data, and we used that baseline to determine new events. We were able to detect anomalous network connections, anomalous processes, and anomalous open ports. We then did those very same things with a CSV: we baselined normal behavior and were able to use that CSV to detect the same things--network connections, processes, and hosts.
So there are some gotchas that you need to be aware of. This is a really cool process, but as you start in on it, don't let the gotchas get you, and don't let the quest for perfection get in the way of getting something done or having a good product. The rolling window and the allow list will get you a good answer. It's not perfect, and there will be some gotchas along the road, but it will get you most of the way. Now that you've got those baselines, here are some things you want to be careful of.

With the rolling window, you're going to be alerted one time that the anomalous connection occurred. If you remember that X and Y, the X being the baseline and Y being the new events: the new events from Y are going to roll into X, and that anomalous event will become part of your baseline. So you'll detect it once, and then your anomaly is part of your baseline. You need to be aware of that, and of how often you run the alert in a day.

Also remember that you can have a small Y window. Say I'm going to look at a 90-day window and I just want to look at one second: Y will be 1 second, and the baseline will be 89 days, 23 hours, 59 minutes, and 59 seconds. The fact is, it's still going to look at 90 days' worth of data. No matter how small your Y window is, the search always takes the time required to run over the entire X and Y window together. So be aware that this alert can take some time to run. It sounds great to run an all-time search, or a year-long or two-year-long one, but recognize that if you run it every day, you're running that full query every day. It's going to take some time, and you want to make sure it doesn't impact the rest of the stuff you're doing.

Allow listing, on the other hand, is going to run against whatever window you give it: if you look at the last 10 seconds, it's only going to run on a 10-second window; if you look at the last hour, it's going to run on a one-hour window. So it will run faster. But you need to remember, one, how you're going to build that baseline and how you get new items into it--you'll need to address that. And remember that a baseline, whether it's a CSV or a KV store, is going to occupy space on your search head, and you can run out of disk space. You only have so much; typically we build a lot of space into our indexers, but our search heads are not huge on disk space. Just be aware as you start to build large baselines: one, you'll have performance issues--the more values the search has to check against, the slower it will run--and two, it's going to take up physical disk space on your machine. That's something you just need to be aware of.
I'm going to recommend a hybrid approach, and that's the ability to combine both: we're going to do a rolling window and allow listing. The basic concept is: your query goes here, so you write your query, and then you use this collect command. This isn't a video about the different syntax in Splunk, but just know that if you use this pipe collect command, you will write to a summary index. Summary indexes are a form of indexing that does not count against your ingestion license; you can write to a summary index and then query that index just like you could query any other index. So you can save your results in an index. The concept is, if I want to build these, I might run a query every day and write it to the index, and it will timestamp it with today's information; tomorrow we'll have tomorrow's information, and yesterday we'll have yesterday's information, and we can query it and search it. So you'll basically write your query, then run the collect command with index equals summary and a source name. That's going to be building your alert.

Then you're going to come in here, look at that summary data, and append to it the results that fired for that day. You look at the last window of time it ran--maybe you run this once a day, so you look at yesterday's results--you put that in here, and then you use this append command, which appends the lookup. I said "allow list"; it should really be a disallow list. That was bad writing on my part--gotta love it, gotta be careful with the descriptions you use. This is a list of things you consider to be anomalous, so if you see them, you want to flag on them. It's not like what I've done before with the CSV, which was my normal baseline; these are bad events, events I don't want. So I'm going to do inputlookup on allowlist.csv, then do a table on the matching fields--the fields that match this query over here--and then stats count by the matching fields. That will basically dedup: for those who know it, the dedup command will remove the duplicates, but stats does the same thing and it's more efficient. You can write dedup if you want, but I recommend that you learn the power of the stats command; it's fast, and it's the right command to use. So stats count by the matching fields is basically removing the duplicates: if a pair was in this summary index and it's in my lookup, we're not going to write it in twice. Then we write it to this allow-list CSV, which updates it, which means all the new things that were found will be written into this lookup; it will be a new lookup with the results combined.

An example of that would be: index equals botsv3, source type equals PerfmonMk:Process, stats min(_time) as earliest time, max(_time) as latest time--this is all the exact same query, nothing's really changed. The difference is that after I do this eval of the cutoff time, I do a lookup using the name of that CSV: instance as instance, host as host, output instance as recurring. I need a value that shows "hey, I matched the CSV." And I actually changed this--I had forgotten max time: I want a min time and a max time. The reason is that the min time is used to see if the value falls in the X part of the X-Y of my rolling window, and the max time is used to find out if it's in the Y area, and we'll explain that. So I still have the same "where earliest time is greater than the cutoff"; that says "hey, I've never seen this event before." Or: recurring equals star, meaning "hey, I got a match on this value," and the latest time is greater than the cutoff, meaning it's in the Y section. That means an alert that is on my list of things I don't want to keep seeing, that I don't want to allow, just showed up again. That way you'll be notified again that it occurred. You're updating your lookup file and you're using a rolling window, so you kind of get the best of both worlds, and you can use this as a method to automate keeping up to date on any of your alerts. And now we're going to demo that: I've got a video of it, we're going to go watch that, and then we'll come back.
Alright, so this is the hybrid approach that we've been talking about. It's going to look exactly like what we did before: you've got yourself the normal query--we're going to just make some form of query here that builds our process list--and then we're going to write a collect command. This collect command will write our results into a summary index: that's the index equals summary, and then we go source equals new_process, which defines the name of the source for this summary index. So we're going to do that, run the results, and we can see that we got two values coming back.
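A sketch of that alert-plus-collect search, using the new_process source name from the demo; the relative cutoff is again my stand-in for the fixed BOTS timestamp:

  index=botsv3 sourcetype=PerfmonMk:Process
  | stats min(_time) AS earliest_time count BY instance host
  | where earliest_time > relative_time(now(), "-1d")
  | collect index=summary source=new_process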
If we jump over here, this is where we take that very same summary index and query it. We can see the summary index being run--there are my two results--and we can query them just like any other index. You'll notice index equals summary, source equals new_process. I'm going to use this append command; the append is going to add this lookup, new_process.csv. Then we put in a table of the instance and the host, and I just do a stats count by instance and host. That's basically going to dedup: it takes the summary-index results and the inputlookup and makes them one if there are any duplicates, so I don't get them twice. That's what that command is going to do, and it's faster than dedup. Then I output the lookup to new_process.csv, which overwrites the original new_process.csv, updating it with any new values coming from the summary index.

So we can see that being run. If I go over and run this, it's going to grab the summary-index stuff, grab the stuff that was already in my CSV, and write them in together. So now I have four values, and those all got written to the CSV.
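That maintenance search, sketched out (the one-day earliest matches the run-once-a-day schedule described):

  index=summary source=new_process earliest=-1d
  | append [| inputlookup new_process.csv]
  | table instance host
  | stats count BY instance host
  | outputlookup new_process.csv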
Now I'm going to write the query I've been doing, the rolling window, all over again. The difference is I'm going to add a max time in there: it's not just going to be a min time, it's also going to be a max time, so I can look at the Y side of the equation. Then I'm going to do this lookup: new_process, instance as instance, host as host, output instance as recurring. I need the output instance to show me what matched on this lookup. It's kind of like a join command: it's going to join the CSV to the previous values, and whatever matches is going to be output. And I'm going to do the same as I've always been doing--earliest time greater than the cutoff, that's the normal case--and then we add this recurring equals star. Recurring equals star means it matched on something, I have a value; and latest time greater than the cutoff looks at whether there's a value in the Y window. If it matched and it's in the Y window, that's going to alert. So we can see that being run, and we go back to those two fields: these two fields occurred during the new window, and they'll keep showing up as often as they occur.
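A sketch of that final hybrid alert; isnotnull(recurring) plays the role of the "recurring equals star" test in the narration, and the relative cutoffs again stand in for the fixed BOTS timestamp:

  index=botsv3 sourcetype=PerfmonMk:Process
  | stats min(_time) AS earliest_time max(_time) AS latest_time count BY instance host
  | lookup new_process.csv instance AS instance, host AS host OUTPUT instance AS recurring
  | where earliest_time > relative_time(now(), "-1d")
      OR (isnotnull(recurring) AND latest_time > relative_time(now(), "-1d"))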
Alright, we basically showed how we can combine those two approaches in a hybrid approach. We created our lookup of anomalous behavior so those events don't get excluded, and we used the rolling window to validate when they recur. And there are other things we can do, like looking at the results of one approach against the other to see whether there are changes in our environment and how much change is going on. There are a lot of things this gives you the ability to gain more insights into.
Alright, so what's next? I've shown you how you can build baselines, I've shown you examples of them, and I've given you multiple methods. What I want you to do now is look at your environment. Think, right now: what do I have in my environment that I could baseline? What could I grab? What logs could I use to take that very same approach? I want you to think about it right now, write it down, and let's go do it. This video is great, but if you don't take action on it, this video will not have served its full purpose. So take the time right now to think: what logs do I have that I can use, and what approach can I take to make a baseline and check for anomalous events? Thank you so much for your time, and I now open it up to questions.