[00:00:01] Hi there. My name is Greg Ainslie-Malik, and I'd like to take you on a really brief tour through Splunk's Machine Learning Toolkit.

[00:00:10] Originally developed for what Gartner termed "citizen data scientists," the Machine Learning Toolkit presents a whole host of features for customers, mostly focused around Assistants and Experiments, to help users who aren't familiar with data science train and test machine learning models and deploy them into production. Most of these Assistants present as guided interfaces where you can input some SPL (something our users are very familiar with), select some algorithms and do some pre-processing (things our users are less familiar with), and then view a set of dashboards and reports that tell them about their model's performance.
[00:01:00] However, what we see from the telemetry is that these Experiments are generally used almost as pseudo-training to help users familiarize themselves with MLTK; of the monthly active users, actually more than 95% run MLTK searches straight from the search bar.

[00:01:23] So here you can see an example of that, where we're using the fit command that ships with MLTK to apply an anomaly detection search. And you can see that this is actually just two lines of SPL. So for our NOC and SOC personas, who are very familiar to us at Splunk, this is quite a simple thing to do.

[00:01:47] Now, while the search bar and the Experiments can help our users develop and deploy simple techniques like this for finding anomalies or making predictions, what we're starting to see is a trend towards use-case-focused workflows. Here we have one for ITSI, where more complex techniques can be run against data without having to see the details of the ML that's being applied at all.
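The two lines of SPL shown on screen aren't captured in the transcript, so as a minimal sketch, a fit-based anomaly detection search of that shape might look like the following (the index, field names, time span, and DensityFunction threshold are illustrative assumptions, not taken from the video):

```spl
| tstats count AS event_count where index=web_logs by _time span=10m
| fit DensityFunction event_count threshold=0.01 into my_anomaly_model
```

Here the first line builds a per-interval event count, and the MLTK fit command fits a probability density to it, flagging low-probability values in an IsOutlier column; `into` saves the trained model so it can later be reused with the apply command.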
[00:02:15] So here we have a list of episodes, that is, incidents in ITSI. When I click on an incident, a technique called causal inference gets run in the background to determine the root cause of that incident, and you can see here a graph structure that has mapped out those root cause relationships. Up here you can see a table showing, for the service that was impacted by the incident, all the KPIs that were affected. Clicking into this, we can quickly drill down and see what the raw data looked like, and I could draw the conclusion that, in this case, it was perhaps disk space used that was the reason behind the incident.