Hi there. My name is Greg Ainslie-Malik,
and I'd like to take you on a really
brief tour
through Splunk's machine learning
toolkit.
Originally developed for what Gartner
termed citizen data scientists,
the machine learning toolkit presents a
whole host of
features for customers
mostly focused around assistants and
experiments
to help users who aren't familiar with
data science
train and test machine learning models
and deploy them into production.
And most of these assistants present as
kind of guided interfaces where you can
input some SPL, something that our users
are very familiar with,
select some algorithms, do some
pre-processing,
things that our users are less familiar
with, and then view a set of dashboards, a
set of reports that tell them about
their model's performance.
However, what we see from the telemetry
is that these experiments are generally
used almost like pseudo-training to help
users familiarize themselves with MLTK, but of
the monthly active users,
actually more than 95% of them run
MLTK searches straight from the search
bar.
So here you can see an example of that
where we're using the fit command
that ships with MLTK to apply an anomaly
detection search.
And you can see that this is actually
just two lines of SPL.
So for our NOC and SOC personas, those
who are very familiar to us
at Splunk, this is quite a simple thing
to do.
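
(The exact search shown on screen is not captured in this transcript. As a rough sketch only, a two-line anomaly detection search with the fit command might look like the following. It assumes MLTK's DensityFunction algorithm; the index name, sourcetype, span, and the requests field are hypothetical placeholders, not taken from the demo.)

```spl
index=web_logs sourcetype=access_combined | timechart span=1h count AS requests
| fit DensityFunction requests threshold=0.01
```

(In this sketch, the first line produces an hourly count of events, and the second line fits a probability density to those counts and flags the least likely values as outliers. The threshold value is illustrative and would need tuning against real data.)
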
Now, while the search bar and the
experiments can help our users develop
and deploy
simple techniques like this for finding
anomalies or making predictions,
what we're starting to see is a trend
towards
use-case-focused workflows. Here we have
one for ITSI
where
more complex techniques can be run
against data without
having to see the details of the ML
that's being applied at all.
So here we have a list of episodes,
incidents in ITSI.
When I click on an incident,
a technique called causal inference gets
run in the background
to determine the root cause of that
incident, and you can see here a graph
structure that has mapped out
those root cause relationships, and up
here you can see a table where
for the service that was impacted by the
incident,
here are all the KPIs that affected
it. And by clicking into this,
we can quickly drill down and see what
the raw data looked like,
and I could draw the conclusion that
perhaps it was disk space used
that was the reason behind this incident
in this case.