[Script Info]
Title: 
[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:01.20,0:00:03.76,Default,,0000,0000,0000,,Hello everyone, my name is Victor. I'm
Dialogue: 0,0:00:03.76,0:00:05.36,Default,,0000,0000,0000,,your friendly neighborhood data
Dialogue: 0,0:00:05.36,0:00:07.76,Default,,0000,0000,0000,,scientist from DreamCatcher. So in this
Dialogue: 0,0:00:07.76,0:00:10.16,Default,,0000,0000,0000,,presentation, I would like to talk about
Dialogue: 0,0:00:10.16,0:00:12.76,Default,,0000,0000,0000,,a specific industry use case of AI or
Dialogue: 0,0:00:12.76,0:00:15.07,Default,,0000,0000,0000,,machine learning which is predictive
Dialogue: 0,0:00:15.07,0:00:19.00,Default,,0000,0000,0000,,maintenance. So I will be covering these
Dialogue: 0,0:00:19.00,0:00:21.32,Default,,0000,0000,0000,,topics and feel free to jump forward to
Dialogue: 0,0:00:21.32,0:00:23.36,Default,,0000,0000,0000,,the specific part in the video where I
Dialogue: 0,0:00:23.36,0:00:25.16,Default,,0000,0000,0000,,talk about all these topics. So I'm going
Dialogue: 0,0:00:25.16,0:00:27.16,Default,,0000,0000,0000,,to start off with a general preview of
Dialogue: 0,0:00:27.16,0:00:29.08,Default,,0000,0000,0000,,AI and machine learning. Then, I'll
Dialogue: 0,0:00:29.08,0:00:30.84,Default,,0000,0000,0000,,discuss the use case which is predictive
Dialogue: 0,0:00:30.84,0:00:32.72,Default,,0000,0000,0000,,maintenance. I'll talk about the basics
Dialogue: 0,0:00:32.72,0:00:34.80,Default,,0000,0000,0000,,of machine learning, the workflow of
Dialogue: 0,0:00:34.80,0:00:37.24,Default,,0000,0000,0000,,machine learning, and then we will come
Dialogue: 0,0:00:37.24,0:00:40.76,Default,,0000,0000,0000,,to the meat of this presentation which
Dialogue: 0,0:00:40.76,0:00:43.68,Default,,0000,0000,0000,,is essentially a demonstration of the
Dialogue: 0,0:00:43.68,0:00:45.40,Default,,0000,0000,0000,,machine learning workflow from end to
Dialogue: 0,0:00:45.40,0:00:47.58,Default,,0000,0000,0000,,end on a real life predictive
Dialogue: 0,0:00:47.58,0:00:51.52,Default,,0000,0000,0000,,maintenance domain problem. All right, so
Dialogue: 0,0:00:51.52,0:00:53.64,Default,,0000,0000,0000,,without any further ado, let's jump into
Dialogue: 0,0:00:53.64,0:00:56.68,Default,,0000,0000,0000,,it. So let's start off with a quick
Dialogue: 0,0:00:56.68,0:01:00.08,Default,,0000,0000,0000,,preview of AI and machine learning. Well
Dialogue: 0,0:01:00.08,0:01:03.60,Default,,0000,0000,0000,,AI is a very general term, it encompasses
Dialogue: 0,0:01:03.60,0:01:06.68,Default,,0000,0000,0000,,the entire area of science and
Dialogue: 0,0:01:06.68,0:01:09.04,Default,,0000,0000,0000,,engineering that is related to creating
Dialogue: 0,0:01:09.04,0:01:10.84,Default,,0000,0000,0000,,software programs and machines that
Dialogue: 0,0:01:10.84,0:01:13.76,Default,,0000,0000,0000,,will be capable of performing tasks
Dialogue: 0,0:01:13.76,0:01:16.08,Default,,0000,0000,0000,,that would normally require human
Dialogue: 0,0:01:16.08,0:01:19.60,Default,,0000,0000,0000,,intelligence. But AI is a catchall term,
Dialogue: 0,0:01:19.60,0:01:22.92,Default,,0000,0000,0000,,so really when we talk about apply AI,
Dialogue: 0,0:01:22.92,0:01:25.92,Default,,0000,0000,0000,,how we use AI in our daily work, we are
Dialogue: 0,0:01:25.92,0:01:27.72,Default,,0000,0000,0000,,really going to be talking about machine
Dialogue: 0,0:01:27.72,0:01:30.00,Default,,0000,0000,0000,,learning. So machine learning is the
Dialogue: 0,0:01:30.00,0:01:31.68,Default,,0000,0000,0000,,design and application of software
Dialogue: 0,0:01:31.68,0:01:34.08,Default,,0000,0000,0000,,algorithms that are capable of learning
Dialogue: 0,0:01:34.08,0:01:37.96,Default,,0000,0000,0000,,on their own without any explicit human
Dialogue: 0,0:01:37.96,0:01:40.40,Default,,0000,0000,0000,,intervention. And the primary purpose of
Dialogue: 0,0:01:40.40,0:01:43.28,Default,,0000,0000,0000,,these algorithms are to optimize
Dialogue: 0,0:01:43.28,0:01:46.84,Default,,0000,0000,0000,,performance in a specific task. And the
Dialogue: 0,0:01:46.84,0:01:49.68,Default,,0000,0000,0000,,primary performance or the primary task
Dialogue: 0,0:01:49.68,0:01:52.00,Default,,0000,0000,0000,,that you want to optimize performance in
Dialogue: 0,0:01:52.00,0:01:54.24,Default,,0000,0000,0000,,is to be able to make accurate
Dialogue: 0,0:01:54.24,0:01:57.48,Default,,0000,0000,0000,,predictions about future outcomes based
Dialogue: 0,0:01:57.48,0:02:00.56,Default,,0000,0000,0000,,on the analysis of historical data
Dialogue: 0,0:02:00.56,0:02:02.96,Default,,0000,0000,0000,,from the past. So essentially machine
Dialogue: 0,0:02:02.96,0:02:05.32,Default,,0000,0000,0000,,learning is about making predictions
Dialogue: 0,0:02:05.32,0:02:06.88,Default,,0000,0000,0000,,about the future or what we call
Dialogue: 0,0:02:06.88,0:02:08.92,Default,,0000,0000,0000,,predictive analytics.
Dialogue: 0,0:02:08.92,0:02:11.00,Default,,0000,0000,0000,,And there are many different
Dialogue: 0,0:02:11.00,0:02:12.72,Default,,0000,0000,0000,,kinds of algorithms that are available in
Dialogue: 0,0:02:12.72,0:02:14.52,Default,,0000,0000,0000,,machine learning under the three primary
Dialogue: 0,0:02:14.52,0:02:16.44,Default,,0000,0000,0000,,categories of supervised learning,
Dialogue: 0,0:02:16.44,0:02:18.92,Default,,0000,0000,0000,,unsupervised learning, and reinforcement
Dialogue: 0,0:02:18.92,0:02:21.44,Default,,0000,0000,0000,,learning. And here we can see some of the
Dialogue: 0,0:02:21.44,0:02:23.56,Default,,0000,0000,0000,,different kinds of algorithms and their
Dialogue: 0,0:02:23.56,0:02:27.48,Default,,0000,0000,0000,,use cases in various areas in
Dialogue: 0,0:02:27.48,0:02:29.68,Default,,0000,0000,0000,,industry. So we have various domain use
Dialogue: 0,0:02:29.68,0:02:30.48,Default,,0000,0000,0000,,cases
Dialogue: 0,0:02:30.48,0:02:31.80,Default,,0000,0000,0000,,for all these different kind of
Dialogue: 0,0:02:31.80,0:02:33.84,Default,,0000,0000,0000,,algorithms, and we can see that different
Dialogue: 0,0:02:33.84,0:02:38.12,Default,,0000,0000,0000,,algorithms are fitted for different use cases.
Dialogue: 0,0:02:38.12,0:02:41.00,Default,,0000,0000,0000,,Deep learning is an advanced form
Dialogue: 0,0:02:41.00,0:02:42.40,Default,,0000,0000,0000,,of machine learning that's based on
Dialogue: 0,0:02:42.40,0:02:44.28,Default,,0000,0000,0000,,something called an artificial neural
Dialogue: 0,0:02:44.28,0:02:46.32,Default,,0000,0000,0000,,network or ANN for short, and this
Dialogue: 0,0:02:46.32,0:02:47.84,Default,,0000,0000,0000,,essentially simulates the structure of
Dialogue: 0,0:02:47.84,0:02:49.52,Default,,0000,0000,0000,,the human brain whereby neurons
Dialogue: 0,0:02:49.52,0:02:51.36,Default,,0000,0000,0000,,interconnect and work together to
Dialogue: 0,0:02:51.36,0:02:54.96,Default,,0000,0000,0000,,process and learn new information. So DL
Dialogue: 0,0:02:54.96,0:02:57.24,Default,,0000,0000,0000,,is the foundational technology for most
Dialogue: 0,0:02:57.24,0:02:59.36,Default,,0000,0000,0000,,of the popular AI tools that you
Dialogue: 0,0:02:59.36,0:03:01.40,Default,,0000,0000,0000,,probably have heard of today. So I'm sure
Dialogue: 0,0:03:01.40,0:03:03.20,Default,,0000,0000,0000,,you have heard of ChatGPT if you haven't
Dialogue: 0,0:03:03.20,0:03:05.36,Default,,0000,0000,0000,,been living in a cave for the past 2
Dialogue: 0,0:03:05.36,0:03:08.28,Default,,0000,0000,0000,,years. And yeah, so ChatGPT is an example
Dialogue: 0,0:03:08.28,0:03:10.12,Default,,0000,0000,0000,,of what we call a large language model
Dialogue: 0,0:03:10.12,0:03:11.60,Default,,0000,0000,0000,,and that's based on this technology
Dialogue: 0,0:03:11.60,0:03:14.88,Default,,0000,0000,0000,,called deep learning. Also, all the modern
Dialogue: 0,0:03:14.88,0:03:17.44,Default,,0000,0000,0000,,computer vision applications where a
Dialogue: 0,0:03:17.44,0:03:20.04,Default,,0000,0000,0000,,computer program can classify images or
Dialogue: 0,0:03:20.04,0:03:23.24,Default,,0000,0000,0000,,detect images or recognize images on
Dialogue: 0,0:03:23.24,0:03:25.28,Default,,0000,0000,0000,,its own, okay, we call this computer
Dialogue: 0,0:03:25.28,0:03:27.76,Default,,0000,0000,0000,,vision applications. They also use
Dialogue: 0,0:03:27.76,0:03:29.52,Default,,0000,0000,0000,,this particular form of machine learning
Dialogue: 0,0:03:29.52,0:03:31.56,Default,,0000,0000,0000,,called deep learning, right? So this is a
Dialogue: 0,0:03:31.56,0:03:33.64,Default,,0000,0000,0000,,example of an artificial neural network.
Dialogue: 0,0:03:33.64,0:03:35.20,Default,,0000,0000,0000,,For example, here I have an image of a
Dialogue: 0,0:03:35.20,0:03:37.16,Default,,0000,0000,0000,,bird that's fed into this artificial
Dialogue: 0,0:03:37.16,0:03:39.56,Default,,0000,0000,0000,,neural network, and output from this
Dialogue: 0,0:03:39.56,0:03:41.24,Default,,0000,0000,0000,,artificial neural network is a
Dialogue: 0,0:03:41.24,0:03:43.96,Default,,0000,0000,0000,,classification of this image into one of
Dialogue: 0,0:03:43.96,0:03:46.40,Default,,0000,0000,0000,,these three potential categories. So in
Dialogue: 0,0:03:46.40,0:03:49.08,Default,,0000,0000,0000,,this case, if the ANN has been trained
Dialogue: 0,0:03:49.08,0:03:51.80,Default,,0000,0000,0000,,properly, we fit in this image, this
Dialogue: 0,0:03:51.80,0:03:54.08,Default,,0000,0000,0000,,ANN should correctly classify this image
Dialogue: 0,0:03:54.08,0:03:56.88,Default,,0000,0000,0000,,as a bird, right? So this is a image
Dialogue: 0,0:03:56.88,0:03:58.96,Default,,0000,0000,0000,,classification problem which is a
Dialogue: 0,0:03:58.96,0:04:01.08,Default,,0000,0000,0000,,classic use case for an artificial
Dialogue: 0,0:04:01.08,0:04:03.93,Default,,0000,0000,0000,,neural network in the field of computer
Dialogue: 0,0:04:03.93,0:04:07.88,Default,,0000,0000,0000,,vision. And just like in the case of
Dialogue: 0,0:04:07.88,0:04:09.40,Default,,0000,0000,0000,,machine learning, there are a variety of
Dialogue: 0,0:04:09.40,0:04:11.64,Default,,0000,0000,0000,,algorithms that are available for
Dialogue: 0,0:04:11.64,0:04:13.60,Default,,0000,0000,0000,,deep learning under the category of
Dialogue: 0,0:04:13.60,0:04:15.00,Default,,0000,0000,0000,,supervised learning and also
Dialogue: 0,0:04:15.00,0:04:16.84,Default,,0000,0000,0000,,unsupervised learning.
Dialogue: 0,0:04:16.84,0:04:19.20,Default,,0000,0000,0000,,All right, so this is how we can
Dialogue: 0,0:04:19.20,0:04:20.84,Default,,0000,0000,0000,,kind of categorize this. You can think of
Dialogue: 0,0:04:20.84,0:04:23.88,Default,,0000,0000,0000,,AI is a general area of smart systems
Dialogue: 0,0:04:23.88,0:04:26.56,Default,,0000,0000,0000,,and machine. Machine learning is
Dialogue: 0,0:04:26.56,0:04:29.36,Default,,0000,0000,0000,,basically apply AI and deep learning
Dialogue: 0,0:04:29.36,0:04:29.82,Default,,0000,0000,0000,,is a
Dialogue: 0,0:04:29.82,0:04:32.56,Default,,0000,0000,0000,,subspecialization of machine learning
Dialogue: 0,0:04:32.56,0:04:35.00,Default,,0000,0000,0000,,using a particular architecture called
Dialogue: 0,0:04:35.00,0:04:38.76,Default,,0000,0000,0000,,an artificial neural network.
Dialogue: 0,0:04:38.76,0:04:42.16,Default,,0000,0000,0000,,And generative AI, so if you talk
Dialogue: 0,0:04:42.16,0:04:45.28,Default,,0000,0000,0000,,about ChatGPT, okay, Google Gemini,
Dialogue: 0,0:04:45.28,0:04:47.64,Default,,0000,0000,0000,,Microsoft Copilot, okay, all these
Dialogue: 0,0:04:47.64,0:04:49.60,Default,,0000,0000,0000,,examples of generative AI, they are
Dialogue: 0,0:04:49.60,0:04:51.60,Default,,0000,0000,0000,,basically large language models, and they
Dialogue: 0,0:04:51.60,0:04:53.88,Default,,0000,0000,0000,,are a further subcategory within the
Dialogue: 0,0:04:53.88,0:04:55.17,Default,,0000,0000,0000,,area of deep
Dialogue: 0,0:04:55.17,0:04:57.76,Default,,0000,0000,0000,,learning. And there are many applications
Dialogue: 0,0:04:57.76,0:04:59.40,Default,,0000,0000,0000,,of machine learning in industry right
Dialogue: 0,0:04:59.40,0:05:01.76,Default,,0000,0000,0000,,now, so pick which particular industry
Dialogue: 0,0:05:01.76,0:05:03.68,Default,,0000,0000,0000,,are you involved in, and these are all the
Dialogue: 0,0:05:03.68,0:05:05.06,Default,,0000,0000,0000,,specific areas of
Dialogue: 0,0:05:05.06,0:05:09.96,Default,,0000,0000,0000,,applications, right? So probably, I'm
Dialogue: 0,0:05:09.96,0:05:11.68,Default,,0000,0000,0000,,going to guess the vast majority of you
Dialogue: 0,0:05:11.68,0:05:12.88,Default,,0000,0000,0000,,who are watching this video, you're
Dialogue: 0,0:05:12.88,0:05:14.36,Default,,0000,0000,0000,,probably coming from the manufacturing
Dialogue: 0,0:05:14.36,0:05:16.64,Default,,0000,0000,0000,,industry, and so in the manufacturing
Dialogue: 0,0:05:16.64,0:05:18.48,Default,,0000,0000,0000,,industry some of the standard use cases
Dialogue: 0,0:05:18.48,0:05:20.04,Default,,0000,0000,0000,,for machine learning and deep learning
Dialogue: 0,0:05:20.04,0:05:23.08,Default,,0000,0000,0000,,are predicting potential problems, okay?
Dialogue: 0,0:05:23.08,0:05:25.32,Default,,0000,0000,0000,,So sometimes you call this predictive
Dialogue: 0,0:05:25.32,0:05:27.16,Default,,0000,0000,0000,,maintenance where you want to predict
Dialogue: 0,0:05:27.16,0:05:28.80,Default,,0000,0000,0000,,when a problem is going to happen and
Dialogue: 0,0:05:28.80,0:05:30.40,Default,,0000,0000,0000,,then kind of address it before it
Dialogue: 0,0:05:30.40,0:05:32.76,Default,,0000,0000,0000,,happens. And then monitoring systems,
Dialogue: 0,0:05:32.76,0:05:35.20,Default,,0000,0000,0000,,automating your manufacturing assembly
Dialogue: 0,0:05:35.20,0:05:37.88,Default,,0000,0000,0000,,line or production line, okay, smart
Dialogue: 0,0:05:37.88,0:05:40.12,Default,,0000,0000,0000,,scheduling, and detecting anomaly on your
Dialogue: 0,0:05:40.12,0:05:41.48,Default,,0000,0000,0000,,production line.
Dialogue: 0,0:05:42.39,0:05:44.16,Default,,0000,0000,0000,,Okay, so let's talk about the use
Dialogue: 0,0:05:44.16,0:05:45.68,Default,,0000,0000,0000,,case here which is predictive
Dialogue: 0,0:05:45.68,0:05:49.28,Default,,0000,0000,0000,,maintenance, right? So what is predictive
Dialogue: 0,0:05:49.28,0:05:51.72,Default,,0000,0000,0000,,maintenance? Well predictive maintenance,
Dialogue: 0,0:05:51.72,0:05:53.20,Default,,0000,0000,0000,,here's the long definition, is a
Dialogue: 0,0:05:53.20,0:05:54.64,Default,,0000,0000,0000,,equipment maintenance strategy that
Dialogue: 0,0:05:54.64,0:05:56.28,Default,,0000,0000,0000,,relies on real-time monitoring of
Dialogue: 0,0:05:56.28,0:05:58.36,Default,,0000,0000,0000,,equipment conditions and data to predict
Dialogue: 0,0:05:58.36,0:06:00.28,Default,,0000,0000,0000,,equipment failures in advance.
Dialogue: 0,0:06:00.28,0:06:02.68,Default,,0000,0000,0000,,And this uses advanced data models,
Dialogue: 0,0:06:02.68,0:06:05.24,Default,,0000,0000,0000,,analytics, and machine learning whereby
Dialogue: 0,0:06:05.24,0:06:07.48,Default,,0000,0000,0000,,we can reliably assess when failures are
Dialogue: 0,0:06:07.48,0:06:09.20,Default,,0000,0000,0000,,more likely to occur, including which
Dialogue: 0,0:06:09.20,0:06:11.12,Default,,0000,0000,0000,,components are more likely to be
Dialogue: 0,0:06:11.12,0:06:13.56,Default,,0000,0000,0000,,affected on your production or assembly
Dialogue: 0,0:06:13.56,0:06:16.60,Default,,0000,0000,0000,,line. So where does predictive
Dialogue: 0,0:06:16.60,0:06:18.76,Default,,0000,0000,0000,,maintenance fit into the overall scheme
Dialogue: 0,0:06:18.76,0:06:20.76,Default,,0000,0000,0000,,of things, right? So let's talk about the
Dialogue: 0,0:06:20.76,0:06:23.04,Default,,0000,0000,0000,,kind of standard way that, you know,
Dialogue: 0,0:06:23.04,0:06:25.52,Default,,0000,0000,0000,,factories or production
Dialogue: 0,0:06:25.52,0:06:27.68,Default,,0000,0000,0000,,lines, assembly lines in factories tend
Dialogue: 0,0:06:27.68,0:06:31.08,Default,,0000,0000,0000,,to handle maintenance issues say
Dialogue: 0,0:06:31.08,0:06:33.12,Default,,0000,0000,0000,,10 or 20 years ago, right? So what you
Dialogue: 0,0:06:33.12,0:06:34.52,Default,,0000,0000,0000,,have is the, what you would probably
Dialogue: 0,0:06:34.52,0:06:36.40,Default,,0000,0000,0000,,start off is the most basic mode
Dialogue: 0,0:06:36.40,0:06:38.24,Default,,0000,0000,0000,,which is reactive maintenance. So you
Dialogue: 0,0:06:38.24,0:06:40.68,Default,,0000,0000,0000,,just wait until your machine breaks down
Dialogue: 0,0:06:40.68,0:06:43.04,Default,,0000,0000,0000,,and then you repair, right? The simplest,
Dialogue: 0,0:06:43.04,0:06:44.72,Default,,0000,0000,0000,,but, of course, I'm sure if you have worked on a
Dialogue: 0,0:06:44.72,0:06:46.72,Default,,0000,0000,0000,,production line for any period of time,
Dialogue: 0,0:06:46.72,0:06:48.88,Default,,0000,0000,0000,,you know that this reactive maintenance
Dialogue: 0,0:06:48.88,0:06:50.76,Default,,0000,0000,0000,,can give you a whole bunch of headaches
Dialogue: 0,0:06:50.76,0:06:52.16,Default,,0000,0000,0000,,especially if the machine breaks down
Dialogue: 0,0:06:52.16,0:06:54.12,Default,,0000,0000,0000,,just before a critical delivery deadline,
Dialogue: 0,0:06:54.12,0:06:55.52,Default,,0000,0000,0000,,right? Then you're going to have a
Dialogue: 0,0:06:55.52,0:06:56.80,Default,,0000,0000,0000,,backlog of orders and you're going to
Dialogue: 0,0:06:56.80,0:06:59.16,Default,,0000,0000,0000,,run to a lot of problems. Okay, so we move on
Dialogue: 0,0:06:59.16,0:07:00.88,Default,,0000,0000,0000,,to preventive maintenance which is
Dialogue: 0,0:07:00.88,0:07:03.84,Default,,0000,0000,0000,,you regularly schedule a maintenance of
Dialogue: 0,0:07:03.84,0:07:07.00,Default,,0000,0000,0000,,your production machines to reduce
Dialogue: 0,0:07:07.00,0:07:08.80,Default,,0000,0000,0000,,the failure rate. So you might do
Dialogue: 0,0:07:08.80,0:07:10.52,Default,,0000,0000,0000,,maintenance once every month, once every
Dialogue: 0,0:07:10.52,0:07:13.12,Default,,0000,0000,0000,,two weeks, whatever. Okay, this is great,
Dialogue: 0,0:07:13.12,0:07:15.24,Default,,0000,0000,0000,,but the problem, of course, then is well
Dialogue: 0,0:07:15.24,0:07:16.20,Default,,0000,0000,0000,,sometimes you're doing too much
Dialogue: 0,0:07:16.20,0:07:18.40,Default,,0000,0000,0000,,maintenance, it's not really necessary,
Dialogue: 0,0:07:18.40,0:07:20.64,Default,,0000,0000,0000,,and it still doesn't totally prevent
Dialogue: 0,0:07:20.64,0:07:23.24,Default,,0000,0000,0000,,this, you know, a failure of the
Dialogue: 0,0:07:23.24,0:07:25.64,Default,,0000,0000,0000,,machine that occurs outside of your planned
Dialogue: 0,0:07:25.64,0:07:28.68,Default,,0000,0000,0000,,maintenance, right? So a bit of an
Dialogue: 0,0:07:28.68,0:07:31.16,Default,,0000,0000,0000,,improvement, but not that much better.
Dialogue: 0,0:07:31.16,0:07:33.28,Default,,0000,0000,0000,,And then, these last two categories is
Dialogue: 0,0:07:33.28,0:07:34.68,Default,,0000,0000,0000,,where we bring in AI and machine
Dialogue: 0,0:07:34.68,0:07:36.76,Default,,0000,0000,0000,,learning. So with machine learning, we're
Dialogue: 0,0:07:36.76,0:07:39.28,Default,,0000,0000,0000,,going to use sensors to do real-time
Dialogue: 0,0:07:39.28,0:07:41.76,Default,,0000,0000,0000,,monitoring of the data, and then using
Dialogue: 0,0:07:41.76,0:07:43.32,Default,,0000,0000,0000,,that data we're going to build a machine
Dialogue: 0,0:07:43.32,0:07:46.48,Default,,0000,0000,0000,,learning model which helps us to predict,
Dialogue: 0,0:07:46.48,0:07:50.00,Default,,0000,0000,0000,,with a reasonable level of accuracy, when
Dialogue: 0,0:07:50.00,0:07:52.52,Default,,0000,0000,0000,,the next failure is going to happen on
Dialogue: 0,0:07:52.52,0:07:54.44,Default,,0000,0000,0000,,your assembly or production line on a
Dialogue: 0,0:07:54.44,0:07:57.44,Default,,0000,0000,0000,,specific component or specific machine,
Dialogue: 0,0:07:57.44,0:07:59.52,Default,,0000,0000,0000,,right? So you just want to be predict to
Dialogue: 0,0:07:59.52,0:08:01.96,Default,,0000,0000,0000,,a high level of accuracy like maybe
Dialogue: 0,0:08:01.96,0:08:04.44,Default,,0000,0000,0000,,to the specific day, even the specific
Dialogue: 0,0:08:04.44,0:08:06.40,Default,,0000,0000,0000,,hour, or even minute itself when you
Dialogue: 0,0:08:06.40,0:08:08.36,Default,,0000,0000,0000,,expect that particular product to fail
Dialogue: 0,0:08:08.36,0:08:10.96,Default,,0000,0000,0000,,or the particular machine to fail. All
Dialogue: 0,0:08:10.96,0:08:12.64,Default,,0000,0000,0000,,right, so these are the advantages of
Dialogue: 0,0:08:12.64,0:08:14.88,Default,,0000,0000,0000,,predictive maintenance. It minimizes
Dialogue: 0,0:08:14.88,0:08:16.72,Default,,0000,0000,0000,,the occurrence of unscheduled downtime, it
Dialogue: 0,0:08:16.72,0:08:18.08,Default,,0000,0000,0000,,gives you a real-time overview of your
Dialogue: 0,0:08:18.08,0:08:19.92,Default,,0000,0000,0000,,current condition of assets, ensures
Dialogue: 0,0:08:19.92,0:08:22.68,Default,,0000,0000,0000,,minimal disruptions to productivity,
Dialogue: 0,0:08:22.68,0:08:24.72,Default,,0000,0000,0000,,optimizes time you spend on maintenance work,
Dialogue: 0,0:08:24.72,0:08:26.64,Default,,0000,0000,0000,,optimizes the use of spare parts, and so
Dialogue: 0,0:08:26.64,0:08:28.28,Default,,0000,0000,0000,,on. And of course there are some
Dialogue: 0,0:08:28.28,0:08:30.64,Default,,0000,0000,0000,,disadvantages, which is the
Dialogue: 0,0:08:30.64,0:08:32.56,Default,,0000,0000,0000,,primary one, you need a specialized set
Dialogue: 0,0:08:32.56,0:08:35.52,Default,,0000,0000,0000,,of skills among your engineers to
Dialogue: 0,0:08:35.52,0:08:37.72,Default,,0000,0000,0000,,understand and create machine learning
Dialogue: 0,0:08:37.72,0:08:40.60,Default,,0000,0000,0000,,models that can work on the real-time
Dialogue: 0,0:08:40.60,0:08:43.56,Default,,0000,0000,0000,,data that you're getting. Okay, so we're
Dialogue: 0,0:08:43.56,0:08:45.00,Default,,0000,0000,0000,,going to take a look at some real life
Dialogue: 0,0:08:45.00,0:08:47.20,Default,,0000,0000,0000,,use cases. So these are a bunch of links
Dialogue: 0,0:08:47.20,0:08:48.72,Default,,0000,0000,0000,,here, so if you navigate to these links
Dialogue: 0,0:08:48.72,0:08:50.12,Default,,0000,0000,0000,,here, you'll be able to get a look at
Dialogue: 0,0:08:50.12,0:08:54.36,Default,,0000,0000,0000,,some real life use cases of machine
Dialogue: 0,0:08:54.36,0:08:57.64,Default,,0000,0000,0000,,learning in predictive maintenance. So
Dialogue: 0,0:08:57.64,0:09:00.96,Default,,0000,0000,0000,,the IBM website, okay, gives you a look at
Dialogue: 0,0:09:00.96,0:09:04.88,Default,,0000,0000,0000,,a bunch of five use cases, so you can
Dialogue: 0,0:09:04.88,0:09:06.52,Default,,0000,0000,0000,,click on these links and follow up with
Dialogue: 0,0:09:06.52,0:09:08.28,Default,,0000,0000,0000,,them if you want to read more. Okay, this
Dialogue: 0,0:09:08.28,0:09:11.48,Default,,0000,0000,0000,,is waste management, manufacturing, okay,
Dialogue: 0,0:09:11.48,0:09:14.76,Default,,0000,0000,0000,,building services, and renewable energy,
Dialogue: 0,0:09:14.76,0:09:16.88,Default,,0000,0000,0000,,and also mining, right? So these are all
Dialogue: 0,0:09:16.88,0:09:18.28,Default,,0000,0000,0000,,use cases, if you want to know more about
Dialogue: 0,0:09:18.28,0:09:20.48,Default,,0000,0000,0000,,them, you can read up and follow them
Dialogue: 0,0:09:20.48,0:09:23.60,Default,,0000,0000,0000,,from this website. And this website
Dialogue: 0,0:09:23.60,0:09:25.76,Default,,0000,0000,0000,,gives, this is a pretty good website. I
Dialogue: 0,0:09:25.76,0:09:27.72,Default,,0000,0000,0000,,would really encourage you to just look
Dialogue: 0,0:09:27.72,0:09:28.88,Default,,0000,0000,0000,,through this if you're interested in
Dialogue: 0,0:09:28.88,0:09:31.16,Default,,0000,0000,0000,,predictive maintenance. So here, it tells
Dialogue: 0,0:09:31.16,0:09:34.28,Default,,0000,0000,0000,,you about, you know, an industry survey of
Dialogue: 0,0:09:34.28,0:09:36.36,Default,,0000,0000,0000,,predictive maintenance. We can see that a
Dialogue: 0,0:09:36.36,0:09:38.20,Default,,0000,0000,0000,,large portion of the industry,
Dialogue: 0,0:09:38.20,0:09:39.68,Default,,0000,0000,0000,,manufacturing industry agreed that
Dialogue: 0,0:09:39.68,0:09:41.36,Default,,0000,0000,0000,,predictive maintenance is a real need to
Dialogue: 0,0:09:41.36,0:09:43.96,Default,,0000,0000,0000,,stay competitive and predictive
Dialogue: 0,0:09:43.96,0:09:45.24,Default,,0000,0000,0000,,maintenance is essential for
Dialogue: 0,0:09:45.24,0:09:46.72,Default,,0000,0000,0000,,manufacturing industry and will gain
Dialogue: 0,0:09:46.72,0:09:48.28,Default,,0000,0000,0000,,additional strength in the future. So
Dialogue: 0,0:09:48.28,0:09:50.20,Default,,0000,0000,0000,,this is a survey that was done quite
Dialogue: 0,0:09:50.20,0:09:52.04,Default,,0000,0000,0000,,some time ago and this was the results
Dialogue: 0,0:09:52.04,0:09:53.88,Default,,0000,0000,0000,,that we got back. So we can see the vast
Dialogue: 0,0:09:53.88,0:09:55.72,Default,,0000,0000,0000,,majority of key industry players in the
Dialogue: 0,0:09:55.72,0:09:57.64,Default,,0000,0000,0000,,manufacturing sector, they consider
Dialogue: 0,0:09:57.64,0:09:59.00,Default,,0000,0000,0000,,predictive maintenance to be a very
Dialogue: 0,0:09:59.00,0:09:59.84,Default,,0000,0000,0000,,important
Dialogue: 0,0:09:59.84,0:10:01.60,Default,,0000,0000,0000,,activity that they want to
Dialogue: 0,0:10:01.60,0:10:04.52,Default,,0000,0000,0000,,incorporate into their workflow, right?
Dialogue: 0,0:10:04.52,0:10:07.72,Default,,0000,0000,0000,,And we can see here the kind of ROI that
Dialogue: 0,0:10:07.72,0:10:10.68,Default,,0000,0000,0000,,we expect on investment in predictive
Dialogue: 0,0:10:10.68,0:10:13.40,Default,,0000,0000,0000,,maintenance, so 45% reduction in downtime,
Dialogue: 0,0:10:13.40,0:10:17.12,Default,,0000,0000,0000,,25% growth in productivity, 75% fault
Dialogue: 0,0:10:17.12,0:10:19.48,Default,,0000,0000,0000,,elimination, 30% reduction in maintenance
Dialogue: 0,0:10:19.48,0:10:22.64,Default,,0000,0000,0000,,cost, okay? And best of all, if you really
Dialogue: 0,0:10:22.64,0:10:25.04,Default,,0000,0000,0000,,want to kind of take a look at examples,
Dialogue: 0,0:10:25.04,0:10:26.68,Default,,0000,0000,0000,,all right, so there are all these
Dialogue: 0,0:10:26.68,0:10:28.12,Default,,0000,0000,0000,,different companies that have
Dialogue: 0,0:10:28.12,0:10:30.16,Default,,0000,0000,0000,,significantly invested in predictive
Dialogue: 0,0:10:30.16,0:10:31.64,Default,,0000,0000,0000,,maintenance technology in their
Dialogue: 0,0:10:31.64,0:10:34.24,Default,,0000,0000,0000,,manufacturing processes. So PepsiCo, we
Dialogue: 0,0:10:34.24,0:10:38.96,Default,,0000,0000,0000,,have got Frito-Lay, General Motors, Mondi, Ecoplant,
Dialogue: 0,0:10:38.96,0:10:40.96,Default,,0000,0000,0000,,all right? So you can jump over here
Dialogue: 0,0:10:40.96,0:10:42.96,Default,,0000,0000,0000,,and take a look at some of these
Dialogue: 0,0:10:42.96,0:10:46.04,Default,,0000,0000,0000,,use cases. Let me perhaps, let me try and
Dialogue: 0,0:10:46.04,0:10:48.08,Default,,0000,0000,0000,,open this up, for example, Mondi, right? You
Dialogue: 0,0:10:48.08,0:10:51.88,Default,,0000,0000,0000,,can see Mondi has impl- oops. Mondi has used
Dialogue: 0,0:10:51.88,0:10:53.72,Default,,0000,0000,0000,,this particular piece of software
Dialogue: 0,0:10:53.72,0:10:55.84,Default,,0000,0000,0000,,called MATLAB, all right, or MathWorks
Dialogue: 0,0:10:55.84,0:10:59.76,Default,,0000,0000,0000,,sorry, to do predictive maintenance
Dialogue: 0,0:10:59.76,0:11:01.92,Default,,0000,0000,0000,,for their manufacturing processes using
Dialogue: 0,0:11:01.92,0:11:05.08,Default,,0000,0000,0000,,machine learning. And we can talk, you can
Dialogue: 0,0:11:05.08,0:11:07.68,Default,,0000,0000,0000,,study how they have used it, all right,
Dialogue: 0,0:11:07.68,0:11:09.00,Default,,0000,0000,0000,,and how it works, what was their
Dialogue: 0,0:11:09.00,0:11:10.92,Default,,0000,0000,0000,,challenge, all right, the problems they
Dialogue: 0,0:11:10.92,0:11:12.64,Default,,0000,0000,0000,,were facing, the solution that they use
Dialogue: 0,0:11:12.64,0:11:14.56,Default,,0000,0000,0000,,using this MathWorks Consulting piece of
Dialogue: 0,0:11:14.56,0:11:17.16,Default,,0000,0000,0000,,software, and data that they collected in
Dialogue: 0,0:11:17.16,0:11:20.40,Default,,0000,0000,0000,,a MATLAB database, all right, sorry
Dialogue: 0,0:11:20.40,0:11:23.64,Default,,0000,0000,0000,,in a Oracle database.
Dialogue: 0,0:11:23.64,0:11:26.40,Default,,0000,0000,0000,,So using MathWorks from MATLAB, all
Dialogue: 0,0:11:26.40,0:11:27.96,Default,,0000,0000,0000,,right, they were able to create a deep
Dialogue: 0,0:11:27.96,0:11:30.56,Default,,0000,0000,0000,,learning model to, you know, to
Dialogue: 0,0:11:30.56,0:11:32.84,Default,,0000,0000,0000,,solve this particular issue for their
Dialogue: 0,0:11:32.84,0:11:35.72,Default,,0000,0000,0000,,domain. So if you're interested, please, I
Dialogue: 0,0:11:35.72,0:11:37.64,Default,,0000,0000,0000,,strongly encourage you to read up on all
Dialogue: 0,0:11:37.64,0:11:40.44,Default,,0000,0000,0000,,these real life customer stories with
Dialogue: 0,0:11:40.44,0:11:43.40,Default,,0000,0000,0000,,showcase use cases for predictive
Dialogue: 0,0:11:43.40,0:11:48.24,Default,,0000,0000,0000,,maintenance. Okay, so that's it for
Dialogue: 0,0:11:48.24,0:11:52.20,Default,,0000,0000,0000,,real life use cases for predictive maintenance.
Dialogue: 0,0:11:53.82,0:11:56.60,Default,,0000,0000,0000,,Now in this topic, I'm
Dialogue: 0,0:11:56.60,0:11:58.00,Default,,0000,0000,0000,,going to talk about machine learning
Dialogue: 0,0:11:58.00,0:12:00.04,Default,,0000,0000,0000,,basics, so what is actually involved
Dialogue: 0,0:12:00.04,0:12:01.48,Default,,0000,0000,0000,,in machine learning, and I'm going to
Dialogue: 0,0:12:01.48,0:12:03.84,Default,,0000,0000,0000,,give a very quick, fast, conceptual, high
Dialogue: 0,0:12:03.84,0:12:05.92,Default,,0000,0000,0000,,level overview of machine learning, all
Dialogue: 0,0:12:05.92,0:12:09.00,Default,,0000,0000,0000,,right? So there are several categories of
Dialogue: 0,0:12:09.00,0:12:10.96,Default,,0000,0000,0000,,machine learning, supervised, unsupervised,
Dialogue: 0,0:12:10.96,0:12:13.00,Default,,0000,0000,0000,,semi-supervised, reinforcement, and deep
Dialogue: 0,0:12:13.00,0:12:15.88,Default,,0000,0000,0000,,learning, okay? And let's talk about the
Dialogue: 0,0:12:15.88,0:12:19.36,Default,,0000,0000,0000,,most common and widely used category of
Dialogue: 0,0:12:19.36,0:12:20.56,Default,,0000,0000,0000,,machine learning which is called
Dialogue: 0,0:12:20.56,0:12:25.04,Default,,0000,0000,0000,,supervised learning. So the particular use
Dialogue: 0,0:12:25.04,0:12:26.28,Default,,0000,0000,0000,,case here that I'm going to be
Dialogue: 0,0:12:26.28,0:12:28.56,Default,,0000,0000,0000,,discussing, predictive maintenance, it's
Dialogue: 0,0:12:28.56,0:12:31.32,Default,,0000,0000,0000,,basically a form of supervised learning.
Dialogue: 0,0:12:31.32,0:12:33.48,Default,,0000,0000,0000,,So how does supervised learning work?
Dialogue: 0,0:12:33.48,0:12:35.20,Default,,0000,0000,0000,,Well in supervised learning, you're going
Dialogue: 0,0:12:35.20,0:12:37.24,Default,,0000,0000,0000,,to create a machine learning model by
Dialogue: 0,0:12:37.24,0:12:39.36,Default,,0000,0000,0000,,providing what is called a labelled data
Dialogue: 0,0:12:39.36,0:12:41.68,Default,,0000,0000,0000,,set as a input to a machine learning
Dialogue: 0,0:12:41.68,0:12:44.68,Default,,0000,0000,0000,,program or algorithm. And this dataset
Dialogue: 0,0:12:44.68,0:12:46.44,Default,,0000,0000,0000,,is going to contain what is called an
Dialogue: 0,0:12:46.44,0:12:48.76,Default,,0000,0000,0000,,independent or feature variables, all
Dialogue: 0,0:12:48.76,0:12:51.24,Default,,0000,0000,0000,,right, so this will be a set of variables.
Dialogue: 0,0:12:51.24,0:12:52.96,Default,,0000,0000,0000,,And there will be one dependent or
Dialogue: 0,0:12:52.96,0:12:54.96,Default,,0000,0000,0000,,target variable which we also call the
Dialogue: 0,0:12:54.96,0:12:57.72,Default,,0000,0000,0000,,label, and the idea is that the
Dialogue: 0,0:12:57.72,0:12:59.84,Default,,0000,0000,0000,,independent or the feature variables are
Dialogue: 0,0:12:59.84,0:13:01.60,Default,,0000,0000,0000,,the attributes or properties of your
Dialogue: 0,0:13:01.60,0:13:04.16,Default,,0000,0000,0000,,data set that influence the dependent or
Dialogue: 0,0:13:04.16,0:13:07.76,Default,,0000,0000,0000,,the target variable, okay? So this process
Dialogue: 0,0:13:07.76,0:13:09.12,Default,,0000,0000,0000,,that I've just described is called
Dialogue: 0,0:13:09.12,0:13:11.60,Default,,0000,0000,0000,,training the machine learning model, and
Dialogue: 0,0:13:11.60,0:13:14.28,Default,,0000,0000,0000,,the model is fundamentally a
Dialogue: 0,0:13:14.28,0:13:16.40,Default,,0000,0000,0000,,mathematical function that best
Dialogue: 0,0:13:16.40,0:13:18.40,Default,,0000,0000,0000,,approximates the relationship between
Dialogue: 0,0:13:18.40,0:13:20.64,Default,,0000,0000,0000,,the independent variables and the
Dialogue: 0,0:13:20.64,0:13:22.64,Default,,0000,0000,0000,,dependent variable. All right, so that's
Dialogue: 0,0:13:22.64,0:13:24.48,Default,,0000,0000,0000,,quite a bit of a mouthful, so let's jump
Dialogue: 0,0:13:24.48,0:13:26.32,Default,,0000,0000,0000,,into a diagram that maybe illustrates
Dialogue: 0,0:13:26.32,0:13:27.88,Default,,0000,0000,0000,,this more clearly. So let's say you have
Dialogue: 0,0:13:27.88,0:13:30.00,Default,,0000,0000,0000,,a dataset here, an Excel spreadsheet,
Dialogue: 0,0:13:30.00,0:13:32.16,Default,,0000,0000,0000,,right? And this Excel spreadsheet has a
Dialogue: 0,0:13:32.16,0:13:34.04,Default,,0000,0000,0000,,bunch of columns here and a bunch of
Dialogue: 0,0:13:34.04,0:13:36.80,Default,,0000,0000,0000,,rows, okay? So these rows here represent
Dialogue: 0,0:13:36.80,0:13:39.00,Default,,0000,0000,0000,,observations, or these rows are what
Dialogue: 0,0:13:39.00,0:13:40.96,Default,,0000,0000,0000,,we call observations or samples or data
Dialogue: 0,0:13:40.96,0:13:43.12,Default,,0000,0000,0000,,points in our data set, okay? So let's
Dialogue: 0,0:13:43.12,0:13:46.88,Default,,0000,0000,0000,,assume this data set is gathered by a
Dialogue: 0,0:13:46.88,0:13:49.96,Default,,0000,0000,0000,,marketing manager at a mall, at a retail
Dialogue: 0,0:13:49.96,0:13:52.28,Default,,0000,0000,0000,,mall, all right? So they've got all this
Dialogue: 0,0:13:52.28,0:13:54.92,Default,,0000,0000,0000,,information about the customers who
Dialogue: 0,0:13:54.92,0:13:56.80,Default,,0000,0000,0000,,purchase products at this mall, all right?
Dialogue: 0,0:13:56.80,0:13:58.52,Default,,0000,0000,0000,,So some of the information they've
Dialogue: 0,0:13:58.52,0:14:00.00,Default,,0000,0000,0000,,gotten about the customers are their
Dialogue: 0,0:14:00.00,0:14:01.84,Default,,0000,0000,0000,,gender, their age, their income, and the
Dialogue: 0,0:14:01.84,0:14:03.60,Default,,0000,0000,0000,,number of children. So all this
Dialogue: 0,0:14:03.60,0:14:05.68,Default,,0000,0000,0000,,information about the customers, we call
Dialogue: 0,0:14:05.68,0:14:07.36,Default,,0000,0000,0000,,this the independent or the feature
Dialogue: 0,0:14:07.36,0:14:10.08,Default,,0000,0000,0000,,variables, all right? And based on all
Dialogue: 0,0:14:10.08,0:14:12.76,Default,,0000,0000,0000,,this information about the customer, we
Dialogue: 0,0:14:12.76,0:14:16.20,Default,,0000,0000,0000,,also managed to get some or we record
Dialogue: 0,0:14:16.20,0:14:17.60,Default,,0000,0000,0000,,the information about how much the
Dialogue: 0,0:14:17.60,0:14:20.48,Default,,0000,0000,0000,,customer spends, all right? So this
Dialogue: 0,0:14:20.48,0:14:22.08,Default,,0000,0000,0000,,information or these numbers here, we call
Dialogue: 0,0:14:22.08,0:14:23.84,Default,,0000,0000,0000,,this the target variable or the
Dialogue: 0,0:14:23.84,0:14:26.60,Default,,0000,0000,0000,,dependent variable, right? So on the
Dialogue: 0,0:14:26.60,0:14:29.52,Default,,0000,0000,0000,,single row, the data point, one single sample, one
Dialogue: 0,0:14:29.52,0:14:32.56,Default,,0000,0000,0000,,single data point, contains all the data
Dialogue: 0,0:14:32.56,0:14:35.04,Default,,0000,0000,0000,,for the feature variables and one single
Dialogue: 0,0:14:35.04,0:14:37.80,Default,,0000,0000,0000,,value for the label or the target
Dialogue: 0,0:14:37.80,0:14:41.20,Default,,0000,0000,0000,,variable, okay? And the primary purpose of
Dialogue: 0,0:14:41.20,0:14:43.24,Default,,0000,0000,0000,,the machine learning model is to create
Dialogue: 0,0:14:43.24,0:14:45.52,Default,,0000,0000,0000,,a mapping from all your feature
Dialogue: 0,0:14:45.52,0:14:48.16,Default,,0000,0000,0000,,variables to your target variable, so
Dialogue: 0,0:14:48.16,0:14:50.92,Default,,0000,0000,0000,,somehow there's going to be a function,
Dialogue: 0,0:14:50.92,0:14:52.16,Default,,0000,0000,0000,,okay, this will be a mathematical
Dialogue: 0,0:14:52.16,0:14:54.80,Default,,0000,0000,0000,,function that maps all the values of
Dialogue: 0,0:14:54.80,0:14:57.04,Default,,0000,0000,0000,,your feature variable to the value of
Dialogue: 0,0:14:57.04,0:14:59.64,Default,,0000,0000,0000,,your target variable. In other words, this
Dialogue: 0,0:14:59.64,0:15:01.28,Default,,0000,0000,0000,,function represents the relationship
Dialogue: 0,0:15:01.28,0:15:03.36,Default,,0000,0000,0000,,between your feature variables and your
Dialogue: 0,0:15:03.36,0:15:07.08,Default,,0000,0000,0000,,target variable, okay? So this whole thing,
Dialogue: 0,0:15:07.08,0:15:08.56,Default,,0000,0000,0000,,this training process, we call this the
Dialogue: 0,0:15:08.56,0:15:11.32,Default,,0000,0000,0000,,fitting the model. And the target
Dialogue: 0,0:15:11.32,0:15:13.24,Default,,0000,0000,0000,,variable or the label, this thing here,
Dialogue: 0,0:15:13.24,0:15:15.12,Default,,0000,0000,0000,,this column here, or the values here,
Dialogue: 0,0:15:15.12,0:15:17.40,Default,,0000,0000,0000,,these are critical for providing a
Dialogue: 0,0:15:17.40,0:15:19.00,Default,,0000,0000,0000,,context to do the fitting or the
Dialogue: 0,0:15:19.00,0:15:21.16,Default,,0000,0000,0000,,training of the model. And once you've
Dialogue: 0,0:15:21.16,0:15:23.36,Default,,0000,0000,0000,,got a trained and fitted model, you can
Dialogue: 0,0:15:23.36,0:15:25.96,Default,,0000,0000,0000,,then use the model to make an accurate
Dialogue: 0,0:15:25.96,0:15:28.32,Default,,0000,0000,0000,,prediction of target values
Dialogue: 0,0:15:28.32,0:15:30.24,Default,,0000,0000,0000,,corresponding to new feature values that
Dialogue: 0,0:15:30.24,0:15:32.52,Default,,0000,0000,0000,,the model has yet to encounter or yet to
Dialogue: 0,0:15:32.52,0:15:34.76,Default,,0000,0000,0000,,see, and this, as I've already said
Dialogue: 0,0:15:34.76,0:15:36.24,Default,,0000,0000,0000,,earlier, this is called predictive
Dialogue: 0,0:15:36.24,0:15:38.48,Default,,0000,0000,0000,,analytics, okay? So let's see what's
Dialogue: 0,0:15:38.48,0:15:40.12,Default,,0000,0000,0000,,actually happening here, you take your
Dialogue: 0,0:15:40.12,0:15:43.08,Default,,0000,0000,0000,,training data, all right, so this is this
Dialogue: 0,0:15:43.08,0:15:44.88,Default,,0000,0000,0000,,whole bunch of data, this data set here
Dialogue: 0,0:15:44.88,0:15:47.44,Default,,0000,0000,0000,,consisting of a thousand rows of
Dialogue: 0,0:15:47.44,0:15:49.92,Default,,0000,0000,0000,,data, 10,000 rows of data, you take this
Dialogue: 0,0:15:49.92,0:15:52.04,Default,,0000,0000,0000,,entire data set, all right, this entire
Dialogue: 0,0:15:52.04,0:15:54.00,Default,,0000,0000,0000,,data set, you jam it into your machine
Dialogue: 0,0:15:54.00,0:15:56.52,Default,,0000,0000,0000,,learning algorithm, and a couple of hours
Dialogue: 0,0:15:56.52,0:15:58.08,Default,,0000,0000,0000,,later your machine learning algorithm
Dialogue: 0,0:15:58.08,0:16:01.36,Default,,0000,0000,0000,,comes up with a model. And the model is
Dialogue: 0,0:16:01.36,0:16:04.20,Default,,0000,0000,0000,,essentially a function that maps all
Dialogue: 0,0:16:04.20,0:16:05.96,Default,,0000,0000,0000,,your feature variables which is these
Dialogue: 0,0:16:05.96,0:16:08.20,Default,,0000,0000,0000,,four columns here, to your target
Dialogue: 0,0:16:08.20,0:16:10.44,Default,,0000,0000,0000,,variable which is this one single column
Dialogue: 0,0:16:10.44,0:16:14.28,Default,,0000,0000,0000,,here, okay? So once you have the model, you
Dialogue: 0,0:16:14.28,0:16:17.04,Default,,0000,0000,0000,,can put in a new data point. So basically
Dialogue: 0,0:16:17.04,0:16:19.08,Default,,0000,0000,0000,,the new data point represents data about a
Dialogue: 0,0:16:19.08,0:16:20.96,Default,,0000,0000,0000,,new customer, a new customer that you
Dialogue: 0,0:16:20.96,0:16:23.12,Default,,0000,0000,0000,,have never seen before. So let's say
Dialogue: 0,0:16:23.12,0:16:25.08,Default,,0000,0000,0000,,you've already got information about
Dialogue: 0,0:16:25.08,0:16:27.56,Default,,0000,0000,0000,,10,000 customers that have visited this
Dialogue: 0,0:16:27.56,0:16:29.92,Default,,0000,0000,0000,,mall and how much each of these 10,000
Dialogue: 0,0:16:29.92,0:16:31.52,Default,,0000,0000,0000,,customers have spent when they are at this
Dialogue: 0,0:16:31.52,0:16:34.04,Default,,0000,0000,0000,,mall. So now you have a totally new
Dialogue: 0,0:16:34.04,0:16:35.80,Default,,0000,0000,0000,,customer that comes in the mall, this
Dialogue: 0,0:16:35.80,0:16:37.80,Default,,0000,0000,0000,,customer has never come into this mall
Dialogue: 0,0:16:37.80,0:16:39.84,Default,,0000,0000,0000,,before, and what we know about this
Dialogue: 0,0:16:39.84,0:16:42.68,Default,,0000,0000,0000,,customer is that he is a male, the age is
Dialogue: 0,0:16:42.68,0:16:45.20,Default,,0000,0000,0000,,50, the income is 18, and they have nine
Dialogue: 0,0:16:45.20,0:16:48.16,Default,,0000,0000,0000,,children. So now when you take this data
Dialogue: 0,0:16:48.16,0:16:50.52,Default,,0000,0000,0000,,and you pump that into your model, your
Dialogue: 0,0:16:50.52,0:16:52.92,Default,,0000,0000,0000,,model is going to make a prediction, it's
Dialogue: 0,0:16:52.92,0:16:55.72,Default,,0000,0000,0000,,going to say, hey, you know what? Based on
Dialogue: 0,0:16:55.72,0:16:57.28,Default,,0000,0000,0000,,everything that I have been trained before
Dialogue: 0,0:16:57.28,0:16:59.36,Default,,0000,0000,0000,,and based on the model I've developed,
Dialogue: 0,0:16:59.36,0:17:01.96,Default,,0000,0000,0000,,I am going to predict that a customer
Dialogue: 0,0:17:01.96,0:17:04.88,Default,,0000,0000,0000,,that is of a male gender, of the age 50
Dialogue: 0,0:17:04.88,0:17:08.28,Default,,0000,0000,0000,,with the income of 18, and nine children,
Dialogue: 0,0:17:08.28,0:17:12.40,Default,,0000,0000,0000,,that customer is going to spend 25 ringgit
Dialogue: 0,0:17:12.40,0:17:15.84,Default,,0000,0000,0000,,at the mall. And this is it, this is what
Dialogue: 0,0:17:15.84,0:17:18.60,Default,,0000,0000,0000,,you want. Right there, right here,
Dialogue: 0,0:17:18.60,0:17:21.32,Default,,0000,0000,0000,,can you see here? That is the final
Dialogue: 0,0:17:21.32,0:17:23.48,Default,,0000,0000,0000,,output of your machine learning model.
Dialogue: 0,0:17:23.48,0:17:27.36,Default,,0000,0000,0000,,It's going to make a prediction about
Dialogue: 0,0:17:27.36,0:17:29.76,Default,,0000,0000,0000,,something that it has not ever seen
Dialogue: 0,0:17:29.76,0:17:32.92,Default,,0000,0000,0000,,before, okay? That is the core, this is
Dialogue: 0,0:17:32.92,0:17:35.52,Default,,0000,0000,0000,,essentially the core of machine learning.
Dialogue: 0,0:17:35.52,0:17:38.64,Default,,0000,0000,0000,,Predictive analytics, making prediction
Dialogue: 0,0:17:38.64,0:17:40.12,Default,,0000,0000,0000,,about the future
Dialogue: 0,0:17:41.17,0:17:43.80,Default,,0000,0000,0000,,based on a historical data set.
Dialogue: 0,0:17:44.38,0:17:47.44,Default,,0000,0000,0000,,Okay, so there are two areas of
Dialogue: 0,0:17:47.44,0:17:49.48,Default,,0000,0000,0000,,supervised learning, regression and
Dialogue: 0,0:17:49.48,0:17:51.40,Default,,0000,0000,0000,,classification. So regression is used to
Dialogue: 0,0:17:51.40,0:17:53.44,Default,,0000,0000,0000,,predict a numerical target variable, such
Dialogue: 0,0:17:53.44,0:17:55.32,Default,,0000,0000,0000,,as the price of a house or the salary of
Dialogue: 0,0:17:55.32,0:17:57.80,Default,,0000,0000,0000,,an employee, whereas classification is
Dialogue: 0,0:17:57.80,0:17:59.92,Default,,0000,0000,0000,,used to predict a categorical target
Dialogue: 0,0:17:59.92,0:18:03.56,Default,,0000,0000,0000,,variable or class label, okay? So for
Dialogue: 0,0:18:03.56,0:18:05.80,Default,,0000,0000,0000,,classification you can have either
Dialogue: 0,0:18:05.80,0:18:08.68,Default,,0000,0000,0000,,binary or multiclass, so, for example,
Dialogue: 0,0:18:08.68,0:18:11.56,Default,,0000,0000,0000,,binary will be just true or false, zero
Dialogue: 0,0:18:11.56,0:18:14.84,Default,,0000,0000,0000,,or one. So whether your machine is going
Dialogue: 0,0:18:14.84,0:18:17.36,Default,,0000,0000,0000,,to fail or is it not going to fail, right?
Dialogue: 0,0:18:17.36,0:18:19.00,Default,,0000,0000,0000,,So just two classes, two possible,
Dialogue: 0,0:18:19.00,0:18:21.64,Default,,0000,0000,0000,,outcomes, or is the customer going to
Dialogue: 0,0:18:21.64,0:18:23.68,Default,,0000,0000,0000,,make a purchase or is the customer not
Dialogue: 0,0:18:23.68,0:18:26.16,Default,,0000,0000,0000,,going to make a purchase. We call this
Dialogue: 0,0:18:26.16,0:18:28.12,Default,,0000,0000,0000,,binary classification. And then for
Dialogue: 0,0:18:28.12,0:18:29.68,Default,,0000,0000,0000,,multiclass, when there are more than two
Dialogue: 0,0:18:29.68,0:18:32.56,Default,,0000,0000,0000,,classes or types of values. So, for
Dialogue: 0,0:18:32.56,0:18:34.04,Default,,0000,0000,0000,,example, here this would be a
Dialogue: 0,0:18:34.04,0:18:35.76,Default,,0000,0000,0000,,classification problem. So if you have a
Dialogue: 0,0:18:35.76,0:18:37.96,Default,,0000,0000,0000,,data set here, you've got information
Dialogue: 0,0:18:37.96,0:18:39.36,Default,,0000,0000,0000,,about your customers, you've got your
Dialogue: 0,0:18:39.36,0:18:41.16,Default,,0000,0000,0000,,gender of the customer, the age of the
Dialogue: 0,0:18:41.16,0:18:42.92,Default,,0000,0000,0000,,customer, the salary of the customer, and
Dialogue: 0,0:18:42.92,0:18:44.64,Default,,0000,0000,0000,,you also have record about whether the
Dialogue: 0,0:18:44.64,0:18:47.68,Default,,0000,0000,0000,,customer made a purchase or not, okay? So
Dialogue: 0,0:18:47.68,0:18:50.08,Default,,0000,0000,0000,,you can take this data set to train a
Dialogue: 0,0:18:50.08,0:18:52.44,Default,,0000,0000,0000,,classification model, and then the
Dialogue: 0,0:18:52.44,0:18:54.12,Default,,0000,0000,0000,,classification model can then make a
Dialogue: 0,0:18:54.12,0:18:56.32,Default,,0000,0000,0000,,prediction about a new customer, and
Dialogue: 0,0:18:56.32,0:18:58.80,Default,,0000,0000,0000,,they're going to predict zero which
Dialogue: 0,0:18:58.80,0:19:00.48,Default,,0000,0000,0000,,means the customer didn't make a
Dialogue: 0,0:19:00.48,0:19:03.16,Default,,0000,0000,0000,,purchase or one which means the customer
Dialogue: 0,0:19:03.16,0:19:06.32,Default,,0000,0000,0000,,make a purchase, right? And regression,
Dialogue: 0,0:19:06.32,0:19:08.60,Default,,0000,0000,0000,,this is regression, so let's say you want
Dialogue: 0,0:19:08.60,0:19:11.28,Default,,0000,0000,0000,,to predict the wind speed, and you've got
Dialogue: 0,0:19:11.28,0:19:13.80,Default,,0000,0000,0000,,historical data about all these four
Dialogue: 0,0:19:13.80,0:19:16.56,Default,,0000,0000,0000,,other independent variables or feature
Dialogue: 0,0:19:16.56,0:19:18.04,Default,,0000,0000,0000,,variables, so you have recorded
Dialogue: 0,0:19:18.04,0:19:19.64,Default,,0000,0000,0000,,temperature, the pressure, the relative
Dialogue: 0,0:19:19.64,0:19:21.80,Default,,0000,0000,0000,,humidity, and the wind direction for the
Dialogue: 0,0:19:21.80,0:19:24.80,Default,,0000,0000,0000,,past 10 days, 15 days, or whatever, okay? So
Dialogue: 0,0:19:24.80,0:19:26.76,Default,,0000,0000,0000,,now you are going to train your machine
Dialogue: 0,0:19:26.76,0:19:28.72,Default,,0000,0000,0000,,learning model using this data set, and
Dialogue: 0,0:19:28.72,0:19:31.68,Default,,0000,0000,0000,,the target variable column, okay, this
Dialogue: 0,0:19:31.68,0:19:33.76,Default,,0000,0000,0000,,column here, the label is basically a
Dialogue: 0,0:19:33.76,0:19:37.08,Default,,0000,0000,0000,,number, right? So now with this number,
Dialogue: 0,0:19:37.08,0:19:39.60,Default,,0000,0000,0000,,this is a regression model, and so now
Dialogue: 0,0:19:39.60,0:19:41.76,Default,,0000,0000,0000,,you can put in a new data point, so a new
Dialogue: 0,0:19:41.76,0:19:45.08,Default,,0000,0000,0000,,data point means a new set of values for
Dialogue: 0,0:19:45.08,0:19:46.96,Default,,0000,0000,0000,,temperature, pressure, relative humidity,
Dialogue: 0,0:19:46.96,0:19:48.60,Default,,0000,0000,0000,,and wind direction, and your machine
Dialogue: 0,0:19:48.60,0:19:50.68,Default,,0000,0000,0000,,learning model will then predict the
Dialogue: 0,0:19:50.68,0:19:53.64,Default,,0000,0000,0000,,wind speed for that new data point, okay?
Dialogue: 0,0:19:53.64,0:19:57.48,Default,,0000,0000,0000,,So that's a regression model.
Dialogue: 0,0:19:59.16,0:20:02.28,Default,,0000,0000,0000,,All right. So in this particular topic
Dialogue: 0,0:20:02.28,0:20:04.92,Default,,0000,0000,0000,,I'm going to talk about the workflow of
Dialogue: 0,0:20:04.92,0:20:07.96,Default,,0000,0000,0000,,that's involved in machine learning. So
Dialogue: 0,0:20:07.96,0:20:12.64,Default,,0000,0000,0000,,in the previous slides, I talked about
Dialogue: 0,0:20:12.64,0:20:14.60,Default,,0000,0000,0000,,developing the model, all right? But
Dialogue: 0,0:20:14.60,0:20:16.36,Default,,0000,0000,0000,,that's just one part of the entire
Dialogue: 0,0:20:16.36,0:20:19.08,Default,,0000,0000,0000,,workflow. So in real life when you use
Dialogue: 0,0:20:19.08,0:20:20.48,Default,,0000,0000,0000,,machine learning, there's an end-to-end
Dialogue: 0,0:20:20.48,0:20:22.48,Default,,0000,0000,0000,,workflow that's involved. So the first
Dialogue: 0,0:20:22.48,0:20:24.16,Default,,0000,0000,0000,,thing, of course, is you need to get your
Dialogue: 0,0:20:24.16,0:20:26.88,Default,,0000,0000,0000,,data, and then you need to clean your
Dialogue: 0,0:20:26.88,0:20:29.00,Default,,0000,0000,0000,,data, and then you need to explore your
Dialogue: 0,0:20:29.00,0:20:30.80,Default,,0000,0000,0000,,data. You need to see what's going on in
Dialogue: 0,0:20:30.80,0:20:33.28,Default,,0000,0000,0000,,your data set, right? And your data set,
Dialogue: 0,0:20:33.28,0:20:35.72,Default,,0000,0000,0000,,real life data sets are not trivial, they
Dialogue: 0,0:20:35.72,0:20:38.76,Default,,0000,0000,0000,,are hundreds of rows, thousands of rows,
Dialogue: 0,0:20:38.76,0:20:40.64,Default,,0000,0000,0000,,sometimes millions of rows, billions of
Dialogue: 0,0:20:40.64,0:20:43.08,Default,,0000,0000,0000,,rows, we're talking about billions or
Dialogue: 0,0:20:43.08,0:20:45.12,Default,,0000,0000,0000,,millions of data points especially if
Dialogue: 0,0:20:45.12,0:20:47.12,Default,,0000,0000,0000,,you're using an IoT sensor to get data
Dialogue: 0,0:20:47.12,0:20:49.00,Default,,0000,0000,0000,,in real time. So you've got all these
Dialogue: 0,0:20:49.00,0:20:51.32,Default,,0000,0000,0000,,super large data sets, you need to clean
Dialogue: 0,0:20:51.32,0:20:53.40,Default,,0000,0000,0000,,them, and explore them, and then you need
Dialogue: 0,0:20:53.40,0:20:56.36,Default,,0000,0000,0000,,to prepare them into a right format so
Dialogue: 0,0:20:56.36,0:20:59.60,Default,,0000,0000,0000,,that you can put them into the training
Dialogue: 0,0:20:59.60,0:21:01.52,Default,,0000,0000,0000,,process to create your machine learning
Dialogue: 0,0:21:01.52,0:21:04.80,Default,,0000,0000,0000,,model, and then subsequently you check
Dialogue: 0,0:21:04.80,0:21:07.56,Default,,0000,0000,0000,,how good is the model, right? How accurate
Dialogue: 0,0:21:07.56,0:21:10.08,Default,,0000,0000,0000,,is the model in terms of its ability to
Dialogue: 0,0:21:10.08,0:21:12.56,Default,,0000,0000,0000,,generate predictions for the
Dialogue: 0,0:21:12.56,0:21:14.96,Default,,0000,0000,0000,,future, right? How accurate are the
Dialogue: 0,0:21:14.96,0:21:16.68,Default,,0000,0000,0000,,predictions that are coming up from your
Dialogue: 0,0:21:16.68,0:21:18.40,Default,,0000,0000,0000,,machine learning model. So that's
Dialogue: 0,0:21:18.40,0:21:20.76,Default,,0000,0000,0000,,validating or evaluating your model, and
Dialogue: 0,0:21:20.76,0:21:22.56,Default,,0000,0000,0000,,then subsequently if you determine that
Dialogue: 0,0:21:22.56,0:21:25.40,Default,,0000,0000,0000,,your model is of adequate accuracy to
Dialogue: 0,0:21:25.40,0:21:27.24,Default,,0000,0000,0000,,meet whatever your domain use case
Dialogue: 0,0:21:27.24,0:21:29.40,Default,,0000,0000,0000,,requirements are, right? So let's say the
Dialogue: 0,0:21:29.40,0:21:31.44,Default,,0000,0000,0000,,accuracy that's required for your domain
Dialogue: 0,0:21:31.44,0:21:32.44,Default,,0000,0000,0000,,use case is
Dialogue: 0,0:21:32.44,0:21:35.32,Default,,0000,0000,0000,,85%, okay? If my machine learning model
Dialogue: 0,0:21:35.32,0:21:38.52,Default,,0000,0000,0000,,can give an 85% accuracy rate, I think
Dialogue: 0,0:21:38.52,0:21:40.16,Default,,0000,0000,0000,,it's good enough, then I'm going to
Dialogue: 0,0:21:40.16,0:21:42.88,Default,,0000,0000,0000,,deploy it into real world use case. So
Dialogue: 0,0:21:42.88,0:21:45.00,Default,,0000,0000,0000,,here the machine learning model gets
Dialogue: 0,0:21:45.00,0:21:48.44,Default,,0000,0000,0000,,deployed on the server, and then other,
Dialogue: 0,0:21:48.44,0:21:50.76,Default,,0000,0000,0000,,you know, other data sources are going to
Dialogue: 0,0:21:50.76,0:21:52.56,Default,,0000,0000,0000,,be captured from somewhere. That data is
Dialogue: 0,0:21:52.56,0:21:54.20,Default,,0000,0000,0000,,pump into the machine learning model. The
Dialogue: 0,0:21:54.20,0:21:55.44,Default,,0000,0000,0000,,machine learning model generates
Dialogue: 0,0:21:55.44,0:21:57.76,Default,,0000,0000,0000,,predictions, and those predictions are
Dialogue: 0,0:21:57.76,0:21:59.60,Default,,0000,0000,0000,,then used to make decisions on the
Dialogue: 0,0:21:59.60,0:22:02.00,Default,,0000,0000,0000,,factory floor in real time or in any
Dialogue: 0,0:22:02.00,0:22:04.56,Default,,0000,0000,0000,,other particular scenario. And then you
Dialogue: 0,0:22:04.56,0:22:06.84,Default,,0000,0000,0000,,constantly monitor and update the model,
Dialogue: 0,0:22:06.84,0:22:09.36,Default,,0000,0000,0000,,you get more new data, and then the
Dialogue: 0,0:22:09.36,0:22:11.96,Default,,0000,0000,0000,,entire cycle repeats itself. So that's
Dialogue: 0,0:22:11.96,0:22:14.48,Default,,0000,0000,0000,,your machine learning workflow, okay, in a
Dialogue: 0,0:22:14.48,0:22:16.92,Default,,0000,0000,0000,,nutshell. Here's another example of
Dialogue: 0,0:22:16.92,0:22:18.52,Default,,0000,0000,0000,,the same thing maybe in a slightly
Dialogue: 0,0:22:18.52,0:22:20.04,Default,,0000,0000,0000,,different format, so, again, you have your
Dialogue: 0,0:22:20.04,0:22:22.16,Default,,0000,0000,0000,,data collection and preparation. Here we
Dialogue: 0,0:22:22.16,0:22:24.36,Default,,0000,0000,0000,,talk more about the different kinds of
Dialogue: 0,0:22:24.36,0:22:26.52,Default,,0000,0000,0000,,algorithms that available to create a
Dialogue: 0,0:22:26.52,0:22:28.12,Default,,0000,0000,0000,,model, and I'll talk about this more in
Dialogue: 0,0:22:28.12,0:22:30.00,Default,,0000,0000,0000,,detail when we look at the real world
Dialogue: 0,0:22:30.00,0:22:32.32,Default,,0000,0000,0000,,example of a end-to-end machine learning
Dialogue: 0,0:22:32.32,0:22:34.56,Default,,0000,0000,0000,,workflow for the predictive maintenance
Dialogue: 0,0:22:34.56,0:22:36.88,Default,,0000,0000,0000,,use case. So once you have chosen the
Dialogue: 0,0:22:36.88,0:22:38.84,Default,,0000,0000,0000,,appropriate algorithm, you then have
Dialogue: 0,0:22:38.84,0:22:41.24,Default,,0000,0000,0000,,trained your model, you then have
Dialogue: 0,0:22:41.24,0:22:44.08,Default,,0000,0000,0000,,selected the appropriate train model
Dialogue: 0,0:22:44.08,0:22:46.44,Default,,0000,0000,0000,,among the multiple models. You are
Dialogue: 0,0:22:46.44,0:22:47.80,Default,,0000,0000,0000,,probably going to develop multiple
Dialogue: 0,0:22:47.80,0:22:49.56,Default,,0000,0000,0000,,models from multiple algorithms, you're
Dialogue: 0,0:22:49.56,0:22:51.68,Default,,0000,0000,0000,,going to evaluate them all, and then
Dialogue: 0,0:22:51.68,0:22:53.20,Default,,0000,0000,0000,,you're going to say, hey, you know what?
Dialogue: 0,0:22:53.20,0:22:55.28,Default,,0000,0000,0000,,After I've evaluated and tested that,
Dialogue: 0,0:22:55.28,0:22:57.48,Default,,0000,0000,0000,,I've chosen the best model, I'm going to
Dialogue: 0,0:22:57.48,0:22:59.64,Default,,0000,0000,0000,,deploy the model, all right, so this is
Dialogue: 0,0:22:59.64,0:23:02.64,Default,,0000,0000,0000,,for real life production use, okay? Real
Dialogue: 0,0:23:02.64,0:23:04.28,Default,,0000,0000,0000,,life sensor data is going to be pumped
Dialogue: 0,0:23:04.28,0:23:06.04,Default,,0000,0000,0000,,into my model, my model is going to
Dialogue: 0,0:23:06.04,0:23:08.04,Default,,0000,0000,0000,,generate predictions, the predicted data
Dialogue: 0,0:23:08.04,0:23:10.12,Default,,0000,0000,0000,,is going to used immediately in real
Dialogue: 0,0:23:10.12,0:23:12.84,Default,,0000,0000,0000,,time for real life decision making, and
Dialogue: 0,0:23:12.84,0:23:15.00,Default,,0000,0000,0000,,then I'm going to monitor, right, the
Dialogue: 0,0:23:15.00,0:23:17.44,Default,,0000,0000,0000,,results. So somebody's using the
Dialogue: 0,0:23:17.44,0:23:19.28,Default,,0000,0000,0000,,predictions from my model, if the
Dialogue: 0,0:23:19.28,0:23:21.88,Default,,0000,0000,0000,,predictions are lousy, that goes into the
Dialogue: 0,0:23:21.88,0:23:23.44,Default,,0000,0000,0000,,monitoring, the monitoring system
Dialogue: 0,0:23:23.44,0:23:25.28,Default,,0000,0000,0000,,captures that. If the predictions are
Dialogue: 0,0:23:25.28,0:23:27.72,Default,,0000,0000,0000,,fantastic, well that is also captured by the
Dialogue: 0,0:23:27.72,0:23:29.80,Default,,0000,0000,0000,,monitoring system, and that gets
Dialogue: 0,0:23:29.80,0:23:32.36,Default,,0000,0000,0000,,feedback again to the next cycle of my
Dialogue: 0,0:23:32.36,0:23:33.68,Default,,0000,0000,0000,,machine learning
Dialogue: 0,0:23:33.68,0:23:35.96,Default,,0000,0000,0000,,pipeline. Okay, so that's the kind of
Dialogue: 0,0:23:35.96,0:23:38.36,Default,,0000,0000,0000,,overall view, and here are the kind of
Dialogue: 0,0:23:38.36,0:23:41.56,Default,,0000,0000,0000,,key phases of your workflow. So one of
Dialogue: 0,0:23:41.56,0:23:43.96,Default,,0000,0000,0000,,the important phases is called EDA,
Dialogue: 0,0:23:43.96,0:23:47.52,Default,,0000,0000,0000,,exploratory data analysis and in this
Dialogue: 0,0:23:47.52,0:23:49.88,Default,,0000,0000,0000,,particular phase, you're going to
Dialogue: 0,0:23:49.88,0:23:53.12,Default,,0000,0000,0000,,do a lot of stuff, primarily just to
Dialogue: 0,0:23:53.12,0:23:54.88,Default,,0000,0000,0000,,understand your data set. So like I said,
Dialogue: 0,0:23:54.88,0:23:56.56,Default,,0000,0000,0000,,real life data sets, they tend to be very
Dialogue: 0,0:23:56.56,0:23:59.32,Default,,0000,0000,0000,,complex, and they tend to have various
Dialogue: 0,0:23:59.32,0:24:01.04,Default,,0000,0000,0000,,statistical properties, all right,
Dialogue: 0,0:24:01.04,0:24:02.68,Default,,0000,0000,0000,,statistics is a very important component
Dialogue: 0,0:24:02.68,0:24:05.60,Default,,0000,0000,0000,,of machine learning. So an EDA helps you
Dialogue: 0,0:24:05.60,0:24:07.48,Default,,0000,0000,0000,,to kind of get an overview of your data
Dialogue: 0,0:24:07.48,0:24:09.68,Default,,0000,0000,0000,,set, get an overview of any problems in
Dialogue: 0,0:24:09.68,0:24:11.52,Default,,0000,0000,0000,,your data set like any data that's
Dialogue: 0,0:24:11.52,0:24:13.44,Default,,0000,0000,0000,,missing, the statistical properties of your
Dialogue: 0,0:24:13.44,0:24:15.16,Default,,0000,0000,0000,,data set, the distribution of your data
Dialogue: 0,0:24:15.16,0:24:17.28,Default,,0000,0000,0000,,set, the statistical correlation of
Dialogue: 0,0:24:17.28,0:24:19.19,Default,,0000,0000,0000,,variables in your data set, etc,
Dialogue: 0,0:24:19.19,0:24:23.40,Default,,0000,0000,0000,,etc. Okay, then we have data cleaning or
Dialogue: 0,0:24:23.40,0:24:25.28,Default,,0000,0000,0000,,sometimes you call it data cleansing, and
Dialogue: 0,0:24:25.28,0:24:27.60,Default,,0000,0000,0000,,in this phase what you want to do is
Dialogue: 0,0:24:27.60,0:24:29.44,Default,,0000,0000,0000,,primarily, you want to kind of do things
Dialogue: 0,0:24:29.44,0:24:31.96,Default,,0000,0000,0000,,like remove duplicate records or rows in
Dialogue: 0,0:24:31.96,0:24:33.68,Default,,0000,0000,0000,,your table, you want to make sure that
Dialogue: 0,0:24:33.68,0:24:36.80,Default,,0000,0000,0000,,your data or your data
Dialogue: 0,0:24:36.80,0:24:39.40,Default,,0000,0000,0000,,points or your samples have appropriate IDs,
Dialogue: 0,0:24:39.40,0:24:41.08,Default,,0000,0000,0000,,and most importantly, you want to make
Dialogue: 0,0:24:41.08,0:24:43.04,Default,,0000,0000,0000,,sure there's not too many missing values
Dialogue: 0,0:24:43.04,0:24:44.88,Default,,0000,0000,0000,,in your data set. So what I mean by
Dialogue: 0,0:24:44.88,0:24:46.32,Default,,0000,0000,0000,,missing values are things like that,
Dialogue: 0,0:24:46.32,0:24:48.20,Default,,0000,0000,0000,,right? You have got a data set, and for
Dialogue: 0,0:24:48.20,0:24:51.64,Default,,0000,0000,0000,,some reason there are some cells or
Dialogue: 0,0:24:51.64,0:24:54.56,Default,,0000,0000,0000,,locations in your data set which are
Dialogue: 0,0:24:54.56,0:24:56.52,Default,,0000,0000,0000,,missing values, right? And if you have a
Dialogue: 0,0:24:56.52,0:24:58.68,Default,,0000,0000,0000,,lot of these missing values, then you've
Dialogue: 0,0:24:58.68,0:25:00.44,Default,,0000,0000,0000,,got a poor quality data set, and you're
Dialogue: 0,0:25:00.44,0:25:02.20,Default,,0000,0000,0000,,not going to be able to build a good
Dialogue: 0,0:25:02.20,0:25:04.16,Default,,0000,0000,0000,,model from this data set. You're not
Dialogue: 0,0:25:04.16,0:25:06.00,Default,,0000,0000,0000,,going to be able to train a good machine
Dialogue: 0,0:25:06.00,0:25:08.12,Default,,0000,0000,0000,,learning model from a data set with a
Dialogue: 0,0:25:08.12,0:25:10.20,Default,,0000,0000,0000,,lot of missing values like this. So you
Dialogue: 0,0:25:10.20,0:25:11.88,Default,,0000,0000,0000,,have to figure out whether there are a
Dialogue: 0,0:25:11.88,0:25:13.40,Default,,0000,0000,0000,,lot of missing values in your data set,
Dialogue: 0,0:25:13.40,0:25:15.40,Default,,0000,0000,0000,,how do you handle them. Another thing
Dialogue: 0,0:25:15.40,0:25:16.92,Default,,0000,0000,0000,,that's important in data cleansing is
Dialogue: 0,0:25:16.92,0:25:18.80,Default,,0000,0000,0000,,figuring out the outliers in your data
Dialogue: 0,0:25:18.80,0:25:21.92,Default,,0000,0000,0000,,set. So outliers are things like this,
Dialogue: 0,0:25:21.92,0:25:24.04,Default,,0000,0000,0000,,you know, data points that are very far from
Dialogue: 0,0:25:24.04,0:25:26.44,Default,,0000,0000,0000,,the general trend of data points in your
Dialogue: 0,0:25:26.44,0:25:29.56,Default,,0000,0000,0000,,data set, right? And so there are also
Dialogue: 0,0:25:29.56,0:25:31.92,Default,,0000,0000,0000,,several ways to detect outliers in your
Dialogue: 0,0:25:31.92,0:25:34.20,Default,,0000,0000,0000,,data set, and there are several ways to
Dialogue: 0,0:25:34.20,0:25:36.64,Default,,0000,0000,0000,,handle outliers in your data set.
Dialogue: 0,0:25:36.64,0:25:38.20,Default,,0000,0000,0000,,Similarly as well, there are several ways
Dialogue: 0,0:25:38.20,0:25:39.96,Default,,0000,0000,0000,,to handle missing values in your data
Dialogue: 0,0:25:39.96,0:25:42.88,Default,,0000,0000,0000,,set. So handling missing values, handling
Dialogue: 0,0:25:42.88,0:25:45.68,Default,,0000,0000,0000,,outliers, those are really two very key
Dialogue: 0,0:25:45.68,0:25:47.28,Default,,0000,0000,0000,,importance of data
Dialogue: 0,0:25:47.28,0:25:49.12,Default,,0000,0000,0000,,cleansing, and there are many, many
Dialogue: 0,0:25:49.12,0:25:50.76,Default,,0000,0000,0000,,techniques to handle this, so a data
Dialogue: 0,0:25:50.76,0:25:52.00,Default,,0000,0000,0000,,scientist needs to be acquainted with
Dialogue: 0,0:25:52.00,0:25:55.36,Default,,0000,0000,0000,,all of this. All right, why do I need to
Dialogue: 0,0:25:55.36,0:25:58.00,Default,,0000,0000,0000,,do data cleansing? Well, here is the key
Dialogue: 0,0:25:58.00,0:25:59.36,Default,,0000,0000,0000,,point.
Dialogue: 0,0:25:59.36,0:26:02.80,Default,,0000,0000,0000,,If you have a very poor quality data set,
Dialogue: 0,0:26:02.80,0:26:04.88,Default,,0000,0000,0000,,which means you've got a lot of outliers
Dialogue: 0,0:26:04.88,0:26:06.72,Default,,0000,0000,0000,,which are errors in your data set, or you
Dialogue: 0,0:26:06.72,0:26:08.16,Default,,0000,0000,0000,,got a lot of missing values in your data
Dialogue: 0,0:26:08.16,0:26:10.84,Default,,0000,0000,0000,,set, even though you've got a fantastic
Dialogue: 0,0:26:10.84,0:26:13.04,Default,,0000,0000,0000,,algorithm, you've got a fantastic model,
Dialogue: 0,0:26:13.04,0:26:15.72,Default,,0000,0000,0000,,the predictions that your model is going
Dialogue: 0,0:26:15.72,0:26:18.96,Default,,0000,0000,0000,,to give is absolutely rubbish. It's kind
Dialogue: 0,0:26:18.96,0:26:22.08,Default,,0000,0000,0000,,of like taking water and putting water
Dialogue: 0,0:26:22.08,0:26:26.00,Default,,0000,0000,0000,,into the tank of a Mercedes-Benz. So
Dialogue: 0,0:26:26.00,0:26:28.44,Default,,0000,0000,0000,,Mercedes-Benz is a great car, but if you
Dialogue: 0,0:26:28.44,0:26:30.08,Default,,0000,0000,0000,,take water and put it into your
Dialogue: 0,0:26:30.08,0:26:33.40,Default,,0000,0000,0000,,Mercedes-Benz, it will just die, right? Your
Dialogue: 0,0:26:33.40,0:26:36.52,Default,,0000,0000,0000,,car will just die, it can't run on water,
Dialogue: 0,0:26:36.52,0:26:38.28,Default,,0000,0000,0000,,right? On the other hand, if you have a
Dialogue: 0,0:26:38.28,0:26:41.56,Default,,0000,0000,0000,,Myvi, Myvi is just a lousy, shit car, but if
Dialogue: 0,0:26:41.56,0:26:44.84,Default,,0000,0000,0000,,you take a high octane, good petrol and
Dialogue: 0,0:26:44.84,0:26:47.24,Default,,0000,0000,0000,,you put into a Myvi, the Myvi will just go at,
Dialogue: 0,0:26:47.24,0:26:49.48,Default,,0000,0000,0000,,you know, 100 miles an hour. It would just
Dialogue: 0,0:26:49.48,0:26:51.16,Default,,0000,0000,0000,,completely destroy the Mercedes-Benz in
Dialogue: 0,0:26:51.16,0:26:53.36,Default,,0000,0000,0000,,terms of performance, so it
Dialogue: 0,0:26:53.36,0:26:54.80,Default,,0000,0000,0000,,doesn't really matter what model you're
Dialogue: 0,0:26:54.80,0:26:57.08,Default,,0000,0000,0000,,using here, right? So you can be using the most
Dialogue: 0,0:26:57.08,0:26:58.68,Default,,0000,0000,0000,,fantastic model like the
Dialogue: 0,0:26:58.68,0:27:01.20,Default,,0000,0000,0000,,Mercedes-Benz or machine learning, but if
Dialogue: 0,0:27:01.20,0:27:03.08,Default,,0000,0000,0000,,your data is lousy quality, your
Dialogue: 0,0:27:03.08,0:27:06.48,Default,,0000,0000,0000,,predictions is also going to be rubbish,
Dialogue: 0,0:27:06.48,0:27:10.00,Default,,0000,0000,0000,,okay? So cleansing data set is, in fact,
Dialogue: 0,0:27:10.00,0:27:11.88,Default,,0000,0000,0000,,probably the most important thing that
Dialogue: 0,0:27:11.88,0:27:13.64,Default,,0000,0000,0000,,data scientists need to do and that's
Dialogue: 0,0:27:13.64,0:27:15.52,Default,,0000,0000,0000,,what they spend most of the time doing,
Dialogue: 0,0:27:15.52,0:27:17.60,Default,,0000,0000,0000,,right, building the model, training the
Dialogue: 0,0:27:17.60,0:27:20.24,Default,,0000,0000,0000,,model, getting the right algorithms, and
Dialogue: 0,0:27:20.24,0:27:23.24,Default,,0000,0000,0000,,so on, that's really a small portion of
Dialogue: 0,0:27:23.24,0:27:25.20,Default,,0000,0000,0000,,the actual machine learning workflow,
Dialogue: 0,0:27:25.20,0:27:27.36,Default,,0000,0000,0000,,right? The actual machine learning
Dialogue: 0,0:27:27.36,0:27:29.68,Default,,0000,0000,0000,,workflow, the vast majority of time is on
Dialogue: 0,0:27:29.68,0:27:31.56,Default,,0000,0000,0000,,cleaning and organizing your
Dialogue: 0,0:27:31.56,0:27:33.36,Default,,0000,0000,0000,,data. Then you have something called
Dialogue: 0,0:27:33.36,0:27:35.08,Default,,0000,0000,0000,,feature engineering which is you
Dialogue: 0,0:27:35.08,0:27:37.00,Default,,0000,0000,0000,,preprocess the feature variables of
Dialogue: 0,0:27:37.00,0:27:38.92,Default,,0000,0000,0000,,your original data set prior to using
Dialogue: 0,0:27:38.92,0:27:40.60,Default,,0000,0000,0000,,them to train the model, and this is
Dialogue: 0,0:27:40.60,0:27:41.96,Default,,0000,0000,0000,,either through addition, deletion,
Dialogue: 0,0:27:41.96,0:27:43.60,Default,,0000,0000,0000,,combination, or transformation of these
Dialogue: 0,0:27:43.60,0:27:45.40,Default,,0000,0000,0000,,variables. And then the idea is you want
Dialogue: 0,0:27:45.40,0:27:47.00,Default,,0000,0000,0000,,to improve the predictive accuracy of
Dialogue: 0,0:27:47.00,0:27:49.32,Default,,0000,0000,0000,,the model, and also because some models
Dialogue: 0,0:27:49.32,0:27:51.08,Default,,0000,0000,0000,,can only work with numeric data, so you
Dialogue: 0,0:27:51.08,0:27:53.72,Default,,0000,0000,0000,,need to transform categorical data into
Dialogue: 0,0:27:53.72,0:27:57.04,Default,,0000,0000,0000,,numeric data. All right, so just now, in
Dialogue: 0,0:27:57.04,0:27:58.80,Default,,0000,0000,0000,,the earlier slides, I showed you that you
Dialogue: 0,0:27:58.80,0:28:00.76,Default,,0000,0000,0000,,take your original data set, you pump it
Dialogue: 0,0:28:00.76,0:28:03.20,Default,,0000,0000,0000,,into algorithm, and then a couple of hours
Dialogue: 0,0:28:03.20,0:28:05.20,Default,,0000,0000,0000,,later, you get a machine learning model,
Dialogue: 0,0:28:05.20,0:28:08.64,Default,,0000,0000,0000,,right? So you didn't do anything to your
Dialogue: 0,0:28:08.64,0:28:10.16,Default,,0000,0000,0000,,data set, to the feature variables in
Dialogue: 0,0:28:10.16,0:28:12.16,Default,,0000,0000,0000,,your data set before you pump it into a
Dialogue: 0,0:28:12.16,0:28:14.40,Default,,0000,0000,0000,,machine learning algorithm. So
Dialogue: 0,0:28:14.40,0:28:15.84,Default,,0000,0000,0000,,what I showed you earlier is you just
Dialogue: 0,0:28:15.84,0:28:18.92,Default,,0000,0000,0000,,take the data set exactly as it is and
Dialogue: 0,0:28:18.92,0:28:20.80,Default,,0000,0000,0000,,you just pump it into the algorithm,
Dialogue: 0,0:28:20.80,0:28:23.12,Default,,0000,0000,0000,,couple of hours later, you get a model,
Dialogue: 0,0:28:23.12,0:28:27.64,Default,,0000,0000,0000,,right? But that's not what generally
Dialogue: 0,0:28:27.64,0:28:29.60,Default,,0000,0000,0000,,happens in in real life. In real life,
Dialogue: 0,0:28:29.60,0:28:31.56,Default,,0000,0000,0000,,you're going to take all the original
Dialogue: 0,0:28:31.56,0:28:34.32,Default,,0000,0000,0000,,feature variables from your data set and
Dialogue: 0,0:28:34.32,0:28:36.72,Default,,0000,0000,0000,,you're going to transform them in some
Dialogue: 0,0:28:36.72,0:28:38.96,Default,,0000,0000,0000,,way. So you can see here these are the
Dialogue: 0,0:28:38.96,0:28:42.12,Default,,0000,0000,0000,,columns of data from my original data set,
Dialogue: 0,0:28:42.12,0:28:46.04,Default,,0000,0000,0000,,and before I actually put all these data
Dialogue: 0,0:28:46.04,0:28:48.24,Default,,0000,0000,0000,,points from my original data set into my
Dialogue: 0,0:28:48.24,0:28:50.72,Default,,0000,0000,0000,,algorithm to train and get my model, I
Dialogue: 0,0:28:50.72,0:28:54.96,Default,,0000,0000,0000,,will actually transform them, okay? So the
Dialogue: 0,0:28:54.96,0:28:57.60,Default,,0000,0000,0000,,transformation of these feature variable
Dialogue: 0,0:28:57.60,0:29:00.60,Default,,0000,0000,0000,,values, we call this feature engineering.
Dialogue: 0,0:29:00.60,0:29:02.44,Default,,0000,0000,0000,,And there are many, many techniques to do
Dialogue: 0,0:29:02.44,0:29:04.96,Default,,0000,0000,0000,,feature engineering, so one-hot encoding,
Dialogue: 0,0:29:04.96,0:29:08.28,Default,,0000,0000,0000,,scaling, log transformation,
Dialogue: 0,0:29:08.28,0:29:10.48,Default,,0000,0000,0000,,discretization, date extraction, boolean
Dialogue: 0,0:29:10.48,0:29:12.04,Default,,0000,0000,0000,,logic, etc, etc.
Dialogue: 0,0:29:12.04,0:29:14.88,Default,,0000,0000,0000,,Okay, then finally we do something
Dialogue: 0,0:29:14.88,0:29:16.80,Default,,0000,0000,0000,,called a train-test split, so where we
Dialogue: 0,0:29:16.80,0:29:19.44,Default,,0000,0000,0000,,take our original dataset, right? So this
Dialogue: 0,0:29:19.44,0:29:21.36,Default,,0000,0000,0000,,was the original dataset, and we break
Dialogue: 0,0:29:21.36,0:29:23.72,Default,,0000,0000,0000,,it into two parts, so one is called the
Dialogue: 0,0:29:23.72,0:29:25.76,Default,,0000,0000,0000,,training dataset and the other is
Dialogue: 0,0:29:25.76,0:29:28.12,Default,,0000,0000,0000,,called the test dataset. And the primary
Dialogue: 0,0:29:28.12,0:29:30.00,Default,,0000,0000,0000,,purpose for this is when we feed and
Dialogue: 0,0:29:30.00,0:29:31.40,Default,,0000,0000,0000,,train the machine learning model, we're
Dialogue: 0,0:29:31.40,0:29:32.64,Default,,0000,0000,0000,,going to use what is called the training
Dialogue: 0,0:29:32.64,0:29:35.56,Default,,0000,0000,0000,,dataset, and when we want to evaluate
Dialogue: 0,0:29:35.56,0:29:37.40,Default,,0000,0000,0000,,the accuracy of the model, right? So this
Dialogue: 0,0:29:37.40,0:29:40.96,Default,,0000,0000,0000,,is the key part of your machine learning
Dialogue: 0,0:29:40.96,0:29:43.64,Default,,0000,0000,0000,,life cycle because you are not only just
Dialogue: 0,0:29:43.64,0:29:45.44,Default,,0000,0000,0000,,going to have one possible models
Dialogue: 0,0:29:45.44,0:29:47.72,Default,,0000,0000,0000,,because there are a vast range of
Dialogue: 0,0:29:47.72,0:29:50.08,Default,,0000,0000,0000,,algorithms that you can use to create a
Dialogue: 0,0:29:50.08,0:29:53.00,Default,,0000,0000,0000,,model. So fundamentally you have a wide
Dialogue: 0,0:29:53.00,0:29:55.68,Default,,0000,0000,0000,,range of choices, right, like wide range
Dialogue: 0,0:29:55.68,0:29:57.64,Default,,0000,0000,0000,,of cars, right? You want to buy a car, you
Dialogue: 0,0:29:57.64,0:30:00.56,Default,,0000,0000,0000,,can buy a Myvi, you can buy a Perodua,
Dialogue: 0,0:30:00.56,0:30:02.64,Default,,0000,0000,0000,,you can buy a Honda, you can buy a
Dialogue: 0,0:30:02.64,0:30:05.04,Default,,0000,0000,0000,,Mercedes-Benz, you can buy a Audi, you can
Dialogue: 0,0:30:05.04,0:30:07.76,Default,,0000,0000,0000,,buy a beamer, many, many different cars
Dialogue: 0,0:30:07.76,0:30:09.24,Default,,0000,0000,0000,,that available for you if you want
Dialogue: 0,0:30:09.24,0:30:11.68,Default,,0000,0000,0000,,to buy a car, right? Same thing. With a
Dialogue: 0,0:30:11.68,0:30:14.36,Default,,0000,0000,0000,,machine learning model there are a vast
Dialogue: 0,0:30:14.36,0:30:16.72,Default,,0000,0000,0000,,variety of algorithms that you can
Dialogue: 0,0:30:16.72,0:30:19.48,Default,,0000,0000,0000,,choose from in order to create a model,
Dialogue: 0,0:30:19.48,0:30:21.52,Default,,0000,0000,0000,,and so once you create a model from a
Dialogue: 0,0:30:21.52,0:30:24.48,Default,,0000,0000,0000,,given algorithm you need to say, hey, how
Dialogue: 0,0:30:24.48,0:30:26.44,Default,,0000,0000,0000,,accurate is this model that I've created
Dialogue: 0,0:30:26.44,0:30:28.64,Default,,0000,0000,0000,,from this algorithm. And different
Dialogue: 0,0:30:28.64,0:30:30.40,Default,,0000,0000,0000,,algorithms are going to create different
Dialogue: 0,0:30:30.40,0:30:33.72,Default,,0000,0000,0000,,models with different rates of accuracy.
Dialogue: 0,0:30:33.72,0:30:35.68,Default,,0000,0000,0000,,And so the primary purpose of the test
Dialogue: 0,0:30:35.68,0:30:38.20,Default,,0000,0000,0000,,dataset is to evaluate the accuracy
Dialogue: 0,0:30:38.20,0:30:41.48,Default,,0000,0000,0000,,of the model to see hey, is this model
Dialogue: 0,0:30:41.48,0:30:43.36,Default,,0000,0000,0000,,that I've created using this algorithm,
Dialogue: 0,0:30:43.36,0:30:45.88,Default,,0000,0000,0000,,is it adequate for me to use in a real
Dialogue: 0,0:30:45.88,0:30:48.60,Default,,0000,0000,0000,,life production use case? Okay? So that's
Dialogue: 0,0:30:48.60,0:30:52.32,Default,,0000,0000,0000,,what it's all about. Okay, so this is my
Dialogue: 0,0:30:52.32,0:30:54.28,Default,,0000,0000,0000,,original dataset, I break it into my
Dialogue: 0,0:30:54.28,0:30:56.56,Default,,0000,0000,0000,,feature dataset and
Dialogue: 0,0:30:56.56,0:30:58.52,Default,,0000,0000,0000,,also my target variable column, so my
Dialogue: 0,0:30:58.52,0:31:00.64,Default,,0000,0000,0000,,feature variable columns, the target
Dialogue: 0,0:31:00.64,0:31:02.20,Default,,0000,0000,0000,,variable columns, and then I further break
Dialogue: 0,0:31:02.20,0:31:04.24,Default,,0000,0000,0000,,it into a training dataset and a test
Dialogue: 0,0:31:04.24,0:31:06.60,Default,,0000,0000,0000,,dataset. The training dataset is to use
Dialogue: 0,0:31:06.60,0:31:08.32,Default,,0000,0000,0000,,to train, to create the machine learning
Dialogue: 0,0:31:08.32,0:31:10.48,Default,,0000,0000,0000,,model. And then once the machine learning
Dialogue: 0,0:31:10.48,0:31:12.20,Default,,0000,0000,0000,,model is created, I then use the test
Dialogue: 0,0:31:12.20,0:31:15.08,Default,,0000,0000,0000,,dataset to evaluate the accuracy of the
Dialogue: 0,0:31:15.08,0:31:17.26,Default,,0000,0000,0000,,machine learning model.
Dialogue: 0,0:31:17.26,0:31:21.00,Default,,0000,0000,0000,,All right. And then finally we can
Dialogue: 0,0:31:21.00,0:31:23.20,Default,,0000,0000,0000,,see what are the different parts or
Dialogue: 0,0:31:23.20,0:31:26.08,Default,,0000,0000,0000,,aspects that go into a successful model,
Dialogue: 0,0:31:26.08,0:31:29.52,Default,,0000,0000,0000,,so EDA about 10%, data cleansing about
Dialogue: 0,0:31:29.52,0:31:32.36,Default,,0000,0000,0000,,20%, feature engineering about
Dialogue: 0,0:31:32.36,0:31:36.32,Default,,0000,0000,0000,,25%, selecting a specific algorithm about
Dialogue: 0,0:31:36.32,0:31:39.12,Default,,0000,0000,0000,,10%, and then training the model from
Dialogue: 0,0:31:39.12,0:31:41.64,Default,,0000,0000,0000,,that algorithm about 15%, and then
Dialogue: 0,0:31:41.64,0:31:43.68,Default,,0000,0000,0000,,finally evaluating the model, deciding
Dialogue: 0,0:31:43.68,0:31:45.96,Default,,0000,0000,0000,,which is the best model with the highest
Dialogue: 0,0:31:45.96,0:31:51.82,Default,,0000,0000,0000,,accuracy rate, that's about 20%.
Dialogue: 0,0:31:54.08,0:31:56.92,Default,,0000,0000,0000,,All right, so we have reached the
Dialogue: 0,0:31:56.92,0:31:58.88,Default,,0000,0000,0000,,most interesting part of this
Dialogue: 0,0:31:58.88,0:32:01.04,Default,,0000,0000,0000,,presentation which is the demonstration
Dialogue: 0,0:32:01.04,0:32:03.76,Default,,0000,0000,0000,,of an end-to-end machine learning workflow
Dialogue: 0,0:32:03.76,0:32:06.08,Default,,0000,0000,0000,,on a real life dataset that
Dialogue: 0,0:32:06.08,0:32:10.08,Default,,0000,0000,0000,,demonstrates the use case of predictive
Dialogue: 0,0:32:10.08,0:32:13.52,Default,,0000,0000,0000,,maintenance. So for the data set for
Dialogue: 0,0:32:13.52,0:32:16.24,Default,,0000,0000,0000,,this particular use case, I've used a
Dialogue: 0,0:32:16.24,0:32:19.20,Default,,0000,0000,0000,,data set from Kaggle. So for those of you
Dialogue: 0,0:32:19.20,0:32:21.40,Default,,0000,0000,0000,,are not aware of this, Kaggle is the
Dialogue: 0,0:32:21.40,0:32:24.88,Default,,0000,0000,0000,,world's largest open-source community
Dialogue: 0,0:32:24.88,0:32:28.08,Default,,0000,0000,0000,,for data science and AI, and they have a
Dialogue: 0,0:32:28.08,0:32:31.16,Default,,0000,0000,0000,,large collection of datasets from all
Dialogue: 0,0:32:31.16,0:32:34.44,Default,,0000,0000,0000,,various areas of industry and human
Dialogue: 0,0:32:34.44,0:32:37.04,Default,,0000,0000,0000,,endeavor, and they also have a large
Dialogue: 0,0:32:37.04,0:32:38.84,Default,,0000,0000,0000,,collection of models that have been
Dialogue: 0,0:32:38.84,0:32:42.88,Default,,0000,0000,0000,,developed using these data sets. So here
Dialogue: 0,0:32:42.88,0:32:47.04,Default,,0000,0000,0000,,we have a data set for the particular
Dialogue: 0,0:32:47.04,0:32:50.52,Default,,0000,0000,0000,,use case, predictive maintenance, okay? So
Dialogue: 0,0:32:50.52,0:32:52.92,Default,,0000,0000,0000,,this is some information about the data
Dialogue: 0,0:32:52.92,0:32:56.44,Default,,0000,0000,0000,,set, so in case you do not know how
Dialogue: 0,0:32:56.44,0:32:59.20,Default,,0000,0000,0000,,to get to there, this is the URL to click
Dialogue: 0,0:32:59.20,0:33:02.24,Default,,0000,0000,0000,,on, okay, to get to that dataset. So once
Dialogue: 0,0:33:02.24,0:33:05.12,Default,,0000,0000,0000,,your at the data set here, you can- or the
Dialogue: 0,0:33:05.12,0:33:07.40,Default,,0000,0000,0000,,page for about this dataset, you can see
Dialogue: 0,0:33:07.40,0:33:09.96,Default,,0000,0000,0000,,all the information about this data set,
Dialogue: 0,0:33:09.96,0:33:12.96,Default,,0000,0000,0000,,and you can download the data set in a
Dialogue: 0,0:33:12.96,0:33:14.16,Default,,0000,0000,0000,,CSV format.
Dialogue: 0,0:33:14.16,0:33:16.36,Default,,0000,0000,0000,,Okay, so let's take a look at the
Dialogue: 0,0:33:16.36,0:33:19.56,Default,,0000,0000,0000,,dataset. So this dataset has a total of
Dialogue: 0,0:33:19.56,0:33:23.44,Default,,0000,0000,0000,,10,000 samples, okay? And these are the
Dialogue: 0,0:33:23.44,0:33:26.28,Default,,0000,0000,0000,,feature variables, the type, the product
Dialogue: 0,0:33:26.28,0:33:28.44,Default,,0000,0000,0000,,ID, the air temperature, process
Dialogue: 0,0:33:28.44,0:33:30.90,Default,,0000,0000,0000,,temperature, rotational speed, torque, tool
Dialogue: 0,0:33:30.90,0:33:34.80,Default,,0000,0000,0000,,wear, and this is the target variable,
Dialogue: 0,0:33:34.80,0:33:36.72,Default,,0000,0000,0000,,all right? So the target variable is what
Dialogue: 0,0:33:36.72,0:33:38.16,Default,,0000,0000,0000,,we are interested in, what we are
Dialogue: 0,0:33:38.16,0:33:40.96,Default,,0000,0000,0000,,interested in using to train the machine
Dialogue: 0,0:33:40.96,0:33:42.60,Default,,0000,0000,0000,,learning model, and also what we are
Dialogue: 0,0:33:42.60,0:33:45.28,Default,,0000,0000,0000,,interested to predict, okay? So these are
Dialogue: 0,0:33:45.28,0:33:47.96,Default,,0000,0000,0000,,the feature variables, they describe or
Dialogue: 0,0:33:47.96,0:33:49.96,Default,,0000,0000,0000,,they provide information about this
Dialogue: 0,0:33:49.96,0:33:52.88,Default,,0000,0000,0000,,particular machine on the production
Dialogue: 0,0:33:52.88,0:33:55.08,Default,,0000,0000,0000,,line, on the assembly line, so you might
Dialogue: 0,0:33:55.08,0:33:56.80,Default,,0000,0000,0000,,know the product ID, the type, the air
Dialogue: 0,0:33:56.80,0:33:58.12,Default,,0000,0000,0000,,temperature, process temperature,
Dialogue: 0,0:33:58.12,0:34:00.48,Default,,0000,0000,0000,,rotational speed, torque, tool wear, right? So
Dialogue: 0,0:34:00.48,0:34:03.16,Default,,0000,0000,0000,,let's say you've got a IoT sensor system
Dialogue: 0,0:34:03.16,0:34:06.12,Default,,0000,0000,0000,,that's basically capturing all this data
Dialogue: 0,0:34:06.12,0:34:08.36,Default,,0000,0000,0000,,about a product or a machine on your
Dialogue: 0,0:34:08.36,0:34:10.68,Default,,0000,0000,0000,,production or assembly line, okay? And
Dialogue: 0,0:34:10.68,0:34:13.92,Default,,0000,0000,0000,,you've also captured information about
Dialogue: 0,0:34:13.92,0:34:17.20,Default,,0000,0000,0000,,whether is for a specific sample,
Dialogue: 0,0:34:17.20,0:34:19.84,Default,,0000,0000,0000,,whether that sample experience a
Dialogue: 0,0:34:19.84,0:34:23.04,Default,,0000,0000,0000,,failure or not, okay? So the target value
Dialogue: 0,0:34:23.04,0:34:25.52,Default,,0000,0000,0000,,of zero, okay, indicates that there's no
Dialogue: 0,0:34:25.52,0:34:28.00,Default,,0000,0000,0000,,failure. So zero means no failure, and we
Dialogue: 0,0:34:28.00,0:34:30.20,Default,,0000,0000,0000,,can see that the vast majority of data
Dialogue: 0,0:34:30.20,0:34:32.52,Default,,0000,0000,0000,,points in this data set are no failure.
Dialogue: 0,0:34:32.52,0:34:34.00,Default,,0000,0000,0000,,And here we can see an example here
Dialogue: 0,0:34:34.00,0:34:36.72,Default,,0000,0000,0000,,where you have a case of a failure, so a
Dialogue: 0,0:34:36.72,0:34:40.16,Default,,0000,0000,0000,,failure is marked as a one, positive, and
Dialogue: 0,0:34:40.16,0:34:42.64,Default,,0000,0000,0000,,no failure is marked as zero, negative,
Dialogue: 0,0:34:42.64,0:34:44.88,Default,,0000,0000,0000,,all right? So here we have one type of a
Dialogue: 0,0:34:44.88,0:34:47.04,Default,,0000,0000,0000,,failure, it's called a power failure. And
Dialogue: 0,0:34:47.04,0:34:49.00,Default,,0000,0000,0000,,if you scroll down the data set, you see
Dialogue: 0,0:34:49.00,0:34:50.40,Default,,0000,0000,0000,,there are also other kinds of failures
Dialogue: 0,0:34:50.40,0:34:52.84,Default,,0000,0000,0000,,like a tool wear
Dialogue: 0,0:34:52.84,0:34:56.96,Default,,0000,0000,0000,,failure, we have a overstrain failure
Dialogue: 0,0:34:56.96,0:34:58.68,Default,,0000,0000,0000,,here, for example,
Dialogue: 0,0:34:58.68,0:35:00.76,Default,,0000,0000,0000,,we also have a power failure again,
Dialogue: 0,0:35:00.76,0:35:02.20,Default,,0000,0000,0000,,and so on. So if you scroll down through
Dialogue: 0,0:35:02.20,0:35:04.16,Default,,0000,0000,0000,,these 10,000 data points, or if
Dialogue: 0,0:35:04.16,0:35:06.04,Default,,0000,0000,0000,,you're familiar with using Excel to
Dialogue: 0,0:35:06.04,0:35:08.84,Default,,0000,0000,0000,,filter out values in a column, you can
Dialogue: 0,0:35:08.84,0:35:12.28,Default,,0000,0000,0000,,see that in this particular column here
Dialogue: 0,0:35:12.28,0:35:14.48,Default,,0000,0000,0000,,which is the so-called target variable
Dialogue: 0,0:35:14.48,0:35:16.96,Default,,0000,0000,0000,,column, you are going to have the vast
Dialogue: 0,0:35:16.96,0:35:18.92,Default,,0000,0000,0000,,majority of values as zero which means
Dialogue: 0,0:35:18.92,0:35:22.76,Default,,0000,0000,0000,,no failure, and some of the rows or the
Dialogue: 0,0:35:22.76,0:35:24.04,Default,,0000,0000,0000,,data points you are going to have a
Dialogue: 0,0:35:24.04,0:35:26.36,Default,,0000,0000,0000,,value of one, and for those rows that you
Dialogue: 0,0:35:26.36,0:35:28.12,Default,,0000,0000,0000,,have a value of one, for example,
Dialogue: 0,0:35:28.12,0:35:31.28,Default,,0000,0000,0000,,here you are- Sorry, for example, here you
Dialogue: 0,0:35:31.28,0:35:32.84,Default,,0000,0000,0000,,are going to have different types of
Dialogue: 0,0:35:32.84,0:35:34.64,Default,,0000,0000,0000,,failures, so like I said just now power
Dialogue: 0,0:35:34.64,0:35:38.96,Default,,0000,0000,0000,,failure, tool set failure, etc, etc. So we are
Dialogue: 0,0:35:38.96,0:35:40.64,Default,,0000,0000,0000,,going to go through the entire machine
Dialogue: 0,0:35:40.64,0:35:43.76,Default,,0000,0000,0000,,learning workflow process with this dataset.
Dialogue: 0,0:35:43.76,0:35:46.64,Default,,0000,0000,0000,,So to see an example of that, we are
Dialogue: 0,0:35:46.64,0:35:50.40,Default,,0000,0000,0000,,going to use a- we're going to go to the
Dialogue: 0,0:35:50.40,0:35:52.28,Default,,0000,0000,0000,,code section here, all right, so if I
Dialogue: 0,0:35:52.28,0:35:54.28,Default,,0000,0000,0000,,click on the code section here. And right
Dialogue: 0,0:35:54.28,0:35:56.40,Default,,0000,0000,0000,,down here we have see what is called a
Dialogue: 0,0:35:56.40,0:35:59.36,Default,,0000,0000,0000,,dataset notebook. So this is basically a
Dialogue: 0,0:35:59.36,0:36:02.32,Default,,0000,0000,0000,,Jupyter notebook. Jupyter is basically an
Dialogue: 0,0:36:02.32,0:36:05.28,Default,,0000,0000,0000,,Python application which allows you to
Dialogue: 0,0:36:05.28,0:36:09.24,Default,,0000,0000,0000,,create a Python machine learning
Dialogue: 0,0:36:09.24,0:36:11.68,Default,,0000,0000,0000,,program that basically builds your
Dialogue: 0,0:36:11.68,0:36:14.52,Default,,0000,0000,0000,,machine learning model, assesses or
Dialogue: 0,0:36:14.52,0:36:16.48,Default,,0000,0000,0000,,evaluates its accuracy, and generates
Dialogue: 0,0:36:16.48,0:36:19.04,Default,,0000,0000,0000,,predictions from it, okay? So here we have
Dialogue: 0,0:36:19.04,0:36:21.68,Default,,0000,0000,0000,,a whole bunch of Jupyter notebooks that
Dialogue: 0,0:36:21.68,0:36:24.56,Default,,0000,0000,0000,,are available, and you can select any one
Dialogue: 0,0:36:24.56,0:36:26.00,Default,,0000,0000,0000,,of them. All these notebooks are
Dialogue: 0,0:36:26.00,0:36:28.72,Default,,0000,0000,0000,,essentially going to process the data
Dialogue: 0,0:36:28.72,0:36:31.72,Default,,0000,0000,0000,,from this particular dataset. So if I go
Dialogue: 0,0:36:31.72,0:36:34.72,Default,,0000,0000,0000,,to this code page here, I've actually
Dialogue: 0,0:36:34.72,0:36:37.32,Default,,0000,0000,0000,,selected a specific notebook that I'm
Dialogue: 0,0:36:37.32,0:36:39.96,Default,,0000,0000,0000,,going to run through to demonstrate an
Dialogue: 0,0:36:39.96,0:36:42.84,Default,,0000,0000,0000,,end-to-end machine learning workflow using
Dialogue: 0,0:36:42.84,0:36:45.56,Default,,0000,0000,0000,,various machine learning libraries from
Dialogue: 0,0:36:45.56,0:36:49.80,Default,,0000,0000,0000,,the Python programming language, okay? So
Dialogue: 0,0:36:49.80,0:36:52.44,Default,,0000,0000,0000,,the particular notebook I'm going to
Dialogue: 0,0:36:52.44,0:36:55.16,Default,,0000,0000,0000,,use is this particular notebook here, and
Dialogue: 0,0:36:55.16,0:36:57.16,Default,,0000,0000,0000,,you can also get the URL for that
Dialogue: 0,0:36:57.16,0:37:00.44,Default,,0000,0000,0000,,particular notebook from here.
Dialogue: 0,0:37:00.44,0:37:03.76,Default,,0000,0000,0000,,Okay, so let's quickly do a quick
Dialogue: 0,0:37:03.76,0:37:05.97,Default,,0000,0000,0000,,revision again. What are we trying to do
Dialogue: 0,0:37:05.97,0:37:08.00,Default,,0000,0000,0000,,here? We're trying to build a machine
Dialogue: 0,0:37:08.00,0:37:11.36,Default,,0000,0000,0000,,learning classification model, right? So
Dialogue: 0,0:37:11.36,0:37:12.96,Default,,0000,0000,0000,,we said there are two primary areas of
Dialogue: 0,0:37:12.96,0:37:14.56,Default,,0000,0000,0000,,supervised learning, one is regression
Dialogue: 0,0:37:14.56,0:37:16.20,Default,,0000,0000,0000,,which is used to predict a numerical
Dialogue: 0,0:37:16.20,0:37:18.64,Default,,0000,0000,0000,,target variable, and the second kind of
Dialogue: 0,0:37:18.64,0:37:21.36,Default,,0000,0000,0000,,supervised learning is classification
Dialogue: 0,0:37:21.36,0:37:23.08,Default,,0000,0000,0000,,which is what we're doing here. We're
Dialogue: 0,0:37:23.08,0:37:25.84,Default,,0000,0000,0000,,trying to predict a categorical target
Dialogue: 0,0:37:25.84,0:37:29.68,Default,,0000,0000,0000,,variable, okay? So in this particular
Dialogue: 0,0:37:29.68,0:37:32.12,Default,,0000,0000,0000,,example, we actually have two kinds of
Dialogue: 0,0:37:32.12,0:37:34.48,Default,,0000,0000,0000,,ways we can classify, either a binary
Dialogue: 0,0:37:34.48,0:37:36.68,Default,,0000,0000,0000,,classification or a multiclass
Dialogue: 0,0:37:36.68,0:37:39.52,Default,,0000,0000,0000,,classification. So for binary
Dialogue: 0,0:37:39.52,0:37:41.44,Default,,0000,0000,0000,,classification, we are only going to
Dialogue: 0,0:37:41.44,0:37:43.40,Default,,0000,0000,0000,,classify the product or machine as
Dialogue: 0,0:37:43.40,0:37:47.16,Default,,0000,0000,0000,,either it failed or it did not fail, okay?
Dialogue: 0,0:37:47.16,0:37:48.88,Default,,0000,0000,0000,,So if we go back to the dataset that I
Dialogue: 0,0:37:48.88,0:37:50.84,Default,,0000,0000,0000,,showed you just now, if you look at this
Dialogue: 0,0:37:50.84,0:37:52.68,Default,,0000,0000,0000,,target variable column, there are only
Dialogue: 0,0:37:52.68,0:37:54.52,Default,,0000,0000,0000,,two possible values here. They are either
Dialogue: 0,0:37:54.52,0:37:58.28,Default,,0000,0000,0000,,zero or one. Zero means there's no failure.
Dialogue: 0,0:37:58.28,0:38:01.24,Default,,0000,0000,0000,,One means there's a failure, okay? So this
Dialogue: 0,0:38:01.24,0:38:03.44,Default,,0000,0000,0000,,is an example of a binary classification.
Dialogue: 0,0:38:03.44,0:38:07.24,Default,,0000,0000,0000,,Only two possible outcomes, zero or one,
Dialogue: 0,0:38:07.24,0:38:10.12,Default,,0000,0000,0000,,didn't fail or fail, all right? Two
Dialogue: 0,0:38:10.12,0:38:13.08,Default,,0000,0000,0000,,possible outcomes. And then we can also,
Dialogue: 0,0:38:13.08,0:38:15.48,Default,,0000,0000,0000,,for the same dataset, we can extend it
Dialogue: 0,0:38:15.48,0:38:18.08,Default,,0000,0000,0000,,and make it a multiclass classification
Dialogue: 0,0:38:18.08,0:38:20.88,Default,,0000,0000,0000,,problem, all right? So if we kind of want
Dialogue: 0,0:38:20.88,0:38:23.72,Default,,0000,0000,0000,,to drill down further, we can say that
Dialogue: 0,0:38:23.72,0:38:26.80,Default,,0000,0000,0000,,not only is there a failure, we can
Dialogue: 0,0:38:26.80,0:38:29.20,Default,,0000,0000,0000,,actually say there are different types of
Dialogue: 0,0:38:29.20,0:38:32.44,Default,,0000,0000,0000,,failures, okay? So we have one category of
Dialogue: 0,0:38:32.44,0:38:35.60,Default,,0000,0000,0000,,class that is basically no failure, okay?
Dialogue: 0,0:38:35.60,0:38:37.40,Default,,0000,0000,0000,,Then we have a category for the
Dialogue: 0,0:38:37.40,0:38:40.40,Default,,0000,0000,0000,,different types of failures, right? So you
Dialogue: 0,0:38:40.40,0:38:43.92,Default,,0000,0000,0000,,can have a power failure, you could have
Dialogue: 0,0:38:43.92,0:38:46.40,Default,,0000,0000,0000,,a tool wear failure,
Dialogue: 0,0:38:46.40,0:38:48.92,Default,,0000,0000,0000,,you could have- let's go down
Dialogue: 0,0:38:48.92,0:38:50.88,Default,,0000,0000,0000,,here, you could have a overstrain
Dialogue: 0,0:38:50.88,0:38:53.76,Default,,0000,0000,0000,,failure, and etc, etc. So you can have
Dialogue: 0,0:38:53.76,0:38:57.16,Default,,0000,0000,0000,,multiple classes of failure in addition
Dialogue: 0,0:38:57.16,0:39:00.52,Default,,0000,0000,0000,,to the general overall or the majority
Dialogue: 0,0:39:00.52,0:39:04.32,Default,,0000,0000,0000,,class of no failure, and that would be a
Dialogue: 0,0:39:04.32,0:39:06.68,Default,,0000,0000,0000,,multiclass classification problem. So
Dialogue: 0,0:39:06.68,0:39:08.40,Default,,0000,0000,0000,,with this data set, we are going to see
Dialogue: 0,0:39:08.40,0:39:11.04,Default,,0000,0000,0000,,how to make it a binary classification
Dialogue: 0,0:39:11.04,0:39:12.80,Default,,0000,0000,0000,,problem and also a multiclass
Dialogue: 0,0:39:12.80,0:39:15.08,Default,,0000,0000,0000,,classification problem. Okay, so let's
Dialogue: 0,0:39:15.08,0:39:16.88,Default,,0000,0000,0000,,look at the workflow. So let's say we've
Dialogue: 0,0:39:16.88,0:39:18.88,Default,,0000,0000,0000,,already got the data, so right now we do
Dialogue: 0,0:39:18.88,0:39:20.84,Default,,0000,0000,0000,,have the dataset. This is the dataset
Dialogue: 0,0:39:20.84,0:39:22.72,Default,,0000,0000,0000,,that we have, so let's assume we've
Dialogue: 0,0:39:22.72,0:39:24.56,Default,,0000,0000,0000,,somehow managed to get this dataset
Dialogue: 0,0:39:24.56,0:39:26.88,Default,,0000,0000,0000,,from some IoT sensors that are
Dialogue: 0,0:39:26.88,0:39:29.12,Default,,0000,0000,0000,,monitoring real-time data in our
Dialogue: 0,0:39:29.12,0:39:31.08,Default,,0000,0000,0000,,production environment. On the assembly
Dialogue: 0,0:39:31.08,0:39:32.80,Default,,0000,0000,0000,,line, on the production line we've got
Dialogue: 0,0:39:32.80,0:39:34.68,Default,,0000,0000,0000,,sensors reading data that gives us all
Dialogue: 0,0:39:34.68,0:39:37.96,Default,,0000,0000,0000,,these data that we have in this CSV file.
Dialogue: 0,0:39:37.96,0:39:40.08,Default,,0000,0000,0000,,Okay, so we've already got the data, we've
Dialogue: 0,0:39:40.08,0:39:41.60,Default,,0000,0000,0000,,retrieved the data, now we're going to go
Dialogue: 0,0:39:41.60,0:39:45.00,Default,,0000,0000,0000,,on to the cleaning and exploration part
Dialogue: 0,0:39:45.00,0:39:47.52,Default,,0000,0000,0000,,of your machine learning life cycle. All
Dialogue: 0,0:39:47.52,0:39:49.80,Default,,0000,0000,0000,,right, so let's look at the data cleaning
Dialogue: 0,0:39:49.80,0:39:51.40,Default,,0000,0000,0000,,part. So the data cleaning part, we're
Dialogue: 0,0:39:51.40,0:39:53.72,Default,,0000,0000,0000,,interested in checking for missing
Dialogue: 0,0:39:53.72,0:39:56.20,Default,,0000,0000,0000,,values and maybe removing the rows you
Dialogue: 0,0:39:56.20,0:39:58.08,Default,,0000,0000,0000,,missing values, okay?
Dialogue: 0,0:39:58.08,0:39:59.76,Default,,0000,0000,0000,,So the kind of things we can- sorry,
Dialogue: 0,0:39:59.76,0:40:01.00,Default,,0000,0000,0000,,the kind of things we can do in missing
Dialogue: 0,0:40:01.00,0:40:02.88,Default,,0000,0000,0000,,values, we can remove the rows missing
Dialogue: 0,0:40:02.88,0:40:05.84,Default,,0000,0000,0000,,values, we can put in some new values,
Dialogue: 0,0:40:05.84,0:40:08.00,Default,,0000,0000,0000,,some replacement values which could be a
Dialogue: 0,0:40:08.00,0:40:09.88,Default,,0000,0000,0000,,average of all the values in that that
Dialogue: 0,0:40:09.88,0:40:12.88,Default,,0000,0000,0000,,particular column, etc, etc, we could also try to
Dialogue: 0,0:40:12.88,0:40:15.48,Default,,0000,0000,0000,,identify outliers in our data set and
Dialogue: 0,0:40:15.48,0:40:17.48,Default,,0000,0000,0000,,also there are a variety of ways to deal
Dialogue: 0,0:40:17.48,0:40:19.48,Default,,0000,0000,0000,,with that. So this is called data
Dialogue: 0,0:40:19.48,0:40:21.36,Default,,0000,0000,0000,,cleansing which is a really important
Dialogue: 0,0:40:21.36,0:40:23.32,Default,,0000,0000,0000,,part of your machine learning workflow,
Dialogue: 0,0:40:23.32,0:40:25.52,Default,,0000,0000,0000,,right? So that's where we are now at,
Dialogue: 0,0:40:25.52,0:40:26.84,Default,,0000,0000,0000,,we're doing cleansing, and then we're
Dialogue: 0,0:40:26.84,0:40:27.94,Default,,0000,0000,0000,,going to follow up with
Dialogue: 0,0:40:27.94,0:40:31.16,Default,,0000,0000,0000,,exploration. So let's look at the actual
Dialogue: 0,0:40:31.16,0:40:33.16,Default,,0000,0000,0000,,code that does the cleansing here. So
Dialogue: 0,0:40:33.16,0:40:35.80,Default,,0000,0000,0000,,here we are right at the start of the
Dialogue: 0,0:40:35.80,0:40:38.25,Default,,0000,0000,0000,,machine learning life cycle here, so
Dialogue: 0,0:40:38.25,0:40:40.84,Default,,0000,0000,0000,,this is a Jupyter notebook. So here we
Dialogue: 0,0:40:40.84,0:40:43.36,Default,,0000,0000,0000,,have a brief description of the problem
Dialogue: 0,0:40:43.36,0:40:45.92,Default,,0000,0000,0000,,statement, all right? So this dataset
Dialogue: 0,0:40:45.92,0:40:47.64,Default,,0000,0000,0000,,reflects real life predictive
Dialogue: 0,0:40:47.64,0:40:49.24,Default,,0000,0000,0000,,maintenance encountered industry with
Dialogue: 0,0:40:49.24,0:40:50.48,Default,,0000,0000,0000,,measurements from real equipment. The
Dialogue: 0,0:40:50.48,0:40:52.40,Default,,0000,0000,0000,,features description is taken directly
Dialogue: 0,0:40:52.40,0:40:54.52,Default,,0000,0000,0000,,from the data source set. So here we have
Dialogue: 0,0:40:54.52,0:40:57.40,Default,,0000,0000,0000,,a description of the six key features in
Dialogue: 0,0:40:57.40,0:40:59.60,Default,,0000,0000,0000,,our dataset type which is the quality
Dialogue: 0,0:40:59.60,0:41:02.52,Default,,0000,0000,0000,,of the product, the air temperature, the
Dialogue: 0,0:41:02.52,0:41:04.68,Default,,0000,0000,0000,,process temperature, the rotational speed,
Dialogue: 0,0:41:04.68,0:41:06.60,Default,,0000,0000,0000,,the torque, and the tool wear, all right? So
Dialogue: 0,0:41:06.60,0:41:08.88,Default,,0000,0000,0000,,these are the six feature variables, and
Dialogue: 0,0:41:08.88,0:41:11.32,Default,,0000,0000,0000,,there are the two target variables, so
Dialogue: 0,0:41:11.32,0:41:13.12,Default,,0000,0000,0000,,just now- I showed you just now there's
Dialogue: 0,0:41:13.12,0:41:15.12,Default,,0000,0000,0000,,one target variable which only has two
Dialogue: 0,0:41:15.12,0:41:17.44,Default,,0000,0000,0000,,possible values, either zero or one, okay?
Dialogue: 0,0:41:17.44,0:41:20.08,Default,,0000,0000,0000,,Zero or one means failure or no failure,
Dialogue: 0,0:41:20.08,0:41:23.08,Default,,0000,0000,0000,,so that will be this column here, right?
Dialogue: 0,0:41:23.08,0:41:24.88,Default,,0000,0000,0000,,So let me go all the way back up to here.
Dialogue: 0,0:41:24.88,0:41:26.64,Default,,0000,0000,0000,,So this column here, we already saw it
Dialogue: 0,0:41:26.64,0:41:29.44,Default,,0000,0000,0000,,only has two possible values, it's either zero or
Dialogue: 0,0:41:29.44,0:41:32.68,Default,,0000,0000,0000,,one. And then we also have this column
Dialogue: 0,0:41:32.68,0:41:35.04,Default,,0000,0000,0000,,here, and this column here is basically
Dialogue: 0,0:41:35.04,0:41:38.08,Default,,0000,0000,0000,,the failure type. And so the- we have- as I
Dialogue: 0,0:41:38.08,0:41:40.80,Default,,0000,0000,0000,,already demonstrated just now, we do have
Dialogue: 0,0:41:40.80,0:41:43.44,Default,,0000,0000,0000,,several categories of types of
Dialogue: 0,0:41:43.44,0:41:45.56,Default,,0000,0000,0000,,failure, and so here we call this
Dialogue: 0,0:41:45.56,0:41:46.24,Default,,0000,0000,0000,,multiclass
Dialogue: 0,0:41:46.24,0:41:50.00,Default,,0000,0000,0000,,classification. So we can either build a
Dialogue: 0,0:41:50.00,0:41:51.84,Default,,0000,0000,0000,,binary classification model for this
Dialogue: 0,0:41:51.84,0:41:53.52,Default,,0000,0000,0000,,problem domain, or we can build a
Dialogue: 0,0:41:53.52,0:41:54.49,Default,,0000,0000,0000,,multiclass
Dialogue: 0,0:41:54.49,0:41:58.12,Default,,0000,0000,0000,,classification problem, all right. So this
Dialogue: 0,0:41:58.12,0:41:59.84,Default,,0000,0000,0000,,Jupyter notebook is going to demonstrate
Dialogue: 0,0:41:59.84,0:42:02.32,Default,,0000,0000,0000,,both approaches to us. So first step, we
Dialogue: 0,0:42:02.32,0:42:04.80,Default,,0000,0000,0000,,are going to write all this Python code
Dialogue: 0,0:42:04.80,0:42:06.88,Default,,0000,0000,0000,,that's going to import all the libraries
Dialogue: 0,0:42:06.88,0:42:09.08,Default,,0000,0000,0000,,that we need to use, okay? So this is
Dialogue: 0,0:42:09.08,0:42:12.32,Default,,0000,0000,0000,,basically Python code, okay, and it's
Dialogue: 0,0:42:12.32,0:42:15.12,Default,,0000,0000,0000,,importing the relevant machine learn-
Dialogue: 0,0:42:15.12,0:42:17.96,Default,,0000,0000,0000,,oops. We are importing the relevant
Dialogue: 0,0:42:17.96,0:42:20.60,Default,,0000,0000,0000,,machine learning libraries related to
Dialogue: 0,0:42:20.60,0:42:23.52,Default,,0000,0000,0000,,our domain use case, okay? Then we load in
Dialogue: 0,0:42:23.52,0:42:26.44,Default,,0000,0000,0000,,our dataset, okay, so this our dataset.
Dialogue: 0,0:42:26.44,0:42:28.32,Default,,0000,0000,0000,,We describe it, we have some quick
Dialogue: 0,0:42:28.32,0:42:30.92,Default,,0000,0000,0000,,insights into the dataset. And then
Dialogue: 0,0:42:30.92,0:42:32.84,Default,,0000,0000,0000,,we just take a look at all the variables
Dialogue: 0,0:42:32.84,0:42:36.00,Default,,0000,0000,0000,,of the feature variables, etc, and so on.
Dialogue: 0,0:42:36.00,0:42:38.00,Default,,0000,0000,0000,,What we're doing now is just
Dialogue: 0,0:42:38.00,0:42:39.80,Default,,0000,0000,0000,,doing a quick overview of the dataset,
Dialogue: 0,0:42:39.80,0:42:41.56,Default,,0000,0000,0000,,so this all this Python code here that
Dialogue: 0,0:42:41.56,0:42:43.76,Default,,0000,0000,0000,,we're writing is allowing us, the data
Dialogue: 0,0:42:43.76,0:42:45.36,Default,,0000,0000,0000,,scientist, to get a quick overview of our
Dialogue: 0,0:42:45.36,0:42:48.21,Default,,0000,0000,0000,,dataset, right, okay, like how many varia-
Dialogue: 0,0:42:48.21,0:42:50.24,Default,,0000,0000,0000,,how many rows are there, how many columns
Dialogue: 0,0:42:50.24,0:42:51.76,Default,,0000,0000,0000,,are there, what are the data types of the
Dialogue: 0,0:42:51.76,0:42:53.44,Default,,0000,0000,0000,,columns, what are the name of the columns,
Dialogue: 0,0:42:53.44,0:42:57.36,Default,,0000,0000,0000,,etc, etc. Okay, then we zoom in on to the
Dialogue: 0,0:42:57.36,0:42:58.84,Default,,0000,0000,0000,,target variables. So we look at the
Dialogue: 0,0:42:58.84,0:43:02.00,Default,,0000,0000,0000,,target variables, how many counts
Dialogue: 0,0:43:02.00,0:43:04.52,Default,,0000,0000,0000,,there are of this target variable, and
Dialogue: 0,0:43:04.52,0:43:06.44,Default,,0000,0000,0000,,so on. How many different types of
Dialogue: 0,0:43:06.44,0:43:08.24,Default,,0000,0000,0000,,failures there are. Then you want to
Dialogue: 0,0:43:08.24,0:43:09.00,Default,,0000,0000,0000,,check whether there are any
Dialogue: 0,0:43:09.00,0:43:10.76,Default,,0000,0000,0000,,inconsistencies between the target and
Dialogue: 0,0:43:10.76,0:43:13.56,Default,,0000,0000,0000,,the failure type, etc. Okay, so when you do
Dialogue: 0,0:43:13.56,0:43:15.12,Default,,0000,0000,0000,,all this checking, you're going to
Dialogue: 0,0:43:15.12,0:43:16.96,Default,,0000,0000,0000,,discover there are some discrepancies in
Dialogue: 0,0:43:16.96,0:43:20.28,Default,,0000,0000,0000,,your dataset, so using a specific Python
Dialogue: 0,0:43:20.28,0:43:21.84,Default,,0000,0000,0000,,code to do checking, you're going to say
Dialogue: 0,0:43:21.84,0:43:23.48,Default,,0000,0000,0000,,hey, you know what? There's some errors
Dialogue: 0,0:43:23.48,0:43:25.00,Default,,0000,0000,0000,,here, right? There are nine values that
Dialogue: 0,0:43:25.00,0:43:26.60,Default,,0000,0000,0000,,classify as failure in target variable,
Dialogue: 0,0:43:26.60,0:43:28.20,Default,,0000,0000,0000,,but as no failure in the failure type
Dialogue: 0,0:43:28.20,0:43:29.72,Default,,0000,0000,0000,,variable, so that means there's a
Dialogue: 0,0:43:29.72,0:43:33.20,Default,,0000,0000,0000,,discrepancy in your data point, right?
Dialogue: 0,0:43:33.20,0:43:34.76,Default,,0000,0000,0000,,So these are all the ones that
Dialogue: 0,0:43:34.76,0:43:36.36,Default,,0000,0000,0000,,are discrepancies because the target
Dialogue: 0,0:43:36.36,0:43:39.00,Default,,0000,0000,0000,,variable says one, and we already know
Dialogue: 0,0:43:39.00,0:43:41.24,Default,,0000,0000,0000,,that target variable one is supposed to
Dialogue: 0,0:43:41.24,0:43:43.10,Default,,0000,0000,0000,,mean there is a failure, right? Target
Dialogue: 0,0:43:43.10,0:43:44.88,Default,,0000,0000,0000,,variable one is supposed to mean there is
Dialogue: 0,0:43:44.88,0:43:47.12,Default,,0000,0000,0000,,a failure, so we are kind of expecting to
Dialogue: 0,0:43:47.12,0:43:49.68,Default,,0000,0000,0000,,see the failure classification, but some
Dialogue: 0,0:43:49.68,0:43:51.40,Default,,0000,0000,0000,,rows actually say there's no failure
Dialogue: 0,0:43:51.40,0:43:53.80,Default,,0000,0000,0000,,although the target type is one. Well here
Dialogue: 0,0:43:53.80,0:43:55.92,Default,,0000,0000,0000,,is a classic example of an error that
Dialogue: 0,0:43:55.92,0:43:58.64,Default,,0000,0000,0000,,can very well occur in a dataset, so now
Dialogue: 0,0:43:58.64,0:44:00.56,Default,,0000,0000,0000,,the question is what do you do with
Dialogue: 0,0:44:00.56,0:44:04.72,Default,,0000,0000,0000,,these errors in your dataset, right? So
Dialogue: 0,0:44:04.72,0:44:06.24,Default,,0000,0000,0000,,here the data scientist says, I think it
Dialogue: 0,0:44:06.24,0:44:07.52,Default,,0000,0000,0000,,would make sense to remove those
Dialogue: 0,0:44:07.52,0:44:09.92,Default,,0000,0000,0000,,instances, and so they write some code
Dialogue: 0,0:44:09.92,0:44:12.68,Default,,0000,0000,0000,,then to remove those instances or those
Dialogue: 0,0:44:12.68,0:44:14.92,Default,,0000,0000,0000,,rows or data points from the overall
Dialogue: 0,0:44:14.92,0:44:17.28,Default,,0000,0000,0000,,data set, and same thing we can, again,
Dialogue: 0,0:44:17.28,0:44:19.24,Default,,0000,0000,0000,,check for other issues. So we find there's
Dialogue: 0,0:44:19.24,0:44:21.16,Default,,0000,0000,0000,,another issue here with our data set which
Dialogue: 0,0:44:21.16,0:44:24.08,Default,,0000,0000,0000,,is another warning, so, again, we can
Dialogue: 0,0:44:24.08,0:44:26.24,Default,,0000,0000,0000,,possibly remove them. So you're going to
Dialogue: 0,0:44:26.24,0:44:31.28,Default,,0000,0000,0000,,remove 27 instances or rows from your
Dialogue: 0,0:44:31.28,0:44:34.44,Default,,0000,0000,0000,,overall data set. So your data set has
Dialogue: 0,0:44:34.44,0:44:37.08,Default,,0000,0000,0000,,10,000 rows or data points. You're
Dialogue: 0,0:44:37.08,0:44:40.16,Default,,0000,0000,0000,,removing 27 which is only 0.27 of the
Dialogue: 0,0:44:40.16,0:44:42.24,Default,,0000,0000,0000,,entire dataset. And these were the
Dialogue: 0,0:44:42.24,0:44:45.72,Default,,0000,0000,0000,,reasons why you removed them, okay? So if
Dialogue: 0,0:44:45.72,0:44:48.16,Default,,0000,0000,0000,,you're just removing 0.27% of the
Dialogue: 0,0:44:48.16,0:44:50.80,Default,,0000,0000,0000,,entire dataset, no big deal, right? Still
Dialogue: 0,0:44:50.80,0:44:53.08,Default,,0000,0000,0000,,okay, but you needed to remove them
Dialogue: 0,0:44:53.08,0:44:55.00,Default,,0000,0000,0000,,because these errors right, these
Dialogue: 0,0:44:55.00,0:44:58.04,Default,,0000,0000,0000,,27
Dialogue: 0,0:44:58.04,0:45:00.56,Default,,0000,0000,0000,,errors, okay, data points with errors in
Dialogue: 0,0:45:00.56,0:45:02.96,Default,,0000,0000,0000,,your dataset could really affect the
Dialogue: 0,0:45:02.96,0:45:05.00,Default,,0000,0000,0000,,training of your machine learning model.
Dialogue: 0,0:45:05.00,0:45:08.64,Default,,0000,0000,0000,,So we need to do your data cleansing,
Dialogue: 0,0:45:08.64,0:45:11.72,Default,,0000,0000,0000,,right? So we are actually cleansing now
Dialogue: 0,0:45:11.72,0:45:15.20,Default,,0000,0000,0000,,some kind of data that is
Dialogue: 0,0:45:15.20,0:45:17.52,Default,,0000,0000,0000,,incorrect or erroneous in your original
Dialogue: 0,0:45:17.52,0:45:21.44,Default,,0000,0000,0000,,dataset. Okay, so then we go on to the
Dialogue: 0,0:45:21.44,0:45:23.84,Default,,0000,0000,0000,,next part which is called EDA, right? So
Dialogue: 0,0:45:23.84,0:45:28.88,Default,,0000,0000,0000,,EDA is where we kind of explore our data,
Dialogue: 0,0:45:28.88,0:45:31.72,Default,,0000,0000,0000,,and we want to, kind of, get a visual
Dialogue: 0,0:45:31.72,0:45:34.24,Default,,0000,0000,0000,,overview of our data as a whole, and also
Dialogue: 0,0:45:34.24,0:45:35.88,Default,,0000,0000,0000,,take a look at the statistical
Dialogue: 0,0:45:35.88,0:45:38.16,Default,,0000,0000,0000,,properties of our data. The statistical
Dialogue: 0,0:45:38.16,0:45:40.48,Default,,0000,0000,0000,,distribution of the data in all the
Dialogue: 0,0:45:40.48,0:45:43.08,Default,,0000,0000,0000,,various columns, the correlation between
Dialogue: 0,0:45:43.08,0:45:44.64,Default,,0000,0000,0000,,the variables, between the feature
Dialogue: 0,0:45:44.64,0:45:46.68,Default,,0000,0000,0000,,variables different columns, and also the
Dialogue: 0,0:45:46.68,0:45:48.60,Default,,0000,0000,0000,,feature variable and the target variable.
Dialogue: 0,0:45:48.60,0:45:52.04,Default,,0000,0000,0000,,So all of this is called EDA, and EDA in
Dialogue: 0,0:45:52.04,0:45:54.08,Default,,0000,0000,0000,,a machine learning workflow is typically
Dialogue: 0,0:45:54.08,0:45:57.16,Default,,0000,0000,0000,,done through visualization,
Dialogue: 0,0:45:57.16,0:45:58.84,Default,,0000,0000,0000,,all right? So let's go back here and take
Dialogue: 0,0:45:58.84,0:46:00.60,Default,,0000,0000,0000,,a look, right? So, for example, here we are
Dialogue: 0,0:46:00.60,0:46:03.40,Default,,0000,0000,0000,,looking at correlation, so we plot the
Dialogue: 0,0:46:03.40,0:46:05.68,Default,,0000,0000,0000,,values of all the various feature
Dialogue: 0,0:46:05.68,0:46:07.60,Default,,0000,0000,0000,,variables against each other and look
Dialogue: 0,0:46:07.60,0:46:10.80,Default,,0000,0000,0000,,for potential correlations and patterns
Dialogue: 0,0:46:10.80,0:46:13.36,Default,,0000,0000,0000,,and so on. And all the different shapes
Dialogue: 0,0:46:13.36,0:46:17.28,Default,,0000,0000,0000,,that you see here in this pair plot, okay,
Dialogue: 0,0:46:17.28,0:46:18.40,Default,,0000,0000,0000,,will have different meaning,
Dialogue: 0,0:46:18.40,0:46:20.00,Default,,0000,0000,0000,,statistical meaning, and so the data
Dialogue: 0,0:46:20.00,0:46:21.80,Default,,0000,0000,0000,,scientist has to, kind of, visually
Dialogue: 0,0:46:21.80,0:46:23.76,Default,,0000,0000,0000,,inspect this pair plot, make some
Dialogue: 0,0:46:23.76,0:46:25.56,Default,,0000,0000,0000,,interpretations of these different
Dialogue: 0,0:46:25.56,0:46:27.68,Default,,0000,0000,0000,,patterns that he sees here, all right. So
Dialogue: 0,0:46:27.68,0:46:30.48,Default,,0000,0000,0000,,these are some of the insights that
Dialogue: 0,0:46:30.48,0:46:32.84,Default,,0000,0000,0000,,can be deduced from looking at these
Dialogue: 0,0:46:32.84,0:46:34.32,Default,,0000,0000,0000,,patterns, so, for example, the torque and
Dialogue: 0,0:46:34.32,0:46:36.28,Default,,0000,0000,0000,,rotational speed are highly correlated,
Dialogue: 0,0:46:36.28,0:46:38.04,Default,,0000,0000,0000,,the process temperature and air
Dialogue: 0,0:46:38.04,0:46:39.92,Default,,0000,0000,0000,,temperature also highly correlated, that
Dialogue: 0,0:46:39.92,0:46:41.56,Default,,0000,0000,0000,,failures occur for extreme values of
Dialogue: 0,0:46:41.56,0:46:44.52,Default,,0000,0000,0000,,some features, etc, etc. Then you can plot
Dialogue: 0,0:46:44.52,0:46:45.96,Default,,0000,0000,0000,,certain kinds of charts. This called a
Dialogue: 0,0:46:45.96,0:46:48.48,Default,,0000,0000,0000,,violin chart to, again, get new insights.
Dialogue: 0,0:46:48.48,0:46:49.84,Default,,0000,0000,0000,,For example, regarding the torque and
Dialogue: 0,0:46:49.84,0:46:51.48,Default,,0000,0000,0000,,rotational speed, it can see, again, that
Dialogue: 0,0:46:51.48,0:46:53.12,Default,,0000,0000,0000,,most failures are triggered for much
Dialogue: 0,0:46:53.12,0:46:55.12,Default,,0000,0000,0000,,lower or much higher values than the
Dialogue: 0,0:46:55.12,0:46:57.40,Default,,0000,0000,0000,,mean when they're not failing. So all
Dialogue: 0,0:46:57.40,0:47:00.72,Default,,0000,0000,0000,,these visualizations, they are there, and
Dialogue: 0,0:47:00.72,0:47:02.48,Default,,0000,0000,0000,,a trained data scientist can look at
Dialogue: 0,0:47:02.48,0:47:05.08,Default,,0000,0000,0000,,them, inspect them, and make some kind of
Dialogue: 0,0:47:05.08,0:47:08.40,Default,,0000,0000,0000,,insightful deductions from them, okay?
Dialogue: 0,0:47:08.40,0:47:11.08,Default,,0000,0000,0000,,Percentage of failure, right? The
Dialogue: 0,0:47:11.08,0:47:13.64,Default,,0000,0000,0000,,correlation heat map, okay, between all
Dialogue: 0,0:47:13.64,0:47:15.56,Default,,0000,0000,0000,,these different feature variables, and
Dialogue: 0,0:47:15.56,0:47:16.43,Default,,0000,0000,0000,,also the target
Dialogue: 0,0:47:16.43,0:47:19.60,Default,,0000,0000,0000,,variable, okay? The product types,
Dialogue: 0,0:47:19.60,0:47:21.08,Default,,0000,0000,0000,,percentage of product types, percentage
Dialogue: 0,0:47:21.08,0:47:23.16,Default,,0000,0000,0000,,of failure with respect to the product
Dialogue: 0,0:47:23.16,0:47:25.72,Default,,0000,0000,0000,,type, so we can also kind of visualize
Dialogue: 0,0:47:25.72,0:47:27.80,Default,,0000,0000,0000,,that as well. So certain products have a
Dialogue: 0,0:47:27.80,0:47:29.84,Default,,0000,0000,0000,,higher ratio of failure compared to other
Dialogue: 0,0:47:29.84,0:47:33.24,Default,,0000,0000,0000,,product types, etc. Or, for example, M
Dialogue: 0,0:47:33.24,0:47:35.80,Default,,0000,0000,0000,,tends to fail more than H products, etc,
Dialogue: 0,0:47:35.80,0:47:38.88,Default,,0000,0000,0000,,etc. So we can create a vast variety of
Dialogue: 0,0:47:38.88,0:47:41.32,Default,,0000,0000,0000,,visualizations in the EDA stage, so you
Dialogue: 0,0:47:41.32,0:47:43.96,Default,,0000,0000,0000,,can see here. And, again, the idea of this
Dialogue: 0,0:47:43.96,0:47:46.36,Default,,0000,0000,0000,,visualization is just to give us some
Dialogue: 0,0:47:46.36,0:47:49.68,Default,,0000,0000,0000,,insight, some preliminary insight into
Dialogue: 0,0:47:49.68,0:47:52.52,Default,,0000,0000,0000,,our dataset that helps us to model it
Dialogue: 0,0:47:52.52,0:47:54.12,Default,,0000,0000,0000,,more correctly. So some more insights
Dialogue: 0,0:47:54.12,0:47:56.20,Default,,0000,0000,0000,,that we get into our data set from all
Dialogue: 0,0:47:56.20,0:47:57.60,Default,,0000,0000,0000,,this visualization.
Dialogue: 0,0:47:57.60,0:47:59.56,Default,,0000,0000,0000,,Then we can plot the distribution so we
Dialogue: 0,0:47:59.56,0:48:00.72,Default,,0000,0000,0000,,can see whether it's a normal
Dialogue: 0,0:48:00.72,0:48:02.79,Default,,0000,0000,0000,,distribution or some other kind of
Dialogue: 0,0:48:02.79,0:48:05.64,Default,,0000,0000,0000,,distribution. We can have a box plot
Dialogue: 0,0:48:05.64,0:48:07.76,Default,,0000,0000,0000,,to see whether there are any outliers in
Dialogue: 0,0:48:07.76,0:48:10.40,Default,,0000,0000,0000,,your data set and so on, right? So we can
Dialogue: 0,0:48:10.40,0:48:11.64,Default,,0000,0000,0000,,see from the box plots, we can see
Dialogue: 0,0:48:11.64,0:48:14.60,Default,,0000,0000,0000,,rotational speed and have outliers. So we
Dialogue: 0,0:48:14.60,0:48:16.88,Default,,0000,0000,0000,,already saw outliers are basically a
Dialogue: 0,0:48:16.88,0:48:18.80,Default,,0000,0000,0000,,problem that you may need to kind of
Dialogue: 0,0:48:18.80,0:48:22.52,Default,,0000,0000,0000,,tackle, right? So outliers are an issue,
Dialogue: 0,0:48:22.52,0:48:24.80,Default,,0000,0000,0000,,it's a part of data cleansing. And
Dialogue: 0,0:48:24.80,0:48:26.96,Default,,0000,0000,0000,,so you may need to tackle this, so we may
Dialogue: 0,0:48:26.96,0:48:28.88,Default,,0000,0000,0000,,have to check okay, well where are the
Dialogue: 0,0:48:28.88,0:48:31.32,Default,,0000,0000,0000,,potential outliers so we can analyze
Dialogue: 0,0:48:31.32,0:48:35.32,Default,,0000,0000,0000,,them from the box plot, okay? But then
Dialogue: 0,0:48:35.32,0:48:37.08,Default,,0000,0000,0000,,we can say well they are outliers, but
Dialogue: 0,0:48:37.08,0:48:38.80,Default,,0000,0000,0000,,maybe they're not really horrible
Dialogue: 0,0:48:38.80,0:48:40.76,Default,,0000,0000,0000,,outliers so we can tolerate them or
Dialogue: 0,0:48:40.76,0:48:42.88,Default,,0000,0000,0000,,maybe we want to remove them. So we can
Dialogue: 0,0:48:42.88,0:48:44.92,Default,,0000,0000,0000,,see what our mean and maximum values for
Dialogue: 0,0:48:44.92,0:48:46.72,Default,,0000,0000,0000,,all these with respect to product type,
Dialogue: 0,0:48:46.72,0:48:49.68,Default,,0000,0000,0000,,how many of them are above or highly
Dialogue: 0,0:48:49.68,0:48:51.44,Default,,0000,0000,0000,,correlated with the product type in
Dialogue: 0,0:48:51.44,0:48:54.24,Default,,0000,0000,0000,,terms of the maximum and minimum, okay,
Dialogue: 0,0:48:54.24,0:48:56.96,Default,,0000,0000,0000,,and then so on. So the insight is well we
Dialogue: 0,0:48:56.96,0:48:59.60,Default,,0000,0000,0000,,got 4.8% of the instances are outliers,
Dialogue: 0,0:48:59.60,0:49:02.56,Default,,0000,0000,0000,,so maybe 4.87% is not really that much,
Dialogue: 0,0:49:02.56,0:49:04.92,Default,,0000,0000,0000,,the outliers are not horrible, so we just
Dialogue: 0,0:49:04.92,0:49:06.96,Default,,0000,0000,0000,,leave them in the dataset. Now for a
Dialogue: 0,0:49:06.96,0:49:08.52,Default,,0000,0000,0000,,different dataset, the data scientist
Dialogue: 0,0:49:08.52,0:49:10.28,Default,,0000,0000,0000,,could come to a different conclusion, so
Dialogue: 0,0:49:10.28,0:49:12.28,Default,,0000,0000,0000,,then they would do whatever they've
Dialogue: 0,0:49:12.28,0:49:15.40,Default,,0000,0000,0000,,deemed is appropriate to, kind of, cleanse
Dialogue: 0,0:49:15.40,0:49:18.08,Default,,0000,0000,0000,,the dataset. Okay, so now that we have
Dialogue: 0,0:49:18.08,0:49:20.00,Default,,0000,0000,0000,,done all the EDA, the next thing we're
Dialogue: 0,0:49:20.00,0:49:23.16,Default,,0000,0000,0000,,going to do is we are going to do what
Dialogue: 0,0:49:23.16,0:49:26.20,Default,,0000,0000,0000,,is called feature engineering. So we are
Dialogue: 0,0:49:26.20,0:49:28.76,Default,,0000,0000,0000,,going to transform our original feature
Dialogue: 0,0:49:28.76,0:49:31.28,Default,,0000,0000,0000,,variables and these are our original
Dialogue: 0,0:49:31.28,0:49:32.96,Default,,0000,0000,0000,,feature variables, right? These are our
Dialogue: 0,0:49:32.96,0:49:35.04,Default,,0000,0000,0000,,original feature variables, and we are
Dialogue: 0,0:49:35.04,0:49:37.76,Default,,0000,0000,0000,,going to transform them, all right? We're
Dialogue: 0,0:49:37.76,0:49:40.32,Default,,0000,0000,0000,,going to transform them in some sense
Dialogue: 0,0:49:40.32,0:49:43.76,Default,,0000,0000,0000,,into some other form before we fit this
Dialogue: 0,0:49:43.76,0:49:45.64,Default,,0000,0000,0000,,for training into our machine learning
Dialogue: 0,0:49:45.64,0:49:48.60,Default,,0000,0000,0000,,algorithm, all right? So these are
Dialogue: 0,0:49:48.60,0:49:51.60,Default,,0000,0000,0000,,examples of- let's say these are examples of a
Dialogue: 0,0:49:51.60,0:49:55.20,Default,,0000,0000,0000,,original data set, right? And this is
Dialogue: 0,0:49:55.20,0:49:56.84,Default,,0000,0000,0000,,examples, these are some of the examples,
Dialogue: 0,0:49:56.84,0:49:58.04,Default,,0000,0000,0000,,you don't have to use all of them, but
Dialogue: 0,0:49:58.04,0:49:59.44,Default,,0000,0000,0000,,these are some of the examples of what we
Dialogue: 0,0:49:59.44,0:50:00.84,Default,,0000,0000,0000,,call feature engineering which you can
Dialogue: 0,0:50:00.84,0:50:03.56,Default,,0000,0000,0000,,then transform your original values in
Dialogue: 0,0:50:03.56,0:50:05.28,Default,,0000,0000,0000,,your feature variables to all these
Dialogue: 0,0:50:05.28,0:50:07.92,Default,,0000,0000,0000,,transform values here. So we're going to
Dialogue: 0,0:50:07.92,0:50:09.68,Default,,0000,0000,0000,,pretty much do that here, so we have a
Dialogue: 0,0:50:09.68,0:50:12.60,Default,,0000,0000,0000,,ordinal encoding, we do scaling of the
Dialogue: 0,0:50:12.60,0:50:14.84,Default,,0000,0000,0000,,data so the dataset is scaled, we use a
Dialogue: 0,0:50:14.84,0:50:18.24,Default,,0000,0000,0000,,MinMax scaling, and then finally, we come
Dialogue: 0,0:50:18.24,0:50:21.72,Default,,0000,0000,0000,,to do a modeling. So we have to split our
Dialogue: 0,0:50:21.72,0:50:24.36,Default,,0000,0000,0000,,dataset into a training dataset and a
Dialogue: 0,0:50:24.36,0:50:28.64,Default,,0000,0000,0000,,test dataset. So coming back to here again,
Dialogue: 0,0:50:28.64,0:50:32.16,Default,,0000,0000,0000,,we said that before you train your
Dialogue: 0,0:50:32.16,0:50:33.80,Default,,0000,0000,0000,,model, sorry, before you train your model,
Dialogue: 0,0:50:33.80,0:50:35.60,Default,,0000,0000,0000,,you have to take your original dataset,
Dialogue: 0,0:50:35.60,0:50:37.32,Default,,0000,0000,0000,,now this is a featured engineered dataset.
Dialogue: 0,0:50:37.32,0:50:38.84,Default,,0000,0000,0000,,We're going to break it into two or
Dialogue: 0,0:50:38.84,0:50:40.84,Default,,0000,0000,0000,,more subsets, okay. So one is called the
Dialogue: 0,0:50:40.84,0:50:42.40,Default,,0000,0000,0000,,training dataset that we use to feed
Dialogue: 0,0:50:42.40,0:50:44.00,Default,,0000,0000,0000,,and train a machine learning model. The
Dialogue: 0,0:50:44.00,0:50:45.92,Default,,0000,0000,0000,,second is test dataset to evaluate the
Dialogue: 0,0:50:45.92,0:50:47.96,Default,,0000,0000,0000,,accuracy of the model, okay? So we got
Dialogue: 0,0:50:47.96,0:50:50.94,Default,,0000,0000,0000,,this training dataset, your test dataset,
Dialogue: 0,0:50:50.94,0:50:52.72,Default,,0000,0000,0000,,and we also need
Dialogue: 0,0:50:52.72,0:50:56.16,Default,,0000,0000,0000,,to sample. So from our original data set
Dialogue: 0,0:50:56.16,0:50:57.40,Default,,0000,0000,0000,,we need to sample some points
Dialogue: 0,0:50:57.40,0:50:58.84,Default,,0000,0000,0000,,that go into your training dataset, some
Dialogue: 0,0:50:58.84,0:51:00.56,Default,,0000,0000,0000,,points that go in your test dataset. So
Dialogue: 0,0:51:00.56,0:51:02.72,Default,,0000,0000,0000,,there are many ways to do sampling. One
Dialogue: 0,0:51:02.72,0:51:04.92,Default,,0000,0000,0000,,way is to do stratified sampling where
Dialogue: 0,0:51:04.92,0:51:06.72,Default,,0000,0000,0000,,we ensure the same proportion of data
Dialogue: 0,0:51:06.72,0:51:09.00,Default,,0000,0000,0000,,from each stata or class because right
Dialogue: 0,0:51:09.00,0:51:10.96,Default,,0000,0000,0000,,now we have a multiclass classification
Dialogue: 0,0:51:10.96,0:51:12.32,Default,,0000,0000,0000,,problem, so you want to make sure the
Dialogue: 0,0:51:12.32,0:51:13.96,Default,,0000,0000,0000,,same proportion of data from each strata or
Dialogue: 0,0:51:13.96,0:51:15.84,Default,,0000,0000,0000,,class is equally proportional in the
Dialogue: 0,0:51:15.84,0:51:17.92,Default,,0000,0000,0000,,training and test dataset as the
Dialogue: 0,0:51:17.92,0:51:20.12,Default,,0000,0000,0000,,original dataset which is very useful
Dialogue: 0,0:51:20.12,0:51:21.64,Default,,0000,0000,0000,,for dealing with what is called an
Dialogue: 0,0:51:21.64,0:51:24.32,Default,,0000,0000,0000,,imbalanced dataset. So here we have an
Dialogue: 0,0:51:24.32,0:51:25.84,Default,,0000,0000,0000,,example of what is called an imbalanced
Dialogue: 0,0:51:25.84,0:51:29.52,Default,,0000,0000,0000,,dataset in the sense that you have the
Dialogue: 0,0:51:29.52,0:51:32.76,Default,,0000,0000,0000,,vast majority of data points in your
Dialogue: 0,0:51:32.76,0:51:34.96,Default,,0000,0000,0000,,data set, they are going to have the
Dialogue: 0,0:51:34.96,0:51:37.48,Default,,0000,0000,0000,,value of zero for their target variable
Dialogue: 0,0:51:37.48,0:51:40.20,Default,,0000,0000,0000,,column. So only a extremely small
Dialogue: 0,0:51:40.20,0:51:43.44,Default,,0000,0000,0000,,minority of the data points in your dataset
Dialogue: 0,0:51:43.44,0:51:45.32,Default,,0000,0000,0000,,will actually have the value of one
Dialogue: 0,0:51:45.32,0:51:48.72,Default,,0000,0000,0000,,for their target variable column, okay? So
Dialogue: 0,0:51:48.72,0:51:51.04,Default,,0000,0000,0000,,a situation where you have your class or
Dialogue: 0,0:51:51.04,0:51:52.52,Default,,0000,0000,0000,,your target variable column where the
Dialogue: 0,0:51:52.52,0:51:54.48,Default,,0000,0000,0000,,vast majority of values are from one
Dialogue: 0,0:51:54.48,0:51:58.12,Default,,0000,0000,0000,,class and a tiny small minority are from
Dialogue: 0,0:51:58.12,0:52:00.52,Default,,0000,0000,0000,,another class, we call this an imbalanced
Dialogue: 0,0:52:00.52,0:52:02.72,Default,,0000,0000,0000,,dataset. And for an imbalanced dataset,
Dialogue: 0,0:52:02.72,0:52:04.32,Default,,0000,0000,0000,,typically we will have a specific
Dialogue: 0,0:52:04.32,0:52:05.92,Default,,0000,0000,0000,,technique to do the train test split
Dialogue: 0,0:52:05.92,0:52:08.12,Default,,0000,0000,0000,,which is called stratified sampling, and
Dialogue: 0,0:52:08.12,0:52:09.60,Default,,0000,0000,0000,,so that's what's exactly happening here.
Dialogue: 0,0:52:09.60,0:52:12.00,Default,,0000,0000,0000,,We're doing a stratified split here, so
Dialogue: 0,0:52:12.00,0:52:14.84,Default,,0000,0000,0000,,we are doing a train test split here,
Dialogue: 0,0:52:14.84,0:52:17.52,Default,,0000,0000,0000,,and we are doing a stratified split.
Dialogue: 0,0:52:17.52,0:52:20.36,Default,,0000,0000,0000,,And then now we actually develop the
Dialogue: 0,0:52:20.36,0:52:23.36,Default,,0000,0000,0000,,models. So now we've got the train test
Dialogue: 0,0:52:23.36,0:52:25.48,Default,,0000,0000,0000,,split, now here is where we actually
Dialogue: 0,0:52:25.48,0:52:27.08,Default,,0000,0000,0000,,train the models.
Dialogue: 0,0:52:27.08,0:52:29.92,Default,,0000,0000,0000,,Now in terms of classification there are
Dialogue: 0,0:52:29.92,0:52:31.30,Default,,0000,0000,0000,,a whole bunch of
Dialogue: 0,0:52:31.30,0:52:35.40,Default,,0000,0000,0000,,possibilities, right, that you can use.
Dialogue: 0,0:52:35.40,0:52:38.48,Default,,0000,0000,0000,,There are many, many different algorithms
Dialogue: 0,0:52:38.48,0:52:41.00,Default,,0000,0000,0000,,that we can use to create a
Dialogue: 0,0:52:41.00,0:52:42.84,Default,,0000,0000,0000,,classification model. So these are an
Dialogue: 0,0:52:42.84,0:52:45.08,Default,,0000,0000,0000,,example of some of the more common ones.
Dialogue: 0,0:52:45.08,0:52:47.48,Default,,0000,0000,0000,,Logistic, support vector machine, decision
Dialogue: 0,0:52:47.48,0:52:49.52,Default,,0000,0000,0000,,trees, random forest, bagging, balanced
Dialogue: 0,0:52:49.52,0:52:52.72,Default,,0000,0000,0000,,bagging, boost, ensemble. So all
Dialogue: 0,0:52:52.72,0:52:55.04,Default,,0000,0000,0000,,these are different algorithms which
Dialogue: 0,0:52:55.04,0:52:57.76,Default,,0000,0000,0000,,will create different kinds of models
Dialogue: 0,0:52:57.76,0:53:01.60,Default,,0000,0000,0000,,which will result in different accuracy
Dialogue: 0,0:53:01.60,0:53:05.40,Default,,0000,0000,0000,,measures, okay? So it's the goal of the
Dialogue: 0,0:53:05.40,0:53:08.92,Default,,0000,0000,0000,,data scientist to find the best model
Dialogue: 0,0:53:08.92,0:53:11.52,Default,,0000,0000,0000,,that gives the best accuracy for the
Dialogue: 0,0:53:11.52,0:53:14.12,Default,,0000,0000,0000,,given dataset, for training on that
Dialogue: 0,0:53:14.12,0:53:16.88,Default,,0000,0000,0000,,given dataset. So let's head back, again,
Dialogue: 0,0:53:16.88,0:53:19.76,Default,,0000,0000,0000,,to our machine learning workflow. So
Dialogue: 0,0:53:19.76,0:53:21.52,Default,,0000,0000,0000,,here basically what I'm doing is I'm
Dialogue: 0,0:53:21.52,0:53:23.52,Default,,0000,0000,0000,,creating a whole bunch of models here,
Dialogue: 0,0:53:23.52,0:53:25.52,Default,,0000,0000,0000,,all right? So one is a random forest, one
Dialogue: 0,0:53:25.52,0:53:27.16,Default,,0000,0000,0000,,is balanced bagging, one is a boost
Dialogue: 0,0:53:27.16,0:53:29.52,Default,,0000,0000,0000,,classifier, one's a ensemble classifier,
Dialogue: 0,0:53:29.52,0:53:32.76,Default,,0000,0000,0000,,and using all of these, I am going to
Dialogue: 0,0:53:32.76,0:53:35.32,Default,,0000,0000,0000,,basically feed or train my model using
Dialogue: 0,0:53:35.32,0:53:37.44,Default,,0000,0000,0000,,all these algorithms. And then I'm going
Dialogue: 0,0:53:37.44,0:53:39.80,Default,,0000,0000,0000,,to evaluate them, okay? I'm going to
Dialogue: 0,0:53:39.80,0:53:42.48,Default,,0000,0000,0000,,evaluate how good each of these models
Dialogue: 0,0:53:42.48,0:53:45.76,Default,,0000,0000,0000,,are. And here you can see your
Dialogue: 0,0:53:45.76,0:53:48.84,Default,,0000,0000,0000,,evaluation data, right? Okay and this is
Dialogue: 0,0:53:48.84,0:53:50.84,Default,,0000,0000,0000,,the confusion matrix which is another
Dialogue: 0,0:53:50.84,0:53:54.28,Default,,0000,0000,0000,,way of evaluating. So now we come to the,
Dialogue: 0,0:53:54.28,0:53:56.32,Default,,0000,0000,0000,,kind of, the key part here which
Dialogue: 0,0:53:56.32,0:53:58.52,Default,,0000,0000,0000,,is how do I distinguish between
Dialogue: 0,0:53:58.52,0:54:00.08,Default,,0000,0000,0000,,all these models, right? I've got all
Dialogue: 0,0:54:00.08,0:54:01.40,Default,,0000,0000,0000,,these different models which are built
Dialogue: 0,0:54:01.40,0:54:03.04,Default,,0000,0000,0000,,with different algorithms which I'm
Dialogue: 0,0:54:03.04,0:54:05.36,Default,,0000,0000,0000,,using to train on the same dataset, how
Dialogue: 0,0:54:05.36,0:54:07.36,Default,,0000,0000,0000,,do I distinguish between all these
Dialogue: 0,0:54:07.36,0:54:10.36,Default,,0000,0000,0000,,models, okay? And so for that sense, for
Dialogue: 0,0:54:10.36,0:54:13.88,Default,,0000,0000,0000,,that we actually have a whole bunch of
Dialogue: 0,0:54:13.88,0:54:16.20,Default,,0000,0000,0000,,common evaluation metrics for
Dialogue: 0,0:54:16.20,0:54:18.32,Default,,0000,0000,0000,,classification, right? So this evaluation
Dialogue: 0,0:54:18.32,0:54:22.24,Default,,0000,0000,0000,,metrics tell us how good a model is in
Dialogue: 0,0:54:22.24,0:54:24.32,Default,,0000,0000,0000,,terms of its accuracy in
Dialogue: 0,0:54:24.32,0:54:27.00,Default,,0000,0000,0000,,classification. So in terms of
Dialogue: 0,0:54:27.00,0:54:29.44,Default,,0000,0000,0000,,accuracy, we actually have many different
Dialogue: 0,0:54:29.44,0:54:31.68,Default,,0000,0000,0000,,models, sorry, many different measures,
Dialogue: 0,0:54:31.68,0:54:33.44,Default,,0000,0000,0000,,right? You might think well, accuracy is
Dialogue: 0,0:54:33.44,0:54:35.40,Default,,0000,0000,0000,,just accuracy, well that's all right, it's
Dialogue: 0,0:54:35.40,0:54:36.88,Default,,0000,0000,0000,,just either it's accurate or it's not
Dialogue: 0,0:54:36.88,0:54:39.32,Default,,0000,0000,0000,,accurate, right? But actually it's not
Dialogue: 0,0:54:39.32,0:54:41.36,Default,,0000,0000,0000,,that simple. There are many different
Dialogue: 0,0:54:41.36,0:54:43.84,Default,,0000,0000,0000,,ways to measure the accuracy of a
Dialogue: 0,0:54:43.84,0:54:45.48,Default,,0000,0000,0000,,classification model, and these are some
Dialogue: 0,0:54:45.48,0:54:48.28,Default,,0000,0000,0000,,of the more common ones. So, for example,
Dialogue: 0,0:54:48.28,0:54:51.00,Default,,0000,0000,0000,,the confusion matrix tells us how many
Dialogue: 0,0:54:51.00,0:54:54.00,Default,,0000,0000,0000,,true positives, that means the value is
Dialogue: 0,0:54:54.00,0:54:55.88,Default,,0000,0000,0000,,positive, the prediction is positive, how
Dialogue: 0,0:54:55.88,0:54:57.52,Default,,0000,0000,0000,,many false positives which means the
Dialogue: 0,0:54:57.52,0:54:59.04,Default,,0000,0000,0000,,value is negative the machine learning
Dialogue: 0,0:54:59.04,0:55:01.84,Default,,0000,0000,0000,,model predicts positive. How many false
Dialogue: 0,0:55:01.84,0:55:03.84,Default,,0000,0000,0000,,negatives which means that the machine
Dialogue: 0,0:55:03.84,0:55:05.56,Default,,0000,0000,0000,,learning model predicts negative, but
Dialogue: 0,0:55:05.56,0:55:07.48,Default,,0000,0000,0000,,it's actually positive. And how many true
Dialogue: 0,0:55:07.48,0:55:09.36,Default,,0000,0000,0000,,negatives there are which means that the
Dialogue: 0,0:55:09.36,0:55:11.24,Default,,0000,0000,0000,,the machine learning model
Dialogue: 0,0:55:11.24,0:55:12.88,Default,,0000,0000,0000,,predicts negative and the true value is
Dialogue: 0,0:55:12.88,0:55:14.76,Default,,0000,0000,0000,,also negative. So this is called a
Dialogue: 0,0:55:14.76,0:55:16.92,Default,,0000,0000,0000,,confusion matrix. This is one way we
Dialogue: 0,0:55:16.92,0:55:19.48,Default,,0000,0000,0000,,assess or evaluate the performance of a
Dialogue: 0,0:55:19.48,0:55:20.52,Default,,0000,0000,0000,,classification model,
Dialogue: 0,0:55:20.52,0:55:23.32,Default,,0000,0000,0000,,okay? This is for binary
Dialogue: 0,0:55:23.32,0:55:24.68,Default,,0000,0000,0000,,classification, we can also have
Dialogue: 0,0:55:24.68,0:55:26.88,Default,,0000,0000,0000,,multiclass confusion matrix,
Dialogue: 0,0:55:26.88,0:55:29.00,Default,,0000,0000,0000,,and then we can also measure things like
Dialogue: 0,0:55:29.00,0:55:31.72,Default,,0000,0000,0000,,accuracy. So accuracy is the true
Dialogue: 0,0:55:31.72,0:55:34.08,Default,,0000,0000,0000,,positives plus the true negatives which
Dialogue: 0,0:55:34.08,0:55:35.44,Default,,0000,0000,0000,,is the total number of correct
Dialogue: 0,0:55:35.44,0:55:37.84,Default,,0000,0000,0000,,predictions made by the model divided by
Dialogue: 0,0:55:37.84,0:55:39.84,Default,,0000,0000,0000,,the total number of data points in your
Dialogue: 0,0:55:39.84,0:55:42.60,Default,,0000,0000,0000,,dataset. And then you have also other
Dialogue: 0,0:55:42.60,0:55:43.15,Default,,0000,0000,0000,,kinds of
Dialogue: 0,0:55:43.15,0:55:46.60,Default,,0000,0000,0000,,measures such as recall. And this a
Dialogue: 0,0:55:46.60,0:55:49.16,Default,,0000,0000,0000,,formula for recall, this is a formula for
Dialogue: 0,0:55:49.16,0:55:51.48,Default,,0000,0000,0000,,the F1 score, okay? And then there's
Dialogue: 0,0:55:51.48,0:55:55.56,Default,,0000,0000,0000,,something called the ROC curve, right? So
Dialogue: 0,0:55:55.56,0:55:57.04,Default,,0000,0000,0000,,without going too much in the detail of
Dialogue: 0,0:55:57.04,0:55:59.00,Default,,0000,0000,0000,,what each of these entails, essentially
Dialogue: 0,0:55:59.00,0:56:00.64,Default,,0000,0000,0000,,these are all different ways, these are
Dialogue: 0,0:56:00.64,0:56:03.28,Default,,0000,0000,0000,,different KPI, right? Just like if you
Dialogue: 0,0:56:03.28,0:56:06.12,Default,,0000,0000,0000,,work in a company, you have different KPI,
Dialogue: 0,0:56:06.12,0:56:08.08,Default,,0000,0000,0000,,right? Certain employees have certain KPI
Dialogue: 0,0:56:08.08,0:56:11.28,Default,,0000,0000,0000,,that measures how good or how, you
Dialogue: 0,0:56:11.28,0:56:13.20,Default,,0000,0000,0000,,know, efficient or how effective a
Dialogue: 0,0:56:13.20,0:56:15.50,Default,,0000,0000,0000,,particular employee is, right? So the
Dialogue: 0,0:56:15.50,0:56:19.88,Default,,0000,0000,0000,,KPI for your machine learning models
Dialogue: 0,0:56:19.88,0:56:24.24,Default,,0000,0000,0000,,are ROC curve, F1 score, recall, accuracy,
Dialogue: 0,0:56:24.24,0:56:26.60,Default,,0000,0000,0000,,okay, and your confusion matrix. So
Dialogue: 0,0:56:26.60,0:56:29.84,Default,,0000,0000,0000,,fundamentally after I have built, right,
Dialogue: 0,0:56:29.84,0:56:33.36,Default,,0000,0000,0000,,so here I've built my four different
Dialogue: 0,0:56:33.36,0:56:35.24,Default,,0000,0000,0000,,models. So after I built these four
Dialogue: 0,0:56:35.24,0:56:37.64,Default,,0000,0000,0000,,different models, I'm going to check and
Dialogue: 0,0:56:37.64,0:56:39.68,Default,,0000,0000,0000,,evaluate them using all those different
Dialogue: 0,0:56:39.68,0:56:42.44,Default,,0000,0000,0000,,metrics like, for example, the F1 score,
Dialogue: 0,0:56:42.44,0:56:44.84,Default,,0000,0000,0000,,the precision score, the recall score, all
Dialogue: 0,0:56:44.84,0:56:47.32,Default,,0000,0000,0000,,right. So for this model, I can check out
Dialogue: 0,0:56:47.32,0:56:50.04,Default,,0000,0000,0000,,the ROC score, the F1 score, the precision
Dialogue: 0,0:56:50.04,0:56:52.12,Default,,0000,0000,0000,,score, the recall score. Then for this
Dialogue: 0,0:56:52.12,0:56:54.80,Default,,0000,0000,0000,,model, this is the ROC score, the F1 score,
Dialogue: 0,0:56:54.80,0:56:56.84,Default,,0000,0000,0000,,the precision score, the recall score.
Dialogue: 0,0:56:56.84,0:56:59.68,Default,,0000,0000,0000,,Then for this model and so on. So for
Dialogue: 0,0:56:59.68,0:57:03.24,Default,,0000,0000,0000,,every single model I've created using my
Dialogue: 0,0:57:03.24,0:57:05.84,Default,,0000,0000,0000,,training data set, I will have all my set
Dialogue: 0,0:57:05.84,0:57:08.00,Default,,0000,0000,0000,,of evaluation metrics that I can use to
Dialogue: 0,0:57:08.00,0:57:11.84,Default,,0000,0000,0000,,evaluate how good this model is, okay?
Dialogue: 0,0:57:11.84,0:57:13.12,Default,,0000,0000,0000,,Same thing here, I've got a confusion
Dialogue: 0,0:57:13.12,0:57:15.08,Default,,0000,0000,0000,,matrix here, right, so I can use that,
Dialogue: 0,0:57:15.08,0:57:18.12,Default,,0000,0000,0000,,again, to evaluate between all these four
Dialogue: 0,0:57:18.12,0:57:20.20,Default,,0000,0000,0000,,different models, and then I, kind of,
Dialogue: 0,0:57:20.20,0:57:22.24,Default,,0000,0000,0000,,summarize it up here. So we can see from
Dialogue: 0,0:57:22.24,0:57:25.44,Default,,0000,0000,0000,,this summary here that actually the top
Dialogue: 0,0:57:25.44,0:57:27.60,Default,,0000,0000,0000,,two models, right, which are I'm going to
Dialogue: 0,0:57:27.60,0:57:29.44,Default,,0000,0000,0000,,give a lot, as a data scientist, I'm now
Dialogue: 0,0:57:29.44,0:57:31.12,Default,,0000,0000,0000,,going to just focus on these two models.
Dialogue: 0,0:57:31.12,0:57:33.44,Default,,0000,0000,0000,,So these two models are bagging
Dialogue: 0,0:57:33.44,0:57:36.00,Default,,0000,0000,0000,,classifier and random forest classifier.
Dialogue: 0,0:57:36.00,0:57:38.48,Default,,0000,0000,0000,,They have the highest values of F1 score,
Dialogue: 0,0:57:38.48,0:57:40.48,Default,,0000,0000,0000,,and the highest values of the ROC curve
Dialogue: 0,0:57:40.48,0:57:42.64,Default,,0000,0000,0000,,score, okay? So we can say these are the
Dialogue: 0,0:57:42.64,0:57:45.84,Default,,0000,0000,0000,,top two models in terms of accuracy, okay,
Dialogue: 0,0:57:45.84,0:57:48.92,Default,,0000,0000,0000,,using the F1 evaluation metric and the
Dialogue: 0,0:57:48.92,0:57:53.72,Default,,0000,0000,0000,,ROC AUC evaluation metric, okay? So these
Dialogue: 0,0:57:53.72,0:57:57.48,Default,,0000,0000,0000,,results, kind of, summarize here, and
Dialogue: 0,0:57:57.48,0:57:59.08,Default,,0000,0000,0000,,then we use different sampling
Dialogue: 0,0:57:59.08,0:58:00.88,Default,,0000,0000,0000,,techniques, okay, so just now I talked
Dialogue: 0,0:58:00.88,0:58:03.68,Default,,0000,0000,0000,,about um different kinds of sampling
Dialogue: 0,0:58:03.68,0:58:06.40,Default,,0000,0000,0000,,techniques and so the idea of different
Dialogue: 0,0:58:06.40,0:58:08.32,Default,,0000,0000,0000,,kinds of sampling techniques is to just
Dialogue: 0,0:58:08.32,0:58:11.32,Default,,0000,0000,0000,,get a different feel for different
Dialogue: 0,0:58:11.32,0:58:13.72,Default,,0000,0000,0000,,distributions of the data in different
Dialogue: 0,0:58:13.72,0:58:16.36,Default,,0000,0000,0000,,areas of your data set so that you want
Dialogue: 0,0:58:16.36,0:58:20.00,Default,,0000,0000,0000,,to just kind of make sure that your your
Dialogue: 0,0:58:20.00,0:58:22.80,Default,,0000,0000,0000,,your evaluation of accuracy is actually
Dialogue: 0,0:58:22.80,0:58:27.08,Default,,0000,0000,0000,,statistically correct right so we can um
Dialogue: 0,0:58:27.08,0:58:29.60,Default,,0000,0000,0000,,do what is called oversampling and under
Dialogue: 0,0:58:29.60,0:58:30.88,Default,,0000,0000,0000,,sampling which is very useful when
Dialogue: 0,0:58:30.88,0:58:32.28,Default,,0000,0000,0000,,you're working with an imbalance data
Dialogue: 0,0:58:32.28,0:58:35.04,Default,,0000,0000,0000,,set so this is example of doing that and
Dialogue: 0,0:58:35.04,0:58:37.24,Default,,0000,0000,0000,,then here we again again check out the
Dialogue: 0,0:58:37.24,0:58:38.80,Default,,0000,0000,0000,,results for all these different
Dialogue: 0,0:58:38.80,0:58:41.68,Default,,0000,0000,0000,,techniques we use uh the F1 score the Au
Dialogue: 0,0:58:41.68,0:58:43.60,Default,,0000,0000,0000,,score all right these are the two key
Dialogue: 0,0:58:43.60,0:58:46.76,Default,,0000,0000,0000,,measures of accuracy right so and then
Dialogue: 0,0:58:46.76,0:58:47.92,Default,,0000,0000,0000,,we can check out the scores for the
Dialogue: 0,0:58:47.92,0:58:50.48,Default,,0000,0000,0000,,different approaches okay so we can see
Dialogue: 0,0:58:50.48,0:58:53.12,Default,,0000,0000,0000,,oh well overall the models have lower Au
Dialogue: 0,0:58:53.12,0:58:55.72,Default,,0000,0000,0000,,r r Au C score but they have a much
Dialogue: 0,0:58:55.72,0:58:58.28,Default,,0000,0000,0000,,higher F1 score the begging classifier
Dialogue: 0,0:58:58.28,0:59:00.84,Default,,0000,0000,0000,,had the highest R1 highest roc1 score
Dialogue: 0,0:59:00.84,0:59:04.12,Default,,0000,0000,0000,,but F1 score was too low okay then in
Dialogue: 0,0:59:04.12,0:59:06.52,Default,,0000,0000,0000,,the data scientist opinion the random
Dialogue: 0,0:59:06.52,0:59:08.52,Default,,0000,0000,0000,,forest with this particular technique of
Dialogue: 0,0:59:08.52,0:59:10.76,Default,,0000,0000,0000,,sampling has equilibrium between the F1
Dialogue: 0,0:59:10.76,0:59:14.48,Default,,0000,0000,0000,,R F1 R and A score so the takeaway one
Dialogue: 0,0:59:14.48,0:59:16.68,Default,,0000,0000,0000,,is the macro F1 score improves
Dialogue: 0,0:59:16.68,0:59:18.48,Default,,0000,0000,0000,,dramatically using the sampl sampling
Dialogue: 0,0:59:18.48,0:59:20.16,Default,,0000,0000,0000,,techniqu so these models might be better
Dialogue: 0,0:59:20.16,0:59:22.44,Default,,0000,0000,0000,,compared to the balanced ones all right
Dialogue: 0,0:59:22.44,0:59:26.28,Default,,0000,0000,0000,,so based on all this uh evaluation the
Dialogue: 0,0:59:26.28,0:59:27.68,Default,,0000,0000,0000,,data scientist says they're going to
Dialogue: 0,0:59:27.68,0:59:29.92,Default,,0000,0000,0000,,continue to work with these two models
Dialogue: 0,0:59:29.92,0:59:31.44,Default,,0000,0000,0000,,all right and the balance begging one
Dialogue: 0,0:59:31.44,0:59:33.08,Default,,0000,0000,0000,,and then continue to make further
Dialogue: 0,0:59:33.08,0:59:35.04,Default,,0000,0000,0000,,comparisons all right so then we
Dialogue: 0,0:59:35.04,0:59:37.08,Default,,0000,0000,0000,,continue to keep refining on our
Dialogue: 0,0:59:37.08,0:59:38.60,Default,,0000,0000,0000,,evaluation work here we're going to
Dialogue: 0,0:59:38.60,0:59:41.00,Default,,0000,0000,0000,,train the models one more time again so
Dialogue: 0,0:59:41.00,0:59:43.04,Default,,0000,0000,0000,,we again do a training test plate and
Dialogue: 0,0:59:43.04,0:59:44.80,Default,,0000,0000,0000,,then we do that for this particular uh
Dialogue: 0,0:59:44.80,0:59:47.04,Default,,0000,0000,0000,,approach model and then we print out we
Dialogue: 0,0:59:47.04,0:59:48.20,Default,,0000,0000,0000,,print out what is called a
Dialogue: 0,0:59:48.20,0:59:50.96,Default,,0000,0000,0000,,classification report and this is
Dialogue: 0,0:59:50.96,0:59:53.40,Default,,0000,0000,0000,,basically a summary of all those metrics
Dialogue: 0,0:59:53.40,0:59:55.36,Default,,0000,0000,0000,,that I talk about just now so just now
Dialogue: 0,0:59:55.36,0:59:57.52,Default,,0000,0000,0000,,remember I said the the there was
Dialogue: 0,0:59:57.52,0:59:59.68,Default,,0000,0000,0000,,several evaluation metrics right so uh
Dialogue: 0,0:59:59.68,1:00:01.48,Default,,0000,0000,0000,,we had the confusion matrics the
Dialogue: 0,1:00:01.48,1:00:04.12,Default,,0000,0000,0000,,accuracy the Precision the recall the Au
Dialogue: 0,1:00:04.12,1:00:08.12,Default,,0000,0000,0000,,ccore so here with the um classification
Dialogue: 0,1:00:08.12,1:00:09.88,Default,,0000,0000,0000,,report I can get a summary of all of
Dialogue: 0,1:00:09.88,1:00:11.76,Default,,0000,0000,0000,,that so I can see all the values here
Dialogue: 0,1:00:11.76,1:00:14.64,Default,,0000,0000,0000,,okay for this particular model begging
Dialogue: 0,1:00:14.64,1:00:17.16,Default,,0000,0000,0000,,Tomac links and then I can do that for
Dialogue: 0,1:00:17.16,1:00:18.64,Default,,0000,0000,0000,,another model the random Forest
Dialogue: 0,1:00:18.64,1:00:20.60,Default,,0000,0000,0000,,borderline SME and then I can do that
Dialogue: 0,1:00:20.60,1:00:22.20,Default,,0000,0000,0000,,for another model which is the balance
Dialogue: 0,1:00:22.20,1:00:25.16,Default,,0000,0000,0000,,ping so again we see this a lot of
Dialogue: 0,1:00:25.16,1:00:27.08,Default,,0000,0000,0000,,comparison between different models
Dialogue: 0,1:00:27.08,1:00:28.64,Default,,0000,0000,0000,,trying to figure out what all these
Dialogue: 0,1:00:28.64,1:00:30.72,Default,,0000,0000,0000,,evaluation metrics are telling us all
Dialogue: 0,1:00:30.72,1:00:32.96,Default,,0000,0000,0000,,right then again we have a confusion
Dialogue: 0,1:00:32.96,1:00:35.88,Default,,0000,0000,0000,,Matrix so we generate a confusion Matrix
Dialogue: 0,1:00:35.88,1:00:38.88,Default,,0000,0000,0000,,for the bagging with the toac links
Dialogue: 0,1:00:38.88,1:00:40.72,Default,,0000,0000,0000,,under sampling for the random followers
Dialogue: 0,1:00:40.72,1:00:42.68,Default,,0000,0000,0000,,with the borderline mod over sampling
Dialogue: 0,1:00:42.68,1:00:44.96,Default,,0000,0000,0000,,and just balance begging by itself then
Dialogue: 0,1:00:44.96,1:00:47.72,Default,,0000,0000,0000,,again we compare between these three uh
Dialogue: 0,1:00:47.72,1:00:50.80,Default,,0000,0000,0000,,models uh using the confusion Matrix
Dialogue: 0,1:00:50.80,1:00:52.60,Default,,0000,0000,0000,,evaluation Matrix and then we can kind
Dialogue: 0,1:00:52.60,1:00:55.68,Default,,0000,0000,0000,,of come to some conclusions all right so
Dialogue: 0,1:00:55.68,1:00:58.16,Default,,0000,0000,0000,,right so now we look at all the data
Dialogue: 0,1:00:58.16,1:01:01.20,Default,,0000,0000,0000,,then we move on and look at another um
Dialogue: 0,1:01:01.20,1:01:03.16,Default,,0000,0000,0000,,another kind of evaluation metrix which
Dialogue: 0,1:01:03.16,1:01:06.72,Default,,0000,0000,0000,,is the r score right so this is one of
Dialogue: 0,1:01:06.72,1:01:08.68,Default,,0000,0000,0000,,the other evaluation metrics I talk
Dialogue: 0,1:01:08.68,1:01:11.20,Default,,0000,0000,0000,,about so this one is a kind of a curve
Dialogue: 0,1:01:11.20,1:01:12.52,Default,,0000,0000,0000,,you look at it to see the area
Dialogue: 0,1:01:12.52,1:01:14.36,Default,,0000,0000,0000,,underneath the curve this is called AOC
Dialogue: 0,1:01:14.36,1:01:18.08,Default,,0000,0000,0000,,R area under the curve sorry Au Au R
Dialogue: 0,1:01:18.08,1:01:19.88,Default,,0000,0000,0000,,area under the curve all right so the
Dialogue: 0,1:01:19.88,1:01:21.84,Default,,0000,0000,0000,,area under the curve uh
Dialogue: 0,1:01:21.84,1:01:24.32,Default,,0000,0000,0000,,score will give us some idea about the
Dialogue: 0,1:01:24.32,1:01:25.60,Default,,0000,0000,0000,,threshold that we're going to use for
Dialogue: 0,1:01:25.60,1:01:27.68,Default,,0000,0000,0000,,classif ification so we can examine this
Dialogue: 0,1:01:27.68,1:01:29.20,Default,,0000,0000,0000,,for the bagging classifier for the
Dialogue: 0,1:01:29.20,1:01:30.96,Default,,0000,0000,0000,,random forest classifier for the balance
Dialogue: 0,1:01:30.96,1:01:33.60,Default,,0000,0000,0000,,bagging classifier okay then we can also
Dialogue: 0,1:01:33.60,1:01:36.20,Default,,0000,0000,0000,,again do that uh finally we can check
Dialogue: 0,1:01:36.20,1:01:37.88,Default,,0000,0000,0000,,the classification report of this
Dialogue: 0,1:01:37.88,1:01:39.68,Default,,0000,0000,0000,,particular model so we keep doing this
Dialogue: 0,1:01:39.68,1:01:43.20,Default,,0000,0000,0000,,over and over again evaluating this m
Dialogue: 0,1:01:43.20,1:01:45.72,Default,,0000,0000,0000,,The Matrix the the accuracy Matrix the
Dialogue: 0,1:01:45.72,1:01:46.88,Default,,0000,0000,0000,,evaluation Matrix for all these
Dialogue: 0,1:01:46.88,1:01:48.88,Default,,0000,0000,0000,,different models so we keep doing this
Dialogue: 0,1:01:48.88,1:01:50.52,Default,,0000,0000,0000,,over and over again for different
Dialogue: 0,1:01:50.52,1:01:53.44,Default,,0000,0000,0000,,thresholds or for classification and so
Dialogue: 0,1:01:53.44,1:01:56.88,Default,,0000,0000,0000,,as we keep drilling into these we kind
Dialogue: 0,1:01:56.88,1:02:00.84,Default,,0000,0000,0000,,of get more and more understanding of
Dialogue: 0,1:02:00.84,1:02:02.80,Default,,0000,0000,0000,,all these different models which one is
Dialogue: 0,1:02:02.80,1:02:04.76,Default,,0000,0000,0000,,the best one that gives the best
Dialogue: 0,1:02:04.76,1:02:08.52,Default,,0000,0000,0000,,performance for our data set okay so
Dialogue: 0,1:02:08.52,1:02:11.44,Default,,0000,0000,0000,,finally we come to this conclusion this
Dialogue: 0,1:02:11.44,1:02:13.52,Default,,0000,0000,0000,,particular model is not able to reduce
Dialogue: 0,1:02:13.52,1:02:15.28,Default,,0000,0000,0000,,the record on failure test than
Dialogue: 0,1:02:15.28,1:02:17.52,Default,,0000,0000,0000,,95.8% on the other hand balance begging
Dialogue: 0,1:02:17.52,1:02:19.40,Default,,0000,0000,0000,,with a decision thresold of 0.6 is able
Dialogue: 0,1:02:19.40,1:02:21.52,Default,,0000,0000,0000,,to have a better recall blah blah blah
Dialogue: 0,1:02:21.52,1:02:25.32,Default,,0000,0000,0000,,Etc so finally after having done all of
Dialogue: 0,1:02:25.32,1:02:27.48,Default,,0000,0000,0000,,this evalu ations
Dialogue: 0,1:02:27.48,1:02:31.12,Default,,0000,0000,0000,,okay this is the conclusion
Dialogue: 0,1:02:31.12,1:02:33.96,Default,,0000,0000,0000,,so after having gone so right now we
Dialogue: 0,1:02:33.96,1:02:35.28,Default,,0000,0000,0000,,have gone through all the steps of the
Dialogue: 0,1:02:35.28,1:02:37.76,Default,,0000,0000,0000,,Machining learning life cycle and which
Dialogue: 0,1:02:37.76,1:02:40.24,Default,,0000,0000,0000,,means we have right now or the data
Dialogue: 0,1:02:40.24,1:02:41.96,Default,,0000,0000,0000,,scientist right now has gone through all
Dialogue: 0,1:02:41.96,1:02:43.00,Default,,0000,0000,0000,,these
Dialogue: 0,1:02:43.00,1:02:47.08,Default,,0000,0000,0000,,steps uh which is now we have done this
Dialogue: 0,1:02:47.08,1:02:48.64,Default,,0000,0000,0000,,validation so we have done the cleaning
Dialogue: 0,1:02:48.64,1:02:50.56,Default,,0000,0000,0000,,exploration preparation transformation
Dialogue: 0,1:02:50.56,1:02:52.60,Default,,0000,0000,0000,,the future engineering we have developed
Dialogue: 0,1:02:52.60,1:02:54.36,Default,,0000,0000,0000,,and trained multiple models we have
Dialogue: 0,1:02:54.36,1:02:56.48,Default,,0000,0000,0000,,evaluated all these different models so
Dialogue: 0,1:02:56.48,1:02:58.60,Default,,0000,0000,0000,,right now we have reached this stage so
Dialogue: 0,1:02:58.60,1:03:02.72,Default,,0000,0000,0000,,at this stage we as the data scientist
Dialogue: 0,1:03:02.72,1:03:05.48,Default,,0000,0000,0000,,kind of have completed our job so we've
Dialogue: 0,1:03:05.48,1:03:08.12,Default,,0000,0000,0000,,come to some very useful conclusions
Dialogue: 0,1:03:08.12,1:03:09.64,Default,,0000,0000,0000,,which we now can share with our
Dialogue: 0,1:03:09.64,1:03:13.24,Default,,0000,0000,0000,,colleagues all right and based on this
Dialogue: 0,1:03:13.24,1:03:15.40,Default,,0000,0000,0000,,uh conclusions or recommendations
Dialogue: 0,1:03:15.40,1:03:17.16,Default,,0000,0000,0000,,somebody is going to choose a
Dialogue: 0,1:03:17.16,1:03:19.16,Default,,0000,0000,0000,,appropriate model and that model is
Dialogue: 0,1:03:19.16,1:03:22.64,Default,,0000,0000,0000,,going to get deployed for realtime use
Dialogue: 0,1:03:22.64,1:03:25.32,Default,,0000,0000,0000,,in a real life production environment
Dialogue: 0,1:03:25.32,1:03:27.24,Default,,0000,0000,0000,,okay and that decision is going to be
Dialogue: 0,1:03:27.24,1:03:29.36,Default,,0000,0000,0000,,made based on the recommendations coming
Dialogue: 0,1:03:29.36,1:03:30.88,Default,,0000,0000,0000,,from the data scientist at the end of
Dialogue: 0,1:03:30.88,1:03:33.48,Default,,0000,0000,0000,,this phase okay so at the end of this
Dialogue: 0,1:03:33.48,1:03:35.08,Default,,0000,0000,0000,,phase the data scientist is going to
Dialogue: 0,1:03:35.08,1:03:36.88,Default,,0000,0000,0000,,come up with these conclusions so
Dialogue: 0,1:03:36.88,1:03:41.76,Default,,0000,0000,0000,,conclusions is okay if the engineering
Dialogue: 0,1:03:41.76,1:03:44.52,Default,,0000,0000,0000,,team they are looking okay the
Dialogue: 0,1:03:44.52,1:03:46.12,Default,,0000,0000,0000,,engineering team right the engineering
Dialogue: 0,1:03:46.12,1:03:48.72,Default,,0000,0000,0000,,team if they are looking for the highest
Dialogue: 0,1:03:48.72,1:03:51.84,Default,,0000,0000,0000,,failure detection rate possible then
Dialogue: 0,1:03:51.84,1:03:54.48,Default,,0000,0000,0000,,they should go with this particular
Dialogue: 0,1:03:54.48,1:03:56.52,Default,,0000,0000,0000,,model okay
Dialogue: 0,1:03:56.52,1:03:58.68,Default,,0000,0000,0000,,and if they want a balance between
Dialogue: 0,1:03:58.68,1:04:01.04,Default,,0000,0000,0000,,precision and recall then they should
Dialogue: 0,1:04:01.04,1:04:03.24,Default,,0000,0000,0000,,choose between the begging model with a
Dialogue: 0,1:04:03.24,1:04:05.96,Default,,0000,0000,0000,,0.4 decision threshold or the random
Dialogue: 0,1:04:05.96,1:04:09.60,Default,,0000,0000,0000,,forest model with a 0.5 threshold but if
Dialogue: 0,1:04:09.60,1:04:11.88,Default,,0000,0000,0000,,they don't care so much about predicting
Dialogue: 0,1:04:11.88,1:04:14.48,Default,,0000,0000,0000,,every failure and they want the highest
Dialogue: 0,1:04:14.48,1:04:16.76,Default,,0000,0000,0000,,Precision possible then they should opt
Dialogue: 0,1:04:16.76,1:04:19.80,Default,,0000,0000,0000,,for the begging toax link classifier
Dialogue: 0,1:04:19.80,1:04:23.16,Default,,0000,0000,0000,,with a bit higher decision threshold and
Dialogue: 0,1:04:23.16,1:04:26.16,Default,,0000,0000,0000,,so this is the key thing that the data
Dialogue: 0,1:04:26.16,1:04:28.32,Default,,0000,0000,0000,,scientist is going to give right this is
Dialogue: 0,1:04:28.32,1:04:30.76,Default,,0000,0000,0000,,the key takeaway this is the kind of the
Dialogue: 0,1:04:30.76,1:04:32.68,Default,,0000,0000,0000,,end result of the entire machine
Dialogue: 0,1:04:32.68,1:04:34.68,Default,,0000,0000,0000,,learning life cycle right now the data
Dialogue: 0,1:04:34.68,1:04:36.40,Default,,0000,0000,0000,,scientist is going to tell the
Dialogue: 0,1:04:36.40,1:04:38.60,Default,,0000,0000,0000,,engineering team all right you guys
Dialogue: 0,1:04:38.60,1:04:41.16,Default,,0000,0000,0000,,which is more important for you point a
Dialogue: 0,1:04:41.16,1:04:45.04,Default,,0000,0000,0000,,point B or Point C make your decision so
Dialogue: 0,1:04:45.04,1:04:47.40,Default,,0000,0000,0000,,the engineering team will then discuss
Dialogue: 0,1:04:47.40,1:04:48.96,Default,,0000,0000,0000,,among themselves and say hey you know
Dialogue: 0,1:04:48.96,1:04:52.28,Default,,0000,0000,0000,,what what we want is we want to get the
Dialogue: 0,1:04:52.28,1:04:54.72,Default,,0000,0000,0000,,highest failure detection possible
Dialogue: 0,1:04:54.72,1:04:58.36,Default,,0000,0000,0000,,because any kind kind of failure of that
Dialogue: 0,1:04:58.36,1:05:00.40,Default,,0000,0000,0000,,machine or the product on the samply
Dialogue: 0,1:05:00.40,1:05:03.12,Default,,0000,0000,0000,,line is really going to screw us up big
Dialogue: 0,1:05:03.12,1:05:05.64,Default,,0000,0000,0000,,time so what we're looking for is the
Dialogue: 0,1:05:05.64,1:05:08.08,Default,,0000,0000,0000,,model that will give us the highest
Dialogue: 0,1:05:08.08,1:05:10.88,Default,,0000,0000,0000,,failure detection rate we don't care
Dialogue: 0,1:05:10.88,1:05:13.48,Default,,0000,0000,0000,,about Precision but we want to be make
Dialogue: 0,1:05:13.48,1:05:15.44,Default,,0000,0000,0000,,sure that if there's a failure we are
Dialogue: 0,1:05:15.44,1:05:17.72,Default,,0000,0000,0000,,going to catch it right so that's what
Dialogue: 0,1:05:17.72,1:05:19.60,Default,,0000,0000,0000,,they want and so the data scientist will
Dialogue: 0,1:05:19.60,1:05:22.20,Default,,0000,0000,0000,,say Hey you go for the balance begging
Dialogue: 0,1:05:22.20,1:05:24.88,Default,,0000,0000,0000,,model okay then the data scientist saves
Dialogue: 0,1:05:24.88,1:05:27.72,Default,,0000,0000,0000,,this all right uh and then once you have
Dialogue: 0,1:05:27.72,1:05:30.00,Default,,0000,0000,0000,,saved this uh you can then go right
Dialogue: 0,1:05:30.00,1:05:32.32,Default,,0000,0000,0000,,ahead and deploy that so you can go
Dialogue: 0,1:05:32.32,1:05:33.52,Default,,0000,0000,0000,,right ahead and deploy that to
Dialogue: 0,1:05:33.52,1:05:37.16,Default,,0000,0000,0000,,production okay and so if you want to
Dialogue: 0,1:05:37.16,1:05:38.84,Default,,0000,0000,0000,,continue we can actually further
Dialogue: 0,1:05:38.84,1:05:41.12,Default,,0000,0000,0000,,continue this modeling problem so just
Dialogue: 0,1:05:41.12,1:05:43.48,Default,,0000,0000,0000,,now I model this problem as a binary
Dialogue: 0,1:05:43.48,1:05:46.72,Default,,0000,0000,0000,,classification problem uh sorry just I
Dialogue: 0,1:05:46.72,1:05:48.24,Default,,0000,0000,0000,,modeled this problem as a binary
Dialogue: 0,1:05:48.24,1:05:49.52,Default,,0000,0000,0000,,classification which means it's either
Dialogue: 0,1:05:49.52,1:05:51.68,Default,,0000,0000,0000,,zero or one either fail or not fail but
Dialogue: 0,1:05:51.68,1:05:53.60,Default,,0000,0000,0000,,we can also model it as a multiclass
Dialogue: 0,1:05:53.60,1:05:55.64,Default,,0000,0000,0000,,classification problem right because as
Dialogue: 0,1:05:55.64,1:05:57.64,Default,,0000,0000,0000,,as I said earlier just now for the
Dialogue: 0,1:05:57.64,1:06:00.20,Default,,0000,0000,0000,,Target variable colum which is sorry for
Dialogue: 0,1:06:00.20,1:06:02.52,Default,,0000,0000,0000,,the failure type colume you actually
Dialogue: 0,1:06:02.52,1:06:04.84,Default,,0000,0000,0000,,have multiple kinds of failures right
Dialogue: 0,1:06:04.84,1:06:07.56,Default,,0000,0000,0000,,for example you may have a power failure
Dialogue: 0,1:06:07.56,1:06:10.00,Default,,0000,0000,0000,,uh you may have a towar failure uh you
Dialogue: 0,1:06:10.00,1:06:12.92,Default,,0000,0000,0000,,may have a overstrain failure so now we
Dialogue: 0,1:06:12.92,1:06:14.84,Default,,0000,0000,0000,,can model the problem slightly
Dialogue: 0,1:06:14.84,1:06:17.24,Default,,0000,0000,0000,,differently so we can model it as a
Dialogue: 0,1:06:17.24,1:06:19.68,Default,,0000,0000,0000,,multiclass classification problem and
Dialogue: 0,1:06:19.68,1:06:21.16,Default,,0000,0000,0000,,then we go through the entire same
Dialogue: 0,1:06:21.16,1:06:22.68,Default,,0000,0000,0000,,process that we went through just now so
Dialogue: 0,1:06:22.68,1:06:24.88,Default,,0000,0000,0000,,we create different models we test this
Dialogue: 0,1:06:24.88,1:06:26.72,Default,,0000,0000,0000,,out but now the confusion Matrix is for
Dialogue: 0,1:06:26.72,1:06:30.12,Default,,0000,0000,0000,,a multiclass classification isue right
Dialogue: 0,1:06:30.12,1:06:30.96,Default,,0000,0000,0000,,so we're going
Dialogue: 0,1:06:30.96,1:06:34.04,Default,,0000,0000,0000,,to check them out we're going to again
Dialogue: 0,1:06:34.04,1:06:36.08,Default,,0000,0000,0000,,uh try different algorithms or models
Dialogue: 0,1:06:36.08,1:06:38.04,Default,,0000,0000,0000,,again train and test our data set do the
Dialogue: 0,1:06:38.04,1:06:39.76,Default,,0000,0000,0000,,training test split uh on these
Dialogue: 0,1:06:39.76,1:06:42.00,Default,,0000,0000,0000,,different models all right so we have
Dialogue: 0,1:06:42.00,1:06:43.40,Default,,0000,0000,0000,,like for example we have bon random
Dialogue: 0,1:06:43.40,1:06:46.16,Default,,0000,0000,0000,,Forest B random Forest a great search
Dialogue: 0,1:06:46.16,1:06:47.72,Default,,0000,0000,0000,,then you train the models using what is
Dialogue: 0,1:06:47.72,1:06:49.68,Default,,0000,0000,0000,,called hyperparameter tuning then you
Dialogue: 0,1:06:49.68,1:06:51.08,Default,,0000,0000,0000,,get the scores all right so you get the
Dialogue: 0,1:06:51.08,1:06:53.16,Default,,0000,0000,0000,,same evaluation scores again you check
Dialogue: 0,1:06:53.16,1:06:54.60,Default,,0000,0000,0000,,out the evaluation scores compare
Dialogue: 0,1:06:54.60,1:06:57.08,Default,,0000,0000,0000,,between them generate a confusion Matrix
Dialogue: 0,1:06:57.08,1:06:59.96,Default,,0000,0000,0000,,so this is a multiclass confusion Matrix
Dialogue: 0,1:06:59.96,1:07:02.40,Default,,0000,0000,0000,,and then you come to the final
Dialogue: 0,1:07:02.40,1:07:05.76,Default,,0000,0000,0000,,conclusion so now if you are interested
Dialogue: 0,1:07:05.76,1:07:09.00,Default,,0000,0000,0000,,to frame your problem domain as a
Dialogue: 0,1:07:09.00,1:07:11.36,Default,,0000,0000,0000,,multiclass classification problem all
Dialogue: 0,1:07:11.36,1:07:13.84,Default,,0000,0000,0000,,right then these are the recommendations
Dialogue: 0,1:07:13.84,1:07:15.48,Default,,0000,0000,0000,,from the data scientist so the data
Dialogue: 0,1:07:15.48,1:07:17.24,Default,,0000,0000,0000,,scientist will say you know what I'm
Dialogue: 0,1:07:17.24,1:07:19.56,Default,,0000,0000,0000,,going to pick this particular model the
Dialogue: 0,1:07:19.56,1:07:22.04,Default,,0000,0000,0000,,balance backing classifier and these are
Dialogue: 0,1:07:22.04,1:07:24.52,Default,,0000,0000,0000,,all the reasons that the data scientist
Dialogue: 0,1:07:24.52,1:07:27.28,Default,,0000,0000,0000,,is going to give as a rational for
Dialogue: 0,1:07:27.28,1:07:29.40,Default,,0000,0000,0000,,selecting this particular
Dialogue: 0,1:07:29.40,1:07:32.04,Default,,0000,0000,0000,,model and then once that's done you save
Dialogue: 0,1:07:32.04,1:07:35.00,Default,,0000,0000,0000,,the model and that's that's it that's it
Dialogue: 0,1:07:35.00,1:07:38.92,Default,,0000,0000,0000,,so that's all done now and so then the
Dialogue: 0,1:07:38.92,1:07:41.04,Default,,0000,0000,0000,,uh the model the machine learning model
Dialogue: 0,1:07:41.04,1:07:43.72,Default,,0000,0000,0000,,now you can put it live run it on the
Dialogue: 0,1:07:43.72,1:07:45.28,Default,,0000,0000,0000,,server and now the machine learning
Dialogue: 0,1:07:45.28,1:07:47.20,Default,,0000,0000,0000,,model is ready to work which means it's
Dialogue: 0,1:07:47.20,1:07:48.92,Default,,0000,0000,0000,,ready to generate predictions right
Dialogue: 0,1:07:48.92,1:07:50.28,Default,,0000,0000,0000,,that's the main job of the machine
Dialogue: 0,1:07:50.28,1:07:52.04,Default,,0000,0000,0000,,learning model you have picked the best
Dialogue: 0,1:07:52.04,1:07:53.68,Default,,0000,0000,0000,,machine learning model with the best
Dialogue: 0,1:07:53.68,1:07:55.80,Default,,0000,0000,0000,,evaluation metrics for whatever accur
Dialogue: 0,1:07:55.80,1:07:57.76,Default,,0000,0000,0000,,see goal you're trying to achieve and
Dialogue: 0,1:07:57.76,1:07:59.64,Default,,0000,0000,0000,,now you're going to run it on a server
Dialogue: 0,1:07:59.64,1:08:00.80,Default,,0000,0000,0000,,and now you're going to get all this
Dialogue: 0,1:08:00.80,1:08:02.96,Default,,0000,0000,0000,,real time data that's coming from your
Dialogue: 0,1:08:02.96,1:08:04.52,Default,,0000,0000,0000,,sensus you're going to pump that into
Dialogue: 0,1:08:04.52,1:08:06.36,Default,,0000,0000,0000,,your machine learning model your machine
Dialogue: 0,1:08:06.36,1:08:07.88,Default,,0000,0000,0000,,learning model will pump out a whole
Dialogue: 0,1:08:07.88,1:08:09.52,Default,,0000,0000,0000,,bunch of predictions and we're going to
Dialogue: 0,1:08:09.52,1:08:12.80,Default,,0000,0000,0000,,use that predictions in real time to
Dialogue: 0,1:08:12.80,1:08:15.40,Default,,0000,0000,0000,,make real time real world decision
Dialogue: 0,1:08:15.40,1:08:17.56,Default,,0000,0000,0000,,making right you're going to say okay
Dialogue: 0,1:08:17.56,1:08:19.60,Default,,0000,0000,0000,,I'm predicting that that machine is
Dialogue: 0,1:08:19.60,1:08:23.20,Default,,0000,0000,0000,,going to fail on Thursday at 5:00 p.m.
Dialogue: 0,1:08:23.20,1:08:25.52,Default,,0000,0000,0000,,so you better get your service folks in
Dialogue: 0,1:08:25.52,1:08:28.64,Default,,0000,0000,0000,,to service it on Thursday 2: p.m. or you
Dialogue: 0,1:08:28.64,1:08:31.64,Default,,0000,0000,0000,,know whatever so you can you know uh
Dialogue: 0,1:08:31.64,1:08:33.48,Default,,0000,0000,0000,,make decisions on when you want to do
Dialogue: 0,1:08:33.48,1:08:35.32,Default,,0000,0000,0000,,your maintenance you know and and make
Dialogue: 0,1:08:35.32,1:08:37.64,Default,,0000,0000,0000,,the best decisions to optimize the cost
Dialogue: 0,1:08:37.64,1:08:41.16,Default,,0000,0000,0000,,of Maintenance etc etc and then based on
Dialogue: 0,1:08:41.16,1:08:42.12,Default,,0000,0000,0000,,the
Dialogue: 0,1:08:42.12,1:08:45.00,Default,,0000,0000,0000,,results that are coming up from the
Dialogue: 0,1:08:45.00,1:08:46.76,Default,,0000,0000,0000,,predictions so the predictions may be
Dialogue: 0,1:08:46.76,1:08:49.12,Default,,0000,0000,0000,,good the predictions may be lousy the
Dialogue: 0,1:08:49.12,1:08:51.36,Default,,0000,0000,0000,,predictions may be average right so we
Dialogue: 0,1:08:51.36,1:08:53.72,Default,,0000,0000,0000,,are we're constantly monitoring how good
Dialogue: 0,1:08:53.72,1:08:55.44,Default,,0000,0000,0000,,or how useful are the predictions
Dialogue: 0,1:08:55.44,1:08:57.76,Default,,0000,0000,0000,,generated by this realtime model that's
Dialogue: 0,1:08:57.76,1:08:59.88,Default,,0000,0000,0000,,running on the server and based on our
Dialogue: 0,1:08:59.88,1:09:02.68,Default,,0000,0000,0000,,monitoring we will then take some new
Dialogue: 0,1:09:02.68,1:09:05.32,Default,,0000,0000,0000,,data and then repeat this entire life
Dialogue: 0,1:09:05.32,1:09:07.04,Default,,0000,0000,0000,,cycle again so this is basically a
Dialogue: 0,1:09:07.04,1:09:09.24,Default,,0000,0000,0000,,workflow that's iterative and we are
Dialogue: 0,1:09:09.24,1:09:11.12,Default,,0000,0000,0000,,constantly or the data scientist is
Dialogue: 0,1:09:11.12,1:09:13.32,Default,,0000,0000,0000,,constantly getting in all these new data
Dialogue: 0,1:09:13.32,1:09:15.28,Default,,0000,0000,0000,,points and then refining the model
Dialogue: 0,1:09:15.28,1:09:17.96,Default,,0000,0000,0000,,picking maybe a new model deploying the
Dialogue: 0,1:09:17.96,1:09:21.68,Default,,0000,0000,0000,,new model onto the server and so on all
Dialogue: 0,1:09:21.68,1:09:23.92,Default,,0000,0000,0000,,right and so that's it so that is
Dialogue: 0,1:09:23.92,1:09:26.40,Default,,0000,0000,0000,,basically your machine learning workflow
Dialogue: 0,1:09:26.40,1:09:29.48,Default,,0000,0000,0000,,in a nutshell okay so for this
Dialogue: 0,1:09:29.48,1:09:32.08,Default,,0000,0000,0000,,particular approach we have used a bunch
Dialogue: 0,1:09:32.08,1:09:34.56,Default,,0000,0000,0000,,of uh data science libraries from python
Dialogue: 0,1:09:34.56,1:09:36.52,Default,,0000,0000,0000,,so we have used pandas which is the most
Dialogue: 0,1:09:36.52,1:09:38.56,Default,,0000,0000,0000,,B basic data science libraries that
Dialogue: 0,1:09:38.56,1:09:40.28,Default,,0000,0000,0000,,provides all the tools to work with raw
Dialogue: 0,1:09:40.28,1:09:42.52,Default,,0000,0000,0000,,data we have used numai which is a high
Dialogue: 0,1:09:42.52,1:09:44.08,Default,,0000,0000,0000,,performance library for implementing
Dialogue: 0,1:09:44.08,1:09:46.44,Default,,0000,0000,0000,,complex array metrix operations we have
Dialogue: 0,1:09:46.44,1:09:49.56,Default,,0000,0000,0000,,used met plot lip and cbon which is used
Dialogue: 0,1:09:49.56,1:09:52.44,Default,,0000,0000,0000,,for doing the Eda the explorat
Dialogue: 0,1:09:52.44,1:09:55.56,Default,,0000,0000,0000,,exploratory data analysis phase machine
Dialogue: 0,1:09:55.56,1:09:57.04,Default,,0000,0000,0000,,learning where you visualize all your
Dialogue: 0,1:09:57.04,1:09:59.04,Default,,0000,0000,0000,,data we have used psyit learn which is
Dialogue: 0,1:09:59.04,1:10:01.28,Default,,0000,0000,0000,,the machine L learning library to do all
Dialogue: 0,1:10:01.28,1:10:02.92,Default,,0000,0000,0000,,your implementation for all your call
Dialogue: 0,1:10:02.92,1:10:06.00,Default,,0000,0000,0000,,machine learning algorithms uh we we we
Dialogue: 0,1:10:06.00,1:10:08.00,Default,,0000,0000,0000,,have not used this because this is not a
Dialogue: 0,1:10:08.00,1:10:11.04,Default,,0000,0000,0000,,deep learning uh problem but if you are
Dialogue: 0,1:10:11.04,1:10:12.80,Default,,0000,0000,0000,,working with a deep learning problem
Dialogue: 0,1:10:12.80,1:10:15.36,Default,,0000,0000,0000,,like image classification image
Dialogue: 0,1:10:15.36,1:10:17.84,Default,,0000,0000,0000,,recognition object detection okay
Dialogue: 0,1:10:17.84,1:10:20.20,Default,,0000,0000,0000,,natural language processing text
Dialogue: 0,1:10:20.20,1:10:21.92,Default,,0000,0000,0000,,classification well then you're going to
Dialogue: 0,1:10:21.92,1:10:24.36,Default,,0000,0000,0000,,use these libraries from python which is
Dialogue: 0,1:10:24.36,1:10:28.96,Default,,0000,0000,0000,,tensor flow okay and also py
Dialogue: 0,1:10:28.96,1:10:32.68,Default,,0000,0000,0000,,to and then lastly that whole thing that
Dialogue: 0,1:10:32.68,1:10:34.72,Default,,0000,0000,0000,,whole data science project that you saw
Dialogue: 0,1:10:34.72,1:10:36.80,Default,,0000,0000,0000,,just now this entire data science
Dialogue: 0,1:10:36.80,1:10:38.88,Default,,0000,0000,0000,,project is actually developed in
Dialogue: 0,1:10:38.88,1:10:41.08,Default,,0000,0000,0000,,something called a Jupiter notebook so
Dialogue: 0,1:10:41.08,1:10:44.04,Default,,0000,0000,0000,,all this python code along with all the
Dialogue: 0,1:10:44.04,1:10:46.36,Default,,0000,0000,0000,,observations from the data
Dialogue: 0,1:10:46.36,1:10:48.68,Default,,0000,0000,0000,,scientists okay for this entire data
Dialogue: 0,1:10:48.68,1:10:50.44,Default,,0000,0000,0000,,science project was actually run in
Dialogue: 0,1:10:50.44,1:10:53.36,Default,,0000,0000,0000,,something called a Jupiter notebook so
Dialogue: 0,1:10:53.36,1:10:55.76,Default,,0000,0000,0000,,that is uh the
Dialogue: 0,1:10:55.76,1:10:59.08,Default,,0000,0000,0000,,most widely used tool for interactively
Dialogue: 0,1:10:59.08,1:11:02.36,Default,,0000,0000,0000,,developing and presenting data science
Dialogue: 0,1:11:02.36,1:11:04.64,Default,,0000,0000,0000,,projects okay so that brings me to the
Dialogue: 0,1:11:04.64,1:11:07.40,Default,,0000,0000,0000,,end of this entire presentation I hope
Dialogue: 0,1:11:07.40,1:11:10.36,Default,,0000,0000,0000,,that you find it useful for you and that
Dialogue: 0,1:11:10.36,1:11:13.20,Default,,0000,0000,0000,,you can appreciate the importance of
Dialogue: 0,1:11:13.20,1:11:15.28,Default,,0000,0000,0000,,machine learning and how it can be
Dialogue: 0,1:11:15.28,1:11:19.80,Default,,0000,0000,0000,,applied in a real life use case in a
Dialogue: 0,1:11:19.80,1:11:23.36,Default,,0000,0000,0000,,typical production environment all right
Dialogue: 0,1:11:23.36,1:11:27.24,Default,,0000,0000,0000,,thank you all so much for watching