Machine Learning for Predictive Maintenance: End-to-End Workflow in a Jupyter Notebook
-
0:01 - 0:04Hello everyone, my name is Victor. I'm
-
0:04 - 0:05your friendly neighborhood data
-
0:05 - 0:08scientist from DreamCatcher. So in this
-
0:08 - 0:10presentation, I would like to talk about
-
0:10 - 0:13a specific industry use case of AI or
-
0:13 - 0:15machine learning which is predictive
-
0:15 - 0:19maintenance. So I will be covering these
-
0:19 - 0:21topics and feel free to jump forward to
-
0:21 - 0:23the specific part in the video where I
-
0:23 - 0:25talk about all these topics. So I'm going
-
0:25 - 0:27to start off with a general preview of
-
0:27 - 0:29AI and machine learning. Then, I'll
-
0:29 - 0:31discuss the use case which is predictive
-
0:31 - 0:33maintenance. I'll talk about the basics
-
0:33 - 0:35of machine learning, the workflow of
-
0:35 - 0:37machine learning, and then we will come
-
0:37 - 0:41to the meat of this presentation which
-
0:41 - 0:44is essentially a demonstration of the
-
0:44 - 0:45machine learning workflow from end to
-
0:45 - 0:48end on a real life predictive
-
0:48 - 0:52maintenance domain problem. All right, so
-
0:52 - 0:54without any further ado, let's jump into
-
0:54 - 0:57it. So let's start off with a quick
-
0:57 - 1:00preview of AI and machine learning. Well
-
1:00 - 1:04AI is a very general term, it encompasses
-
1:04 - 1:07the entire area of science and
-
1:07 - 1:09engineering that is related to creating
-
1:09 - 1:11software programs and machines that
-
1:11 - 1:14will be capable of performing tasks
-
1:14 - 1:16that would normally require human
-
1:16 - 1:20intelligence. But AI is a catchall term,
-
1:20 - 1:23so really when we talk about applied AI,
-
1:23 - 1:26how we use AI in our daily work, we are
-
1:26 - 1:28really going to be talking about machine
-
1:28 - 1:30learning. So machine learning is the
-
1:30 - 1:32design and application of software
-
1:32 - 1:34algorithms that are capable of learning
-
1:34 - 1:38on their own without any explicit human
-
1:38 - 1:40intervention. And the primary purpose of
-
1:40 - 1:43these algorithms is to optimize
-
1:43 - 1:47performance in a specific task. And the
-
1:47 - 1:50primary performance or the primary task
-
1:50 - 1:52that you want to optimize performance in
-
1:52 - 1:54is to be able to make accurate
-
1:54 - 1:57predictions about future outcomes based
-
1:57 - 2:03on the analysis of historical data. So essentially machine
-
2:03 - 2:05learning is about making predictions
-
2:05 - 2:07about the future or what we call
-
2:07 - 2:09predictive analytics.
-
2:09 - 2:11And there are many different
-
2:11 - 2:13kinds of algorithms that are available in
-
2:13 - 2:15machine learning under the three primary
-
2:15 - 2:16categories of supervised learning,
-
2:16 - 2:19unsupervised learning, and reinforcement
-
2:19 - 2:21learning. And here we can see some of the
-
2:21 - 2:24different kinds of algorithms and their
-
2:24 - 2:27use cases in various areas in
-
2:27 - 2:30industry. So we have various domain use
-
2:30 - 2:30cases
-
2:30 - 2:38for all these different kinds of algorithms, and we can see that different algorithms are suited to different use cases.
-
2:38 - 2:41Deep learning is an advanced form
-
2:41 - 2:42of machine learning that's based on
-
2:42 - 2:44something called an artificial neural
-
2:44 - 2:46network or ANN for short, and this
-
2:46 - 2:48essentially simulates the structure of
-
2:48 - 2:50the human brain whereby neurons
-
2:50 - 2:51interconnect and work together to
-
2:51 - 2:55process and learn new information. So DL
-
2:55 - 2:57is the foundational technology for most
-
2:57 - 2:59of the popular AI tools that you
-
2:59 - 3:01probably have heard of today. So I'm sure
-
3:01 - 3:03you have heard of ChatGPT if you haven't
-
3:03 - 3:05been living in a cave for the past 2
-
3:05 - 3:08years. And yeah, so ChatGPT is an example
-
3:08 - 3:10of what we call a large language model
-
3:10 - 3:12and that's based on this technology
-
3:12 - 3:15called deep learning. Also, all the modern
-
3:15 - 3:17computer vision applications where a
-
3:17 - 3:20computer program can classify images or
-
3:20 - 3:23detect images or recognize images on
-
3:23 - 3:25its own, okay, we call this computer
-
3:25 - 3:28vision applications. They also use
-
3:28 - 3:30this particular form of machine learning
-
3:30 - 3:34called deep learning, right? So this is an example of an artificial neural network.
-
3:34 - 3:35For example, here I have an image of a
-
3:35 - 3:37bird that's fed into this artificial
-
3:37 - 3:40neural network, and output from this
-
3:40 - 3:41artificial neural network is a
-
3:41 - 3:44classification of this image into one of
-
3:44 - 3:46these three potential categories. So in
-
3:46 - 3:49this case, if the ANN has been trained
-
3:49 - 3:57properly, when we feed in this image, the ANN should correctly classify it as a bird, right? So this is an image
-
3:57 - 3:59classification problem which is a
-
3:59 - 4:01classic use case for an artificial
-
4:01 - 4:04neural network in the field of computer
-
4:04 - 4:08vision. And just like in the case of
-
4:08 - 4:09machine learning, there are a variety of
-
4:09 - 4:12algorithms that are available for
-
4:12 - 4:14deep learning under the category of
-
4:14 - 4:15supervised learning and also
-
4:15 - 4:17unsupervised learning.
-
4:17 - 4:19All right, so this is how we can
-
4:19 - 4:21kind of categorize this. You can think of
-
4:21 - 4:29AI as the general area of smart systems and machines. Machine learning is basically applied AI, and deep learning
-
4:29 - 4:30is a
-
4:30 - 4:33subspecialization of machine learning
-
4:33 - 4:35using a particular architecture called
-
4:35 - 4:39an artificial neural network.
-
4:39 - 4:42And generative AI, so if you talk
-
4:42 - 4:45about ChatGPT, okay, Google Gemini,
-
4:45 - 4:48Microsoft Copilot, okay, all these
-
4:48 - 4:50examples of generative AI, they are
-
4:50 - 4:52basically large language models, and they
-
4:52 - 4:54are a further subcategory within the
-
4:54 - 4:55area of deep
-
4:55 - 4:58learning. And there are many applications
-
4:58 - 4:59of machine learning in industry right
-
4:59 - 5:05now, so pick whichever particular industry you are involved in, and these are all the specific areas of
-
5:05 - 5:10applications, right? So probably, I'm
-
5:10 - 5:12going to guess the vast majority of you
-
5:12 - 5:13who are watching this video, you're
-
5:13 - 5:14probably coming from the manufacturing
-
5:14 - 5:17industry, and so in the manufacturing
-
5:17 - 5:18industry some of the standard use cases
-
5:18 - 5:20for machine learning and deep learning
-
5:20 - 5:23are predicting potential problems, okay?
-
5:23 - 5:25So sometimes you call this predictive
-
5:25 - 5:27maintenance where you want to predict
-
5:27 - 5:29when a problem is going to happen and
-
5:29 - 5:30then kind of address it before it
-
5:30 - 5:33happens. And then monitoring systems,
-
5:33 - 5:35automating your manufacturing assembly
-
5:35 - 5:38line or production line, okay, smart
-
5:38 - 5:40scheduling, and detecting anomaly on your
-
5:40 - 5:41production line.
-
5:42 - 5:44Okay, so let's talk about the use
-
5:44 - 5:46case here which is predictive
-
5:46 - 5:49maintenance, right? So what is predictive
-
5:49 - 5:52maintenance? Well predictive maintenance,
-
5:52 - 5:55here's the long definition, is an equipment maintenance strategy that
-
5:55 - 5:56relies on real-time monitoring of
-
5:56 - 5:58equipment conditions and data to predict
-
5:58 - 6:00equipment failures in advance.
-
6:00 - 6:03And this uses advanced data models,
-
6:03 - 6:05analytics, and machine learning whereby
-
6:05 - 6:07we can reliably assess when failures are
-
6:07 - 6:09more likely to occur, including which
-
6:09 - 6:11components are more likely to be
-
6:11 - 6:14affected on your production or assembly
-
6:14 - 6:17line. So where does predictive
-
6:17 - 6:19maintenance fit into the overall scheme
-
6:19 - 6:21of things, right? So let's talk about the
-
6:21 - 6:23kind of standard way that, you know,
-
6:23 - 6:26factories or production
-
6:26 - 6:28lines, assembly lines in factories tend
-
6:28 - 6:31to handle maintenance issues say
-
6:31 - 6:3310 or 20 years ago, right? So what you
-
6:33 - 6:38have is, what you would probably start off with is the most basic mode, which is reactive maintenance. So you
-
6:38 - 6:41just wait until your machine breaks down
-
6:41 - 6:43and then you repair it, right? The simplest,
-
6:43 - 6:45but, of course, I'm sure if you have worked on a
-
6:45 - 6:47production line for any period of time,
-
6:47 - 6:49you know that this reactive maintenance
-
6:49 - 6:51can give you a whole bunch of headaches
-
6:51 - 6:52especially if the machine breaks down
-
6:52 - 6:54just before a critical delivery deadline,
-
6:54 - 6:56right? Then you're going to have a
-
6:56 - 6:57backlog of orders and you're going to
-
6:57 - 6:59run into a lot of problems. Okay, so we move on
-
6:59 - 7:01to preventive maintenance which is
-
7:01 - 7:07where you regularly schedule maintenance of your production machines to reduce
-
7:07 - 7:09the failure rate. So you might do
-
7:09 - 7:11maintenance once every month, once every
-
7:11 - 7:13two weeks, whatever. Okay, this is great,
-
7:13 - 7:15but the problem, of course, then is well
-
7:15 - 7:16sometimes you're doing too much
-
7:16 - 7:18maintenance, it's not really necessary,
-
7:18 - 7:23and it still doesn't totally prevent, you know, a failure of the
-
7:23 - 7:26machine that occurs outside of your planned
-
7:26 - 7:29maintenance, right? So a bit of an
-
7:29 - 7:31improvement, but not that much better.
-
7:31 - 7:33And then, these last two categories is
-
7:33 - 7:35where we bring in AI and machine
-
7:35 - 7:37learning. So with machine learning, we're
-
7:37 - 7:39going to use sensors to do real-time
-
7:39 - 7:42monitoring of the data, and then using
-
7:42 - 7:43that data we're going to build a machine
-
7:43 - 7:46learning model which helps us to predict,
-
7:46 - 7:50with a reasonable level of accuracy, when
-
7:50 - 7:53the next failure is going to happen on
-
7:53 - 7:54your assembly or production line on a
-
7:54 - 7:57specific component or specific machine,
-
7:57 - 8:02right? So you want to be able to predict, to a high level of accuracy, maybe
-
8:02 - 8:04to the specific day, even the specific
-
8:04 - 8:06hour, or even minute itself when you
-
8:06 - 8:08expect that particular product to fail
-
8:08 - 8:11or the particular machine to fail. All
-
8:11 - 8:13right, so these are the advantages of
-
8:13 - 8:15predictive maintenance. It minimizes
-
8:15 - 8:17the occurrence of unscheduled downtime, it
-
8:17 - 8:18gives you a real-time overview of your
-
8:18 - 8:20current condition of assets, ensures
-
8:20 - 8:23minimal disruptions to productivity,
-
8:23 - 8:25optimizes time you spend on maintenance work,
-
8:25 - 8:27optimizes the use of spare parts, and so
-
8:27 - 8:28on. And of course there are some
-
8:28 - 8:33disadvantages. The primary one is that you need a specialized set
-
8:33 - 8:36of skills among your engineers to
-
8:36 - 8:38understand and create machine learning
-
8:38 - 8:41models that can work on the real-time
-
8:41 - 8:44data that you're getting. Okay, so we're
-
8:44 - 8:45going to take a look at some real life
-
8:45 - 8:47use cases. So these are a bunch of links
-
8:47 - 8:49here, so if you navigate to these links
-
8:49 - 8:50here, you'll be able to get a look at
-
8:50 - 8:54some real life use cases of machine
-
8:54 - 8:58learning in predictive maintenance. So
-
8:58 - 9:01the IBM website, okay, gives you a look at
-
9:01 - 9:05five use cases, so you can
-
9:05 - 9:07click on these links and follow up with
-
9:07 - 9:08them if you want to read more. Okay, this
-
9:08 - 9:11is waste management, manufacturing, okay,
-
9:11 - 9:15building services, and renewable energy,
-
9:15 - 9:17and also mining, right? So these are all
-
9:17 - 9:18use cases, if you want to know more about
-
9:18 - 9:20them, you can read up and follow them
-
9:20 - 9:26from this website. And this next website is a pretty good one. I
-
9:26 - 9:28would really encourage you to just look
-
9:28 - 9:29through this if you're interested in
-
9:29 - 9:31predictive maintenance. So here, it tells
-
9:31 - 9:34you about, you know, an industry survey of
-
9:34 - 9:36predictive maintenance. We can see that a
-
9:36 - 9:47large portion of the manufacturing industry agreed that predictive maintenance is a real need to stay competitive, that it is essential for the manufacturing industry, and that it will gain
-
9:47 - 9:48additional strength in the future. So
-
9:48 - 9:50this is a survey that was done quite
-
9:50 - 9:52some time ago and this was the results
-
9:52 - 9:54that we got back. So we can see the vast
-
9:54 - 9:56majority of key industry players in the
-
9:56 - 9:58manufacturing sector, they consider
-
9:58 - 9:59predictive maintenance to be a very
-
9:59 - 10:00important
-
10:00 - 10:02activity that they want to
-
10:02 - 10:05incorporate into their workflow, right?
-
10:05 - 10:08And we can see here the kind of ROI that
-
10:08 - 10:11we expect on investment in predictive
-
10:11 - 10:13maintenance, so 45% reduction in downtime,
-
10:13 - 10:1725% growth in productivity, 75% fault
-
10:17 - 10:19elimination, 30% reduction in maintenance
-
10:19 - 10:23cost, okay? And best of all, if you really
-
10:23 - 10:25want to kind of take a look at examples,
-
10:25 - 10:27all right, so there are all these
-
10:27 - 10:28different companies that have
-
10:28 - 10:30significantly invested in predictive
-
10:30 - 10:32maintenance technology in their
-
10:32 - 10:34manufacturing processes. So PepsiCo, we
-
10:34 - 10:39have got Frito-Lay, General Motors, Mondi, Ecoplant,
-
10:39 - 10:41all right? So you can jump over here
-
10:41 - 10:43and take a look at some of these
-
10:43 - 10:46use cases. Let me perhaps, let me try and
-
10:46 - 10:48open this up, for example, Mondi, right? You
-
10:48 - 11:00can see Mondi has used this particular piece of software called MATLAB, from MathWorks, to do predictive maintenance
-
11:00 - 11:02for their manufacturing processes using
-
11:02 - 11:05machine learning. And we can talk, you can
-
11:05 - 11:08study how they have used it, all right,
-
11:08 - 11:09and how it works, what was their
-
11:09 - 11:11challenge, all right, the problems they
-
11:13 - 11:24were facing, the solution that they built with MathWorks Consulting, and the data that they collected in an Oracle database.
-
11:24 - 11:26So using MATLAB from MathWorks, all
-
11:26 - 11:28right, they were able to create a deep
-
11:28 - 11:33learning model to, you know, solve this particular issue for their
-
11:33 - 11:36domain. So if you're interested, please, I
-
11:36 - 11:38strongly encourage you to read up on all
-
11:38 - 11:43these real-life customer stories that showcase use cases for predictive
-
11:43 - 11:48maintenance. Okay, so that's it for
-
11:48 - 11:52real life use cases for predictive maintenance.
-
11:54 - 11:57Now in this topic, I'm
-
11:57 - 11:58going to talk about machine learning
-
11:58 - 12:00basics, so what is actually involved
-
12:00 - 12:01in machine learning, and I'm going to
-
12:01 - 12:04give a very quick, fast, conceptual, high
-
12:04 - 12:06level overview of machine learning, all
-
12:06 - 12:09right? So there are several categories of
-
12:09 - 12:11machine learning, supervised, unsupervised,
-
12:11 - 12:13semi-supervised, reinforcement, and deep
-
12:13 - 12:16learning, okay? And let's talk about the
-
12:16 - 12:19most common and widely used category of
-
12:19 - 12:21machine learning which is called
-
12:21 - 12:25supervised learning. So the particular use
-
12:25 - 12:26case here that I'm going to be
-
12:26 - 12:29discussing, predictive maintenance, it's
-
12:29 - 12:31basically a form of supervised learning.
-
12:31 - 12:33So how does supervised learning work?
-
12:33 - 12:35Well in supervised learning, you're going
-
12:35 - 12:37to create a machine learning model by
-
12:37 - 12:39providing what is called a labelled data
-
12:39 - 12:42set as an input to a machine learning
-
12:42 - 12:45program or algorithm. And this dataset
-
12:45 - 12:49is going to contain what are called the independent or feature variables, all
-
12:49 - 12:51right, so this will be a set of variables.
-
12:51 - 12:53And there will be one dependent or
-
12:53 - 12:55target variable which we also call the
-
12:55 - 12:58label, and the idea is that the
-
12:58 - 13:00independent or the feature variables are
-
13:00 - 13:02the attributes or properties of your
-
13:02 - 13:04data set that influence the dependent or
-
13:04 - 13:08the target variable, okay? So this process
-
13:08 - 13:09that I've just described is called
-
13:09 - 13:12training the machine learning model, and
-
13:12 - 13:14the model is fundamentally a
-
13:14 - 13:16mathematical function that best
-
13:16 - 13:18approximates the relationship between
-
13:18 - 13:21the independent variables and the
-
13:21 - 13:23dependent variable. All right, so that's
-
13:23 - 13:24quite a bit of a mouthful, so let's jump
-
13:24 - 13:26into a diagram that maybe illustrates
-
13:26 - 13:28this more clearly. So let's say you have
-
13:28 - 13:30a dataset here, an Excel spreadsheet,
-
13:30 - 13:32right? And this Excel spreadsheet has a
-
13:32 - 13:34bunch of columns here and a bunch of
-
13:34 - 13:41rows, okay? So these rows here are what we call observations or samples or data
-
13:41 - 13:43points in our data set, okay? So let's
-
13:43 - 13:47assume this data set is gathered by a
-
13:47 - 13:50marketing manager at a mall, at a retail
-
13:50 - 13:52mall, all right? So they've got all this
-
13:52 - 13:55information about the customers who
-
13:55 - 13:57purchase products at this mall, all right?
-
13:57 - 13:59So some of the information they've
-
13:59 - 14:00gotten about the customers are their
-
14:00 - 14:02gender, their age, their income, and the
-
14:02 - 14:04number of children. So all this
-
14:04 - 14:06information about the customers, we call
-
14:06 - 14:07this the independent or the feature
-
14:07 - 14:10variables, all right? And based on all
-
14:10 - 14:13this information about the customer, we
-
14:13 - 14:18also record the information about how much the
-
14:18 - 14:20customer spends, all right? So this
-
14:20 - 14:22information or these numbers here, we call
-
14:22 - 14:24this the target variable or the
-
14:24 - 14:33dependent variable, right? So one single row, that is, one single sample or data point, contains all the data
-
14:33 - 14:35for the feature variables and one single
-
14:35 - 14:38value for the label or the target
-
14:38 - 14:41variable, okay? And the primary purpose of
-
14:41 - 14:43the machine learning model is to create
-
14:43 - 14:46a mapping from all your feature
-
14:46 - 14:48variables to your target variable, so
-
14:48 - 14:51somehow there's going to be a function,
-
14:51 - 14:52okay, this will be a mathematical
-
14:52 - 14:55function that maps all the values of
-
14:55 - 14:57your feature variable to the value of
-
14:57 - 15:00your target variable. In other words, this
-
15:00 - 15:01function represents the relationship
-
15:01 - 15:03between your feature variables and your
-
15:03 - 15:07target variable, okay? So this whole thing,
-
15:07 - 15:09this training process, we call this the
-
15:09 - 15:11fitting the model. And the target
-
15:11 - 15:13variable or the label, this thing here,
-
15:13 - 15:15this column here, or the values here,
-
15:15 - 15:17these are critical for providing a
-
15:17 - 15:19context to do the fitting or the
-
15:19 - 15:21training of the model. And once you've
-
15:21 - 15:23got a trained and fitted model, you can
-
15:23 - 15:26then use the model to make an accurate
-
15:26 - 15:28prediction of target values
-
15:28 - 15:30corresponding to new feature values that
-
15:30 - 15:33the model has yet to encounter or yet to
-
15:33 - 15:35see, and this, as I've already said
-
15:35 - 15:36earlier, this is called predictive
-
15:36 - 15:38analytics, okay? So let's see what's
-
15:38 - 15:40actually happening here, you take your
-
15:40 - 15:43training data, all right, so this is this
-
15:43 - 15:45whole bunch of data, this data set here
-
15:45 - 15:47consisting of a thousand rows of
-
15:47 - 15:50data, 10,000 rows of data, you take this
-
15:50 - 15:52entire data set, all right, this entire
-
15:52 - 15:54data set, you jam it into your machine
-
15:54 - 15:57learning algorithm, and a couple of hours
-
15:57 - 15:58later your machine learning algorithm
-
15:58 - 16:01comes up with a model. And the model is
-
16:01 - 16:04essentially a function that maps all
-
16:04 - 16:06your feature variables which is these
-
16:06 - 16:08four columns here, to your target
-
16:08 - 16:10variable which is this one single column
-
16:10 - 16:14here, okay? So once you have the model, you
-
16:14 - 16:17can put in a new data point. So basically
-
16:17 - 16:19the new data point represents data about a
-
16:19 - 16:21new customer, a new customer that you
-
16:21 - 16:23have never seen before. So let's say
-
16:23 - 16:25you've already got information about
-
16:25 - 16:2810,000 customers that have visited this
-
16:28 - 16:30mall and how much each of these 10,000
-
16:30 - 16:32customers has spent when they were at this
-
16:32 - 16:34mall. So now you have a totally new
-
16:34 - 16:36customer that comes in the mall, this
-
16:36 - 16:38customer has never come into this mall
-
16:38 - 16:40before, and what we know about this
-
16:40 - 16:43customer is that he is a male, the age is
-
16:43 - 16:4550, the income is 18, and they have nine
-
16:45 - 16:48children. So now when you take this data
-
16:48 - 16:51and you pump that into your model, your
-
16:51 - 16:53model is going to make a prediction, it's
-
16:53 - 16:56going to say, hey, you know what? Based on
-
16:56 - 16:57everything that I have been trained on before
-
16:57 - 16:59and based on the model I've developed,
-
16:59 - 17:02I am going to predict that a customer
-
17:02 - 17:05that is of a male gender, of the age 50
-
17:05 - 17:08with the income of 18, and nine children,
-
17:08 - 17:12that customer is going to spend 25 ringgit
-
17:12 - 17:16at the mall. And this is it, this is what
-
17:16 - 17:19you want. Right there, right here,
-
17:19 - 17:21can you see here? That is the final
-
17:21 - 17:23output of your machine learning model.
-
17:23 - 17:27It's going to make a prediction about
-
17:27 - 17:30something that it has not ever seen
-
17:30 - 17:33before, okay? That is the core, this is
-
17:33 - 17:36essentially the core of machine learning.
-
17:36 - 17:39Predictive analytics, making prediction
-
17:39 - 17:40about the future
-
17:41 - 17:44based on a historical data set.
-
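Here is a minimal sketch of this idea in Python with scikit-learn. The column names and numbers are made up to mirror the mall example from the slides; they are not from an actual dataset.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Historical (labelled) data: feature variables plus the target variable.
data = pd.DataFrame({
    "gender":   [0, 1, 0, 1, 1, 0],          # 0 = female, 1 = male
    "age":      [25, 50, 33, 41, 58, 29],
    "income":   [30, 18, 45, 22, 60, 35],
    "children": [0, 9, 1, 2, 3, 0],
    "spend":    [40, 25, 70, 30, 90, 55],    # target variable (the label)
})
X = data[["gender", "age", "income", "children"]]   # feature variables
y = data["spend"]                                   # target variable

model = LinearRegression().fit(X, y)   # "fitting" / "training" the model

# A brand-new customer the model has never seen before:
new_customer = pd.DataFrame([{"gender": 1, "age": 50,
                              "income": 18, "children": 9}])
print(model.predict(new_customer))     # the model's predicted spend
```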
17:44 - 17:47Okay, so there are two areas of
-
17:47 - 17:49supervised learning, regression and
-
17:49 - 17:51classification. So regression is used to
-
17:51 - 17:53predict a numerical target variable, such
-
17:53 - 17:55as the price of a house or the salary of
-
17:55 - 17:58an employee, whereas classification is
-
17:58 - 18:00used to predict a categorical target
-
18:00 - 18:04variable or class label, okay? So for
-
18:04 - 18:06classification you can have either
-
18:06 - 18:09binary or multiclass, so, for example,
-
18:09 - 18:12binary will be just true or false, zero
-
18:12 - 18:15or one. So whether your machine is going
-
18:15 - 18:17to fail or is it not going to fail, right?
-
18:17 - 18:19So just two classes, two possible
-
18:19 - 18:22outcomes, or is the customer going to
-
18:22 - 18:24make a purchase or is the customer not
-
18:24 - 18:26going to make a purchase. We call this
-
18:26 - 18:28binary classification. And then for
-
18:28 - 18:30multiclass, when there are more than two
-
18:30 - 18:33classes or types of values. So, for
-
18:33 - 18:34example, here this would be a
-
18:34 - 18:36classification problem. So if you have a
-
18:36 - 18:38data set here, you've got information
-
18:38 - 18:39about your customers, you've got your
-
18:39 - 18:41gender of the customer, the age of the
-
18:41 - 18:43customer, the salary of the customer, and
-
18:43 - 18:45you also have a record of whether the
-
18:45 - 18:48customer made a purchase or not, okay? So
-
18:48 - 18:50you can take this data set to train a
-
18:50 - 18:52classification model, and then the
-
18:52 - 18:54classification model can then make a
-
18:54 - 18:56prediction about a new customer, and
-
18:56 - 18:59it's going to predict zero, which
-
18:59 - 19:00means the customer didn't make a
-
19:00 - 19:03purchase or one which means the customer
-
19:03 - 19:06made a purchase, right? And regression,
-
19:06 - 19:09this is regression, so let's say you want
-
19:09 - 19:11to predict the wind speed, and you've got
-
19:11 - 19:14historical data about all these four
-
19:14 - 19:17other independent variables or feature
-
19:17 - 19:18variables, so you have recorded
-
19:18 - 19:20temperature, the pressure, the relative
-
19:20 - 19:22humidity, and the wind direction for the
-
19:22 - 19:25past 10 days, 15 days, or whatever, okay? So
-
19:25 - 19:27now you are going to train your machine
-
19:27 - 19:29learning model using this data set, and
-
19:29 - 19:32the target variable column, okay, this
-
19:32 - 19:34column here, the label is basically a
-
19:34 - 19:37number, right? So now with this number,
-
19:37 - 19:40this is a regression model, and so now
-
19:40 - 19:42you can put in a new data point, so a new
-
19:42 - 19:45data point means a new set of values for
-
19:45 - 19:47temperature, pressure, relative humidity,
-
19:47 - 19:49and wind direction, and your machine
-
19:49 - 19:51learning model will then predict the
-
19:51 - 19:54wind speed for that new data point, okay?
-
19:54 - 19:57So that's a regression model.
-
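As a toy sketch of the two flavours in code (the column names and values here are invented, not taken from the presenter's slides):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification: categorical target (did the customer buy? 0 or 1).
customers = pd.DataFrame({
    "age":    [22, 35, 47, 52, 28, 61],
    "salary": [20, 55, 80, 90, 30, 75],
    "bought": [0, 1, 1, 1, 0, 1],
})
clf = LogisticRegression().fit(customers[["age", "salary"]],
                               customers["bought"])
print(clf.predict(pd.DataFrame([{"age": 40, "salary": 65}])))  # -> 0 or 1

# Regression: numerical target (wind speed).
weather = pd.DataFrame({
    "temperature": [28, 30, 26, 32, 29],
    "pressure":    [1010, 1008, 1012, 1005, 1009],
    "humidity":    [70, 65, 80, 60, 72],
    "wind_speed":  [12.0, 15.5, 9.8, 18.2, 14.1],
})
reg = LinearRegression().fit(
    weather[["temperature", "pressure", "humidity"]],
    weather["wind_speed"])
print(reg.predict(pd.DataFrame([{"temperature": 31, "pressure": 1006,
                                 "humidity": 62}])))           # -> a number
```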
19:59 - 20:02All right. So in this particular topic
-
20:02 - 20:08I'm going to talk about the workflow that's involved in machine learning. So
-
20:08 - 20:13in the previous slides, I talked about
-
20:13 - 20:15developing the model, all right? But
-
20:15 - 20:16that's just one part of the entire
-
20:16 - 20:19workflow. So in real life when you use
-
20:19 - 20:20machine learning, there's an end-to-end
-
20:20 - 20:22workflow that's involved. So the first
-
20:22 - 20:24thing, of course, is you need to get your
-
20:24 - 20:27data, and then you need to clean your
-
20:27 - 20:29data, and then you need to explore your
-
20:29 - 20:31data. You need to see what's going on in
-
20:31 - 20:33your data set, right? And your data set,
-
20:33 - 20:36real life data sets are not trivial, they
-
20:36 - 20:39are hundreds of rows, thousands of rows,
-
20:39 - 20:41sometimes millions of rows, billions of
-
20:41 - 20:43rows, we're talking about billions or
-
20:43 - 20:45millions of data points especially if
-
20:45 - 20:47you're using an IoT sensor to get data
-
20:47 - 20:49in real time. So you've got all these
-
20:49 - 20:51super large data sets, you need to clean
-
20:51 - 20:53them, and explore them, and then you need
-
20:53 - 20:56to prepare them into the right format so
-
20:56 - 21:00that you can put them into the training
-
21:00 - 21:02process to create your machine learning
-
21:02 - 21:05model, and then subsequently you check
-
21:05 - 21:08how good is the model, right? How accurate
-
21:08 - 21:10is the model in terms of its ability to
-
21:10 - 21:13generate predictions for the
-
21:13 - 21:15future, right? How accurate are the
-
21:15 - 21:17predictions that are coming up from your
-
21:17 - 21:18machine learning model. So that's
-
21:18 - 21:21validating or evaluating your model, and
-
21:21 - 21:23then subsequently if you determine that
-
21:23 - 21:25your model is of adequate accuracy to
-
21:25 - 21:27meet whatever your domain use case
-
21:27 - 21:29requirements are, right? So let's say the
-
21:29 - 21:31accuracy that's required for your domain
-
21:31 - 21:32use case is
-
21:32 - 21:3585%, okay? If my machine learning model
-
21:35 - 21:39can give an 85% accuracy rate, I think
-
21:39 - 21:40it's good enough, then I'm going to
-
21:40 - 21:43deploy it into real world use case. So
-
21:43 - 21:45here the machine learning model gets
-
21:45 - 21:48deployed on the server, and then other,
-
21:48 - 21:51you know, other data sources are going to
-
21:51 - 21:53be captured from somewhere. That data is
-
21:53 - 21:54pumped into the machine learning model. The
-
21:54 - 21:55machine learning model generates
-
21:55 - 21:58predictions, and those predictions are
-
21:58 - 22:00then used to make decisions on the
-
22:00 - 22:02factory floor in real time or in any
-
22:02 - 22:05other particular scenario. And then you
-
22:05 - 22:07constantly monitor and update the model,
-
22:07 - 22:09you get more new data, and then the
-
22:09 - 22:12entire cycle repeats itself. So that's
-
22:12 - 22:14your machine learning workflow, okay, in a
-
22:14 - 22:17nutshell. Here's another example of
-
22:17 - 22:19the same thing maybe in a slightly
-
22:19 - 22:20different format, so, again, you have your
-
22:20 - 22:22data collection and preparation. Here we
-
22:22 - 22:24talk more about the different kinds of
-
22:24 - 22:27algorithms that are available to create a
-
22:27 - 22:28model, and I'll talk about this more in
-
22:28 - 22:30detail when we look at the real world
-
22:30 - 22:32example of an end-to-end machine learning
-
22:32 - 22:35workflow for the predictive maintenance
-
22:35 - 22:37use case. So once you have chosen the
-
22:37 - 22:46appropriate algorithm, you then train your model, and you then select the appropriate trained model among the multiple models. You are
-
22:46 - 22:48probably going to develop multiple
-
22:48 - 22:50models from multiple algorithms, you're
-
22:50 - 22:52going to evaluate them all, and then
-
22:52 - 22:53you're going to say, hey, you know what?
-
22:53 - 22:55After I've evaluated and tested that,
-
22:55 - 22:57I've chosen the best model, I'm going to
-
22:57 - 23:00deploy the model, all right, so this is
-
23:00 - 23:03for real life production use, okay? Real
-
23:03 - 23:04life sensor data is going to be pumped
-
23:04 - 23:06into my model, my model is going to
-
23:06 - 23:08generate predictions, the predicted data
-
23:08 - 23:10is going to be used immediately in real
-
23:10 - 23:13time for real life decision making, and
-
23:13 - 23:15then I'm going to monitor, right, the
-
23:15 - 23:17results. So somebody's using the
-
23:17 - 23:19predictions from my model, if the
-
23:19 - 23:22predictions are lousy, that goes into the
-
23:22 - 23:23monitoring, the monitoring system
-
23:23 - 23:25captures that. If the predictions are
-
23:25 - 23:28fantastic, well that is also captured by the
-
23:28 - 23:32monitoring system, and that gets fed back again into the next cycle of my
-
23:32 - 23:34machine learning
-
23:34 - 23:36pipeline. Okay, so that's the kind of
-
23:36 - 23:38overall view, and here are the kind of
-
23:38 - 23:42key phases of your workflow. So one of
-
23:42 - 23:44the important phases is called EDA,
-
23:44 - 23:48exploratory data analysis and in this
-
23:48 - 23:50particular phase, you're going to
-
23:50 - 23:53do a lot of stuff, primarily just to
-
23:53 - 23:55understand your data set. So like I said,
-
23:55 - 23:57real life data sets, they tend to be very
-
23:57 - 23:59complex, and they tend to have various
-
23:59 - 24:01statistical properties, all right,
-
24:01 - 24:03statistics is a very important component
-
24:03 - 24:06of machine learning. So an EDA helps you
-
24:06 - 24:07to kind of get an overview of your data
-
24:07 - 24:10set, get an overview of any problems in
-
24:10 - 24:12your data set like any data that's
-
24:12 - 24:13missing, the statistical properties of your
-
24:13 - 24:15data set, the distribution of your data
-
24:15 - 24:17set, the statistical correlation of
-
24:17 - 24:19variables in your data set, etc,
-
24:19 - 24:23etc.
-
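A minimal EDA pass with pandas might look like this (the file name "data.csv" is a placeholder):

```python
import pandas as pd

df = pd.read_csv("data.csv")        # hypothetical file name

print(df.shape)                     # how many rows and columns
df.info()                           # column names, dtypes, non-null counts
print(df.describe())                # basic statistical properties
print(df.isna().sum())              # missing values per column
print(df.corr(numeric_only=True))   # correlation of numeric variables
```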
24:23 - 24:25Okay, then we have data cleaning, or sometimes you call it data cleansing, and
-
24:25 - 24:28in this phase what you want to do is
-
24:28 - 24:29primarily, you want to kind of do things
-
24:29 - 24:32like remove duplicate records or rows in
-
24:32 - 24:34your table, you want to make sure that
-
24:34 - 24:39your data points or samples have appropriate IDs,
-
24:39 - 24:41and most importantly, you want to make
-
24:41 - 24:43sure there's not too many missing values
-
24:43 - 24:45in your data set. So what I mean by
-
24:45 - 24:46missing values are things like that,
-
24:46 - 24:48right? You have got a data set, and for
-
24:48 - 24:52some reason there are some cells or
-
24:52 - 24:55locations in your data set which are
-
24:55 - 24:57missing values, right? And if you have a
-
24:57 - 24:59lot of these missing values, then you've
-
24:59 - 25:00got a poor quality data set, and you're
-
25:00 - 25:02not going to be able to build a good
-
25:02 - 25:04model from this data set. You're not
-
25:04 - 25:06going to be able to train a good machine
-
25:06 - 25:08learning model from a data set with a
-
25:08 - 25:10lot of missing values like this. So you
-
25:10 - 25:12have to figure out whether there are a
-
25:12 - 25:13lot of missing values in your data set,
-
25:13 - 25:15and how to handle them. Another thing
-
25:15 - 25:17that's important in data cleansing is
-
25:17 - 25:19figuring out the outliers in your data
-
25:19 - 25:55set. So outliers are things like this: you know, data points that are very far from the general trend of data points in your data set, right? And so there are several ways to detect outliers in your data set, and there are several ways to handle outliers in your data set; similarly, there are several ways to handle missing values in your data set. So handling missing values and handling outliers, those are really two very key parts of data cleansing, and there are many, many techniques to handle them, so a data scientist needs to be acquainted with all of this.
-
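A minimal pandas sketch of both techniques, using a tiny made-up column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"temperature": [25.1, 24.8, np.nan, 25.3, 99.0, 25.0]})

# Missing values: impute with the median (dropping the row is the other option).
df["temperature"] = df["temperature"].fillna(df["temperature"].median())

# Outliers: one common detection rule is the 1.5 * IQR fence.
q1, q3 = df["temperature"].quantile([0.25, 0.75])
iqr = q3 - q1
inside = df["temperature"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df = df[inside]   # 99.0 falls outside the fence and is dropped
```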
25:58 - 25:59point
-
25:59 - 26:03if you have a very poor quality data set
-
26:03 - 26:05which means youve got a lot of outliers
-
26:05 - 26:07which are errors in your data set or you
-
26:07 - 26:08got a lot of missing values in your data
-
26:08 - 26:11set even though youve got a fantastic
-
26:11 - 26:13algorithm you've got a fantastic model
-
26:13 - 26:16the predictions that your model is going
-
26:16 - 26:19to give is absolutely rubbish it's kind
-
26:19 - 26:22of like taking water and putting water
-
26:22 - 26:26into the tank of a mercedesbenz so
-
26:26 - 26:28Mercedes-Benz is a great car but if you
-
26:28 - 26:30take water and put it into your
-
26:30 - 26:33mercedes-ben it will just die right your
-
26:33 - 26:37car will just die can't run on on water
-
26:37 - 26:38right on the other hand if you have a
-
26:38 - 26:42myv myv is just a lousy car but if
-
26:42 - 26:45you take a high octane good Patrol and
-
26:45 - 26:47you point to a MV the MV will just go at
-
26:47 - 26:49you know 100 Mil hour it which just
-
26:49 - 26:51completely destroy the Mercedes-Benz in
-
26:51 - 26:53terms of performance so it doesn't it
-
26:53 - 26:55doesn't really matter what model you're
-
26:55 - 26:57using right so you can be using the most
-
26:57 - 26:59Fantastic Model like the the
-
26:59 - 27:01mercedesbenz or machine learning but if
-
27:01 - 27:03your data is lousy quality your
-
27:03 - 27:06predictions is also going to be rubbish
-
27:06 - 27:10okay so cleansing data set is in fact
-
27:10 - 27:12probably the most important thing that
-
27:12 - 27:14data scientists need to do and that's
-
27:14 - 27:16what they spend most of the time doing
-
27:16 - 27:18right building the model trading the
-
27:18 - 27:20model getting the right algorithms and
-
27:20 - 27:23so on that's really a small portion of
-
27:23 - 27:25the actual machine learning workflow
-
27:25 - 27:27right the actual uh machine learning
-
27:27 - 27:30workflow the vast majority of time is on
-
27:30 - 27:32cleaning and organizing your
-
27:32 - 27:33data then you have something called
-
27:33 - 27:35feature engineering which is you
-
27:35 - 27:37pre-process the feature variables of
-
27:37 - 27:39your original data set prior to using
-
27:39 - 27:41them to train the model and this is
-
27:41 - 27:42either through addition deletion
-
27:42 - 27:44combination or transformation of these
-
27:44 - 27:45variables and then the idea is you want
-
27:45 - 27:47to improve the predictive accuracy of
-
27:47 - 27:49the model and also because some models
-
27:49 - 27:51can only work with numeric data so you
-
27:51 - 27:54need to transform categorical data into
-
27:54 - 27:57numeric data all right so just now um in
-
27:57 - 27:59the earlier slides I showed you that you
-
27:59 - 28:01take your original data set you pum it
-
28:01 - 28:03into algorithm and then couple of hours
-
28:03 - 28:05later you get a machine learning model
-
28:05 - 28:09right so you didn't do anything to your
-
28:09 - 28:10data set to the feature variables in
-
28:10 - 28:12your data set before you pump it into a
-
28:12 - 28:14machine machine learning algorithm so
-
28:14 - 28:16what I showed you earlier is you just
-
28:16 - 28:19take the data set exactly as it is and
-
28:19 - 28:21you just pump it into the algorithm
-
28:21 - 28:23couple of hours later you get the model
-
28:23 - 28:28right uh but that's not what generally
-
28:28 - 28:30happens in in real life in real life
-
28:30 - 28:32you're going to take all the original
-
28:32 - 28:34feature variables from your data set and
-
28:34 - 28:37you're going to transform them in some
-
28:37 - 28:39way so you can see here these are the
-
28:39 - 28:42colums of data from my original data set
-
28:42 - 28:46and before I actually put all these data
-
28:46 - 28:48points from my original data set into my
-
28:48 - 28:51algorithm to train and get my model I
-
28:51 - 28:55will actually transform them okay so the
-
28:55 - 28:58transformation of these feature variable
-
28:58 - 29:01values we call this feature engineering
-
29:01 - 29:02and there are many many techniques to do
-
29:02 - 29:05feature engineering so one hot encoding
-
29:05 - 29:08scaling log transformation descri
-
29:08 - 29:10discretization date extraction Boolean
-
29:10 - 29:12logic etc
-
29:12 - 29:15etc okay then finally we do something
-
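A small sketch of two common feature-engineering steps (with hypothetical columns: a categorical quality grade and a numeric speed):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "type":  ["L", "M", "H", "L", "M"],       # categorical quality grade
    "speed": [1500, 1420, 1680, 1550, 1390],  # numeric, in rpm
})

# One-hot encoding: the categorical column becomes 0/1 indicator columns.
df = pd.get_dummies(df, columns=["type"])

# Scaling: transform the numeric column to zero mean and unit variance.
df["speed"] = StandardScaler().fit_transform(df[["speed"]])
print(df)
```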
29:15 - 29:17called a train test plate so where we
-
29:17 - 29:19take our original data set right so this
-
29:19 - 29:21was the original data set and we break
-
29:21 - 29:24it into two parts so one is called the
-
29:24 - 29:26training data set and the other is
-
29:26 - 29:28called the test data set and the primary
-
29:28 - 29:30purpose for this is when we feed and
-
29:30 - 29:31train the machine learning model we're
-
29:31 - 29:33going to use what is called the training
-
29:33 - 29:36data set and we when we want to evaluate
-
29:36 - 29:37the accuracy of the model right so this
-
29:37 - 29:41is the key part of your machine learning
-
29:41 - 29:44life cycle because you are not only just
-
29:44 - 29:45going to have one possible models
-
29:45 - 29:48because there are a vast range of
-
29:48 - 29:50algorithms that you can use to create a
-
29:50 - 29:53model so fundamentally you have a wide
-
29:53 - 29:56range of choices right like wide range
-
29:56 - 29:58of cars right you want to buy a car you
-
29:58 - 30:01can buy buy a myv you can buy a paroda
-
30:01 - 30:03you can buy a Honda you can buy a
-
30:03 - 30:05mercedesbenz you can buy a Audi you can
-
30:05 - 30:08buy a beamer many many different cars
-
30:08 - 30:09you that available for you if you want
-
30:09 - 30:12to buy a car right same thing with a
-
30:12 - 30:14machine learning model that are aast
-
30:14 - 30:17variety of algorithms that you can
-
30:17 - 30:19choose from in order to create a model
-
30:19 - 30:22and so once you create a model from a
-
30:22 - 30:24given algorithm you need to say hey how
-
30:24 - 30:26accurate is this model that have created
-
30:26 - 30:29from this algorithm and and different
-
30:29 - 30:30algorithms are going to create different
-
30:30 - 30:34models with different rates of accuracy
-
30:34 - 30:36and so the primary purpose of the test
-
30:36 - 30:38data set is to evaluate the ACC accuracy
-
30:38 - 30:41of the model to see hey is this model
-
30:41 - 30:43that I've created using this algorithm
-
30:43 - 30:46is it adequate for me to use in a real
-
30:46 - 30:49life production use case Okay so that's
-
30:49 - 30:52what it's all about okay so this is my
-
30:52 - 30:54original data set I break it into my
-
30:54 - 30:57feature data uh feature data set and
-
30:57 - 30:59also my target variable colum so my
-
30:59 - 31:01feature variable uh colums the target
-
31:01 - 31:02variable colums and then I further break
-
31:02 - 31:04it into a training data set and a test
-
31:04 - 31:07data set the training data set is to use
-
31:07 - 31:08the train to create the machine learning
-
31:08 - 31:10model and then once the machine learning
-
31:10 - 31:12model is created I then use the test
-
31:12 - 31:15data set to evaluate the accuracy of the
-
31:15 - 31:16machine learning
-
31:16 - 31:21model all right and then finally we can
-
31:21 - 31:23see what are the different parts or
-
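A minimal sketch of this split with scikit-learn, assuming a numeric feature table and a target column named "Target" (hypothetical names):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("data.csv")           # hypothetical file
X = df.drop(columns=["Target"])        # feature variable columns
y = df["Target"]                       # target variable column

# Hold out 20% of the rows as the test data set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier().fit(X_train, y_train)  # train on train set
print(accuracy_score(y_test, model.predict(X_test)))    # evaluate on test set
```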
31:23 - 31:26aspects that go into a successful model
-
31:26 - 31:30so Eda about 10% data cleansing about
-
31:30 - 31:3220% feature engineering about
-
31:32 - 31:3625% selecting a specific algorithm about
-
31:36 - 31:3910% and then training the model from
-
31:39 - 31:42that algorithm about 15% and then
-
31:42 - 31:44finally evaluating the model deciding
-
31:44 - 31:46which is the best model with the highest
-
31:46 - 31:51accuracy rate that's about
-
31:54 - 31:5720% all right so we have reached the
-
31:57 - 31:59most interesting part of this
-
31:59 - 32:01presentation which is the demonstration
-
32:01 - 32:04of an endtoend machine learning workflow
-
32:04 - 32:06on a real life data set that
-
32:06 - 32:10demonstrates the use case of predictive
-
32:10 - 32:14maintenance so the for the data set for
-
32:14 - 32:16this particular use case I've used a
-
32:16 - 32:19data set from kegle so for those of you
-
32:19 - 32:21are not aware of this kegle is the
-
32:21 - 32:25world's largest open-source Community
-
32:25 - 32:28for data science and Ai and they have a
-
32:28 - 32:31large collection of data sets from all
-
32:31 - 32:34various uh areas of industry and human
-
32:34 - 32:37endeavor and they also have a large
-
32:37 - 32:39collection of models that have been
-
32:39 - 32:43developed using these data sets so here
-
32:43 - 32:47we have a data set for the particular
-
32:47 - 32:51use case predictive maintenance okay so
-
32:51 - 32:53this is some information about the data
-
32:53 - 32:56set uh so in case um you do not know how
-
32:56 - 32:59to get to there this is the URL to click
-
32:59 - 33:02on okay to get to that data set so once
-
33:02 - 33:05you at the data set here you can or the
-
33:05 - 33:07page for about this data set you can see
-
33:07 - 33:10all the information about this data set
-
33:10 - 33:13and you can download the data set in a
-
33:13 - 33:14CSV
-
33:14 - 33:16format okay so let's take a look at the
-
33:16 - 33:20data set so this data set has a total of
-
33:20 - 33:2310,000 samples okay and these are the
-
33:23 - 33:26feature variables the type the product
-
33:26 - 33:28ID the add temperature process
-
33:28 - 33:31temperature rotational speed talk tool
-
33:31 - 33:35Weare and this is the target variable
-
33:35 - 33:37all right so the target variable is what
-
33:37 - 33:38we are interested in what we are
-
33:38 - 33:41interested in using to train the machine
-
33:41 - 33:43learning model and also what we
-
33:43 - 33:45interested to predict okay so these are
-
33:45 - 33:48the feature variables they describe or
-
33:48 - 33:50they provide information about this
-
33:50 - 33:53particular machine on the production
-
33:53 - 33:55line on the assembly line so you might
-
33:55 - 33:57know the product ID the type the air
-
33:57 - 33:58temperature process temperature
-
33:58 - 34:00rotational speed talk to where right so
-
34:00 - 34:03let's say you've got a iot sensor system
-
34:03 - 34:06that's basically capturing all this data
-
34:06 - 34:08about a product or a machine on your
-
34:08 - 34:11production or assembly line okay and
-
34:11 - 34:14you've also captured information about
-
34:14 - 34:17whether is for a specific uh sample
-
34:17 - 34:20whether that sample uh experien a
-
34:20 - 34:23failure or not okay so the target value
-
34:23 - 34:26of zero okay indicates that there's no
-
34:26 - 34:28failure so zero means no failure and we
-
34:28 - 34:30can see that the vast majority of data
-
34:30 - 34:33points in this data set are no failure
-
34:33 - 34:34and here we can see an example here
-
34:34 - 34:37where you have a case of a failure so a
-
34:37 - 34:40failure is marked as a one positive and
-
34:40 - 34:43no failure is marked as zero negative
-
34:43 - 34:45all right so here we have one type of a
-
34:45 - 34:47failure it's called a power failure and
-
34:47 - 34:49if you scroll down the data set you see
-
34:49 - 34:50there are also other kinds of failures
-
34:50 - 34:53like a towar
-
34:53 - 34:57failure uh we have a over strain failure
-
34:57 - 34:59here for example
-
34:59 - 35:01uh we also have a power failure again
-
35:01 - 35:02and so on so if you scroll down through
-
35:02 - 35:04these 10,000 data points and or if
-
35:04 - 35:06you're familiar with using Excel to
-
35:06 - 35:09filter out values in a colume you can
-
35:09 - 35:12see that in this particular colume here
-
35:12 - 35:14which is the so-called Target variable
-
35:14 - 35:17colume you are going to have the vast
-
35:17 - 35:19majority of values as zero which means
-
35:19 - 35:23no failure and some of the rows or the
-
35:23 - 35:24data points you are going to have a
-
35:24 - 35:26value of one and for those rows that you
-
35:26 - 35:28have a value of one for example example
-
35:28 - 35:31here you are sorry for example here you
-
35:31 - 35:33are going to have different types of
-
35:33 - 35:35failure so like I said just now power
-
35:35 - 35:39failure tool set filia etc etc so we are
-
35:39 - 35:41going to go through the entire machine
-
35:41 - 35:44learning workflow process with this data
-
35:44 - 35:47set so to see an example of that we are
-
35:47 - 35:50going to use a we're going to go to the
-
35:50 - 35:52code section here all right so if I
-
35:52 - 35:54click on the code section here and right
-
35:54 - 35:56down here we have see what is called a
-
35:56 - 35:59data set notebook so this is basically a
-
35:59 - 36:02Jupiter notebook Jupiter is basically an
-
36:02 - 36:05python application which allows you to
-
36:05 - 36:09create a python machine learning
-
36:09 - 36:12program that basically builds your
-
36:12 - 36:15machine learning model assesses or
-
36:15 - 36:16evaluates his accuracy and generates
-
36:16 - 36:19predictions from it okay so here we have
-
36:19 - 36:22a whole bunch of Jupiter notebooks that
-
36:22 - 36:25are available and you can select any one
-
36:25 - 36:26of them all these notebooks are
-
36:26 - 36:29essentially going to process the data
-
36:29 - 36:32from this particular data set so if I go
-
36:32 - 36:35to this code page here I've actually
-
36:35 - 36:37selected a specific notebook that I'm
-
36:37 - 36:40going to run through to demonstrate an
-
36:40 - 36:43endtoend machine learning workflow using
-
36:43 - 36:46various machine learning libraries from
-
36:46 - 36:50the Python programming language okay so
-
36:50 - 36:52the uh particular notebook I'm going to
-
36:52 - 36:55use is this particular notebook here and
-
36:55 - 36:57you can also get the URL for that
-
36:57 - 37:00particular The Notebook from
-
37:00 - 37:04here okay so let's quickly do a quick
-
37:04 - 37:06revision again what are we trying to do
-
37:06 - 37:08here we're trying to build a machine
-
37:08 - 37:11learning classification model right so
-
37:11 - 37:13we said there are two primary areas of
-
37:13 - 37:15supervised learning one is regression
-
37:15 - 37:16which is used to predict a numerical
-
37:16 - 37:19Target variable and the second kind of
-
37:19 - 37:21supervised learning is classification
-
37:21 - 37:23which is what we're doing here we're
-
37:23 - 37:26trying to predict a categorical Target
-
37:26 - 37:30variable okay so in this particular
-
37:30 - 37:32example we actually have two kinds of
-
37:32 - 37:34ways we can classify either a binary
-
37:34 - 37:38classification or a multiclass
-
37:38 - 37:40classification so for binary
-
37:40 - 37:41classification we are only going to
-
37:41 - 37:43classify the product or machine as
-
37:43 - 37:47either it failed or it did not fail okay
-
37:47 - 37:49so if we go back to the data set that I
-
37:49 - 37:51showed you just now if you look at this
-
37:51 - 37:53target variable colume there are only
-
37:53 - 37:55two possible values here they either
-
37:55 - 37:58zero or one zero means there's no fi
-
37:58 - 38:01one means that's a failure okay so this
-
38:01 - 38:03is an example of a binary classification
-
38:03 - 38:07only two possible outcomes zero or one
-
38:07 - 38:10didn't fail or fail all right two
-
38:10 - 38:13possible outcomes and then we can also
-
38:13 - 38:15for the same data set we can extend it
-
38:15 - 38:18and make it a multiclass classification
-
38:18 - 38:21problem all right so if we kind of want
-
38:21 - 38:24to drill down further we can say that
-
38:24 - 38:27not only is there a failure we can
-
38:27 - 38:29actually say that are different types of
-
38:29 - 38:32failures okay so we have one category of
-
38:32 - 38:36class that is basically no failure okay
-
38:36 - 38:37then we have a category for the
-
38:37 - 38:40different types of failures right so you
-
38:40 - 38:44can have a power failure you could have
-
38:44 - 38:46a tool Weare
-
38:46 - 38:49failure uh you could have let's go down
-
38:49 - 38:51here you could have a over strain
-
38:51 - 38:54failure and etc etc so you can have
-
38:54 - 38:57multiple classes of failure in addition
-
38:57 - 39:01to the general overall or the majority
-
39:01 - 39:04class of no failure and that would be a
-
39:04 - 39:07multiclass classification problem so
-
39:07 - 39:08with this data set we are going to see
-
39:08 - 39:11how to make it a binary classification
-
39:11 - 39:13problem and also a multiclass
-
39:13 - 39:15classification problem okay so let's
-
39:15 - 39:17look at the workflow so let's say we've
-
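To see both targets at a glance, something like this works (the column names "Target" and "Failure Type" follow this Kaggle dataset's schema; the file name may differ on your machine):

```python
import pandas as pd

df = pd.read_csv("predictive_maintenance.csv")

print(df["Target"].value_counts())        # binary: 0 = no failure, 1 = failure
print(df["Failure Type"].value_counts())  # multiclass: No Failure,
                                          # Power Failure, Tool Wear Failure, ...
```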
39:17 - 39:19already got the data so right now we do
-
39:19 - 39:21have the data set this is the data set
-
39:21 - 39:23that we have so let's assume we've
-
39:23 - 39:25somehow managed to get this data set
-
39:25 - 39:27from some iot sensors that are
-
39:27 - 39:29monitoring realtime data in our
-
39:29 - 39:31production environment on the assembly
-
39:31 - 39:33line on the production line we've got
-
39:33 - 39:35sensors reading data that gives us all
-
39:35 - 39:38these data that we have in this CSV file
-
39:38 - 39:40Okay so we've already got the data we've
-
39:40 - 39:42retrieved the data now we're going to go
-
39:42 - 39:45on to the cleaning and exploration part
-
39:45 - 39:48of your machine learning life cycle all
-
39:48 - 39:50right so let's look at the data cleaning
-
39:50 - 39:51part so the data cleaning part we
-
39:51 - 39:54interested in uh checking for missing
-
39:54 - 39:56values and maybe removing the rows you
-
39:56 - 39:58missing values okay
-
39:58 - 40:00uh so the kind of things we can sorry
-
40:00 - 40:01the kind of things we can do in missing
-
40:01 - 40:03values we can remove the row missing
-
40:03 - 40:06values we can put in some new values uh
-
40:06 - 40:08some replacement values which could be a
-
40:08 - 40:10average of all the values in that that
-
40:10 - 40:13particular colume etc etc we also try to
-
40:13 - 40:15identify outliers in our data set and
-
40:15 - 40:17also there are a variety of ways to deal
-
40:17 - 40:19with that so this is called Data
-
40:19 - 40:21cleansing which is a really important
-
40:21 - 40:23part of your machine learning workflow
-
40:23 - 40:26right so that's where we are now at
-
40:26 - 40:27we're doing cleansing and then we're
-
40:27 - 40:29going to follow up with
-
40:29 - 40:31exploration so let's look at the actual
-
40:31 - 40:33code that does the cleansing here so
-
40:33 - 40:36here we are right at the start of the uh
-
40:36 - 40:38machine learning uh life cycle here so
-
40:38 - 40:41this is a Jupiter notebook so here we
-
40:41 - 40:43have a brief description of the problem
-
40:43 - 40:46statement all right so this data set
-
40:46 - 40:48reflects real life predictive
-
40:48 - 40:49maintenance enounter industry with
-
40:49 - 40:50measurements from real equipment the
-
40:50 - 40:52features description is taken directly
-
40:52 - 40:55from the data source set so here we have
-
40:55 - 40:57a description of the six key features in
-
40:57 - 41:00our data set type which is the quality
-
41:00 - 41:03of the product the air temperature the
-
41:03 - 41:05process temperature the rotational speed
-
41:05 - 41:07the talk and the towar all right so
-
41:07 - 41:09these are the six feature variables and
-
41:09 - 41:11there are the two target variables so
-
41:11 - 41:13just now I showed you just now there's
-
41:13 - 41:15one target variable which only has two
-
41:15 - 41:17possible values either zero or one okay
-
41:17 - 41:20zero or one means failure or no failure
-
41:20 - 41:23so that will be this colume here right
-
41:23 - 41:25so let me go all the way back up to here
-
41:25 - 41:27so this colume here we already saw it
-
41:27 - 41:29only has two I values is either zero or
-
41:29 - 41:33one and then we also have this column
-
41:33 - 41:35here and this column here is basically
-
41:35 - 41:38the failure type and so the we have as I
-
41:38 - 41:41already demonstrated just now we do have
-
41:41 - 41:43uh several categories of or types of
-
41:43 - 41:46failure and so here we call this
-
41:46 - 41:47multiclass
-
41:47 - 41:50classification so we can either build a
-
41:50 - 41:52binary classification model for this
-
41:52 - 41:54problem domain or we can build a
-
41:54 - 41:55multiclass
-
41:55 - 41:58classification problem all right so this
-
41:58 - 42:00jupyter notebook is going to demonstrate
-
42:00 - 42:02both approaches to us so first step we
-
42:02 - 42:05are going to write all this python code
-
42:05 - 42:07that's going to import all the libraries
-
42:07 - 42:09that we need to use okay so this is
-
42:09 - 42:12basically python code okay and it's
-
42:12 - 42:15importing the relevant machine learn
-
42:15 - 42:18oops we are importing the relevant
-
42:18 - 42:21machine learning libraries related to
-
42:21 - 42:24our domain use case okay then we load in
-
42:24 - 42:26our data set okay so this our data set
-
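A representative import cell for a notebook like this (not necessarily the exact cell in the presenter's notebook):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
```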
42:26 - 42:28we describe it we have some quick
-
42:28 - 42:31insights into the data set um and then
-
42:31 - 42:33we just take a look at all the variables
-
42:33 - 42:36of the feature variables Etc and so on
-
42:36 - 42:38we just what we're doing now is just
-
42:38 - 42:40doing a quick overview of the data set
-
42:40 - 42:42so this all this python code here they
-
42:42 - 42:44were writing is allowing us the data
-
42:44 - 42:45scientist to get a quick overview of our
-
42:45 - 42:48data set right okay like how many um V
-
42:48 - 42:50how many rows are there how many columns
-
42:50 - 42:52are there what are the data types of the
-
42:52 - 42:53columns, what are the names of the columns, and so on.
-
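Continuing the sketch above, a quick-overview cell that answers those questions could look like this:

```python
print(df.shape)             # how many rows and columns there are
print(df.dtypes)            # the data type of each column
print(df.columns.tolist())  # the names of the columns
print(df.head())            # the first few data points
print(df.describe())        # summary statistics for the numeric columns
```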
42:53 - 42:57Then we zoom in on the
-
42:57 - 42:59Target variables so we look at the
-
42:59 - 43:02Target variables how many uh counts
-
43:02 - 43:05there are of this target variable uh and
-
43:05 - 43:06so on how many different types of
-
43:06 - 43:08failures there are then you want to
-
43:08 - 43:09check whether there are any
-
43:09 - 43:11inconsistencies between the Target and
-
43:11 - 43:14the failure type Etc okay so when you do
-
43:14 - 43:15all this checking you're going to
-
43:15 - 43:17discover there are some discrepancies in
-
43:17 - 43:20your data set so using a specific python
-
43:20 - 43:22code to do checking you're going to say
-
43:22 - 43:23hey you know what there's some errors
-
43:23 - 43:25here right there are nine values that
-
43:25 - 43:27are classified as failure in the Target variable
-
43:27 - 43:28but as no failure in the Failure Type
-
43:28 - 43:30variable so that means there's a
-
43:30 - 43:33discrepancy in your data point right so
-
43:33 - 43:35so these are all the ones that
-
43:35 - 43:36are discrepancies because the target
-
43:36 - 43:39variable says one and we already know
-
43:39 - 43:41that Target variable one is supposed to
-
43:41 - 43:47mean that it's a failure, so we are kind of expecting to
-
43:47 - 43:50see the failure classification but some
-
43:50 - 43:51rows actually say there's no failure
-
43:51 - 43:54although the target type is one but here
-
43:54 - 43:56is a classic example of an error that
-
43:56 - 43:59can very well occur in a data set.
-
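A consistency check along these lines could be sketched as follows, assuming the two label columns are named "Target" and "Failure Type" (the actual names may differ):

```python
# Rows flagged as a failure (Target == 1) but labelled "No Failure".
inconsistent = df[(df["Target"] == 1) & (df["Failure Type"] == "No Failure")]
print(f"Found {len(inconsistent)} inconsistent rows")
```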
43:59 - 44:01So now the question is: what do you do with
-
44:01 - 44:05these errors in your data set right so
-
44:05 - 44:06here the data scientist says I think it
-
44:06 - 44:08would make sense to remove those
-
44:08 - 44:10instances and so they write some code
-
44:10 - 44:13then to remove those instances or those
-
44:13 - 44:15uh rows or data points from the overall
-
44:15 - 44:17data set and same thing we can again
-
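Dropping those rows is short; a sketch, again under the column-name assumptions above:

```python
# Remove the rows whose Target and Failure Type labels contradict each other.
mask = (df["Target"] == 1) & (df["Failure Type"] == "No Failure")
df = df[~mask].reset_index(drop=True)
```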
44:17 - 44:19check for other ISU so we find there's
-
44:19 - 44:21another ISU here with our data set which
-
44:21 - 44:24is another warning so again we can
-
44:24 - 44:26possibly remove them so you're going to
-
44:26 - 44:31remove 27 instances or rows from your
-
44:31 - 44:34overall data set. So your data set has
-
44:34 - 44:3710,000 rows or data points; you're
-
44:37 - 44:40removing 27, which is only 0.27% of the
-
44:40 - 44:42entire data set and these were the
-
44:42 - 44:46reasons why you remove them okay so if
-
44:46 - 44:48you're just removing 0.27% of the
-
44:48 - 44:51entire data set, no big deal, right? Still
-
44:51 - 44:53okay but you needed to remove them
-
44:53 - 45:01because these errors, these 27 data points with errors in
-
45:01 - 45:03your data set could really affect the
-
45:03 - 45:05training of your machine learning model
-
45:05 - 45:09so we need to do your data cleansing
-
45:09 - 45:12right so we are actually cleansing now
-
45:12 - 45:15uh uh some kind of data that is
-
45:15 - 45:18incorrect or erroneous in your original
-
45:18 - 45:21data set okay so then we go on to the
-
45:21 - 45:24next part, which is called EDA. So
-
45:24 - 45:29EDA is where we kind of explore our data
-
45:29 - 45:32and we want to kind of get a visual
-
45:32 - 45:34overview of our data as a whole and also
-
45:34 - 45:36take a look at the statistical
-
45:36 - 45:38properties of data the statistical
-
45:38 - 45:40distribution of the data in all the
-
45:40 - 45:43various columns, the correlation between
-
45:43 - 45:45the variables between the feature
-
45:45 - 45:47variables different columns and also the
-
45:47 - 45:49feature variable and the target variable
-
45:49 - 45:52so all of this is called EDA, and EDA in
-
45:52 - 45:54a machine learning workflow is typically
-
45:54 - 45:57done through visualization
-
45:57 - 45:59all right so let's go back here and take
-
45:59 - 46:01a look right so for example here we are
-
46:01 - 46:03looking at correlation so we plot the
-
46:03 - 46:06values of all the various feature
-
46:06 - 46:08variables against each other and look
-
46:08 - 46:11for potential correlations and patterns
-
46:11 - 46:13and so on and all the different shapes
-
46:13 - 46:17that you see here in this pair plot okay
-
46:17 - 46:18will have a different
-
46:18 - 46:20statistical meaning, and so the data
-
46:20 - 46:22scientist has to kind of visually
-
46:22 - 46:24inspect this pair plot and make some
-
46:24 - 46:26interpretations of these different
-
46:26 - 46:28patterns that he sees here.
-
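A pair plot of this kind is one seaborn call; a sketch, colouring the points by the (assumed) "Target" column:

```python
# Plot every numeric feature against every other, coloured by failure status.
sns.pairplot(df, hue="Target")
plt.show()
```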
46:28 - 46:30So these are some of the insights that
-
46:30 - 46:33can be deduced from looking at these
-
46:33 - 46:34patterns. So for example, the torque and
-
46:34 - 46:36rotational speed are highly correlated
-
46:36 - 46:38the process temperature and air
-
46:38 - 46:40temperature are highly correlated, and
-
46:40 - 46:42failures occur for extreme values of
-
46:42 - 46:45some features etc etc then you can plot
-
46:45 - 46:46certain kinds of charts, this one called a
-
46:46 - 46:48violin chart, to again get new insights;
-
46:48 - 46:50for example, regarding the torque and
-
46:50 - 46:51rotational speed, we can see again that
-
46:51 - 46:53most failures are triggered for much
-
46:53 - 46:55lower or much higher values than the
-
46:55 - 46:57mean when they're not failing so all
-
46:57 - 47:01these visualizations they are there and
-
47:01 - 47:02a trained data scientist can look at
-
47:02 - 47:05them inspect them and make some kind of
-
47:05 - 47:08insightful deductions from them.
-
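A violin plot like the one just described could be sketched as (the column names are assumptions):

```python
# Distribution of torque, split by failure status.
sns.violinplot(data=df, x="Target", y="Torque [Nm]")
plt.show()
```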
47:08 - 47:11Then there's the percentage of failure, the
-
47:11 - 47:14correlation heatmap between all
-
47:14 - 47:16these different feature variables and
-
47:16 - 47:17also the target
-
47:17 - 47:20variable.
-
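The correlation heatmap is typically a couple of lines; a sketch:

```python
# Correlation matrix of the numeric columns, rendered as a heatmap.
corr = df.select_dtypes("number").corr()
sns.heatmap(corr, annot=True, cmap="coolwarm")
plt.show()
```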
47:20 - 47:21Then the product types: the percentage of product types, the percentage
-
47:21 - 47:23of failure with respect to the product
-
47:23 - 47:26type so we can also kind of visualize
-
47:26 - 47:28that as well so certain products have a
-
47:28 - 47:30higher ratio of failure compared to other
-
47:30 - 47:33product types, etc. For example, M products
-
47:33 - 47:36tend to fail more than H products, etc.
-
47:36 - 47:39etc so we can create a vast variety of
-
47:39 - 47:41visualizations in the EDA stage so you
-
47:41 - 47:44can see here and again the idea of this
-
47:44 - 47:46visualization is just to give us some
-
47:46 - 47:50insight some preliminary insight into
-
47:50 - 47:53our data set that helps us to model it
-
47:53 - 47:54more correctly so some more insights
-
47:54 - 47:56that we get into our data set from all
-
47:56 - 47:58this visualization
-
47:58 - 48:00then we can plot the distribution so we
-
48:00 - 48:01can see whether it's a normal
-
48:01 - 48:03distribution or some other kind of
-
48:03 - 48:06distribution uh we can have a box plot
-
48:06 - 48:08to see whether there are any outliers in
-
48:08 - 48:10your data set and so on right so we can
-
48:10 - 48:12see from the box plots we can see
-
48:12 - 48:15rotational speed and torque have outliers, so we
-
48:15 - 48:17already saw outliers are basically a
-
48:17 - 48:19problem that you may need to kind of
-
48:19 - 48:23tackle. So outliers are an issue;
-
48:23 - 48:25it's a it's a part of data cleansing and
-
48:25 - 48:27so you may need to tackle this so we may
-
48:27 - 48:29have to check okay well where are the
-
48:29 - 48:31potential outliers so we can analyze
-
48:31 - 48:35them from the box plot. But then
-
48:35 - 48:37we can say well they are outliers but
-
48:37 - 48:39maybe they're not really horrible
-
48:39 - 48:41outliers so we can tolerate them or
-
48:41 - 48:43maybe we want to remove them so we can
-
48:43 - 48:45see what the mean and maximum values for
-
48:45 - 48:47all these with respect to product type
-
48:47 - 48:50how many of them are above or highly
-
48:50 - 48:51correlated with the product type in
-
48:51 - 48:54terms of the maximum and minimum okay
-
48:54 - 48:57and so on. So the insight is: well, we
-
48:57 - 49:00got 4.87% of the instances as outliers,
-
49:00 - 49:03and maybe 4.87% is not really that much;
-
49:03 - 49:05the outliers are not horrible so we just
-
49:05 - 49:07leave them in the data set now for a
-
49:07 - 49:09different data set the data scientist
-
49:09 - 49:10could come to different conclusion so
-
49:10 - 49:12then they would do whatever they've
-
49:12 - 49:15deemed is appropriate to kind of cleanse
-
49:15 - 49:18the data set.
-
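A box plot plus the usual 1.5 x IQR rule is one way to inspect and quantify the outliers; a sketch with an assumed column name:

```python
col = "Rotational speed [rpm]"  # assumed column name
sns.boxplot(data=df, y=col)
plt.show()

# Flag outliers with the 1.5 * IQR rule.
q1, q3 = df[col].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr)]
print(f"{len(outliers) / len(df):.2%} of instances are outliers")
```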
49:18 - 49:20Okay, so now that we have done all the EDA, the next thing we're
-
49:20 - 49:23going to do is we are going to do what
-
49:23 - 49:26is called feature engineering so we are
-
49:26 - 49:29going to transform our original feature
-
49:29 - 49:31variables and these are our original
-
49:31 - 49:33feature variables right these are our
-
49:33 - 49:35original feature variables and we are
-
49:35 - 49:38going to transform them all right we're
-
49:38 - 49:40going to transform them in some sense uh
-
49:40 - 49:44into some other form before we fit this
-
49:44 - 49:46for training into our machine learning
-
49:46 - 49:49algorithm all right so these are
-
49:49 - 49:52examples: let's say this is an example of an
-
49:52 - 49:55original data set, and these
-
49:55 - 49:57are some of the examples;
-
49:57 - 49:58you don't have to use all of them but
-
49:58 - 49:59these are some of examples of what we
-
49:59 - 50:01call feature engineering which you can
-
50:01 - 50:04then transform your original values in
-
50:04 - 50:05your feature variables to all these
-
50:05 - 50:08transform values here so we're going to
-
50:08 - 50:10pretty much do that here so we have a
-
50:10 - 50:13ordinal encoding we do scaling of the
-
50:13 - 50:15data so the data set is scaled we use a
-
50:15 - 50:18min-max scaling.
-
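Those two transformations could be sketched with scikit-learn as follows (the category ordering and column names are assumptions; strictly, the scaler should be fit on the training split only to avoid leakage):

```python
from sklearn.preprocessing import OrdinalEncoder, MinMaxScaler

# Ordinal-encode the product quality: L < M < H (assumed ordering).
df["Type"] = OrdinalEncoder(categories=[["L", "M", "H"]]).fit_transform(df[["Type"]])

# Min-max scale the numeric features into the [0, 1] range.
num_cols = ["Air temperature [K]", "Process temperature [K]",
            "Rotational speed [rpm]", "Torque [Nm]", "Tool wear [min]"]
df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])
```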
50:18 - 50:22Then finally we come to the modeling, so we have to split our
-
50:22 - 50:24data set into a training data set and a
-
50:24 - 50:29test data set. So coming back again:
-
50:29 - 50:32we said that before you train your
-
50:32 - 50:34model,
-
50:34 - 50:36you have to take your original data set
-
50:36 - 50:37now this is a feature-engineered data
-
50:37 - 50:39set we're going to break it into two or
-
50:39 - 50:41more subsets okay so one is called the
-
50:41 - 50:42training data set that we use to fit
-
50:42 - 50:44and train a machine learning model the
-
50:44 - 50:46second is test data set to evaluate the
-
50:46 - 50:48accuracy of the model okay so we got
-
50:48 - 50:51this training data set your test data
-
50:51 - 50:53set and we also need
-
50:53 - 50:56to sample so from our original data set
-
50:56 - 50:57we need to sample some points
-
50:57 - 50:59that go into your training data set some
-
50:59 - 51:01points that go in your test data set so
-
51:01 - 51:03there are many ways to do sampling one
-
51:03 - 51:05way is to do stratified sampling where
-
51:05 - 51:07we ensure the same proportion of data
-
51:07 - 51:09from each stratum or class, because right
-
51:09 - 51:11now we have a multiclass classification
-
51:11 - 51:12problem so you want to make sure the
-
51:12 - 51:14same proportion of data from each
-
51:14 - 51:16class is preserved in the
-
51:16 - 51:18training and test data set as the
-
51:18 - 51:20original data set which is very useful
-
51:20 - 51:22for dealing with what is called an
-
51:22 - 51:24imbalanced data set so here we have an
-
51:24 - 51:26example of what is called an imbalanced
-
51:26 - 51:30data set in the sense that you have the
-
51:30 - 51:33vast majority of data points in your
-
51:33 - 51:35data set they are going to have the
-
51:35 - 51:37value of zero for their target variable
-
51:37 - 51:40column, so only an extremely small
-
51:40 - 51:43minority of the data points in your data
-
51:43 - 51:45set will actually have the value of one
-
51:45 - 51:49for their target variable column. Okay, so
-
51:49 - 51:51a situation where you have your class or
-
51:51 - 51:53your target variable column where the
-
51:53 - 51:54vast majority of values are from one
-
51:54 - 51:58class and a tiny small minority are from
-
51:58 - 52:01another class we call this an imbalanced
-
52:01 - 52:03data set and for an imbalanced data set
-
52:03 - 52:04typically we will have a specific
-
52:04 - 52:06technique to do the train test split
-
52:06 - 52:08which is called stratified sampling and
-
52:08 - 52:10so that's what's exactly happening here
-
52:10 - 52:12we're doing a stratified split here so
-
52:12 - 52:15we are doing a train test split here,
-
52:15 - 52:18and we are doing a stratified split.
-
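In scikit-learn a stratified split only needs the stratify argument; a sketch:

```python
from sklearn.model_selection import train_test_split

X = df.drop(columns=["Target", "Failure Type"])
y = df["Target"]

# stratify=y keeps the failure/no-failure proportions the same
# in the training and test subsets as in the original data set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
```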
52:18 - 52:20And then we actually develop the
-
52:20 - 52:23models so now we've got the train test
-
52:23 - 52:25split; now here is where we actually
-
52:25 - 52:27train the models
-
52:27 - 52:30now in terms of classification there are
-
52:30 - 52:32a whole bunch of
-
52:32 - 52:35possibilities right that you can use
-
52:35 - 52:38there are many many different algorithms
-
52:38 - 52:41that we can use to create a
-
52:41 - 52:43classification model. So these are
-
52:43 - 52:45examples of some of the more common ones:
-
52:45 - 52:47logistic regression, support vector machine, decision
-
52:47 - 52:50trees, random forest, bagging, balanced
-
52:50 - 52:53bagging, boosting, ensembles. So all
-
52:53 - 52:55these are different algorithms which
-
52:55 - 52:58will create different kind of models
-
52:58 - 53:02which will result in different accuracy
-
53:02 - 53:05measures okay so it's the goal of the
-
53:05 - 53:09data scientist to find the best model
-
53:09 - 53:12that gives the best accuracy for the
-
53:12 - 53:14given data set for training on that
-
53:14 - 53:17given data set so let's head back again
-
53:17 - 53:20to uh our machine learning workflow so
-
53:20 - 53:22here basically what I'm doing is I'm
-
53:22 - 53:24creating a whole bunch of models here
-
53:24 - 53:26all right, so one is a random forest, one
-
53:26 - 53:27is balanced bagging, one is a boosting
-
53:27 - 53:30classifier, one's the ensemble classifier
-
53:30 - 53:33and using all of these I am going to
-
53:33 - 53:35basically fit or train my models using
-
53:35 - 53:37all these algorithms and then I'm going
-
53:37 - 53:40to evaluate them.
-
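Fitting those model families could be sketched like this (the exact models and hyperparameters in the notebook may differ; BalancedBaggingClassifier comes from the imbalanced-learn package):

```python
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier, VotingClassifier)
from imblearn.ensemble import BalancedBaggingClassifier

models = {
    "random_forest": RandomForestClassifier(random_state=42),
    "balanced_bagging": BalancedBaggingClassifier(random_state=42),
    "boosting": GradientBoostingClassifier(random_state=42),
}
# A simple ensemble that soft-votes across the three models above.
models["ensemble"] = VotingClassifier(
    estimators=list(models.items()), voting="soft")

for name, model in models.items():
    model.fit(X_train, y_train)
```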
53:40 - 53:42Okay, I'm going to evaluate how good each of these models
-
53:42 - 53:46are, and here you can see your
-
53:46 - 53:49evaluation data. Okay, and this is
-
53:49 - 53:51the confusion Matrix which is another
-
53:51 - 53:54way of evaluating so now we come to the
-
53:54 - 53:56kind of the key part here, which
-
53:56 - 53:59is: how do I distinguish between
-
53:59 - 54:00all these models right I've got all
-
54:00 - 54:01these different models which are built
-
54:01 - 54:03with different algorithms which I'm
-
54:03 - 54:05using to train on the same data set how
-
54:05 - 54:07do I distinguish between all these
-
54:07 - 54:10models? Okay, and so for
-
54:10 - 54:14that we actually have a whole bunch of
-
54:14 - 54:16common evaluation metrics for
-
54:16 - 54:18classification. So these evaluation
-
54:18 - 54:22metrics tell us how good a model is in
-
54:22 - 54:24terms of its accuracy in
-
54:24 - 54:27classification so in terms of
-
54:27 - 54:29accuracy we actually have many
-
54:29 - 54:32different measures,
-
54:32 - 54:33right you might think well accuracy is
-
54:33 - 54:35just accuracy well that's all right it's
-
54:35 - 54:37just either it's accurate or it's not
-
54:37 - 54:39accurate right but actually it's not
-
54:39 - 54:41that simple there are many different
-
54:41 - 54:44ways to measure the accuracy of a
-
54:44 - 54:45classification model and these are some
-
54:45 - 54:48of the more common ones so for example
-
54:48 - 54:51the confusion matrix tells us how many
-
54:51 - 54:54true positives that means the value is
-
54:54 - 54:56positive the prediction is positive how
-
54:56 - 54:58many false positives, which means the
-
54:58 - 54:59value is negative the machine learning
-
54:59 - 55:02model predicts positive how many false
-
55:02 - 55:04negatives which means that the machine
-
55:04 - 55:06learning model predicts negative but
-
55:06 - 55:07it's actually positive and how many true
-
55:07 - 55:09negatives there are which means that the
-
55:09 - 55:11machine learning model
-
55:11 - 55:13predicts negative and the true value is
-
55:13 - 55:15also negative so this is called a
-
55:15 - 55:17confusion Matrix this is one way we
-
55:17 - 55:19assess or evaluate the performance of a
-
55:19 - 55:21classification
-
55:21 - 55:23model.
-
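For any one of the fitted models, the confusion matrix comes straight from scikit-learn; a sketch continuing the code above:

```python
from sklearn.metrics import confusion_matrix

y_pred = models["random_forest"].predict(X_test)
cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()  # the four cells described above (binary case)
print(cm)
```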
55:23 - 55:25Okay, this is for binary classification; we can also have a
-
55:25 - 55:27multiclass confusion matrix.
-
55:27 - 55:29and then we can also measure things like
-
55:29 - 55:32accuracy so accuracy is the true
-
55:32 - 55:34positives plus the true negatives which
-
55:34 - 55:35is the total number of correct
-
55:35 - 55:38predictions made by the model divided by
-
55:38 - 55:40the total number of data points in your
-
55:40 - 55:43data set and then you have also other
-
55:43 - 55:44kinds of
-
55:44 - 55:47measures uh such as recall and this is a
-
55:47 - 55:49formula for recall this is a formula for
-
55:49 - 55:51the F1 score okay and then there's
-
55:51 - 55:56something called the ROC curve. So
-
55:56 - 55:57without going too much in the detail of
-
55:57 - 55:59what each of these entails essentially
-
55:59 - 56:01these are all different ways these are
-
56:01 - 56:03different KPIs, right? Just like if you
-
56:03 - 56:06work in a company you have different KPIs,
-
56:06 - 56:08right? Certain employees have certain KPIs
-
56:08 - 56:11that measure how good, how
-
56:11 - 56:13efficient, or how effective a
-
56:13 - 56:16particular employee is. So the
-
56:16 - 56:20KPIs for your machine learning models
-
56:20 - 56:24are the ROC curve, F1 score, recall, accuracy,
-
56:24 - 56:27and your confusion matrix. So
-
56:27 - 56:30fundamentally after I have built right
-
56:30 - 56:33so here I've built my four different
-
56:33 - 56:35models, so after I built these four
-
56:35 - 56:38different models I'm going to check and
-
56:38 - 56:40evaluate them using all those different
-
56:40 - 56:42metrics like for example the F1 score
-
56:42 - 56:45the Precision score the recall score all
-
56:45 - 56:47right so for this model I can check out
-
56:47 - 56:50the ROC score the F1 score the Precision
-
56:50 - 56:52score the recall score then for this
-
56:52 - 56:55model this is the ROC score the F1 score
-
56:55 - 56:57the precision score, the recall score,
-
56:57 - 57:00then for this model and so on so for
-
57:00 - 57:03every single model I've created using my
-
57:03 - 57:06training data set I will have all my set
-
57:06 - 57:08of evaluation metrics that I can use to
-
57:08 - 57:12evaluate how good this model is.
-
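Computing that set of scores for every model from the earlier sketch might look like:

```python
from sklearn.metrics import (roc_auc_score, f1_score,
                             precision_score, recall_score)

for name, model in models.items():
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]  # predicted failure probability
    print(name,
          f"ROC AUC={roc_auc_score(y_test, y_prob):.3f}",
          f"F1={f1_score(y_test, y_pred):.3f}",
          f"precision={precision_score(y_test, y_pred):.3f}",
          f"recall={recall_score(y_test, y_pred):.3f}")
```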
57:12 - 57:13Okay, same thing here: I've got a confusion
-
57:13 - 57:15Matrix here right so I can use that
-
57:15 - 57:18again to evaluate between all these four
-
57:18 - 57:20different models and then I kind of
-
57:20 - 57:22summarize it up here so we can see from
-
57:22 - 57:25this summary here that actually the top
-
57:25 - 57:31two models, which, as a data scientist, I'm now going to just focus on
-
57:31 - 57:33these two models. So these two models are the bagging
-
57:33 - 57:36classifier and the random forest classifier;
-
57:36 - 57:38they have the highest values of F1 score
-
57:38 - 57:40and the highest values of the ROC AUC
-
57:40 - 57:43score okay so we can say these are the
-
57:43 - 57:46top two models in terms of accuracy okay
-
57:46 - 57:49using the F1 evaluation metric and the
-
57:49 - 57:54ROC AUC evaluation metric. Okay, so these
-
57:54 - 57:57results are kind of summarized here, and
-
57:57 - 57:59then we use different sampling
-
57:59 - 58:01techniques okay so just now I talked
-
58:01 - 58:04about um different kinds of sampling
-
58:04 - 58:06techniques and so the idea of different
-
58:06 - 58:08kinds of sampling techniques is to just
-
58:08 - 58:11get a different feel for different
-
58:11 - 58:14distributions of the data in different
-
58:14 - 58:16areas of your data set so that you want
-
58:16 - 58:20to just kind of make sure that your your
-
58:20 - 58:23your evaluation of accuracy is actually
-
58:23 - 58:27statistically correct right so we can um
-
58:27 - 58:30do what is called oversampling and under
-
58:30 - 58:31sampling which is very useful when
-
58:31 - 58:32you're working with an imbalanced data
-
58:32 - 58:35set. So this is an example of doing that.
-
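With the imbalanced-learn package, oversampling and undersampling are each a couple of lines; a sketch (the same package also provides the Tomek links and Borderline SMOTE variants mentioned later):

```python
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

# Oversample the minority (failure) class with synthetic points...
X_over, y_over = SMOTE(random_state=42).fit_resample(X_train, y_train)
# ...or undersample the majority (no-failure) class instead.
X_under, y_under = RandomUnderSampler(random_state=42).fit_resample(X_train, y_train)
```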
58:35 - 58:37And then here we again check out the
-
58:37 - 58:39results for all these different
-
58:39 - 58:42techniques: we use the F1 score and the AUC
-
58:42 - 58:44score all right these are the two key
-
58:44 - 58:47measures of accuracy right so and then
-
58:47 - 58:48we can check out the scores for the
-
58:48 - 58:50different approaches okay so we can see
-
58:50 - 58:53oh well, overall the models have a lower
-
58:53 - 58:56ROC AUC score but they have a much
-
58:56 - 58:58higher F1 score; the bagging classifier
-
58:58 - 59:01had the highest ROC AUC score
-
59:01 - 59:04but its F1 score was too low. Okay, then in
-
59:04 - 59:07the data scientist's opinion, the random
-
59:07 - 59:09forest with this particular technique of
-
59:09 - 59:11sampling has an equilibrium between the F1
-
59:11 - 59:14and ROC AUC scores. So the takeaway
-
59:14 - 59:17is the macro F1 score improves
-
59:17 - 59:18dramatically using these sampling
-
59:18 - 59:20techniques, so these models might be better
-
59:20 - 59:22compared to the balanced ones all right
-
59:22 - 59:26so based on all this uh evaluation the
-
59:26 - 59:28data scientist says they're going to
-
59:28 - 59:30continue to work with these two models
-
59:30 - 59:31all right, and the balanced bagging one,
-
59:31 - 59:33and then continue to make further
-
59:33 - 59:35comparisons all right so then we
-
59:35 - 59:37continue to keep refining on our
-
59:37 - 59:39evaluation work here we're going to
-
59:39 - 59:41train the models one more time again so
-
59:41 - 59:43we again do a train test split and
-
59:43 - 59:45then we do that for this particular uh
-
59:45 - 59:47model, and then we print out
-
59:47 - 59:48what is called a
-
59:48 - 59:51classification report and this is
-
59:51 - 59:53basically a summary of all those metrics
-
59:53 - 59:55that I talk about just now so just now
-
59:55 - 59:58remember I said there were
-
59:58 - 60:00several evaluation metrics; so
-
60:00 - 60:01we had the confusion matrix, the
-
60:01 - 60:04accuracy, the precision, the recall, the AUC
-
60:04 - 60:08score. So here, with the classification
-
60:08 - 60:10report I can get a summary of all of
-
60:10 - 60:12that, so I can see all the values here.
-
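The classification report itself is a single call; a sketch:

```python
from sklearn.metrics import classification_report

y_pred = models["balanced_bagging"].predict(X_test)
print(classification_report(y_test, y_pred,
                            target_names=["No Failure", "Failure"]))
```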
60:12 - 60:15Okay, for this particular model, bagging with
-
60:15 - 60:17Tomek links, and then I can do that for
-
60:17 - 60:19another model, the random forest with
-
60:19 - 60:21Borderline SMOTE, and then I can do that
-
60:21 - 60:22for another model, which is the balanced
-
60:22 - 60:25bagging. So again we see a lot of
-
60:25 - 60:27comparison between different models
-
60:27 - 60:29trying to figure out what all these
-
60:29 - 60:31evaluation metrics are telling us all
-
60:31 - 60:33right then again we have a confusion
-
60:33 - 60:36Matrix so we generate a confusion Matrix
-
60:36 - 60:39for the bagging with the Tomek links
-
60:39 - 60:41undersampling, for the random forest
-
60:41 - 60:43with the Borderline SMOTE oversampling,
-
60:43 - 60:45and for balanced bagging by itself; then
-
60:45 - 60:48again we compare between these three uh
-
60:48 - 60:51models using the confusion matrix
-
60:51 - 60:53evaluation metric, and then we can kind
-
60:53 - 60:56of come to some conclusions all right so
-
60:56 - 60:58right so now we look at all the data
-
60:58 - 61:01then we move on and look at
-
61:01 - 61:03another kind of evaluation metric, which
-
61:03 - 61:07is the ROC score. So this is one of
-
61:07 - 61:09the other evaluation metrics I talk
-
61:09 - 61:11about so this one is a kind of a curve
-
61:11 - 61:13you look at it to see the area
-
61:13 - 61:14underneath the curve; this is called
-
61:14 - 61:18the ROC AUC, the
-
61:18 - 61:20area under the curve. All right, so the
-
61:20 - 61:22area under the curve uh
-
61:22 - 61:24score will give us some idea about the
-
61:24 - 61:26threshold that we're going to use for
-
61:26 - 61:28classification, so we can examine this
-
61:28 - 61:29for the bagging classifier for the
-
61:29 - 61:31random forest classifier for the balance
-
61:31 - 61:34bagging classifier.
-
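Plotting the ROC curve and computing its area under the curve could be sketched as:

```python
from sklearn.metrics import roc_curve, auc

y_prob = models["balanced_bagging"].predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.3f}")
plt.plot([0, 1], [0, 1], linestyle="--")  # chance level
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```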
61:34 - 61:36Okay, then finally we can check
-
61:36 - 61:38the classification report of this
-
61:38 - 61:40particular model so we keep doing this
-
61:40 - 61:43over and over again, evaluating
-
61:43 - 61:46the accuracy metrics, the
-
61:46 - 61:47evaluation metrics, for all these
-
61:47 - 61:49different models so we keep doing this
-
61:49 - 61:51over and over again for different
-
61:51 - 61:53thresholds for classification, and so
-
61:53 - 61:57as we keep drilling into these we kind
-
61:57 - 62:01of get more and more understanding of
-
62:01 - 62:03all these different models which one is
-
62:03 - 62:05the best one that gives the best
-
62:05 - 62:09performance for our data set okay so
-
62:09 - 62:11finally we come to this conclusion this
-
62:11 - 62:14particular model is not able to push
-
62:14 - 62:15the recall on failures beyond
-
62:15 - 62:1895.8%; on the other hand, balanced bagging
-
62:18 - 62:19with a decision threshold of 0.6 is able
-
62:19 - 62:22to have a better recall blah blah blah
-
62:22 - 62:25Etc so finally after having done all of
-
62:25 - 62:27these evaluations,
-
62:27 - 62:31okay this is the conclusion
-
62:31 - 62:34so right now we
-
62:34 - 62:35have gone through all the steps of the
-
62:35 - 62:38machine learning life cycle, which
-
62:38 - 62:40means we have right now or the data
-
62:40 - 62:42scientist right now has gone through all
-
62:42 - 62:43these
-
62:43 - 62:47steps: now we have done the
-
62:47 - 62:49validation, we have done the cleaning,
-
62:49 - 62:51exploration preparation transformation
-
62:51 - 62:53the feature engineering; we have developed
-
62:53 - 62:54and trained multiple models we have
-
62:54 - 62:56evaluated all these different models so
-
62:56 - 62:59right now we have reached this stage so
-
62:59 - 63:03at this stage we as the data scientist
-
63:03 - 63:05kind of have completed our job so we've
-
63:05 - 63:08come to some very useful conclusions
-
63:08 - 63:10which we now can share with our
-
63:10 - 63:13colleagues all right and based on this
-
63:13 - 63:15uh conclusions or recommendations
-
63:15 - 63:17somebody is going to choose an
-
63:17 - 63:19appropriate model, and that model is
-
63:19 - 63:23going to get deployed for realtime use
-
63:23 - 63:25in a real life production environment
-
63:25 - 63:27okay and that decision is going to be
-
63:27 - 63:29made based on the recommendations coming
-
63:29 - 63:31from the data scientist at the end of
-
63:31 - 63:33this phase okay so at the end of this
-
63:33 - 63:35phase the data scientist is going to
-
63:35 - 63:37come up with these conclusions so
-
63:37 - 63:49the conclusions are: okay, if the engineering team is looking for the highest
-
63:49 - 63:52failure detection rate possible then
-
63:52 - 63:54they should go with this particular
-
63:54 - 63:57model okay
-
63:57 - 63:59and if they want a balance between
-
63:59 - 64:01precision and recall then they should
-
64:01 - 64:03choose between the bagging model with a
-
64:03 - 64:060.4 decision threshold or the random
-
64:06 - 64:10forest model with a 0.5 threshold but if
-
64:10 - 64:12they don't care so much about predicting
-
64:12 - 64:14every failure and they want the highest
-
64:14 - 64:17Precision possible then they should opt
-
64:17 - 64:20for the bagging Tomek links classifier
-
64:20 - 64:23with a bit higher decision threshold and
-
64:23 - 64:26so this is the key thing that the data
-
64:26 - 64:28scientist is going to give right this is
-
64:28 - 64:31the key takeaway this is the kind of the
-
64:31 - 64:33end result of the entire machine
-
64:33 - 64:35learning life cycle right now the data
-
64:35 - 64:36scientist is going to tell the
-
64:36 - 64:39engineering team all right you guys
-
64:39 - 64:41which is more important for you: point A,
-
64:41 - 64:45point B, or point C? Make your decision. So
-
64:45 - 64:47the engineering team will then discuss
-
64:47 - 64:49among themselves and say hey you know
-
64:49 - 64:52what what we want is we want to get the
-
64:52 - 64:55highest failure detection possible
-
64:55 - 64:58because any kind of failure of that
-
64:58 - 65:00machine or the product on the assembly
-
65:00 - 65:03line is really going to screw us up big
-
65:03 - 65:06time so what we're looking for is the
-
65:06 - 65:08model that will give us the highest
-
65:08 - 65:11failure detection rate we don't care
-
65:11 - 65:13about precision, but we want to make
-
65:13 - 65:15sure that if there's a failure we are
-
65:15 - 65:18going to catch it right so that's what
-
65:18 - 65:20they want and so the data scientist will
-
65:20 - 65:22say, hey, you go for the balanced bagging
-
65:22 - 65:25model okay then the data scientist saves
-
65:25 - 65:28this all right uh and then once you have
-
65:28 - 65:30saved this uh you can then go right
-
65:30 - 65:32ahead and deploy that so you can go
-
65:32 - 65:34right ahead and deploy that to
-
65:34 - 65:37production.
-
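Saving the chosen model is usually done with joblib; a sketch (the file name is hypothetical):

```python
import joblib

# Persist the selected model so it can be deployed to the server.
joblib.dump(models["balanced_bagging"], "balanced_bagging_model.joblib")
```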
65:37 - 65:39Okay, and so if you want to continue, we can actually further
-
65:39 - 65:41continue this modeling problem so just
-
65:41 - 65:50now I modeled this problem as a binary classification, which means it's either
-
65:50 - 65:52zero or one either fail or not fail but
-
65:52 - 65:54we can also model it as a multiclass
-
65:54 - 65:56classification problem right because as
-
65:56 - 65:58I said earlier just now, for the
-
65:58 - 66:03failure type column, you actually
-
66:03 - 66:05have multiple kinds of failures right
-
66:05 - 66:08for example you may have a power failure
-
66:08 - 66:10you may have a tool wear failure, you
-
66:10 - 66:13may have an overstrain failure. So now we
-
66:13 - 66:15can model the problem slightly
-
66:15 - 66:17differently so we can model it as a
-
66:17 - 66:20multiclass classification problem and
-
66:20 - 66:21then we go through the entire same
-
66:21 - 66:23process that we went through just now so
-
66:23 - 66:25we create different models we test this
-
66:25 - 66:27out but now the confusion Matrix is for
-
66:27 - 66:30a multiclass classification issue, right?
-
66:30 - 66:31so we're going
-
66:31 - 66:34to check them out we're going to again
-
66:34 - 66:36uh try different algorithms or models
-
66:36 - 66:38again train and test our data set do the
-
66:38 - 66:40training test split uh on these
-
66:40 - 66:42different models all right so we have
-
66:42 - 66:43like, for example, a random
-
66:43 - 66:46forest and a balanced random forest, with a grid search;
-
66:46 - 66:48then you train the models using what is
-
66:48 - 66:50called hyperparameter tuning.
-
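Hyperparameter tuning with a grid search might be sketched as follows (the parameter grid is purely illustrative):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10, 20]}
grid = GridSearchCV(RandomForestClassifier(random_state=42),
                    param_grid, scoring="f1_macro", cv=5)
grid.fit(X_train, y_train)  # tries every combination with cross-validation
print(grid.best_params_, grid.best_score_)
```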
66:50 - 66:51Then you get the scores. All right, so you get the
-
66:51 - 66:53same evaluation scores again you check
-
66:53 - 66:55out the evaluation scores compare
-
66:55 - 66:57between them generate a confusion Matrix
-
66:57 - 67:00so this is a multiclass confusion Matrix
-
67:00 - 67:02and then you come to the final
-
67:02 - 67:06conclusion so now if you are interested
-
67:06 - 67:09to frame your problem domain as a
-
67:09 - 67:11multiclass classification problem all
-
67:11 - 67:14right then these are the recommendations
-
67:14 - 67:15from the data scientist so the data
-
67:15 - 67:17scientist will say you know what I'm
-
67:17 - 67:20going to pick this particular model the
-
67:20 - 67:22balanced bagging classifier, and these are
-
67:22 - 67:25all the reasons that the data scientist
-
67:25 - 67:27is going to give as a rationale for
-
67:27 - 67:29selecting this particular
-
67:29 - 67:32model and then once that's done you save
-
67:32 - 67:35the model, and that's it;
-
67:35 - 67:39so that's all done now and so then the
-
67:39 - 67:41machine learning model,
-
67:41 - 67:44now you can put it live run it on the
-
67:44 - 67:45server and now the machine learning
-
67:45 - 67:47model is ready to work which means it's
-
67:47 - 67:49ready to generate predictions right
-
67:49 - 67:50that's the main job of the machine
-
67:50 - 67:52learning model you have picked the best
-
67:52 - 67:54machine learning model with the best
-
67:54 - 67:56evaluation metrics for whatever accuracy
-
67:56 - 67:58goal you're trying to achieve, and
-
67:58 - 68:00now you're going to run it on a server
-
68:00 - 68:01and now you're going to get all this
-
68:01 - 68:03real time data that's coming from your
-
68:03 - 68:05sensors; you're going to pump that into
-
68:05 - 68:06your machine learning model your machine
-
68:06 - 68:08learning model will pump out a whole
-
68:08 - 68:10bunch of predictions and we're going to
-
68:10 - 68:13use those predictions in real time to
-
68:13 - 68:15make real time real world decision
-
68:15 - 68:18making right you're going to say okay
-
68:18 - 68:20I'm predicting that that machine is
-
68:20 - 68:23going to fail on Thursday at 5:00 p.m.
-
68:23 - 68:26so you better get your service folks in
-
68:26 - 68:29to service it on Thursday 2: p.m. or you
-
68:29 - 68:32know whatever so you can you know uh
-
68:32 - 68:33make decisions on when you want to do
-
68:33 - 68:35your maintenance, you know, and make
-
68:35 - 68:38the best decisions to optimize the cost
-
68:38 - 68:41of maintenance, etc.
-
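On the server side, the deployed model is loaded once and then fed the incoming sensor readings; a minimal sketch (file name and feature names assumed, and the new readings must be encoded and scaled exactly as during training):

```python
import joblib
import pandas as pd

model = joblib.load("balanced_bagging_model.joblib")

# One incoming sensor reading, already encoded/scaled like the training data.
reading = pd.DataFrame([{
    "Type": 1.0, "Air temperature [K]": 0.42, "Process temperature [K]": 0.55,
    "Rotational speed [rpm]": 0.31, "Torque [Nm]": 0.77, "Tool wear [min]": 0.64,
}])
print(model.predict(reading))        # 0 = no failure, 1 = failure predicted
print(model.predict_proba(reading))  # failure probability for thresholding
```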
68:41 - 68:42And then based on the
-
68:42 - 68:45results that are coming from the
-
68:45 - 68:47predictions so the predictions may be
-
68:47 - 68:49good the predictions may be lousy the
-
68:49 - 68:51predictions may be average, right? So
-
68:51 - 68:54we're constantly monitoring how good
-
68:54 - 68:55or how useful are the predictions
-
68:55 - 68:58generated by this realtime model that's
-
68:58 - 69:00running on the server and based on our
-
69:00 - 69:03monitoring we will then take some new
-
69:03 - 69:05data and then repeat this entire life
-
69:05 - 69:07cycle again so this is basically a
-
69:07 - 69:09workflow that's iterative and we are
-
69:09 - 69:11constantly or the data scientist is
-
69:11 - 69:13constantly getting in all these new data
-
69:13 - 69:15points and then refining the model
-
69:15 - 69:18picking maybe a new model deploying the
-
69:18 - 69:22new model onto the server and so on all
-
69:22 - 69:24right and so that's it so that is
-
69:24 - 69:26basically your machine learning workflow
-
69:26 - 69:29in a nutshell okay so for this
-
69:29 - 69:32particular approach we have used a bunch
-
69:32 - 69:35of uh data science libraries from python
-
69:35 - 69:37so we have used pandas which is the most
-
69:37 - 69:39basic data science library that
-
69:39 - 69:40provides all the tools to work with raw
-
69:40 - 69:43data. We have used NumPy, which is a high
-
69:43 - 69:44performance library for implementing
-
69:44 - 69:46complex array and matrix operations. We have
-
69:46 - 69:50used matplotlib and seaborn, which are used
-
69:50 - 69:52for doing the EDA, the
-
69:52 - 69:56exploratory data analysis phase of machine
-
69:56 - 69:57learning where you visualize all your
-
69:57 - 69:59data. We have used scikit-learn, which is
-
69:59 - 70:01the machine learning library to do all
-
70:01 - 70:03the implementation of all your core
-
70:03 - 70:06machine learning algorithms. We
-
70:06 - 70:08have not used this because this is not a
-
70:08 - 70:11deep learning uh problem but if you are
-
70:11 - 70:13working with a deep learning problem
-
70:13 - 70:15like image classification image
-
70:15 - 70:18recognition object detection okay
-
70:18 - 70:20natural language processing text
-
70:20 - 70:22classification well then you're going to
-
70:22 - 70:24use these libraries from Python, which are
-
70:24 - 70:29TensorFlow and also
-
70:29 - 70:33PyTorch. And then lastly, that whole thing, that
-
70:33 - 70:35whole data science project that you saw
-
70:35 - 70:37just now this entire data science
-
70:37 - 70:39project is actually developed in
-
70:39 - 70:41something called a Jupyter notebook. So
-
70:41 - 70:44all this python code along with all the
-
70:44 - 70:46observations from the data
-
70:46 - 70:49scientists okay for this entire data
-
70:49 - 70:50science project was actually run in
-
70:50 - 70:53something called a Jupyter notebook. So
-
70:53 - 70:56that is uh the
-
70:56 - 70:59most widely used tool for interactively
-
70:59 - 71:02developing and presenting data science
-
71:02 - 71:05projects okay so that brings me to the
-
71:05 - 71:07end of this entire presentation I hope
-
71:07 - 71:10that you find it useful for you and that
-
71:10 - 71:13you can appreciate the importance of
-
71:13 - 71:15machine learning and how it can be
-
71:15 - 71:20applied in a real life use case in a
-
71:20 - 71:23typical production environment all right
-
71:23 - 71:27thank you all so much for watching