
Machine Learning for Predictive Maintenance: End-to-End Workflow in a Jupyter Notebook

  • 0:01 - 0:04
    Hello everyone, my name is Victor. I'm
  • 0:04 - 0:05
    your friendly neighborhood data
  • 0:05 - 0:08
    scientist from DreamCatcher. So in this
  • 0:08 - 0:10
    presentation, I would like to talk about
  • 0:10 - 0:13
    a specific industry use case of AI or
  • 0:13 - 0:15
    machine learning which is predictive
  • 0:15 - 0:19
    maintenance. So I will be covering these
  • 0:19 - 0:21
    topics and feel free to jump forward to
  • 0:21 - 0:23
    the specific part in the video where I
  • 0:23 - 0:25
    talk about all these topics. So I'm going
  • 0:25 - 0:27
    to start off with a general preview of
  • 0:27 - 0:29
    AI and machine learning. Then, I'll
  • 0:29 - 0:31
    discuss the use case which is predictive
  • 0:31 - 0:33
    maintenance. I'll talk about the basics
  • 0:33 - 0:35
    of machine learning, the workflow of
  • 0:35 - 0:37
    machine learning, and then we will come
  • 0:37 - 0:41
    to the meat of this presentation which
  • 0:41 - 0:44
    is essentially a demonstration of the
  • 0:44 - 0:45
    machine learning workflow from end to
  • 0:45 - 0:48
    end on a real life predictive
  • 0:48 - 0:52
    maintenance domain problem. All right, so
  • 0:52 - 0:54
    without any further ado, let's jump into
  • 0:54 - 0:57
    it. So let's start off with a quick
  • 0:57 - 1:00
    preview of AI and machine learning. Well
  • 1:00 - 1:04
    AI is a very general term, it encompasses
  • 1:04 - 1:07
    the entire area of science and
  • 1:07 - 1:09
    engineering that is related to creating
  • 1:09 - 1:11
    software programs and machines that
  • 1:11 - 1:14
    will be capable of performing tasks
  • 1:14 - 1:16
    that would normally require human
  • 1:16 - 1:20
    intelligence. But AI is a catchall term,
  • 1:20 - 1:23
    so really when we talk about applied AI,
  • 1:23 - 1:26
    how we use AI in our daily work, we are
  • 1:26 - 1:28
    really going to be talking about machine
  • 1:28 - 1:30
    learning. So machine learning is the
  • 1:30 - 1:32
    design and application of software
  • 1:32 - 1:34
    algorithms that are capable of learning
  • 1:34 - 1:38
    on their own without any explicit human
  • 1:38 - 1:40
    intervention. And the primary purpose of
  • 1:40 - 1:43
    these algorithms is to optimize
  • 1:43 - 1:47
    performance in a specific task. And the
  • 1:47 - 1:50
    primary performance or the primary task
  • 1:50 - 1:52
    that you want to optimize performance in
  • 1:52 - 1:54
    is to be able to make accurate
  • 1:54 - 1:57
    predictions about future outcomes based
  • 1:57 - 2:01
    on the analysis of historical data
  • 2:01 - 2:03
    from the past. So essentially machine
  • 2:03 - 2:05
    learning is about making predictions
  • 2:05 - 2:07
    about the future or what we call
  • 2:07 - 2:09
    predictive analytics.
  • 2:09 - 2:11
    And there are many different
  • 2:11 - 2:13
    kinds of algorithms that are available in
  • 2:13 - 2:15
    machine learning under the three primary
  • 2:15 - 2:16
    categories of supervised learning,
  • 2:16 - 2:19
    unsupervised learning, and reinforcement
  • 2:19 - 2:21
    learning. And here we can see some of the
  • 2:21 - 2:24
    different kinds of algorithms and their
  • 2:24 - 2:27
    use cases in various areas in
  • 2:27 - 2:30
    industry. So we have various domain use
  • 2:30 - 2:30
    cases
  • 2:30 - 2:32
    for all these different kind of
  • 2:32 - 2:34
    algorithms, and we can see that different
  • 2:34 - 2:38
    algorithms are fitted for different use cases.
  • 2:38 - 2:41
    Deep learning is an advanced form
  • 2:41 - 2:42
    of machine learning that's based on
  • 2:42 - 2:44
    something called an artificial neural
  • 2:44 - 2:46
    network or ANN for short, and this
  • 2:46 - 2:48
    essentially simulates the structure of
  • 2:48 - 2:50
    the human brain whereby neurons
  • 2:50 - 2:51
    interconnect and work together to
  • 2:51 - 2:55
    process and learn new information. So DL
  • 2:55 - 2:57
    is the foundational technology for most
  • 2:57 - 2:59
    of the popular AI tools that you
  • 2:59 - 3:01
    probably have heard of today. So I'm sure
  • 3:01 - 3:03
    you have heard of ChatGPT if you haven't
  • 3:03 - 3:05
    been living in a cave for the past 2
  • 3:05 - 3:08
    years. And yeah, so ChatGPT is an example
  • 3:08 - 3:10
    of what we call a large language model
  • 3:10 - 3:12
    and that's based on this technology
  • 3:12 - 3:15
    called deep learning. Also, all the modern
  • 3:15 - 3:17
    computer vision applications where a
  • 3:17 - 3:20
    computer program can classify images or
  • 3:20 - 3:23
    detect images or recognize images on
  • 3:23 - 3:25
    its own, okay, we call these computer
  • 3:25 - 3:28
    vision applications. They also use
  • 3:28 - 3:30
    this particular form of machine learning
  • 3:30 - 3:32
    called deep learning, right? So this is an
  • 3:32 - 3:34
    example of an artificial neural network.
  • 3:34 - 3:35
    For example, here I have an image of a
  • 3:35 - 3:37
    bird that's fed into this artificial
  • 3:37 - 3:40
    neural network, and output from this
  • 3:40 - 3:41
    artificial neural network is a
  • 3:41 - 3:44
    classification of this image into one of
  • 3:44 - 3:46
    these three potential categories. So in
  • 3:46 - 3:49
    this case, if the ANN has been trained
  • 3:49 - 3:52
    properly, when we feed in this image, this
  • 3:52 - 3:54
    ANN should correctly classify this image
  • 3:54 - 3:57
    as a bird, right? So this is an image
  • 3:57 - 3:59
    classification problem which is a
  • 3:59 - 4:01
    classic use case for an artificial
  • 4:01 - 4:04
    neural network in the field of computer
  • 4:04 - 4:08
    vision. And just like in the case of
  • 4:08 - 4:09
    machine learning, there are a variety of
  • 4:09 - 4:12
    algorithms that are available for
  • 4:12 - 4:14
    deep learning under the category of
  • 4:14 - 4:15
    supervised learning and also
  • 4:15 - 4:17
    unsupervised learning.
  • 4:17 - 4:19
    All right, so this is how we can
  • 4:19 - 4:21
    kind of categorize this. You can think of
  • 4:21 - 4:24
    AI as the general area of smart systems
  • 4:24 - 4:27
    and machines. Machine learning is
  • 4:27 - 4:29
    basically applied AI, and deep learning
  • 4:29 - 4:30
    is a
  • 4:30 - 4:33
    subspecialization of machine learning
  • 4:33 - 4:35
    using a particular architecture called
  • 4:35 - 4:39
    an artificial neural network.
  • 4:39 - 4:42
    And generative AI, so if you talk
  • 4:42 - 4:45
    about ChatGPT, okay, Google Gemini,
  • 4:45 - 4:48
    Microsoft Copilot, okay, all these
  • 4:48 - 4:50
    examples of generative AI, they are
  • 4:50 - 4:52
    basically large language models, and they
  • 4:52 - 4:54
    are a further subcategory within the
  • 4:54 - 4:55
    area of deep
  • 4:55 - 4:58
    learning. And there are many applications
  • 4:58 - 4:59
    of machine learning in industry right
  • 4:59 - 5:02
    now, so pick whichever particular industry
  • 5:02 - 5:04
    you are involved in, and these are all the
  • 5:04 - 5:05
    specific areas of
  • 5:05 - 5:10
    applications, right? So probably, I'm
  • 5:10 - 5:12
    going to guess the vast majority of you
  • 5:12 - 5:13
    who are watching this video, you're
  • 5:13 - 5:14
    probably coming from the manufacturing
  • 5:14 - 5:17
    industry, and so in the manufacturing
  • 5:17 - 5:18
    industry some of the standard use cases
  • 5:18 - 5:20
    for machine learning and deep learning
  • 5:20 - 5:23
    are predicting potential problems, okay?
  • 5:23 - 5:25
    So sometimes you call this predictive
  • 5:25 - 5:27
    maintenance where you want to predict
  • 5:27 - 5:29
    when a problem is going to happen and
  • 5:29 - 5:30
    then kind of address it before it
  • 5:30 - 5:33
    happens. And then monitoring systems,
  • 5:33 - 5:35
    automating your manufacturing assembly
  • 5:35 - 5:38
    line or production line, okay, smart
  • 5:38 - 5:40
    scheduling, and detecting anomaly on your
  • 5:40 - 5:41
    production line.
  • 5:42 - 5:44
    Okay, so let's talk about the use
  • 5:44 - 5:46
    case here which is predictive
  • 5:46 - 5:49
    maintenance, right? So what is predictive
  • 5:49 - 5:52
    maintenance? Well predictive maintenance,
  • 5:52 - 5:53
    here's the long definition, is an
  • 5:53 - 5:55
    equipment maintenance strategy that
  • 5:55 - 5:56
    relies on real-time monitoring of
  • 5:56 - 5:58
    equipment conditions and data to predict
  • 5:58 - 6:00
    equipment failures in advance.
  • 6:00 - 6:03
    And this uses advanced data models,
  • 6:03 - 6:05
    analytics, and machine learning whereby
  • 6:05 - 6:07
    we can reliably assess when failures are
  • 6:07 - 6:09
    more likely to occur, including which
  • 6:09 - 6:11
    components are more likely to be
  • 6:11 - 6:14
    affected on your production or assembly
  • 6:14 - 6:17
    line. So where does predictive
  • 6:17 - 6:19
    maintenance fit into the overall scheme
  • 6:19 - 6:21
    of things, right? So let's talk about the
  • 6:21 - 6:23
    kind of standard way that, you know,
  • 6:23 - 6:26
    factories or production
  • 6:26 - 6:28
    lines, assembly lines in factories tend
  • 6:28 - 6:31
    to handle maintenance issues say
  • 6:31 - 6:33
    10 or 20 years ago, right? So what you
  • 6:33 - 6:35
    have is the, what you would probably
  • 6:35 - 6:36
    start off is the most basic mode
  • 6:36 - 6:38
    which is reactive maintenance. So you
  • 6:38 - 6:41
    just wait until your machine breaks down
  • 6:41 - 6:43
    and then you repair, right? The simplest,
  • 6:43 - 6:45
    but, of course, I'm sure if you have worked on a
  • 6:45 - 6:47
    production line for any period of time,
  • 6:47 - 6:49
    you know that this reactive maintenance
  • 6:49 - 6:51
    can give you a whole bunch of headaches
  • 6:51 - 6:52
    especially if the machine breaks down
  • 6:52 - 6:54
    just before a critical delivery deadline,
  • 6:54 - 6:56
    right? Then you're going to have a
  • 6:56 - 6:57
    backlog of orders and you're going to
  • 6:57 - 6:59
    run into a lot of problems. Okay, so we move on
  • 6:59 - 7:01
    to preventive maintenance which is
  • 7:01 - 7:04
    you regularly schedule maintenance of
  • 7:04 - 7:07
    your production machines to reduce
  • 7:07 - 7:09
    the failure rate. So you might do
  • 7:09 - 7:11
    maintenance once every month, once every
  • 7:11 - 7:13
    two weeks, whatever. Okay, this is great,
  • 7:13 - 7:15
    but the problem, of course, then is well
  • 7:15 - 7:16
    sometimes you're doing too much
  • 7:16 - 7:18
    maintenance, it's not really necessary,
  • 7:18 - 7:21
    and it still doesn't totally prevent
  • 7:21 - 7:23
    this, you know, a failure of the
  • 7:23 - 7:26
    machine that occurs outside of your planned
  • 7:26 - 7:29
    maintenance, right? So a bit of an
  • 7:29 - 7:31
    improvement, but not that much better.
  • 7:31 - 7:33
    And then, these last two categories is
  • 7:33 - 7:35
    where we bring in AI and machine
  • 7:35 - 7:37
    learning. So with machine learning, we're
  • 7:37 - 7:39
    going to use sensors to do real-time
  • 7:39 - 7:42
    monitoring of the data, and then using
  • 7:42 - 7:43
    that data we're going to build a machine
  • 7:43 - 7:46
    learning model which helps us to predict,
  • 7:46 - 7:50
    with a reasonable level of accuracy, when
  • 7:50 - 7:53
    the next failure is going to happen on
  • 7:53 - 7:54
    your assembly or production line on a
  • 7:54 - 7:57
    specific component or specific machine,
  • 7:57 - 8:00
    right? So you want to be able to predict to
  • 8:00 - 8:02
    a high level of accuracy like maybe
  • 8:02 - 8:04
    to the specific day, even the specific
  • 8:04 - 8:06
    hour, or even minute itself when you
  • 8:06 - 8:08
    expect that particular product to fail
  • 8:08 - 8:11
    or the particular machine to fail. All
  • 8:11 - 8:13
    right, so these are the advantages of
  • 8:13 - 8:15
    predictive maintenance. It minimizes
  • 8:15 - 8:17
    the occurrence of unscheduled downtime, it
  • 8:17 - 8:18
    gives you a real-time overview of your
  • 8:18 - 8:20
    current condition of assets, ensures
  • 8:20 - 8:23
    minimal disruptions to productivity,
  • 8:23 - 8:25
    optimizes time you spend on maintenance work,
  • 8:25 - 8:27
    optimizes the use of spare parts, and so
  • 8:27 - 8:28
    on. And of course there are some
  • 8:28 - 8:31
    disadvantages, the
  • 8:31 - 8:33
    primary one being that you need a specialized set
  • 8:33 - 8:36
    of skills among your engineers to
  • 8:36 - 8:38
    understand and create machine learning
  • 8:38 - 8:41
    models that can work on the real-time
  • 8:41 - 8:44
    data that you're getting. Okay, so we're
  • 8:44 - 8:45
    going to take a look at some real life
  • 8:45 - 8:47
    use cases. So these are a bunch of links
  • 8:47 - 8:49
    here, so if you navigate to these links
  • 8:49 - 8:50
    here, you'll be able to get a look at
  • 8:50 - 8:54
    some real life use cases of machine
  • 8:54 - 8:58
    learning in predictive maintenance. So
  • 8:58 - 9:01
    the IBM website, okay, gives you a look at
  • 9:01 - 9:05
    five use cases, so you can
  • 9:05 - 9:07
    click on these links and follow up with
  • 9:07 - 9:08
    them if you want to read more. Okay, this
  • 9:08 - 9:11
    is waste management, manufacturing, okay,
  • 9:11 - 9:15
    building services, and renewable energy,
  • 9:15 - 9:17
    and also mining, right? So these are all
  • 9:17 - 9:18
    use cases, if you want to know more about
  • 9:18 - 9:20
    them, you can read up and follow them
  • 9:20 - 9:24
    from this website. And this website
  • 9:24 - 9:26
    gives, this is a pretty good website. I
  • 9:26 - 9:28
    would really encourage you to just look
  • 9:28 - 9:29
    through this if you're interested in
  • 9:29 - 9:31
    predictive maintenance. So here, it tells
  • 9:31 - 9:34
    you about, you know, an industry survey of
  • 9:34 - 9:36
    predictive maintenance. We can see that a
  • 9:36 - 9:38
    large portion of the industry,
  • 9:38 - 9:40
    manufacturing industry agreed that
  • 9:40 - 9:41
    predictive maintenance is a real need to
  • 9:41 - 9:44
    stay competitive and predictive
  • 9:44 - 9:45
    maintenance is essential for
  • 9:45 - 9:47
    manufacturing industry and will gain
  • 9:47 - 9:48
    additional strength in the future. So
  • 9:48 - 9:50
    this is a survey that was done quite
  • 9:50 - 9:52
    some time ago and this was the results
  • 9:52 - 9:54
    that we got back. So we can see the vast
  • 9:54 - 9:56
    majority of key industry players in the
  • 9:56 - 9:58
    manufacturing sector, they consider
  • 9:58 - 9:59
    predictive maintenance to be a very
  • 9:59 - 10:00
    important
  • 10:00 - 10:02
    activity that they want to
  • 10:02 - 10:05
    incorporate into their workflow, right?
  • 10:05 - 10:08
    And we can see here the kind of ROI that
  • 10:08 - 10:11
    we expect on investment in predictive
  • 10:11 - 10:13
    maintenance, so 45% reduction in downtime,
  • 10:13 - 10:17
    25% growth in productivity, 75% fault
  • 10:17 - 10:19
    elimination, 30% reduction in maintenance
  • 10:19 - 10:23
    cost, okay? And best of all, if you really
  • 10:23 - 10:25
    want to kind of take a look at examples,
  • 10:25 - 10:27
    all right, so there are all these
  • 10:27 - 10:28
    different companies that have
  • 10:28 - 10:30
    significantly invested in predictive
  • 10:30 - 10:32
    maintenance technology in their
  • 10:32 - 10:34
    manufacturing processes. So PepsiCo, we
  • 10:34 - 10:39
    have got Frito-Lay, General Motors, Mondi, Ecoplant,
  • 10:39 - 10:41
    all right? So you can jump over here
  • 10:41 - 10:43
    and take a look at some of these
  • 10:43 - 10:46
    use cases. Let me perhaps, let me try and
  • 10:46 - 10:48
    open this up, for example, Mondi, right? You
  • 10:48 - 10:52
    can see Mondi has used
  • 10:52 - 10:54
    this particular piece of software
  • 10:54 - 10:56
    called MATLAB, all right, or MathWorks
  • 10:56 - 11:00
    sorry, to do predictive maintenance
  • 11:00 - 11:02
    for their manufacturing processes using
  • 11:02 - 11:05
    machine learning. And we can talk, you can
  • 11:05 - 11:08
    study how they have used it, all right,
  • 11:08 - 11:09
    and how it works, what was their
  • 11:09 - 11:11
    challenge, all right, the problems they
  • 11:11 - 11:13
    were facing, the solution that they use
  • 11:13 - 11:15
    using this MathWorks Consulting piece of
  • 11:15 - 11:17
    software, and data that they collected in
  • 11:17 - 11:20
    a MATLAB database, all right, sorry
  • 11:20 - 11:24
    in an Oracle database.
  • 11:24 - 11:26
    So using MathWorks from MATLAB, all
  • 11:26 - 11:28
    right, they were able to create a deep
  • 11:28 - 11:31
    learning model to, you know, to
  • 11:31 - 11:33
    solve this particular issue for their
  • 11:33 - 11:36
    domain. So if you're interested, please, I
  • 11:36 - 11:38
    strongly encourage you to read up on all
  • 11:38 - 11:40
    these real life customer stories that
  • 11:40 - 11:43
    showcase use cases for predictive
  • 11:43 - 11:48
    maintenance. Okay, so that's it for
  • 11:48 - 11:52
    real life use cases for predictive maintenance.
  • 11:54 - 11:57
    Now in this topic, I'm
  • 11:57 - 11:58
    going to talk about machine learning
  • 11:58 - 12:00
    basics, so what is actually involved
  • 12:00 - 12:01
    in machine learning, and I'm going to
  • 12:01 - 12:04
    give a very quick, fast, conceptual, high
  • 12:04 - 12:06
    level overview of machine learning, all
  • 12:06 - 12:09
    right? So there are several categories of
  • 12:09 - 12:11
    machine learning, supervised, unsupervised,
  • 12:11 - 12:13
    semi-supervised, reinforcement, and deep
  • 12:13 - 12:16
    learning, okay? And let's talk about the
  • 12:16 - 12:19
    most common and widely used category of
  • 12:19 - 12:21
    machine learning which is called
  • 12:21 - 12:25
    supervised learning. So the particular use
  • 12:25 - 12:26
    case here that I'm going to be
  • 12:26 - 12:29
    discussing, predictive maintenance, it's
  • 12:29 - 12:31
    basically a form of supervised learning.
  • 12:31 - 12:33
    So how does supervised learning work?
  • 12:33 - 12:35
    Well in supervised learning, you're going
  • 12:35 - 12:37
    to create a machine learning model by
  • 12:37 - 12:39
    providing what is called a labelled data
  • 12:39 - 12:42
    set as an input to a machine learning
  • 12:42 - 12:45
    program or algorithm. And this dataset
  • 12:45 - 12:46
    is going to contain what are called
  • 12:46 - 12:49
    independent or feature variables, all
  • 12:49 - 12:51
    right, so this will be a set of variables.
  • 12:51 - 12:53
    And there will be one dependent or
  • 12:53 - 12:55
    target variable which we also call the
  • 12:55 - 12:58
    label, and the idea is that the
  • 12:58 - 13:00
    independent or the feature variables are
  • 13:00 - 13:02
    the attributes or properties of your
  • 13:02 - 13:04
    data set that influence the dependent or
  • 13:04 - 13:08
    the target variable, okay? So this process
  • 13:08 - 13:09
    that I've just described is called
  • 13:09 - 13:12
    training the machine learning model, and
  • 13:12 - 13:14
    the model is fundamentally a
  • 13:14 - 13:16
    mathematical function that best
  • 13:16 - 13:18
    approximates the relationship between
  • 13:18 - 13:21
    the independent variables and the
  • 13:21 - 13:23
    dependent variable. All right, so that's
  • 13:23 - 13:24
    quite a bit of a mouthful, so let's jump
  • 13:24 - 13:26
    into a diagram that maybe illustrates
  • 13:26 - 13:28
    this more clearly. So let's say you have
  • 13:28 - 13:30
    a dataset here, an Excel spreadsheet,
  • 13:30 - 13:32
    right? And this Excel spreadsheet has a
  • 13:32 - 13:34
    bunch of columns here and a bunch of
  • 13:34 - 13:37
    rows, okay? So these rows here represent
  • 13:37 - 13:39
    observations, or these rows are what
  • 13:39 - 13:41
    we call observations or samples or data
  • 13:41 - 13:43
    points in our data set, okay? So let's
  • 13:43 - 13:47
    assume this data set is gathered by a
  • 13:47 - 13:50
    marketing manager at a mall, at a retail
  • 13:50 - 13:52
    mall, all right? So they've got all this
  • 13:52 - 13:55
    information about the customers who
  • 13:55 - 13:57
    purchase products at this mall, all right?
  • 13:57 - 13:59
    So some of the information they've
  • 13:59 - 14:00
    gotten about the customers are their
  • 14:00 - 14:02
    gender, their age, their income, and the
  • 14:02 - 14:04
    number of children. So all this
  • 14:04 - 14:06
    information about the customers, we call
  • 14:06 - 14:07
    this the independent or the feature
  • 14:07 - 14:10
    variables, all right? And based on all
  • 14:10 - 14:13
    this information about the customer, we
  • 14:13 - 14:16
    also managed to get some or we record
  • 14:16 - 14:18
    the information about how much the
  • 14:18 - 14:20
    customer spends, all right? So this
  • 14:20 - 14:22
    information or these numbers here, we call
  • 14:22 - 14:24
    this the target variable or the
  • 14:24 - 14:27
    dependent variable, right? So one
  • 14:27 - 14:30
    single row, one single sample, one
  • 14:30 - 14:33
    single data point, contains all the data
  • 14:33 - 14:35
    for the feature variables and one single
  • 14:35 - 14:38
    value for the label or the target
  • 14:38 - 14:41
    variable, okay? And the primary purpose of
  • 14:41 - 14:43
    the machine learning model is to create
  • 14:43 - 14:46
    a mapping from all your feature
  • 14:46 - 14:48
    variables to your target variable, so
  • 14:48 - 14:51
    somehow there's going to be a function,
  • 14:51 - 14:52
    okay, this will be a mathematical
  • 14:52 - 14:55
    function that maps all the values of
  • 14:55 - 14:57
    your feature variable to the value of
  • 14:57 - 15:00
    your target variable. In other words, this
  • 15:00 - 15:01
    function represents the relationship
  • 15:01 - 15:03
    between your feature variables and your
  • 15:03 - 15:07
    target variable, okay? So this whole thing,
  • 15:07 - 15:09
    this training process, we call this the
  • 15:09 - 15:11
    fitting the model. And the target
  • 15:11 - 15:13
    variable or the label, this thing here,
  • 15:13 - 15:15
    this column here, or the values here,
  • 15:15 - 15:17
    these are critical for providing a
  • 15:17 - 15:19
    context to do the fitting or the
  • 15:19 - 15:21
    training of the model. And once you've
  • 15:21 - 15:23
    got a trained and fitted model, you can
  • 15:23 - 15:26
    then use the model to make an accurate
  • 15:26 - 15:28
    prediction of target values
  • 15:28 - 15:30
    corresponding to new feature values that
  • 15:30 - 15:33
    the model has yet to encounter or yet to
  • 15:33 - 15:35
    see, and this, as I've already said
  • 15:35 - 15:36
    earlier, this is called predictive
  • 15:36 - 15:38
    analytics, okay? So let's see what's
  • 15:38 - 15:40
    actually happening here, you take your
  • 15:40 - 15:43
    training data, all right, so this is this
  • 15:43 - 15:45
    whole bunch of data, this data set here
  • 15:45 - 15:47
    consisting of a thousand rows of
  • 15:47 - 15:50
    data, 10,000 rows of data, you take this
  • 15:50 - 15:52
    entire data set, all right, this entire
  • 15:52 - 15:54
    data set, you jam it into your machine
  • 15:54 - 15:57
    learning algorithm, and a couple of hours
  • 15:57 - 15:58
    later your machine learning algorithm
  • 15:58 - 16:01
    comes up with a model. And the model is
  • 16:01 - 16:04
    essentially a function that maps all
  • 16:04 - 16:06
    your feature variables which is these
  • 16:06 - 16:08
    four columns here, to your target
  • 16:08 - 16:10
    variable which is this one single column
  • 16:10 - 16:14
    here, okay? So once you have the model, you
  • 16:14 - 16:17
    can put in a new data point. So basically
  • 16:17 - 16:19
    the new data point represents data about a
  • 16:19 - 16:21
    new customer, a new customer that you
  • 16:21 - 16:23
    have never seen before. So let's say
  • 16:23 - 16:25
    you've already got information about
  • 16:25 - 16:28
    10,000 customers that have visited this
  • 16:28 - 16:30
    mall and how much each of these 10,000
  • 16:30 - 16:32
    customers have spent when they are at this
  • 16:32 - 16:34
    mall. So now you have a totally new
  • 16:34 - 16:36
    customer that comes in the mall, this
  • 16:36 - 16:38
    customer has never come into this mall
  • 16:38 - 16:40
    before, and what we know about this
  • 16:40 - 16:43
    customer is that he is a male, the age is
  • 16:43 - 16:45
    50, the income is 18, and they have nine
  • 16:45 - 16:48
    children. So now when you take this data
  • 16:48 - 16:51
    and you pump that into your model, your
  • 16:51 - 16:53
    model is going to make a prediction, it's
  • 16:53 - 16:56
    going to say, hey, you know what? Based on
  • 16:56 - 16:57
    everything that I have been trained on before
  • 16:57 - 16:59
    and based on the model I've developed,
  • 16:59 - 17:02
    I am going to predict that a customer
  • 17:02 - 17:05
    that is of a male gender, of the age 50
  • 17:05 - 17:08
    with the income of 18, and nine children,
  • 17:08 - 17:12
    that customer is going to spend 25 ringgit
  • 17:12 - 17:16
    at the mall. And this is it, this is what
  • 17:16 - 17:19
    you want. Right there, right here,
  • 17:19 - 17:21
    can you see here? That is the final
  • 17:21 - 17:23
    output of your machine learning model.
  • 17:23 - 17:27
    It's going to make a prediction about
  • 17:27 - 17:30
    something that it has not ever seen
  • 17:30 - 17:33
    before, okay? That is the core, this is
  • 17:33 - 17:36
    essentially the core of machine learning.
  • 17:36 - 17:39
    Predictive analytics, making prediction
  • 17:39 - 17:40
    about the future
  • 17:41 - 17:44
    based on a historical data set.
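To make that fit-and-predict pattern concrete, here is a minimal sketch in Python with pandas and scikit-learn. The column names and numbers are purely illustrative stand-ins for the mall data set described above, not values from the slides:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical labelled data set: feature columns plus the "Spend" label.
df = pd.DataFrame({
    "Gender":   [0, 1, 0, 1],      # 0 = male, 1 = female (already encoded as numbers)
    "Age":      [25, 40, 31, 50],
    "Income":   [30, 55, 42, 18],
    "Children": [0, 2, 1, 9],
    "Spend":    [120, 80, 95, 25],
})

X = df[["Gender", "Age", "Income", "Children"]]  # independent / feature variables
y = df["Spend"]                                  # dependent / target variable (label)

model = RandomForestRegressor(random_state=42)
model.fit(X, y)                                  # "fitting" / training the model

# A brand-new customer the model has never seen: male, age 50, income 18, nine children.
new_customer = pd.DataFrame([[0, 50, 18, 9]], columns=X.columns)
print(model.predict(new_customer))               # predicted spend for that customer
```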
  • 17:44 - 17:47
    Okay, so there are two areas of
  • 17:47 - 17:49
    supervised learning, regression and
  • 17:49 - 17:51
    classification. So regression is used to
  • 17:51 - 17:53
    predict a numerical target variable, such
  • 17:53 - 17:55
    as the price of a house or the salary of
  • 17:55 - 17:58
    an employee, whereas classification is
  • 17:58 - 18:00
    used to predict a categorical target
  • 18:00 - 18:04
    variable or class label, okay? So for
  • 18:04 - 18:06
    classification you can have either
  • 18:06 - 18:09
    binary or multiclass, so, for example,
  • 18:09 - 18:12
    binary will be just true or false, zero
  • 18:12 - 18:15
    or one. So whether your machine is going
  • 18:15 - 18:17
    to fail or is it not going to fail, right?
  • 18:17 - 18:19
    So just two classes, two possible
  • 18:19 - 18:22
    outcomes, or is the customer going to
  • 18:22 - 18:24
    make a purchase or is the customer not
  • 18:24 - 18:26
    going to make a purchase. We call this
  • 18:26 - 18:28
    binary classification. And then for
  • 18:28 - 18:30
    multiclass, when there are more than two
  • 18:30 - 18:33
    classes or types of values. So, for
  • 18:33 - 18:34
    example, here this would be a
  • 18:34 - 18:36
    classification problem. So if you have a
  • 18:36 - 18:38
    data set here, you've got information
  • 18:38 - 18:39
    about your customers, you've got your
  • 18:39 - 18:41
    gender of the customer, the age of the
  • 18:41 - 18:43
    customer, the salary of the customer, and
  • 18:43 - 18:45
    you also have record about whether the
  • 18:45 - 18:48
    customer made a purchase or not, okay? So
  • 18:48 - 18:50
    you can take this data set to train a
  • 18:50 - 18:52
    classification model, and then the
  • 18:52 - 18:54
    classification model can then make a
  • 18:54 - 18:56
    prediction about a new customer, and
  • 18:56 - 18:59
    they're going to predict zero which
  • 18:59 - 19:00
    means the customer didn't make a
  • 19:00 - 19:03
    purchase or one which means the customer
  • 19:03 - 19:06
    made a purchase, right? And regression,
  • 19:06 - 19:09
    this is regression, so let's say you want
  • 19:09 - 19:11
    to predict the wind speed, and you've got
  • 19:11 - 19:14
    historical data about all these four
  • 19:14 - 19:17
    other independent variables or feature
  • 19:17 - 19:18
    variables, so you have recorded
  • 19:18 - 19:20
    temperature, the pressure, the relative
  • 19:20 - 19:22
    humidity, and the wind direction for the
  • 19:22 - 19:25
    past 10 days, 15 days, or whatever, okay? So
  • 19:25 - 19:27
    now you are going to train your machine
  • 19:27 - 19:29
    learning model using this data set, and
  • 19:29 - 19:32
    the target variable column, okay, this
  • 19:32 - 19:34
    column here, the label is basically a
  • 19:34 - 19:37
    number, right? So now with this number,
  • 19:37 - 19:40
    this is a regression model, and so now
  • 19:40 - 19:42
    you can put in a new data point, so a new
  • 19:42 - 19:45
    data point means a new set of values for
  • 19:45 - 19:47
    temperature, pressure, relative humidity,
  • 19:47 - 19:49
    and wind direction, and your machine
  • 19:49 - 19:51
    learning model will then predict the
  • 19:51 - 19:54
    wind speed for that new data point, okay?
  • 19:54 - 19:57
    So that's a regression model.
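The purchase example just described can be sketched the same way. Here is a minimal, hypothetical classification snippet with scikit-learn; the column names and values are made up for illustration:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical purchase data: Gender/Age/Salary features, "Purchased" label (0 or 1).
df = pd.DataFrame({
    "Gender":    [0, 1, 1, 0, 1],
    "Age":       [22, 35, 47, 52, 29],
    "Salary":    [25, 60, 80, 40, 52],
    "Purchased": [0, 1, 1, 0, 1],
})

X = df[["Gender", "Age", "Salary"]]
y = df["Purchased"]

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X, y)

# Predict for a new customer: the output is a class label, 0 (no purchase) or 1 (purchase).
new_customer = pd.DataFrame([[1, 41, 70]], columns=X.columns)
print(clf.predict(new_customer))
```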
  • 19:59 - 20:02
    All right. So in this particular topic
  • 20:02 - 20:05
    I'm going to talk about the workflow
  • 20:05 - 20:08
    that's involved in machine learning. So
  • 20:08 - 20:13
    in the previous slides, I talked about
  • 20:13 - 20:15
    developing the model, all right? But
  • 20:15 - 20:16
    that's just one part of the entire
  • 20:16 - 20:19
    workflow. So in real life when you use
  • 20:19 - 20:20
    machine learning, there's an end-to-end
  • 20:20 - 20:22
    workflow that's involved. So the first
  • 20:22 - 20:24
    thing, of course, is you need to get your
  • 20:24 - 20:27
    data, and then you need to clean your
  • 20:27 - 20:29
    data, and then you need to explore your
  • 20:29 - 20:31
    data. You need to see what's going on in
  • 20:31 - 20:33
    your data set, right? And your data set,
  • 20:33 - 20:36
    real life data sets are not trivial, they
  • 20:36 - 20:39
    are hundreds of rows, thousands of rows,
  • 20:39 - 20:41
    sometimes millions of rows, billions of
  • 20:41 - 20:43
    rows, we're talking about billions or
  • 20:43 - 20:45
    millions of data points especially if
  • 20:45 - 20:47
    you're using an IoT sensor to get data
  • 20:47 - 20:49
    in real time. So you've got all these
  • 20:49 - 20:51
    super large data sets, you need to clean
  • 20:51 - 20:53
    them, and explore them, and then you need
  • 20:53 - 20:56
    to prepare them into a right format so
  • 20:56 - 21:00
    that you can put them into the training
  • 21:00 - 21:02
    process to create your machine learning
  • 21:02 - 21:05
    model, and then subsequently you check
  • 21:05 - 21:08
    how good is the model, right? How accurate
  • 21:08 - 21:10
    is the model in terms of its ability to
  • 21:10 - 21:13
    generate predictions for the
  • 21:13 - 21:15
    future, right? How accurate are the
  • 21:15 - 21:17
    predictions that are coming up from your
  • 21:17 - 21:18
    machine learning model. So that's
  • 21:18 - 21:21
    validating or evaluating your model, and
  • 21:21 - 21:23
    then subsequently if you determine that
  • 21:23 - 21:25
    your model is of adequate accuracy to
  • 21:25 - 21:27
    meet whatever your domain use case
  • 21:27 - 21:29
    requirements are, right? So let's say the
  • 21:29 - 21:31
    accuracy that's required for your domain
  • 21:31 - 21:32
    use case is
  • 21:32 - 21:35
    85%, okay? If my machine learning model
  • 21:35 - 21:39
    can give an 85% accuracy rate, I think
  • 21:39 - 21:40
    it's good enough, then I'm going to
  • 21:40 - 21:43
    deploy it into real world use case. So
  • 21:43 - 21:45
    here the machine learning model gets
  • 21:45 - 21:48
    deployed on the server, and then other,
  • 21:48 - 21:51
    you know, other data sources are going to
  • 21:51 - 21:53
    be captured from somewhere. That data is
  • 21:53 - 21:54
    pumped into the machine learning model. The
  • 21:54 - 21:55
    machine learning model generates
  • 21:55 - 21:58
    predictions, and those predictions are
  • 21:58 - 22:00
    then used to make decisions on the
  • 22:00 - 22:02
    factory floor in real time or in any
  • 22:02 - 22:05
    other particular scenario. And then you
  • 22:05 - 22:07
    constantly monitor and update the model,
  • 22:07 - 22:09
    you get more new data, and then the
  • 22:09 - 22:12
    entire cycle repeats itself. So that's
  • 22:12 - 22:14
    your machine learning workflow, okay, in a nutshell.
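As a rough sketch of that deployment step, a trained scikit-learn model is typically serialized, loaded on the server, and then used to score newly captured data. This is only an illustration; the file name, feature names, and values are assumptions:

```python
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Train a toy model (a stand-in for the real training step), then persist it to disk.
X = pd.DataFrame({"air_temp": [300.1, 302.4], "torque": [42.5, 61.0]})
y = [0, 1]
model = RandomForestClassifier(random_state=42).fit(X, y)
joblib.dump(model, "maintenance_model.joblib")

# On the deployment server: load the saved model and score freshly captured sensor data.
deployed = joblib.load("maintenance_model.joblib")
incoming = pd.DataFrame([[301.2, 55.3]], columns=["air_temp", "torque"])
print(deployed.predict(incoming))   # prediction feeds the real-time decision on the floor
```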
  • 22:14 - 22:17
    Here's another example of
  • 22:17 - 22:19
    the same thing maybe in a slightly
  • 22:19 - 22:20
    different format, so, again, you have your
  • 22:20 - 22:22
    data collection and preparation. Here we
  • 22:22 - 22:24
    talk more about the different kinds of
  • 22:24 - 22:27
    algorithms that are available to create a
  • 22:27 - 22:28
    model, and I'll talk about this more in
  • 22:28 - 22:30
    detail when we look at the real world
  • 22:30 - 22:32
    example of a end-to-end machine learning
  • 22:32 - 22:35
    workflow for the predictive maintenance
  • 22:35 - 22:37
    use case. So once you have chosen the
  • 22:37 - 22:39
    appropriate algorithm, you then have
  • 22:39 - 22:41
    trained your model, you then have
  • 22:41 - 22:44
    selected the appropriate trained model
  • 22:44 - 22:46
    among the multiple models. You are
  • 22:46 - 22:48
    probably going to develop multiple
  • 22:48 - 22:50
    models from multiple algorithms, you're
  • 22:50 - 22:52
    going to evaluate them all, and then
  • 22:52 - 22:53
    you're going to say, hey, you know what?
  • 22:53 - 22:55
    After I've evaluated and tested that,
  • 22:55 - 22:57
    I've chosen the best model, I'm going to
  • 22:57 - 23:00
    deploy the model, all right, so this is
  • 23:00 - 23:03
    for real life production use, okay? Real
  • 23:03 - 23:04
    life sensor data is going to be pumped
  • 23:04 - 23:06
    into my model, my model is going to
  • 23:06 - 23:08
    generate predictions, the predicted data
  • 23:08 - 23:10
    is going to be used immediately in real
  • 23:10 - 23:13
    time for real life decision making, and
  • 23:13 - 23:15
    then I'm going to monitor, right, the
  • 23:15 - 23:17
    results. So somebody's using the
  • 23:17 - 23:19
    predictions from my model, if the
  • 23:19 - 23:22
    predictions are lousy, that goes into the
  • 23:22 - 23:23
    monitoring, the monitoring system
  • 23:23 - 23:25
    captures that. If the predictions are
  • 23:25 - 23:28
    fantastic, well that is also captured by the
  • 23:28 - 23:30
    monitoring system, and that gets
  • 23:30 - 23:32
    fed back again to the next cycle of my
  • 23:32 - 23:34
    machine learning
  • 23:34 - 23:36
    pipeline. Okay, so that's the kind of
  • 23:36 - 23:38
    overall view, and here are the kind of
  • 23:38 - 23:42
    key phases of your workflow. So one of
  • 23:42 - 23:44
    the important phases is called EDA,
  • 23:44 - 23:48
    exploratory data analysis and in this
  • 23:48 - 23:50
    particular phase, you're going to
  • 23:50 - 23:53
    do a lot of stuff, primarily just to
  • 23:53 - 23:55
    understand your data set. So like I said,
  • 23:55 - 23:57
    real life data sets, they tend to be very
  • 23:57 - 23:59
    complex, and they tend to have various
  • 23:59 - 24:01
    statistical properties, all right,
  • 24:01 - 24:03
    statistics is a very important component
  • 24:03 - 24:06
    of machine learning. So an EDA helps you
  • 24:06 - 24:07
    to kind of get an overview of your data
  • 24:07 - 24:10
    set, get an overview of any problems in
  • 24:10 - 24:12
    your data set like any data that's
  • 24:12 - 24:13
    missing, the statistical properties of your
  • 24:13 - 24:15
    data set, the distribution of your data
  • 24:15 - 24:17
    set, the statistical correlation of
  • 24:17 - 24:19
    variables in your data set, and so on.
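As a rough illustration of what an EDA pass looks like in Python with pandas, the calls below are generic and assume a tabular data set loaded from a hypothetical CSV file:

```python
import pandas as pd

# Load whatever tabular data set you are exploring (file name is illustrative).
df = pd.read_csv("your_dataset.csv")

print(df.shape)                        # number of rows and columns
df.info()                              # column types and non-null counts
print(df.describe())                   # basic statistics of the numeric columns
print(df.isnull().sum())               # missing values per column
print(df.corr(numeric_only=True))      # pairwise correlation of the numeric variables
df.hist(figsize=(10, 8))               # distribution of each numeric column (needs matplotlib)
```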
  • 24:19 - 24:23
    Okay, then we have data cleaning, or
  • 24:23 - 24:25
    sometimes you call it data cleansing, and
  • 24:25 - 24:28
    in this phase what you want to do is
  • 24:28 - 24:29
    primarily, you want to kind of do things
  • 24:29 - 24:32
    like remove duplicate records or rows in
  • 24:32 - 24:34
    your table, you want to make sure that
  • 24:34 - 24:37
    your data or your data
  • 24:37 - 24:39
    points or your samples have appropriate IDs,
  • 24:39 - 24:41
    and most importantly, you want to make
  • 24:41 - 24:43
    sure there are not too many missing values
  • 24:43 - 24:45
    in your data set. So what I mean by
  • 24:45 - 24:46
    missing values are things like that,
  • 24:46 - 24:48
    right? You have got a data set, and for
  • 24:48 - 24:52
    some reason there are some cells or
  • 24:52 - 24:55
    locations in your data set which are
  • 24:55 - 24:57
    missing values, right? And if you have a
  • 24:57 - 24:59
    lot of these missing values, then you've
  • 24:59 - 25:00
    got a poor quality data set, and you're
  • 25:00 - 25:02
    not going to be able to build a good
  • 25:02 - 25:04
    model from this data set. You're not
  • 25:04 - 25:06
    going to be able to train a good machine
  • 25:06 - 25:08
    learning model from a data set with a
  • 25:08 - 25:10
    lot of missing values like this. So you
  • 25:10 - 25:12
    have to figure out whether there are a
  • 25:12 - 25:13
    lot of missing values in your data set,
  • 25:13 - 25:15
    how do you handle them. Another thing
  • 25:15 - 25:17
    that's important in data cleansing is
  • 25:17 - 25:19
    figuring out the outliers in your data
  • 25:19 - 25:22
    set. So outliers are things like this,
  • 25:22 - 25:24
    you know, data points that are very far from
  • 25:24 - 25:26
    the general trend of data points in your
  • 25:26 - 25:30
    data set, right? And so there are also
  • 25:30 - 25:32
    several ways to detect outliers in your
  • 25:32 - 25:34
    data set, and there are several ways to
  • 25:34 - 25:37
    handle outliers in your data set.
  • 25:37 - 25:38
    Similarly, there are several ways
  • 25:38 - 25:40
    to handle missing values in your data
  • 25:40 - 25:43
    set. So handling missing values and handling
  • 25:43 - 25:46
    outliers, those are really two very key
  • 25:46 - 25:47
    parts of data
  • 25:47 - 25:49
    cleansing, and there are many, many
  • 25:49 - 25:51
    techniques to handle this, so a data
  • 25:51 - 25:52
    scientist needs to be acquainted with
  • 25:52 - 25:55
    all of them.
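Here is a minimal pandas sketch of the two cleaning tasks just described, handling missing values and flagging outliers with the common 1.5 x IQR rule. The file name and the numeric column name "value" are illustrative assumptions:

```python
import pandas as pd

df = pd.read_csv("your_dataset.csv")   # illustrative file name

# Missing values: either drop the affected rows or fill them in (impute).
df_dropped = df.dropna()                              # option 1: remove rows with missing cells
df_filled = df.fillna(df.mean(numeric_only=True))     # option 2: replace with the column average

# Outliers: one common rule of thumb is the 1.5 * IQR fence on a numeric column.
q1, q3 = df["value"].quantile([0.25, 0.75])           # "value" is a hypothetical numeric column
iqr = q3 - q1
inside_fence = df["value"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df_no_outliers = df[inside_fence]                     # keep only the points inside the fence
```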
  • 25:55 - 25:58
    All right, why do I need to
  • 25:58 - 25:59
    do data cleansing? Well, here is the key
  • 25:59 - 26:03
    point: if you have a very poor quality data set,
  • 26:03 - 26:05
    which means you've got a lot of outliers,
  • 26:05 - 26:07
    which are errors in your data set, or you've
  • 26:07 - 26:08
    got a lot of missing values in your data
  • 26:08 - 26:11
    set, even though you've got a fantastic
  • 26:11 - 26:13
    algorithm, you've got a fantastic model,
  • 26:13 - 26:16
    the predictions that your model is going
  • 26:16 - 26:19
    to give are absolutely rubbish. It's kind
  • 26:19 - 26:22
    of like taking water and putting water
  • 26:22 - 26:26
    into the tank of a Mercedes-Benz. So a
  • 26:26 - 26:28
    Mercedes-Benz is a great car, but if you
  • 26:28 - 26:30
    take water and put it into your
  • 26:30 - 26:33
    Mercedes-Benz, it will just die, right? Your
  • 26:33 - 26:37
    car will just die, it can't run on water,
  • 26:37 - 26:38
    right? On the other hand, if you have a
  • 26:38 - 26:42
    Myvi, a Myvi is just a lousy car, but if
  • 26:42 - 26:45
    you take good high-octane petrol and
  • 26:45 - 26:47
    you put it into a Myvi, the Myvi will just go at,
  • 26:47 - 26:49
    you know, 100 miles an hour, which just
  • 26:49 - 26:51
    completely destroys the Mercedes-Benz in
  • 26:51 - 26:53
    terms of performance. So it doesn't
  • 26:53 - 26:55
    really matter what model you're
  • 26:55 - 26:57
    using, right? So you can be using the most
  • 26:57 - 26:59
    fantastic model, like the
  • 26:59 - 27:01
    Mercedes-Benz of machine learning, but if
  • 27:01 - 27:03
    your data is lousy quality, your
  • 27:03 - 27:06
    predictions are also going to be rubbish.
  • 27:06 - 27:10
    Okay, so cleansing the data set is in fact
  • 27:10 - 27:12
    probably the most important thing that
  • 27:12 - 27:14
    data scientists need to do, and that's
  • 27:14 - 27:16
    what they spend most of the time doing,
  • 27:16 - 27:18
    right? Building the model, training the
  • 27:18 - 27:20
    model, getting the right algorithms, and
  • 27:20 - 27:23
    so on, that's really a small portion of
  • 27:23 - 27:25
    the actual machine learning workflow,
  • 27:25 - 27:27
    right? In the actual machine learning
  • 27:27 - 27:30
    workflow, the vast majority of time is spent on
  • 27:30 - 27:32
    cleaning and organizing your data.
  • 27:32 - 27:33
    Then you have something called
  • 27:33 - 27:35
    feature engineering, which is where you
  • 27:35 - 27:37
    pre-process the feature variables of
  • 27:37 - 27:39
    your original data set prior to using
  • 27:39 - 27:41
    them to train the model, and this is
  • 27:41 - 27:42
    either through addition, deletion,
  • 27:42 - 27:44
    combination, or transformation of these
  • 27:44 - 27:45
    variables. The idea is that you want
  • 27:45 - 27:47
    to improve the predictive accuracy of
  • 27:47 - 27:49
    the model, and also, because some models
  • 27:49 - 27:51
    can only work with numeric data, you
  • 27:51 - 27:54
    need to transform categorical data into
  • 27:54 - 27:57
    numeric data. All right, so just now, in
  • 27:57 - 27:59
    the earlier slides, I showed you that you
  • 27:59 - 28:01
    take your original data set, you pump it
  • 28:01 - 28:03
    into the algorithm, and then a couple of hours
  • 28:03 - 28:05
    later you get a machine learning model,
  • 28:05 - 28:09
    right? So you didn't do anything to your
  • 28:09 - 28:10
    data set, to the feature variables in
  • 28:10 - 28:12
    your data set, before you pumped it into the
  • 28:12 - 28:14
    machine learning algorithm. So
  • 28:14 - 28:16
    what I showed you earlier is you just
  • 28:16 - 28:19
    take the data set exactly as it is and
  • 28:19 - 28:21
    you just pump it into the algorithm, and a
  • 28:21 - 28:23
    couple of hours later you get the model,
  • 28:23 - 28:28
    right? But that's not what generally
  • 28:28 - 28:30
    happens in real life. In real life,
  • 28:30 - 28:32
    you're going to take all the original
  • 28:32 - 28:34
    feature variables from your data set and
  • 28:34 - 28:37
    you're going to transform them in some
  • 28:37 - 28:39
    way. So you can see here, these are the
  • 28:39 - 28:42
    columns of data from my original data set,
  • 28:42 - 28:46
    and before I actually put all these data
  • 28:46 - 28:48
    points from my original data set into my
  • 28:48 - 28:51
    algorithm to train and get my model, I
  • 28:51 - 28:55
    will actually transform them. Okay, so the
  • 28:55 - 28:58
    transformation of these feature variable
  • 28:58 - 29:01
    values, we call this feature engineering,
  • 29:01 - 29:02
    and there are many, many techniques to do
  • 29:02 - 29:05
    feature engineering: one-hot encoding,
  • 29:05 - 29:08
    scaling, log transformation,
  • 29:08 - 29:10
    discretization, date extraction, Boolean
  • 29:10 - 29:12
    logic, and so on.
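As a small illustration of two of the techniques just listed, one-hot encoding and scaling, here is a sketch with pandas and scikit-learn on a made-up DataFrame; the column names merely echo the kind of data discussed later and are assumptions:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "Type": ["L", "M", "H", "L"],                   # categorical feature
    "Rotational speed": [1500, 1420, 1680, 1550],   # numeric features
    "Torque": [42.5, 55.0, 30.2, 48.7],
})

# One-hot encoding: turn the categorical "Type" column into numeric indicator columns.
df_encoded = pd.get_dummies(df, columns=["Type"])

# Scaling: put the numeric columns on a comparable scale before training.
scaler = StandardScaler()
df_encoded[["Rotational speed", "Torque"]] = scaler.fit_transform(
    df_encoded[["Rotational speed", "Torque"]]
)
print(df_encoded.head())
```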
  • 29:12 - 29:15
    Okay, then finally we do something
  • 29:15 - 29:17
    called a train-test split, where we
  • 29:17 - 29:19
    take our original data set, right? So this
  • 29:19 - 29:21
    was the original data set, and we break
  • 29:21 - 29:24
    it into two parts, so one is called the
  • 29:24 - 29:26
    training data set and the other is
  • 29:26 - 29:28
    called the test data set. And the primary
  • 29:28 - 29:30
    purpose for this is, when we feed and
  • 29:30 - 29:31
    train the machine learning model, we're
  • 29:31 - 29:33
    going to use what is called the training
  • 29:33 - 29:36
    data set, and when we want to evaluate
  • 29:36 - 29:37
    the accuracy of the model, we use the test data set. So this
  • 29:37 - 29:41
    is the key part of your machine learning
  • 29:41 - 29:44
    life cycle, because you are not only just
  • 29:44 - 29:45
    going to have one possible model,
  • 29:45 - 29:48
    because there are a vast range of
  • 29:48 - 29:50
    algorithms that you can use to create a
  • 29:50 - 29:53
    model. So fundamentally you have a wide
  • 29:53 - 29:56
    range of choices, right? Like a wide range
  • 29:56 - 29:58
    of cars, right? You want to buy a car, you
  • 29:58 - 30:01
    can buy a Myvi, you can buy a Perodua,
  • 30:01 - 30:03
    you can buy a Honda, you can buy a
  • 30:03 - 30:05
    Mercedes-Benz, you can buy an Audi, you can
  • 30:05 - 30:08
    buy a Beemer, many, many different cars
  • 30:08 - 30:09
    that are available for you if you want
  • 30:09 - 30:12
    to buy a car, right? Same thing with a
  • 30:12 - 30:14
    machine learning model: there are a vast
  • 30:14 - 30:17
    variety of algorithms that you can
  • 30:17 - 30:19
    choose from in order to create a model.
  • 30:19 - 30:22
    And so once you create a model from a
  • 30:22 - 30:24
    given algorithm, you need to say, hey, how
  • 30:24 - 30:26
    accurate is this model that I have created
  • 30:26 - 30:29
    from this algorithm? And different
  • 30:29 - 30:30
    algorithms are going to create different
  • 30:30 - 30:34
    models with different rates of accuracy,
  • 30:34 - 30:36
    and so the primary purpose of the test
  • 30:36 - 30:38
    data set is to evaluate the accuracy
  • 30:38 - 30:41
    of the model, to see, hey, is this model
  • 30:41 - 30:43
    that I've created using this algorithm,
  • 30:43 - 30:46
    is it adequate for me to use in a real
  • 30:46 - 30:49
    life production use case? Okay, so that's
  • 30:49 - 30:52
    what it's all about. Okay, so this is my
  • 30:52 - 30:54
    original data set. I break it into my
  • 30:54 - 30:57
    feature data set and
  • 30:57 - 30:59
    also my target variable column, so my
  • 30:59 - 31:01
    feature variable columns and the target
  • 31:01 - 31:02
    variable column, and then I further break
  • 31:02 - 31:04
    it into a training data set and a test
  • 31:04 - 31:07
    data set. The training data set is used
  • 31:07 - 31:08
    to train and create the machine learning
  • 31:08 - 31:10
    model, and then once the machine learning
  • 31:10 - 31:12
    model is created, I then use the test
  • 31:12 - 31:15
    data set to evaluate the accuracy of the
  • 31:15 - 31:16
    machine learning model.
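A minimal sketch of that split-train-evaluate pattern with scikit-learn; the tiny data set below is just a stand-in, since in practice X and y come from the preparation steps described earlier:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Tiny stand-in data set; in practice X and y come from the earlier preparation steps.
X = pd.DataFrame({"torque": [40, 65, 30, 70, 45, 62, 38, 71],
                  "tool_wear": [10, 200, 15, 210, 20, 190, 12, 220]})
y = pd.Series([0, 1, 0, 1, 0, 1, 0, 1])     # 0 = no failure, 1 = failure

# Hold out 25% of the rows as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)                  # train on the training portion only

y_pred = model.predict(X_test)               # predict on rows the model has never seen
print("Accuracy:", accuracy_score(y_test, y_pred))
```

The stratify=y option keeps the class proportions the same in both portions, which matters for data sets like this one where failures are much rarer than non-failures.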
  • 31:16 - 31:21
    All right, and then finally we can
  • 31:21 - 31:23
    see what are the different parts or
  • 31:23 - 31:26
    aspects that go into a successful model:
  • 31:26 - 31:30
    so EDA, about 10%; data cleansing, about
  • 31:30 - 31:32
    20%; feature engineering, about
  • 31:32 - 31:36
    25%; selecting a specific algorithm, about
  • 31:36 - 31:39
    10%; and then training the model from
  • 31:39 - 31:42
    that algorithm, about 15%; and then
  • 31:42 - 31:44
    finally evaluating the model, deciding
  • 31:44 - 31:46
    which is the best model with the highest
  • 31:46 - 31:51
    accuracy rate, that's about
  • 31:54 - 31:57
    20%. All right, so we have reached the
  • 31:57 - 31:59
    most interesting part of this
  • 31:59 - 32:01
    presentation, which is the demonstration
  • 32:01 - 32:04
    of an end-to-end machine learning workflow
  • 32:04 - 32:06
    on a real life data set that
  • 32:06 - 32:10
    demonstrates the use case of predictive
  • 32:10 - 32:14
    maintenance. So for the data set for
  • 32:14 - 32:16
    this particular use case, I've used a
  • 32:16 - 32:19
    data set from Kaggle. So for those of you
  • 32:19 - 32:21
    who are not aware of this, Kaggle is the
  • 32:21 - 32:25
    world's largest open-source community
  • 32:25 - 32:28
    for data science and AI, and they have a
  • 32:28 - 32:31
    large collection of data sets from
  • 32:31 - 32:34
    various areas of industry and human
  • 32:34 - 32:37
    endeavor, and they also have a large
  • 32:37 - 32:39
    collection of models that have been
  • 32:39 - 32:43
    developed using these data sets. So here
  • 32:43 - 32:47
    we have a data set for the particular
  • 32:47 - 32:51
    use case, predictive maintenance. Okay, so
  • 32:51 - 32:53
    this is some information about the data
  • 32:53 - 32:56
    set, so in case you do not know how
  • 32:56 - 32:59
    to get there, this is the URL to click
  • 32:59 - 33:02
    on, okay, to get to that data set. So once
  • 33:02 - 33:05
    you're at the
  • 33:05 - 33:07
    page for this data set, you can see
  • 33:07 - 33:10
    all the information about this data set,
  • 33:10 - 33:13
    and you can download the data set in
  • 33:13 - 33:14
    CSV format.
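Once the CSV has been downloaded, loading it and taking a first look is a couple of lines with pandas. The file name below is an assumption; use whatever name you saved the download as:

```python
import pandas as pd

# Load the downloaded Kaggle CSV (file name assumed).
df = pd.read_csv("predictive_maintenance.csv")

print(df.shape)     # expect roughly 10,000 rows
print(df.head())    # first few samples: feature columns plus the target column
```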
  • 33:14 - 33:16
    Okay, so let's take a look at the
  • 33:16 - 33:20
    data set. So this data set has a total of
  • 33:20 - 33:23
    10,000 samples, okay, and these are the
  • 33:23 - 33:26
    feature variables: the type, the product
  • 33:26 - 33:28
    ID, the air temperature, process
  • 33:28 - 33:31
    temperature, rotational speed, torque, tool
  • 33:31 - 33:35
    wear, and this is the target variable.
  • 33:35 - 33:37
    All right, so the target variable is what
  • 33:37 - 33:38
    we are interested in, what we are
  • 33:38 - 33:41
    interested in using to train the machine
  • 33:41 - 33:43
    learning model, and also what we are
  • 33:43 - 33:45
    interested to predict. Okay, so these are
  • 33:45 - 33:48
    the feature variables; they describe or
  • 33:48 - 33:50
    they provide information about this
  • 33:50 - 33:53
    particular machine on the production
  • 33:53 - 33:55
    line, on the assembly line. So you might
  • 33:55 - 33:57
    know the product ID, the type, the air
  • 33:57 - 33:58
    temperature, process temperature,
  • 33:58 - 34:00
    rotational speed, torque, tool wear, right? So
  • 34:00 - 34:03
    let's say you've got an IoT sensor system
  • 34:03 - 34:06
    that's basically capturing all this data
  • 34:06 - 34:08
    about a product or a machine on your
  • 34:08 - 34:11
    production or assembly line, okay, and
  • 34:11 - 34:14
    you've also captured information about
  • 34:14 - 34:17
    whether, for a specific sample,
  • 34:17 - 34:20
    that sample experienced a failure or not.
  • 34:20 - 34:23
    Okay, so the target value
  • 34:23 - 34:26
    of zero, okay, indicates that there's no
  • 34:26 - 34:28
    failure, so zero means no failure, and we
  • 34:28 - 34:30
    can see that the vast majority of data
  • 34:30 - 34:33
    points in this data set are no failure.
  • 34:33 - 34:34
    And here we can see an example
  • 34:34 - 34:37
    where you have a case of a failure. So a
  • 34:37 - 34:40
    failure is marked as a one, positive, and
  • 34:40 - 34:43
    no failure is marked as zero, negative,
  • 34:43 - 34:45
    all right? So here we have one type of
  • 34:45 - 34:47
    failure, it's called a power failure, and
  • 34:47 - 34:49
    if you scroll down the data set, you see
  • 34:49 - 34:50
    there are also other kinds of failures,
  • 34:50 - 34:53
    like a tool wear
  • 34:53 - 34:57
    failure. We have an overstrain failure
  • 34:57 - 34:59
    here, for example,
  • 34:59 - 35:01
    and we also have a power failure again,
  • 35:01 - 35:02
    and so on. So if you scroll down through
  • 35:02 - 35:04
    these 10,000 data points, or if
  • 35:04 - 35:06
    you're familiar with using Excel to
  • 35:06 - 35:09
    filter out values in a column, you can
  • 35:09 - 35:12
    see that in this particular column here,
  • 35:12 - 35:14
    which is the so-called target variable
  • 35:14 - 35:17
    column, you are going to have the vast
  • 35:17 - 35:19
    majority of values as zero, which means
  • 35:19 - 35:23
    no failure, and in some of the rows or
  • 35:23 - 35:24
    data points you are going to have a
  • 35:24 - 35:26
    value of one. And for those rows where you
  • 35:26 - 35:28
    have a value of one, for example
  • 35:28 - 35:31
    here, you
  • 35:31 - 35:33
    are going to have different types of
  • 35:33 - 35:35
    failure: like I said just now, power failure, tool wear failure, and so on.
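To see this class imbalance without scrolling through Excel, a quick pandas check works too. The column names "Target" and "Failure Type" are assumed to match the Kaggle page:

```python
import pandas as pd

df = pd.read_csv("predictive_maintenance.csv")   # same assumed file name as before

# Count how many rows fall into each class.
print(df["Target"].value_counts())         # 0 = no failure (the vast majority), 1 = failure
print(df["Failure Type"].value_counts())   # breakdown of the individual failure types
```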
  • 35:35 - 35:39
    So we are
  • 35:39 - 35:41
    going to go through the entire machine
  • 35:41 - 35:44
    learning workflow process with this data
  • 35:44 - 35:47
    set. So to see an example of that, we are
  • 35:47 - 35:50
    going to go to the
  • 35:50 - 35:52
    code section here. All right, so if I
  • 35:52 - 35:54
    click on the code section here, right
  • 35:54 - 35:56
    down here we can see what is called a
  • 35:56 - 35:59
    dataset notebook. So this is basically a
  • 35:59 - 36:02
    Jupyter notebook. Jupyter is basically a
  • 36:02 - 36:05
    Python application which allows you to
  • 36:05 - 36:09
    create a Python machine learning
  • 36:09 - 36:12
    program that basically builds your
  • 36:12 - 36:15
    machine learning model, assesses or
  • 36:15 - 36:16
    evaluates its accuracy, and generates
  • 36:16 - 36:19
    predictions from it. Okay, so here we have
  • 36:19 - 36:22
    a whole bunch of Jupyter notebooks that
  • 36:22 - 36:25
    are available, and you can select any one
  • 36:25 - 36:26
    of them; all these notebooks are
  • 36:26 - 36:29
    essentially going to process the data
  • 36:29 - 36:32
    from this particular data set. So if I go
  • 36:32 - 36:35
    to this code page here, I've actually
  • 36:35 - 36:37
    selected a specific notebook that I'm
  • 36:37 - 36:40
    going to run through to demonstrate an
  • 36:40 - 36:43
    end-to-end machine learning workflow using
  • 36:43 - 36:46
    various machine learning libraries from
  • 36:46 - 36:50
    the Python programming language. Okay, so
  • 36:50 - 36:52
    the particular notebook I'm going to
  • 36:52 - 36:55
    use is this particular notebook here, and
  • 36:55 - 36:57
    you can also get the URL for that
  • 36:57 - 37:00
    particular notebook from
  • 37:00 - 37:04
    here okay so let's quickly do a quick
  • 37:04 - 37:06
    revision again what are we trying to do
  • 37:06 - 37:08
    here we're trying to build a machine
  • 37:08 - 37:11
    learning classification model right so
  • 37:11 - 37:13
    we said there are two primary areas of
  • 37:13 - 37:15
    supervised learning one is regression
  • 37:15 - 37:16
    which is used to predict a numerical
  • 37:16 - 37:19
    Target variable and the second kind of
  • 37:19 - 37:21
    supervised learning is classification
  • 37:21 - 37:23
    which is what we're doing here we're
  • 37:23 - 37:26
    trying to predict a categorical Target
  • 37:26 - 37:30
    variable okay so in this particular
  • 37:30 - 37:32
    example we actually have two kinds of
  • 37:32 - 37:34
    ways we can classify either a binary
  • 37:34 - 37:38
    classification or a multiclass
  • 37:38 - 37:40
    classification so for binary
  • 37:40 - 37:41
    classification we are only going to
  • 37:41 - 37:43
    classify the product or machine as
  • 37:43 - 37:47
    either it failed or it did not fail okay
  • 37:47 - 37:49
    so if we go back to the data set that I
  • 37:49 - 37:51
    showed you just now if you look at this
  • 37:51 - 37:53
    target variable column there are only
  • 37:53 - 37:55
    two possible values here they either
  • 37:55 - 37:58
    zero or one zero means there's no failure
  • 37:58 - 38:01
    one means that's a failure okay so this
  • 38:01 - 38:03
    is an example of a binary classification
  • 38:03 - 38:07
    only two possible outcomes zero or one
  • 38:07 - 38:10
    didn't fail or fail all right two
  • 38:10 - 38:13
    possible outcomes and then we can also
  • 38:13 - 38:15
    for the same data set we can extend it
  • 38:15 - 38:18
    and make it a multiclass classification
  • 38:18 - 38:21
    problem all right so if we kind of want
  • 38:21 - 38:24
    to drill down further we can say that
  • 38:24 - 38:27
    not only is there a failure we can
  • 38:27 - 38:29
    actually say that there are different types of
  • 38:29 - 38:32
    failures okay so we have one category of
  • 38:32 - 38:36
    class that is basically no failure okay
  • 38:36 - 38:37
    then we have a category for the
  • 38:37 - 38:40
    different types of failures right so you
  • 38:40 - 38:44
    can have a power failure you could have
  • 38:44 - 38:46
    a tool wear
  • 38:46 - 38:49
    failure uh you could have let's go down
  • 38:49 - 38:51
    here you could have a over strain
  • 38:51 - 38:54
    failure and etc etc so you can have
  • 38:54 - 38:57
    multiple classes of failure in addition
  • 38:57 - 39:01
    to the general overall or the majority
  • 39:01 - 39:04
    class of no failure and that would be a
  • 39:04 - 39:07
    multiclass classification problem so
  • 39:07 - 39:08
    with this data set we are going to see
  • 39:08 - 39:11
    how to make it a binary classification
  • 39:11 - 39:13
    problem and also a multiclass
  • 39:13 - 39:15
    classification problem okay so let's
  • 39:15 - 39:17
    look at the workflow so let's say we've
  • 39:17 - 39:19
    already got the data so right now we do
  • 39:19 - 39:21
    have the data set this is the data set
  • 39:21 - 39:23
    that we have so let's assume we've
  • 39:23 - 39:25
    somehow managed to get this data set
  • 39:25 - 39:27
    from some iot sensors that are
  • 39:27 - 39:29
    monitoring realtime data in our
  • 39:29 - 39:31
    production environment on the assembly
  • 39:31 - 39:33
    line on the production line we've got
  • 39:33 - 39:35
    sensors reading data that gives us all
  • 39:35 - 39:38
    these data that we have in this CSV file
  • 39:38 - 39:40
    Okay so we've already got the data we've
  • 39:40 - 39:42
    retrieved the data now we're going to go
  • 39:42 - 39:45
    on to the cleaning and exploration part
  • 39:45 - 39:48
    of your machine learning life cycle all
  • 39:48 - 39:50
    right so let's look at the data cleaning
  • 39:50 - 39:51
    part so the data cleaning part we
  • 39:51 - 39:54
    interested in uh checking for missing
  • 39:54 - 39:56
    values and maybe removing the rows with
  • 39:56 - 39:58
    missing values okay
  • 39:58 - 40:00
    uh so the kind of things we can sorry
  • 40:00 - 40:01
    the kind of things we can do in missing
  • 40:01 - 40:03
    values we can remove the rows with missing
  • 40:03 - 40:06
    values we can put in some new values uh
  • 40:06 - 40:08
    some replacement values which could be a
  • 40:08 - 40:10
    average of all the values in that
  • 40:10 - 40:13
    particular column etc etc we also try to
  • 40:13 - 40:15
    identify outliers in our data set and
  • 40:15 - 40:17
    also there are a variety of ways to deal
  • 40:17 - 40:19
    with that so this is called Data
  • 40:19 - 40:21
    cleansing which is a really important
  • 40:21 - 40:23
    part of your machine learning workflow
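
To make that concrete, here is a minimal pandas sketch of the two missing-value strategies just described (drop the rows, or fill in with the column average). The file name predictive_maintenance.csv is an assumption, and this particular data set may in fact contain no missing values at all:

    import pandas as pd

    df = pd.read_csv("predictive_maintenance.csv")  # assumed file name

    # How many missing values does each column have?
    print(df.isna().sum())

    # Option 1: drop every row that contains a missing value
    df_dropped = df.dropna()

    # Option 2: replace missing numeric values with the column average
    numeric_cols = df.select_dtypes(include="number").columns
    df_filled = df.copy()
    df_filled[numeric_cols] = df_filled[numeric_cols].fillna(
        df_filled[numeric_cols].mean())
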
  • 40:23 - 40:26
    right so that's where we are now at
  • 40:26 - 40:27
    we're doing cleansing and then we're
  • 40:27 - 40:29
    going to follow up with
  • 40:29 - 40:31
    exploration so let's look at the actual
  • 40:31 - 40:33
    code that does the cleansing here so
  • 40:33 - 40:36
    here we are right at the start of the uh
  • 40:36 - 40:38
    machine learning uh life cycle here so
  • 40:38 - 40:41
    this is a Jupiter notebook so here we
  • 40:41 - 40:43
    have a brief description of the problem
  • 40:43 - 40:46
    statement all right so this data set
  • 40:46 - 40:48
    reflects real life predictive
  • 40:48 - 40:49
    maintenance encountered in industry with
  • 40:49 - 40:50
    measurements from real equipment the
  • 40:50 - 40:52
    features description is taken directly
  • 40:52 - 40:55
    from the data set source so here we have
  • 40:55 - 40:57
    a description of the six key features in
  • 40:57 - 41:00
    our data set type which is the quality
  • 41:00 - 41:03
    of the product the air temperature the
  • 41:03 - 41:05
    process temperature the rotational speed
  • 41:05 - 41:07
    the torque and the tool wear all right so
  • 41:07 - 41:09
    these are the six feature variables and
  • 41:09 - 41:11
    there are the two target variables so
  • 41:11 - 41:13
    just now I showed you just now there's
  • 41:13 - 41:15
    one target variable which only has two
  • 41:15 - 41:17
    possible values either zero or one okay
  • 41:17 - 41:20
    zero or one means failure or no failure
  • 41:20 - 41:23
    so that will be this column here right
  • 41:23 - 41:25
    so let me go all the way back up to here
  • 41:25 - 41:27
    so this column here we already saw it
  • 41:27 - 41:29
    only has two values either zero or
  • 41:29 - 41:33
    one and then we also have this column
  • 41:33 - 41:35
    here and this column here is basically
  • 41:35 - 41:38
    the failure type and so the we have as I
  • 41:38 - 41:41
    already demonstrated just now we do have
  • 41:41 - 41:43
    uh several categories of or types of
  • 41:43 - 41:46
    failure and so here we call this
  • 41:46 - 41:47
    multiclass
  • 41:47 - 41:50
    classification so we can either build a
  • 41:50 - 41:52
    binary classification model for this
  • 41:52 - 41:54
    problem domain or we can build a
  • 41:54 - 41:55
    multiclass
  • 41:55 - 41:58
    classification problem all right so this
  • 41:58 - 42:00
    jupyter notebook is going to demonstrate
  • 42:00 - 42:02
    both approaches to us so first step we
  • 42:02 - 42:05
    are going to write all this python code
  • 42:05 - 42:07
    that's going to import all the libraries
  • 42:07 - 42:09
    that we need to use okay so this is
  • 42:09 - 42:12
    basically python code okay and it's
  • 42:12 - 42:15
    importing the relevant machine learn
  • 42:15 - 42:18
    oops we are importing the relevant
  • 42:18 - 42:21
    machine learning libraries related to
  • 42:21 - 42:24
    our domain use case okay then we load in
  • 42:24 - 42:26
    our data set okay so this our data set
  • 42:26 - 42:28
    we describe it we have some quick
  • 42:28 - 42:31
    insights into the data set um and then
  • 42:31 - 42:33
    we just take a look at all the variables
  • 42:33 - 42:36
    of the feature variables Etc and so on
  • 42:36 - 42:38
    we just what we're doing now is just
  • 42:38 - 42:40
    doing a quick overview of the data set
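
As a rough sketch of what those first notebook cells look like (not the notebook's exact code; the file name and the column names Target and Failure Type are assumptions based on the description above):

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns

    df = pd.read_csv("predictive_maintenance.csv")

    print(df.shape)       # number of rows and columns
    print(df.dtypes)      # data type of every column
    print(df.head())      # first few data points
    print(df.describe())  # basic statistics for the numeric columns

    # Zoom in on the two target variables
    print(df["Target"].value_counts())        # failures vs non-failures
    print(df["Failure Type"].value_counts())  # counts per failure category
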
  • 42:40 - 42:42
    so this all this python code here they
  • 42:42 - 42:44
    were writing is allowing us the data
  • 42:44 - 42:45
    scientist to get a quick overview of our
  • 42:45 - 42:48
    data set right okay like how many
  • 42:48 - 42:50
    how many rows are there how many columns
  • 42:50 - 42:52
    are there what are the data types of the
  • 42:52 - 42:53
    columns what are the names of the columns
  • 42:53 - 42:57
    etc etc okay then we zoom in on to the
  • 42:57 - 42:59
    Target variables so we look at the
  • 42:59 - 43:02
    Target variables how many uh counts
  • 43:02 - 43:05
    there are of this target variable uh and
  • 43:05 - 43:06
    so on how many different types of
  • 43:06 - 43:08
    failures there are then you want to
  • 43:08 - 43:09
    check whether there are any
  • 43:09 - 43:11
    inconsistencies between the Target and
  • 43:11 - 43:14
    the failure type Etc okay so when you do
  • 43:14 - 43:15
    all this checking you're going to
  • 43:15 - 43:17
    discover there are some discrepancies in
  • 43:17 - 43:20
    your data set so using a specific python
  • 43:20 - 43:22
    code to do checking you're going to say
  • 43:22 - 43:23
    hey you know what there's some errors
  • 43:23 - 43:25
    here right there are nine values that
  • 43:25 - 43:27
    classified as failure in the Target variable
  • 43:27 - 43:28
    but as no failure in the failure type
  • 43:28 - 43:30
    variable so that means there's a
  • 43:30 - 43:33
    discrepancy in your data point right so
  • 43:33 - 43:35
    which are so these are all the ones that
  • 43:35 - 43:36
    are discrepancies because the target
  • 43:36 - 43:39
    variable says one and we already know
  • 43:39 - 43:41
    that Target variable one is supposed to
  • 43:41 - 43:43
    mean that it's a failure right target
  • 43:43 - 43:45
    variable one is supposed to mean that it's
  • 43:45 - 43:47
    a failure so we are kind of expecting to
  • 43:47 - 43:50
    see the failure classification but some
  • 43:50 - 43:51
    rows actually say there's no failure
  • 43:51 - 43:54
    although the target type is one but here
  • 43:54 - 43:56
    is a classic example of an error that
  • 43:56 - 43:59
    can very well occur in a data set so now
  • 43:59 - 44:01
    the question is what do you do with
  • 44:01 - 44:05
    these errors in your data set right so
  • 44:05 - 44:06
    here the data scientist says I think it
  • 44:06 - 44:08
    would make sense to remove those
  • 44:08 - 44:10
    instances and so they write some code
  • 44:10 - 44:13
    then to remove those instances or those
  • 44:13 - 44:15
    uh rows or data points from the overall
  • 44:15 - 44:17
    data set and same thing we can again
  • 44:17 - 44:19
    check for other issues so we find there's
  • 44:19 - 44:21
    another issue here with our data set which
  • 44:21 - 44:24
    is another warning so again we can
  • 44:24 - 44:26
    possibly remove them so you're going to
  • 44:26 - 44:31
    remove 27 instances or rows from your
  • 44:31 - 44:34
    overall data set so your data set has a
  • 44:34 - 44:37
    10,000 uh rows or data points you're
  • 44:37 - 44:40
    removing 27 which is only 0.27% of the
  • 44:40 - 44:42
    entire data set and these were the
  • 44:42 - 44:46
    reasons why you remove them okay so if
  • 44:46 - 44:48
    you're just removing 0.27% of the
  • 44:48 - 44:51
    entire data set no big deal right still
  • 44:51 - 44:53
    okay but you needed to remove them
  • 44:53 - 44:56
    because these errors right this
  • 44:56 - 44:58
    27 um
  • 44:58 - 45:01
    errors okay data points with errors in
  • 45:01 - 45:03
    your data set could really affect the
  • 45:03 - 45:05
    training of your machine learning model
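
A minimal sketch of that consistency check and clean-up, assuming the columns are named Target and Failure Type and the label string is "No Failure" (the notebook goes on to remove a second group of problem rows in the same way, 27 rows in total):

    # Rows flagged as a failure in the binary target but labelled "No Failure"
    mask_bad = (df["Target"] == 1) & (df["Failure Type"] == "No Failure")
    print(mask_bad.sum(), "inconsistent rows found")

    # Drop the inconsistent rows and report how much data was removed
    rows_before = len(df)
    df = df[~mask_bad].reset_index(drop=True)
    removed = rows_before - len(df)
    print(f"Removed {removed} rows ({removed / rows_before:.2%} of the data set)")
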
  • 45:05 - 45:09
    so we need to do your data cleansing
  • 45:09 - 45:12
    right so we are actually cleansing now
  • 45:12 - 45:15
    uh uh some kind of data that is
  • 45:15 - 45:18
    incorrect or erroneous in your original
  • 45:18 - 45:21
    data set okay so then we go on to the
  • 45:21 - 45:24
    next part which is called EDA right so
  • 45:24 - 45:29
    EDA or exploratory data analysis is where we kind of explore our data
  • 45:29 - 45:32
    and we want to kind of get a visual
  • 45:32 - 45:34
    overview of our data as a whole and also
  • 45:34 - 45:36
    take a look at the statistical
  • 45:36 - 45:38
    properties of data the statistical
  • 45:38 - 45:40
    distribution of the data in all the
  • 45:40 - 45:43
    various colums the correlation between
  • 45:43 - 45:45
    the variables between the feature
  • 45:45 - 45:47
    variables different columns and also the
  • 45:47 - 45:49
    feature variable and the target variable
  • 45:49 - 45:52
    so all of this is called EDA and EDA in
  • 45:52 - 45:54
    a machine learning workflow is typically
  • 45:54 - 45:57
    done through visualization
  • 45:57 - 45:59
    all right so let's go back here and take
  • 45:59 - 46:01
    a look right so for example here we are
  • 46:01 - 46:03
    looking at correlation so we plot the
  • 46:03 - 46:06
    values of all the various feature
  • 46:06 - 46:08
    variables against each other and look
  • 46:08 - 46:11
    for potential correlations and patterns
  • 46:11 - 46:13
    and so on and all the different shapes
  • 46:13 - 46:17
    that you see here in this pair plot okay
  • 46:17 - 46:18
    uh will have different meaning
  • 46:18 - 46:20
    statistical meaning and so the data
  • 46:20 - 46:22
    scientist has to kind of visually
  • 46:22 - 46:24
    inspect this pair plot and make some
  • 46:24 - 46:26
    interpretations of these different
  • 46:26 - 46:28
    patterns that he sees here all right so
  • 46:28 - 46:30
    these are some of the insights that that
  • 46:30 - 46:33
    can be deduced from looking at these
  • 46:33 - 46:34
    patterns so for example the torque and
  • 46:34 - 46:36
    rotational speed are highly correlated
  • 46:36 - 46:38
    the process temperature and air
  • 46:38 - 46:40
    temperature are highly correlated and that
  • 46:40 - 46:42
    failures occur for extreme values of
  • 46:42 - 46:45
    some features etc etc then you can plot
  • 46:45 - 46:46
    certain kinds of charts this is called a
  • 46:46 - 46:48
    violin chart to again get new insights
  • 46:48 - 46:50
    for example regarding the torque and
  • 46:50 - 46:51
    rotational speed we can see again that
  • 46:51 - 46:53
    most failures are triggered for much
  • 46:53 - 46:55
    lower or much higher values than the
  • 46:55 - 46:57
    mean when they're not failing so all
  • 46:57 - 47:01
    these visualizations they are there and
  • 47:01 - 47:02
    a trained data scientist can look at
  • 47:02 - 47:05
    them inspect them and make some kind of
  • 47:05 - 47:08
    insightful deductions from them okay
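
For reference, the kind of seaborn code behind those plots might look like this (a sketch, assuming the feature column names listed in the data description):

    import matplotlib.pyplot as plt
    import seaborn as sns

    features = ["Air temperature [K]", "Process temperature [K]",
                "Rotational speed [rpm]", "Torque [Nm]", "Tool wear [min]"]

    # Pair plot of the numeric features, coloured by failure vs no failure
    sns.pairplot(df, vars=features, hue="Target")
    plt.show()

    # Correlation heat map between the features and the target
    corr = df[features + ["Target"]].corr()
    sns.heatmap(corr, annot=True, cmap="coolwarm")
    plt.show()
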
  • 47:08 - 47:11
    percentage of failure right uh the
  • 47:11 - 47:14
    correlation heat map okay between all
  • 47:14 - 47:16
    these different feature variables and
  • 47:16 - 47:17
    also the target
  • 47:17 - 47:20
    variable okay uh the product types
  • 47:20 - 47:21
    percentage of product types percentage
  • 47:21 - 47:23
    of failure with respect to the product
  • 47:23 - 47:26
    type so we can also kind of visualize
  • 47:26 - 47:28
    that as well so certain products have a
  • 47:28 - 47:30
    higher ratio of failure compared to other
  • 47:30 - 47:33
    product types Etc or for example uh M
  • 47:33 - 47:36
    tends to fail more than H products etc
  • 47:36 - 47:39
    etc so we can create a vast variety of
  • 47:39 - 47:41
    visualizations in the EDA stage so you
  • 47:41 - 47:44
    can see here and again the idea of this
  • 47:44 - 47:46
    visualization is just to give us some
  • 47:46 - 47:50
    insight some preliminary insight into
  • 47:50 - 47:53
    our data set that helps us to model it
  • 47:53 - 47:54
    more correctly so some more insights
  • 47:54 - 47:56
    that we get into our data set from all
  • 47:56 - 47:58
    this visualization
  • 47:58 - 48:00
    then we can plot the distribution so we
  • 48:00 - 48:01
    can see whether it's a normal
  • 48:01 - 48:03
    distribution or some other kind of
  • 48:03 - 48:06
    distribution uh we can have a box plot
  • 48:06 - 48:08
    to see whether there are any outliers in
  • 48:08 - 48:10
    your data set and so on right so we can
  • 48:10 - 48:12
    see from the box plots we can see
  • 48:12 - 48:15
    rotational speed and torque have outliers so we
  • 48:15 - 48:17
    already saw outliers are basically a
  • 48:17 - 48:19
    problem that you may need to kind of
  • 48:19 - 48:23
    tackle right so outliers are an issue uh
  • 48:23 - 48:25
    it's a it's a part of data cleansing and
  • 48:25 - 48:27
    so you may need to tackle this so we may
  • 48:27 - 48:29
    have to check okay well where are the
  • 48:29 - 48:31
    potential outliers so we can analyze
  • 48:31 - 48:35
    them from the box plot okay um but then
  • 48:35 - 48:37
    we can say well they are outliers but
  • 48:37 - 48:39
    maybe they're not really horrible
  • 48:39 - 48:41
    outliers so we can tolerate them or
  • 48:41 - 48:43
    maybe we want to remove them so we can
  • 48:43 - 48:45
    see what the mean and maximum values for
  • 48:45 - 48:47
    all these with respect to product type
  • 48:47 - 48:50
    how many of them are above or highly
  • 48:50 - 48:51
    correlated with the product type in
  • 48:51 - 48:54
    terms of the maximum and minimum okay
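
A small sketch of that outlier check using a box plot and the usual 1.5 x IQR rule (column names assumed as before):

    import matplotlib.pyplot as plt
    import seaborn as sns

    # Box plots to eyeball outliers in the suspect columns
    sns.boxplot(data=df[["Rotational speed [rpm]", "Torque [Nm]"]])
    plt.show()

    # Share of points outside the 1.5 * IQR whiskers for one column
    col = df["Rotational speed [rpm]"]
    q1, q3 = col.quantile([0.25, 0.75])
    iqr = q3 - q1
    outliers = (col < q1 - 1.5 * iqr) | (col > q3 + 1.5 * iqr)
    print(f"{outliers.mean():.2%} of the rows are outliers on this column")
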
  • 48:54 - 48:57
    and then so on so the Insight is well we
  • 48:57 - 49:00
    got 4.8% of the instances are outliers
  • 49:00 - 49:03
    so maybe 4.87% is not really that much
  • 49:03 - 49:05
    the outliers are not horrible so we just
  • 49:05 - 49:07
    leave them in the data set now for a
  • 49:07 - 49:09
    different data set the data scientist
  • 49:09 - 49:10
    could come to different conclusion so
  • 49:10 - 49:12
    then they would do whatever they've
  • 49:12 - 49:15
    deemed is appropriate to kind of cleanse
  • 49:15 - 49:18
    the data set okay so now that we have
  • 49:18 - 49:20
    done all the EDA the next thing we're
  • 49:20 - 49:23
    going to do is we are going to do what
  • 49:23 - 49:26
    is called feature engineering so we are
  • 49:26 - 49:29
    going to transform our original feature
  • 49:29 - 49:31
    variables and these are our original
  • 49:31 - 49:33
    feature variables right these are our
  • 49:33 - 49:35
    original feature variables and we are
  • 49:35 - 49:38
    going to transform them all right we're
  • 49:38 - 49:40
    going to transform them in some sense uh
  • 49:40 - 49:44
    into some other form before we fit this
  • 49:44 - 49:46
    for training into our machine learning
  • 49:46 - 49:49
    algorithm all right so these are
  • 49:49 - 49:52
    examples of let's say this example of a
  • 49:52 - 49:55
    original data set right and this is
  • 49:55 - 49:57
    examples these are some of the examples
  • 49:57 - 49:58
    you don't have to use all of them but
  • 49:58 - 49:59
    these are some of examples of what we
  • 49:59 - 50:01
    call feature engineering which you can
  • 50:01 - 50:04
    then transform your original values in
  • 50:04 - 50:05
    your feature variables to all these
  • 50:05 - 50:08
    transform values here so we're going to
  • 50:08 - 50:10
    pretty much do that here so we have a
  • 50:10 - 50:13
    ordinal encoding we do scaling of the
  • 50:13 - 50:15
    data so the data set is scaled we use a
  • 50:15 - 50:18
    minmax scaling and then finally we come
  • 50:18 - 50:22
    to do a modeling so we have to split our
  • 50:22 - 50:24
    data set into a training data set and a
  • 50:24 - 50:29
    test data set so coming back to again um
  • 50:29 - 50:32
    we said that in a before you train your
  • 50:32 - 50:34
    model sorry before you train your model
  • 50:34 - 50:36
    you have to take your original data set
  • 50:36 - 50:37
    now this is a featured engineered data
  • 50:37 - 50:39
    set we're going to break it into two or
  • 50:39 - 50:41
    more subsets okay so one is called the
  • 50:41 - 50:42
    training data set that we use to fit
  • 50:42 - 50:44
    and train a machine learning model the
  • 50:44 - 50:46
    second is test data set to evaluate the
  • 50:46 - 50:48
    accuracy of the model okay so we got
  • 50:48 - 50:51
    this training data set your test data
  • 50:51 - 50:53
    set and we also need
  • 50:53 - 50:56
    to sample so from our original data set
  • 50:56 - 50:57
    we need to sample sample some points
  • 50:57 - 50:59
    that go into your training data set some
  • 50:59 - 51:01
    points that go in your test data set so
  • 51:01 - 51:03
    there are many ways to do sampling one
  • 51:03 - 51:05
    way is to do stratified sampling where
  • 51:05 - 51:07
    we ensure the same proportion of data
  • 51:07 - 51:09
    from each stratum or class because right
  • 51:09 - 51:11
    now we have a multiclass classification
  • 51:11 - 51:12
    problem so you want to make sure the
  • 51:12 - 51:14
    same proportion of data from each
  • 51:14 - 51:16
    class is equally proportional in the
  • 51:16 - 51:18
    training and test data set as the
  • 51:18 - 51:20
    original data set which is very useful
  • 51:20 - 51:22
    for dealing with what is called an
  • 51:22 - 51:24
    imbalanced data set so here we have an
  • 51:24 - 51:26
    example of what is called an imbalanced
  • 51:26 - 51:30
    data set in the sense that you have the
  • 51:30 - 51:33
    vast majority of data points in your
  • 51:33 - 51:35
    data set they are going to have the
  • 51:35 - 51:37
    value of zero for their target variable
  • 51:37 - 51:40
    column so only an extremely small
  • 51:40 - 51:43
    minority of the data points in your data
  • 51:43 - 51:45
    set will actually have the value of one
  • 51:45 - 51:49
    for their target variable column okay so
  • 51:49 - 51:51
    a situation where you have your class or
  • 51:51 - 51:53
    your target variable column where the
  • 51:53 - 51:54
    vast majority of values are from one
  • 51:54 - 51:58
    class and a tiny small minority are from
  • 51:58 - 52:01
    another class we call this an imbalanced
  • 52:01 - 52:03
    data set and for an imbalanced data set
  • 52:03 - 52:04
    typically we will have a specific
  • 52:04 - 52:06
    technique to do the train test split
  • 52:06 - 52:08
    which is called stratified sampling and
  • 52:08 - 52:10
    so that's what's exactly happening here
  • 52:10 - 52:12
    we're doing a stratified split here so
  • 52:12 - 52:15
    we are doing a train test split here uh
  • 52:15 - 52:18
    and we are doing a stratified split uh
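
Putting the feature engineering and the stratified train test split together, a minimal scikit-learn sketch could look like this (the 80/20 split ratio and the L/M/H ordering are assumptions, not necessarily what the notebook uses):

    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import MinMaxScaler

    feature_cols = ["Type", "Air temperature [K]", "Process temperature [K]",
                    "Rotational speed [rpm]", "Torque [Nm]", "Tool wear [min]"]
    X = df[feature_cols].copy()
    y = df["Target"]

    # Ordinal encoding of the product quality column (L / M / H)
    X["Type"] = X["Type"].map({"L": 0, "M": 1, "H": 2})

    # Stratified split keeps the same failure ratio in train and test sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    # Min-max scaling, fitted on the training data only
    scaler = MinMaxScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
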
  • 52:18 - 52:20
    and then now we actually develop the
  • 52:20 - 52:23
    models so now we've got the train test
  • 52:23 - 52:25
    split now here is where we actually
  • 52:25 - 52:27
    train the models
  • 52:27 - 52:30
    now in terms of classification there are
  • 52:30 - 52:32
    a whole bunch of
  • 52:32 - 52:35
    possibilities right that you can use
  • 52:35 - 52:38
    there are many many different algorithms
  • 52:38 - 52:41
    that we can use to create a
  • 52:41 - 52:43
    classification model so this are an
  • 52:43 - 52:45
    example of some of the more common ones
  • 52:45 - 52:47
    logistic regression support Vector machine decision
  • 52:47 - 52:50
    trees random Forest bagging balance
  • 52:50 - 52:53
    bagging boosting Ensemble so all
  • 52:53 - 52:55
    these are different algorithms which
  • 52:55 - 52:58
    will create different kind of models
  • 52:58 - 53:02
    which will result in different accuracy
  • 53:02 - 53:05
    measures okay so it's the goal of the
  • 53:05 - 53:09
    data scientist to find the best model
  • 53:09 - 53:12
    that gives the best accuracy for the
  • 53:12 - 53:14
    given data set for training on that
  • 53:14 - 53:17
    given data set so let's head back again
  • 53:17 - 53:20
    to uh our machine learning workflow so
  • 53:20 - 53:22
    here basically what I'm doing is I'm
  • 53:22 - 53:24
    creating a whole bunch of models here
  • 53:24 - 53:26
    all right so one is a random Forest one
  • 53:26 - 53:27
    is balance bagging one is a boost
  • 53:27 - 53:30
    classifier one's The Ensemble classifier
  • 53:30 - 53:33
    and using all of these I am going to
  • 53:33 - 53:35
    basically fit or train my model using
  • 53:35 - 53:37
    all these algorithms and then I'm going
  • 53:37 - 53:40
    to evaluate them okay I'm going to
  • 53:40 - 53:42
    evaluate how good each of these models
  • 53:42 - 53:46
    are and here you can see your value your
  • 53:46 - 53:49
    evaluation data right okay and this is
  • 53:49 - 53:51
    the confusion Matrix which is another
  • 53:51 - 53:54
    way of evaluating so now we come to the
  • 53:54 - 53:56
    kind of the the the key part here which
  • 53:56 - 53:59
    is which is how do I distinguish between
  • 53:59 - 54:00
    all these models right I've got all
  • 54:00 - 54:01
    these different models which are built
  • 54:01 - 54:03
    with different algorithms which I'm
  • 54:03 - 54:05
    using to train on the same data set how
  • 54:05 - 54:07
    do I distinguish between all these
  • 54:07 - 54:10
    models okay and so for that sense for
  • 54:10 - 54:14
    that we actually have a whole bunch of
  • 54:14 - 54:16
    common evaluation metrics for
  • 54:16 - 54:18
    classification right so this evaluation
  • 54:18 - 54:22
    metrics tell us how good a model is in
  • 54:22 - 54:24
    terms of its accuracy in
  • 54:24 - 54:27
    classification so in terms of
  • 54:27 - 54:29
    accuracy we actually have many different
  • 54:29 - 54:32
    models uh sorry many different measures
  • 54:32 - 54:33
    right you might think well accuracy is
  • 54:33 - 54:35
    just accuracy well that's all right it's
  • 54:35 - 54:37
    just either it's accurate or it's not
  • 54:37 - 54:39
    accurate right but actually it's not
  • 54:39 - 54:41
    that simple there are many different
  • 54:41 - 54:44
    ways to measure the accuracy of a
  • 54:44 - 54:45
    classification model and these are some
  • 54:45 - 54:48
    of the more common ones so for example
  • 54:48 - 54:51
    the confusion matrix tells us how many
  • 54:51 - 54:54
    true positives that means the value is
  • 54:54 - 54:56
    positive the prediction is positive how
  • 54:56 - 54:58
    many false positives which means the
  • 54:58 - 54:59
    value is negative the machine learning
  • 54:59 - 55:02
    model predicts positive how many false
  • 55:02 - 55:04
    negatives which means that the machine
  • 55:04 - 55:06
    learning model predicts negative but
  • 55:06 - 55:07
    it's actually positive and how many true
  • 55:07 - 55:09
    negatives there are which means that the
  • 55:09 - 55:11
    machine the machine learning model
  • 55:11 - 55:13
    predicts negative and the true value is
  • 55:13 - 55:15
    also negative so this is called a
  • 55:15 - 55:17
    confusion Matrix this is one way we
  • 55:17 - 55:19
    assess or evaluate the performance of a
  • 55:19 - 55:21
    classification
  • 55:21 - 55:23
    model okay this is for binary
  • 55:23 - 55:25
    classification we can also have
  • 55:25 - 55:27
    multiclass confusion Matrix
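
As an illustration, computing and plotting a binary confusion matrix with scikit-learn might look like this (continuing from the train/test split sketched earlier; the random forest here is just a placeholder model):

    import matplotlib.pyplot as plt
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

    model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
    y_pred = model.predict(X_test)

    cm = confusion_matrix(y_test, y_pred)
    tn, fp, fn, tp = cm.ravel()  # true negatives, false positives, false negatives, true positives
    print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)

    ConfusionMatrixDisplay(cm, display_labels=["No failure", "Failure"]).plot()
    plt.show()
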
  • 55:27 - 55:29
    and then we can also measure things like
  • 55:29 - 55:32
    accuracy so accuracy is the true
  • 55:32 - 55:34
    positives plus the true negatives which
  • 55:34 - 55:35
    is the total number of correct
  • 55:35 - 55:38
    predictions made by the model divided by
  • 55:38 - 55:40
    the total number of data points in your
  • 55:40 - 55:43
    data set and then you have also other
  • 55:43 - 55:44
    kinds of
  • 55:44 - 55:47
    measures uh such as recall and this is a
  • 55:47 - 55:49
    formula for recall this is a formula for
  • 55:49 - 55:51
    the F1 score okay and then there's
  • 55:51 - 55:56
    something called the uh ROC curve right so
  • 55:56 - 55:57
    without going too much in the detail of
  • 55:57 - 55:59
    what each of these entails essentially
  • 55:59 - 56:01
    these are all different ways these are
  • 56:01 - 56:03
    different kpi right just like if you
  • 56:03 - 56:06
    work in a company you have different kpi
  • 56:06 - 56:08
    right certain employees have certain kpi
  • 56:08 - 56:11
    that measures how good or how how uh you
  • 56:11 - 56:13
    know efficient or how effective a
  • 56:13 - 56:16
    particular employee is right so the
  • 56:16 - 56:20
    KPIs for your machine learning models
  • 56:20 - 56:24
    are the ROC curve F1 score recall accuracy
  • 56:24 - 56:27
    okay and your confusion Matrix so
  • 56:27 - 56:30
    fundamentally after I have built right
  • 56:30 - 56:33
    so here I've built my four different
  • 56:33 - 56:35
    models so after I built these four
  • 56:35 - 56:38
    different models I'm going to check and
  • 56:38 - 56:40
    evaluate them using all those different
  • 56:40 - 56:42
    metrics like for example the F1 score
  • 56:42 - 56:45
    the Precision score the recall score all
  • 56:45 - 56:47
    right so for this model I can check out
  • 56:47 - 56:50
    the ROC score the F1 score the Precision
  • 56:50 - 56:52
    score the recall score then for this
  • 56:52 - 56:55
    model this is the ROC score the F1 score
  • 56:55 - 56:57
    the Precision score the recall score
  • 56:57 - 57:00
    then for this model and so on so for
  • 57:00 - 57:03
    every single model I've created using my
  • 57:03 - 57:06
    training data set I will have all my set
  • 57:06 - 57:08
    of evaluation metrics that I can use to
  • 57:08 - 57:12
    evaluate how good this model is okay
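
A sketch of that fit-and-compare loop, using four classifiers of the kinds mentioned (the exact models and settings in the notebook may differ; BalancedBaggingClassifier comes from the imbalanced-learn library):

    from sklearn.ensemble import (RandomForestClassifier,
                                  GradientBoostingClassifier, VotingClassifier)
    from imblearn.ensemble import BalancedBaggingClassifier
    from sklearn.metrics import (f1_score, precision_score,
                                 recall_score, roc_auc_score)

    models = {
        "Random Forest": RandomForestClassifier(random_state=42),
        "Balanced Bagging": BalancedBaggingClassifier(random_state=42),
        "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    }
    # Soft-voting ensemble built on top of the three base models
    models["Voting Ensemble"] = VotingClassifier(
        estimators=list(models.items()), voting="soft")

    for name, clf in models.items():
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        y_prob = clf.predict_proba(X_test)[:, 1]
        print(f"{name}: F1={f1_score(y_test, y_pred):.3f}  "
              f"precision={precision_score(y_test, y_pred):.3f}  "
              f"recall={recall_score(y_test, y_pred):.3f}  "
              f"ROC AUC={roc_auc_score(y_test, y_prob):.3f}")
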
  • 57:12 - 57:13
    same thing here I've got a confusion
  • 57:13 - 57:15
    Matrix here right so I can use that
  • 57:15 - 57:18
    again to evaluate between all these four
  • 57:18 - 57:20
    different models and then I kind of
  • 57:20 - 57:22
    summarize it up here so we can see from
  • 57:22 - 57:25
    this summary here that actually the top
  • 57:25 - 57:28
    two models right which I'm going to
  • 57:28 - 57:29
    give a lot of attention to as a data scientist I'm now
  • 57:29 - 57:31
    going to just focus on these two models
  • 57:31 - 57:33
    so these two models are the bagging
  • 57:33 - 57:36
    classifier and random Forest classifier
  • 57:36 - 57:38
    they have the highest values of F1 score
  • 57:38 - 57:40
    and the highest values of the ROC curve
  • 57:40 - 57:43
    score okay so we can say these are the
  • 57:43 - 57:46
    top two models in terms of accuracy okay
  • 57:46 - 57:49
    using the F1 evaluation metric and the
  • 57:49 - 57:54
    ROC AUC evaluation metric okay so these
  • 57:54 - 57:57
    results uh kind of summarize here and
  • 57:57 - 57:59
    then we use different sampling
  • 57:59 - 58:01
    techniques okay so just now I talked
  • 58:01 - 58:04
    about um different kinds of sampling
  • 58:04 - 58:06
    techniques and so the idea of different
  • 58:06 - 58:08
    kinds of sampling techniques is to just
  • 58:08 - 58:11
    get a different feel for different
  • 58:11 - 58:14
    distributions of the data in different
  • 58:14 - 58:16
    areas of your data set so that you want
  • 58:16 - 58:20
    to just kind of make sure that your your
  • 58:20 - 58:23
    your evaluation of accuracy is actually
  • 58:23 - 58:27
    statistically correct right so we can um
  • 58:27 - 58:30
    do what is called oversampling and under
  • 58:30 - 58:31
    sampling which is very useful when
  • 58:31 - 58:32
    you're working with an imbalance data
  • 58:32 - 58:35
    set so this is example of doing that and
  • 58:35 - 58:37
    then here we again again check out the
  • 58:37 - 58:39
    results for all these different
  • 58:39 - 58:42
    techniques we use uh the F1 score the Au
  • 58:42 - 58:44
    score all right these are the two key
  • 58:44 - 58:47
    measures of accuracy right so and then
  • 58:47 - 58:48
    we can check out the scores for the
  • 58:48 - 58:50
    different approaches okay so we can see
  • 58:50 - 58:53
    oh well overall the models have lower
  • 58:53 - 58:56
    ROC AUC score but they have a much
  • 58:56 - 58:58
    higher F1 score the bagging classifier
  • 58:58 - 59:01
    had the highest ROC AUC score
  • 59:01 - 59:04
    but F1 score was too low okay then in
  • 59:04 - 59:07
    the data scientist opinion the random
  • 59:07 - 59:09
    forest with this particular technique of
  • 59:09 - 59:11
    sampling has an equilibrium between the
  • 59:11 - 59:14
    F1 and ROC AUC scores so the takeaway one
  • 59:14 - 59:17
    is the macro F1 score improves
  • 59:17 - 59:18
    dramatically using the sampling
  • 59:18 - 59:20
    techniques so these models might be better
  • 59:20 - 59:22
    compared to the balanced ones all right
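
The resampling itself is typically done with the imbalanced-learn library; a minimal sketch of the oversampling and undersampling approaches just mentioned (borderline SMOTE and Tomek links) might be:

    from collections import Counter
    from imblearn.over_sampling import BorderlineSMOTE
    from imblearn.under_sampling import TomekLinks

    print("Original class counts:", Counter(y_train))

    # Oversample the minority (failure) class with borderline SMOTE
    X_over, y_over = BorderlineSMOTE(random_state=42).fit_resample(X_train, y_train)
    print("After oversampling:", Counter(y_over))

    # Undersample by removing Tomek links (ambiguous majority-class points)
    X_under, y_under = TomekLinks().fit_resample(X_train, y_train)
    print("After undersampling:", Counter(y_under))
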
  • 59:22 - 59:26
    so based on all this uh evaluation the
  • 59:26 - 59:28
    data scientist says they're going to
  • 59:28 - 59:30
    continue to work with these two models
  • 59:30 - 59:31
    all right and the balanced bagging one
  • 59:31 - 59:33
    and then continue to make further
  • 59:33 - 59:35
    comparisons all right so then we
  • 59:35 - 59:37
    continue to keep refining on our
  • 59:37 - 59:39
    evaluation work here we're going to
  • 59:39 - 59:41
    train the models one more time again so
  • 59:41 - 59:43
    we again do a train test split and
  • 59:43 - 59:45
    then we do that for this particular uh
  • 59:45 - 59:47
    approach model and then we print out we
  • 59:47 - 59:48
    print out what is called a
  • 59:48 - 59:51
    classification report and this is
  • 59:51 - 59:53
    basically a summary of all those metrics
  • 59:53 - 59:55
    that I talk about just now so just now
  • 59:55 - 59:58
    remember I said there were
  • 59:58 - 60:00
    several evaluation metrics right so uh
  • 60:00 - 60:01
    we had the confusion matrix the
  • 60:01 - 60:04
    accuracy the Precision the recall the AUC
  • 60:04 - 60:08
    score so here with the um classification
  • 60:08 - 60:10
    report I can get a summary of all of
  • 60:10 - 60:12
    that so I can see all the values here
  • 60:12 - 60:15
    okay for this particular model bagging
  • 60:15 - 60:17
    Tomek links and then I can do that for
  • 60:17 - 60:19
    another model the random Forest
  • 60:19 - 60:21
    borderline SMOTE and then I can do that
  • 60:21 - 60:22
    for another model which is the balance
  • 60:22 - 60:25
    bagging so again we see a lot of
  • 60:25 - 60:27
    comparison between different models
  • 60:27 - 60:29
    trying to figure out what all these
  • 60:29 - 60:31
    evaluation metrics are telling us
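
Printing such a classification report is a one-liner in scikit-learn; a sketch, reusing the oversampled training data from the previous step:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report

    model = RandomForestClassifier(random_state=42).fit(X_over, y_over)
    print(classification_report(y_test, model.predict(X_test),
                                target_names=["No failure", "Failure"]))
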
  • 60:31 - 60:33
    all right then again we have a confusion
  • 60:33 - 60:36
    Matrix so we generate a confusion Matrix
  • 60:36 - 60:39
    for the bagging with the Tomek links
  • 60:39 - 60:41
    undersampling for the random forest
  • 60:41 - 60:43
    with the borderline SMOTE oversampling
  • 60:43 - 60:45
    and just balanced bagging by itself then
  • 60:45 - 60:48
    again we compare between these three uh
  • 60:48 - 60:51
    models uh using the confusion Matrix
  • 60:51 - 60:53
    evaluation metric and then we can kind
  • 60:53 - 60:56
    of come to some conclusions all right so
  • 60:56 - 60:58
    right so now we look at all the data
  • 60:58 - 61:01
    then we move on and look at another um
  • 61:01 - 61:03
    another kind of evaluation metric which
  • 61:03 - 61:07
    is the ROC score right so this is one of
  • 61:07 - 61:09
    the other evaluation metrics I talk
  • 61:09 - 61:11
    about so this one is a kind of a curve
  • 61:11 - 61:13
    you look at it to see the area
  • 61:13 - 61:14
    underneath the curve this is called
  • 61:14 - 61:18
    AUC ROC the
  • 61:18 - 61:20
    area under the curve all right so the
  • 61:20 - 61:22
    area under the curve uh
  • 61:22 - 61:24
    score will give us some idea about the
  • 61:24 - 61:26
    threshold that we're going to use for
  • 61:26 - 61:28
    classification so we can examine this
  • 61:28 - 61:29
    for the bagging classifier for the
  • 61:29 - 61:31
    random forest classifier for the balance
  • 61:31 - 61:34
    bagging classifier okay then we can also
  • 61:34 - 61:36
    again do that uh finally we can check
  • 61:36 - 61:38
    the classification report of this
  • 61:38 - 61:40
    particular model so we keep doing this
  • 61:40 - 61:43
    over and over again evaluating the
  • 61:43 - 61:46
    metrics the accuracy metrics the
  • 61:46 - 61:47
    evaluation metrics for all these
  • 61:47 - 61:49
    different models so we keep doing this
  • 61:49 - 61:51
    over and over again for different
  • 61:51 - 61:53
    thresholds for classification and so
  • 61:53 - 61:57
    as we keep drilling into these we kind
  • 61:57 - 62:01
    of get more and more understanding of
  • 62:01 - 62:03
    all these different models which one is
  • 62:03 - 62:05
    the best one that gives the best
  • 62:05 - 62:09
    performance for our data set okay
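
A sketch of plotting the ROC curve and then sweeping the decision threshold, which is essentially what those comparisons boil down to (reusing the fitted model and the test split from the earlier sketches):

    import matplotlib.pyplot as plt
    from sklearn.metrics import RocCurveDisplay, precision_score, recall_score

    # ROC curve; the area under it is the ROC AUC score
    RocCurveDisplay.from_estimator(model, X_test, y_test)
    plt.show()

    # Classify as "failure" when the predicted failure probability exceeds
    # a chosen decision threshold instead of the default 0.5
    for threshold in (0.4, 0.5, 0.6):
        y_pred = (model.predict_proba(X_test)[:, 1] >= threshold).astype(int)
        print(f"threshold={threshold}: "
              f"recall={recall_score(y_test, y_pred):.3f}  "
              f"precision={precision_score(y_test, y_pred):.3f}")
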
  • 62:09 - 62:11
    so finally we come to this conclusion this
  • 62:11 - 62:14
    particular model is not able to get
  • 62:14 - 62:15
    the recall on failures higher than
  • 62:15 - 62:18
    95.8% on the other hand balanced bagging
  • 62:18 - 62:19
    with a decision threshold of 0.6 is able
  • 62:19 - 62:22
    to have a better recall blah blah blah
  • 62:22 - 62:25
    Etc so finally after having done all of
  • 62:25 - 62:27
    these evaluations
  • 62:27 - 62:31
    okay this is the conclusion
  • 62:31 - 62:34
    so after having gone so right now we
  • 62:34 - 62:35
    have gone through all the steps of the
  • 62:35 - 62:38
    machine learning life cycle and which
  • 62:38 - 62:40
    means we have right now or the data
  • 62:40 - 62:42
    scientist right now has gone through all
  • 62:42 - 62:43
    these
  • 62:43 - 62:47
    steps uh which is now we have done this
  • 62:47 - 62:49
    validation so we have done the cleaning
  • 62:49 - 62:51
    exploration preparation transformation
  • 62:51 - 62:53
    the feature engineering we have developed
  • 62:53 - 62:54
    and trained multiple models we have
  • 62:54 - 62:56
    evaluated all these different models so
  • 62:56 - 62:59
    right now we have reached this stage so
  • 62:59 - 63:03
    at this stage we as the data scientist
  • 63:03 - 63:05
    kind of have completed our job so we've
  • 63:05 - 63:08
    come to some very useful conclusions
  • 63:08 - 63:10
    which we now can share with our
  • 63:10 - 63:13
    colleagues all right and based on this
  • 63:13 - 63:15
    uh conclusions or recommendations
  • 63:15 - 63:17
    somebody is going to choose an
  • 63:17 - 63:19
    appropriate model and that model is
  • 63:19 - 63:23
    going to get deployed for realtime use
  • 63:23 - 63:25
    in a real life production environment
  • 63:25 - 63:27
    okay and that decision is going to be
  • 63:27 - 63:29
    made based on the recommendations coming
  • 63:29 - 63:31
    from the data scientist at the end of
  • 63:31 - 63:33
    this phase okay so at the end of this
  • 63:33 - 63:35
    phase the data scientist is going to
  • 63:35 - 63:37
    come up with these conclusions so
  • 63:37 - 63:42
    conclusions is okay if the engineering
  • 63:42 - 63:45
    team they are looking okay the
  • 63:45 - 63:46
    engineering team right the engineering
  • 63:46 - 63:49
    team if they are looking for the highest
  • 63:49 - 63:52
    failure detection rate possible then
  • 63:52 - 63:54
    they should go with this particular
  • 63:54 - 63:57
    model okay
  • 63:57 - 63:59
    and if they want a balance between
  • 63:59 - 64:01
    precision and recall then they should
  • 64:01 - 64:03
    choose between the bagging model with a
  • 64:03 - 64:06
    0.4 decision threshold or the random
  • 64:06 - 64:10
    forest model with a 0.5 threshold but if
  • 64:10 - 64:12
    they don't care so much about predicting
  • 64:12 - 64:14
    every failure and they want the highest
  • 64:14 - 64:17
    Precision possible then they should opt
  • 64:17 - 64:20
    for the bagging Tomek links classifier
  • 64:20 - 64:23
    with a bit higher decision threshold and
  • 64:23 - 64:26
    so this is the key thing that the data
  • 64:26 - 64:28
    scientist is going to give right this is
  • 64:28 - 64:31
    the key takeaway this is the kind of the
  • 64:31 - 64:33
    end result of the entire machine
  • 64:33 - 64:35
    learning life cycle right now the data
  • 64:35 - 64:36
    scientist is going to tell the
  • 64:36 - 64:39
    engineering team all right you guys
  • 64:39 - 64:41
    which is more important for you point a
  • 64:41 - 64:45
    point B or Point C make your decision so
  • 64:45 - 64:47
    the engineering team will then discuss
  • 64:47 - 64:49
    among themselves and say hey you know
  • 64:49 - 64:52
    what what we want is we want to get the
  • 64:52 - 64:55
    highest failure detection possible
  • 64:55 - 64:58
    because any kind of failure of that
  • 64:58 - 65:00
    machine or the product on the assembly
  • 65:00 - 65:03
    line is really going to screw us up big
  • 65:03 - 65:06
    time so what we're looking for is the
  • 65:06 - 65:08
    model that will give us the highest
  • 65:08 - 65:11
    failure detection rate we don't care
  • 65:11 - 65:13
    about Precision but we want to make
  • 65:13 - 65:15
    sure that if there's a failure we are
  • 65:15 - 65:18
    going to catch it right so that's what
  • 65:18 - 65:20
    they want and so the data scientist will
  • 65:20 - 65:22
    say hey you go for the balanced bagging
  • 65:22 - 65:25
    model okay then the data scientist saves
  • 65:25 - 65:28
    this all right uh and then once you have
  • 65:28 - 65:30
    saved this uh you can then go right
  • 65:30 - 65:32
    ahead and deploy that so you can go
  • 65:32 - 65:34
    right ahead and deploy that to
  • 65:34 - 65:37
    production okay and so if you want to
  • 65:37 - 65:39
    continue we can actually further
  • 65:39 - 65:41
    continue this modeling problem so just
  • 65:41 - 65:43
    now I model this problem as a binary
  • 65:43 - 65:47
    classification problem uh sorry just I
  • 65:47 - 65:48
    modeled this problem as a binary
  • 65:48 - 65:50
    classification which means it's either
  • 65:50 - 65:52
    zero or one either fail or not fail but
  • 65:52 - 65:54
    we can also model it as a multiclass
  • 65:54 - 65:56
    classification problem right because as
  • 65:56 - 65:58
    as I said earlier just now for the
  • 65:58 - 66:00
    Target variable column which is sorry for
  • 66:00 - 66:03
    the failure type column you actually
  • 66:03 - 66:05
    have multiple kinds of failures right
  • 66:05 - 66:08
    for example you may have a power failure
  • 66:08 - 66:10
    uh you may have a tool wear failure uh you
  • 66:10 - 66:13
    may have a overstrain failure so now we
  • 66:13 - 66:15
    can model the problem slightly
  • 66:15 - 66:17
    differently so we can model it as a
  • 66:17 - 66:20
    multiclass classification problem and
  • 66:20 - 66:21
    then we go through the entire same
  • 66:21 - 66:23
    process that we went through just now so
  • 66:23 - 66:25
    we create different models we test this
  • 66:25 - 66:27
    out but now the confusion Matrix is for
  • 66:27 - 66:30
    a multiclass classification issue right
  • 66:30 - 66:31
    so we're going
  • 66:31 - 66:34
    to check them out we're going to again
  • 66:34 - 66:36
    uh try different algorithms or models
  • 66:36 - 66:38
    again train and test our data set do the
  • 66:38 - 66:40
    training test split uh on these
  • 66:40 - 66:42
    different models all right so we have
  • 66:42 - 66:43
    like for example we have a random
  • 66:43 - 66:46
    Forest a balanced random Forest a grid search
  • 66:46 - 66:48
    then you train the models using what is
  • 66:48 - 66:50
    called hyperparameter tuning then you
  • 66:50 - 66:51
    get the scores all right so you get the
  • 66:51 - 66:53
    same evaluation scores again you check
  • 66:53 - 66:55
    out the evaluation scores compare
  • 66:55 - 66:57
    between them generate a confusion Matrix
  • 66:57 - 67:00
    so this is a multiclass confusion Matrix
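
A sketch of the multiclass variant with a simple grid search for hyperparameter tuning, reusing the encoded feature table X from before (the parameter grid, scoring choice, and use of a random forest here are assumptions for illustration):

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.metrics import confusion_matrix, classification_report

    # Multiclass target: the failure type column instead of the 0/1 target
    y_multi = df["Failure Type"]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y_multi, test_size=0.2, stratify=y_multi, random_state=42)

    grid = GridSearchCV(
        RandomForestClassifier(random_state=42),
        param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
        scoring="f1_macro", cv=5)
    grid.fit(X_tr, y_tr)

    y_pred = grid.predict(X_te)
    print(grid.best_params_)
    print(confusion_matrix(y_te, y_pred))   # multiclass confusion matrix
    print(classification_report(y_te, y_pred))
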
  • 67:00 - 67:02
    and then you come to the final
  • 67:02 - 67:06
    conclusion so now if you are interested
  • 67:06 - 67:09
    to frame your problem domain as a
  • 67:09 - 67:11
    multiclass classification problem all
  • 67:11 - 67:14
    right then these are the recommendations
  • 67:14 - 67:15
    from the data scientist so the data
  • 67:15 - 67:17
    scientist will say you know what I'm
  • 67:17 - 67:20
    going to pick this particular model the
  • 67:20 - 67:22
    balanced bagging classifier and these are
  • 67:22 - 67:25
    all the reasons that the data scientist
  • 67:25 - 67:27
    is going to give as a rationale for
  • 67:27 - 67:29
    selecting this particular
  • 67:29 - 67:32
    model and then once that's done you save
  • 67:32 - 67:35
    the model and that's it
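
Saving the chosen model and loading it back for serving is typically done with joblib; a minimal sketch (the file name is an assumption, and grid.best_estimator_ here stands in for whichever model was finally selected):

    import joblib

    # Persist the selected model so it can be deployed on a server
    joblib.dump(grid.best_estimator_, "predictive_maintenance_model.joblib")

    # Later, in production: load it back and score fresh sensor readings
    # that arrive in the same feature format as the training data
    loaded_model = joblib.load("predictive_maintenance_model.joblib")
    new_readings = X_te[:5]  # stand-in for live sensor data
    print(loaded_model.predict(new_readings))
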
  • 67:35 - 67:39
    so that's all done now and so then the
  • 67:39 - 67:41
    uh the model the machine learning model
  • 67:41 - 67:44
    now you can put it live run it on the
  • 67:44 - 67:45
    server and now the machine learning
  • 67:45 - 67:47
    model is ready to work which means it's
  • 67:47 - 67:49
    ready to generate predictions right
  • 67:49 - 67:50
    that's the main job of the machine
  • 67:50 - 67:52
    learning model you have picked the best
  • 67:52 - 67:54
    machine learning model with the best
  • 67:54 - 67:56
    evaluation metrics for whatever accuracy
  • 67:56 - 67:58
    goal you're trying to achieve and
  • 67:58 - 68:00
    now you're going to run it on a server
  • 68:00 - 68:01
    and now you're going to get all this
  • 68:01 - 68:03
    real time data that's coming from your
  • 68:03 - 68:05
    sensors you're going to pump that into
  • 68:05 - 68:06
    your machine learning model your machine
  • 68:06 - 68:08
    learning model will pump out a whole
  • 68:08 - 68:10
    bunch of predictions and we're going to
  • 68:10 - 68:13
    use those predictions in real time to
  • 68:13 - 68:15
    make real time real world decision
  • 68:15 - 68:18
    making right you're going to say okay
  • 68:18 - 68:20
    I'm predicting that that machine is
  • 68:20 - 68:23
    going to fail on Thursday at 5:00 p.m.
  • 68:23 - 68:26
    so you better get your service folks in
  • 68:26 - 68:29
    to service it on Thursday 2:00 p.m. or you
  • 68:29 - 68:32
    know whatever so you can you know uh
  • 68:32 - 68:33
    make decisions on when you want to do
  • 68:33 - 68:35
    your maintenance you know and and make
  • 68:35 - 68:38
    the best decisions to optimize the cost
  • 68:38 - 68:41
    of Maintenance etc etc and then based on
  • 68:41 - 68:42
    the
  • 68:42 - 68:45
    results that are coming up from the
  • 68:45 - 68:47
    predictions so the predictions may be
  • 68:47 - 68:49
    good the predictions may be lousy the
  • 68:49 - 68:51
    predictions may be average right so we
  • 68:51 - 68:54
    are we're constantly monitoring how good
  • 68:54 - 68:55
    or how useful are the predictions
  • 68:55 - 68:58
    generated by this realtime model that's
  • 68:58 - 69:00
    running on the server and based on our
  • 69:00 - 69:03
    monitoring we will then take some new
  • 69:03 - 69:05
    data and then repeat this entire life
  • 69:05 - 69:07
    cycle again so this is basically a
  • 69:07 - 69:09
    workflow that's iterative and we are
  • 69:09 - 69:11
    constantly or the data scientist is
  • 69:11 - 69:13
    constantly getting in all these new data
  • 69:13 - 69:15
    points and then refining the model
  • 69:15 - 69:18
    picking maybe a new model deploying the
  • 69:18 - 69:22
    new model onto the server and so on all
  • 69:22 - 69:24
    right and so that's it so that is
  • 69:24 - 69:26
    basically your machine learning workflow
  • 69:26 - 69:29
    in a nutshell okay so for this
  • 69:29 - 69:32
    particular approach we have used a bunch
  • 69:32 - 69:35
    of uh data science libraries from python
  • 69:35 - 69:37
    so we have used pandas which is the most
  • 69:37 - 69:39
    basic data science library that
  • 69:39 - 69:40
    provides all the tools to work with raw
  • 69:40 - 69:43
    data we have used NumPy which is a high
  • 69:43 - 69:44
    performance library for implementing
  • 69:44 - 69:46
    complex array and matrix operations we have
  • 69:46 - 69:50
    used matplotlib and seaborn which are used
  • 69:50 - 69:52
    for doing the EDA the
  • 69:52 - 69:56
    exploratory data analysis phase machine
  • 69:56 - 69:57
    learning where you visualize all your
  • 69:57 - 69:59
    data we have used scikit-learn which is
  • 69:59 - 70:01
    the machine learning library to do all
  • 70:01 - 70:03
    your implementation for all your core
  • 70:03 - 70:06
    machine learning algorithms uh we
  • 70:06 - 70:08
    have not used this because this is not a
  • 70:08 - 70:11
    deep learning uh problem but if you are
  • 70:11 - 70:13
    working with a deep learning problem
  • 70:13 - 70:15
    like image classification image
  • 70:15 - 70:18
    recognition object detection okay
  • 70:18 - 70:20
    natural language processing text
  • 70:20 - 70:22
    classification well then you're going to
  • 70:22 - 70:24
    use these libraries from python which is
  • 70:24 - 70:29
    TensorFlow okay and also
  • 70:29 - 70:33
    PyTorch and then lastly that whole thing that
  • 70:33 - 70:35
    whole data science project that you saw
  • 70:35 - 70:37
    just now this entire data science
  • 70:37 - 70:39
    project is actually developed in
  • 70:39 - 70:41
    something called a Jupyter notebook so
  • 70:41 - 70:44
    all this python code along with all the
  • 70:44 - 70:46
    observations from the data
  • 70:46 - 70:49
    scientists okay for this entire data
  • 70:49 - 70:50
    science project was actually run in
  • 70:50 - 70:53
    something called a Jupyter notebook so
  • 70:53 - 70:56
    that is uh the
  • 70:56 - 70:59
    most widely used tool for interactively
  • 70:59 - 71:02
    developing and presenting data science
  • 71:02 - 71:05
    projects okay so that brings me to the
  • 71:05 - 71:07
    end of this entire presentation I hope
  • 71:07 - 71:10
    that you find it useful for you and that
  • 71:10 - 71:13
    you can appreciate the importance of
  • 71:13 - 71:15
    machine learning and how it can be
  • 71:15 - 71:20
    applied in a real life use case in a
  • 71:20 - 71:23
    typical production environment all right
  • 71:23 - 71:27
    thank you all so much for watching