Machine Learning for Predictive Maintenance: End-to-End Workflow in a Jupyter Notebook
0:01 Hello everyone, my name is Victor. I'm your friendly neighborhood data scientist from DreamCatcher. In this presentation, I would like to talk about a specific industry use case of AI and machine learning: predictive maintenance. I will be covering these topics, and feel free to jump forward to the specific part of the video where I talk about each of them. I'm going to start off with a general overview of AI and machine learning. Then I'll discuss the use case, which is predictive maintenance. I'll talk about the basics of machine learning and the machine learning workflow, and then we will come to the meat of this presentation, which is essentially a demonstration of the machine learning workflow from end to end on a real-life predictive maintenance domain problem. All right, so without any further ado, let's jump into it.
0:57 Let's start off with a quick overview of AI and machine learning. AI is a very general term: it encompasses the entire area of science and engineering related to creating software programs and machines capable of performing tasks that would normally require human intelligence. But AI is a catch-all term, so when we talk about applied AI, how we use AI in our daily work, we are really talking about machine learning. Machine learning is the design and application of software algorithms that are capable of learning on their own, without explicit human intervention. The primary purpose of these algorithms is to optimize performance in a specific task, and the main task you want to optimize is the ability to make accurate predictions about future outcomes based on the analysis of historical data. So essentially machine learning is about making predictions about the future, or what we call predictive analytics.
2:09 There are many different kinds of algorithms available in machine learning, under the three primary categories of supervised learning, unsupervised learning, and reinforcement learning. Here we can see some of the different kinds of algorithms and their use cases in various areas of industry. We have various domain use cases for all these different kinds of algorithms, and we can see that different algorithms are suited to different use cases.
2:38 Deep learning is an advanced form of machine learning that's based on something called an artificial neural network, or ANN for short, which essentially simulates the structure of the human brain, whereby neurons interconnect and work together to process and learn new information. DL is the foundational technology for most of the popular AI tools you have probably heard of today. I'm sure you have heard of ChatGPT, if you haven't been living in a cave for the past two years. ChatGPT is an example of what we call a large language model, and that's based on this technology called deep learning. Also, all the modern computer vision applications, where a computer program can classify, detect, or recognize images on its own, use this particular form of machine learning called deep learning.

3:32 So this is an example of an artificial neural network. Here I have an image of a bird that's fed into the network, and the output from the network is a classification of the image into one of three potential categories. In this case, if the ANN has been trained properly, when we feed in this image, it should correctly classify it as a bird. This is an image classification problem, which is a classic use case for an artificial neural network in the field of computer vision. And just as in machine learning generally, there is a variety of algorithms available for deep learning under the categories of supervised learning and unsupervised learning.
4:17 All right, so this is how we can categorize all of this. You can think of AI as the general area of smart systems and machines. Machine learning is basically applied AI, and deep learning is a subspecialization of machine learning using a particular architecture called an artificial neural network. As for generative AI, so ChatGPT, Google Gemini, Microsoft Copilot: all these examples of generative AI are basically large language models, and they are a further subcategory within the area of deep learning.
4:55 And there are many applications of machine learning in industry right now, so pick whichever industry you're involved in, and these are the specific areas of application. I'm going to guess that the vast majority of you watching this video are probably coming from the manufacturing industry, and in manufacturing some of the standard use cases for machine learning and deep learning are predicting potential problems, which we sometimes call predictive maintenance: you want to predict when a problem is going to happen and address it before it happens. Then there are monitoring systems, automating your manufacturing assembly or production line, smart scheduling, and detecting anomalies on your production line.
5:42 Okay, so let's talk about the use case here, which is predictive maintenance. What is predictive maintenance? Here's the long definition: it is an equipment maintenance strategy that relies on real-time monitoring of equipment conditions and data to predict equipment failures in advance. It uses advanced data models, analytics, and machine learning so that we can reliably assess when failures are more likely to occur, including which components are more likely to be affected on your production or assembly line.
6:14 So where does predictive maintenance fit into the overall scheme of things? Let's talk about the standard way that factories, or the production and assembly lines in factories, tended to handle maintenance issues, say, 10 or 20 years ago. You would probably start off with the most basic mode, which is reactive maintenance: you just wait until your machine breaks down and then you repair it. The simplest, but of course, if you have worked on a production line for any period of time, you know that reactive maintenance can give you a whole bunch of headaches, especially if the machine breaks down just before a critical delivery deadline. Then you're going to have a backlog of orders and run into a lot of problems. So we move on to preventive maintenance, where you regularly schedule maintenance of your production machines to reduce the failure rate. You might do maintenance once every month, once every two weeks, whatever. This is great, but the problem then is that sometimes you're doing too much maintenance that isn't really necessary, and it still doesn't totally prevent a machine failure that occurs outside of your planned maintenance. So, a bit of an improvement, but not that much better.

7:31 And then these last two categories are where we bring in AI and machine learning. With machine learning, we're going to use sensors to do real-time monitoring of the data, and then, using that data, we're going to build a machine learning model which helps us predict, with a reasonable level of accuracy, when the next failure is going to happen on your assembly or production line, for a specific component or a specific machine. You want to predict, to a high level of accuracy, maybe down to the specific day, even the specific hour or minute, when you expect that particular product or machine to fail.
8:11 All right, so these are the advantages of predictive maintenance: it minimizes the occurrence of unscheduled downtime, gives you a real-time overview of the current condition of your assets, ensures minimal disruption to productivity, optimizes the time you spend on maintenance work, optimizes the use of spare parts, and so on. And of course there are some disadvantages, the primary one being that you need a specialized set of skills among your engineers to understand and create machine learning models that can work on the real-time data that you're getting.
8:44 Okay, so we're going to take a look at some real-life use cases. These are a bunch of links; if you navigate to them, you'll be able to look at some real-life use cases of machine learning in predictive maintenance. The IBM website gives you a look at five use cases, and you can click on these links and follow up with them if you want to read more: waste management, manufacturing, building services, renewable energy, and mining. So if you want to know more about these use cases, you can read up on them from this website.

9:24 And this next website is a pretty good one; I would really encourage you to look through it if you're interested in predictive maintenance. It tells you about an industry survey of predictive maintenance. We can see that a large portion of the manufacturing industry agreed that predictive maintenance is a real need to stay competitive, and that predictive maintenance is essential for the manufacturing industry and will gain additional strength in the future. This survey was done quite some time ago, and these were the results that came back: the vast majority of key industry players in the manufacturing sector consider predictive maintenance to be a very important activity that they want to incorporate into their workflow. And we can see here the kind of ROI expected on investment in predictive maintenance: a 45% reduction in downtime, 25% growth in productivity, 75% fault elimination, and a 30% reduction in maintenance cost.

10:23 And best of all, if you really want to look at examples, there are all these different companies that have significantly invested in predictive maintenance technology in their manufacturing processes: PepsiCo, Frito-Lay, General Motors, Mondi, Ecoplant. You can jump over here and take a look at some of these use cases. Let me try to open one up, for example, Mondi. You can see Mondi has used this particular piece of software called MATLAB, from MathWorks, to do predictive maintenance for their manufacturing processes using machine learning. You can study how they have used it: what their challenge was, the problems they were facing, the solution they built with MathWorks Consulting, and the data they collected in an Oracle database. Using MATLAB from MathWorks, they were able to create a deep learning model to solve this particular issue for their domain. So if you're interested, I strongly encourage you to read up on all these real-life customer stories that showcase use cases for predictive maintenance. Okay, so that's it for real-life use cases for predictive maintenance.
11:54 Now in this topic, I'm going to talk about machine learning basics: what is actually involved in machine learning. I'm going to give a very quick, conceptual, high-level overview. There are several categories of machine learning: supervised, unsupervised, semi-supervised, reinforcement, and deep learning. Let's talk about the most common and widely used category of machine learning, which is called supervised learning.
12:31 - 12:33So how does supervised learning work?
-
12:33 - 12:35Well in supervised learning, you're going
-
12:35 - 12:37to create a machine learning model by
-
12:37 - 12:39providing what is called a labelled data
-
12:39 - 12:42set as a input to a machine learning
-
12:42 - 12:45program or algorithm. And this dataset
-
12:45 - 12:46is going to contain what is called an
-
12:46 - 12:49independent or feature variables, all
-
12:49 - 12:51right, so this will be a set of variables.
-
12:51 - 12:53And there will be one dependent or
-
12:53 - 12:55target variable which we also call the
-
12:55 - 12:58label, and the idea is that the
-
12:58 - 13:00independent or the feature variables are
-
13:00 - 13:02the attributes or properties of your
-
13:02 - 13:04dataset that influence the dependent or
-
13:04 - 13:08the target variable, okay? So this process
-
13:08 - 13:09that I've just described is called
-
13:09 - 13:12training the machine learning model, and
-
13:12 - 13:14the model is fundamentally a
-
13:14 - 13:16mathematical function that best
-
13:16 - 13:18approximates the relationship between
-
13:18 - 13:21the independent variables and the
-
13:21 - 13:23dependent variable. All right, so that's
-
13:23 - 13:24quite a bit of a mouthful, so let's jump
-
13:24 - 13:26into a diagram that maybe illustrates
-
13:26 - 13:28this more clearly. So let's say you have
-
13:28 - 13:30a dataset here, an Excel spreadsheet,
-
13:30 - 13:32right? And this Excel spreadsheet has a
-
13:32 - 13:34bunch of columns here and a bunch of
-
13:34 - 13:37rows, okay? So these rows here represent
-
13:37 - 13:39observations, or these rows are what
-
13:39 - 13:41we call observations or samples or data
-
13:41 - 13:43points in our dataset, okay? So let's
-
13:43 - 13:47assume this dataset is gathered by a
-
13:47 - 13:50marketing manager at a mall, at a retail
-
13:50 - 13:52mall, all right? So they've got all this
-
13:52 - 13:55information about the customers who
-
13:55 - 13:57purchase products at this mall, all right?
-
13:57 - 13:59So some of the information they've
-
13:59 - 14:00gotten about the customers are their
-
14:00 - 14:02gender, their age, their income, and the
-
14:02 - 14:04number of children. So all this
-
14:04 - 14:06information about the customers, we call
-
14:06 - 14:07this the independent or the feature
-
14:07 - 14:10variables, all right? And based on all
-
14:10 - 14:13this information about the customer, we
-
14:13 - 14:16also managed to get some or we record
-
14:16 - 14:18the information about how much the
-
14:18 - 14:20customer spends, all right? So this
-
14:20 - 14:22information or these numbers here, we call
-
14:22 - 14:24this the target variable or the
-
14:24 - 14:27dependent variable, right? So on the
-
14:27 - 14:30single row, the data point, one single sample, one
-
14:30 - 14:33single data point, contains all the data
-
14:33 - 14:35for the feature variables and one single
-
14:35 - 14:38value for the label or the target
-
14:41 And the primary purpose of the machine learning model is to create a mapping from all your feature variables to your target variable. There's going to be a mathematical function that maps the values of your feature variables to the value of your target variable; in other words, this function represents the relationship between your feature variables and your target variable. This whole training process, we call fitting the model. The target variable or label, the values in this column here, is critical for providing the context to do the fitting or training of the model. And once you've got a trained, fitted model, you can then use it to make an accurate prediction of the target values corresponding to new feature values that the model has yet to encounter. This, as I've already said earlier, is called predictive analytics.
15:38 So let's see what's actually happening here. You take your training data, this whole dataset here consisting of a thousand rows of data, 10,000 rows of data; you feed this entire dataset into your machine learning algorithm, and a couple of hours later the algorithm comes up with a model. The model is essentially a function that maps all your feature variables, these four columns here, to your target variable, this one single column here.

16:14 Once you have the model, you can put in a new data point. The new data point represents data about a new customer, one you have never seen before. Let's say you've already got information about 10,000 customers who have visited this mall and how much each of them spent there. Now a totally new customer comes into the mall; this customer has never been here before, and what we know about him is that he is male, age 50, with an income of 18, and nine children. When you take this data and feed it into your model, your model is going to make a prediction. It's going to say: hey, based on everything I have been trained on and the model I've developed, I predict that a male customer of age 50, with an income of 18 and nine children, is going to spend 25 ringgit at the mall. And this is it, this is what you want. Right here, can you see? That is the final output of your machine learning model: it makes a prediction about something it has never seen before. That is the core of machine learning: predictive analytics, making predictions about the future based on a historical dataset.
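In code, the fit-and-predict cycle just described might look something like this minimal sketch. The data and column names here are invented for illustration, not the presenter's actual dataset:

```python
# A minimal sketch of training a regression model and predicting for a new,
# unseen customer. All data and column names are made up for illustration.
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "gender":   [0, 1, 0, 1],        # 0 = female, 1 = male (already encoded)
    "age":      [25, 50, 33, 41],
    "income":   [30, 18, 55, 42],
    "children": [0, 9, 2, 1],
    "spend":    [40, 25, 80, 60],    # target variable (the label)
})

X = df[["gender", "age", "income", "children"]]  # feature variables
y = df["spend"]                                  # target variable

model = LinearRegression().fit(X, y)             # "fitting" / training the model

# Predict the spend of a new customer: male, age 50, income 18, nine children.
new_customer = pd.DataFrame([[1, 50, 18, 9]], columns=X.columns)
print(model.predict(new_customer))
```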
17:44 Okay, so there are two areas of supervised learning: regression and classification. Regression is used to predict a numerical target variable, such as the price of a house or the salary of an employee, whereas classification is used to predict a categorical target variable, or class label. Classification can be either binary or multiclass. Binary is just true or false, zero or one: is your machine going to fail or is it not going to fail? Just two classes, two possible outcomes. Or: is the customer going to make a purchase or not? We call this binary classification. And then multiclass is when there are more than two classes or types of values.

18:33 For example, this here would be a classification problem. You have a dataset with information about your customers: the gender of the customer, the age, the salary, and you also have a record of whether the customer made a purchase or not. You can take this dataset to train a classification model, and the model can then make a prediction about a new customer: it will predict zero, which means the customer doesn't make a purchase, or one, which means the customer makes a purchase.

19:06 And this is regression. Let's say you want to predict the wind speed, and you've got historical data on four other independent or feature variables: you have recorded the temperature, the pressure, the relative humidity, and the wind direction for the past 10 or 15 days. Now you train your machine learning model using this dataset, and the target variable column, the label, is basically a number. So this is a regression model, and you can now put in a new data point, meaning a new set of values for temperature, pressure, relative humidity, and wind direction, and your machine learning model will predict the wind speed for that new data point. So that's a regression model.
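As a companion to the regression sketch above, here is what the binary classification case might look like, again with invented data and column names; the model predicts the class zero (no purchase) or one (purchase):

```python
# A minimal sketch of binary classification: predict whether a customer makes
# a purchase (1) or not (0). Data and column names are made up for illustration.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    "gender":    [0, 1, 1, 0, 1, 0],
    "age":       [22, 45, 31, 52, 38, 27],
    "salary":    [28, 90, 45, 75, 60, 33],
    "purchased": [0, 1, 0, 1, 1, 0],   # categorical target: two classes
})

X, y = df[["gender", "age", "salary"]], df["purchased"]
clf = LogisticRegression().fit(X, y)

# Predict the class (0 = no purchase, 1 = purchase) for a new customer.
print(clf.predict(pd.DataFrame([[1, 40, 70]], columns=X.columns)))
```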
19:59 All right. In this particular topic, I'm going to talk about the workflow that's involved in machine learning. In the previous slides, I talked about developing the model, but that's just one part of the entire workflow. In real life, when you use machine learning, there's an end-to-end workflow involved. The first thing, of course, is that you need to get your data, then you need to clean your data, and then you need to explore it; you need to see what's going on in your dataset. Real-life datasets are not trivial: they are hundreds of rows, thousands of rows, sometimes millions or billions of rows, so we're talking about millions or billions of data points, especially if you're using IoT sensors to get data in real time. So you've got all these super large datasets; you need to clean them and explore them, and then you need to prepare them into the right format so that you can put them into the training process to create your machine learning model.

21:05 Then, subsequently, you check how good the model is: how accurate is the model in terms of its ability to generate predictions for the future? How accurate are the predictions coming out of your machine learning model? That's validating, or evaluating, your model. Then, if you determine that your model is of adequate accuracy to meet your domain use case requirements, you deploy it. Say the accuracy required for your domain use case is 85%: if my machine learning model can give an 85% accuracy rate, I think it's good enough, so I'm going to deploy it into a real-world use case.

21:43 So here the machine learning model gets deployed on a server, data from other sources is captured from somewhere and pumped into the machine learning model, the model generates predictions, and those predictions are then used to make decisions on the factory floor in real time, or in any other particular scenario. And then you constantly monitor and update the model, you get more new data, and the entire cycle repeats itself. So that's your machine learning workflow in a nutshell.
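Compressed into code, that cycle might look roughly like the skeleton below. This is a sketch only: the file name, the "failure" column, and the 85% threshold are assumptions standing in for your own data and requirements:

```python
# A rough end-to-end skeleton of the workflow: get, clean, explore, prepare,
# train, evaluate, and (if good enough) deploy. All names are placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("sensor_data.csv")             # 1. get the data
df = df.drop_duplicates().dropna()              # 2. clean the data
print(df.describe())                            # 3. explore the data

X = df.drop(columns=["failure"])                # 4. prepare the features...
y = df["failure"]                               #    ...and the target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = RandomForestClassifier().fit(X_train, y_train)   # 5. train the model
acc = accuracy_score(y_test, model.predict(X_test))      # 6. evaluate the model

if acc >= 0.85:                                 # 7. meets the domain requirement?
    print("Deploy, monitor, gather new data, and repeat the cycle.")
```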
22:17 Here's another example of the same thing, maybe in a slightly different format. Again, you have your data collection and preparation. Here we talk more about the different kinds of algorithms that are available to create a model, and I'll cover this in more detail when we look at the real-world example of an end-to-end machine learning workflow for the predictive maintenance use case. You're probably going to develop multiple models from multiple algorithms and evaluate them all, and then you say: hey, after I've evaluated and tested them, I've chosen the best model, and I'm going to deploy it for real-life production use. Real-life sensor data gets pumped into my model, my model generates predictions, the predicted data is used immediately in real time for real-life decision making, and then I monitor the results. Somebody is using the predictions from my model; if the predictions are lousy, the monitoring system captures that, and if the predictions are fantastic, that is also captured by the monitoring system, and it all gets fed back into the next cycle of my machine learning pipeline.
23:36 Okay, so that's the overall view, and here are the key phases of your workflow. One of the important phases is called EDA, exploratory data analysis, and in this particular phase you're going to do a lot of stuff, primarily just to understand your dataset. Like I said, real-life datasets tend to be very complex, and they tend to have various statistical properties; statistics is a very important component of machine learning. So an EDA helps you get an overview of your dataset and of any problems in it, like missing data, as well as the statistical properties of your dataset, its distribution, the statistical correlation of variables in your dataset, and so on.
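In pandas, a typical first EDA pass boils down to a handful of calls like these (a sketch; "data.csv" is a placeholder for your own file):

```python
# Common first-look EDA calls: shape, dtypes, summary statistics, missing
# values, and correlations between numeric variables.
import pandas as pd

df = pd.read_csv("data.csv")

df.info()                            # rows, column names, dtypes, non-null counts
print(df.describe())                 # basic statistical properties per column
print(df.isnull().sum())             # count of missing values in each column
print(df.corr(numeric_only=True))    # pairwise correlation of numeric variables
```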
24:23 Okay, then we have data cleaning, or sometimes you call it data cleansing. In this phase, what you primarily want to do is things like removing duplicate records or rows in your table, making sure that your data points or samples have appropriate IDs, and most importantly, making sure there are not too many missing values in your dataset. What I mean by missing values is things like this: you've got a dataset, and for some reason there are some cells or locations in your dataset where values are missing. If you have a lot of these missing values, you've got a poor quality dataset, and you're not going to be able to train a good machine learning model from it. So you have to figure out whether there are a lot of missing values in your dataset and how to handle them. Another thing that's important in data cleansing is figuring out the outliers in your dataset. Outliers are data points that are very far from the general trend of data points in your dataset. There are several ways to detect outliers in your dataset, and several ways to handle them; similarly, there are several ways to handle missing values. Handling missing values and handling outliers are really the two key concerns of data cleansing, and there are many, many techniques to handle them, so a data scientist needs to be acquainted with all of this.
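To make those two concerns concrete, here is a sketch of common ways to handle them in pandas; "data.csv" and the "torque" column are placeholders, and these are options, not the only approaches:

```python
# Handling missing values and outliers: two common options for each.
import pandas as pd

df = pd.read_csv("data.csv")

# Missing values: either drop the affected rows...
df_dropped = df.dropna()
# ...or fill them in, e.g. with the mean of each numeric column.
df_filled = df.fillna(df.mean(numeric_only=True))

# Outliers: one standard detector is the IQR (interquartile range) rule.
q1, q3 = df["torque"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = df["torque"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df_no_outliers = df[in_range]        # keep only the rows inside the fences
```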
25:55 All right, why do I need to do data cleansing? Well, here is the key point: if you have a very poor quality dataset, which means you've got a lot of outliers that are errors, or a lot of missing values, then even though you've got a fantastic algorithm and a fantastic model, the predictions your model gives are going to be absolute rubbish. It's kind of like putting water into the tank of a Mercedes-Benz. The Mercedes-Benz is a great car, but if you put water into it, it will just die; your car can't run on water. On the other hand, a Myvi is just a lousy little car, but if you take good, high-octane petrol and put it into the Myvi, the Myvi will go at 100 miles an hour; it will completely destroy the Mercedes-Benz in terms of performance. So it doesn't really matter what model you're using: you can be using the most fantastic model, the Mercedes-Benz of machine learning, but if your data is lousy quality, your predictions are also going to be rubbish. Cleansing the dataset is, in fact, probably the most important thing that data scientists need to do, and it's what they spend most of their time doing. Building the model, training the model, getting the right algorithms, and so on is really a small portion of the actual machine learning workflow; the vast majority of the time goes into cleaning and organizing your data.
27:33 Then you have something called feature engineering, where you preprocess the feature variables of your original dataset prior to using them to train the model, either through addition, deletion, combination, or transformation of these variables. The idea is to improve the predictive accuracy of the model, and also, because some models can only work with numeric data, you need to transform categorical data into numeric data. In the earlier slides, I showed you that you take your original dataset, pump it into an algorithm, and a couple of hours later you get a machine learning model; you didn't do anything to the feature variables in your dataset before pumping it into the machine learning algorithm. But that's not what generally happens in real life. In real life, you take all the original feature variables from your dataset and transform them in some way. You can see here these are the columns of data from my original dataset, and before I actually put all these data points into my algorithm to train and get my model, I will transform them. This transformation of the feature variable values is what we call feature engineering, and there are many, many techniques for it: one-hot encoding, scaling, log transformation, discretization, date extraction, boolean logic, and so on.
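Here is a sketch of the first two techniques on that list, using invented columns loosely modelled on the dataset we'll see later:

```python
# One-hot encoding a categorical column and scaling a numeric one.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "type": ["L", "M", "H", "M"],                 # categorical feature
    "rotational_speed": [1400, 1550, 1310, 1620], # numeric feature
})

# One-hot encoding: turn "type" into separate 0/1 indicator columns.
df = pd.get_dummies(df, columns=["type"])

# Scaling: standardize the numeric column to zero mean and unit variance.
df[["rotational_speed"]] = StandardScaler().fit_transform(df[["rotational_speed"]])
print(df)
```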
29:12 Okay, then finally we do something called a train-test split, where we take our original dataset and break it into two parts: one is called the training dataset and the other is called the test dataset. The primary purpose of this is that when we feed and train the machine learning model, we use the training dataset, and when we want to evaluate the accuracy of the model, we use the test dataset. This is a key part of your machine learning life cycle, because you are not only going to have one possible model; there is a vast range of algorithms that you can use to create a model. Fundamentally you have a wide range of choices, like a wide range of cars: if you want to buy a car, you can buy a Myvi, a Perodua, a Honda, a Mercedes-Benz, an Audi, a beamer; many, many different cars are available to you. Same thing with a machine learning model: there is a vast variety of algorithms you can choose from to create a model, and once you create a model from a given algorithm, you need to ask: how accurate is this model that I've created from this algorithm? Different algorithms are going to create different models with different rates of accuracy. So the primary purpose of the test dataset is to evaluate the accuracy of the model, to see: is this model that I've created using this algorithm adequate for me to use in a real-life production use case?

30:52 So this is my original dataset. I break it into my feature variable columns and my target variable column, and then I further break it into a training dataset and a test dataset. The training dataset is used to train, that is, to create, the machine learning model. Once the machine learning model is created, I then use the test dataset to evaluate its accuracy.
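In scikit-learn, the split-train-evaluate sequence looks something like this sketch; the file name, the "target" column, and the choice of classifier are placeholders:

```python
# Split the dataset, train on the training part, and score on the test part.
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("data.csv")
X, y = df.drop(columns=["target"]), df["target"]

# Hold out 20% of the rows as the test dataset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier().fit(X_train, y_train)  # train on training set
print(accuracy_score(y_test, model.predict(X_test)))    # evaluate on test set
```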
31:17 All right. And then finally, we can see the different parts or aspects that go into a successful model: EDA is about 10%, data cleansing about 20%, feature engineering about 25%, selecting a specific algorithm about 10%, training the model from that algorithm about 15%, and finally evaluating the models and deciding which is the best one with the highest accuracy rate, about 20%.
31:54 All right, so we have reached the most interesting part of this presentation, which is the demonstration of an end-to-end machine learning workflow on a real-life dataset that demonstrates the use case of predictive maintenance. For the dataset for this particular use case, I've used a dataset from Kaggle. For those of you who are not aware of it, Kaggle is the world's largest open-source community for data science and AI. They have a large collection of datasets from various areas of industry and human endeavor, and they also have a large collection of models that have been developed using these datasets. So here we have a dataset for our particular use case, predictive maintenance, and this is some information about the dataset. In case you do not know how to get there, this is the URL to click on to get to that dataset. Once you're at the page for this dataset, you can see all the information about it, and you can download the dataset in CSV format.
33:14 Okay, so let's take a look at the dataset. This dataset has a total of 10,000 samples, and these are the feature variables: the type, the product ID, the air temperature, the process temperature, the rotational speed, the torque, and the tool wear. And this is the target variable. The target variable is what we are interested in: it's what we use to train the machine learning model, and also what we want to predict. The feature variables describe or provide information about a particular machine on the production or assembly line. Let's say you've got an IoT sensor system that's capturing all this data about a product or a machine on your production or assembly line, and you've also captured, for each specific sample, whether that sample experienced a failure or not. A target value of zero indicates that there's no failure, and we can see that the vast majority of data points in this dataset are no-failure cases.

34:33 And here we can see an example of a failure: a failure is marked as a one, positive, and no failure is marked as zero, negative. Here we have one type of failure, called a power failure, and if you scroll down the dataset, you see there are also other kinds of failures, like a tool wear failure, an overstrain failure here, another power failure, and so on. So if you scroll down through these 10,000 data points, or if you're familiar with using Excel to filter values in a column, you can see that in this particular column, the so-called target variable column, the vast majority of values are zero, which means no failure, and some of the rows or data points have a value of one. For the rows that have a value of one, you are going to have different types of failures: like I said just now, power failure, tool wear failure, and so on. We are going to go through the entire machine learning workflow process with this dataset.
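If you want to follow along, loading the downloaded CSV and taking a first look needs only a couple of lines; the file name below is an assumption, so use whatever name your download has:

```python
# Load the Kaggle predictive-maintenance CSV and take a first look at it.
import pandas as pd

df = pd.read_csv("predictive_maintenance.csv")   # path/name of your download
print(df.shape)      # expect 10,000 rows of samples
print(df.head())     # feature columns plus the two target columns
```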
35:44 To see an example of that, we're going to go to the code section here; if I click on the code section, right down here we see what is called a dataset notebook. This is basically a Jupyter notebook. Jupyter is a Python application which allows you to create a Python machine learning program that builds your machine learning model, assesses or evaluates its accuracy, and generates predictions from it. So here we have a whole bunch of Jupyter notebooks that are available, and you can select any one of them; all these notebooks essentially process the data from this particular dataset. If I go to this code page, I've actually selected a specific notebook that I'm going to run through to demonstrate an end-to-end machine learning workflow using various machine learning libraries from the Python programming language. The particular notebook I'm going to use is this one here, and you can also get the URL for that particular notebook from here.
37:00 Okay, so let's quickly do a quick revision again. What are we trying to do here? We're trying to build a machine learning classification model. We said there are two primary areas of supervised learning: one is regression, which is used to predict a numerical target variable, and the second is classification, which is what we're doing here: we're trying to predict a categorical target variable. In this particular example, we actually have two ways we can classify: either binary classification or multiclass classification. For binary classification, we are only going to classify the product or machine as either it failed or it did not fail. If we go back to the dataset that I showed you just now and look at this target variable column, there are only two possible values here: zero or one. Zero means there's no failure; one means there's a failure. So this is an example of binary classification: only two possible outcomes, zero or one, didn't fail or failed.

38:13 And then, for the same dataset, we can extend it and make it a multiclass classification problem. If we want to drill down further, we can say that not only is there a failure, there are different types of failures. So we have one class that is basically no failure, and then we have a class for each of the different types of failures: you can have a power failure, you could have a tool wear failure, and, if we go down here, you could have an overstrain failure, and so on. So you can have multiple classes of failure in addition to the overall majority class of no failure, and that would be a multiclass classification problem. With this dataset, we are going to see how to treat it as a binary classification problem and also as a multiclass classification problem.
39:15 Okay, so let's look at the workflow. Let's say we've already got the data; right now we do have the dataset. Let's assume we've somehow managed to get it from some IoT sensors that are monitoring real-time data in our production environment: on the assembly line, on the production line, we've got sensors reading data that gives us everything we have in this CSV file. So we've already got the data, we've retrieved it, and now we're going to move on to the cleaning and exploration part of the machine learning life cycle. Let's look at the data cleaning part first. For data cleaning, we're interested in checking for missing values and maybe removing the rows with missing values. The kinds of things we can do about missing values: we can remove the rows with missing values, or we can put in some replacement values, which could be the average of all the values in that particular column, and so on. We could also try to identify outliers in our dataset, and there is a variety of ways to deal with those as well. So this is called data cleansing, which is a really important part of your machine learning workflow. That's where we are now: we're doing cleansing, and then we're going to follow up with exploration.
40:31 So let's look at the actual code that does the cleansing. Here we are, right at the start of the machine learning life cycle, in a Jupyter notebook. Here we have a brief description of the problem statement: this dataset reflects real-life predictive maintenance encountered in industry, with measurements from real equipment, and the feature descriptions are taken directly from the dataset source. So here we have a description of the six key features in our dataset: the type, which is the quality of the product, the air temperature, the process temperature, the rotational speed, the torque, and the tool wear. Those are the six feature variables, and then there are the two target variables. I showed you just now that there's one target variable which only has two possible values, either zero or one, meaning failure or no failure; that's this column here. Let me go all the way back up to here: this column, as we already saw, has only two possible values, zero or one. And then we also have this other column here, which is basically the failure type. As I demonstrated just now, we have several categories of failure types, and this is what we use for multiclass classification. So we can either build a binary classification model for this problem domain, or we can build a multiclass classification model.
41:58 - 42:00Jupyter notebook is going to demonstrate
-
42:00 - 42:02both approaches to us. So first step, we
-
42:02 - 42:05are going to write all this Python code
-
42:05 - 42:07that's going to import all the libraries
-
42:07 - 42:09that we need to use, okay? So this is
-
42:09 - 42:12basically Python code, okay, and it's
-
42:12 - 42:15importing the relevant machine learn-
-
42:15 - 42:18oops. We are importing the relevant
-
42:18 - 42:21machine learning libraries related to
-
42:21 - 42:24our domain use case, okay? Then we load in
-
42:24 - 42:26our dataset, okay, so this our dataset.
-
42:26 - 42:28We describe it, we have some quick
-
42:28 - 42:31insights into the dataset. And then
-
42:31 - 42:33we just take a look at all the variables
-
42:33 - 42:36of the feature variables, etc, and so on.
-
42:36 - 42:38What we're doing now is just
-
42:38 - 42:40doing a quick overview of the dataset,
-
42:40 - 42:42so all this Python code here that
-
42:42 - 42:44we're writing is allowing us, the data
-
42:44 - 42:45scientist, to get a quick overview of our
-
42:45 - 42:48dataset, right, like
-
42:48 - 42:50how many rows are there, how many columns
-
42:50 - 42:52are there, what are the data types of the
-
42:52 - 42:53columns, what are the names of the columns,
-
42:53 - 42:57etc, etc. Okay, then we zoom in on the
-
42:57 - 42:59target variables. So we look at the
-
42:59 - 43:02target variables, how many counts
-
43:02 - 43:05there are of this target variable, and
-
43:05 - 43:06so on. How many different types of
-
43:06 - 43:08failures there are. Then you want to
-
43:08 - 43:09check whether there are any
-
43:09 - 43:11inconsistencies between the target and
-
43:11 - 43:14the failure type, etc. Okay, so when you do
-
43:14 - 43:15all this checking, you're going to
-
43:15 - 43:17discover there are some discrepancies in
-
43:17 - 43:20your dataset, so using specific Python
-
43:20 - 43:22code to do checking, you're going to say
-
43:22 - 43:23hey, you know what? There are some errors
-
43:23 - 43:25here, right? There are nine values that
-
43:25 - 43:27are classified as failure in the target variable,
-
43:27 - 43:28but as no failure in the failure type
-
43:28 - 43:30variable, so that means there's a
-
43:30 - 43:33discrepancy in your data points, right?
-
43:33 - 43:35So these are all the ones that
-
43:35 - 43:36are discrepancies because the target
-
43:36 - 43:39variable says one, and we already know
-
43:39 - 43:41that target variable one is supposed to
-
43:41 - 43:47mean there is a failure, so we are kind of expecting to
-
43:47 - 43:50see the failure classification, but some
-
43:50 - 43:51rows actually say there's no failure
-
43:51 - 43:54although the target type is one. Well here
-
43:54 - 43:56is a classic example of an error that
-
43:56 - 43:59can very well occur in a dataset, so now
-
43:59 - 44:01the question is what do you do with
-
44:01 - 44:05these errors in your dataset, right? So
-
44:05 - 44:06here the data scientist says, I think it
-
44:06 - 44:08would make sense to remove those
-
44:08 - 44:10instances, and so they write some code
-
44:10 - 44:13then to remove those instances or those
-
44:13 - 44:15rows or data points from the overall
-
44:15 - 44:17dataset, and same thing we can, again,
-
44:17 - 44:19check for other issues. So we find there's
-
44:19 - 44:21another issue here with our dataset which
-
44:21 - 44:24is another warning, so, again, we can
-
44:24 - 44:26possibly remove them. So you're going to
-
44:26 - 44:31remove 27 instances or rows from your
-
44:31 - 44:34overall dataset. So your dataset has
-
44:34 - 44:3710,000 rows or data points. You're
-
44:37 - 44:40removing 27, which is only 0.27% of the
-
44:40 - 44:42entire dataset. And these were the
-
44:42 - 44:46reasons why you removed them, okay? So if
-
44:46 - 44:48you're just removing 0.27% of the
-
44:48 - 44:51entire dataset, no big deal, right? Still
-
44:51 - 44:53okay, but you needed to remove them
-
44:53 - 45:03because these 27 erroneous data points in your dataset could really affect the
-
45:03 - 45:05training of your machine learning model.
-
45:05 - 45:09So we need to do our data cleansing,
-
45:09 - 45:12right? So we are now actually cleansing
-
45:12 - 45:15data that is
-
45:15 - 45:18incorrect or erroneous in the original
-
45:18 - 45:21dataset. Okay, so then we go on to the
-
45:21 - 45:24next part which is called EDA, exploratory data analysis, right? So
-
45:24 - 45:29EDA is where we kind of explore our data,
-
45:29 - 45:32and we want to, kind of, get a visual
-
45:32 - 45:34overview of our data as a whole, and also
-
45:34 - 45:36take a look at the statistical
-
45:36 - 45:38properties of our data. The statistical
-
45:38 - 45:40distribution of the data in all the
-
45:40 - 45:43various columns, the correlation between
-
45:43 - 45:45the variables, between the feature
-
45:45 - 45:47variables different columns, and also the
-
45:47 - 45:49feature variable and the target variable.
-
45:49 - 45:52So all of this is called EDA, and EDA in
-
45:52 - 45:54a machine learning workflow is typically
-
45:54 - 45:57done through visualization,
-
45:57 - 45:59all right? So let's go back here and take
-
45:59 - 46:01a look, right? So, for example, here we are
-
46:01 - 46:03looking at correlation, so we plot the
-
46:03 - 46:06values of all the various feature
-
46:06 - 46:08variables against each other and look
-
46:08 - 46:11for potential correlations and patterns
-
46:11 - 46:13and so on. And all the different shapes
-
46:13 - 46:17that you see here in this pair plot, okay,
-
46:17 - 46:18will have different meaning,
-
46:18 - 46:20statistical meaning, and so the data
-
46:20 - 46:22scientist has to, kind of, visually
-
46:22 - 46:24inspect this pair plot, make some
-
46:24 - 46:26interpretations of these different
-
46:26 - 46:28patterns that they see here, all right. So
-
46:28 - 46:30these are some of the insights that
-
46:30 - 46:33can be deduced from looking at these
-
46:33 - 46:34patterns, so, for example, the torque and
-
46:34 - 46:36rotational speed are highly correlated,
-
46:36 - 46:38the process temperature and air
-
46:38 - 46:40temperature are also highly correlated, and
-
46:40 - 46:42failures occur for extreme values of
-
46:42 - 46:45some features, etc, etc. Then you can plot
-
46:45 - 46:46certain kinds of charts. This is called a
-
46:46 - 46:48violin plot to, again, get new insights.
-
46:48 - 46:50For example, regarding the torque and
-
46:50 - 46:51rotational speed, we can see, again, that
-
46:51 - 46:53most failures are triggered for much
-
46:53 - 46:55lower or much higher values than the
-
46:55 - 46:57mean when they're not failing. So all
-
46:57 - 47:01these visualizations, they are there, and
-
47:01 - 47:02a trained data scientist can look at
-
47:02 - 47:05them, inspect them, and make some kind of
-
47:05 - 47:08insightful deductions from them, okay?
-
47:08 - 47:11Percentage of failure, right? The
-
47:11 - 47:14correlation heat map, okay, between all
-
47:14 - 47:16these different feature variables, and
-
47:16 - 47:16also the target
-
47:16 - 47:20variable, okay? The product types,
-
47:20 - 47:21percentage of product types, percentage
-
47:21 - 47:23of failure with respect to the product
-
47:23 - 47:26type, so we can also kind of visualize
-
47:26 - 47:28that as well. So certain products have a
-
47:28 - 47:30higher ratio of failure compared to other
-
47:30 - 47:33product types, etc. For example, M products
-
47:33 - 47:36tend to fail more than H products, etc,
-
47:36 - 47:39etc. So we can create a vast variety of
-
47:39 - 47:41visualizations in the EDA stage, so you
-
47:41 - 47:44can see here. And, again, the idea of this
-
47:44 - 47:46visualization is just to give us some
-
47:46 - 47:50insight, some preliminary insight into
-
47:50 - 47:53our dataset that helps us to model it
-
47:53 - 47:54more correctly. So some more insights
-
47:54 - 47:56that we get into our dataset from all
-
47:56 - 47:58this visualization.
-
47:58 - 48:00Then we can plot the distributions so we
-
48:00 - 48:01can see whether it's a normal
-
48:01 - 48:03distribution or some other kind of
-
48:03 - 48:06distribution. We can have a box plot
-
48:06 - 48:08to see whether there are any outliers in
-
48:08 - 48:10your dataset and so on, right? So from
-
48:10 - 48:12the box plots, we can see that
-
48:12 - 48:15rotational speed has outliers. So we
-
48:15 - 48:17already saw outliers are basically a
-
48:17 - 48:19problem that you may need to kind of
-
48:19 - 48:23tackle, right? So outliers are an issue,
-
48:23 - 48:25it's a part of data cleansing. And
-
48:25 - 48:27so you may need to tackle this, so we may
-
48:27 - 48:29have to check okay, well where are the
-
48:29 - 48:31potential outliers so we can analyze
-
48:31 - 48:35them from the box plot, okay? But then
-
48:35 - 48:37we can say well they are outliers, but
-
48:37 - 48:39maybe they're not really horrible
-
48:39 - 48:41outliers so we can tolerate them or
-
48:41 - 48:43maybe we want to remove them. So we can
-
48:43 - 48:45see what our mean and maximum values for
-
48:45 - 48:47all these with respect to product type,
-
48:47 - 48:50how many of them are above or highly
-
48:50 - 48:51correlated with the product type in
-
48:51 - 48:54terms of the maximum and minimum, okay,
-
48:54 - 48:57and then so on. So the insight is well we
-
48:57 - 49:00got 4.87% of the instances as outliers,
-
49:00 - 49:03so maybe 4.87% is not really that much,
-
49:03 - 49:05the outliers are not horrible, so we just
-
49:05 - 49:07leave them in the dataset. Now for a
-
49:07 - 49:09different dataset, the data scientist
-
49:09 - 49:10could come to a different conclusion, so
-
49:10 - 49:12then they would do whatever they deem
-
49:12 - 49:15appropriate to, kind of, cleanse
-
49:15 - 49:18the dataset. Okay, so now that we have
-
49:18 - 49:20done all the EDA, the next thing we're
-
49:20 - 49:23going to do is we are going to do what
-
49:23 - 49:26is called feature engineering. So we are
-
49:26 - 49:29going to transform our original feature
-
49:29 - 49:31variables and these are our original
-
49:31 - 49:33feature variables, right? These are our
-
49:33 - 49:35original feature variables, and we are
-
49:35 - 49:38going to transform them, all right? We're
-
49:38 - 49:40going to transform them in some sense
-
49:40 - 49:44into some other form before we fit this
-
49:44 - 49:46for training into our machine learning
-
49:46 - 49:49algorithm, all right? So these are
-
49:49 - 49:52examples of, let's say, an
-
49:52 - 49:57original dataset, right? And these are some of the examples,
-
49:57 - 49:58you don't have to use all of them, but
-
49:58 - 49:59these are examples of what we
-
49:59 - 50:01call feature engineering, with which you can
-
50:01 - 50:04then transform your original values in
-
50:04 - 50:05your feature variables to all these
-
50:05 - 50:08transformed values here. So we're going to
-
50:08 - 50:10pretty much do that here, so we have an
-
50:10 - 50:13ordinal encoding, and we do scaling of the
-
50:13 - 50:18data, so the dataset is scaled using MinMax scaling.
-
50:18 - 50:22And then finally, we come to do the modeling. So we have to split our
-
50:22 - 50:24dataset into a training dataset and a
-
50:24 - 50:29test dataset. So coming back to here again,
-
50:29 - 50:34we said that before you train your model,
-
50:34 - 50:36you have to take your original dataset,
-
50:36 - 50:37now this is a feature-engineered dataset.
-
50:37 - 50:39We're going to break it into two or
-
50:39 - 50:41more subsets, okay. So one is called the
-
50:41 - 50:42training dataset that we use to feed
-
50:42 - 50:44and train a machine learning model. The
-
50:44 - 50:46second is the test dataset, to evaluate the
-
50:46 - 50:48accuracy of the model, okay? So we got
-
50:48 - 50:51this training dataset, your test dataset,
-
50:51 - 50:53and we also need
-
50:53 - 50:56to sample. So from our original dataset
-
50:56 - 50:57we need to sample some points
-
50:57 - 50:59that go into the training dataset, and some
-
50:59 - 51:01points that go into the test dataset. So
-
51:01 - 51:03there are many ways to do sampling. One
-
51:03 - 51:05way is to do stratified sampling where
-
51:05 - 51:07we ensure the same proportion of data
-
51:07 - 51:09from each strata or class, because right
-
51:09 - 51:11now we have a multiclass classification
-
51:11 - 51:12problem, so you want to make sure the
-
51:12 - 51:14same proportion of data from each strata or
-
51:14 - 51:16class appears in the
-
51:16 - 51:18training and test datasets as in the
-
51:18 - 51:20original dataset, which is very useful
-
51:20 - 51:22for dealing with what is called an
-
51:22 - 51:24imbalanced dataset. So here we have an
-
51:24 - 51:26example of what is called an imbalanced
-
51:26 - 51:30dataset in the sense that you have the
-
51:30 - 51:33vast majority of data points in your
-
51:33 - 51:35dataset, they are going to have the
-
51:35 - 51:37value of zero for their target variable
-
51:37 - 51:40column. So only an extremely small
-
51:40 - 51:43minority of the data points in your dataset
-
51:43 - 51:45will actually have the value of one
-
51:45 - 51:49for their target variable column, okay? So
-
51:49 - 51:51a situation where you have your class or
-
51:51 - 51:53your target variable column where the
-
51:53 - 51:54vast majority of values are from one
-
51:54 - 51:58class and a tiny minority are from
-
51:58 - 52:01another class, we call this an imbalanced
-
52:01 - 52:03dataset. And for an imbalanced dataset,
-
52:03 - 52:04typically we will have a specific
-
52:04 - 52:06technique to do the train test split
-
52:06 - 52:08which is called stratified sampling, and
-
52:08 - 52:10so that's exactly what's happening here.
-
52:10 - 52:18We're doing a train test split here, and it is a stratified split.
-
52:18 - 52:20And then now we actually develop the
-
52:20 - 52:23models. So now we've got the train test
-
52:23 - 52:25split, now here is where we actually
-
52:25 - 52:27train the models.
-
52:27 - 52:30Now in terms of classification there are
-
52:30 - 52:31a whole bunch of
-
52:31 - 52:35possibilities, right, that you can use.
-
52:35 - 52:38There are many, many different algorithms
-
52:38 - 52:41that we can use to create a
-
52:41 - 52:43classification model. So these are an
-
52:43 - 52:45example of some of the more common ones.
-
52:45 - 52:47Logistic regression, support vector machines, decision
-
52:47 - 52:50trees, random forest, bagging, balanced
-
52:50 - 52:53bagging, boosting, ensembles. So all
-
52:53 - 52:55these are different algorithms which
-
52:55 - 52:58will create different kinds of models
-
52:58 - 53:02which will result in different accuracy
-
53:02 - 53:05measures, okay? So it's the goal of the
-
53:05 - 53:09data scientist to find the best model
-
53:09 - 53:12that gives the best accuracy for the
-
53:12 - 53:14given dataset, for training on that
-
53:14 - 53:17given dataset. So let's head back, again,
-
53:17 - 53:20to our machine learning workflow. So
-
53:20 - 53:22here basically what I'm doing is I'm
-
53:22 - 53:24creating a whole bunch of models here,
-
53:24 - 53:26all right? So one is a random forest, one
-
53:26 - 53:27is balanced bagging, one is a boosting
-
53:27 - 53:30classifier, one's an ensemble classifier,
-
53:30 - 53:33and using all of these, I am going to
-
53:33 - 53:37basically feed and train a separate model with each of these algorithms.
-
53:37 - 53:40And then I'm going to evaluate them, okay? I'm going to
-
53:40 - 53:42evaluate how good each of these models
-
53:42 - 53:46are. And here you can see your
-
53:46 - 53:49evaluation data, right? Okay and this is
-
53:49 - 53:51the confusion matrix which is another
-
53:51 - 53:54way of evaluating. So now we come to the,
-
53:54 - 53:56kind of, the key part here which
-
53:56 - 53:59is how do I distinguish between
-
53:59 - 54:00all these models, right? I've got all
-
54:00 - 54:01these different models which are built
-
54:01 - 54:03with different algorithms which I'm
-
54:03 - 54:05using to train on the same dataset, how
-
54:05 - 54:07do I distinguish between all these
-
54:07 - 54:10models, okay? And for
-
54:10 - 54:14that we actually have a whole bunch of
-
54:14 - 54:16common evaluation metrics for
-
54:16 - 54:18classification, right? So these evaluation
-
54:18 - 54:22metrics tell us how good a model is in
-
54:22 - 54:24terms of its accuracy in
-
54:24 - 54:27classification. So in terms of
-
54:27 - 54:32accuracy, we actually have many different measures,
-
54:32 - 54:33right? You might think, well, accuracy is
-
54:33 - 54:37just accuracy: either it's accurate or it's not
-
54:37 - 54:39accurate, right? But actually it's not
-
54:39 - 54:41that simple. There are many different
-
54:41 - 54:44ways to measure the accuracy of a
-
54:44 - 54:45classification model, and these are some
-
54:45 - 54:48of the more common ones. So, for example,
-
54:48 - 54:51the confusion matrix tells us how many
-
54:51 - 54:54true positives, that means the value is
-
54:54 - 54:56positive, the prediction is positive, how
-
54:56 - 54:58many false positives which means the
-
54:58 - 54:59value is negative but the machine learning
-
54:59 - 55:02model predicts positive. How many false
-
55:02 - 55:04negatives which means that the machine
-
55:04 - 55:06learning model predicts negative, but
-
55:06 - 55:07it's actually positive. And how many true
-
55:07 - 55:09negatives there are which means that the
-
55:09 - 55:11machine learning model
-
55:11 - 55:13predicts negative and the true value is
-
55:13 - 55:15also negative. So this is called a
-
55:15 - 55:17confusion matrix. This is one way we
-
55:17 - 55:19assess or evaluate the performance of a
-
55:19 - 55:21classification model,
-
55:21 - 55:23okay? This is for binary
-
55:23 - 55:25classification, we can also have
-
55:25 - 55:27multiclass confusion matrix,
-
55:27 - 55:29and then we can also measure things like
-
55:29 - 55:32accuracy. So accuracy is the true
-
55:32 - 55:34positives plus the true negatives which
-
55:34 - 55:35is the total number of correct
-
55:35 - 55:38predictions made by the model divided by
-
55:38 - 55:40the total number of data points in your
-
55:40 - 55:43dataset. And then you have also other
-
55:43 - 55:43kinds of
-
55:43 - 55:47measures such as recall. And this is a
-
55:47 - 55:49formula for recall, this is a formula for
-
55:49 - 55:51the F1 score, okay? And then there's
-
55:51 - 55:56something called the ROC curve, right? So
-
55:56 - 55:57without going too much into the detail of
-
55:57 - 55:59what each of these entails, essentially
-
55:59 - 56:01these are all different ways, these are
-
56:01 - 56:03different KPIs, right? Just like if you
-
56:03 - 56:06work in a company, you have different KPIs,
-
56:06 - 56:08right? Certain employees have certain KPIs
-
56:08 - 56:11that measures how good or how, you
-
56:11 - 56:13know, efficient or how effective a
-
56:13 - 56:16particular employee is, right? So the
-
56:16 - 56:20KPIs for your machine learning models
-
56:20 - 56:24are the ROC curve, F1 score, recall, accuracy,
-
56:24 - 56:27okay, and your confusion matrix. So
-
56:27 - 56:30fundamentally after I have built, right,
-
56:30 - 56:33so here I've built my four different
-
56:33 - 56:35models. So after I built these four
-
56:35 - 56:38different models, I'm going to check and
-
56:38 - 56:40evaluate them using all those different
-
56:40 - 56:42metrics like, for example, the F1 score,
-
56:42 - 56:45the precision score, the recall score, all
-
56:45 - 56:47right. So for this model, I can check out
-
56:47 - 56:50the ROC score, the F1 score, the precision
-
56:50 - 56:52score, the recall score. Then for this
-
56:52 - 56:55model, this is the ROC score, the F1 score,
-
56:55 - 56:57the precision score, the recall score.
-
56:57 - 57:00Then for this model and so on. So for
-
57:00 - 57:03every single model I've created using my
-
57:03 - 57:06training dataset, I will have all my set
-
57:06 - 57:08of evaluation metrics that I can use to
-
57:08 - 57:12evaluate how good this model is, okay?
-
57:12 - 57:13Same thing here, I've got a confusion
-
57:13 - 57:15matrix here, right, so I can use that,
-
57:15 - 57:18again, to evaluate between all these four
-
57:18 - 57:20different models, and then I, kind of,
-
57:20 - 57:22summarize it up here. So we can see from
-
57:22 - 57:25this summary here that actually the top
-
57:25 - 57:29two models stand out, so, as a data scientist, I'm now
-
57:29 - 57:31going to just focus on these two models.
-
57:31 - 57:33So these two models are bagging
-
57:33 - 57:36classifier and random forest classifier.
-
57:36 - 57:38They have the highest values of F1 score,
-
57:38 - 57:40and the highest values of the ROC curve
-
57:40 - 57:43score, okay? So we can say these are the
-
57:43 - 57:46top two models in terms of accuracy, okay,
-
57:46 - 57:49using the F1 evaluation metric and the
-
57:49 - 57:54ROC AUC evaluation metric, okay? So these
-
57:54 - 57:57results are, kind of, summarized here, and
-
57:57 - 57:59then we use different sampling
-
57:59 - 58:01techniques, okay, so just now I talked
-
58:01 - 58:04about different kinds of sampling
-
58:04 - 58:06techniques, and so the idea of different
-
58:06 - 58:08kinds of sampling techniques is to just
-
58:08 - 58:11get a different feel for different
-
58:11 - 58:14distributions of the data in different
-
58:14 - 58:16areas of your dataset, so that you want
-
58:16 - 58:20to just, kind of, make sure that your
-
58:20 - 58:23evaluation of accuracy is actually
-
58:23 - 58:27statistically correct, right? So we can
-
58:27 - 58:30do what is called oversampling and under-
-
58:30 - 58:31sampling which is very useful when
-
58:31 - 58:32you're working with an imbalanced data
-
58:32 - 58:35set. So this is an example of doing that, and
-
58:35 - 58:37then here we, again, check out the
-
58:37 - 58:39results for all these different
-
58:39 - 58:42techniques we use. The F1 score, the AUC
-
58:42 - 58:44score, all right, these are the two key
-
58:44 - 58:47measures of accuracy, right? So and then
-
58:47 - 58:48we can check out the scores for the
-
58:48 - 58:50different approaches. Okay so we can see,
-
58:50 - 58:53oh well, overall the models have lower
-
58:53 - 58:56ROC AUC score, but they have a much
-
58:56 - 58:58higher F1 score. The bagging classifier
-
58:58 - 59:01had the highest ROC AUC score,
-
59:01 - 59:04but its F1 score was too low, okay. Then, in
-
59:04 - 59:07the data scientist's opinion, the random
-
59:07 - 59:09forest with this particular technique of
-
59:09 - 59:11sampling has a balance between the F1
-
59:11 - 59:14and ROC AUC scores. So the first takeaway
-
59:14 - 59:17is the macro F1 score improves
-
59:17 - 59:18dramatically using these sampling
-
59:18 - 59:20techniques, so these models might be better
-
59:20 - 59:22compared to the balanced ones, all right.
-
59:22 - 59:26So based on all this evaluation, the
-
59:26 - 59:28data scientist says they're going to
-
59:28 - 59:30continue to work with these two models,
-
59:30 - 59:31all right, and the balanced bagging one,
-
59:31 - 59:33and then continue to make further
-
59:33 - 59:35comparisons, all right. So then, we
-
59:35 - 59:37continue to keep refining on our
-
59:37 - 59:39evaluation work here. We're going to
-
59:39 - 59:41train the models one more time, so
-
59:41 - 59:43we, again, do a training test split, and
-
59:43 - 59:45then we do that for this particular
-
59:45 - 59:47approach or model. And then we
-
59:47 - 59:48print out what is called a
-
59:48 - 59:51classification report, and this is
-
59:51 - 59:53basically a summary of all those metrics
-
59:53 - 59:55that I talked about just now. So,
-
59:55 - 59:58remember I said there were
-
59:58 - 60:00several evaluation metrics, right? So
-
60:00 - 60:01we had the confusion matrix, the
-
60:01 - 60:04accuracy, the precision, the recall, the AUC
-
60:04 - 60:08ROC score. So here with the classification
-
60:08 - 60:10report, I can get a summary of all of
-
60:10 - 60:12that, so I can see all the values here,
-
60:12 - 60:17okay, for this particular model, bagging with Tomek links.
-
60:17 - 60:19And then, I can do that for another model, the random forest
-
60:19 - 60:21borderline SMOTE, and then I can do that
-
60:21 - 60:22for another model which is the balanced
-
60:22 - 60:25bagging. So, again, we do a lot of
-
60:25 - 60:27comparison between different models
-
60:27 - 60:29trying to figure out what all these
-
60:29 - 60:31evaluation metrics are telling us, all
-
60:31 - 60:33right? Then, again, we have a confusion
-
60:33 - 60:36matrix. So we generate a confusion matrix
-
60:36 - 60:39for the bagging with the Tomek links
-
60:39 - 60:41undersampling, for the random forest
-
60:41 - 60:43with the borderline SMOTE oversampling,
-
60:43 - 60:45and just balanced bagging by itself. Then,
-
60:45 - 60:48again, we compare between these three
-
60:48 - 60:51models using the confusion matrix,
-
60:51 - 60:53as the evaluation metric, and then we can kind
-
60:53 - 60:56of come to some conclusions. All right, so,
-
60:56 - 60:58right, so now we look at all the data,
-
60:58 - 61:01then we move on and look at another
-
61:01 - 61:03kind of evaluation metric, which
-
61:03 - 61:07is the ROC score, right? So this is one of
-
61:07 - 61:09the other evaluation metrics I talked
-
61:09 - 61:11about. So this one is a kind of a curve,
-
61:11 - 61:13you look at it to see the area
-
61:13 - 61:14underneath the curve; this is called the
-
61:14 - 61:20AUC ROC, the area under the ROC curve. All right, so the
-
61:20 - 61:21area under the curve
-
61:21 - 61:24score will give us some idea about the
-
61:24 - 61:26threshold that we're going to use for
-
61:26 - 61:28classification, so we can examine this
-
61:28 - 61:29for the bagging classifier, for the
-
61:29 - 61:31random forest classifier, for the balanced
-
61:31 - 61:34bagging classifier, okay?
-
61:34 - 61:36Then, finally, we can also check
-
61:36 - 61:38the classification report of this
-
61:38 - 61:40particular model. So we keep doing this
-
61:40 - 61:43over and over again, evaluating
-
61:43 - 61:46the metrics, the accuracy metrics, the
-
61:46 - 61:47evaluation metrics for all these
-
61:47 - 61:49different models. So we keep doing this
-
61:49 - 61:51over and over again for different
-
61:51 - 61:53classification thresholds, and so
-
61:53 - 61:57as we keep drilling into these, we kind
-
61:57 - 62:01of get more and more understanding of
-
62:01 - 62:03all these different models, which one is
-
62:03 - 62:05the best one that gives the best
-
62:05 - 62:09performance for our dataset, okay? So
-
62:09 - 62:11finally, we come to this conclusion, this
-
62:11 - 62:14particular model is not able to reduce
-
62:14 - 62:15the recall on failures below
-
62:15 - 62:1895.18%. On the other hand, balanced bagging
-
62:18 - 62:19with a decision threshold of 0.6 is able
-
62:19 - 62:22to have a better recall,
-
62:22 - 62:25etc. So finally, after having done all of
-
62:25 - 62:27these evaluations,
-
62:27 - 62:31okay, this is the conclusion.
-
62:31 - 62:34So right now we
-
62:34 - 62:35have gone through all the steps of the
-
62:35 - 62:38machine learning life cycle, which
-
62:38 - 62:40means we have right now, or the data
-
62:40 - 62:42scientist right now has gone through all
-
62:42 - 62:43these steps,
-
62:44 - 62:47up to and including
-
62:47 - 62:49validation. So we have done the cleaning,
-
62:49 - 62:51exploration, preparation, transformation,
-
62:51 - 62:53the feature engineering, we have developed
-
62:53 - 62:54and trained multiple models, we have
-
62:54 - 62:56evaluated all these different models, so
-
62:56 - 62:59right now we have reached this stage, so
-
62:59 - 63:03at this stage we as the data scientist,
-
63:03 - 63:05kind of, have completed our job. So we've
-
63:05 - 63:08come to some very useful conclusions
-
63:08 - 63:10which we now can share with our
-
63:10 - 63:13colleagues, all right? And based on these
-
63:13 - 63:15conclusions or recommendations,
-
63:15 - 63:17somebody is going to choose an
-
63:17 - 63:19appropriate model, and that model is
-
63:19 - 63:23going to get deployed for real-time use
-
63:23 - 63:25in a real life production environment,
-
63:25 - 63:27okay? And that decision is going to be
-
63:27 - 63:29made based on the recommendations coming
-
63:29 - 63:31from the data scientist at the end of
-
63:31 - 63:33this phase, okay? So at the end of this
-
63:33 - 63:35phase, the data scientist is going to
-
63:35 - 63:37come up with these conclusions. So
-
63:37 - 63:42the conclusions are: if the engineering
-
63:42 - 63:49team is looking for the highest
-
63:49 - 63:52failure detection rate possible, then
-
63:52 - 63:54they should go with this particular
-
63:54 - 63:57model, okay?
-
63:57 - 63:59And if they want a balance between
-
63:59 - 64:01precision and recall, then they should
-
64:01 - 64:03choose between the bagging model with a
-
64:03 - 64:060.4 decision threshold or the random
-
64:06 - 64:10forest model with a 0.5 threshold, but if
-
64:10 - 64:12they don't care so much about predicting
-
64:12 - 64:14every failure, and they want the highest
-
64:14 - 64:17precision possible, then they should opt
-
64:17 - 64:20for the bagging tomek links classifier
-
64:20 - 64:23with a bit higher decision threshold. And
-
64:23 - 64:26so this is the key thing that the data
-
64:26 - 64:28scientist is going to give, right? This is
-
64:28 - 64:31the key takeaway. This is the, kind of, the
-
64:31 - 64:33end result of the entire machine
-
64:33 - 64:35learning life cycle. Right now the data
-
64:35 - 64:36scientist is going to tell the
-
64:36 - 64:39engineering team, all right you guys,
-
64:39 - 64:41which is more important for you, point A,
-
64:41 - 64:45point B, or point C. Make your decision. So
-
64:45 - 64:47the engineering team will then discuss
-
64:47 - 64:49among themselves and say, hey you know
-
64:49 - 64:52what? What we want is we want to get the
-
64:52 - 64:55highest failure detection possible
-
64:55 - 64:58because any kind of failure of that
-
64:58 - 65:00machine or the product or the assembly
-
65:00 - 65:03line is really going to screw us up big
-
65:03 - 65:06time. So what we're looking for is the
-
65:06 - 65:08model that will give us the highest
-
65:08 - 65:11failure detection rate. We don't care
-
65:11 - 65:13about precision, but we want to make
-
65:13 - 65:15sure that if there's a failure, we are
-
65:15 - 65:18going to catch it, right? So that's what
-
65:18 - 65:20they want, and so the data scientist will
-
65:20 - 65:22say, hey you go for the balanced bagging
-
65:22 - 65:25model, okay? Then, the data scientist saves
-
65:25 - 65:28this, all right. And then, once you have
-
65:28 - 65:37saved it, you can go right ahead and deploy it to production.
-
65:37 - 65:39Okay, and so if you want to continue, we can actually further
-
65:39 - 65:41continue this modeling problem. So just
-
65:41 - 65:43now, I modeled this problem as a binary
-
65:43 - 65:50classification problem, which means it's either
-
65:50 - 65:52zero or one, either fail or not fail, but
-
65:52 - 65:54we can also model it as a multiclass
-
65:54 - 65:56classification problem, right, because
-
65:56 - 65:58as I said earlier just now, for
-
65:58 - 66:03the failure type column, you actually
-
66:03 - 66:05have multiple kinds of failures, right?
-
66:05 - 66:08For example, you may have a power failure,
-
66:08 - 66:10you may have a tool wear failure, you
-
66:10 - 66:13may have an overstrain failure. So now we
-
66:13 - 66:15can model the problem slightly
-
66:15 - 66:17differently, so we can model it as a
-
66:17 - 66:20multiclass classification problem, and
-
66:20 - 66:21then we go through the entire same
-
66:21 - 66:23process that we went through just now, so
-
66:23 - 66:25we create different models, we test this
-
66:25 - 66:27out, but now the confusion matrix is for
-
66:27 - 66:30a multiclass classification issue, right?
-
66:30 - 66:31So we're going
-
66:31 - 66:34to check them out. We're going to, again,
-
66:34 - 66:36try different algorithms or models.
-
66:36 - 66:38Again, train and test on our dataset, do the
-
66:38 - 66:40train test split, and try these
-
66:40 - 66:42different models. All right, so we have
-
66:42 - 66:43like, for example, we have balanced random
-
66:43 - 66:46forest, balanced random forest grid search,
-
66:46 - 66:48then you train the models using what is
-
66:48 - 66:50called hyperparameter tuning, then you
-
66:50 - 66:51get the scores. All right, so you get the
-
66:51 - 66:53same evaluation scores again. You check
-
66:53 - 66:55out the evaluation scores, compare
-
66:55 - 66:57between them, generate a confusion matrix,
-
66:57 - 67:00so this is a multiclass confusion matrix.
-
67:00 - 67:02And then, you come to the final
-
67:02 - 67:06conclusion. So now if you are interested
-
67:06 - 67:09in framing your problem domain as a
-
67:09 - 67:11multiclass classification problem, all
-
67:11 - 67:14right, then these are the recommendations
-
67:14 - 67:15from the data scientist. So the data
-
67:15 - 67:17scientist will say, you know what, I'm
-
67:17 - 67:20going to pick this particular model, the
-
67:20 - 67:22balanced bagging classifier, and these are
-
67:22 - 67:25all the reasons that the data scientist
-
67:25 - 67:27is going to give as a rationale for
-
67:27 - 67:29selecting this particular
-
67:29 - 67:32model. And then once that's done, you save
-
67:32 - 67:35the model, and that's it.
-
67:35 - 67:39So that's all done now, and so then the
-
67:39 - 67:41model, the machine learning model,
-
67:41 - 67:44now you can put it live, run it on the
-
67:44 - 67:45server, and now the machine learning
-
67:45 - 67:47model is ready to work which means it's
-
67:47 - 67:49ready to generate predictions, right?
-
67:49 - 67:50That's the main job of the machine
-
67:50 - 67:52learning model. You have picked the best
-
67:52 - 67:54machine learning model with the best
-
67:54 - 67:56evaluation metrics for whatever accuracy
-
67:56 - 67:58goal you're trying to achieve. And
-
67:58 - 68:00now you're going to run it on a server,
-
68:00 - 68:01and now you're going to get all this
-
68:01 - 68:03real-time data that's coming from your
-
68:03 - 68:05sensors, you're going to pump that into
-
68:05 - 68:06your machine learning model, your machine
-
68:06 - 68:08learning model will pump out a whole
-
68:08 - 68:10bunch of predictions, and we're going to
-
68:10 - 68:13use those predictions in real time to
-
68:13 - 68:15make real-time, real-world decision
-
68:15 - 68:18making, right? You're going to say, okay
-
68:18 - 68:20I'm predicting that that machine is
-
68:20 - 68:23going to fail on Thursday at 5:00 p.m.,
-
68:23 - 68:26so you better get your service folks in
-
68:26 - 68:29to service it on Thursday at 2 p.m. or, you
-
68:29 - 68:32know, whatever. So you can, you know,
-
68:32 - 68:33make decisions on when you want to do
-
68:33 - 68:35your maintenance, you know, and make
-
68:35 - 68:38the best decisions to optimize the cost
-
68:38 - 68:41of maintenance, etc, etc. And then based on
-
68:41 - 68:42the
-
68:42 - 68:45results that are coming up from the
-
68:45 - 68:47predictions, so the predictions may be
-
68:47 - 68:49good, the predictions may be lousy, the
-
68:49 - 68:51predictions may be average, right? So
-
68:51 - 68:54we're constantly monitoring how good
-
68:54 - 68:55or how useful are the predictions
-
68:55 - 68:58generated by this real-time model that's
-
68:58 - 69:00running on the server, and based on our
-
69:00 - 69:03monitoring, we will then take some new
-
69:03 - 69:05data and then repeat this entire life
-
69:05 - 69:07cycle again, so this is basically a
-
69:07 - 69:09workflow that's iterative, and we are
-
69:09 - 69:11constantly or the data scientist is
-
69:11 - 69:13constantly getting in all these new data
-
69:13 - 69:15points and then refining the model,
-
69:15 - 69:18picking maybe a new model, deploying the
-
69:18 - 69:22new model onto the server, and so on. All
-
69:22 - 69:24right, and so that's it. So that is
-
69:24 - 69:26basically your machine learning workflow
-
69:26 - 69:29in a nutshell. Okay so for this
-
69:29 - 69:32particular approach we have used a bunch
-
69:32 - 69:35of data science libraries from Python,
-
69:35 - 69:37so we have used Pandas, the most
-
69:37 - 69:39basic data science library, which
-
69:39 - 69:40provides all the tools to work with raw
-
69:40 - 69:43data. We have used Numpy, which is a high-
-
69:43 - 69:44performance library for implementing
-
69:44 - 69:46complex array and matrix operations. We have
-
69:46 - 69:50used Matplotlib and Seaborn, which are used
-
69:50 - 69:52for doing the EDA, the
-
69:52 - 69:56exploratory data analysis phase of machine
-
69:56 - 69:57learning where you visualize all your
-
69:57 - 69:59data. We have used Scikit-learn, which is
-
69:59 - 70:01the machine learning library that implements all
-
70:01 - 70:03the core
-
70:03 - 70:06machine learning algorithms. We
-
70:06 - 70:08have not used these because this is not a
-
70:08 - 70:11deep learning problem, but if you are
-
70:11 - 70:13working with a deep learning problem
-
70:13 - 70:15like image classification, image
-
70:15 - 70:18recognition, object detection, okay,
-
70:18 - 70:20natural language processing, text
-
70:20 - 70:22classification, well then you're going to
-
70:22 - 70:24use these libraries from Python, which are
-
70:24 - 70:29TensorFlow, okay, and also PyTorch.
-
70:29 - 70:33And then lastly, that whole thing, that
-
70:33 - 70:35whole data science project that you saw
-
70:35 - 70:37just now, this entire data science
-
70:37 - 70:39project is actually developed in
-
70:39 - 70:41something called a Jupyter notebook. So
-
70:41 - 70:44all this Python code along with all the
-
70:44 - 70:46observations from the data
-
70:46 - 70:49scientists, okay, for this entire data
-
70:49 - 70:50science project was actually run in
-
70:50 - 70:53that same Jupyter notebook. So
-
70:53 - 70:56that is the
-
70:56 - 70:59most widely used tool for interactively
-
70:59 - 71:02developing and presenting data science
-
71:02 - 71:05projects. Okay so that brings me to the
-
71:05 - 71:07end of this entire presentation. I hope
-
71:07 - 71:10that you find it useful for you, and that
-
71:10 - 71:13you can appreciate the importance of
-
71:13 - 71:15machine learning, and how it can be
-
71:15 - 71:20applied in a real life use case in a
-
71:20 - 71:23typical production environment. All right,
-
71:23 - 71:27thank you all so much for watching!