
Machine Learning for Predictive Maintenance: End-to-End Workflow in a Jupyter Notebook

  • 0:01 - 0:04
    Hello everyone, my name is Victor. I'm
  • 0:04 - 0:05
    your friendly neighborhood data
  • 0:05 - 0:08
    scientist from DreamCatcher. So in this
  • 0:08 - 0:10
    presentation, I would like to talk about
  • 0:10 - 0:13
    a specific industry use case of AI or
  • 0:13 - 0:15
    machine learning which is predictive
  • 0:15 - 0:19
    maintenance. So I will be covering these
  • 0:19 - 0:21
    topics and feel free to jump forward to
  • 0:21 - 0:23
    the specific part in the video where I
  • 0:23 - 0:25
    talk about all these topics. So I'm going
  • 0:25 - 0:27
    to start off with a general preview of
  • 0:27 - 0:29
    AI and machine learning. Then, I'll
  • 0:29 - 0:31
    discuss the use case which is predictive
  • 0:31 - 0:33
    maintenance. I'll talk about the basics
  • 0:33 - 0:35
    of machine learning, the workflow of
  • 0:35 - 0:37
    machine learning, and then we will come
  • 0:37 - 0:41
    to the meat of this presentation which
  • 0:41 - 0:44
    is essentially a demonstration of the
  • 0:44 - 0:45
    machine learning workflow from end to
  • 0:45 - 0:48
    end on a real life predictive
  • 0:48 - 0:52
    maintenance domain problem. All right, so
  • 0:52 - 0:54
    without any further ado, let's jump into
  • 0:54 - 0:57
    it. So let's start off with a quick
  • 0:57 - 1:00
    preview of AI and machine learning. Well
  • 1:00 - 1:04
    AI is a very general term, it encompasses
  • 1:04 - 1:07
    the entire area of science and
  • 1:07 - 1:09
    engineering that is related to creating
  • 1:09 - 1:11
    software programs and machines that
  • 1:11 - 1:14
    will be capable of performing tasks
  • 1:14 - 1:16
    that would normally require human
  • 1:16 - 1:20
    intelligence. But AI is a catchall term,
  • 1:20 - 1:23
    so really when we talk about applied AI,
  • 1:23 - 1:26
    how we use AI in our daily work, we are
  • 1:26 - 1:28
    really going to be talking about machine
  • 1:28 - 1:30
    learning. So machine learning is the
  • 1:30 - 1:32
    design and application of software
  • 1:32 - 1:34
    algorithms that are capable of learning
  • 1:34 - 1:38
    on their own without any explicit human
  • 1:38 - 1:40
    intervention. And the primary purpose of
  • 1:40 - 1:43
    these algorithms is to optimize
  • 1:43 - 1:47
    performance in a specific task. And the
  • 1:47 - 1:50
    primary performance or the primary task
  • 1:50 - 1:52
    that you want to optimize performance in
  • 1:52 - 1:54
    is to be able to make accurate
  • 1:54 - 1:57
    predictions about future outcomes based
  • 1:57 - 2:01
    on the analysis of historical data
  • 2:01 - 2:03
    from the past. So essentially machine
  • 2:03 - 2:05
    learning is about making predictions
  • 2:05 - 2:07
    about the future or what we call
  • 2:07 - 2:09
    predictive analytics.
  • 2:09 - 2:11
    And there are many different
  • 2:11 - 2:13
    kinds of algorithms that are available in
  • 2:13 - 2:15
    machine learning under the three primary
  • 2:15 - 2:16
    categories of supervised learning,
  • 2:16 - 2:19
    unsupervised learning, and reinforcement
  • 2:19 - 2:21
    learning. And here we can see some of the
  • 2:21 - 2:24
    different kinds of algorithms and their
  • 2:24 - 2:27
    use cases in various areas in
  • 2:27 - 2:30
    industry. So we have various domain use
  • 2:30 - 2:30
    cases
  • 2:30 - 2:32
    for all these different kind of
  • 2:32 - 2:34
    algorithms, and we can see that different
  • 2:34 - 2:38
    algorithms are fitted for different use cases.
  • 2:38 - 2:41
    Deep learning is an advanced form
  • 2:41 - 2:42
    of machine learning that's based on
  • 2:42 - 2:44
    something called an artificial neural
  • 2:44 - 2:46
    network or ANN for short, and this
  • 2:46 - 2:48
    essentially simulates the structure of
  • 2:48 - 2:50
    the human brain whereby neurons
  • 2:50 - 2:51
    interconnect and work together to
  • 2:51 - 2:55
    process and learn new information. So DL
  • 2:55 - 2:57
    is the foundational technology for most
  • 2:57 - 2:59
    of the popular AI tools that you
  • 2:59 - 3:01
    probably have heard of today. So I'm sure
  • 3:01 - 3:03
    you have heard of ChatGPT if you haven't
  • 3:03 - 3:05
    been living in a cave for the past 2
  • 3:05 - 3:08
    years. And yeah, so ChatGPT is an example
  • 3:08 - 3:10
    of what we call a large language model
  • 3:10 - 3:12
    and that's based on this technology
  • 3:12 - 3:15
    called deep learning. Also, all the modern
  • 3:15 - 3:17
    computer vision applications where a
  • 3:17 - 3:20
    computer program can classify images or
  • 3:20 - 3:23
    detect images or recognize images on
  • 3:23 - 3:25
    its own, okay, we call these computer
  • 3:25 - 3:28
    vision applications. They also use
  • 3:28 - 3:30
    this particular form of machine learning
  • 3:30 - 3:32
    called deep learning, right? So this is an
  • 3:32 - 3:34
    example of an artificial neural network.
  • 3:34 - 3:35
    For example, here I have an image of a
  • 3:35 - 3:37
    bird that's fed into this artificial
  • 3:37 - 3:40
    neural network, and output from this
  • 3:40 - 3:41
    artificial neural network is a
  • 3:41 - 3:44
    classification of this image into one of
  • 3:44 - 3:46
    these three potential categories. So in
  • 3:46 - 3:49
    this case, if the ANN has been trained
  • 3:49 - 3:52
    properly, when we feed in this image, this
  • 3:52 - 3:54
    ANN should correctly classify this image
  • 3:54 - 3:57
    as a bird, right? So this is an image
  • 3:57 - 3:59
    classification problem which is a
  • 3:59 - 4:01
    classic use case for an artificial
  • 4:01 - 4:04
    neural network in the field of computer
  • 4:04 - 4:08
    vision. And just like in the case of
  • 4:08 - 4:09
    machine learning, there are a variety of
  • 4:09 - 4:12
    algorithms that are available for
  • 4:12 - 4:14
    deep learning under the category of
  • 4:14 - 4:15
    supervised learning and also
  • 4:15 - 4:17
    unsupervised learning.
  • 4:17 - 4:19
    All right, so this is how we can
  • 4:19 - 4:21
    kind of categorize this. You can think of
  • 4:21 - 4:24
    AI as the general area of smart systems
  • 4:24 - 4:27
    and machines. Machine learning is
  • 4:27 - 4:29
    basically applied AI, and deep learning
  • 4:29 - 4:30
    is a
  • 4:30 - 4:33
    subspecialization of machine learning
  • 4:33 - 4:35
    using a particular architecture called
  • 4:35 - 4:39
    an artificial neural network.
  • 4:39 - 4:42
    And generative AI, so if you talk
  • 4:42 - 4:45
    about ChatGPT, okay, Google Gemini,
  • 4:45 - 4:48
    Microsoft Copilot, okay, all these
  • 4:48 - 4:50
    examples of generative AI, they are
  • 4:50 - 4:52
    basically large language models, and they
  • 4:52 - 4:54
    are a further subcategory within the
  • 4:54 - 4:55
    area of deep
  • 4:55 - 4:58
    learning. And there are many applications
  • 4:58 - 4:59
    of machine learning in industry right
  • 4:59 - 5:02
    now, so pick whichever particular industry
  • 5:02 - 5:04
    you are involved in, and these are all the
  • 5:04 - 5:05
    specific areas of
  • 5:05 - 5:10
    applications, right? So probably, I'm
  • 5:10 - 5:12
    going to guess the vast majority of you
  • 5:12 - 5:13
    who are watching this video, you're
  • 5:13 - 5:14
    probably coming from the manufacturing
  • 5:14 - 5:17
    industry, and so in the manufacturing
  • 5:17 - 5:18
    industry some of the standard use cases
  • 5:18 - 5:20
    for machine learning and deep learning
  • 5:20 - 5:23
    are predicting potential problems, okay?
  • 5:23 - 5:25
    So sometimes you call this predictive
  • 5:25 - 5:27
    maintenance where you want to predict
  • 5:27 - 5:29
    when a problem is going to happen and
  • 5:29 - 5:30
    then kind of address it before it
  • 5:30 - 5:33
    happens. And then monitoring systems,
  • 5:33 - 5:35
    automating your manufacturing assembly
  • 5:35 - 5:38
    line or production line, okay, smart
  • 5:38 - 5:40
    scheduling, and detecting anomaly on your
  • 5:40 - 5:41
    production line.
  • 5:42 - 5:44
    Okay, so let's talk about the use
  • 5:44 - 5:46
    case here which is predictive
  • 5:46 - 5:49
    maintenance, right? So what is predictive
  • 5:49 - 5:52
    maintenance? Well predictive maintenance,
  • 5:52 - 5:53
    here's the long definition, is an
  • 5:53 - 5:55
    equipment maintenance strategy that
  • 5:55 - 5:56
    relies on real-time monitoring of
  • 5:56 - 5:58
    equipment conditions and data to predict
  • 5:58 - 6:00
    equipment failures in advance.
  • 6:00 - 6:03
    And this uses advanced data models,
  • 6:03 - 6:05
    analytics, and machine learning whereby
  • 6:05 - 6:07
    we can reliably assess when failures are
  • 6:07 - 6:09
    more likely to occur, including which
  • 6:09 - 6:11
    components are more likely to be
  • 6:11 - 6:14
    affected on your production or assembly
  • 6:14 - 6:17
    line. So where does predictive
  • 6:17 - 6:19
    maintenance fit into the overall scheme
  • 6:19 - 6:21
    of things, right? So let's talk about the
  • 6:21 - 6:23
    kind of standard way that, you know,
  • 6:23 - 6:26
    factories or production
  • 6:26 - 6:28
    lines, assembly lines in factories tend
  • 6:28 - 6:31
    to handle maintenance issues say
  • 6:31 - 6:33
    10 or 20 years ago, right? So what you
  • 6:33 - 6:35
    have is the, what you would probably
  • 6:35 - 6:36
    start off is the most basic mode
  • 6:36 - 6:38
    which is reactive maintenance. So you
  • 6:38 - 6:41
    just wait until your machine breaks down
  • 6:41 - 6:43
    and then you repair, right? The simplest,
  • 6:43 - 6:45
    but, of course, I'm sure if you have worked on a
  • 6:45 - 6:47
    production line for any period of time,
  • 6:47 - 6:49
    you know that this reactive maintenance
  • 6:49 - 6:51
    can give you a whole bunch of headaches
  • 6:51 - 6:52
    especially if the machine breaks down
  • 6:52 - 6:54
    just before a critical delivery deadline,
  • 6:54 - 6:56
    right? Then you're going to have a
  • 6:56 - 6:57
    backlog of orders and you're going to
  • 6:57 - 6:59
    run into a lot of problems. Okay, so we move on
  • 6:59 - 7:01
    to preventive maintenance which is
  • 7:01 - 7:04
    you regularly schedule maintenance of
  • 7:04 - 7:07
    your production machines to reduce
  • 7:07 - 7:09
    the failure rate. So you might do
  • 7:09 - 7:11
    maintenance once every month, once every
  • 7:11 - 7:13
    two weeks, whatever. Okay, this is great,
  • 7:13 - 7:15
    but the problem, of course, then is well
  • 7:15 - 7:16
    sometimes you're doing too much
  • 7:16 - 7:18
    maintenance, it's not really necessary,
  • 7:18 - 7:21
    and it still doesn't totally prevent
  • 7:21 - 7:23
    this, you know, a failure of the
  • 7:23 - 7:26
    machine that occurs outside of your planned
  • 7:26 - 7:29
    maintenance, right? So a bit of an
  • 7:29 - 7:31
    improvement, but not that much better.
  • 7:31 - 7:33
    And then, these last two categories is
  • 7:33 - 7:35
    where we bring in AI and machine
  • 7:35 - 7:37
    learning. So with machine learning, we're
  • 7:37 - 7:39
    going to use sensors to do real-time
  • 7:39 - 7:42
    monitoring of the data, and then using
  • 7:42 - 7:43
    that data we're going to build a machine
  • 7:43 - 7:46
    learning model which helps us to predict,
  • 7:46 - 7:50
    with a reasonable level of accuracy, when
  • 7:50 - 7:53
    the next failure is going to happen on
  • 7:53 - 7:54
    your assembly or production line on a
  • 7:54 - 7:57
    specific component or specific machine,
  • 7:57 - 8:00
    right? So you want to be able to predict to
  • 8:00 - 8:02
    a high level of accuracy like maybe
  • 8:02 - 8:04
    to the specific day, even the specific
  • 8:04 - 8:06
    hour, or even minute itself when you
  • 8:06 - 8:08
    expect that particular product to fail
  • 8:08 - 8:11
    or the particular machine to fail. All
  • 8:11 - 8:13
    right, so these are the advantages of
  • 8:13 - 8:15
    predictive maintenance. It minimizes
  • 8:15 - 8:17
    the occurrence of unscheduled downtime, it
  • 8:17 - 8:18
    gives you a real-time overview of your
  • 8:18 - 8:20
    current condition of assets, ensures
  • 8:20 - 8:23
    minimal disruptions to productivity,
  • 8:23 - 8:25
    optimizes time you spend on maintenance work,
  • 8:25 - 8:27
    optimizes the use of spare parts, and so
  • 8:27 - 8:28
    on. And of course there are some
  • 8:28 - 8:31
    disadvantages, the
  • 8:31 - 8:33
    primary one being that you need a specialized set
  • 8:33 - 8:36
    of skills among your engineers to
  • 8:36 - 8:38
    understand and create machine learning
  • 8:38 - 8:41
    models that can work on the real-time
  • 8:41 - 8:44
    data that you're getting. Okay, so we're
  • 8:44 - 8:45
    going to take a look at some real life
  • 8:45 - 8:47
    use cases. So these are a bunch of links
  • 8:47 - 8:49
    here, so if you navigate to these links
  • 8:49 - 8:50
    here, you'll be able to get a look at
  • 8:50 - 8:54
    some real life use cases of machine
  • 8:54 - 8:58
    learning in predictive maintenance. So
  • 8:58 - 9:01
    the IBM website, okay, gives you a look at
  • 9:01 - 9:05
    five use cases, so you can
  • 9:05 - 9:07
    click on these links and follow up with
  • 9:07 - 9:08
    them if you want to read more. Okay, this
  • 9:08 - 9:11
    is waste management, manufacturing, okay,
  • 9:11 - 9:15
    building services, and renewable energy,
  • 9:15 - 9:17
    and also mining, right? So these are all
  • 9:17 - 9:18
    use cases, if you want to know more about
  • 9:18 - 9:20
    them, you can read up and follow them
  • 9:20 - 9:24
    from this website. And this website
  • 9:24 - 9:26
    gives, this is a pretty good website. I
  • 9:26 - 9:28
    would really encourage you to just look
  • 9:28 - 9:29
    through this if you're interested in
  • 9:29 - 9:31
    predictive maintenance. So here, it tells
  • 9:31 - 9:34
    you about, you know, an industry survey of
  • 9:34 - 9:36
    predictive maintenance. We can see that a
  • 9:36 - 9:38
    large portion of the industry,
  • 9:38 - 9:40
    manufacturing industry agreed that
  • 9:40 - 9:41
    predictive maintenance is a real need to
  • 9:41 - 9:44
    stay competitive and predictive
  • 9:44 - 9:45
    maintenance is essential for
  • 9:45 - 9:47
    manufacturing industry and will gain
  • 9:47 - 9:48
    additional strength in the future. So
  • 9:48 - 9:50
    this is a survey that was done quite
  • 9:50 - 9:52
    some time ago and this was the results
  • 9:52 - 9:54
    that we got back. So we can see the vast
  • 9:54 - 9:56
    majority of key industry players in the
  • 9:56 - 9:58
    manufacturing sector, they consider
  • 9:58 - 9:59
    predictive maintenance to be a very
  • 9:59 - 10:00
    important
  • 10:00 - 10:02
    activity that they want to
  • 10:02 - 10:05
    incorporate into their workflow, right?
  • 10:05 - 10:08
    And we can see here the kind of ROI that
  • 10:08 - 10:11
    we expect on investment in predictive
  • 10:11 - 10:13
    maintenance, so 45% reduction in downtime,
  • 10:13 - 10:17
    25% growth in productivity, 75% fault
  • 10:17 - 10:19
    elimination, 30% reduction in maintenance
  • 10:19 - 10:23
    cost, okay? And best of all, if you really
  • 10:23 - 10:25
    want to kind of take a look at examples,
  • 10:25 - 10:27
    all right, so there are all these
  • 10:27 - 10:28
    different companies that have
  • 10:28 - 10:30
    significantly invested in predictive
  • 10:30 - 10:32
    maintenance technology in their
  • 10:32 - 10:34
    manufacturing processes. So PepsiCo, we
  • 10:34 - 10:39
    have got Frito-Lay, General Motors, Mondi, Ecoplant,
  • 10:39 - 10:41
    all right? So you can jump over here
  • 10:41 - 10:43
    and take a look at some of these
  • 10:43 - 10:46
    use cases. Let me perhaps, let me try and
  • 10:46 - 10:48
    open this up, for example, Mondi, right? You
  • 10:48 - 10:52
    can see Mondi has used
  • 10:52 - 10:54
    this particular piece of software
  • 10:54 - 10:56
    called MATLAB, all right, or MathWorks
  • 10:56 - 11:00
    sorry, to do predictive maintenance
  • 11:00 - 11:02
    for their manufacturing processes using
  • 11:02 - 11:05
    machine learning. And we can talk, you can
  • 11:05 - 11:08
    study how they have used it, all right,
  • 11:08 - 11:09
    and how it works, what was their
  • 11:09 - 11:11
    challenge, all right, the problems they
  • 11:11 - 11:13
    were facing, the solution that they use
  • 11:13 - 11:15
    using this MathWorks Consulting piece of
  • 11:15 - 11:17
    software, and data that they collected in
  • 11:17 - 11:20
    a MATLAB database, all right, sorry
  • 11:20 - 11:24
    in an Oracle database.
  • 11:24 - 11:26
    So using MathWorks from MATLAB, all
  • 11:26 - 11:28
    right, they were able to create a deep
  • 11:28 - 11:31
    learning model to, you know, to
  • 11:31 - 11:33
    solve this particular issue for their
  • 11:33 - 11:36
    domain. So if you're interested, please, I
  • 11:36 - 11:38
    strongly encourage you to read up on all
  • 11:38 - 11:40
    these real life customer stories that
  • 11:40 - 11:43
    showcase use cases for predictive
  • 11:43 - 11:48
    maintenance. Okay, so that's it for
  • 11:48 - 11:52
    real life use cases for predictive maintenance.
  • 11:54 - 11:57
    Now in this topic, I'm
  • 11:57 - 11:58
    going to talk about machine learning
  • 11:58 - 12:00
    basics, so what is actually involved
  • 12:00 - 12:01
    in machine learning, and I'm going to
  • 12:01 - 12:04
    give a very quick, fast, conceptual, high
  • 12:04 - 12:06
    level overview of machine learning, all
  • 12:06 - 12:09
    right? So there are several categories of
  • 12:09 - 12:11
    machine learning, supervised, unsupervised,
  • 12:11 - 12:13
    semi-supervised, reinforcement, and deep
  • 12:13 - 12:16
    learning, okay? And let's talk about the
  • 12:16 - 12:19
    most common and widely used category of
  • 12:19 - 12:21
    machine learning which is called
  • 12:21 - 12:25
    supervised learning. So the particular use
  • 12:25 - 12:26
    case here that I'm going to be
  • 12:26 - 12:29
    discussing, predictive maintenance, it's
  • 12:29 - 12:31
    basically a form of supervised learning.
  • 12:31 - 12:33
    So how does supervised learning work?
  • 12:33 - 12:35
    Well in supervised learning, you're going
  • 12:35 - 12:37
    to create a machine learning model by
  • 12:37 - 12:39
    providing what is called a labelled data
  • 12:39 - 12:42
    set as an input to a machine learning
  • 12:42 - 12:45
    program or algorithm. And this dataset
  • 12:45 - 12:46
    is going to contain what are called
  • 12:46 - 12:49
    independent or feature variables, all
  • 12:49 - 12:51
    right, so this will be a set of variables.
  • 12:51 - 12:53
    And there will be one dependent or
  • 12:53 - 12:55
    target variable which we also call the
  • 12:55 - 12:58
    label, and the idea is that the
  • 12:58 - 13:00
    independent or the feature variables are
  • 13:00 - 13:02
    the attributes or properties of your
  • 13:02 - 13:04
    data set that influence the dependent or
  • 13:04 - 13:08
    the target variable, okay? So this process
  • 13:08 - 13:09
    that I've just described is called
  • 13:09 - 13:12
    training the machine learning model, and
  • 13:12 - 13:14
    the model is fundamentally a
  • 13:14 - 13:16
    mathematical function that best
  • 13:16 - 13:18
    approximates the relationship between
  • 13:18 - 13:21
    the independent variables and the
  • 13:21 - 13:23
    dependent variable. All right, so that's
  • 13:23 - 13:24
    quite a bit of a mouthful, so let's jump
  • 13:24 - 13:26
    into a diagram that maybe illustrates
  • 13:26 - 13:28
    this more clearly. So let's say you have
  • 13:28 - 13:30
    a dataset here, an Excel spreadsheet,
  • 13:30 - 13:32
    right? And this Excel spreadsheet has a
  • 13:32 - 13:34
    bunch of columns here and a bunch of
  • 13:34 - 13:37
    rows, okay? So these rows here represent
  • 13:37 - 13:39
    observations, or these rows are what
  • 13:39 - 13:41
    we call observations or samples or data
  • 13:41 - 13:43
    points in our data set, okay? So let's
  • 13:43 - 13:47
    assume this data set is gathered by a
  • 13:47 - 13:50
    marketing manager at a mall, at a retail
  • 13:50 - 13:52
    mall, all right? So they've got all this
  • 13:52 - 13:55
    information about the customers who
  • 13:55 - 13:57
    purchase products at this mall, all right?
  • 13:57 - 13:59
    So some of the information they've
  • 13:59 - 14:00
    gotten about the customers are their
  • 14:00 - 14:02
    gender, their age, their income, and the
  • 14:02 - 14:04
    number of children. So all this
  • 14:04 - 14:06
    information about the customers, we call
  • 14:06 - 14:07
    this the independent or the feature
  • 14:07 - 14:10
    variables, all right? And based on all
  • 14:10 - 14:13
    this information about the customer, we
  • 14:13 - 14:16
    also managed to get some or we record
  • 14:16 - 14:18
    the information about how much the
  • 14:18 - 14:20
    customer spends, all right? So this
  • 14:20 - 14:22
    information or these numbers here, we call
  • 14:22 - 14:24
    this the target variable or the
  • 14:24 - 14:27
    dependent variable, right? So one
  • 14:27 - 14:30
    single row, one single sample, one
  • 14:30 - 14:33
    single data point, contains all the data
  • 14:33 - 14:35
    for the feature variables and one single
  • 14:35 - 14:38
    value for the label or the target
  • 14:38 - 14:41
    variable, okay? And the primary purpose of
  • 14:41 - 14:43
    the machine learning model is to create
  • 14:43 - 14:46
    a mapping from all your feature
  • 14:46 - 14:48
    variables to your target variable, so
  • 14:48 - 14:51
    somehow there's going to be a function,
  • 14:51 - 14:52
    okay, this will be a mathematical
  • 14:52 - 14:55
    function that maps all the values of
  • 14:55 - 14:57
    your feature variable to the value of
  • 14:57 - 15:00
    your target variable. In other words, this
  • 15:00 - 15:01
    function represents the relationship
  • 15:01 - 15:03
    between your feature variables and your
  • 15:03 - 15:07
    target variable, okay? So this whole thing,
  • 15:07 - 15:09
    this training process, we call this the
  • 15:09 - 15:11
    fitting the model. And the target
  • 15:11 - 15:13
    variable or the label, this thing here,
  • 15:13 - 15:15
    this column here, or the values here,
  • 15:15 - 15:17
    these are critical for providing a
  • 15:17 - 15:19
    context to do the fitting or the
  • 15:19 - 15:21
    training of the model. And once you've
  • 15:21 - 15:23
    got a trained and fitted model, you can
  • 15:23 - 15:26
    then use the model to make an accurate
  • 15:26 - 15:28
    prediction of target values
  • 15:28 - 15:30
    corresponding to new feature values that
  • 15:30 - 15:33
    the model has yet to encounter or yet to
  • 15:33 - 15:35
    see, and this, as I've already said
  • 15:35 - 15:36
    earlier, this is called predictive
  • 15:36 - 15:38
    analytics, okay? So let's see what's
  • 15:38 - 15:40
    actually happening here, you take your
  • 15:40 - 15:43
    training data, all right, so this is this
  • 15:43 - 15:45
    whole bunch of data, this data set here
  • 15:45 - 15:47
    consisting of a thousand rows of
  • 15:47 - 15:50
    data, 10,000 rows of data, you take this
  • 15:50 - 15:52
    entire data set, all right, this entire
  • 15:52 - 15:54
    data set, you jam it into your machine
  • 15:54 - 15:57
    learning algorithm, and a couple of hours
  • 15:57 - 15:58
    later your machine learning algorithm
  • 15:58 - 16:01
    comes up with a model. And the model is
  • 16:01 - 16:04
    essentially a function that maps all
  • 16:04 - 16:06
    your feature variables which is these
  • 16:06 - 16:08
    four columns here, to your target
  • 16:08 - 16:10
    variable which is this one single column
  • 16:10 - 16:14
    here, okay? So once you have the model, you
  • 16:14 - 16:17
    can put in a new data point. So basically
  • 16:17 - 16:19
    the new data point represents data about a
  • 16:19 - 16:21
    new customer, a new customer that you
  • 16:21 - 16:23
    have never seen before. So let's say
  • 16:23 - 16:25
    you've already got information about
  • 16:25 - 16:28
    10,000 customers that have visited this
  • 16:28 - 16:30
    mall and how much each of these 10,000
  • 16:30 - 16:32
    customers have spent when they are at this
  • 16:32 - 16:34
    mall. So now you have a totally new
  • 16:34 - 16:36
    customer that comes in the mall, this
  • 16:36 - 16:38
    customer has never come into this mall
  • 16:38 - 16:40
    before, and what we know about this
  • 16:40 - 16:43
    customer is that he is a male, the age is
  • 16:43 - 16:45
    50, the income is 18, and they have nine
  • 16:45 - 16:48
    children. So now when you take this data
  • 16:48 - 16:51
    and you pump that into your model, your
  • 16:51 - 16:53
    model is going to make a prediction, it's
  • 16:53 - 16:56
    going to say, hey, you know what? Based on
  • 16:56 - 16:57
    everything that I have been trained on before
  • 16:57 - 16:59
    and based on the model I've developed,
  • 16:59 - 17:02
    I am going to predict that a customer
  • 17:02 - 17:05
    that is of a male gender, of the age 50
  • 17:05 - 17:08
    with the income of 18, and nine children,
  • 17:08 - 17:12
    that customer is going to spend 25 ringgit
  • 17:12 - 17:16
    at the mall. And this is it, this is what
  • 17:16 - 17:19
    you want. Right there, right here,
  • 17:19 - 17:21
    can you see here? That is the final
  • 17:21 - 17:23
    output of your machine learning model.
  • 17:23 - 17:27
    It's going to make a prediction about
  • 17:27 - 17:30
    something that it has not ever seen
  • 17:30 - 17:33
    before, okay? That is the core, this is
  • 17:33 - 17:36
    essentially the core of machine learning.
  • 17:36 - 17:39
    Predictive analytics, making prediction
  • 17:39 - 17:40
    about the future
  • 17:41 - 17:44
    based on a historical data set.
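To make that fit-and-predict pattern concrete, here is a minimal sketch in Python with pandas and scikit-learn. The column names and numbers are purely illustrative stand-ins for the mall data set described above, not values from the slides:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical labelled data set: feature columns plus the "Spend" label.
df = pd.DataFrame({
    "Gender":   [0, 1, 0, 1],      # 0 = male, 1 = female (already encoded as numbers)
    "Age":      [25, 40, 31, 50],
    "Income":   [30, 55, 42, 18],
    "Children": [0, 2, 1, 9],
    "Spend":    [120, 80, 95, 25],
})

X = df[["Gender", "Age", "Income", "Children"]]  # independent / feature variables
y = df["Spend"]                                  # dependent / target variable (label)

model = RandomForestRegressor(random_state=42)
model.fit(X, y)                                  # "fitting" / training the model

# A brand-new customer the model has never seen: male, age 50, income 18, nine children.
new_customer = pd.DataFrame([[0, 50, 18, 9]], columns=X.columns)
print(model.predict(new_customer))               # predicted spend for that customer
```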
  • 17:44 - 17:47
    Okay, so there are two areas of
  • 17:47 - 17:49
    supervised learning, regression and
  • 17:49 - 17:51
    classification. So regression is used to
  • 17:51 - 17:53
    predict a numerical target variable, such
  • 17:53 - 17:55
    as the price of a house or the salary of
  • 17:55 - 17:58
    an employee, whereas classification is
  • 17:58 - 18:00
    used to predict a categorical target
  • 18:00 - 18:04
    variable or class label, okay? So for
  • 18:04 - 18:06
    classification you can have either
  • 18:06 - 18:09
    binary or multiclass, so, for example,
  • 18:09 - 18:12
    binary will be just true or false, zero
  • 18:12 - 18:15
    or one. So whether your machine is going
  • 18:15 - 18:17
    to fail or is it not going to fail, right?
  • 18:17 - 18:19
    So just two classes, two possible
  • 18:19 - 18:22
    outcomes, or is the customer going to
  • 18:22 - 18:24
    make a purchase or is the customer not
  • 18:24 - 18:26
    going to make a purchase. We call this
  • 18:26 - 18:28
    binary classification. And then for
  • 18:28 - 18:30
    multiclass, when there are more than two
  • 18:30 - 18:33
    classes or types of values. So, for
  • 18:33 - 18:34
    example, here this would be a
  • 18:34 - 18:36
    classification problem. So if you have a
  • 18:36 - 18:38
    data set here, you've got information
  • 18:38 - 18:39
    about your customers, you've got your
  • 18:39 - 18:41
    gender of the customer, the age of the
  • 18:41 - 18:43
    customer, the salary of the customer, and
  • 18:43 - 18:45
    you also have record about whether the
  • 18:45 - 18:48
    customer made a purchase or not, okay? So
  • 18:48 - 18:50
    you can take this data set to train a
  • 18:50 - 18:52
    classification model, and then the
  • 18:52 - 18:54
    classification model can then make a
  • 18:54 - 18:56
    prediction about a new customer, and
  • 18:56 - 18:59
    they're going to predict zero which
  • 18:59 - 19:00
    means the customer didn't make a
  • 19:00 - 19:03
    purchase or one which means the customer
  • 19:03 - 19:06
    made a purchase, right? And regression,
  • 19:06 - 19:09
    this is regression, so let's say you want
  • 19:09 - 19:11
    to predict the wind speed, and you've got
  • 19:11 - 19:14
    historical data about all these four
  • 19:14 - 19:17
    other independent variables or feature
  • 19:17 - 19:18
    variables, so you have recorded
  • 19:18 - 19:20
    temperature, the pressure, the relative
  • 19:20 - 19:22
    humidity, and the wind direction for the
  • 19:22 - 19:25
    past 10 days, 15 days, or whatever, okay? So
  • 19:25 - 19:27
    now you are going to train your machine
  • 19:27 - 19:29
    learning model using this data set, and
  • 19:29 - 19:32
    the target variable column, okay, this
  • 19:32 - 19:34
    column here, the label is basically a
  • 19:34 - 19:37
    number, right? So now with this number,
  • 19:37 - 19:40
    this is a regression model, and so now
  • 19:40 - 19:42
    you can put in a new data point, so a new
  • 19:42 - 19:45
    data point means a new set of values for
  • 19:45 - 19:47
    temperature, pressure, relative humidity,
  • 19:47 - 19:49
    and wind direction, and your machine
  • 19:49 - 19:51
    learning model will then predict the
  • 19:51 - 19:54
    wind speed for that new data point, okay?
  • 19:54 - 19:57
    So that's a regression model.
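The purchase example just described can be sketched the same way. Here is a minimal, hypothetical classification snippet with scikit-learn; the column names and values are made up for illustration:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical purchase data: Gender/Age/Salary features, "Purchased" label (0 or 1).
df = pd.DataFrame({
    "Gender":    [0, 1, 1, 0, 1],
    "Age":       [22, 35, 47, 52, 29],
    "Salary":    [25, 60, 80, 40, 52],
    "Purchased": [0, 1, 1, 0, 1],
})

X = df[["Gender", "Age", "Salary"]]
y = df["Purchased"]

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X, y)

# Predict for a new customer: the output is a class label, 0 (no purchase) or 1 (purchase).
new_customer = pd.DataFrame([[1, 41, 70]], columns=X.columns)
print(clf.predict(new_customer))
```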
  • 19:59 - 20:02
    All right. So in this particular topic
  • 20:02 - 20:05
    I'm going to talk about the workflow
  • 20:05 - 20:08
    that's involved in machine learning. So
  • 20:08 - 20:13
    in the previous slides, I talked about
  • 20:13 - 20:15
    developing the model, all right? But
  • 20:15 - 20:16
    that's just one part of the entire
  • 20:16 - 20:19
    workflow. So in real life when you use
  • 20:19 - 20:20
    machine learning, there's an end-to-end
  • 20:20 - 20:22
    workflow that's involved. So the first
  • 20:22 - 20:24
    thing, of course, is you need to get your
  • 20:24 - 20:27
    data, and then you need to clean your
  • 20:27 - 20:29
    data, and then you need to explore your
  • 20:29 - 20:31
    data. You need to see what's going on in
  • 20:31 - 20:33
    your data set, right? And your data set,
  • 20:33 - 20:36
    real life data sets are not trivial, they
  • 20:36 - 20:39
    are hundreds of rows, thousands of rows,
  • 20:39 - 20:41
    sometimes millions of rows, billions of
  • 20:41 - 20:43
    rows, we're talking about billions or
  • 20:43 - 20:45
    millions of data points especially if
  • 20:45 - 20:47
    you're using an IoT sensor to get data
  • 20:47 - 20:49
    in real time. So you've got all these
  • 20:49 - 20:51
    super large data sets, you need to clean
  • 20:51 - 20:53
    them, and explore them, and then you need
  • 20:53 - 20:56
    to prepare them into a right format so
  • 20:56 - 21:00
    that you can put them into the training
  • 21:00 - 21:02
    process to create your machine learning
  • 21:02 - 21:05
    model, and then subsequently you check
  • 21:05 - 21:08
    how good is the model, right? How accurate
  • 21:08 - 21:10
    is the model in terms of its ability to
  • 21:10 - 21:13
    generate predictions for the
  • 21:13 - 21:15
    future, right? How accurate are the
  • 21:15 - 21:17
    predictions that are coming up from your
  • 21:17 - 21:18
    machine learning model. So that's
  • 21:18 - 21:21
    validating or evaluating your model, and
  • 21:21 - 21:23
    then subsequently if you determine that
  • 21:23 - 21:25
    your model is of adequate accuracy to
  • 21:25 - 21:27
    meet whatever your domain use case
  • 21:27 - 21:29
    requirements are, right? So let's say the
  • 21:29 - 21:31
    accuracy that's required for your domain
  • 21:31 - 21:32
    use case is
  • 21:32 - 21:35
    85%, okay? If my machine learning model
  • 21:35 - 21:39
    can give an 85% accuracy rate, I think
  • 21:39 - 21:40
    it's good enough, then I'm going to
  • 21:40 - 21:43
    deploy it into real world use case. So
  • 21:43 - 21:45
    here the machine learning model gets
  • 21:45 - 21:48
    deployed on the server, and then other,
  • 21:48 - 21:51
    you know, other data sources are going to
  • 21:51 - 21:53
    be captured from somewhere. That data is
  • 21:53 - 21:54
    pumped into the machine learning model. The
  • 21:54 - 21:55
    machine learning model generates
  • 21:55 - 21:58
    predictions, and those predictions are
  • 21:58 - 22:00
    then used to make decisions on the
  • 22:00 - 22:02
    factory floor in real time or in any
  • 22:02 - 22:05
    other particular scenario. And then you
  • 22:05 - 22:07
    constantly monitor and update the model,
  • 22:07 - 22:09
    you get more new data, and then the
  • 22:09 - 22:12
    entire cycle repeats itself. So that's
  • 22:12 - 22:14
    your machine learning workflow, okay, in a nutshell.
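As a rough sketch of that deployment step, a trained scikit-learn model is typically serialized, loaded on the server, and then used to score newly captured data. This is only an illustration; the file name, feature names, and values are assumptions:

```python
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Train a toy model (a stand-in for the real training step), then persist it to disk.
X = pd.DataFrame({"air_temp": [300.1, 302.4], "torque": [42.5, 61.0]})
y = [0, 1]
model = RandomForestClassifier(random_state=42).fit(X, y)
joblib.dump(model, "maintenance_model.joblib")

# On the deployment server: load the saved model and score freshly captured sensor data.
deployed = joblib.load("maintenance_model.joblib")
incoming = pd.DataFrame([[301.2, 55.3]], columns=["air_temp", "torque"])
print(deployed.predict(incoming))   # prediction feeds the real-time decision on the floor
```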
  • 22:14 - 22:17
    Here's another example of
  • 22:17 - 22:19
    the same thing maybe in a slightly
  • 22:19 - 22:20
    different format, so, again, you have your
  • 22:20 - 22:22
    data collection and preparation. Here we
  • 22:22 - 22:24
    talk more about the different kinds of
  • 22:24 - 22:27
    algorithms that are available to create a
  • 22:27 - 22:28
    model, and I'll talk about this more in
  • 22:28 - 22:30
    detail when we look at the real world
  • 22:30 - 22:32
    example of a end-to-end machine learning
  • 22:32 - 22:35
    workflow for the predictive maintenance
  • 22:35 - 22:37
    use case. So once you have chosen the
  • 22:37 - 22:39
    appropriate algorithm, you then have
  • 22:39 - 22:41
    trained your model, you then have
  • 22:41 - 22:44
    selected the appropriate trained model
  • 22:44 - 22:46
    among the multiple models. You are
  • 22:46 - 22:48
    probably going to develop multiple
  • 22:48 - 22:50
    models from multiple algorithms, you're
  • 22:50 - 22:52
    going to evaluate them all, and then
  • 22:52 - 22:53
    you're going to say, hey, you know what?
  • 22:53 - 22:55
    After I've evaluated and tested that,
  • 22:55 - 22:57
    I've chosen the best model, I'm going to
  • 22:57 - 23:00
    deploy the model, all right, so this is
  • 23:00 - 23:03
    for real life production use, okay? Real
  • 23:03 - 23:04
    life sensor data is going to be pumped
  • 23:04 - 23:06
    into my model, my model is going to
  • 23:06 - 23:08
    generate predictions, the predicted data
  • 23:08 - 23:10
    is going to be used immediately in real
  • 23:10 - 23:13
    time for real life decision making, and
  • 23:13 - 23:15
    then I'm going to monitor, right, the
  • 23:15 - 23:17
    results. So somebody's using the
  • 23:17 - 23:19
    predictions from my model, if the
  • 23:19 - 23:22
    predictions are lousy, that goes into the
  • 23:22 - 23:23
    monitoring, the monitoring system
  • 23:23 - 23:25
    captures that. If the predictions are
  • 23:25 - 23:28
    fantastic, well that is also captured by the
  • 23:28 - 23:30
    monitoring system, and that gets
  • 23:30 - 23:32
    fed back again to the next cycle of my
  • 23:32 - 23:34
    machine learning
  • 23:34 - 23:36
    pipeline. Okay, so that's the kind of
  • 23:36 - 23:38
    overall view, and here are the kind of
  • 23:38 - 23:42
    key phases of your workflow. So one of
  • 23:42 - 23:44
    the important phases is called EDA,
  • 23:44 - 23:48
    exploratory data analysis and in this
  • 23:48 - 23:50
    particular phase, you're going to
  • 23:50 - 23:53
    do a lot of stuff, primarily just to
  • 23:53 - 23:55
    understand your data set. So like I said,
  • 23:55 - 23:57
    real life data sets, they tend to be very
  • 23:57 - 23:59
    complex, and they tend to have various
  • 23:59 - 24:01
    statistical properties, all right,
  • 24:01 - 24:03
    statistics is a very important component
  • 24:03 - 24:06
    of machine learning. So an EDA helps you
  • 24:06 - 24:07
    to kind of get an overview of your data
  • 24:07 - 24:10
    set, get an overview of any problems in
  • 24:10 - 24:12
    your data set like any data that's
  • 24:12 - 24:13
    missing, the statistical properties of your
  • 24:13 - 24:15
    data set, the distribution of your data
  • 24:15 - 24:17
    set, the statistical correlation of
  • 24:17 - 24:19
    variables in your data set, and so on.
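As a rough illustration of what an EDA pass looks like in Python with pandas, the calls below are generic and assume a tabular data set loaded from a hypothetical CSV file:

```python
import pandas as pd

# Load whatever tabular data set you are exploring (file name is illustrative).
df = pd.read_csv("your_dataset.csv")

print(df.shape)                        # number of rows and columns
df.info()                              # column types and non-null counts
print(df.describe())                   # basic statistics of the numeric columns
print(df.isnull().sum())               # missing values per column
print(df.corr(numeric_only=True))      # pairwise correlation of the numeric variables
df.hist(figsize=(10, 8))               # distribution of each numeric column (needs matplotlib)
```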
  • 24:19 - 24:23
    Okay, then we have data cleaning, or
  • 24:23 - 24:25
    sometimes you call it data cleansing, and
  • 24:25 - 24:28
    in this phase what you want to do is
  • 24:28 - 24:29
    primarily, you want to kind of do things
  • 24:29 - 24:32
    like remove duplicate records or rows in
  • 24:32 - 24:34
    your table, you want to make sure that
  • 24:34 - 24:37
    your data or your data
  • 24:37 - 24:39
    points or your samples have appropriate IDs,
  • 24:39 - 24:41
    and most importantly, you want to make
  • 24:41 - 24:43
    sure there are not too many missing values
  • 24:43 - 24:45
    in your data set. So what I mean by
  • 24:45 - 24:46
    missing values are things like that,
  • 24:46 - 24:48
    right? You have got a data set, and for
  • 24:48 - 24:52
    some reason there are some cells or
  • 24:52 - 24:55
    locations in your data set which are
  • 24:55 - 24:57
    missing values, right? And if you have a
  • 24:57 - 24:59
    lot of these missing values, then you've
  • 24:59 - 25:00
    got a poor quality data set, and you're
  • 25:00 - 25:02
    not going to be able to build a good
  • 25:02 - 25:04
    model from this data set. You're not
  • 25:04 - 25:06
    going to be able to train a good machine
  • 25:06 - 25:08
    learning model from a data set with a
  • 25:08 - 25:10
    lot of missing values like this. So you
  • 25:10 - 25:12
    have to figure out whether there are a
  • 25:12 - 25:13
    lot of missing values in your data set,
  • 25:13 - 25:15
    how do you handle them. Another thing
  • 25:15 - 25:17
    that's important in data cleansing is
  • 25:17 - 25:19
    figuring out the outliers in your data
  • 25:19 - 25:22
    set. So outliers are things like this,
  • 25:22 - 25:24
    you know, data points that are very far from
  • 25:24 - 25:26
    the general trend of data points in your
  • 25:26 - 25:30
    data set, right? And so there are also
  • 25:30 - 25:32
    several ways to detect outliers in your
  • 25:32 - 25:34
    data set, and there are several ways to
  • 25:34 - 25:37
    handle outliers in your data set.
  • 25:37 - 25:38
    Similarly, there are several ways
  • 25:38 - 25:40
    to handle missing values in your data
  • 25:40 - 25:43
    set. So handling missing values and handling
  • 25:43 - 25:46
    outliers, those are really two very key
  • 25:46 - 25:47
    parts of data
  • 25:47 - 25:49
    cleansing, and there are many, many
  • 25:49 - 25:51
    techniques to handle this, so a data
  • 25:51 - 25:52
    scientist needs to be acquainted with
  • 25:52 - 25:55
    all of them.
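Here is a minimal pandas sketch of the two cleaning tasks just described, handling missing values and flagging outliers with the common 1.5 x IQR rule. The file name and the numeric column name "value" are illustrative assumptions:

```python
import pandas as pd

df = pd.read_csv("your_dataset.csv")   # illustrative file name

# Missing values: either drop the affected rows or fill them in (impute).
df_dropped = df.dropna()                              # option 1: remove rows with missing cells
df_filled = df.fillna(df.mean(numeric_only=True))     # option 2: replace with the column average

# Outliers: one common rule of thumb is the 1.5 * IQR fence on a numeric column.
q1, q3 = df["value"].quantile([0.25, 0.75])           # "value" is a hypothetical numeric column
iqr = q3 - q1
inside_fence = df["value"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df_no_outliers = df[inside_fence]                     # keep only the points inside the fence
```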
  • 25:55 - 25:58
    All right, why do I need to
  • 25:58 - 25:59
    do data cleansing? Well, here is the key
  • 25:59 - 26:03
    point: if you have a very poor quality data set,
  • 26:03 - 26:05
    which means you've got a lot of outliers,
  • 26:05 - 26:07
    which are errors in your data set, or you've
  • 26:07 - 26:08
    got a lot of missing values in your data
  • 26:08 - 26:11
    set, even though you've got a fantastic
  • 26:11 - 26:13
    algorithm, you've got a fantastic model,
  • 26:13 - 26:16
    the predictions that your model is going
  • 26:16 - 26:19
    to give are absolutely rubbish. It's kind
  • 26:19 - 26:22
    of like taking water and putting water
  • 26:22 - 26:26
    into the tank of a Mercedes-Benz. So a
  • 26:26 - 26:28
    Mercedes-Benz is a great car, but if you
  • 26:28 - 26:30
    take water and put it into your
  • 26:30 - 26:33
    Mercedes-Benz, it will just die, right? Your
  • 26:33 - 26:37
    car will just die, it can't run on water,
  • 26:37 - 26:38
    right? On the other hand, if you have a
  • 26:38 - 26:42
    Myvi, a Myvi is just a lousy car, but if
  • 26:42 - 26:45
    you take good high-octane petrol and
  • 26:45 - 26:47
    you put it into a Myvi, the Myvi will just go at,
  • 26:47 - 26:49
    you know, 100 miles an hour, which just
  • 26:49 - 26:51
    completely destroys the Mercedes-Benz in
  • 26:51 - 26:53
    terms of performance. So it doesn't
  • 26:53 - 26:55
    really matter what model you're
  • 26:55 - 26:57
    using, right? So you can be using the most
  • 26:57 - 26:59
    fantastic model, like the
  • 26:59 - 27:01
    Mercedes-Benz of machine learning, but if
  • 27:01 - 27:03
    your data is lousy quality, your
  • 27:03 - 27:06
    predictions are also going to be rubbish.
  • 27:06 - 27:10
    Okay, so cleansing the data set is in fact
  • 27:10 - 27:12
    probably the most important thing that
  • 27:12 - 27:14
    data scientists need to do, and that's
  • 27:14 - 27:16
    what they spend most of the time doing,
  • 27:16 - 27:18
    right? Building the model, training the
  • 27:18 - 27:20
    model, getting the right algorithms, and
  • 27:20 - 27:23
    so on, that's really a small portion of
  • 27:23 - 27:25
    the actual machine learning workflow,
  • 27:25 - 27:27
    right? In the actual machine learning
  • 27:27 - 27:30
    workflow, the vast majority of time is spent on
  • 27:30 - 27:32
    cleaning and organizing your data.
  • 27:32 - 27:33
    Then you have something called
  • 27:33 - 27:35
    feature engineering, which is where you
  • 27:35 - 27:37
    pre-process the feature variables of
  • 27:37 - 27:39
    your original data set prior to using
  • 27:39 - 27:41
    them to train the model, and this is
  • 27:41 - 27:42
    either through addition, deletion,
  • 27:42 - 27:44
    combination, or transformation of these
  • 27:44 - 27:45
    variables. The idea is that you want
  • 27:45 - 27:47
    to improve the predictive accuracy of
  • 27:47 - 27:49
    the model, and also, because some models
  • 27:49 - 27:51
    can only work with numeric data, you
  • 27:51 - 27:54
    need to transform categorical data into
  • 27:54 - 27:57
    numeric data. All right, so just now, in
  • 27:57 - 27:59
    the earlier slides, I showed you that you
  • 27:59 - 28:01
    take your original data set, you pump it
  • 28:01 - 28:03
    into the algorithm, and then a couple of hours
  • 28:03 - 28:05
    later you get a machine learning model,
  • 28:05 - 28:09
    right? So you didn't do anything to your
  • 28:09 - 28:10
    data set, to the feature variables in
  • 28:10 - 28:12
    your data set, before you pumped it into the
  • 28:12 - 28:14
    machine learning algorithm. So
  • 28:14 - 28:16
    what I showed you earlier is you just
  • 28:16 - 28:19
    take the data set exactly as it is and
  • 28:19 - 28:21
    you just pump it into the algorithm, and a
  • 28:21 - 28:23
    couple of hours later you get the model,
  • 28:23 - 28:28
    right? But that's not what generally
  • 28:28 - 28:30
    happens in real life. In real life,
  • 28:30 - 28:32
    you're going to take all the original
  • 28:32 - 28:34
    feature variables from your data set and
  • 28:34 - 28:37
    you're going to transform them in some
  • 28:37 - 28:39
    way. So you can see here, these are the
  • 28:39 - 28:42
    columns of data from my original data set,
  • 28:42 - 28:46
    and before I actually put all these data
  • 28:46 - 28:48
    points from my original data set into my
  • 28:48 - 28:51
    algorithm to train and get my model, I
  • 28:51 - 28:55
    will actually transform them. Okay, so the
  • 28:55 - 28:58
    transformation of these feature variable
  • 28:58 - 29:01
    values, we call this feature engineering,
  • 29:01 - 29:02
    and there are many, many techniques to do
  • 29:02 - 29:05
    feature engineering: one-hot encoding,
  • 29:05 - 29:08
    scaling, log transformation,
  • 29:08 - 29:10
    discretization, date extraction, Boolean
  • 29:10 - 29:12
    logic, and so on.
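As a small illustration of two of the techniques just listed, one-hot encoding and scaling, here is a sketch with pandas and scikit-learn on a made-up DataFrame; the column names merely echo the kind of data discussed later and are assumptions:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "Type": ["L", "M", "H", "L"],                   # categorical feature
    "Rotational speed": [1500, 1420, 1680, 1550],   # numeric features
    "Torque": [42.5, 55.0, 30.2, 48.7],
})

# One-hot encoding: turn the categorical "Type" column into numeric indicator columns.
df_encoded = pd.get_dummies(df, columns=["Type"])

# Scaling: put the numeric columns on a comparable scale before training.
scaler = StandardScaler()
df_encoded[["Rotational speed", "Torque"]] = scaler.fit_transform(
    df_encoded[["Rotational speed", "Torque"]]
)
print(df_encoded.head())
```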
  • 29:12 - 29:15
    Okay, then finally we do something
  • 29:15 - 29:17
    called a train-test split, where we
  • 29:17 - 29:19
    take our original data set, right? So this
  • 29:19 - 29:21
    was the original data set, and we break
  • 29:21 - 29:24
    it into two parts, so one is called the
  • 29:24 - 29:26
    training data set and the other is
  • 29:26 - 29:28
    called the test data set. And the primary
  • 29:28 - 29:30
    purpose for this is, when we feed and
  • 29:30 - 29:31
    train the machine learning model, we're
  • 29:31 - 29:33
    going to use what is called the training
  • 29:33 - 29:36
    data set, and when we want to evaluate
  • 29:36 - 29:37
    the accuracy of the model, we use the test data set. So this
  • 29:37 - 29:41
    is the key part of your machine learning
  • 29:41 - 29:44
    life cycle, because you are not only just
  • 29:44 - 29:45
    going to have one possible model,
  • 29:45 - 29:48
    because there are a vast range of
  • 29:48 - 29:50
    algorithms that you can use to create a
  • 29:50 - 29:53
    model. So fundamentally you have a wide
  • 29:53 - 29:56
    range of choices, right? Like a wide range
  • 29:56 - 29:58
    of cars, right? You want to buy a car, you
  • 29:58 - 30:01
    can buy a Myvi, you can buy a Perodua,
  • 30:01 - 30:03
    you can buy a Honda, you can buy a
  • 30:03 - 30:05
    Mercedes-Benz, you can buy an Audi, you can
  • 30:05 - 30:08
    buy a Beemer, many, many different cars
  • 30:08 - 30:09
    that are available for you if you want
  • 30:09 - 30:12
    to buy a car, right? Same thing with a
  • 30:12 - 30:14
    machine learning model: there are a vast
  • 30:14 - 30:17
    variety of algorithms that you can
  • 30:17 - 30:19
    choose from in order to create a model.
  • 30:19 - 30:22
    And so once you create a model from a
  • 30:22 - 30:24
    given algorithm, you need to say, hey, how
  • 30:24 - 30:26
    accurate is this model that I have created
  • 30:26 - 30:29
    from this algorithm? And different
  • 30:29 - 30:30
    algorithms are going to create different
  • 30:30 - 30:34
    models with different rates of accuracy,
  • 30:34 - 30:36
    and so the primary purpose of the test
  • 30:36 - 30:38
    data set is to evaluate the accuracy
  • 30:38 - 30:41
    of the model, to see, hey, is this model
  • 30:41 - 30:43
    that I've created using this algorithm,
  • 30:43 - 30:46
    is it adequate for me to use in a real
  • 30:46 - 30:49
    life production use case? Okay, so that's
  • 30:49 - 30:52
    what it's all about. Okay, so this is my
  • 30:52 - 30:54
    original data set. I break it into my
  • 30:54 - 30:57
    feature data set and
  • 30:57 - 30:59
    also my target variable column, so my
  • 30:59 - 31:01
    feature variable columns and the target
  • 31:01 - 31:02
    variable column, and then I further break
  • 31:02 - 31:04
    it into a training data set and a test
  • 31:04 - 31:07
    data set. The training data set is used
  • 31:07 - 31:08
    to train and create the machine learning
  • 31:08 - 31:10
    model, and then once the machine learning
  • 31:10 - 31:12
    model is created, I then use the test
  • 31:12 - 31:15
    data set to evaluate the accuracy of the
  • 31:15 - 31:16
    machine learning model.
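A minimal sketch of that split-train-evaluate pattern with scikit-learn; the tiny data set below is just a stand-in, since in practice X and y come from the preparation steps described earlier:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Tiny stand-in data set; in practice X and y come from the earlier preparation steps.
X = pd.DataFrame({"torque": [40, 65, 30, 70, 45, 62, 38, 71],
                  "tool_wear": [10, 200, 15, 210, 20, 190, 12, 220]})
y = pd.Series([0, 1, 0, 1, 0, 1, 0, 1])     # 0 = no failure, 1 = failure

# Hold out 25% of the rows as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)                  # train on the training portion only

y_pred = model.predict(X_test)               # predict on rows the model has never seen
print("Accuracy:", accuracy_score(y_test, y_pred))
```

The stratify=y option keeps the class proportions the same in both portions, which matters for data sets like this one where failures are much rarer than non-failures.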
  • 31:16 - 31:21
    All right, and then finally we can
  • 31:21 - 31:23
    see what are the different parts or
  • 31:23 - 31:26
    aspects that go into a successful model:
  • 31:26 - 31:30
    so EDA, about 10%; data cleansing, about
  • 31:30 - 31:32
    20%; feature engineering, about
  • 31:32 - 31:36
    25%; selecting a specific algorithm, about
  • 31:36 - 31:39
    10%; and then training the model from
  • 31:39 - 31:42
    that algorithm, about 15%; and then
  • 31:42 - 31:44
    finally evaluating the model, deciding
  • 31:44 - 31:46
    which is the best model with the highest
  • 31:46 - 31:51
    accuracy rate, that's about
  • 31:54 - 31:57
    20%. All right, so we have reached the
  • 31:57 - 31:59
    most interesting part of this
  • 31:59 - 32:01
    presentation, which is the demonstration
  • 32:01 - 32:04
    of an end-to-end machine learning workflow
  • 32:04 - 32:06
    on a real life data set that
  • 32:06 - 32:10
    demonstrates the use case of predictive
  • 32:10 - 32:14
    maintenance. So for the data set for
  • 32:14 - 32:16
    this particular use case, I've used a
  • 32:16 - 32:19
    data set from Kaggle. So for those of you
  • 32:19 - 32:21
    who are not aware of this, Kaggle is the
  • 32:21 - 32:25
    world's largest open-source community
  • 32:25 - 32:28
    for data science and AI, and they have a
  • 32:28 - 32:31
    large collection of data sets from
  • 32:31 - 32:34
    various areas of industry and human
  • 32:34 - 32:37
    endeavor, and they also have a large
  • 32:37 - 32:39
    collection of models that have been
  • 32:39 - 32:43
    developed using these data sets. So here
  • 32:43 - 32:47
    we have a data set for the particular
  • 32:47 - 32:51
    use case, predictive maintenance. Okay, so
  • 32:51 - 32:53
    this is some information about the data
  • 32:53 - 32:56
    set, so in case you do not know how
  • 32:56 - 32:59
    to get there, this is the URL to click
  • 32:59 - 33:02
    on, okay, to get to that data set. So once
  • 33:02 - 33:05
    you're at the
  • 33:05 - 33:07
    page for this data set, you can see
  • 33:07 - 33:10
    all the information about this data set,
  • 33:10 - 33:13
    and you can download the data set in
  • 33:13 - 33:14
    CSV format.
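Once the CSV has been downloaded, loading it and taking a first look is a couple of lines with pandas. The file name below is an assumption; use whatever name you saved the download as:

```python
import pandas as pd

# Load the downloaded Kaggle CSV (file name assumed).
df = pd.read_csv("predictive_maintenance.csv")

print(df.shape)     # expect roughly 10,000 rows
print(df.head())    # first few samples: feature columns plus the target column
```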
  • 33:14 - 33:16
    Okay, so let's take a look at the
  • 33:16 - 33:20
    data set. So this data set has a total of
  • 33:20 - 33:23
    10,000 samples, okay, and these are the
  • 33:23 - 33:26
    feature variables: the type, the product
  • 33:26 - 33:28
    ID, the air temperature, process
  • 33:28 - 33:31
    temperature, rotational speed, torque, tool
  • 33:31 - 33:35
    wear, and this is the target variable.
  • 33:35 - 33:37
    All right, so the target variable is what
  • 33:37 - 33:38
    we are interested in, what we are
  • 33:38 - 33:41
    interested in using to train the machine
  • 33:41 - 33:43
    learning model, and also what we are
  • 33:43 - 33:45
    interested to predict. Okay, so these are
  • 33:45 - 33:48
    the feature variables; they describe or
  • 33:48 - 33:50
    they provide information about this
  • 33:50 - 33:53
    particular machine on the production
  • 33:53 - 33:55
    line, on the assembly line. So you might
  • 33:55 - 33:57
    know the product ID, the type, the air
  • 33:57 - 33:58
    temperature, process temperature,
  • 33:58 - 34:00
    rotational speed, torque, tool wear, right? So
  • 34:00 - 34:03
    let's say you've got an IoT sensor system
  • 34:03 - 34:06
    that's basically capturing all this data
  • 34:06 - 34:08
    about a product or a machine on your
  • 34:08 - 34:11
    production or assembly line, okay, and
  • 34:11 - 34:14
    you've also captured information about
  • 34:14 - 34:17
    whether, for a specific sample,
  • 34:17 - 34:20
    that sample experienced a failure or not.
  • 34:20 - 34:23
    Okay, so the target value
  • 34:23 - 34:26
    of zero, okay, indicates that there's no
  • 34:26 - 34:28
    failure, so zero means no failure, and we
  • 34:28 - 34:30
    can see that the vast majority of data
  • 34:30 - 34:33
    points in this data set are no failure.
  • 34:33 - 34:34
    And here we can see an example
  • 34:34 - 34:37
    where you have a case of a failure. So a
  • 34:37 - 34:40
    failure is marked as a one, positive, and
  • 34:40 - 34:43
    no failure is marked as zero, negative,
  • 34:43 - 34:45
    all right? So here we have one type of
  • 34:45 - 34:47
    failure, it's called a power failure, and
  • 34:47 - 34:49
    if you scroll down the data set, you see
  • 34:49 - 34:50
    there are also other kinds of failures,
  • 34:50 - 34:53
    like a tool wear
  • 34:53 - 34:57
    failure. We have an overstrain failure
  • 34:57 - 34:59
    here, for example,
  • 34:59 - 35:01
    and we also have a power failure again,
  • 35:01 - 35:02
    and so on. So if you scroll down through
  • 35:02 - 35:04
    these 10,000 data points, or if
  • 35:04 - 35:06
    you're familiar with using Excel to
  • 35:06 - 35:09
    filter out values in a column, you can
  • 35:09 - 35:12
    see that in this particular column here,
  • 35:12 - 35:14
    which is the so-called target variable
  • 35:14 - 35:17
    column, you are going to have the vast
  • 35:17 - 35:19
    majority of values as zero, which means
  • 35:19 - 35:23
    no failure, and in some of the rows or
  • 35:23 - 35:24
    data points you are going to have a
  • 35:24 - 35:26
    value of one. And for those rows where you
  • 35:26 - 35:28
    have a value of one, for example
  • 35:28 - 35:31
    here, you
  • 35:31 - 35:33
    are going to have different types of
  • 35:33 - 35:35
    failure: like I said just now, power failure, tool wear failure, and so on.
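To see this class imbalance without scrolling through Excel, a quick pandas check works too. The column names "Target" and "Failure Type" are assumed to match the Kaggle page:

```python
import pandas as pd

df = pd.read_csv("predictive_maintenance.csv")   # same assumed file name as before

# Count how many rows fall into each class.
print(df["Target"].value_counts())         # 0 = no failure (the vast majority), 1 = failure
print(df["Failure Type"].value_counts())   # breakdown of the individual failure types
```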
  • 35:35 - 35:39
    So we are
  • 35:39 - 35:41
    going to go through the entire machine
  • 35:41 - 35:44
    learning workflow process with this data
  • 35:44 - 35:47
    set. So to see an example of that, we are
  • 35:47 - 35:50
    going to go to the
  • 35:50 - 35:52
    code section here. All right, so if I
  • 35:52 - 35:54
    click on the code section here, right
  • 35:54 - 35:56
    down here we can see what is called a
  • 35:56 - 35:59
    dataset notebook. So this is basically a
  • 35:59 - 36:02
    Jupyter notebook. Jupyter is basically a
  • 36:02 - 36:05
    Python application which allows you to
  • 36:05 - 36:09
    create a Python machine learning
  • 36:09 - 36:12
    program that basically builds your
  • 36:12 - 36:15
    machine learning model, assesses or
  • 36:15 - 36:16
    evaluates its accuracy, and generates
  • 36:16 - 36:19
    predictions from it. Okay, so here we have
  • 36:19 - 36:22
    a whole bunch of Jupyter notebooks that
  • 36:22 - 36:25
    are available, and you can select any one
  • 36:25 - 36:26
    of them; all these notebooks are
  • 36:26 - 36:29
    essentially going to process the data
  • 36:29 - 36:32
    from this particular data set. So if I go
  • 36:32 - 36:35
    to this code page here, I've actually
  • 36:35 - 36:37
    selected a specific notebook that I'm
  • 36:37 - 36:40
    going to run through to demonstrate an
  • 36:40 - 36:43
    end-to-end machine learning workflow using
  • 36:43 - 36:46
    various machine learning libraries from
  • 36:46 - 36:50
    the Python programming language. Okay, so
  • 36:50 - 36:52
    the particular notebook I'm going to
  • 36:52 - 36:55
    use is this particular notebook here, and
  • 36:55 - 36:57
    you can also get the URL for that
  • 36:57 - 37:00
    particular notebook from
  • 37:00 - 37:04
    here okay so let's quickly do a quick
  • 37:04 - 37:06
    revision again what are we trying to do
  • 37:06 - 37:08
    here we're trying to build a machine
  • 37:08 - 37:11
    learning classification model right so
  • 37:11 - 37:13
    we said there are two primary areas of
  • 37:13 - 37:15
    supervised learning one is regression
  • 37:15 - 37:16
    which is used to predict a numerical
  • 37:16 - 37:19
    Target variable and the second kind of
  • 37:19 - 37:21
    supervised learning is classification
  • 37:21 - 37:23
    which is what we're doing here we're
  • 37:23 - 37:26
    trying to predict a categorical Target
  • 37:26 - 37:30
    variable okay so in this particular
  • 37:30 - 37:32
    example we actually have two kinds of
  • 37:32 - 37:34
    ways we can classify either a binary
  • 37:34 - 37:38
    classification or a multiclass
  • 37:38 - 37:40
    classification so for binary
  • 37:40 - 37:41
    classification we are only going to
  • 37:41 - 37:43
    classify the product or machine as
  • 37:43 - 37:47
    either it failed or it did not fail okay
  • 37:47 - 37:49
    so if we go back to the data set that I
  • 37:49 - 37:51
    showed you just now if you look at this
  • 37:51 - 37:53
    target variable column there are only
  • 37:53 - 37:55
    two possible values here they either
  • 37:55 - 37:58
    zero or one zero means there's no failure
  • 37:58 - 38:01
    one means that's a failure okay so this
  • 38:01 - 38:03
    is an example of a binary classification
  • 38:03 - 38:07
    only two possible outcomes zero or one
  • 38:07 - 38:10
    didn't fail or fail all right two
  • 38:10 - 38:13
    possible outcomes and then we can also
  • 38:13 - 38:15
    for the same data set we can extend it
  • 38:15 - 38:18
    and make it a multiclass classification
  • 38:18 - 38:21
    problem all right so if we kind of want
  • 38:21 - 38:24
    to drill down further we can say that
  • 38:24 - 38:27
    not only is there a failure we can
  • 38:27 - 38:29
    actually say that there are different types of
  • 38:29 - 38:32
    failures okay so we have one category of
  • 38:32 - 38:36
    class that is basically no failure okay
  • 38:36 - 38:37
    then we have a category for the
  • 38:37 - 38:40
    different types of failures right so you
  • 38:40 - 38:44
    can have a power failure you could have
  • 38:44 - 38:46
    a tool wear
  • 38:46 - 38:49
    failure uh you could have let's go down
  • 38:49 - 38:51
    here you could have a over strain
  • 38:51 - 38:54
    failure and etc etc so you can have
  • 38:54 - 38:57
    multiple classes of failure in addition
  • 38:57 - 39:01
    to the general overall or the majority
  • 39:01 - 39:04
    class of no failure and that would be a
  • 39:04 - 39:07
    multiclass classification problem so
  • 39:07 - 39:08
    with this data set we are going to see
  • 39:08 - 39:11
    how to make it a binary classification
  • 39:11 - 39:13
    problem and also a multiclass
  • 39:13 - 39:15
    classification problem okay so let's
  • 39:15 - 39:17
    look at the workflow so let's say we've
  • 39:17 - 39:19
    already got the data so right now we do
  • 39:19 - 39:21
    have the data set this is the data set
  • 39:21 - 39:23
    that we have so let's assume we've
  • 39:23 - 39:25
    somehow managed to get this data set
  • 39:25 - 39:27
    from some iot sensors that are
  • 39:27 - 39:29
    monitoring realtime data in our
  • 39:29 - 39:31
    production environment on the assembly
  • 39:31 - 39:33
    line on the production line we've got
  • 39:33 - 39:35
    sensors reading data that gives us all
  • 39:35 - 39:38
    these data that we have in this CSV file
  • 39:38 - 39:40
    Okay so we've already got the data we've
  • 39:40 - 39:42
    retrieved the data now we're going to go
  • 39:42 - 39:45
    on to the cleaning and exploration part
  • 39:45 - 39:48
    of your machine learning life cycle all
  • 39:48 - 39:50
    right so let's look at the data cleaning
  • 39:50 - 39:51
    part so the data cleaning part we
  • 39:51 - 39:54
    interested in uh checking for missing
  • 39:54 - 39:56
    values and maybe removing the rows with
  • 39:56 - 39:58
    missing values okay
  • 39:58 - 40:00
    uh so the kind of things we can sorry
  • 40:00 - 40:01
    the kind of things we can do in missing
  • 40:01 - 40:03
    values we can remove the rows with missing
  • 40:03 - 40:06
    values we can put in some new values uh
  • 40:06 - 40:08
    some replacement values which could be a
  • 40:08 - 40:10
    average of all the values in that
  • 40:10 - 40:13
    particular column etc etc we also try to
  • 40:13 - 40:15
    identify outliers in our data set and
  • 40:15 - 40:17
    also there are a variety of ways to deal
  • 40:17 - 40:19
    with that so this is called Data
  • 40:19 - 40:21
    cleansing which is a really important
  • 40:21 - 40:23
    part of your machine learning workflow
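
To make that concrete, here is a minimal pandas sketch of the two missing-value strategies just described (drop the rows, or fill in with the column average). The file name predictive_maintenance.csv is an assumption, and this particular data set may in fact contain no missing values at all:

    import pandas as pd

    df = pd.read_csv("predictive_maintenance.csv")  # assumed file name

    # How many missing values does each column have?
    print(df.isna().sum())

    # Option 1: drop every row that contains a missing value
    df_dropped = df.dropna()

    # Option 2: replace missing numeric values with the column average
    numeric_cols = df.select_dtypes(include="number").columns
    df_filled = df.copy()
    df_filled[numeric_cols] = df_filled[numeric_cols].fillna(
        df_filled[numeric_cols].mean())
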
  • 40:23 - 40:26
    right so that's where we are now at
  • 40:26 - 40:27
    we're doing cleansing and then we're
  • 40:27 - 40:29
    going to follow up with
  • 40:29 - 40:31
    exploration so let's look at the actual
  • 40:31 - 40:33
    code that does the cleansing here so
  • 40:33 - 40:36
    here we are right at the start of the uh
  • 40:36 - 40:38
    machine learning uh life cycle here so
  • 40:38 - 40:41
    this is a Jupiter notebook so here we
  • 40:41 - 40:43
    have a brief description of the problem
  • 40:43 - 40:46
    statement all right so this data set
  • 40:46 - 40:48
    reflects real life predictive
  • 40:48 - 40:49
    maintenance encountered in industry with
  • 40:49 - 40:50
    measurements from real equipment the
  • 40:50 - 40:52
    features description is taken directly
  • 40:52 - 40:55
    from the data set source so here we have
  • 40:55 - 40:57
    a description of the six key features in
  • 40:57 - 41:00
    our data set type which is the quality
  • 41:00 - 41:03
    of the product the air temperature the
  • 41:03 - 41:05
    process temperature the rotational speed
  • 41:05 - 41:07
    the torque and the tool wear all right so
  • 41:07 - 41:09
    these are the six feature variables and
  • 41:09 - 41:11
    there are the two target variables so
  • 41:11 - 41:13
    just now I showed you just now there's
  • 41:13 - 41:15
    one target variable which only has two
  • 41:15 - 41:17
    possible values either zero or one okay
  • 41:17 - 41:20
    zero or one means failure or no failure
  • 41:20 - 41:23
    so that will be this column here right
  • 41:23 - 41:25
    so let me go all the way back up to here
  • 41:25 - 41:27
    so this column here we already saw it
  • 41:27 - 41:29
    only has two values either zero or
  • 41:29 - 41:33
    one and then we also have this column
  • 41:33 - 41:35
    here and this column here is basically
  • 41:35 - 41:38
    the failure type and so the we have as I
  • 41:38 - 41:41
    already demonstrated just now we do have
  • 41:41 - 41:43
    uh several categories of or types of
  • 41:43 - 41:46
    failure and so here we call this
  • 41:46 - 41:47
    multiclass
  • 41:47 - 41:50
    classification so we can either build a
  • 41:50 - 41:52
    binary classification model for this
  • 41:52 - 41:54
    problem domain or we can build a
  • 41:54 - 41:55
    multiclass
  • 41:55 - 41:58
    classification problem all right so this
  • 41:58 - 42:00
    jupyter notebook is going to demonstrate
  • 42:00 - 42:02
    both approaches to us so first step we
  • 42:02 - 42:05
    are going to write all this python code
  • 42:05 - 42:07
    that's going to import all the libraries
  • 42:07 - 42:09
    that we need to use okay so this is
  • 42:09 - 42:12
    basically python code okay and it's
  • 42:12 - 42:15
    importing the relevant machine learn
  • 42:15 - 42:18
    oops we are importing the relevant
  • 42:18 - 42:21
    machine learning libraries related to
  • 42:21 - 42:24
    our domain use case okay then we load in
  • 42:24 - 42:26
    our data set okay so this our data set
  • 42:26 - 42:28
    we describe it we have some quick
  • 42:28 - 42:31
    insights into the data set um and then
  • 42:31 - 42:33
    we just take a look at all the variables
  • 42:33 - 42:36
    of the feature variables Etc and so on
  • 42:36 - 42:38
    we just what we're doing now is just
  • 42:38 - 42:40
    doing a quick overview of the data set
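
As a rough sketch of what those first notebook cells look like (not the notebook's exact code; the file name and the column names Target and Failure Type are assumptions based on the description above):

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns

    df = pd.read_csv("predictive_maintenance.csv")

    print(df.shape)       # number of rows and columns
    print(df.dtypes)      # data type of every column
    print(df.head())      # first few data points
    print(df.describe())  # basic statistics for the numeric columns

    # Zoom in on the two target variables
    print(df["Target"].value_counts())        # failures vs non-failures
    print(df["Failure Type"].value_counts())  # counts per failure category
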
  • 42:40 - 42:42
    so this all this python code here they
  • 42:42 - 42:44
    were writing is allowing us the data
  • 42:44 - 42:45
    scientist to get a quick overview of our
  • 42:45 - 42:48
    data set right okay like how many
  • 42:48 - 42:50
    how many rows are there how many columns
  • 42:50 - 42:52
    are there what are the data types of the
  • 42:52 - 42:53
    columns what are the names of the columns
  • 42:53 - 42:57
    etc etc okay then we zoom in on to the
  • 42:57 - 42:59
    Target variables so we look at the
  • 42:59 - 43:02
    Target variables how many uh counts
  • 43:02 - 43:05
    there are of this target variable uh and
  • 43:05 - 43:06
    so on how many different types of
  • 43:06 - 43:08
    failures there are then you want to
  • 43:08 - 43:09
    check whether there are any
  • 43:09 - 43:11
    inconsistencies between the Target and
  • 43:11 - 43:14
    the failure type Etc okay so when you do
  • 43:14 - 43:15
    all this checking you're going to
  • 43:15 - 43:17
    discover there are some discrepancies in
  • 43:17 - 43:20
    your data set so using a specific python
  • 43:20 - 43:22
    code to do checking you're going to say
  • 43:22 - 43:23
    hey you know what there's some errors
  • 43:23 - 43:25
    here right there are nine values that
  • 43:25 - 43:27
    classified as failure in the Target variable
  • 43:27 - 43:28
    but as no failure in the failure type
  • 43:28 - 43:30
    variable so that means there's a
  • 43:30 - 43:33
    discrepancy in your data point right so
  • 43:33 - 43:35
    which are so these are all the ones that
  • 43:35 - 43:36
    are discrepancies because the target
  • 43:36 - 43:39
    variable says one and we already know
  • 43:39 - 43:41
    that Target variable one is supposed to
  • 43:41 - 43:43
    mean that it's a failure right target
  • 43:43 - 43:45
    variable one is supposed to mean that it's
  • 43:45 - 43:47
    a failure so we are kind of expecting to
  • 43:47 - 43:50
    see the failure classification but some
  • 43:50 - 43:51
    rows actually say there's no failure
  • 43:51 - 43:54
    although the target type is one but here
  • 43:54 - 43:56
    is a classic example of an error that
  • 43:56 - 43:59
    can very well occur in a data set so now
  • 43:59 - 44:01
    the question is what do you do with
  • 44:01 - 44:05
    these errors in your data set right so
  • 44:05 - 44:06
    here the data scientist says I think it
  • 44:06 - 44:08
    would make sense to remove those
  • 44:08 - 44:10
    instances and so they write some code
  • 44:10 - 44:13
    then to remove those instances or those
  • 44:13 - 44:15
    uh rows or data points from the overall
  • 44:15 - 44:17
    data set and same thing we can again
  • 44:17 - 44:19
    check for other issues so we find there's
  • 44:19 - 44:21
    another issue here with our data set which
  • 44:21 - 44:24
    is another warning so again we can
  • 44:24 - 44:26
    possibly remove them so you're going to
  • 44:26 - 44:31
    remove 27 instances or rows from your
  • 44:31 - 44:34
    overall data set so your data set has a
  • 44:34 - 44:37
    10,000 uh rows or data points you're
  • 44:37 - 44:40
    removing 27 which is only 0.27% of the
  • 44:40 - 44:42
    entire data set and these were the
  • 44:42 - 44:46
    reasons why you remove them okay so if
  • 44:46 - 44:48
    you're just removing 0.27% of the
  • 44:48 - 44:51
    entire data set no big deal right still
  • 44:51 - 44:53
    okay but you needed to remove them
  • 44:53 - 44:56
    because these errors right this
  • 44:56 - 44:58
    27 um
  • 44:58 - 45:01
    errors okay data points with errors in
  • 45:01 - 45:03
    your data set could really affect the
  • 45:03 - 45:05
    training of your machine learning model
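
A minimal sketch of that consistency check and clean-up, assuming the columns are named Target and Failure Type and the label string is "No Failure" (the notebook goes on to remove a second group of problem rows in the same way, 27 rows in total):

    # Rows flagged as a failure in the binary target but labelled "No Failure"
    mask_bad = (df["Target"] == 1) & (df["Failure Type"] == "No Failure")
    print(mask_bad.sum(), "inconsistent rows found")

    # Drop the inconsistent rows and report how much data was removed
    rows_before = len(df)
    df = df[~mask_bad].reset_index(drop=True)
    removed = rows_before - len(df)
    print(f"Removed {removed} rows ({removed / rows_before:.2%} of the data set)")
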
  • 45:05 - 45:09
    so we need to do your data cleansing
  • 45:09 - 45:12
    right so we are actually cleansing now
  • 45:12 - 45:15
    uh uh some kind of data that is
  • 45:15 - 45:18
    incorrect or erroneous in your original
  • 45:18 - 45:21
    data set okay so then we go on to the
  • 45:21 - 45:24
    next part which is called EDA right so
  • 45:24 - 45:29
    EDA or exploratory data analysis is where we kind of explore our data
  • 45:29 - 45:32
    and we want to kind of get a visual
  • 45:32 - 45:34
    overview of our data as a whole and also
  • 45:34 - 45:36
    take a look at the statistical
  • 45:36 - 45:38
    properties of data the statistical
  • 45:38 - 45:40
    distribution of the data in all the
  • 45:40 - 45:43
    various colums the correlation between
  • 45:43 - 45:45
    the variables between the feature
  • 45:45 - 45:47
    variables different columns and also the
  • 45:47 - 45:49
    feature variable and the target variable
  • 45:49 - 45:52
    so all of this is called EDA and EDA in
  • 45:52 - 45:54
    a machine learning workflow is typically
  • 45:54 - 45:57
    done through visualization
  • 45:57 - 45:59
    all right so let's go back here and take
  • 45:59 - 46:01
    a look right so for example here we are
  • 46:01 - 46:03
    looking at correlation so we plot the
  • 46:03 - 46:06
    values of all the various feature
  • 46:06 - 46:08
    variables against each other and look
  • 46:08 - 46:11
    for potential correlations and patterns
  • 46:11 - 46:13
    and so on and all the different shapes
  • 46:13 - 46:17
    that you see here in this pair plot okay
  • 46:17 - 46:18
    uh will have different meaning
  • 46:18 - 46:20
    statistical meaning and so the data
  • 46:20 - 46:22
    scientist has to kind of visually
  • 46:22 - 46:24
    inspect this pair plot and make some
  • 46:24 - 46:26
    interpretations of these different
  • 46:26 - 46:28
    patterns that he sees here all right so
  • 46:28 - 46:30
    these are some of the insights that that
  • 46:30 - 46:33
    can be deduced from looking at these
  • 46:33 - 46:34
    patterns so for example the torque and
  • 46:34 - 46:36
    rotational speed are highly correlated
  • 46:36 - 46:38
    the process temperature and air
  • 46:38 - 46:40
    temperature are highly correlated and that
  • 46:40 - 46:42
    failures occur for extreme values of
  • 46:42 - 46:45
    some features etc etc then you can plot
  • 46:45 - 46:46
    certain kinds of charts this is called a
  • 46:46 - 46:48
    violin chart to again get new insights
  • 46:48 - 46:50
    for example regarding the torque and
  • 46:50 - 46:51
    rotational speed we can see again that
  • 46:51 - 46:53
    most failures are triggered for much
  • 46:53 - 46:55
    lower or much higher values than the
  • 46:55 - 46:57
    mean when they're not failing so all
  • 46:57 - 47:01
    these visualizations they are there and
  • 47:01 - 47:02
    a trained data scientist can look at
  • 47:02 - 47:05
    them inspect them and make some kind of
  • 47:05 - 47:08
    insightful deductions from them okay
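
For reference, the kind of seaborn code behind those plots might look like this (a sketch, assuming the feature column names listed in the data description):

    import matplotlib.pyplot as plt
    import seaborn as sns

    features = ["Air temperature [K]", "Process temperature [K]",
                "Rotational speed [rpm]", "Torque [Nm]", "Tool wear [min]"]

    # Pair plot of the numeric features, coloured by failure vs no failure
    sns.pairplot(df, vars=features, hue="Target")
    plt.show()

    # Correlation heat map between the features and the target
    corr = df[features + ["Target"]].corr()
    sns.heatmap(corr, annot=True, cmap="coolwarm")
    plt.show()
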
  • 47:08 - 47:11
    percentage of failure right uh the
  • 47:11 - 47:14
    correlation heat map okay between all
  • 47:14 - 47:16
    these different feature variables and
  • 47:16 - 47:17
    also the target
  • 47:17 - 47:20
    variable okay uh the product types
  • 47:20 - 47:21
    percentage of product types percentage
  • 47:21 - 47:23
    of failure with respect to the product
  • 47:23 - 47:26
    type so we can also kind of visualize
  • 47:26 - 47:28
    that as well so certain products have a
  • 47:28 - 47:30
    higher ratio of failure compared to other
  • 47:30 - 47:33
    product types Etc or for example uh M
  • 47:33 - 47:36
    tends to fail more than H products etc
  • 47:36 - 47:39
    etc so we can create a vast variety of
  • 47:39 - 47:41
    visualizations in the EDA stage so you
  • 47:41 - 47:44
    can see here and again the idea of this
  • 47:44 - 47:46
    visualization is just to give us some
  • 47:46 - 47:50
    insight some preliminary insight into
  • 47:50 - 47:53
    our data set that helps us to model it
  • 47:53 - 47:54
    more correctly so some more insights
  • 47:54 - 47:56
    that we get into our data set from all
  • 47:56 - 47:58
    this visualization
  • 47:58 - 48:00
    then we can plot the distribution so we
  • 48:00 - 48:01
    can see whether it's a normal
  • 48:01 - 48:03
    distribution or some other kind of
  • 48:03 - 48:06
    distribution uh we can have a box plot
  • 48:06 - 48:08
    to see whether there are any outliers in
  • 48:08 - 48:10
    your data set and so on right so we can
  • 48:10 - 48:12
    see from the box plots we can see
  • 48:12 - 48:15
    rotational speed and torque have outliers so we
  • 48:15 - 48:17
    already saw outliers are basically a
  • 48:17 - 48:19
    problem that you may need to kind of
  • 48:19 - 48:23
    tackle right so outliers are an issue uh
  • 48:23 - 48:25
    it's a it's a part of data cleansing and
  • 48:25 - 48:27
    so you may need to tackle this so we may
  • 48:27 - 48:29
    have to check okay well where are the
  • 48:29 - 48:31
    potential outliers so we can analyze
  • 48:31 - 48:35
    them from the box plot okay um but then
  • 48:35 - 48:37
    we can say well they are outliers but
  • 48:37 - 48:39
    maybe they're not really horrible
  • 48:39 - 48:41
    outliers so we can tolerate them or
  • 48:41 - 48:43
    maybe we want to remove them so we can
  • 48:43 - 48:45
    see what the mean and maximum values for
  • 48:45 - 48:47
    all these with respect to product type
  • 48:47 - 48:50
    how many of them are above or highly
  • 48:50 - 48:51
    correlated with the product type in
  • 48:51 - 48:54
    terms of the maximum and minimum okay
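
A small sketch of that outlier check using a box plot and the usual 1.5 x IQR rule (column names assumed as before):

    import matplotlib.pyplot as plt
    import seaborn as sns

    # Box plots to eyeball outliers in the suspect columns
    sns.boxplot(data=df[["Rotational speed [rpm]", "Torque [Nm]"]])
    plt.show()

    # Share of points outside the 1.5 * IQR whiskers for one column
    col = df["Rotational speed [rpm]"]
    q1, q3 = col.quantile([0.25, 0.75])
    iqr = q3 - q1
    outliers = (col < q1 - 1.5 * iqr) | (col > q3 + 1.5 * iqr)
    print(f"{outliers.mean():.2%} of the rows are outliers on this column")
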
  • 48:54 - 48:57
    and then so on so the Insight is well we
  • 48:57 - 49:00
    got 4.8% of the instances are outliers
  • 49:00 - 49:03
    so maybe 4.87% is not really that much
  • 49:03 - 49:05
    the outliers are not horrible so we just
  • 49:05 - 49:07
    leave them in the data set now for a
  • 49:07 - 49:09
    different data set the data scientist
  • 49:09 - 49:10
    could come to different conclusion so
  • 49:10 - 49:12
    then they would do whatever they've
  • 49:12 - 49:15
    deemed is appropriate to kind of cleanse
  • 49:15 - 49:18
    the data set okay so now that we have
  • 49:18 - 49:20
    done all the EDA the next thing we're
  • 49:20 - 49:23
    going to do is we are going to do what
  • 49:23 - 49:26
    is called feature engineering so we are
  • 49:26 - 49:29
    going to transform our original feature
  • 49:29 - 49:31
    variables and these are our original
  • 49:31 - 49:33
    feature variables right these are our
  • 49:33 - 49:35
    original feature variables and we are
  • 49:35 - 49:38
    going to transform them all right we're
  • 49:38 - 49:40
    going to transform them in some sense uh
  • 49:40 - 49:44
    into some other form before we fit this
  • 49:44 - 49:46
    for training into our machine learning
  • 49:46 - 49:49
    algorithm all right so these are
  • 49:49 - 49:52
    examples of let's say this example of a
  • 49:52 - 49:55
    original data set right and this is
  • 49:55 - 49:57
    examples these are some of the examples
  • 49:57 - 49:58
    you don't have to use all of them but
  • 49:58 - 49:59
    these are some of examples of what we
  • 49:59 - 50:01
    call feature engineering which you can
  • 50:01 - 50:04
    then transform your original values in
  • 50:04 - 50:05
    your feature variables to all these
  • 50:05 - 50:08
    transform values here so we're going to
  • 50:08 - 50:10
    pretty much do that here so we have a
  • 50:10 - 50:13
    ordinal encoding we do scaling of the
  • 50:13 - 50:15
    data so the data set is scaled we use a
  • 50:15 - 50:18
    minmax scaling and then finally we come
  • 50:18 - 50:22
    to do a modeling so we have to split our
  • 50:22 - 50:24
    data set into a training data set and a
  • 50:24 - 50:29
    test data set so coming back to again um
  • 50:29 - 50:32
    we said that in a before you train your
  • 50:32 - 50:34
    model sorry before you train your model
  • 50:34 - 50:36
    you have to take your original data set
  • 50:36 - 50:37
    now this is a featured engineered data
  • 50:37 - 50:39
    set we're going to break it into two or
  • 50:39 - 50:41
    more subsets okay so one is called the
  • 50:41 - 50:42
    training data set that we use to fit
  • 50:42 - 50:44
    and train a machine learning model the
  • 50:44 - 50:46
    second is test data set to evaluate the
  • 50:46 - 50:48
    accuracy of the model okay so we got
  • 50:48 - 50:51
    this training data set your test data
  • 50:51 - 50:53
    set and we also need
  • 50:53 - 50:56
    to sample so from our original data set
  • 50:56 - 50:57
    we need to sample sample some points
  • 50:57 - 50:59
    that go into your training data set some
  • 50:59 - 51:01
    points that go in your test data set so
  • 51:01 - 51:03
    there are many ways to do sampling one
  • 51:03 - 51:05
    way is to do stratified sampling where
  • 51:05 - 51:07
    we ensure the same proportion of data
  • 51:07 - 51:09
    from each stratum or class because right
  • 51:09 - 51:11
    now we have a multiclass classification
  • 51:11 - 51:12
    problem so you want to make sure the
  • 51:12 - 51:14
    same proportion of data from each
  • 51:14 - 51:16
    class is equally proportional in the
  • 51:16 - 51:18
    training and test data set as the
  • 51:18 - 51:20
    original data set which is very useful
  • 51:20 - 51:22
    for dealing with what is called an
  • 51:22 - 51:24
    imbalanced data set so here we have an
  • 51:24 - 51:26
    example of what is called an imbalanced
  • 51:26 - 51:30
    data set in the sense that you have the
  • 51:30 - 51:33
    vast majority of data points in your
  • 51:33 - 51:35
    data set they are going to have the
  • 51:35 - 51:37
    value of zero for their target variable
  • 51:37 - 51:40
    column so only an extremely small
  • 51:40 - 51:43
    minority of the data points in your data
  • 51:43 - 51:45
    set will actually have the value of one
  • 51:45 - 51:49
    for their target variable column okay so
  • 51:49 - 51:51
    a situation where you have your class or
  • 51:51 - 51:53
    your target variable column where the
  • 51:53 - 51:54
    vast majority of values are from one
  • 51:54 - 51:58
    class and a tiny small minority are from
  • 51:58 - 52:01
    another class we call this an imbalanced
  • 52:01 - 52:03
    data set and for an imbalanced data set
  • 52:03 - 52:04
    typically we will have a specific
  • 52:04 - 52:06
    technique to do the train test split
  • 52:06 - 52:08
    which is called stratified sampling and
  • 52:08 - 52:10
    so that's what's exactly happening here
  • 52:10 - 52:12
    we're doing a stratified split here so
  • 52:12 - 52:15
    we are doing a train test split here uh
  • 52:15 - 52:18
    and we are doing a stratified split uh
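
Putting the feature engineering and the stratified train test split together, a minimal scikit-learn sketch could look like this (the 80/20 split ratio and the L/M/H ordering are assumptions, not necessarily what the notebook uses):

    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import MinMaxScaler

    feature_cols = ["Type", "Air temperature [K]", "Process temperature [K]",
                    "Rotational speed [rpm]", "Torque [Nm]", "Tool wear [min]"]
    X = df[feature_cols].copy()
    y = df["Target"]

    # Ordinal encoding of the product quality column (L / M / H)
    X["Type"] = X["Type"].map({"L": 0, "M": 1, "H": 2})

    # Stratified split keeps the same failure ratio in train and test sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    # Min-max scaling, fitted on the training data only
    scaler = MinMaxScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
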
  • 52:18 - 52:20
    and then now we actually develop the
  • 52:20 - 52:23
    models so now we've got the train test
  • 52:23 - 52:25
    split now here is where we actually
  • 52:25 - 52:27
    train the models
  • 52:27 - 52:30
    now in terms of classification there are
  • 52:30 - 52:32
    a whole bunch of
  • 52:32 - 52:35
    possibilities right that you can use
  • 52:35 - 52:38
    there are many many different algorithms
  • 52:38 - 52:41
    that we can use to create a
  • 52:41 - 52:43
    classification model so this are an
  • 52:43 - 52:45
    example of some of the more common ones
  • 52:45 - 52:47
    logistic regression support Vector machine decision
  • 52:47 - 52:50
    trees random Forest bagging balance
  • 52:50 - 52:53
    bagging boosting Ensemble so all
  • 52:53 - 52:55
    these are different algorithms which
  • 52:55 - 52:58
    will create different kind of models
  • 52:58 - 53:02
    which will result in different accuracy
  • 53:02 - 53:05
    measures okay so it's the goal of the
  • 53:05 - 53:09
    data scientist to find the best model
  • 53:09 - 53:12
    that gives the best accuracy for the
  • 53:12 - 53:14
    given data set for training on that
  • 53:14 - 53:17
    given data set so let's head back again
  • 53:17 - 53:20
    to uh our machine learning workflow so
  • 53:20 - 53:22
    here basically what I'm doing is I'm
  • 53:22 - 53:24
    creating a whole bunch of models here
  • 53:24 - 53:26
    all right so one is a random Forest one
  • 53:26 - 53:27
    is balance bagging one is a boost
  • 53:27 - 53:30
    classifier one's The Ensemble classifier
  • 53:30 - 53:33
    and using all of these I am going to
  • 53:33 - 53:35
    basically fit or train my model using
  • 53:35 - 53:37
    all these algorithms and then I'm going
  • 53:37 - 53:40
    to evaluate them okay I'm going to
  • 53:40 - 53:42
    evaluate how good each of these models
  • 53:42 - 53:46
    are and here you can see your value your
  • 53:46 - 53:49
    evaluation data right okay and this is
  • 53:49 - 53:51
    the confusion Matrix which is another
  • 53:51 - 53:54
    way of evaluating so now we come to the
  • 53:54 - 53:56
    kind of the the the key part here which
  • 53:56 - 53:59
    is which is how do I distinguish between
  • 53:59 - 54:00
    all these models right I've got all
  • 54:00 - 54:01
    these different models which are built
  • 54:01 - 54:03
    with different algorithms which I'm
  • 54:03 - 54:05
    using to train on the same data set how
  • 54:05 - 54:07
    do I distinguish between all these
  • 54:07 - 54:10
    models okay and so for that sense for
  • 54:10 - 54:14
    that we actually have a whole bunch of
  • 54:14 - 54:16
    common evaluation metrics for
  • 54:16 - 54:18
    classification right so this evaluation
  • 54:18 - 54:22
    metrics tell us how good a model is in
  • 54:22 - 54:24
    terms of its accuracy in
  • 54:24 - 54:27
    classification so in terms of
  • 54:27 - 54:29
    accuracy we actually have many different
  • 54:29 - 54:32
    models uh sorry many different measures
  • 54:32 - 54:33
    right you might think well accuracy is
  • 54:33 - 54:35
    just accuracy well that's all right it's
  • 54:35 - 54:37
    just either it's accurate or it's not
  • 54:37 - 54:39
    accurate right but actually it's not
  • 54:39 - 54:41
    that simple there are many different
  • 54:41 - 54:44
    ways to measure the accuracy of a
  • 54:44 - 54:45
    classification model and these are some
  • 54:45 - 54:48
    of the more common ones so for example
  • 54:48 - 54:51
    the confusion matrix tells us how many
  • 54:51 - 54:54
    true positives that means the value is
  • 54:54 - 54:56
    positive the prediction is positive how
  • 54:56 - 54:58
    many false positives which means the
  • 54:58 - 54:59
    value is negative the machine learning
  • 54:59 - 55:02
    model predicts positive how many false
  • 55:02 - 55:04
    negatives which means that the machine
  • 55:04 - 55:06
    learning model predicts negative but
  • 55:06 - 55:07
    it's actually positive and how many true
  • 55:07 - 55:09
    negatives there are which means that the
  • 55:09 - 55:11
    machine the machine learning model
  • 55:11 - 55:13
    predicts negative and the true value is
  • 55:13 - 55:15
    also negative so this is called a
  • 55:15 - 55:17
    confusion Matrix this is one way we
  • 55:17 - 55:19
    assess or evaluate the performance of a
  • 55:19 - 55:21
    classification
  • 55:21 - 55:23
    model okay this is for binary
  • 55:23 - 55:25
    classification we can also have
  • 55:25 - 55:27
    multiclass confusion Matrix
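
As an illustration, computing and plotting a binary confusion matrix with scikit-learn might look like this (continuing from the train/test split sketched earlier; the random forest here is just a placeholder model):

    import matplotlib.pyplot as plt
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

    model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
    y_pred = model.predict(X_test)

    cm = confusion_matrix(y_test, y_pred)
    tn, fp, fn, tp = cm.ravel()  # true negatives, false positives, false negatives, true positives
    print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)

    ConfusionMatrixDisplay(cm, display_labels=["No failure", "Failure"]).plot()
    plt.show()
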
  • 55:27 - 55:29
    and then we can also measure things like
  • 55:29 - 55:32
    accuracy so accuracy is the true
  • 55:32 - 55:34
    positives plus the true negatives which
  • 55:34 - 55:35
    is the total number of correct
  • 55:35 - 55:38
    predictions made by the model divided by
  • 55:38 - 55:40
    the total number of data points in your
  • 55:40 - 55:43
    data set and then you have also other
  • 55:43 - 55:44
    kinds of
  • 55:44 - 55:47
    measures uh such as recall and this is a
  • 55:47 - 55:49
    formula for recall this is a formula for
  • 55:49 - 55:51
    the F1 score okay and then there's
  • 55:51 - 55:56
    something called the uh ROC curve right so
  • 55:56 - 55:57
    without going too much in the detail of
  • 55:57 - 55:59
    what each of these entails essentially
  • 55:59 - 56:01
    these are all different ways these are
  • 56:01 - 56:03
    different kpi right just like if you
  • 56:03 - 56:06
    work in a company you have different kpi
  • 56:06 - 56:08
    right certain employees have certain kpi
  • 56:08 - 56:11
    that measures how good or how how uh you
  • 56:11 - 56:13
    know efficient or how effective a
  • 56:13 - 56:16
    particular employee is right so the
  • 56:16 - 56:20
    KPIs for your machine learning models
  • 56:20 - 56:24
    are the ROC curve F1 score recall accuracy
  • 56:24 - 56:27
    okay and your confusion Matrix so
  • 56:27 - 56:30
    fundamentally after I have built right
  • 56:30 - 56:33
    so here I've built my four different
  • 56:33 - 56:35
    models so after I built these four
  • 56:35 - 56:38
    different models I'm going to check and
  • 56:38 - 56:40
    evaluate them using all those different
  • 56:40 - 56:42
    metrics like for example the F1 score
  • 56:42 - 56:45
    the Precision score the recall score all
  • 56:45 - 56:47
    right so for this model I can check out
  • 56:47 - 56:50
    the ROC score the F1 score the Precision
  • 56:50 - 56:52
    score the recall score then for this
  • 56:52 - 56:55
    model this is the ROC score the F1 score
  • 56:55 - 56:57
    the Precision score the recall score
  • 56:57 - 57:00
    then for this model and so on so for
  • 57:00 - 57:03
    every single model I've created using my
  • 57:03 - 57:06
    training data set I will have all my set
  • 57:06 - 57:08
    of evaluation metrics that I can use to
  • 57:08 - 57:12
    evaluate how good this model is okay
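
A sketch of that fit-and-compare loop, using four classifiers of the kinds mentioned (the exact models and settings in the notebook may differ; BalancedBaggingClassifier comes from the imbalanced-learn library):

    from sklearn.ensemble import (RandomForestClassifier,
                                  GradientBoostingClassifier, VotingClassifier)
    from imblearn.ensemble import BalancedBaggingClassifier
    from sklearn.metrics import (f1_score, precision_score,
                                 recall_score, roc_auc_score)

    models = {
        "Random Forest": RandomForestClassifier(random_state=42),
        "Balanced Bagging": BalancedBaggingClassifier(random_state=42),
        "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    }
    # Soft-voting ensemble built on top of the three base models
    models["Voting Ensemble"] = VotingClassifier(
        estimators=list(models.items()), voting="soft")

    for name, clf in models.items():
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        y_prob = clf.predict_proba(X_test)[:, 1]
        print(f"{name}: F1={f1_score(y_test, y_pred):.3f}  "
              f"precision={precision_score(y_test, y_pred):.3f}  "
              f"recall={recall_score(y_test, y_pred):.3f}  "
              f"ROC AUC={roc_auc_score(y_test, y_prob):.3f}")
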
  • 57:12 - 57:13
    same thing here I've got a confusion
  • 57:13 - 57:15
    Matrix here right so I can use that
  • 57:15 - 57:18
    again to evaluate between all these four
  • 57:18 - 57:20
    different models and then I kind of
  • 57:20 - 57:22
    summarize it up here so we can see from
  • 57:22 - 57:25
    this summary here that actually the top
  • 57:25 - 57:28
    two models right which I'm going to
  • 57:28 - 57:29
    give a lot of attention to as a data scientist I'm now
  • 57:29 - 57:31
    going to just focus on these two models
  • 57:31 - 57:33
    so these two models are the bagging
  • 57:33 - 57:36
    classifier and random Forest classifier
  • 57:36 - 57:38
    they have the highest values of F1 score
  • 57:38 - 57:40
    and the highest values of the ROC curve
  • 57:40 - 57:43
    score okay so we can say these are the
  • 57:43 - 57:46
    top two models in terms of accuracy okay
  • 57:46 - 57:49
    using the F1 evaluation metric and the
  • 57:49 - 57:54
    ROC AUC evaluation metric okay so these
  • 57:54 - 57:57
    results uh kind of summarize here and
  • 57:57 - 57:59
    then we use different sampling
  • 57:59 - 58:01
    techniques okay so just now I talked
  • 58:01 - 58:04
    about um different kinds of sampling
  • 58:04 - 58:06
    techniques and so the idea of different
  • 58:06 - 58:08
    kinds of sampling techniques is to just
  • 58:08 - 58:11
    get a different feel for different
  • 58:11 - 58:14
    distributions of the data in different
  • 58:14 - 58:16
    areas of your data set so that you want
  • 58:16 - 58:20
    to just kind of make sure that your your
  • 58:20 - 58:23
    your evaluation of accuracy is actually
  • 58:23 - 58:27
    statistically correct right so we can um
  • 58:27 - 58:30
    do what is called oversampling and under
  • 58:30 - 58:31
    sampling which is very useful when
  • 58:31 - 58:32
    you're working with an imbalance data
  • 58:32 - 58:35
    set so this is example of doing that and
  • 58:35 - 58:37
    then here we again again check out the
  • 58:37 - 58:39
    results for all these different
  • 58:39 - 58:42
    techniques we use uh the F1 score the Au
  • 58:42 - 58:44
    score all right these are the two key
  • 58:44 - 58:47
    measures of accuracy right so and then
  • 58:47 - 58:48
    we can check out the scores for the
  • 58:48 - 58:50
    different approaches okay so we can see
  • 58:50 - 58:53
    oh well overall the models have lower
  • 58:53 - 58:56
    ROC AUC score but they have a much
  • 58:56 - 58:58
    higher F1 score the bagging classifier
  • 58:58 - 59:01
    had the highest ROC AUC score
  • 59:01 - 59:04
    but F1 score was too low okay then in
  • 59:04 - 59:07
    the data scientist opinion the random
  • 59:07 - 59:09
    forest with this particular technique of
  • 59:09 - 59:11
    sampling has an equilibrium between the
  • 59:11 - 59:14
    F1 and ROC AUC scores so the takeaway one
  • 59:14 - 59:17
    is the macro F1 score improves
  • 59:17 - 59:18
    dramatically using the sampling
  • 59:18 - 59:20
    techniques so these models might be better
  • 59:20 - 59:22
    compared to the balanced ones all right
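
The resampling itself is typically done with the imbalanced-learn library; a minimal sketch of the oversampling and undersampling approaches just mentioned (borderline SMOTE and Tomek links) might be:

    from collections import Counter
    from imblearn.over_sampling import BorderlineSMOTE
    from imblearn.under_sampling import TomekLinks

    print("Original class counts:", Counter(y_train))

    # Oversample the minority (failure) class with borderline SMOTE
    X_over, y_over = BorderlineSMOTE(random_state=42).fit_resample(X_train, y_train)
    print("After oversampling:", Counter(y_over))

    # Undersample by removing Tomek links (ambiguous majority-class points)
    X_under, y_under = TomekLinks().fit_resample(X_train, y_train)
    print("After undersampling:", Counter(y_under))
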
  • 59:22 - 59:26
    so based on all this uh evaluation the
  • 59:26 - 59:28
    data scientist says they're going to
  • 59:28 - 59:30
    continue to work with these two models
  • 59:30 - 59:31
    all right and the balanced bagging one
  • 59:31 - 59:33
    and then continue to make further
  • 59:33 - 59:35
    comparisons all right so then we
  • 59:35 - 59:37
    continue to keep refining on our
  • 59:37 - 59:39
    evaluation work here we're going to
  • 59:39 - 59:41
    train the models one more time again so
  • 59:41 - 59:43
    we again do a train test split and
  • 59:43 - 59:45
    then we do that for this particular uh
  • 59:45 - 59:47
    approach model and then we print out we
  • 59:47 - 59:48
    print out what is called a
  • 59:48 - 59:51
    classification report and this is
  • 59:51 - 59:53
    basically a summary of all those metrics
  • 59:53 - 59:55
    that I talk about just now so just now
  • 59:55 - 59:58
    remember I said there were
  • 59:58 - 60:00
    several evaluation metrics right so uh
  • 60:00 - 60:01
    we had the confusion matrix the
  • 60:01 - 60:04
    accuracy the Precision the recall the AUC
  • 60:04 - 60:08
    score so here with the um classification
  • 60:08 - 60:10
    report I can get a summary of all of
  • 60:10 - 60:12
    that so I can see all the values here
  • 60:12 - 60:15
    okay for this particular model bagging
  • 60:15 - 60:17
    Tomek links and then I can do that for
  • 60:17 - 60:19
    another model the random Forest
  • 60:19 - 60:21
    borderline SMOTE and then I can do that
  • 60:21 - 60:22
    for another model which is the balance
  • 60:22 - 60:25
    bagging so again we see a lot of
  • 60:25 - 60:27
    comparison between different models
  • 60:27 - 60:29
    trying to figure out what all these
  • 60:29 - 60:31
    evaluation metrics are telling us
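
Printing such a classification report is a one-liner in scikit-learn; a sketch, reusing the oversampled training data from the previous step:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report

    model = RandomForestClassifier(random_state=42).fit(X_over, y_over)
    print(classification_report(y_test, model.predict(X_test),
                                target_names=["No failure", "Failure"]))
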
  • 60:31 - 60:33
    all right then again we have a confusion
  • 60:33 - 60:36
    Matrix so we generate a confusion Matrix
  • 60:36 - 60:39
    for the bagging with the Tomek links
  • 60:39 - 60:41
    undersampling for the random forest
  • 60:41 - 60:43
    with the borderline SMOTE oversampling
  • 60:43 - 60:45
    and just balanced bagging by itself then
  • 60:45 - 60:48
    again we compare between these three uh
  • 60:48 - 60:51
    models uh using the confusion Matrix
  • 60:51 - 60:53
    evaluation metric and then we can kind
  • 60:53 - 60:56
    of come to some conclusions all right so
  • 60:56 - 60:58
    right so now we look at all the data
  • 60:58 - 61:01
    then we move on and look at another um
  • 61:01 - 61:03
    another kind of evaluation metric which
  • 61:03 - 61:07
    is the ROC score right so this is one of
  • 61:07 - 61:09
    the other evaluation metrics I talk
  • 61:09 - 61:11
    about so this one is a kind of a curve
  • 61:11 - 61:13
    you look at it to see the area
  • 61:13 - 61:14
    underneath the curve this is called
  • 61:14 - 61:18
    AUC ROC the
  • 61:18 - 61:20
    area under the curve all right so the
  • 61:20 - 61:22
    area under the curve uh
  • 61:22 - 61:24
    score will give us some idea about the
  • 61:24 - 61:26
    threshold that we're going to use for
  • 61:26 - 61:28
    classification so we can examine this
  • 61:28 - 61:29
    for the bagging classifier for the
  • 61:29 - 61:31
    random forest classifier for the balance
  • 61:31 - 61:34
    bagging classifier okay then we can also
  • 61:34 - 61:36
    again do that uh finally we can check
  • 61:36 - 61:38
    the classification report of this
  • 61:38 - 61:40
    particular model so we keep doing this
  • 61:40 - 61:43
    over and over again evaluating the
  • 61:43 - 61:46
    metrics the accuracy metrics the
  • 61:46 - 61:47
    evaluation metrics for all these
  • 61:47 - 61:49
    different models so we keep doing this
  • 61:49 - 61:51
    over and over again for different
  • 61:51 - 61:53
    thresholds for classification and so
  • 61:53 - 61:57
    as we keep drilling into these we kind
  • 61:57 - 62:01
    of get more and more understanding of
  • 62:01 - 62:03
    all these different models which one is
  • 62:03 - 62:05
    the best one that gives the best
  • 62:05 - 62:09
    performance for our data set okay
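
A sketch of plotting the ROC curve and then sweeping the decision threshold, which is essentially what those comparisons boil down to (reusing the fitted model and the test split from the earlier sketches):

    import matplotlib.pyplot as plt
    from sklearn.metrics import RocCurveDisplay, precision_score, recall_score

    # ROC curve; the area under it is the ROC AUC score
    RocCurveDisplay.from_estimator(model, X_test, y_test)
    plt.show()

    # Classify as "failure" when the predicted failure probability exceeds
    # a chosen decision threshold instead of the default 0.5
    for threshold in (0.4, 0.5, 0.6):
        y_pred = (model.predict_proba(X_test)[:, 1] >= threshold).astype(int)
        print(f"threshold={threshold}: "
              f"recall={recall_score(y_test, y_pred):.3f}  "
              f"precision={precision_score(y_test, y_pred):.3f}")
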
  • 62:09 - 62:11
    so finally we come to this conclusion this
  • 62:11 - 62:14
    particular model is not able to get
  • 62:14 - 62:15
    the recall on failures higher than
  • 62:15 - 62:18
    95.8% on the other hand balanced bagging
  • 62:18 - 62:19
    with a decision threshold of 0.6 is able
  • 62:19 - 62:22
    to have a better recall blah blah blah
  • 62:22 - 62:25
    Etc so finally after having done all of
  • 62:25 - 62:27
    these evaluations
  • 62:27 - 62:31
    okay this is the conclusion
  • 62:31 - 62:34
    so after having gone so right now we
  • 62:34 - 62:35
    have gone through all the steps of the
  • 62:35 - 62:38
    machine learning life cycle and which
  • 62:38 - 62:40
    means we have right now or the data
  • 62:40 - 62:42
    scientist right now has gone through all
  • 62:42 - 62:43
    these
  • 62:43 - 62:47
    steps uh which is now we have done this
  • 62:47 - 62:49
    validation so we have done the cleaning
  • 62:49 - 62:51
    exploration preparation transformation
  • 62:51 - 62:53
    the feature engineering we have developed
  • 62:53 - 62:54
    and trained multiple models we have
  • 62:54 - 62:56
    evaluated all these different models so
  • 62:56 - 62:59
    right now we have reached this stage so
  • 62:59 - 63:03
    at this stage we as the data scientist
  • 63:03 - 63:05
    kind of have completed our job so we've
  • 63:05 - 63:08
    come to some very useful conclusions
  • 63:08 - 63:10
    which we now can share with our
  • 63:10 - 63:13
    colleagues all right and based on this
  • 63:13 - 63:15
    uh conclusions or recommendations
  • 63:15 - 63:17
    somebody is going to choose an
  • 63:17 - 63:19
    appropriate model and that model is
  • 63:19 - 63:23
    going to get deployed for realtime use
  • 63:23 - 63:25
    in a real life production environment
  • 63:25 - 63:27
    okay and that decision is going to be
  • 63:27 - 63:29
    made based on the recommendations coming
  • 63:29 - 63:31
    from the data scientist at the end of
  • 63:31 - 63:33
    this phase okay so at the end of this
  • 63:33 - 63:35
    phase the data scientist is going to
  • 63:35 - 63:37
    come up with these conclusions so
  • 63:37 - 63:42
    conclusions is okay if the engineering
  • 63:42 - 63:45
    team they are looking okay the
  • 63:45 - 63:46
    engineering team right the engineering
  • 63:46 - 63:49
    team if they are looking for the highest
  • 63:49 - 63:52
    failure detection rate possible then
  • 63:52 - 63:54
    they should go with this particular
  • 63:54 - 63:57
    model okay
  • 63:57 - 63:59
    and if they want a balance between
  • 63:59 - 64:01
    precision and recall then they should
  • 64:01 - 64:03
    choose between the bagging model with a
  • 64:03 - 64:06
    0.4 decision threshold or the random
  • 64:06 - 64:10
    forest model with a 0.5 threshold but if
  • 64:10 - 64:12
    they don't care so much about predicting
  • 64:12 - 64:14
    every failure and they want the highest
  • 64:14 - 64:17
    Precision possible then they should opt
  • 64:17 - 64:20
    for the bagging Tomek links classifier
  • 64:20 - 64:23
    with a bit higher decision threshold and
  • 64:23 - 64:26
    so this is the key thing that the data
  • 64:26 - 64:28
    scientist is going to give right this is
  • 64:28 - 64:31
    the key takeaway this is the kind of the
  • 64:31 - 64:33
    end result of the entire machine
  • 64:33 - 64:35
    learning life cycle right now the data
  • 64:35 - 64:36
    scientist is going to tell the
  • 64:36 - 64:39
    engineering team all right you guys
  • 64:39 - 64:41
    which is more important for you point a
  • 64:41 - 64:45
    point B or Point C make your decision so
  • 64:45 - 64:47
    the engineering team will then discuss
  • 64:47 - 64:49
    among themselves and say hey you know
  • 64:49 - 64:52
    what what we want is we want to get the
  • 64:52 - 64:55
    highest failure detection possible
  • 64:55 - 64:58
    because any kind of failure of that
  • 64:58 - 65:00
    machine or the product on the assembly
  • 65:00 - 65:03
    line is really going to screw us up big
  • 65:03 - 65:06
    time so what we're looking for is the
  • 65:06 - 65:08
    model that will give us the highest
  • 65:08 - 65:11
    failure detection rate we don't care
  • 65:11 - 65:13
    about Precision but we want to make
  • 65:13 - 65:15
    sure that if there's a failure we are
  • 65:15 - 65:18
    going to catch it right so that's what
  • 65:18 - 65:20
    they want and so the data scientist will
  • 65:20 - 65:22
    say hey you go for the balanced bagging
  • 65:22 - 65:25
    model okay then the data scientist saves
  • 65:25 - 65:28
    this all right uh and then once you have
  • 65:28 - 65:30
    saved this uh you can then go right
  • 65:30 - 65:32
    ahead and deploy that so you can go
  • 65:32 - 65:34
    right ahead and deploy that to
  • 65:34 - 65:37
    production okay and so if you want to
  • 65:37 - 65:39
    continue we can actually further
  • 65:39 - 65:41
    continue this modeling problem so just
  • 65:41 - 65:43
    now I model this problem as a binary
  • 65:43 - 65:47
    classification problem uh sorry just I
  • 65:47 - 65:48
    modeled this problem as a binary
  • 65:48 - 65:50
    classification which means it's either
  • 65:50 - 65:52
    zero or one either fail or not fail but
  • 65:52 - 65:54
    we can also model it as a multiclass
  • 65:54 - 65:56
    classification problem right because as
  • 65:56 - 65:58
    as I said earlier just now for the
  • 65:58 - 66:00
    Target variable column which is sorry for
  • 66:00 - 66:03
    the failure type column you actually
  • 66:03 - 66:05
    have multiple kinds of failures right
  • 66:05 - 66:08
    for example you may have a power failure
  • 66:08 - 66:10
    uh you may have a tool wear failure uh you
  • 66:10 - 66:13
    may have a overstrain failure so now we
  • 66:13 - 66:15
    can model the problem slightly
  • 66:15 - 66:17
    differently so we can model it as a
  • 66:17 - 66:20
    multiclass classification problem and
  • 66:20 - 66:21
    then we go through the entire same
  • 66:21 - 66:23
    process that we went through just now so
  • 66:23 - 66:25
    we create different models we test this
  • 66:25 - 66:27
    out but now the confusion Matrix is for
  • 66:27 - 66:30
    a multiclass classification issue right
  • 66:30 - 66:31
    so we're going
  • 66:31 - 66:34
    to check them out we're going to again
  • 66:34 - 66:36
    uh try different algorithms or models
  • 66:36 - 66:38
    again train and test our data set do the
  • 66:38 - 66:40
    training test split uh on these
  • 66:40 - 66:42
    different models all right so we have
  • 66:42 - 66:43
    like for example we have a random
  • 66:43 - 66:46
    Forest a balanced random Forest a grid search
  • 66:46 - 66:48
    then you train the models using what is
  • 66:48 - 66:50
    called hyperparameter tuning then you
  • 66:50 - 66:51
    get the scores all right so you get the
  • 66:51 - 66:53
    same evaluation scores again you check
  • 66:53 - 66:55
    out the evaluation scores compare
  • 66:55 - 66:57
    between them generate a confusion Matrix
  • 66:57 - 67:00
    so this is a multiclass confusion Matrix
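
A sketch of the multiclass variant with a simple grid search for hyperparameter tuning, reusing the encoded feature table X from before (the parameter grid, scoring choice, and use of a random forest here are assumptions for illustration):

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.metrics import confusion_matrix, classification_report

    # Multiclass target: the failure type column instead of the 0/1 target
    y_multi = df["Failure Type"]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y_multi, test_size=0.2, stratify=y_multi, random_state=42)

    grid = GridSearchCV(
        RandomForestClassifier(random_state=42),
        param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
        scoring="f1_macro", cv=5)
    grid.fit(X_tr, y_tr)

    y_pred = grid.predict(X_te)
    print(grid.best_params_)
    print(confusion_matrix(y_te, y_pred))   # multiclass confusion matrix
    print(classification_report(y_te, y_pred))
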
  • 67:00 - 67:02
    and then you come to the final
  • 67:02 - 67:06
    conclusion so now if you are interested
  • 67:06 - 67:09
    to frame your problem domain as a
  • 67:09 - 67:11
    multiclass classification problem all
  • 67:11 - 67:14
    right then these are the recommendations
  • 67:14 - 67:15
    from the data scientist so the data
  • 67:15 - 67:17
    scientist will say you know what I'm
  • 67:17 - 67:20
    going to pick this particular model the
  • 67:20 - 67:22
    balanced bagging classifier and these are
  • 67:22 - 67:25
    all the reasons that the data scientist
  • 67:25 - 67:27
    is going to give as a rationale for
  • 67:27 - 67:29
    selecting this particular
  • 67:29 - 67:32
    model and then once that's done you save
  • 67:32 - 67:35
    the model and that's it
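
Saving the chosen model and loading it back for serving is typically done with joblib; a minimal sketch (the file name is an assumption, and grid.best_estimator_ here stands in for whichever model was finally selected):

    import joblib

    # Persist the selected model so it can be deployed on a server
    joblib.dump(grid.best_estimator_, "predictive_maintenance_model.joblib")

    # Later, in production: load it back and score fresh sensor readings
    # that arrive in the same feature format as the training data
    loaded_model = joblib.load("predictive_maintenance_model.joblib")
    new_readings = X_te[:5]  # stand-in for live sensor data
    print(loaded_model.predict(new_readings))
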
  • 67:35 - 67:39
    so that's all done now and so then the
  • 67:39 - 67:41
    uh the model the machine learning model
  • 67:41 - 67:44
    now you can put it live run it on the
  • 67:44 - 67:45
    server and now the machine learning
  • 67:45 - 67:47
    model is ready to work which means it's
  • 67:47 - 67:49
    ready to generate predictions right
  • 67:49 - 67:50
    that's the main job of the machine
  • 67:50 - 67:52
    learning model you have picked the best
  • 67:52 - 67:54
    machine learning model with the best
  • 67:54 - 67:56
    evaluation metrics for whatever accuracy
  • 67:56 - 67:58
    goal you're trying to achieve and
  • 67:58 - 68:00
    now you're going to run it on a server
  • 68:00 - 68:01
    and now you're going to get all this
  • 68:01 - 68:03
    real time data that's coming from your
  • 68:03 - 68:05
    sensors you're going to pump that into
  • 68:05 - 68:06
    your machine learning model your machine
  • 68:06 - 68:08
    learning model will pump out a whole
  • 68:08 - 68:10
    bunch of predictions and we're going to
  • 68:10 - 68:13
    use those predictions in real time to
  • 68:13 - 68:15
    make real time real world decision
  • 68:15 - 68:18
    making right you're going to say okay
  • 68:18 - 68:20
    I'm predicting that that machine is
  • 68:20 - 68:23
    going to fail on Thursday at 5:00 p.m.
  • 68:23 - 68:26
    so you better get your service folks in
  • 68:26 - 68:29
    to service it on Thursday 2:00 p.m. or you
  • 68:29 - 68:32
    know whatever so you can you know uh
  • 68:32 - 68:33
    make decisions on when you want to do
  • 68:33 - 68:35
    your maintenance you know and and make
  • 68:35 - 68:38
    the best decisions to optimize the cost
  • 68:38 - 68:41
    of Maintenance etc etc and then based on
  • 68:41 - 68:42
    the
  • 68:42 - 68:45
    results that are coming up from the
  • 68:45 - 68:47
    predictions so the predictions may be
  • 68:47 - 68:49
    good the predictions may be lousy the
  • 68:49 - 68:51
    predictions may be average right so we
  • 68:51 - 68:54
    are we're constantly monitoring how good
  • 68:54 - 68:55
    or how useful are the predictions
  • 68:55 - 68:58
    generated by this realtime model that's
  • 68:58 - 69:00
    running on the server and based on our
  • 69:00 - 69:03
    monitoring we will then take some new
  • 69:03 - 69:05
    data and then repeat this entire life
  • 69:05 - 69:07
    cycle again so this is basically a
  • 69:07 - 69:09
    workflow that's iterative and we are
  • 69:09 - 69:11
    constantly or the data scientist is
  • 69:11 - 69:13
    constantly getting in all these new data
  • 69:13 - 69:15
    points and then refining the model
  • 69:15 - 69:18
    picking maybe a new model deploying the
  • 69:18 - 69:22
    new model onto the server and so on all
  • 69:22 - 69:24
    right and so that's it so that is
  • 69:24 - 69:26
    basically your machine learning workflow
  • 69:26 - 69:29
    in a nutshell okay so for this
  • 69:29 - 69:32
    particular approach we have used a bunch
  • 69:32 - 69:35
    of uh data science libraries from python
  • 69:35 - 69:37
    so we have used pandas which is the most
  • 69:37 - 69:39
    basic data science library that
  • 69:39 - 69:40
    provides all the tools to work with raw
  • 69:40 - 69:43
    data we have used NumPy which is a high
  • 69:43 - 69:44
    performance library for implementing
  • 69:44 - 69:46
    complex array and matrix operations we have
  • 69:46 - 69:50
    used matplotlib and seaborn which are used
  • 69:50 - 69:52
    for doing the EDA the
  • 69:52 - 69:56
    exploratory data analysis phase machine
  • 69:56 - 69:57
    learning where you visualize all your
  • 69:57 - 69:59
    data we have used scikit-learn which is
  • 69:59 - 70:01
    the machine learning library to do all
  • 70:01 - 70:03
    your implementation for all your core
  • 70:03 - 70:06
    machine learning algorithms uh we
  • 70:06 - 70:08
    have not used this because this is not a
  • 70:08 - 70:11
    deep learning uh problem but if you are
  • 70:11 - 70:13
    working with a deep learning problem
  • 70:13 - 70:15
    like image classification image
  • 70:15 - 70:18
    recognition object detection okay
  • 70:18 - 70:20
    natural language processing text
  • 70:20 - 70:22
    classification well then you're going to
  • 70:22 - 70:24
    use these libraries from python which is
  • 70:24 - 70:29
    TensorFlow okay and also
  • 70:29 - 70:33
    PyTorch and then lastly that whole thing that
  • 70:33 - 70:35
    whole data science project that you saw
  • 70:35 - 70:37
    just now this entire data science
  • 70:37 - 70:39
    project is actually developed in
  • 70:39 - 70:41
    something called a Jupyter notebook so
  • 70:41 - 70:44
    all this python code along with all the
  • 70:44 - 70:46
    observations from the data
  • 70:46 - 70:49
    scientists okay for this entire data
  • 70:49 - 70:50
    science project was actually run in
  • 70:50 - 70:53
    something called a Jupyter notebook so
  • 70:53 - 70:56
    that is uh the
  • 70:56 - 70:59
    most widely used tool for interactively
  • 70:59 - 71:02
    developing and presenting data science
  • 71:02 - 71:05
    projects okay so that brings me to the
  • 71:05 - 71:07
    end of this entire presentation I hope
  • 71:07 - 71:10
    that you find it useful for you and that
  • 71:10 - 71:13
    you can appreciate the importance of
  • 71:13 - 71:15
    machine learning and how it can be
  • 71:15 - 71:20
    applied in a real life use case in a
  • 71:20 - 71:23
    typical production environment all right
  • 71:23 - 71:27
    thank you all so much for watching