Hello everyone, my name is Victor. I'm your friendly neighborhood data scientist from DreamCatcher. In this presentation, I would like to talk about a specific industry use case of AI, or machine learning, which is predictive maintenance.

I will be covering these topics, and feel free to jump forward to the specific part of the video where I talk about each of them. I'm going to start off with a general overview of AI and machine learning. Then I'll discuss the use case, which is predictive maintenance. I'll talk about the basics of machine learning and the machine learning workflow, and then we will come to the meat of this presentation, which is essentially a demonstration of the machine learning workflow from end to end on a real-life predictive maintenance domain problem. All right, so without any further ado, let's jump into it.

So let's start off with a quick overview of AI and machine learning. Well, AI is a very general term: it encompasses the entire area of science and engineering related to creating software programs and machines that are capable of performing tasks that would normally require human intelligence. But AI is a catch-all term, so really, when we talk about applied AI — how we use AI in our daily work — we are really going to be talking about machine learning.

So machine learning is the design and application of software algorithms that are capable of learning on their own, without any explicit human intervention. The primary purpose of these algorithms is to optimize performance in a specific task, and the main task you want to optimize performance in is making accurate predictions about future outcomes based on the analysis of historical data. So essentially, machine learning is about making predictions about the future, or what we call predictive analytics.

There are many different kinds of algorithms available in machine learning, under the three primary categories of supervised learning, unsupervised learning, and reinforcement learning. And here we can see some of these different kinds of algorithms and their use cases in various areas of industry. So we have various domain use cases for all these different kinds of algorithms, and we can see that different algorithms are suited to different use cases.

Deep learning is an advanced form of machine learning that is based on something called an artificial neural network, or ANN for short. This essentially simulates the structure of the human brain, whereby neurons interconnect and work together to process and learn new information.
So DL, deep learning, is the foundational technology for most of the popular AI tools that you have probably heard of today. I'm sure you have heard of ChatGPT, if you haven't been living in a cave for the past two years. ChatGPT is an example of what we call a large language model, and that's based on this technology called deep learning. Also, all the modern computer vision applications, where a computer program can classify, detect, or recognize images on its own — we call these computer vision applications — also use this particular form of machine learning called deep learning.

So this is an example of an artificial neural network. Here I have an image of a bird that's fed into the artificial neural network, and the output from the network is a classification of this image into one of three potential categories. So in this case, if the ANN has been trained properly, when we feed in this image, the ANN should correctly classify it as a bird. This is an image classification problem, which is a classic use case for an artificial neural network in the field of computer vision. And just like in the case of machine learning, there are a variety of algorithms available for deep learning, under the categories of supervised learning and also unsupervised learning.

All right, so this is how we can categorize all of this: you can think of AI as the general area of smart systems, machine learning is basically applied AI, and deep learning is a sub-specialization of machine learning using a particular architecture called an artificial neural network. And generative AI — so if we talk about ChatGPT, Google Gemini, Microsoft Copilot, all these examples of generative AI — they are basically large language models, and they are a further subcategory within the area of deep learning.
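As a rough illustration of the image-classification idea described above, here is a minimal sketch in Python of a small artificial neural network that maps an image to one of three categories. This is not code from the presentation: the library (scikit-learn), the 64x64 image size, and the placeholder data and labels are my own assumptions for illustration only.

    # Minimal sketch only: a small ANN that classifies a 64x64 RGB image into
    # one of three hypothetical categories (0=bird, 1=cat, 2=dog).
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    # Placeholder "training images": 100 images flattened into feature vectors,
    # each with a label. Real training would use actual labelled photographs.
    X_train = np.random.rand(100, 64 * 64 * 3)
    y_train = np.random.randint(0, 3, size=100)

    ann = MLPClassifier(hidden_layer_sizes=(128,), max_iter=50, random_state=0)
    ann.fit(X_train, y_train)              # "training" the network on labelled images

    # Feed in a new image: the network outputs its predicted category.
    new_image = np.random.rand(1, 64 * 64 * 3)
    print(ann.predict(new_image))          # e.g. [0] -> predicted category "bird"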
There are many applications of machine learning in industry right now, so pick whichever particular industry you are involved in, and these are the specific areas of application. I'm going to guess that the vast majority of you watching this video are probably coming from the manufacturing industry, and in the manufacturing industry some of the standard use cases for machine learning and deep learning are: predicting potential problems — sometimes called predictive maintenance, where you want to predict when a problem is going to happen and address it before it happens — monitoring systems, automating your manufacturing assembly line or production line, smart scheduling, and detecting anomalies on your production line.

Okay, so let's talk about the use case here, which is predictive maintenance. What is predictive maintenance? Well, here's the long definition: predictive maintenance is an equipment maintenance strategy that relies on real-time monitoring of equipment conditions and data to predict equipment failures in advance. It uses advanced data models, analytics, and machine learning, whereby we can reliably assess when failures are more likely to occur, including which components are more likely to be affected, on your production or assembly line.

So where does predictive maintenance fit into the overall scheme of things? Let's talk about the standard way that factories — production lines or assembly lines in factories — tended to handle maintenance issues, say, 10 or 20 years ago. What you would probably start off with is the most basic mode, which is reactive maintenance: you just wait until your machine breaks down, and then you repair it. The simplest approach, but of course, if you have worked on a production line for any period of time, you know that reactive maintenance can give you a whole bunch of headaches, especially if the machine breaks down just before a critical delivery deadline — then you're going to have a backlog of orders and you're going to run into a lot of problems.

So we move on to preventive maintenance, where you regularly schedule maintenance of your production machines to reduce the failure rate. You might do maintenance once every month, once every two weeks, whatever. This is better, but the problem then is that sometimes you're doing too much maintenance that isn't really necessary, and it still doesn't totally prevent a failure of the machine that occurs outside of your planned maintenance. So a bit of improvement, but not that much better.

And then the last two categories are where we bring in AI and machine learning. With machine learning, we're going to use sensors to do real-time monitoring of the data, and then, using that data, we're going to build a machine learning model which helps us to predict, with a reasonable level of accuracy, when the next failure is going to happen on your assembly or production line, for a specific component or a specific machine. You want to be able to predict, to a high level of accuracy — maybe to the specific day, even the specific hour or minute — when you expect that particular component or machine to fail.
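To give a feel for what such a model could look like in code, here is a minimal sketch in Python using pandas and scikit-learn. The sensor columns (temperature, vibration, pressure), the values, and the failure labels are hypothetical placeholders I have invented for illustration; they are not data from the presentation.

    # Minimal sketch only: predicting machine failure from sensor readings.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    # Historical sensor readings; each row is labelled with whether a failure followed.
    history = pd.DataFrame({
        "temperature": [70, 85, 90, 65, 95, 72, 88, 60],
        "vibration":   [0.2, 0.7, 0.9, 0.1, 1.1, 0.3, 0.8, 0.2],
        "pressure":    [30, 35, 40, 28, 42, 31, 38, 27],
        "failure":     [0, 1, 1, 0, 1, 0, 1, 0],   # 1 = machine failed soon after
    })

    model = RandomForestClassifier(random_state=0)
    model.fit(history[["temperature", "vibration", "pressure"]], history["failure"])

    # A new real-time reading from the sensors: is this machine about to fail?
    latest = pd.DataFrame({"temperature": [92], "vibration": [1.0], "pressure": [41]})
    print(model.predict(latest))   # e.g. [1] -> failure predicted, schedule maintenance now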
All right, so these are the advantages of predictive maintenance: it minimizes the occurrence of unscheduled downtime, it gives you a real-time overview of the current condition of your assets, it ensures minimal disruption to productivity, it optimizes the time spent on maintenance work, it optimizes the use of spare parts, and so on. And of course there are some disadvantages, the primary one being that you need a specialized set of skills among your engineers to understand and create machine learning models that can work on the real-time data that you're getting.

Okay, so we're going to take a look at some real-life use cases. These are a bunch of links here, and if you navigate to them you'll be able to look at some real-life use cases of machine learning in predictive maintenance. The IBM website gives you a look at five use cases, so you can click on these links and follow up with them if you want to read more: waste management, manufacturing, building services, renewable energy, and also mining. These are all use cases you can read up on and follow from this website.

And this website — this is a pretty good website, and I would really encourage you to look through it if you're interested in predictive maintenance — tells you about an industry survey of predictive maintenance. We can see that a large portion of the manufacturing industry agreed that predictive maintenance is a real need to stay competitive, and that predictive maintenance is essential for the manufacturing industry and will gain additional strength in the future. This is a survey that was done quite some time ago, and these were the results that came back: the vast majority of key industry players in the manufacturing sector consider predictive maintenance to be a very important activity that they want to incorporate into their workflow. And we can see here the kind of ROI that we can expect on an investment in predictive maintenance: a 45% reduction in downtime, 25% growth in productivity, 75% fault elimination, and a 30% reduction in maintenance cost.

And best of all, if you really want to take a look at examples, there are all these different companies that have significantly invested in predictive maintenance technology in their manufacturing processes.
So PepsiCo — we have Frito-Lay — General Motors, and the Mondi plant. You can jump over here and take a look at some of these use cases. Let me try to open this one up, for example Mondi: you can see that Mondi has used this particular piece of software called MATLAB — from MathWorks, sorry — to do predictive maintenance for their manufacturing processes using machine learning, and you can study how they have used it and how it works: what their challenge was, the problems they were facing, and the solution they built using MathWorks Consulting and the data that they had collected in an Oracle database. So using MathWorks tools, with MATLAB, they were able to create a deep learning model to solve this particular issue for their domain. If you're interested, I strongly encourage you to read up on all these real-life customer stories that showcase use cases for predictive maintenance. Okay, so that's it for real-life use cases of predictive maintenance.

Now in this topic I'm going to talk about machine learning basics: what is actually involved in machine learning. I'm going to give a very quick, conceptual, high-level overview of machine learning. There are several categories of machine learning: supervised, unsupervised, semi-supervised, reinforcement, and deep learning. Let's talk about the most common and widely used category of machine learning, which is called supervised learning. The particular use case that I'm going to be discussing here, predictive maintenance, is basically a form of supervised learning.

So how does supervised learning work? Well, in supervised learning you create a machine learning model by providing what is called a labeled data set as input to a machine learning program or algorithm. This data set is going to contain what are called independent or feature variables — so this will be a set of variables — and there will be one dependent or target variable, which we also call the label. The idea is that the independent or feature variables are the attributes or properties of your data set that influence the dependent or target variable.

This process that I've just described is called training the machine learning model, and the model is fundamentally a mathematical function that best approximates the relationship between the independent variables and the dependent variable. All right, that's quite a mouthful, so let's jump into a diagram that maybe illustrates this more clearly.
So let's say you have a data set here, an Excel spreadsheet, and this spreadsheet has a bunch of columns and a bunch of rows. These rows represent what we call observations, or samples, or data points in our data set. Let's assume this data set was gathered by a marketing manager at a retail mall, so they've got all this information about the customers who purchase products at this mall. Some of the information they've collected about the customers is their gender, their age, their income, and their number of children. All this information about the customers is what we call the independent or feature variables. And based on all this information about the customer, we also record how much the customer spends. This information — these numbers here — is what we call the target variable, or the dependent variable. So a single row, one single data point, contains all the data for the feature variables and one single value for the label, or target variable.

The primary purpose of the machine learning model is to create a mapping from all your feature variables to your target variable. So somehow there's going to be a function — a mathematical function — that maps the values of your feature variables to the value of your target variable. In other words, this function represents the relationship between your feature variables and your target variable. This training process is what we call fitting the model, and the target variable or label — this column here, these values here — is critical for providing the context to do the fitting or training of the model. Once you've got a trained and fitted model, you can then use the model to make an accurate prediction of the target values corresponding to new feature values that the model has yet to encounter or see. And this, as I've already said earlier, is called predictive analytics.

Okay, so let's see what's actually happening here. You take your training data — this whole data set here, consisting of a thousand rows of data, or ten thousand rows of data — and you feed the entire data set into your machine learning algorithm, and a couple of hours later your machine learning algorithm comes out with a model. The model is essentially a function that maps all your feature variables, which are these four columns here, to your target variable, which is this one single column here.

Once you have the model, you can put in a new data point. The new data point represents data about a new customer, a customer that you have never seen before. Let's say you've already got information about 10,000 customers that have visited this mall, and how much each of these 10,000 customers spent when they were at the mall. Now you have a totally new customer that comes into the mall — this customer has never been to this mall before — and what we know about this customer is that he is male, his age is 50, his income is 18, and he has nine children. Now when you take this data and pump it into your model, your model is going to make a prediction. It's going to say: hey, based on everything I have been trained on before, and based on the model I've developed, I predict that a customer who is male, aged 50, with an income of 18 and nine children, is going to spend 25 ringgit at the mall.

And this is it — this is what you want, right there. That is the final output of your machine learning model: it makes a prediction about something that it has never seen before. That is essentially the core of machine learning: predictive analytics, making predictions about the future based on a historical data set.
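To show what this looks like in practice, here is a minimal sketch in Python with pandas and scikit-learn, using a tiny made-up version of the mall data set described above. Only the column meanings (gender, age, income, children as features; amount spent as the target) come from the presentation; the training numbers, the gender encoding, and the choice of a linear model are my own assumptions for illustration.

    # Minimal sketch only: fit a model that maps the four feature variables to
    # the target variable (amount spent), then predict for a brand-new customer.
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    data = pd.DataFrame({
        "gender":   [1, 0, 1, 0, 1, 0],          # 1 = male, 0 = female (encoded as numbers)
        "age":      [25, 40, 31, 58, 47, 36],
        "income":   [10, 22, 15, 30, 18, 25],
        "children": [0, 2, 1, 4, 3, 2],
        "spend":    [12, 35, 20, 48, 30, 38],    # target / label column
    })

    X = data[["gender", "age", "income", "children"]]   # feature variables
    y = data["spend"]                                   # target variable

    model = LinearRegression().fit(X, y)    # "training" / "fitting" the model

    # A new customer the model has never seen: male, age 50, income 18, nine children.
    new_customer = pd.DataFrame({"gender": [1], "age": [50], "income": [18], "children": [9]})
    print(model.predict(new_customer))      # the model's predicted spend for this customer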
Okay, so there are two areas of supervised learning: regression and classification. Regression is used to predict a numerical target variable, such as the price of a house or the salary of an employee, whereas classification is used to predict a categorical target variable, or class label.

For classification, you can have either binary or multiclass. Binary would be just true or false, zero or one — so, is your machine going to fail or is it not going to fail: just two classes, two possible outcomes. Or, is the customer going to make a purchase or not going to make a purchase. We call this binary classification. Multiclass is when there are more than two classes or types of values.

So, for example, this here would be a classification problem: you have a data set with information about your customers — the gender of the customer, the age of the customer, the salary of the customer — and you also have a record of whether the customer made a purchase or not. You can take this data set to train a classification model, and the classification model can then make a prediction about a new customer: it will predict zero, which means the customer didn't make a purchase, or one, which means the customer made a purchase.
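A minimal classification sketch along the same lines, again in Python with scikit-learn and invented numbers (only the column meanings come from the presentation): gender, age, and salary are the features, and "purchased" (0 or 1) is the class label.

    # Minimal sketch only: binary classification of whether a customer purchases.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    data = pd.DataFrame({
        "gender":    [0, 1, 1, 0, 1, 0, 0, 1],
        "age":       [22, 45, 35, 52, 29, 41, 60, 33],
        "salary":    [18, 40, 28, 55, 20, 38, 62, 30],
        "purchased": [0, 1, 0, 1, 0, 1, 1, 0],   # class label: 1 = made a purchase
    })

    clf = LogisticRegression()
    clf.fit(data[["gender", "age", "salary"]], data["purchased"])

    new_customer = pd.DataFrame({"gender": [1], "age": [38], "salary": [45]})
    print(clf.predict(new_customer))   # [1] means the model predicts a purchase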
And this is regression. Let's say you want to predict the wind speed, and you've got historical data for four other independent or feature variables: you have recorded the temperature, the pressure, the relative humidity, and the wind direction for the past 10 or 15 days, or whatever. Now you train your machine learning model using this data set, and the target variable column — this column here, the label — is basically a number. So with this number, this is a regression model. And now you can put in a new data point — a new data point means a new set of values for temperature, pressure, relative humidity, and wind direction — and your machine learning model will then predict the wind speed for that new data point. Okay, so that's a regression model.
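For completeness, here is the wind-speed version as a minimal regression sketch in Python with scikit-learn. The weather values are invented placeholders; only the feature and target names come from the presentation, and the choice of estimator is mine.

    # Minimal sketch only: regression of wind speed on four feature variables.
    import pandas as pd
    from sklearn.ensemble import GradientBoostingRegressor

    weather = pd.DataFrame({
        "temperature":    [30, 28, 33, 26, 31, 29],
        "pressure":       [1010, 1008, 1005, 1015, 1007, 1012],
        "rel_humidity":   [70, 82, 65, 90, 75, 80],
        "wind_direction": [120, 200, 90, 310, 150, 45],
        "wind_speed":     [12.0, 18.5, 9.8, 22.1, 14.3, 11.0],   # numerical target
    })

    reg = GradientBoostingRegressor(random_state=0)
    reg.fit(weather.drop(columns="wind_speed"), weather["wind_speed"])

    # A new day's readings -> predicted wind speed for that data point.
    today = pd.DataFrame({"temperature": [27], "pressure": [1011],
                          "rel_humidity": [85], "wind_direction": [210]})
    print(reg.predict(today))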
All right, so in this topic I'm going to talk about the workflow that's involved in machine learning. In the previous slides I talked about developing the model, but that's just one part of the entire workflow. In real life, when you use machine learning, there's an end-to-end workflow involved. The first thing, of course, is that you need to get your data, then you need to clean your data, and then you need to explore your data — you need to see what's going on in your data set. And real-life data sets are not trivial: they are hundreds of rows, thousands of rows, sometimes millions or billions of rows. We're talking about millions or billions of data points, especially if you're using IoT sensors to get data in real time. So you've got all these very large data sets; you need to clean them and explore them, and then you need to prepare them into the right format so that you can put them into the training process to create your machine learning model.

Then you check how good the model is: how accurate is the model in terms of its ability to generate predictions for the future — how accurate are the predictions coming out of your machine learning model. That's validating, or evaluating, your model. And then, if you determine that your model is of adequate accuracy to meet whatever your domain use case requirements are — say the accuracy required for your domain use case is 85%: if my machine learning model can give an 85% accuracy rate, I think it's good enough — then I'm going to deploy it into a real-world use case. Here, the machine learning model gets deployed on a server, data from other sources is captured, that data is pumped into the machine learning model, the model generates predictions, and those predictions are then used to make decisions on the factory floor in real time, or in any other particular scenario. And then you constantly monitor and update the model, you get more new data, and the entire cycle repeats itself. So that's your machine learning workflow, in a nutshell.
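To tie those steps together, here is one possible shape of that end-to-end workflow as a minimal sketch in Python with pandas and scikit-learn. The CSV file name, the column names, and the use of an 85% accuracy threshold as the deployment gate are placeholders/assumptions for illustration, not details from the presentation.

    # Minimal sketch only: one possible end-to-end workflow (get -> clean -> explore
    # -> prepare -> train -> evaluate -> deploy/monitor). File and columns are hypothetical.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # 1. Get the data (hypothetical data source)
    df = pd.read_csv("sensor_readings.csv")

    # 2. Clean the data: drop duplicate rows and rows with missing values
    df = df.drop_duplicates().dropna()

    # 3. Explore the data: quick look at statistics and distributions
    print(df.describe())

    # 4. Prepare the data: split features / target, then train / test sets
    X = df.drop(columns="failure")       # "failure" is an assumed label column
    y = df["failure"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # 5. Train the model
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    # 6. Evaluate; deploy only if it meets the domain requirement (say, 85% accuracy)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    if accuracy >= 0.85:
        print("Good enough: deploy, then keep monitoring and retraining with new data.")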
Here's another example of this — the same thing, maybe in a slightly different format. Again, you have your data collection and preparation; here we talk more about the different kinds of algorithms that are available to create a model, and I'll talk about this in more detail when we look at the real-world example of an end-to-end machine learning workflow for the predictive maintenance use case. Once you have chosen the appropriate algorithm, you train your model, and then you select the most appropriate trained model among the multiple models — you are probably going to develop multiple models from multiple algorithms, you're going to evaluate them all, and then you're going to say: hey, after I've evaluated and tested, I've chosen the best model, and I'm going to deploy that model. This is for real-life production use: real-life sensor data is pumped into my model, my model generates predictions, the predicted data is used immediately, in real time, for real-life decision making, and then I monitor the results. So somebody is using the predictions from my model: if the predictions are lousy, the monitoring system captures that; if the predictions are fantastic, well, that's also captured by the monitoring system, and that gets fed back again into the next cycle of my machine learning pipeline.
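The "develop several models, evaluate them all, keep the best" step might look something like this sketch in Python with scikit-learn. The synthetic data, the three candidate algorithms, and the use of cross-validation accuracy as the selection criterion are my own assumptions for illustration.

    # Minimal sketch only: evaluate several candidate algorithms and keep the best one.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic data stands in for the real prepared sensor data.
    X, y = make_classification(n_samples=500, n_features=6, random_state=0)

    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "decision_tree":       DecisionTreeClassifier(random_state=0),
        "random_forest":       RandomForestClassifier(random_state=0),
    }

    # 5-fold cross-validation gives each algorithm an average accuracy score.
    scores = {name: cross_val_score(model, X, y, cv=5).mean()
              for name, model in candidates.items()}

    best_name = max(scores, key=scores.get)
    best_model = candidates[best_name].fit(X, y)   # retrain the winner on all the data
    print(best_name, scores[best_name])
    # best_model is what gets deployed; its live predictions are then monitored and
    # the results fed back into the next cycle of the pipeline.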
0,0:23:29.80,0:23:32.36,Default,,0000,0000,0000,,feedback again to the next cycle of my Dialogue: 0,0:23:32.36,0:23:33.68,Default,,0000,0000,0000,,machine learning Dialogue: 0,0:23:33.68,0:23:35.96,Default,,0000,0000,0000,,pipeline okay so that's the kind of Dialogue: 0,0:23:35.96,0:23:38.36,Default,,0000,0000,0000,,overall View and here are the kind of Dialogue: 0,0:23:38.36,0:23:41.56,Default,,0000,0000,0000,,key phases of your workflow so one of Dialogue: 0,0:23:41.56,0:23:43.96,Default,,0000,0000,0000,,the important phases is called Eda Dialogue: 0,0:23:43.96,0:23:47.52,Default,,0000,0000,0000,,exploratory data analysis and in this Dialogue: 0,0:23:47.52,0:23:49.88,Default,,0000,0000,0000,,particular uh phase uh you're going to Dialogue: 0,0:23:49.88,0:23:53.12,Default,,0000,0000,0000,,do a lot of stuff primarily just to Dialogue: 0,0:23:53.12,0:23:54.88,Default,,0000,0000,0000,,understand your data set so like I said Dialogue: 0,0:23:54.88,0:23:56.56,Default,,0000,0000,0000,,real life data sets they tend to be very Dialogue: 0,0:23:56.56,0:23:59.32,Default,,0000,0000,0000,,complex and they tend to have various Dialogue: 0,0:23:59.32,0:24:01.04,Default,,0000,0000,0000,,statistical properties all right Dialogue: 0,0:24:01.04,0:24:02.68,Default,,0000,0000,0000,,statistics is a very important component Dialogue: 0,0:24:02.68,0:24:05.60,Default,,0000,0000,0000,,of machine learning so an Eda helps you Dialogue: 0,0:24:05.60,0:24:07.48,Default,,0000,0000,0000,,to kind of get an overview of your data Dialogue: 0,0:24:07.48,0:24:09.68,Default,,0000,0000,0000,,set get an overview of any problems in Dialogue: 0,0:24:09.68,0:24:11.52,Default,,0000,0000,0000,,your data set like any data that's Dialogue: 0,0:24:11.52,0:24:13.44,Default,,0000,0000,0000,,missing the statistical properties your Dialogue: 0,0:24:13.44,0:24:15.16,Default,,0000,0000,0000,,data set the distribution of your data Dialogue: 0,0:24:15.16,0:24:17.28,Default,,0000,0000,0000,,set the statistical correlation of Dialogue: 0,0:24:17.28,0:24:19.64,Default,,0000,0000,0000,,variables in your data set etc Dialogue: 0,0:24:19.64,0:24:23.40,Default,,0000,0000,0000,,etc okay then we have data cleaning or Dialogue: 0,0:24:23.40,0:24:25.28,Default,,0000,0000,0000,,sometimes you call it data cleansing and Dialogue: 0,0:24:25.28,0:24:27.60,Default,,0000,0000,0000,,in this phase what you want to do is Dialogue: 0,0:24:27.60,0:24:29.44,Default,,0000,0000,0000,,primarily you want to kind of do things Dialogue: 0,0:24:29.44,0:24:31.96,Default,,0000,0000,0000,,like remove duplicate records or rows in Dialogue: 0,0:24:31.96,0:24:33.68,Default,,0000,0000,0000,,your table you want to make sure that Dialogue: 0,0:24:33.68,0:24:36.80,Default,,0000,0000,0000,,there I your your data or your data Dialogue: 0,0:24:36.80,0:24:39.40,Default,,0000,0000,0000,,points your samples have appropriate IDs Dialogue: 0,0:24:39.40,0:24:41.08,Default,,0000,0000,0000,,and most importantly you want to make Dialogue: 0,0:24:41.08,0:24:43.04,Default,,0000,0000,0000,,sure there's not too many missing values Dialogue: 0,0:24:43.04,0:24:44.88,Default,,0000,0000,0000,,in your data set so what I mean by Dialogue: 0,0:24:44.88,0:24:46.32,Default,,0000,0000,0000,,missing values are things like that Dialogue: 0,0:24:46.32,0:24:48.20,Default,,0000,0000,0000,,right you have got a data set and for Dialogue: 0,0:24:48.20,0:24:51.64,Default,,0000,0000,0000,,some reason there are some cells or Dialogue: 0,0:24:51.64,0:24:54.56,Default,,0000,0000,0000,,locations in your data set which are 
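As a rough illustration of those first EDA and cleaning checks, here is a small pandas sketch (my own, on a tiny toy table, not the presenter's code) showing how duplicates, missing values, statistical properties and correlations are usually inspected.

import pandas as pd
import numpy as np

# tiny toy table standing in for a real data set
df = pd.DataFrame({
    "id":    [1, 2, 2, 3],
    "temp":  [300.1, 301.4, 301.4, np.nan],
    "label": [0, 0, 0, 1],
})

print(df.duplicated().sum())        # duplicate rows (here the repeated id 2)
print(df.isna().sum())              # missing values per column
print(df.describe())                # statistical properties of each numeric column
print(df.corr(numeric_only=True))   # correlations between the numeric columns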
Dialogue: 0,0:24:54.56,0:24:56.52,Default,,0000,0000,0000,,missing values right and if you have a Dialogue: 0,0:24:56.52,0:24:58.68,Default,,0000,0000,0000,,lot of these missing values then you've Dialogue: 0,0:24:58.68,0:25:00.44,Default,,0000,0000,0000,,got a poor quality data set and you're Dialogue: 0,0:25:00.44,0:25:02.20,Default,,0000,0000,0000,,not going to be able to build a good Dialogue: 0,0:25:02.20,0:25:04.16,Default,,0000,0000,0000,,model from this data set you're not Dialogue: 0,0:25:04.16,0:25:06.00,Default,,0000,0000,0000,,going to be able to train a good machine Dialogue: 0,0:25:06.00,0:25:08.12,Default,,0000,0000,0000,,learning model from a data set with a Dialogue: 0,0:25:08.12,0:25:10.20,Default,,0000,0000,0000,,lot of missing values like this so you Dialogue: 0,0:25:10.20,0:25:11.88,Default,,0000,0000,0000,,have to figure out whether there are a Dialogue: 0,0:25:11.88,0:25:13.40,Default,,0000,0000,0000,,lot of missing values in your data set Dialogue: 0,0:25:13.40,0:25:15.40,Default,,0000,0000,0000,,how do you handle them another thing Dialogue: 0,0:25:15.40,0:25:16.92,Default,,0000,0000,0000,,that's important in data cleansing is Dialogue: 0,0:25:16.92,0:25:18.80,Default,,0000,0000,0000,,figuring out the outliers in your data Dialogue: 0,0:25:18.80,0:25:21.92,Default,,0000,0000,0000,,set so uh outliers are things like this Dialogue: 0,0:25:21.92,0:25:24.04,Default,,0000,0000,0000,,you know data points are very far from Dialogue: 0,0:25:24.04,0:25:26.44,Default,,0000,0000,0000,,the general trend of data points in your Dialogue: 0,0:25:26.44,0:25:29.56,Default,,0000,0000,0000,,data set right and and so there are also Dialogue: 0,0:25:29.56,0:25:31.92,Default,,0000,0000,0000,,several ways to detect outliers in your Dialogue: 0,0:25:31.92,0:25:34.20,Default,,0000,0000,0000,,data set and there are several ways to Dialogue: 0,0:25:34.20,0:25:36.64,Default,,0000,0000,0000,,handle outliers in your data set Dialogue: 0,0:25:36.64,0:25:38.20,Default,,0000,0000,0000,,similarly as well there are several ways Dialogue: 0,0:25:38.20,0:25:39.96,Default,,0000,0000,0000,,to handle missing values in your data Dialogue: 0,0:25:39.96,0:25:42.88,Default,,0000,0000,0000,,set so handling missing values handling Dialogue: 0,0:25:42.88,0:25:45.68,Default,,0000,0000,0000,,outliers those are really two very key Dialogue: 0,0:25:45.68,0:25:47.28,Default,,0000,0000,0000,,importance of data Dialogue: 0,0:25:47.28,0:25:49.12,Default,,0000,0000,0000,,cleansing and there are many many Dialogue: 0,0:25:49.12,0:25:50.76,Default,,0000,0000,0000,,techniques to handle this so a data Dialogue: 0,0:25:50.76,0:25:52.00,Default,,0000,0000,0000,,scientist needs to be acquainted with Dialogue: 0,0:25:52.00,0:25:55.36,Default,,0000,0000,0000,,all of this all right why do I need to Dialogue: 0,0:25:55.36,0:25:58.00,Default,,0000,0000,0000,,do data cleansing well here is the key Dialogue: 0,0:25:58.00,0:25:59.36,Default,,0000,0000,0000,,point Dialogue: 0,0:25:59.36,0:26:02.80,Default,,0000,0000,0000,,if you have a very poor quality data set Dialogue: 0,0:26:02.80,0:26:04.88,Default,,0000,0000,0000,,which means youve got a lot of outliers Dialogue: 0,0:26:04.88,0:26:06.72,Default,,0000,0000,0000,,which are errors in your data set or you Dialogue: 0,0:26:06.72,0:26:08.16,Default,,0000,0000,0000,,got a lot of missing values in your data Dialogue: 0,0:26:08.16,0:26:10.84,Default,,0000,0000,0000,,set even though youve got a fantastic Dialogue: 0,0:26:10.84,0:26:13.04,Default,,0000,0000,0000,,algorithm you've got a 
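A minimal sketch of two of the common techniques just mentioned, assuming a pandas DataFrame: filling missing values with the column mean, and flagging outliers with the usual 1.5 x IQR rule. Other strategies exist, and the right choice is domain dependent.

import pandas as pd
import numpy as np

df = pd.DataFrame({"speed": [1500, 1520, np.nan, 1490, 9000, 1510]})

# handle missing values: here, impute with the column mean
df["speed"] = df["speed"].fillna(df["speed"].mean())

# detect outliers with the 1.5 * IQR rule
q1, q3 = df["speed"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = (df["speed"] < q1 - 1.5 * iqr) | (df["speed"] > q3 + 1.5 * iqr)
print(df[outliers])      # the 9000 reading shows up as an outlier

# one possible handling: drop them (alternatives: cap them, or keep them)
df_clean = df[~outliers]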
fantastic model Dialogue: 0,0:26:13.04,0:26:15.72,Default,,0000,0000,0000,,the predictions that your model is going Dialogue: 0,0:26:15.72,0:26:18.96,Default,,0000,0000,0000,,to give are absolutely rubbish it's kind Dialogue: 0,0:26:18.96,0:26:22.08,Default,,0000,0000,0000,,of like taking water and putting water Dialogue: 0,0:26:22.08,0:26:26.00,Default,,0000,0000,0000,,into the tank of a Mercedes-Benz so Dialogue: 0,0:26:26.00,0:26:28.44,Default,,0000,0000,0000,,Mercedes-Benz is a great car but if you Dialogue: 0,0:26:28.44,0:26:30.08,Default,,0000,0000,0000,,take water and put it into your Dialogue: 0,0:26:30.08,0:26:33.40,Default,,0000,0000,0000,,Mercedes-Benz it will just die right your Dialogue: 0,0:26:33.40,0:26:36.52,Default,,0000,0000,0000,,car will just die it can't run on water Dialogue: 0,0:26:36.52,0:26:38.28,Default,,0000,0000,0000,,right on the other hand if you have a Dialogue: 0,0:26:38.28,0:26:41.56,Default,,0000,0000,0000,,Myvi a Myvi is just a lousy car but if Dialogue: 0,0:26:41.56,0:26:44.84,Default,,0000,0000,0000,,you take high octane good petrol and Dialogue: 0,0:26:44.84,0:26:47.24,Default,,0000,0000,0000,,you pour it into a Myvi the Myvi will just go at Dialogue: 0,0:26:47.24,0:26:49.48,Default,,0000,0000,0000,,you know 100 miles an hour which just Dialogue: 0,0:26:49.48,0:26:51.16,Default,,0000,0000,0000,,completely destroys the Mercedes-Benz in Dialogue: 0,0:26:51.16,0:26:53.36,Default,,0000,0000,0000,,terms of performance so it doesn't it Dialogue: 0,0:26:53.36,0:26:54.80,Default,,0000,0000,0000,,doesn't really matter what model you're Dialogue: 0,0:26:54.80,0:26:57.08,Default,,0000,0000,0000,,using right so you can be using the most Dialogue: 0,0:26:57.08,0:26:58.68,Default,,0000,0000,0000,,fantastic model like the Dialogue: 0,0:26:58.68,0:27:01.20,Default,,0000,0000,0000,,Mercedes-Benz of machine learning but if Dialogue: 0,0:27:01.20,0:27:03.08,Default,,0000,0000,0000,,your data is lousy quality your Dialogue: 0,0:27:03.08,0:27:06.48,Default,,0000,0000,0000,,predictions are also going to be rubbish Dialogue: 0,0:27:06.48,0:27:10.00,Default,,0000,0000,0000,,okay so cleansing the data set is in fact Dialogue: 0,0:27:10.00,0:27:11.88,Default,,0000,0000,0000,,probably the most important thing that Dialogue: 0,0:27:11.88,0:27:13.64,Default,,0000,0000,0000,,data scientists need to do and that's Dialogue: 0,0:27:13.64,0:27:15.52,Default,,0000,0000,0000,,what they spend most of the time doing Dialogue: 0,0:27:15.52,0:27:17.60,Default,,0000,0000,0000,,right building the model training the Dialogue: 0,0:27:17.60,0:27:20.24,Default,,0000,0000,0000,,model getting the right algorithms and Dialogue: 0,0:27:20.24,0:27:23.24,Default,,0000,0000,0000,,so on that's really a small portion of Dialogue: 0,0:27:23.24,0:27:25.20,Default,,0000,0000,0000,,the actual machine learning workflow Dialogue: 0,0:27:25.20,0:27:27.36,Default,,0000,0000,0000,,right the actual machine learning Dialogue: 0,0:27:27.36,0:27:29.68,Default,,0000,0000,0000,,workflow the vast majority of time is on Dialogue: 0,0:27:29.68,0:27:31.56,Default,,0000,0000,0000,,cleaning and organizing your Dialogue: 0,0:27:31.56,0:27:33.36,Default,,0000,0000,0000,,data then you have something called Dialogue: 0,0:27:33.36,0:27:35.08,Default,,0000,0000,0000,,feature engineering which is you Dialogue: 0,0:27:35.08,0:27:37.00,Default,,0000,0000,0000,,pre-process the feature variables of Dialogue: 0,0:27:37.00,0:27:38.92,Default,,0000,0000,0000,,your original data set prior to using Dialogue:
0,0:27:38.92,0:27:40.60,Default,,0000,0000,0000,,them to train the model and this is Dialogue: 0,0:27:40.60,0:27:41.96,Default,,0000,0000,0000,,either through addition deletion Dialogue: 0,0:27:41.96,0:27:43.60,Default,,0000,0000,0000,,combination or transformation of these Dialogue: 0,0:27:43.60,0:27:45.40,Default,,0000,0000,0000,,variables and then the idea is you want Dialogue: 0,0:27:45.40,0:27:47.00,Default,,0000,0000,0000,,to improve the predictive accuracy of Dialogue: 0,0:27:47.00,0:27:49.32,Default,,0000,0000,0000,,the model and also because some models Dialogue: 0,0:27:49.32,0:27:51.08,Default,,0000,0000,0000,,can only work with numeric data so you Dialogue: 0,0:27:51.08,0:27:53.72,Default,,0000,0000,0000,,need to transform categorical data into Dialogue: 0,0:27:53.72,0:27:57.04,Default,,0000,0000,0000,,numeric data all right so just now um in Dialogue: 0,0:27:57.04,0:27:58.80,Default,,0000,0000,0000,,the earlier slides I showed you that you Dialogue: 0,0:27:58.80,0:28:00.76,Default,,0000,0000,0000,,take your original data set you pum it Dialogue: 0,0:28:00.76,0:28:03.20,Default,,0000,0000,0000,,into algorithm and then couple of hours Dialogue: 0,0:28:03.20,0:28:05.20,Default,,0000,0000,0000,,later you get a machine learning model Dialogue: 0,0:28:05.20,0:28:08.64,Default,,0000,0000,0000,,right so you didn't do anything to your Dialogue: 0,0:28:08.64,0:28:10.16,Default,,0000,0000,0000,,data set to the feature variables in Dialogue: 0,0:28:10.16,0:28:12.16,Default,,0000,0000,0000,,your data set before you pump it into a Dialogue: 0,0:28:12.16,0:28:14.40,Default,,0000,0000,0000,,machine machine learning algorithm so Dialogue: 0,0:28:14.40,0:28:15.84,Default,,0000,0000,0000,,what I showed you earlier is you just Dialogue: 0,0:28:15.84,0:28:18.92,Default,,0000,0000,0000,,take the data set exactly as it is and Dialogue: 0,0:28:18.92,0:28:20.80,Default,,0000,0000,0000,,you just pump it into the algorithm Dialogue: 0,0:28:20.80,0:28:23.12,Default,,0000,0000,0000,,couple of hours later you get the model Dialogue: 0,0:28:23.12,0:28:27.64,Default,,0000,0000,0000,,right uh but that's not what generally Dialogue: 0,0:28:27.64,0:28:29.60,Default,,0000,0000,0000,,happens in in real life in real life Dialogue: 0,0:28:29.60,0:28:31.56,Default,,0000,0000,0000,,you're going to take all the original Dialogue: 0,0:28:31.56,0:28:34.32,Default,,0000,0000,0000,,feature variables from your data set and Dialogue: 0,0:28:34.32,0:28:36.72,Default,,0000,0000,0000,,you're going to transform them in some Dialogue: 0,0:28:36.72,0:28:38.96,Default,,0000,0000,0000,,way so you can see here these are the Dialogue: 0,0:28:38.96,0:28:42.12,Default,,0000,0000,0000,,colums of data from my original data set Dialogue: 0,0:28:42.12,0:28:46.04,Default,,0000,0000,0000,,and before I actually put all these data Dialogue: 0,0:28:46.04,0:28:48.24,Default,,0000,0000,0000,,points from my original data set into my Dialogue: 0,0:28:48.24,0:28:50.72,Default,,0000,0000,0000,,algorithm to train and get my model I Dialogue: 0,0:28:50.72,0:28:54.96,Default,,0000,0000,0000,,will actually transform them okay so the Dialogue: 0,0:28:54.96,0:28:57.60,Default,,0000,0000,0000,,transformation of these feature variable Dialogue: 0,0:28:57.60,0:29:00.60,Default,,0000,0000,0000,,values we call this feature engineering Dialogue: 0,0:29:00.60,0:29:02.44,Default,,0000,0000,0000,,and there are many many techniques to do Dialogue: 0,0:29:02.44,0:29:04.96,Default,,0000,0000,0000,,feature engineering so one hot encoding Dialogue: 
0,0:29:04.96,0:29:08.28,Default,,0000,0000,0000,,scaling log transformation Dialogue: 0,0:29:08.28,0:29:10.48,Default,,0000,0000,0000,,discretization date extraction Boolean Dialogue: 0,0:29:10.48,0:29:12.04,Default,,0000,0000,0000,,logic etc Dialogue: 0,0:29:12.04,0:29:14.88,Default,,0000,0000,0000,,etc okay then finally we do something Dialogue: 0,0:29:14.88,0:29:16.80,Default,,0000,0000,0000,,called a train test split so where we Dialogue: 0,0:29:16.80,0:29:19.44,Default,,0000,0000,0000,,take our original data set right so this Dialogue: 0,0:29:19.44,0:29:21.36,Default,,0000,0000,0000,,was the original data set and we break Dialogue: 0,0:29:21.36,0:29:23.72,Default,,0000,0000,0000,,it into two parts so one is called the Dialogue: 0,0:29:23.72,0:29:25.76,Default,,0000,0000,0000,,training data set and the other is Dialogue: 0,0:29:25.76,0:29:28.12,Default,,0000,0000,0000,,called the test data set and the primary Dialogue: 0,0:29:28.12,0:29:30.00,Default,,0000,0000,0000,,purpose for this is when we feed and Dialogue: 0,0:29:30.00,0:29:31.40,Default,,0000,0000,0000,,train the machine learning model we're Dialogue: 0,0:29:31.40,0:29:32.64,Default,,0000,0000,0000,,going to use what is called the training Dialogue: 0,0:29:32.64,0:29:35.56,Default,,0000,0000,0000,,data set and then when we want to evaluate Dialogue: 0,0:29:35.56,0:29:37.40,Default,,0000,0000,0000,,the accuracy of the model we use the test data set right so this Dialogue: 0,0:29:37.40,0:29:40.96,Default,,0000,0000,0000,,is the key part of your machine learning Dialogue: 0,0:29:40.96,0:29:43.64,Default,,0000,0000,0000,,life cycle because you are not only just Dialogue: 0,0:29:43.64,0:29:45.44,Default,,0000,0000,0000,,going to have one possible model Dialogue: 0,0:29:45.44,0:29:47.72,Default,,0000,0000,0000,,because there are a vast range of Dialogue: 0,0:29:47.72,0:29:50.08,Default,,0000,0000,0000,,algorithms that you can use to create a Dialogue: 0,0:29:50.08,0:29:53.00,Default,,0000,0000,0000,,model so fundamentally you have a wide Dialogue: 0,0:29:53.00,0:29:55.68,Default,,0000,0000,0000,,range of choices right like a wide range Dialogue: 0,0:29:55.68,0:29:57.64,Default,,0000,0000,0000,,of cars right you want to buy a car you Dialogue: 0,0:29:57.64,0:30:00.56,Default,,0000,0000,0000,,can buy a Myvi you can buy a Perodua Dialogue: 0,0:30:00.56,0:30:02.64,Default,,0000,0000,0000,,you can buy a Honda you can buy a Dialogue: 0,0:30:02.64,0:30:05.04,Default,,0000,0000,0000,,Mercedes-Benz you can buy an Audi you can Dialogue: 0,0:30:05.04,0:30:07.76,Default,,0000,0000,0000,,buy a Beemer many many different cars Dialogue: 0,0:30:07.76,0:30:09.24,Default,,0000,0000,0000,,that are available for you if you want Dialogue: 0,0:30:09.24,0:30:11.68,Default,,0000,0000,0000,,to buy a car right same thing with a Dialogue: 0,0:30:11.68,0:30:14.36,Default,,0000,0000,0000,,machine learning model there are a vast Dialogue: 0,0:30:14.36,0:30:16.72,Default,,0000,0000,0000,,variety of algorithms that you can Dialogue: 0,0:30:16.72,0:30:19.48,Default,,0000,0000,0000,,choose from in order to create a model Dialogue: 0,0:30:19.48,0:30:21.52,Default,,0000,0000,0000,,and so once you create a model from a Dialogue: 0,0:30:21.52,0:30:24.48,Default,,0000,0000,0000,,given algorithm you need to say hey how Dialogue: 0,0:30:24.48,0:30:26.44,Default,,0000,0000,0000,,accurate is this model that I have created Dialogue: 0,0:30:26.44,0:30:28.64,Default,,0000,0000,0000,,from this algorithm and different Dialogue: 0,0:30:28.64,0:30:30.40,Default,,0000,0000,0000,,algorithms are going to
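Here is a small pandas sketch of a few of the feature engineering techniques just listed (my own illustration, with made-up column names): one-hot encoding a categorical column, log-transforming a skewed one, discretizing a numeric one into bins, and extracting parts of a date.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "type":   ["L", "M", "H", "L"],
    "torque": [40.2, 55.1, 38.7, 61.3],
    "installed": pd.to_datetime(["2021-01-05", "2021-06-20", "2022-03-11", "2022-09-30"]),
})

# one-hot encoding: categorical -> numeric indicator columns
df = pd.get_dummies(df, columns=["type"])

# log transformation of a skewed numeric feature
df["torque_log"] = np.log1p(df["torque"])

# discretization: bucket a continuous value into categories
df["torque_band"] = pd.cut(df["torque"], bins=[0, 45, 60, np.inf], labels=["low", "mid", "high"])

# date extraction: pull usable numeric parts out of a timestamp
df["installed_year"] = df["installed"].dt.year
print(df)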
create different Dialogue: 0,0:30:30.40,0:30:33.72,Default,,0000,0000,0000,,models with different rates of accuracy Dialogue: 0,0:30:33.72,0:30:35.68,Default,,0000,0000,0000,,and so the primary purpose of the test Dialogue: 0,0:30:35.68,0:30:38.20,Default,,0000,0000,0000,,data set is to evaluate the ACC accuracy Dialogue: 0,0:30:38.20,0:30:41.48,Default,,0000,0000,0000,,of the model to see hey is this model Dialogue: 0,0:30:41.48,0:30:43.36,Default,,0000,0000,0000,,that I've created using this algorithm Dialogue: 0,0:30:43.36,0:30:45.88,Default,,0000,0000,0000,,is it adequate for me to use in a real Dialogue: 0,0:30:45.88,0:30:48.60,Default,,0000,0000,0000,,life production use case Okay so that's Dialogue: 0,0:30:48.60,0:30:52.32,Default,,0000,0000,0000,,what it's all about okay so this is my Dialogue: 0,0:30:52.32,0:30:54.28,Default,,0000,0000,0000,,original data set I break it into my Dialogue: 0,0:30:54.28,0:30:56.56,Default,,0000,0000,0000,,feature data uh feature data set and Dialogue: 0,0:30:56.56,0:30:58.52,Default,,0000,0000,0000,,also my target variable colum so my Dialogue: 0,0:30:58.52,0:31:00.64,Default,,0000,0000,0000,,feature variable uh colums the target Dialogue: 0,0:31:00.64,0:31:02.20,Default,,0000,0000,0000,,variable colums and then I further break Dialogue: 0,0:31:02.20,0:31:04.24,Default,,0000,0000,0000,,it into a training data set and a test Dialogue: 0,0:31:04.24,0:31:06.60,Default,,0000,0000,0000,,data set the training data set is to use Dialogue: 0,0:31:06.60,0:31:08.32,Default,,0000,0000,0000,,the train to create the machine learning Dialogue: 0,0:31:08.32,0:31:10.48,Default,,0000,0000,0000,,model and then once the machine learning Dialogue: 0,0:31:10.48,0:31:12.20,Default,,0000,0000,0000,,model is created I then use the test Dialogue: 0,0:31:12.20,0:31:15.08,Default,,0000,0000,0000,,data set to evaluate the accuracy of the Dialogue: 0,0:31:15.08,0:31:16.28,Default,,0000,0000,0000,,machine learning Dialogue: 0,0:31:16.28,0:31:21.00,Default,,0000,0000,0000,,model all right and then finally we can Dialogue: 0,0:31:21.00,0:31:23.20,Default,,0000,0000,0000,,see what are the different parts or Dialogue: 0,0:31:23.20,0:31:26.08,Default,,0000,0000,0000,,aspects that go into a successful model Dialogue: 0,0:31:26.08,0:31:29.52,Default,,0000,0000,0000,,so Eda about 10% data cleansing about Dialogue: 0,0:31:29.52,0:31:32.36,Default,,0000,0000,0000,,20% feature engineering about Dialogue: 0,0:31:32.36,0:31:36.32,Default,,0000,0000,0000,,25% selecting a specific algorithm about Dialogue: 0,0:31:36.32,0:31:39.12,Default,,0000,0000,0000,,10% and then training the model from Dialogue: 0,0:31:39.12,0:31:41.64,Default,,0000,0000,0000,,that algorithm about 15% and then Dialogue: 0,0:31:41.64,0:31:43.68,Default,,0000,0000,0000,,finally evaluating the model deciding Dialogue: 0,0:31:43.68,0:31:45.96,Default,,0000,0000,0000,,which is the best model with the highest Dialogue: 0,0:31:45.96,0:31:50.68,Default,,0000,0000,0000,,accuracy rate that's about Dialogue: 0,0:31:54.08,0:31:56.92,Default,,0000,0000,0000,,20% all right so we have reached the Dialogue: 0,0:31:56.92,0:31:58.88,Default,,0000,0000,0000,,most interesting part of this Dialogue: 0,0:31:58.88,0:32:01.04,Default,,0000,0000,0000,,presentation which is the demonstration Dialogue: 0,0:32:01.04,0:32:03.76,Default,,0000,0000,0000,,of an endtoend machine learning workflow Dialogue: 0,0:32:03.76,0:32:06.08,Default,,0000,0000,0000,,on a real life data set that Dialogue: 0,0:32:06.08,0:32:10.08,Default,,0000,0000,0000,,demonstrates 
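To illustrate that idea of training several candidate models and picking the best one, here is a small scikit-learn sketch (my own, on synthetic data): split once into training and test sets, fit a few different algorithms, and compare their test accuracy.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree":       DecisionTreeClassifier(random_state=0),
    "random forest":       RandomForestClassifier(random_state=0),
}

# train each candidate on the training set, score it on the unseen test set
for name, model in candidates.items():
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 3))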
the use case of predictive Dialogue: 0,0:32:10.08,0:32:13.52,Default,,0000,0000,0000,,maintenance so for the data set for Dialogue: 0,0:32:13.52,0:32:16.24,Default,,0000,0000,0000,,this particular use case I've used a Dialogue: 0,0:32:16.24,0:32:19.20,Default,,0000,0000,0000,,data set from Kaggle so for those of you Dialogue: 0,0:32:19.20,0:32:21.40,Default,,0000,0000,0000,,who are not aware of this Kaggle is the Dialogue: 0,0:32:21.40,0:32:24.88,Default,,0000,0000,0000,,world's largest open-source community Dialogue: 0,0:32:24.88,0:32:28.08,Default,,0000,0000,0000,,for data science and AI and they have a Dialogue: 0,0:32:28.08,0:32:31.16,Default,,0000,0000,0000,,large collection of data sets from all Dialogue: 0,0:32:31.16,0:32:34.44,Default,,0000,0000,0000,,various areas of industry and human Dialogue: 0,0:32:34.44,0:32:37.04,Default,,0000,0000,0000,,endeavor and they also have a large Dialogue: 0,0:32:37.04,0:32:38.84,Default,,0000,0000,0000,,collection of models that have been Dialogue: 0,0:32:38.84,0:32:42.88,Default,,0000,0000,0000,,developed using these data sets so here Dialogue: 0,0:32:42.88,0:32:47.04,Default,,0000,0000,0000,,we have a data set for the particular Dialogue: 0,0:32:47.04,0:32:50.52,Default,,0000,0000,0000,,use case predictive maintenance okay so Dialogue: 0,0:32:50.52,0:32:52.92,Default,,0000,0000,0000,,this is some information about the data Dialogue: 0,0:32:52.92,0:32:56.44,Default,,0000,0000,0000,,set so in case you do not know how Dialogue: 0,0:32:56.44,0:32:59.20,Default,,0000,0000,0000,,to get there this is the URL to click Dialogue: 0,0:32:59.20,0:33:02.24,Default,,0000,0000,0000,,on okay to get to that data set so once Dialogue: 0,0:33:02.24,0:33:05.12,Default,,0000,0000,0000,,you are at the page for Dialogue: 0,0:33:05.12,0:33:07.40,Default,,0000,0000,0000,,this data set you can see Dialogue: 0,0:33:07.40,0:33:09.96,Default,,0000,0000,0000,,all the information about this data set Dialogue: 0,0:33:09.96,0:33:13.04,Default,,0000,0000,0000,,and you can download the data set in a Dialogue: 0,0:33:13.04,0:33:14.16,Default,,0000,0000,0000,,CSV Dialogue: 0,0:33:14.16,0:33:16.36,Default,,0000,0000,0000,,format okay so let's take a look at the Dialogue: 0,0:33:16.36,0:33:19.56,Default,,0000,0000,0000,,data set so this data set has a total of Dialogue: 0,0:33:19.56,0:33:23.44,Default,,0000,0000,0000,,10,000 samples okay and these are the Dialogue: 0,0:33:23.44,0:33:26.28,Default,,0000,0000,0000,,feature variables the type the product Dialogue: 0,0:33:26.28,0:33:28.44,Default,,0000,0000,0000,,ID the air temperature process Dialogue: 0,0:33:28.44,0:33:31.00,Default,,0000,0000,0000,,temperature rotational speed torque tool Dialogue: 0,0:33:31.00,0:33:34.80,Default,,0000,0000,0000,,wear and this is the target variable Dialogue: 0,0:33:34.80,0:33:36.72,Default,,0000,0000,0000,,all right so the target variable is what Dialogue: 0,0:33:36.72,0:33:38.16,Default,,0000,0000,0000,,we are interested in what we are Dialogue: 0,0:33:38.16,0:33:40.96,Default,,0000,0000,0000,,interested in using to train the machine Dialogue: 0,0:33:40.96,0:33:42.60,Default,,0000,0000,0000,,learning model and also what we are Dialogue: 0,0:33:42.60,0:33:45.28,Default,,0000,0000,0000,,interested to predict okay so these are Dialogue: 0,0:33:45.28,0:33:47.96,Default,,0000,0000,0000,,the feature variables they describe or Dialogue: 0,0:33:47.96,0:33:49.96,Default,,0000,0000,0000,,they provide information about this
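If you want to follow along, loading the downloaded CSV looks roughly like this; the file name and the exact column labels below are assumptions based on the Kaggle page, so check them against your own download.

import pandas as pd

df = pd.read_csv("predictive_maintenance.csv")   # the CSV downloaded from Kaggle

print(df.shape)             # expect roughly (10000, 10): 10,000 samples
print(df.columns.tolist())  # assumed: UDI, Product ID, Type, Air temperature [K],
                            # Process temperature [K], Rotational speed [rpm],
                            # Torque [Nm], Tool wear [min], Target, Failure Type
print(df.head())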
Dialogue: 0,0:33:49.96,0:33:52.88,Default,,0000,0000,0000,,particular machine on the production Dialogue: 0,0:33:52.88,0:33:55.08,Default,,0000,0000,0000,,line on the assembly line so you might Dialogue: 0,0:33:55.08,0:33:56.80,Default,,0000,0000,0000,,know the product ID the type the air Dialogue: 0,0:33:56.80,0:33:58.12,Default,,0000,0000,0000,,temperature process temperature Dialogue: 0,0:33:58.12,0:34:00.48,Default,,0000,0000,0000,,rotational speed torque tool wear right so Dialogue: 0,0:34:00.48,0:34:03.16,Default,,0000,0000,0000,,let's say you've got an IoT sensor system Dialogue: 0,0:34:03.16,0:34:06.12,Default,,0000,0000,0000,,that's basically capturing all this data Dialogue: 0,0:34:06.12,0:34:08.36,Default,,0000,0000,0000,,about a product or a machine on your Dialogue: 0,0:34:08.36,0:34:10.68,Default,,0000,0000,0000,,production or assembly line okay and Dialogue: 0,0:34:10.68,0:34:13.92,Default,,0000,0000,0000,,you've also captured information about Dialogue: 0,0:34:13.92,0:34:17.20,Default,,0000,0000,0000,,whether for a specific sample Dialogue: 0,0:34:17.20,0:34:19.84,Default,,0000,0000,0000,,whether that sample experienced a Dialogue: 0,0:34:19.84,0:34:23.04,Default,,0000,0000,0000,,failure or not okay so the target value Dialogue: 0,0:34:23.04,0:34:25.52,Default,,0000,0000,0000,,of zero okay indicates that there's no Dialogue: 0,0:34:25.52,0:34:28.00,Default,,0000,0000,0000,,failure so zero means no failure and we Dialogue: 0,0:34:28.00,0:34:30.20,Default,,0000,0000,0000,,can see that the vast majority of data Dialogue: 0,0:34:30.20,0:34:32.52,Default,,0000,0000,0000,,points in this data set are no failure Dialogue: 0,0:34:32.52,0:34:34.00,Default,,0000,0000,0000,,and here we can see an example here Dialogue: 0,0:34:34.00,0:34:36.72,Default,,0000,0000,0000,,where you have a case of a failure so a Dialogue: 0,0:34:36.72,0:34:40.16,Default,,0000,0000,0000,,failure is marked as a one positive and Dialogue: 0,0:34:40.16,0:34:42.64,Default,,0000,0000,0000,,no failure is marked as zero negative Dialogue: 0,0:34:42.64,0:34:44.88,Default,,0000,0000,0000,,all right so here we have one type of a Dialogue: 0,0:34:44.88,0:34:47.04,Default,,0000,0000,0000,,failure it's called a power failure and Dialogue: 0,0:34:47.04,0:34:49.00,Default,,0000,0000,0000,,if you scroll down the data set you see Dialogue: 0,0:34:49.00,0:34:50.40,Default,,0000,0000,0000,,there are also other kinds of failures Dialogue: 0,0:34:50.40,0:34:52.84,Default,,0000,0000,0000,,like a tool wear Dialogue: 0,0:34:52.84,0:34:56.96,Default,,0000,0000,0000,,failure we have an overstrain failure Dialogue: 0,0:34:56.96,0:34:58.68,Default,,0000,0000,0000,,here for example Dialogue: 0,0:34:58.68,0:35:00.76,Default,,0000,0000,0000,,we also have a power failure again Dialogue: 0,0:35:00.76,0:35:02.20,Default,,0000,0000,0000,,and so on so if you scroll down through Dialogue: 0,0:35:02.20,0:35:04.16,Default,,0000,0000,0000,,these 10,000 data points or if Dialogue: 0,0:35:04.16,0:35:06.04,Default,,0000,0000,0000,,you're familiar with using Excel to Dialogue: 0,0:35:06.04,0:35:08.84,Default,,0000,0000,0000,,filter out values in a column you can Dialogue: 0,0:35:08.84,0:35:12.28,Default,,0000,0000,0000,,see that in this particular column here Dialogue: 0,0:35:12.28,0:35:14.48,Default,,0000,0000,0000,,which is the so-called target variable Dialogue: 0,0:35:14.48,0:35:16.96,Default,,0000,0000,0000,,column you are going to have the vast Dialogue: 0,0:35:16.96,0:35:18.92,Default,,0000,0000,0000,,majority of values as zero which means
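A quick way to see that imbalance without scrolling through Excel is value_counts in pandas; a rough sketch, continuing from the df loaded above and assuming the two label columns are named 'Target' and 'Failure Type' as on the Kaggle page.

# distribution of the binary label: 0 = no failure, 1 = failure
print(df["Target"].value_counts())

# distribution of the specific failure categories
print(df["Failure Type"].value_counts())

# share of failures, to quantify how imbalanced the data set is
print(df["Target"].mean())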
Dialogue: 0,0:35:18.92,0:35:22.76,Default,,0000,0000,0000,,no failure and some of the rows or the Dialogue: 0,0:35:22.76,0:35:24.04,Default,,0000,0000,0000,,data points you are going to have a Dialogue: 0,0:35:24.04,0:35:26.36,Default,,0000,0000,0000,,value of one and for those rows that you Dialogue: 0,0:35:26.36,0:35:28.12,Default,,0000,0000,0000,,have a value of one for example Dialogue: 0,0:35:28.12,0:35:31.28,Default,,0000,0000,0000,,here for example here you Dialogue: 0,0:35:31.28,0:35:32.84,Default,,0000,0000,0000,,are going to have different types of Dialogue: 0,0:35:32.84,0:35:34.64,Default,,0000,0000,0000,,failure so like I said just now power Dialogue: 0,0:35:34.64,0:35:38.96,Default,,0000,0000,0000,,failure tool wear failure etc etc so we are Dialogue: 0,0:35:38.96,0:35:40.64,Default,,0000,0000,0000,,going to go through the entire machine Dialogue: 0,0:35:40.64,0:35:43.60,Default,,0000,0000,0000,,learning workflow process with this data Dialogue: 0,0:35:43.60,0:35:46.64,Default,,0000,0000,0000,,set so to see an example of that we are Dialogue: 0,0:35:46.64,0:35:50.40,Default,,0000,0000,0000,,going to go to the Dialogue: 0,0:35:50.40,0:35:52.28,Default,,0000,0000,0000,,code section here all right so if I Dialogue: 0,0:35:52.28,0:35:54.28,Default,,0000,0000,0000,,click on the code section here and right Dialogue: 0,0:35:54.28,0:35:56.40,Default,,0000,0000,0000,,down here we can see what is called a Dialogue: 0,0:35:56.40,0:35:59.36,Default,,0000,0000,0000,,data set notebook so this is basically a Dialogue: 0,0:35:59.36,0:36:02.32,Default,,0000,0000,0000,,Jupyter notebook Jupyter is basically a Dialogue: 0,0:36:02.32,0:36:05.28,Default,,0000,0000,0000,,Python application which allows you to Dialogue: 0,0:36:05.28,0:36:09.24,Default,,0000,0000,0000,,create a Python machine learning Dialogue: 0,0:36:09.24,0:36:11.68,Default,,0000,0000,0000,,program that basically builds your Dialogue: 0,0:36:11.68,0:36:14.52,Default,,0000,0000,0000,,machine learning model assesses or Dialogue: 0,0:36:14.52,0:36:16.48,Default,,0000,0000,0000,,evaluates its accuracy and generates Dialogue: 0,0:36:16.48,0:36:19.04,Default,,0000,0000,0000,,predictions from it okay so here we have Dialogue: 0,0:36:19.04,0:36:21.68,Default,,0000,0000,0000,,a whole bunch of Jupyter notebooks that Dialogue: 0,0:36:21.68,0:36:24.56,Default,,0000,0000,0000,,are available and you can select any one Dialogue: 0,0:36:24.56,0:36:26.00,Default,,0000,0000,0000,,of them all these notebooks are Dialogue: 0,0:36:26.00,0:36:28.72,Default,,0000,0000,0000,,essentially going to process the data Dialogue: 0,0:36:28.72,0:36:31.72,Default,,0000,0000,0000,,from this particular data set so if I go Dialogue: 0,0:36:31.72,0:36:34.72,Default,,0000,0000,0000,,to this code page here I've actually Dialogue: 0,0:36:34.72,0:36:37.32,Default,,0000,0000,0000,,selected a specific notebook that I'm Dialogue: 0,0:36:37.32,0:36:39.96,Default,,0000,0000,0000,,going to run through to demonstrate an Dialogue: 0,0:36:39.96,0:36:42.84,Default,,0000,0000,0000,,end-to-end machine learning workflow using Dialogue: 0,0:36:42.84,0:36:45.56,Default,,0000,0000,0000,,various machine learning libraries from Dialogue: 0,0:36:45.56,0:36:49.80,Default,,0000,0000,0000,,the Python programming language okay so Dialogue: 0,0:36:49.80,0:36:52.44,Default,,0000,0000,0000,,the particular notebook I'm going to Dialogue: 0,0:36:52.44,0:36:55.16,Default,,0000,0000,0000,,use is this particular notebook here and Dialogue:
0,0:36:55.16,0:36:57.16,Default,,0000,0000,0000,,you can also get the URL for that Dialogue: 0,0:36:57.16,0:37:00.44,Default,,0000,0000,0000,,particular The Notebook from Dialogue: 0,0:37:00.44,0:37:03.76,Default,,0000,0000,0000,,here okay so let's quickly do a quick Dialogue: 0,0:37:03.76,0:37:06.00,Default,,0000,0000,0000,,revision again what are we trying to do Dialogue: 0,0:37:06.00,0:37:08.00,Default,,0000,0000,0000,,here we're trying to build a machine Dialogue: 0,0:37:08.00,0:37:11.36,Default,,0000,0000,0000,,learning classification model right so Dialogue: 0,0:37:11.36,0:37:12.96,Default,,0000,0000,0000,,we said there are two primary areas of Dialogue: 0,0:37:12.96,0:37:14.56,Default,,0000,0000,0000,,supervised learning one is regression Dialogue: 0,0:37:14.56,0:37:16.20,Default,,0000,0000,0000,,which is used to predict a numerical Dialogue: 0,0:37:16.20,0:37:18.64,Default,,0000,0000,0000,,Target variable and the second kind of Dialogue: 0,0:37:18.64,0:37:21.36,Default,,0000,0000,0000,,supervised learning is classification Dialogue: 0,0:37:21.36,0:37:23.08,Default,,0000,0000,0000,,which is what we're doing here we're Dialogue: 0,0:37:23.08,0:37:25.84,Default,,0000,0000,0000,,trying to predict a categorical Target Dialogue: 0,0:37:25.84,0:37:29.68,Default,,0000,0000,0000,,variable okay so in this particular Dialogue: 0,0:37:29.68,0:37:32.12,Default,,0000,0000,0000,,example we actually have two kinds of Dialogue: 0,0:37:32.12,0:37:34.48,Default,,0000,0000,0000,,ways we can classify either a binary Dialogue: 0,0:37:34.48,0:37:37.56,Default,,0000,0000,0000,,classification or a multiclass Dialogue: 0,0:37:37.56,0:37:39.52,Default,,0000,0000,0000,,classification so for binary Dialogue: 0,0:37:39.52,0:37:41.44,Default,,0000,0000,0000,,classification we are only going to Dialogue: 0,0:37:41.44,0:37:43.40,Default,,0000,0000,0000,,classify the product or machine as Dialogue: 0,0:37:43.40,0:37:47.16,Default,,0000,0000,0000,,either it failed or it did not fail okay Dialogue: 0,0:37:47.16,0:37:48.88,Default,,0000,0000,0000,,so if we go back to the data set that I Dialogue: 0,0:37:48.88,0:37:50.84,Default,,0000,0000,0000,,showed you just now if you look at this Dialogue: 0,0:37:50.84,0:37:52.68,Default,,0000,0000,0000,,target variable colume there are only Dialogue: 0,0:37:52.68,0:37:54.52,Default,,0000,0000,0000,,two possible values here they either Dialogue: 0,0:37:54.52,0:37:58.28,Default,,0000,0000,0000,,zero or one zero means there's no fi Dialogue: 0,0:37:58.28,0:38:01.24,Default,,0000,0000,0000,,one means that's a failure okay so this Dialogue: 0,0:38:01.24,0:38:03.44,Default,,0000,0000,0000,,is an example of a binary classification Dialogue: 0,0:38:03.44,0:38:07.24,Default,,0000,0000,0000,,only two possible outcomes zero or one Dialogue: 0,0:38:07.24,0:38:10.12,Default,,0000,0000,0000,,didn't fail or fail all right two Dialogue: 0,0:38:10.12,0:38:13.08,Default,,0000,0000,0000,,possible outcomes and then we can also Dialogue: 0,0:38:13.08,0:38:15.48,Default,,0000,0000,0000,,for the same data set we can extend it Dialogue: 0,0:38:15.48,0:38:18.08,Default,,0000,0000,0000,,and make it a multiclass classification Dialogue: 0,0:38:18.08,0:38:20.88,Default,,0000,0000,0000,,problem all right so if we kind of want Dialogue: 0,0:38:20.88,0:38:23.72,Default,,0000,0000,0000,,to drill down further we can say that Dialogue: 0,0:38:23.72,0:38:26.80,Default,,0000,0000,0000,,not only is there a failure we can Dialogue: 0,0:38:26.80,0:38:29.20,Default,,0000,0000,0000,,actually say that are different types 
of Dialogue: 0,0:38:29.20,0:38:32.44,Default,,0000,0000,0000,,failures okay so we have one category of Dialogue: 0,0:38:32.44,0:38:35.60,Default,,0000,0000,0000,,class that is basically no failure okay Dialogue: 0,0:38:35.60,0:38:37.40,Default,,0000,0000,0000,,then we have a category for the Dialogue: 0,0:38:37.40,0:38:40.40,Default,,0000,0000,0000,,different types of failures right so you Dialogue: 0,0:38:40.40,0:38:43.92,Default,,0000,0000,0000,,can have a power failure you could have Dialogue: 0,0:38:43.92,0:38:46.40,Default,,0000,0000,0000,,a tool Weare Dialogue: 0,0:38:46.40,0:38:48.92,Default,,0000,0000,0000,,failure uh you could have let's go down Dialogue: 0,0:38:48.92,0:38:50.88,Default,,0000,0000,0000,,here you could have a over strain Dialogue: 0,0:38:50.88,0:38:53.76,Default,,0000,0000,0000,,failure and etc etc so you can have Dialogue: 0,0:38:53.76,0:38:57.16,Default,,0000,0000,0000,,multiple classes of failure in addition Dialogue: 0,0:38:57.16,0:39:00.52,Default,,0000,0000,0000,,to the general overall or the majority Dialogue: 0,0:39:00.52,0:39:04.32,Default,,0000,0000,0000,,class of no failure and that would be a Dialogue: 0,0:39:04.32,0:39:06.68,Default,,0000,0000,0000,,multiclass classification problem so Dialogue: 0,0:39:06.68,0:39:08.40,Default,,0000,0000,0000,,with this data set we are going to see Dialogue: 0,0:39:08.40,0:39:11.04,Default,,0000,0000,0000,,how to make it a binary classification Dialogue: 0,0:39:11.04,0:39:12.80,Default,,0000,0000,0000,,problem and also a multiclass Dialogue: 0,0:39:12.80,0:39:15.08,Default,,0000,0000,0000,,classification problem okay so let's Dialogue: 0,0:39:15.08,0:39:16.88,Default,,0000,0000,0000,,look at the workflow so let's say we've Dialogue: 0,0:39:16.88,0:39:18.88,Default,,0000,0000,0000,,already got the data so right now we do Dialogue: 0,0:39:18.88,0:39:20.84,Default,,0000,0000,0000,,have the data set this is the data set Dialogue: 0,0:39:20.84,0:39:22.72,Default,,0000,0000,0000,,that we have so let's assume we've Dialogue: 0,0:39:22.72,0:39:24.56,Default,,0000,0000,0000,,somehow managed to get this data set Dialogue: 0,0:39:24.56,0:39:26.88,Default,,0000,0000,0000,,from some iot sensors that are Dialogue: 0,0:39:26.88,0:39:29.12,Default,,0000,0000,0000,,monitoring realtime data in our Dialogue: 0,0:39:29.12,0:39:31.08,Default,,0000,0000,0000,,production environment on the assembly Dialogue: 0,0:39:31.08,0:39:32.80,Default,,0000,0000,0000,,line on the production line we've got Dialogue: 0,0:39:32.80,0:39:34.68,Default,,0000,0000,0000,,sensors reading data that gives us all Dialogue: 0,0:39:34.68,0:39:37.96,Default,,0000,0000,0000,,these data that we have in this CSV file Dialogue: 0,0:39:37.96,0:39:40.08,Default,,0000,0000,0000,,Okay so we've already got the data we've Dialogue: 0,0:39:40.08,0:39:41.60,Default,,0000,0000,0000,,retrieved the data now we're going to go Dialogue: 0,0:39:41.60,0:39:45.00,Default,,0000,0000,0000,,on to the cleaning and exploration part Dialogue: 0,0:39:45.00,0:39:47.52,Default,,0000,0000,0000,,of your machine learning life cycle all Dialogue: 0,0:39:47.52,0:39:49.80,Default,,0000,0000,0000,,right so let's look at the data cleaning Dialogue: 0,0:39:49.80,0:39:51.40,Default,,0000,0000,0000,,part so the data cleaning part we Dialogue: 0,0:39:51.40,0:39:53.72,Default,,0000,0000,0000,,interested in uh checking for missing Dialogue: 0,0:39:53.72,0:39:56.20,Default,,0000,0000,0000,,values and maybe removing the rows you Dialogue: 0,0:39:56.20,0:39:58.08,Default,,0000,0000,0000,,missing values okay 
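In code, the two framings differ only in which column you treat as the label; a minimal sketch (my own), again assuming the 'Target' and 'Failure Type' column names and the feature labels from the Kaggle page.

feature_cols = ["Type", "Air temperature [K]", "Process temperature [K]",
                "Rotational speed [rpm]", "Torque [Nm]", "Tool wear [min]"]

X = df[feature_cols]

y_binary = df["Target"]        # binary classification: 0 = no failure, 1 = failure
y_multi  = df["Failure Type"]  # multiclass: no failure, power failure, tool wear failure, ...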
Dialogue: 0,0:39:58.08,0:39:59.76,Default,,0000,0000,0000,,uh so the kind of things we can sorry Dialogue: 0,0:39:59.76,0:40:01.00,Default,,0000,0000,0000,,the kind of things we can do in missing Dialogue: 0,0:40:01.00,0:40:02.88,Default,,0000,0000,0000,,values we can remove the row missing Dialogue: 0,0:40:02.88,0:40:05.84,Default,,0000,0000,0000,,values we can put in some new values uh Dialogue: 0,0:40:05.84,0:40:08.00,Default,,0000,0000,0000,,some replacement values which could be a Dialogue: 0,0:40:08.00,0:40:09.88,Default,,0000,0000,0000,,average of all the values in that that Dialogue: 0,0:40:09.88,0:40:12.88,Default,,0000,0000,0000,,particular colume etc etc we also try to Dialogue: 0,0:40:12.88,0:40:15.48,Default,,0000,0000,0000,,identify outliers in our data set and Dialogue: 0,0:40:15.48,0:40:17.48,Default,,0000,0000,0000,,also there are a variety of ways to deal Dialogue: 0,0:40:17.48,0:40:19.48,Default,,0000,0000,0000,,with that so this is called Data Dialogue: 0,0:40:19.48,0:40:21.36,Default,,0000,0000,0000,,cleansing which is a really important Dialogue: 0,0:40:21.36,0:40:23.32,Default,,0000,0000,0000,,part of your machine learning workflow Dialogue: 0,0:40:23.32,0:40:25.52,Default,,0000,0000,0000,,right so that's where we are now at Dialogue: 0,0:40:25.52,0:40:26.84,Default,,0000,0000,0000,,we're doing cleansing and then we're Dialogue: 0,0:40:26.84,0:40:28.84,Default,,0000,0000,0000,,going to follow up with Dialogue: 0,0:40:28.84,0:40:31.16,Default,,0000,0000,0000,,exploration so let's look at the actual Dialogue: 0,0:40:31.16,0:40:33.16,Default,,0000,0000,0000,,code that does the cleansing here so Dialogue: 0,0:40:33.16,0:40:35.80,Default,,0000,0000,0000,,here we are right at the start of the uh Dialogue: 0,0:40:35.80,0:40:38.40,Default,,0000,0000,0000,,machine learning uh life cycle here so Dialogue: 0,0:40:38.40,0:40:40.84,Default,,0000,0000,0000,,this is a Jupiter notebook so here we Dialogue: 0,0:40:40.84,0:40:43.36,Default,,0000,0000,0000,,have a brief description of the problem Dialogue: 0,0:40:43.36,0:40:45.92,Default,,0000,0000,0000,,statement all right so this data set Dialogue: 0,0:40:45.92,0:40:47.64,Default,,0000,0000,0000,,reflects real life predictive Dialogue: 0,0:40:47.64,0:40:49.24,Default,,0000,0000,0000,,maintenance enounter industry with Dialogue: 0,0:40:49.24,0:40:50.48,Default,,0000,0000,0000,,measurements from real equipment the Dialogue: 0,0:40:50.48,0:40:52.40,Default,,0000,0000,0000,,features description is taken directly Dialogue: 0,0:40:52.40,0:40:54.52,Default,,0000,0000,0000,,from the data source set so here we have Dialogue: 0,0:40:54.52,0:40:57.40,Default,,0000,0000,0000,,a description of the six key features in Dialogue: 0,0:40:57.40,0:40:59.60,Default,,0000,0000,0000,,our data set type which is the quality Dialogue: 0,0:40:59.60,0:41:02.52,Default,,0000,0000,0000,,of the product the air temperature the Dialogue: 0,0:41:02.52,0:41:04.68,Default,,0000,0000,0000,,process temperature the rotational speed Dialogue: 0,0:41:04.68,0:41:06.60,Default,,0000,0000,0000,,the talk and the towar all right so Dialogue: 0,0:41:06.60,0:41:08.88,Default,,0000,0000,0000,,these are the six feature variables and Dialogue: 0,0:41:08.88,0:41:11.32,Default,,0000,0000,0000,,there are the two target variables so Dialogue: 0,0:41:11.32,0:41:13.12,Default,,0000,0000,0000,,just now I showed you just now there's Dialogue: 0,0:41:13.12,0:41:15.12,Default,,0000,0000,0000,,one target variable which only has two Dialogue: 
0,0:41:15.12,0:41:17.44,Default,,0000,0000,0000,,possible values either zero or one okay Dialogue: 0,0:41:17.44,0:41:20.08,Default,,0000,0000,0000,,zero or one means failure or no failure Dialogue: 0,0:41:20.08,0:41:23.08,Default,,0000,0000,0000,,so that will be this colume here right Dialogue: 0,0:41:23.08,0:41:24.88,Default,,0000,0000,0000,,so let me go all the way back up to here Dialogue: 0,0:41:24.88,0:41:26.64,Default,,0000,0000,0000,,so this colume here we already saw it Dialogue: 0,0:41:26.64,0:41:29.44,Default,,0000,0000,0000,,only has two I values is either zero or Dialogue: 0,0:41:29.44,0:41:32.68,Default,,0000,0000,0000,,one and then we also have this column Dialogue: 0,0:41:32.68,0:41:35.04,Default,,0000,0000,0000,,here and this column here is basically Dialogue: 0,0:41:35.04,0:41:38.08,Default,,0000,0000,0000,,the failure type and so the we have as I Dialogue: 0,0:41:38.08,0:41:40.80,Default,,0000,0000,0000,,already demonstrated just now we do have Dialogue: 0,0:41:40.80,0:41:43.44,Default,,0000,0000,0000,,uh several categories of or types of Dialogue: 0,0:41:43.44,0:41:45.56,Default,,0000,0000,0000,,failure and so here we call this Dialogue: 0,0:41:45.56,0:41:47.08,Default,,0000,0000,0000,,multiclass Dialogue: 0,0:41:47.08,0:41:50.00,Default,,0000,0000,0000,,classification so we can either build a Dialogue: 0,0:41:50.00,0:41:51.84,Default,,0000,0000,0000,,binary classification model for this Dialogue: 0,0:41:51.84,0:41:53.52,Default,,0000,0000,0000,,problem domain or we can build a Dialogue: 0,0:41:53.52,0:41:55.08,Default,,0000,0000,0000,,multiclass Dialogue: 0,0:41:55.08,0:41:58.12,Default,,0000,0000,0000,,classification problem all right so this Dialogue: 0,0:41:58.12,0:41:59.84,Default,,0000,0000,0000,,jupyter notebook is going to demonstrate Dialogue: 0,0:41:59.84,0:42:02.32,Default,,0000,0000,0000,,both approaches to us so first step we Dialogue: 0,0:42:02.32,0:42:04.80,Default,,0000,0000,0000,,are going to write all this python code Dialogue: 0,0:42:04.80,0:42:06.88,Default,,0000,0000,0000,,that's going to import all the libraries Dialogue: 0,0:42:06.88,0:42:09.08,Default,,0000,0000,0000,,that we need to use okay so this is Dialogue: 0,0:42:09.08,0:42:12.32,Default,,0000,0000,0000,,basically python code okay and it's Dialogue: 0,0:42:12.32,0:42:15.12,Default,,0000,0000,0000,,importing the relevant machine learn Dialogue: 0,0:42:15.12,0:42:17.96,Default,,0000,0000,0000,,oops we are importing the relevant Dialogue: 0,0:42:17.96,0:42:20.60,Default,,0000,0000,0000,,machine learning libraries related to Dialogue: 0,0:42:20.60,0:42:23.52,Default,,0000,0000,0000,,our domain use case okay then we load in Dialogue: 0,0:42:23.52,0:42:26.44,Default,,0000,0000,0000,,our data set okay so this our data set Dialogue: 0,0:42:26.44,0:42:28.32,Default,,0000,0000,0000,,we describe it we have some quick Dialogue: 0,0:42:28.32,0:42:30.92,Default,,0000,0000,0000,,insights into the data set um and then Dialogue: 0,0:42:30.92,0:42:32.84,Default,,0000,0000,0000,,we just take a look at all the variables Dialogue: 0,0:42:32.84,0:42:36.00,Default,,0000,0000,0000,,of the feature variables Etc and so on Dialogue: 0,0:42:36.00,0:42:38.00,Default,,0000,0000,0000,,we just what we're doing now is just Dialogue: 0,0:42:38.00,0:42:39.80,Default,,0000,0000,0000,,doing a quick overview of the data set Dialogue: 0,0:42:39.80,0:42:41.56,Default,,0000,0000,0000,,so this all this python code here they Dialogue: 0,0:42:41.56,0:42:43.76,Default,,0000,0000,0000,,were writing is allowing us the data Dialogue: 
0,0:42:43.76,0:42:45.36,Default,,0000,0000,0000,,scientist to get a quick overview of our Dialogue: 0,0:42:45.36,0:42:48.36,Default,,0000,0000,0000,,data set right okay like Dialogue: 0,0:42:48.36,0:42:50.24,Default,,0000,0000,0000,,how many rows are there how many columns Dialogue: 0,0:42:50.24,0:42:51.76,Default,,0000,0000,0000,,are there what are the data types of the Dialogue: 0,0:42:51.76,0:42:53.44,Default,,0000,0000,0000,,columns what are the names of the columns Dialogue: 0,0:42:53.44,0:42:57.36,Default,,0000,0000,0000,,etc etc okay then we zoom in on to the Dialogue: 0,0:42:57.36,0:42:58.84,Default,,0000,0000,0000,,target variables so we look at the Dialogue: 0,0:42:58.84,0:43:02.00,Default,,0000,0000,0000,,target variables how many counts Dialogue: 0,0:43:02.00,0:43:04.52,Default,,0000,0000,0000,,there are of this target variable and Dialogue: 0,0:43:04.52,0:43:06.44,Default,,0000,0000,0000,,so on how many different types of Dialogue: 0,0:43:06.44,0:43:08.24,Default,,0000,0000,0000,,failures there are then you want to Dialogue: 0,0:43:08.24,0:43:09.00,Default,,0000,0000,0000,,check whether there are any Dialogue: 0,0:43:09.00,0:43:10.76,Default,,0000,0000,0000,,inconsistencies between the Target and Dialogue: 0,0:43:10.76,0:43:13.56,Default,,0000,0000,0000,,the Failure Type etc okay so when you do Dialogue: 0,0:43:13.56,0:43:15.12,Default,,0000,0000,0000,,all this checking you're going to Dialogue: 0,0:43:15.12,0:43:16.96,Default,,0000,0000,0000,,discover there are some discrepancies in Dialogue: 0,0:43:16.96,0:43:20.28,Default,,0000,0000,0000,,your data set so using a specific Python Dialogue: 0,0:43:20.28,0:43:21.84,Default,,0000,0000,0000,,code to do checking you're going to say Dialogue: 0,0:43:21.84,0:43:23.48,Default,,0000,0000,0000,,hey you know what there's some errors Dialogue: 0,0:43:23.48,0:43:25.00,Default,,0000,0000,0000,,here right there are nine values that Dialogue: 0,0:43:25.00,0:43:26.60,Default,,0000,0000,0000,,are classified as failure in the Target variable Dialogue: 0,0:43:26.60,0:43:28.20,Default,,0000,0000,0000,,but as no failure in the Failure Type Dialogue: 0,0:43:28.20,0:43:29.72,Default,,0000,0000,0000,,variable so that means there's a Dialogue: 0,0:43:29.72,0:43:33.20,Default,,0000,0000,0000,,discrepancy in your data point right so Dialogue: 0,0:43:33.20,0:43:34.76,Default,,0000,0000,0000,,these are all the ones that Dialogue: 0,0:43:34.76,0:43:36.36,Default,,0000,0000,0000,,are discrepancies because the target Dialogue: 0,0:43:36.36,0:43:39.00,Default,,0000,0000,0000,,variable says one and we already know Dialogue: 0,0:43:39.00,0:43:41.24,Default,,0000,0000,0000,,that Target variable one is supposed to Dialogue: 0,0:43:41.24,0:43:43.24,Default,,0000,0000,0000,,mean that it's a failure right Target Dialogue: 0,0:43:43.24,0:43:44.88,Default,,0000,0000,0000,,variable one is supposed to mean that it is Dialogue: 0,0:43:44.88,0:43:47.12,Default,,0000,0000,0000,,a failure so we are kind of expecting to Dialogue: 0,0:43:47.12,0:43:49.68,Default,,0000,0000,0000,,see the failure classification but some Dialogue: 0,0:43:49.68,0:43:51.40,Default,,0000,0000,0000,,rows actually say there's no failure Dialogue: 0,0:43:51.40,0:43:53.80,Default,,0000,0000,0000,,although the target value is one so here Dialogue: 0,0:43:53.80,0:43:55.92,Default,,0000,0000,0000,,is a classic example of an error that Dialogue: 0,0:43:55.92,0:43:58.64,Default,,0000,0000,0000,,can very well occur in a data set so now
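The kind of consistency check being described can be expressed in a couple of pandas lines; a sketch of my own, assuming the same 'Target' and 'Failure Type' column names.

# rows flagged as a failure in Target but labelled "No Failure" in Failure Type
inconsistent = df[(df["Target"] == 1) & (df["Failure Type"] == "No Failure")]
print(len(inconsistent))   # the notebook reports nine such rows

# one reasonable handling, as the notebook's author chose: drop them
df = df.drop(inconsistent.index)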
Dialogue: 0,0:43:58.64,0:44:00.56,Default,,0000,0000,0000,,the question is what do you do with Dialogue: 0,0:44:00.56,0:44:04.72,Default,,0000,0000,0000,,these errors in your data set right so Dialogue: 0,0:44:04.72,0:44:06.24,Default,,0000,0000,0000,,here the data scientist says I think it Dialogue: 0,0:44:06.24,0:44:07.52,Default,,0000,0000,0000,,would make sense to remove those Dialogue: 0,0:44:07.52,0:44:09.92,Default,,0000,0000,0000,,instances and so they write some code Dialogue: 0,0:44:09.92,0:44:12.68,Default,,0000,0000,0000,,then to remove those instances or those Dialogue: 0,0:44:12.68,0:44:14.92,Default,,0000,0000,0000,,rows or data points from the overall Dialogue: 0,0:44:14.92,0:44:17.28,Default,,0000,0000,0000,,data set and same thing we can again Dialogue: 0,0:44:17.28,0:44:19.24,Default,,0000,0000,0000,,check for other issues so we find there's Dialogue: 0,0:44:19.24,0:44:21.16,Default,,0000,0000,0000,,another issue here with our data set which Dialogue: 0,0:44:21.16,0:44:24.08,Default,,0000,0000,0000,,is another warning so again we can Dialogue: 0,0:44:24.08,0:44:26.24,Default,,0000,0000,0000,,possibly remove them so you're going to Dialogue: 0,0:44:26.24,0:44:31.28,Default,,0000,0000,0000,,remove 27 instances or rows from your Dialogue: 0,0:44:31.28,0:44:34.44,Default,,0000,0000,0000,,overall data set so your data set has Dialogue: 0,0:44:34.44,0:44:37.08,Default,,0000,0000,0000,,10,000 rows or data points you're Dialogue: 0,0:44:37.08,0:44:40.16,Default,,0000,0000,0000,,removing 27 which is only 0.27% of the Dialogue: 0,0:44:40.16,0:44:42.24,Default,,0000,0000,0000,,entire data set and these were the Dialogue: 0,0:44:42.24,0:44:45.72,Default,,0000,0000,0000,,reasons why you remove them okay so if Dialogue: 0,0:44:45.72,0:44:48.16,Default,,0000,0000,0000,,you're just removing 0.27% of the Dialogue: 0,0:44:48.16,0:44:50.80,Default,,0000,0000,0000,,entire data set no big deal right still Dialogue: 0,0:44:50.80,0:44:53.08,Default,,0000,0000,0000,,okay but you needed to remove them Dialogue: 0,0:44:53.08,0:44:55.72,Default,,0000,0000,0000,,because these errors right these Dialogue: 0,0:44:55.72,0:44:58.04,Default,,0000,0000,0000,,27 Dialogue: 0,0:44:58.04,0:45:00.56,Default,,0000,0000,0000,,errors okay data points with errors in Dialogue: 0,0:45:00.56,0:45:02.96,Default,,0000,0000,0000,,your data set could really affect the Dialogue: 0,0:45:02.96,0:45:05.00,Default,,0000,0000,0000,,training of your machine learning model Dialogue: 0,0:45:05.00,0:45:08.64,Default,,0000,0000,0000,,so we need to do your data cleansing Dialogue: 0,0:45:08.64,0:45:11.72,Default,,0000,0000,0000,,right so we are actually cleansing now Dialogue: 0,0:45:11.72,0:45:15.20,Default,,0000,0000,0000,,some kind of data that is Dialogue: 0,0:45:15.20,0:45:17.52,Default,,0000,0000,0000,,incorrect or erroneous in your original Dialogue: 0,0:45:17.52,0:45:21.44,Default,,0000,0000,0000,,data set okay so then we go on to the Dialogue: 0,0:45:21.44,0:45:23.84,Default,,0000,0000,0000,,next part which is called EDA right so Dialogue: 0,0:45:23.84,0:45:28.88,Default,,0000,0000,0000,,EDA is where we kind of explore our data Dialogue: 0,0:45:28.88,0:45:31.72,Default,,0000,0000,0000,,and we want to kind of get a visual Dialogue: 0,0:45:31.72,0:45:34.24,Default,,0000,0000,0000,,overview of our data as a whole and also Dialogue: 0,0:45:34.24,0:45:35.88,Default,,0000,0000,0000,,take a look at the statistical Dialogue: 0,0:45:35.88,0:45:38.16,Default,,0000,0000,0000,,properties of the data the statistical Dialogue:
0,0:45:38.16,0:45:40.48,Default,,0000,0000,0000,,distribution of the data in all the Dialogue: 0,0:45:40.48,0:45:43.08,Default,,0000,0000,0000,,various columns the correlation between Dialogue: 0,0:45:43.08,0:45:44.64,Default,,0000,0000,0000,,the variables between the feature Dialogue: 0,0:45:44.64,0:45:46.68,Default,,0000,0000,0000,,variables in different columns and also the Dialogue: 0,0:45:46.68,0:45:48.60,Default,,0000,0000,0000,,feature variable and the target variable Dialogue: 0,0:45:48.60,0:45:52.04,Default,,0000,0000,0000,,so all of this is called EDA and EDA in Dialogue: 0,0:45:52.04,0:45:54.08,Default,,0000,0000,0000,,a machine learning workflow is typically Dialogue: 0,0:45:54.08,0:45:57.16,Default,,0000,0000,0000,,done through visualization Dialogue: 0,0:45:57.16,0:45:58.84,Default,,0000,0000,0000,,all right so let's go back here and take Dialogue: 0,0:45:58.84,0:46:00.60,Default,,0000,0000,0000,,a look right so for example here we are Dialogue: 0,0:46:00.60,0:46:03.40,Default,,0000,0000,0000,,looking at correlation so we plot the Dialogue: 0,0:46:03.40,0:46:05.68,Default,,0000,0000,0000,,values of all the various feature Dialogue: 0,0:46:05.68,0:46:07.60,Default,,0000,0000,0000,,variables against each other and look Dialogue: 0,0:46:07.60,0:46:10.80,Default,,0000,0000,0000,,for potential correlations and patterns Dialogue: 0,0:46:10.80,0:46:13.36,Default,,0000,0000,0000,,and so on and all the different shapes Dialogue: 0,0:46:13.36,0:46:17.28,Default,,0000,0000,0000,,that you see here in this pair plot okay Dialogue: 0,0:46:17.28,0:46:18.40,Default,,0000,0000,0000,,will have a different meaning a Dialogue: 0,0:46:18.40,0:46:20.00,Default,,0000,0000,0000,,statistical meaning and so the data Dialogue: 0,0:46:20.00,0:46:21.80,Default,,0000,0000,0000,,scientist has to kind of visually Dialogue: 0,0:46:21.80,0:46:23.76,Default,,0000,0000,0000,,inspect this pair plot and make some Dialogue: 0,0:46:23.76,0:46:25.56,Default,,0000,0000,0000,,interpretations of these different Dialogue: 0,0:46:25.56,0:46:27.68,Default,,0000,0000,0000,,patterns that he sees here all right so Dialogue: 0,0:46:27.68,0:46:30.48,Default,,0000,0000,0000,,these are some of the insights that Dialogue: 0,0:46:30.48,0:46:32.84,Default,,0000,0000,0000,,can be deduced from looking at these Dialogue: 0,0:46:32.84,0:46:34.32,Default,,0000,0000,0000,,patterns so for example the torque and Dialogue: 0,0:46:34.32,0:46:36.28,Default,,0000,0000,0000,,rotational speed are highly correlated Dialogue: 0,0:46:36.28,0:46:38.04,Default,,0000,0000,0000,,the process temperature and air Dialogue: 0,0:46:38.04,0:46:39.92,Default,,0000,0000,0000,,temperature are also highly correlated that Dialogue: 0,0:46:39.92,0:46:41.56,Default,,0000,0000,0000,,failures occur for extreme values of Dialogue: 0,0:46:41.56,0:46:44.52,Default,,0000,0000,0000,,some features etc etc then you can plot Dialogue: 0,0:46:44.52,0:46:45.96,Default,,0000,0000,0000,,certain kinds of charts this is called a Dialogue: 0,0:46:45.96,0:46:48.48,Default,,0000,0000,0000,,violin chart to again get new insights Dialogue: 0,0:46:48.48,0:46:49.84,Default,,0000,0000,0000,,for example regarding the torque and Dialogue: 0,0:46:49.84,0:46:51.48,Default,,0000,0000,0000,,rotational speed we can see again that Dialogue: 0,0:46:51.48,0:46:53.12,Default,,0000,0000,0000,,most failures are triggered for much Dialogue: 0,0:46:53.12,0:46:55.12,Default,,0000,0000,0000,,lower or much higher values than the Dialogue: 0,0:46:55.12,0:46:57.40,Default,,0000,0000,0000,,mean when they're not failing so all
Dialogue: 0,0:46:57.40,0:47:00.72,Default,,0000,0000,0000,,these visualizations they are there and Dialogue: 0,0:47:00.72,0:47:02.48,Default,,0000,0000,0000,,a trained data scientist can look at Dialogue: 0,0:47:02.48,0:47:05.08,Default,,0000,0000,0000,,them inspect them and make some kind of Dialogue: 0,0:47:05.08,0:47:08.40,Default,,0000,0000,0000,,insightful deductions from them okay Dialogue: 0,0:47:08.40,0:47:11.08,Default,,0000,0000,0000,,percentage of failure right uh the Dialogue: 0,0:47:11.08,0:47:13.64,Default,,0000,0000,0000,,correlation heat map okay between all Dialogue: 0,0:47:13.64,0:47:15.56,Default,,0000,0000,0000,,these different feature variables and Dialogue: 0,0:47:15.56,0:47:16.92,Default,,0000,0000,0000,,also the target Dialogue: 0,0:47:16.92,0:47:19.60,Default,,0000,0000,0000,,variable okay uh the product types Dialogue: 0,0:47:19.60,0:47:21.08,Default,,0000,0000,0000,,percentage of product types percentage Dialogue: 0,0:47:21.08,0:47:23.16,Default,,0000,0000,0000,,of failure with respect to the product Dialogue: 0,0:47:23.16,0:47:25.72,Default,,0000,0000,0000,,type so we can also kind of visualize Dialogue: 0,0:47:25.72,0:47:27.80,Default,,0000,0000,0000,,that as well so certain products have a Dialogue: 0,0:47:27.80,0:47:29.84,Default,,0000,0000,0000,,higher ratio of faure compared to other Dialogue: 0,0:47:29.84,0:47:33.24,Default,,0000,0000,0000,,product types Etc or for example uh M Dialogue: 0,0:47:33.24,0:47:35.80,Default,,0000,0000,0000,,tends to feel more than H products etc Dialogue: 0,0:47:35.80,0:47:38.88,Default,,0000,0000,0000,,etc so we can create a vast variety of Dialogue: 0,0:47:38.88,0:47:41.32,Default,,0000,0000,0000,,visualizations in the Eda stage so you Dialogue: 0,0:47:41.32,0:47:43.96,Default,,0000,0000,0000,,can see here and again the idea of this Dialogue: 0,0:47:43.96,0:47:46.36,Default,,0000,0000,0000,,visualization is just to give us some Dialogue: 0,0:47:46.36,0:47:49.68,Default,,0000,0000,0000,,insight some preliminary insight into Dialogue: 0,0:47:49.68,0:47:52.52,Default,,0000,0000,0000,,our data set that helps us to model it Dialogue: 0,0:47:52.52,0:47:54.12,Default,,0000,0000,0000,,more correctly so some more insights Dialogue: 0,0:47:54.12,0:47:56.20,Default,,0000,0000,0000,,that we get into our data set from all Dialogue: 0,0:47:56.20,0:47:57.60,Default,,0000,0000,0000,,this visualization Dialogue: 0,0:47:57.60,0:47:59.56,Default,,0000,0000,0000,,then we can plot the distribution so we Dialogue: 0,0:47:59.56,0:48:00.72,Default,,0000,0000,0000,,can see whether it's a normal Dialogue: 0,0:48:00.72,0:48:03.08,Default,,0000,0000,0000,,distribution or some other kind of Dialogue: 0,0:48:03.08,0:48:05.64,Default,,0000,0000,0000,,distribution uh we can have a box plot Dialogue: 0,0:48:05.64,0:48:07.76,Default,,0000,0000,0000,,to see whether there are any outliers in Dialogue: 0,0:48:07.76,0:48:10.40,Default,,0000,0000,0000,,your data set and so on right so we can Dialogue: 0,0:48:10.40,0:48:11.64,Default,,0000,0000,0000,,see from the box plots we can see Dialogue: 0,0:48:11.64,0:48:14.60,Default,,0000,0000,0000,,rotational speed and have outliers so we Dialogue: 0,0:48:14.60,0:48:16.88,Default,,0000,0000,0000,,already saw outliers are basically a Dialogue: 0,0:48:16.88,0:48:18.80,Default,,0000,0000,0000,,problem that you may need to kind of Dialogue: 0,0:48:18.80,0:48:22.52,Default,,0000,0000,0000,,tackle right so outliers are an isue uh Dialogue: 0,0:48:22.52,0:48:24.80,Default,,0000,0000,0000,,it's a it's a part of data cleansing and 
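For reference, the kinds of plots being described are essentially one-liners in seaborn; a rough sketch of my own, assuming the same DataFrame df and the same assumed column labels as above.

import seaborn as sns
import matplotlib.pyplot as plt

numeric_cols = ["Air temperature [K]", "Process temperature [K]",
                "Rotational speed [rpm]", "Torque [Nm]", "Tool wear [min]"]

sns.pairplot(df, vars=numeric_cols, hue="Target")                # pair plot, coloured by failure / no failure
plt.show()

sns.violinplot(data=df, x="Target", y="Rotational speed [rpm]")  # violin chart: failure vs no-failure distributions
plt.show()

sns.heatmap(df[numeric_cols + ["Target"]].corr(), annot=True)    # correlation heat map
plt.show()

sns.boxplot(data=df[numeric_cols])                               # box plots to spot outliers
plt.xticks(rotation=45)
plt.show()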
Dialogue: 0,0:48:24.80,0:48:26.96,Default,,0000,0000,0000,,so you may need to tackle this so we may Dialogue: 0,0:48:26.96,0:48:28.88,Default,,0000,0000,0000,,have to check okay well where are the Dialogue: 0,0:48:28.88,0:48:31.32,Default,,0000,0000,0000,,potential outliers so we can analyze Dialogue: 0,0:48:31.32,0:48:35.32,Default,,0000,0000,0000,,them from the box plot okay um but then Dialogue: 0,0:48:35.32,0:48:37.08,Default,,0000,0000,0000,,we can say well they are outliers but Dialogue: 0,0:48:37.08,0:48:38.80,Default,,0000,0000,0000,,maybe they're not really horrible Dialogue: 0,0:48:38.80,0:48:40.76,Default,,0000,0000,0000,,outliers so we can tolerate them or Dialogue: 0,0:48:40.76,0:48:42.88,Default,,0000,0000,0000,,maybe we want to remove them so we can Dialogue: 0,0:48:42.88,0:48:44.92,Default,,0000,0000,0000,,see what the mean and maximum values for Dialogue: 0,0:48:44.92,0:48:46.72,Default,,0000,0000,0000,,all these are with respect to product type Dialogue: 0,0:48:46.72,0:48:49.68,Default,,0000,0000,0000,,how many of them are above or highly Dialogue: 0,0:48:49.68,0:48:51.44,Default,,0000,0000,0000,,correlated with the product type in Dialogue: 0,0:48:51.44,0:48:54.24,Default,,0000,0000,0000,,terms of the maximum and minimum okay Dialogue: 0,0:48:54.24,0:48:56.96,Default,,0000,0000,0000,,and then so on so the insight is well we Dialogue: 0,0:48:56.96,0:48:59.60,Default,,0000,0000,0000,,got 4.8% of the instances as outliers Dialogue: 0,0:48:59.60,0:49:02.56,Default,,0000,0000,0000,,so maybe 4.87% is not really that much Dialogue: 0,0:49:02.56,0:49:04.92,Default,,0000,0000,0000,,the outliers are not horrible so we just Dialogue: 0,0:49:04.92,0:49:06.96,Default,,0000,0000,0000,,leave them in the data set now for a Dialogue: 0,0:49:06.96,0:49:08.52,Default,,0000,0000,0000,,different data set the data scientist Dialogue: 0,0:49:08.52,0:49:10.28,Default,,0000,0000,0000,,could come to a different conclusion so Dialogue: 0,0:49:10.28,0:49:12.28,Default,,0000,0000,0000,,then they would do whatever they've Dialogue: 0,0:49:12.28,0:49:15.40,Default,,0000,0000,0000,,deemed is appropriate to kind of cleanse Dialogue: 0,0:49:15.40,0:49:18.08,Default,,0000,0000,0000,,the data set okay
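As a rough illustration of the kind of EDA described above, here is a minimal Python sketch using pandas, seaborn and matplotlib; the file name and the column names (Type, Target, Rotational speed [rpm]) are assumptions for illustration, not the presenter's actual notebook code.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# hypothetical file and column names; the real notebook's names may differ
df = pd.read_csv("predictive_maintenance.csv")

# correlation heatmap between the numeric feature variables and the target
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.show()

# percentage of failures for each product type
print(df.groupby("Type")["Target"].mean() * 100)

# box plot to spot potential outliers in rotational speed
sns.boxplot(x=df["Rotational speed [rpm]"])
plt.show()

# share of instances flagged as outliers by the 1.5 * IQR rule
q1, q3 = df["Rotational speed [rpm]"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = (df["Rotational speed [rpm]"] < q1 - 1.5 * iqr) | (df["Rotational speed [rpm]"] > q3 + 1.5 * iqr)
print(f"{mask.mean() * 100:.2f}% of the instances are outliers")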
Dialogue: 0,0:49:18.08,0:49:20.00,Default,,0000,0000,0000,,so now that we have done all the EDA the next thing we're Dialogue: 0,0:49:20.00,0:49:23.16,Default,,0000,0000,0000,,going to do is we are going to do what Dialogue: 0,0:49:23.16,0:49:26.20,Default,,0000,0000,0000,,is called feature engineering so we are Dialogue: 0,0:49:26.20,0:49:28.76,Default,,0000,0000,0000,,going to transform our original feature Dialogue: 0,0:49:28.76,0:49:31.28,Default,,0000,0000,0000,,variables and these are our original Dialogue: 0,0:49:31.28,0:49:32.96,Default,,0000,0000,0000,,feature variables right these are our Dialogue: 0,0:49:32.96,0:49:35.04,Default,,0000,0000,0000,,original feature variables and we are Dialogue: 0,0:49:35.04,0:49:37.76,Default,,0000,0000,0000,,going to transform them all right we're Dialogue: 0,0:49:37.76,0:49:40.32,Default,,0000,0000,0000,,going to transform them in some sense uh Dialogue: 0,0:49:40.32,0:49:43.76,Default,,0000,0000,0000,,into some other form before we fit this Dialogue: 0,0:49:43.76,0:49:45.64,Default,,0000,0000,0000,,for training into our machine learning Dialogue: 0,0:49:45.64,0:49:48.60,Default,,0000,0000,0000,,algorithm all right so this is let's Dialogue: 0,0:49:48.60,0:49:51.60,Default,,0000,0000,0000,,say an example of an Dialogue: 0,0:49:51.60,0:49:55.20,Default,,0000,0000,0000,,original data set right and these Dialogue: 0,0:49:55.20,0:49:56.84,Default,,0000,0000,0000,,are some of the examples Dialogue: 0,0:49:56.84,0:49:58.04,Default,,0000,0000,0000,,you don't have to use all of them but Dialogue: 0,0:49:58.04,0:49:59.44,Default,,0000,0000,0000,,these are some examples of what we Dialogue: 0,0:49:59.44,0:50:00.84,Default,,0000,0000,0000,,call feature engineering which you can Dialogue: 0,0:50:00.84,0:50:03.56,Default,,0000,0000,0000,,then use to transform your original values in Dialogue: 0,0:50:03.56,0:50:05.28,Default,,0000,0000,0000,,your feature variables to all these Dialogue: 0,0:50:05.28,0:50:07.92,Default,,0000,0000,0000,,transformed values here so we're going to Dialogue: 0,0:50:07.92,0:50:09.68,Default,,0000,0000,0000,,pretty much do that here so we have an Dialogue: 0,0:50:09.68,0:50:12.60,Default,,0000,0000,0000,,ordinal encoding we do scaling of the Dialogue: 0,0:50:12.60,0:50:14.84,Default,,0000,0000,0000,,data so the data set is scaled we use Dialogue: 0,0:50:14.84,0:50:18.24,Default,,0000,0000,0000,,min-max scaling
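A minimal sketch of the two feature engineering steps just mentioned, ordinal encoding and min-max scaling, using scikit-learn; as before, the file and column names (and the L, M, H ordering of product types) are assumptions for illustration.

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder, MinMaxScaler

df = pd.read_csv("predictive_maintenance.csv")  # hypothetical file name

# ordinal-encode the categorical product type column (L < M < H is an assumed ordering)
df[["Type"]] = OrdinalEncoder(categories=[["L", "M", "H"]]).fit_transform(df[["Type"]])

# min-max scale the numeric feature variables into the [0, 1] range
numeric_cols = ["Air temperature [K]", "Process temperature [K]",
                "Rotational speed [rpm]", "Torque [Nm]", "Tool wear [min]"]
df[numeric_cols] = MinMaxScaler().fit_transform(df[numeric_cols])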
Dialogue: 0,0:50:18.24,0:50:21.72,Default,,0000,0000,0000,,and then finally we come to do the modeling so we have to split our Dialogue: 0,0:50:21.72,0:50:24.36,Default,,0000,0000,0000,,data set into a training data set and a Dialogue: 0,0:50:24.36,0:50:28.64,Default,,0000,0000,0000,,test data set so coming back to again um Dialogue: 0,0:50:28.64,0:50:32.16,Default,,0000,0000,0000,,we said that before you train your Dialogue: 0,0:50:32.16,0:50:33.80,Default,,0000,0000,0000,,model sorry before you train your model Dialogue: 0,0:50:33.80,0:50:35.60,Default,,0000,0000,0000,,you have to take your original data set Dialogue: 0,0:50:35.60,0:50:37.32,Default,,0000,0000,0000,,now this is a feature engineered data Dialogue: 0,0:50:37.32,0:50:38.84,Default,,0000,0000,0000,,set we're going to break it into two or Dialogue: 0,0:50:38.84,0:50:40.84,Default,,0000,0000,0000,,more subsets okay so one is called the Dialogue: 0,0:50:40.84,0:50:42.40,Default,,0000,0000,0000,,training data set that we use to fit Dialogue: 0,0:50:42.40,0:50:44.00,Default,,0000,0000,0000,,and train a machine learning model the Dialogue: 0,0:50:44.00,0:50:45.92,Default,,0000,0000,0000,,second is the test data set to evaluate the Dialogue: 0,0:50:45.92,0:50:47.96,Default,,0000,0000,0000,,accuracy of the model okay so we got Dialogue: 0,0:50:47.96,0:50:50.56,Default,,0000,0000,0000,,this training data set your test data Dialogue: 0,0:50:50.56,0:50:52.72,Default,,0000,0000,0000,,set and we also need Dialogue: 0,0:50:52.72,0:50:56.16,Default,,0000,0000,0000,,to sample so from our original data set Dialogue: 0,0:50:56.16,0:50:57.40,Default,,0000,0000,0000,,we need to sample some points Dialogue: 0,0:50:57.40,0:50:58.84,Default,,0000,0000,0000,,that go into your training data set some Dialogue: 0,0:50:58.84,0:51:00.56,Default,,0000,0000,0000,,points that go into your test data set so Dialogue: 0,0:51:00.56,0:51:02.72,Default,,0000,0000,0000,,there are many ways to do sampling one Dialogue: 0,0:51:02.72,0:51:04.92,Default,,0000,0000,0000,,way is to do stratified sampling where Dialogue: 0,0:51:04.92,0:51:06.72,Default,,0000,0000,0000,,we ensure the same proportion of data Dialogue: 0,0:51:06.72,0:51:09.00,Default,,0000,0000,0000,,from each stratum or class because right Dialogue: 0,0:51:09.00,0:51:10.96,Default,,0000,0000,0000,,now we have a multiclass classification Dialogue: 0,0:51:10.96,0:51:12.32,Default,,0000,0000,0000,,problem so you want to make sure the Dialogue: 0,0:51:12.32,0:51:13.96,Default,,0000,0000,0000,,same proportion of data from each Dialogue: 0,0:51:13.96,0:51:15.84,Default,,0000,0000,0000,,class is equally represented in the Dialogue: 0,0:51:15.84,0:51:17.92,Default,,0000,0000,0000,,training and test data set as in the Dialogue: 0,0:51:17.92,0:51:20.12,Default,,0000,0000,0000,,original data set which is very useful Dialogue: 0,0:51:20.12,0:51:21.64,Default,,0000,0000,0000,,for dealing with what is called an Dialogue: 0,0:51:21.64,0:51:24.32,Default,,0000,0000,0000,,imbalanced data set so here we have an Dialogue: 0,0:51:24.32,0:51:25.84,Default,,0000,0000,0000,,example of what is called an imbalanced Dialogue: 0,0:51:25.84,0:51:29.52,Default,,0000,0000,0000,,data set in the sense that you have the Dialogue: 0,0:51:29.52,0:51:32.76,Default,,0000,0000,0000,,vast majority of data points in your Dialogue: 0,0:51:32.76,0:51:34.96,Default,,0000,0000,0000,,data set they are going to have the Dialogue: 0,0:51:34.96,0:51:37.48,Default,,0000,0000,0000,,value of zero for their target variable Dialogue: 0,0:51:37.48,0:51:40.20,Default,,0000,0000,0000,,column so only an extremely small Dialogue: 0,0:51:40.20,0:51:43.12,Default,,0000,0000,0000,,minority of the data points in your data Dialogue: 0,0:51:43.12,0:51:45.32,Default,,0000,0000,0000,,set will actually have the value of one Dialogue: 0,0:51:45.32,0:51:48.72,Default,,0000,0000,0000,,for their target variable column okay so Dialogue: 0,0:51:48.72,0:51:51.04,Default,,0000,0000,0000,,a situation where you have your class or Dialogue: 0,0:51:51.04,0:51:52.52,Default,,0000,0000,0000,,your target variable column where the Dialogue: 0,0:51:52.52,0:51:54.48,Default,,0000,0000,0000,,vast majority of values are from one Dialogue: 0,0:51:54.48,0:51:58.12,Default,,0000,0000,0000,,class and a tiny minority are from Dialogue: 0,0:51:58.12,0:52:00.52,Default,,0000,0000,0000,,another class we call this an imbalanced Dialogue: 0,0:52:00.52,0:52:02.72,Default,,0000,0000,0000,,data set and for an imbalanced data set Dialogue: 0,0:52:02.72,0:52:04.32,Default,,0000,0000,0000,,typically we will have a specific Dialogue: 0,0:52:04.32,0:52:05.92,Default,,0000,0000,0000,,technique to do the train test split Dialogue: 0,0:52:05.92,0:52:08.12,Default,,0000,0000,0000,,which is called stratified sampling and Dialogue: 0,0:52:08.12,0:52:09.60,Default,,0000,0000,0000,,so that's exactly what's happening here Dialogue: 0,0:52:09.60,0:52:12.00,Default,,0000,0000,0000,,we're doing a stratified split here so Dialogue: 0,0:52:12.00,0:52:14.84,Default,,0000,0000,0000,,we are doing a train test split here uh Dialogue: 0,0:52:14.84,0:52:17.52,Default,,0000,0000,0000,,and we are doing a stratified split
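A minimal sketch of the stratified train test split, continuing from the hypothetical feature engineered DataFrame sketched above; stratify=y is what keeps the class proportions of the original data set in both subsets.

from sklearn.model_selection import train_test_split

# X holds the engineered feature variables, y the imbalanced target column
X = df.drop(columns=["Target"])
y = df["Target"]

# stratified split: the same proportion of each class in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)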
Dialogue: 0,0:52:17.52,0:52:20.36,Default,,0000,0000,0000,,and then now we actually develop the Dialogue: 0,0:52:20.36,0:52:23.36,Default,,0000,0000,0000,,models so now we've got the train test Dialogue: 0,0:52:23.36,0:52:25.48,Default,,0000,0000,0000,,split now here is where we actually Dialogue: 0,0:52:25.48,0:52:27.08,Default,,0000,0000,0000,,train the models Dialogue: 0,0:52:27.08,0:52:29.92,Default,,0000,0000,0000,,now in terms of classification there are Dialogue: 0,0:52:29.92,0:52:32.32,Default,,0000,0000,0000,,a whole bunch of Dialogue: 0,0:52:32.32,0:52:35.40,Default,,0000,0000,0000,,possibilities right that you can use Dialogue: 0,0:52:35.40,0:52:38.48,Default,,0000,0000,0000,,there are many many different algorithms Dialogue: 0,0:52:38.48,0:52:41.00,Default,,0000,0000,0000,,that we can use to create a Dialogue: 0,0:52:41.00,0:52:42.84,Default,,0000,0000,0000,,classification model so these are Dialogue: 0,0:52:42.84,0:52:45.08,Default,,0000,0000,0000,,examples of some of the more common ones Dialogue: 0,0:52:45.08,0:52:47.48,Default,,0000,0000,0000,,logistic regression support vector machine decision Dialogue: 0,0:52:47.48,0:52:49.52,Default,,0000,0000,0000,,trees random forest bagging balanced Dialogue: 0,0:52:49.52,0:52:52.72,Default,,0000,0000,0000,,bagging boosting and ensembles so all Dialogue: 0,0:52:52.72,0:52:55.04,Default,,0000,0000,0000,,these are different algorithms which Dialogue: 0,0:52:55.04,0:52:57.76,Default,,0000,0000,0000,,will create different kinds of models Dialogue: 0,0:52:57.76,0:53:01.60,Default,,0000,0000,0000,,which will result in different accuracy Dialogue: 0,0:53:01.60,0:53:05.40,Default,,0000,0000,0000,,measures okay so it's the goal of the Dialogue: 0,0:53:05.40,0:53:08.92,Default,,0000,0000,0000,,data scientist to find the best model Dialogue: 0,0:53:08.92,0:53:11.52,Default,,0000,0000,0000,,that gives the best accuracy for the Dialogue: 0,0:53:11.52,0:53:14.12,Default,,0000,0000,0000,,given data set for training on that Dialogue: 0,0:53:14.12,0:53:16.88,Default,,0000,0000,0000,,given data set so let's head back again Dialogue: 0,0:53:16.88,0:53:19.76,Default,,0000,0000,0000,,to uh our machine learning workflow so Dialogue: 0,0:53:19.76,0:53:21.52,Default,,0000,0000,0000,,here basically what I'm doing is I'm Dialogue: 0,0:53:21.52,0:53:23.52,Default,,0000,0000,0000,,creating a whole bunch of models here Dialogue: 0,0:53:23.52,0:53:25.52,Default,,0000,0000,0000,,all right so one is a random forest one Dialogue: 0,0:53:25.52,0:53:27.16,Default,,0000,0000,0000,,is balanced bagging one is a boosting Dialogue: 0,0:53:27.16,0:53:29.52,Default,,0000,0000,0000,,classifier one's the ensemble classifier Dialogue: 0,0:53:29.52,0:53:32.76,Default,,0000,0000,0000,,and using all of these I am going to Dialogue: 0,0:53:32.76,0:53:35.32,Default,,0000,0000,0000,,basically fit or train my model using Dialogue: 0,0:53:35.32,0:53:37.44,Default,,0000,0000,0000,,all these algorithms
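A sketch of fitting several candidate classifiers on the same training split from above; the specific estimator classes (including imbalanced-learn's BalancedBaggingClassifier standing in for the balanced bagging model named in the talk) are assumptions for illustration, not the presenter's exact choices.

from sklearn.ensemble import RandomForestClassifier, BaggingClassifier, GradientBoostingClassifier
from imblearn.ensemble import BalancedBaggingClassifier  # assumes imbalanced-learn is installed

# one model per algorithm family mentioned in the talk
models = {
    "bagging": BaggingClassifier(random_state=42),
    "random_forest": RandomForestClassifier(random_state=42),
    "balanced_bagging": BalancedBaggingClassifier(random_state=42),
    "boosting": GradientBoostingClassifier(random_state=42),
}

# fit every model on the same training data from the stratified split above
for name, clf in models.items():
    clf.fit(X_train, y_train)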
Dialogue: 0,0:53:37.44,0:53:39.80,Default,,0000,0000,0000,,and then I'm going to evaluate them okay I'm going to Dialogue: 0,0:53:39.80,0:53:42.48,Default,,0000,0000,0000,,evaluate how good each of these models Dialogue: 0,0:53:42.48,0:53:45.76,Default,,0000,0000,0000,,are and here you can see your Dialogue: 0,0:53:45.76,0:53:48.84,Default,,0000,0000,0000,,evaluation data right okay and this is Dialogue: 0,0:53:48.84,0:53:50.84,Default,,0000,0000,0000,,the confusion matrix which is another Dialogue: 0,0:53:50.84,0:53:54.28,Default,,0000,0000,0000,,way of evaluating so now we come to the Dialogue: 0,0:53:54.28,0:53:56.32,Default,,0000,0000,0000,,kind of the key part here which Dialogue: 0,0:53:56.32,0:53:58.52,Default,,0000,0000,0000,,is how do I distinguish between Dialogue: 0,0:53:58.52,0:54:00.08,Default,,0000,0000,0000,,all these models right I've got all Dialogue: 0,0:54:00.08,0:54:01.40,Default,,0000,0000,0000,,these different models which are built Dialogue: 0,0:54:01.40,0:54:03.04,Default,,0000,0000,0000,,with different algorithms which I'm Dialogue: 0,0:54:03.04,0:54:05.36,Default,,0000,0000,0000,,using to train on the same data set how Dialogue: 0,0:54:05.36,0:54:07.36,Default,,0000,0000,0000,,do I distinguish between all these Dialogue: 0,0:54:07.36,0:54:10.36,Default,,0000,0000,0000,,models okay and so for Dialogue: 0,0:54:10.36,0:54:13.88,Default,,0000,0000,0000,,that we actually have a whole bunch of Dialogue: 0,0:54:13.88,0:54:16.20,Default,,0000,0000,0000,,common evaluation metrics for Dialogue: 0,0:54:16.20,0:54:18.32,Default,,0000,0000,0000,,classification right so these evaluation Dialogue: 0,0:54:18.32,0:54:22.24,Default,,0000,0000,0000,,metrics tell us how good a model is in Dialogue: 0,0:54:22.24,0:54:24.32,Default,,0000,0000,0000,,terms of its accuracy in Dialogue: 0,0:54:24.32,0:54:27.00,Default,,0000,0000,0000,,classification so in terms of Dialogue: 0,0:54:27.00,0:54:29.44,Default,,0000,0000,0000,,accuracy we actually have many different Dialogue: 0,0:54:29.44,0:54:31.68,Default,,0000,0000,0000,,models uh sorry many different measures Dialogue: 0,0:54:31.68,0:54:33.44,Default,,0000,0000,0000,,right you might think well accuracy is Dialogue: 0,0:54:33.44,0:54:35.40,Default,,0000,0000,0000,,just accuracy well that's all right it's Dialogue: 0,0:54:35.40,0:54:36.88,Default,,0000,0000,0000,,just either it's accurate or it's not Dialogue: 0,0:54:36.88,0:54:39.32,Default,,0000,0000,0000,,accurate right but actually it's not Dialogue: 0,0:54:39.32,0:54:41.36,Default,,0000,0000,0000,,that simple there are many different Dialogue: 0,0:54:41.36,0:54:43.84,Default,,0000,0000,0000,,ways to measure the accuracy of a Dialogue: 0,0:54:43.84,0:54:45.48,Default,,0000,0000,0000,,classification model and these are some Dialogue: 0,0:54:45.48,0:54:48.28,Default,,0000,0000,0000,,of the more common ones so for example Dialogue: 0,0:54:48.28,0:54:51.00,Default,,0000,0000,0000,,the confusion matrix tells us how many Dialogue: 0,0:54:51.00,0:54:54.00,Default,,0000,0000,0000,,true positives that means the value is Dialogue: 0,0:54:54.00,0:54:55.88,Default,,0000,0000,0000,,positive the prediction is positive how Dialogue: 0,0:54:55.88,0:54:57.52,Default,,0000,0000,0000,,many false positives which means the Dialogue: 0,0:54:57.52,0:54:59.04,Default,,0000,0000,0000,,value is negative but the machine learning Dialogue: 0,0:54:59.04,0:55:01.84,Default,,0000,0000,0000,,model predicts positive how many false Dialogue: 0,0:55:01.84,0:55:03.84,Default,,0000,0000,0000,,negatives which means that the machine Dialogue: 0,0:55:03.84,0:55:05.56,Default,,0000,0000,0000,,learning model predicts negative but Dialogue: 0,0:55:05.56,0:55:07.48,Default,,0000,0000,0000,,it's actually positive and how many true Dialogue: 0,0:55:07.48,0:55:09.36,Default,,0000,0000,0000,,negatives there are which means that the Dialogue: 0,0:55:09.36,0:55:11.24,Default,,0000,0000,0000,,machine learning model Dialogue: 0,0:55:11.24,0:55:12.88,Default,,0000,0000,0000,,predicts negative and the true value is Dialogue: 0,0:55:12.88,0:55:14.76,Default,,0000,0000,0000,,also negative so this is called a Dialogue: 0,0:55:14.76,0:55:16.92,Default,,0000,0000,0000,,confusion matrix this is one way we Dialogue: 0,0:55:16.92,0:55:19.48,Default,,0000,0000,0000,,assess or evaluate the performance of a Dialogue: 0,0:55:19.48,0:55:20.52,Default,,0000,0000,0000,,classification Dialogue: 0,0:55:20.52,0:55:23.32,Default,,0000,0000,0000,,model okay this is for binary Dialogue: 0,0:55:23.32,0:55:24.68,Default,,0000,0000,0000,,classification we can also have Dialogue: 0,0:55:24.68,0:55:26.88,Default,,0000,0000,0000,,a multiclass confusion matrix
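A minimal sketch of generating the confusion matrix for one of the hypothetical models trained in the earlier sketch.

from sklearn.metrics import confusion_matrix

# rows are the true classes, columns the predicted classes:
# [[true negatives, false positives],
#  [false negatives, true positives]]
y_pred = models["random_forest"].predict(X_test)
print(confusion_matrix(y_test, y_pred))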
Dialogue: 0,0:55:26.88,0:55:29.00,Default,,0000,0000,0000,,and then we can also measure things like Dialogue: 0,0:55:29.00,0:55:31.72,Default,,0000,0000,0000,,accuracy so accuracy is the true Dialogue: 0,0:55:31.72,0:55:34.08,Default,,0000,0000,0000,,positives plus the true negatives which Dialogue: 0,0:55:34.08,0:55:35.44,Default,,0000,0000,0000,,is the total number of correct Dialogue: 0,0:55:35.44,0:55:37.84,Default,,0000,0000,0000,,predictions made by the model divided by Dialogue: 0,0:55:37.84,0:55:39.84,Default,,0000,0000,0000,,the total number of data points in your Dialogue: 0,0:55:39.84,0:55:42.60,Default,,0000,0000,0000,,data set and then you also have other Dialogue: 0,0:55:42.60,0:55:43.72,Default,,0000,0000,0000,,kinds of Dialogue: 0,0:55:43.72,0:55:46.60,Default,,0000,0000,0000,,measures uh such as recall and this is a Dialogue: 0,0:55:46.60,0:55:49.16,Default,,0000,0000,0000,,formula for recall this is a formula for Dialogue: 0,0:55:49.16,0:55:51.48,Default,,0000,0000,0000,,the F1 score okay and then there's Dialogue: 0,0:55:51.48,0:55:55.56,Default,,0000,0000,0000,,something called the ROC curve right so Dialogue: 0,0:55:55.56,0:55:57.04,Default,,0000,0000,0000,,without going too much into the detail of Dialogue: 0,0:55:57.04,0:55:59.00,Default,,0000,0000,0000,,what each of these entails essentially Dialogue: 0,0:55:59.00,0:56:00.64,Default,,0000,0000,0000,,these are all different ways these are Dialogue: 0,0:56:00.64,0:56:03.28,Default,,0000,0000,0000,,different KPIs right just like if you Dialogue: 0,0:56:03.28,0:56:06.12,Default,,0000,0000,0000,,work in a company you have different KPIs Dialogue: 0,0:56:06.12,0:56:08.08,Default,,0000,0000,0000,,right certain employees have certain KPIs Dialogue: 0,0:56:08.08,0:56:11.28,Default,,0000,0000,0000,,that measure how good or how uh you Dialogue: 0,0:56:11.28,0:56:13.20,Default,,0000,0000,0000,,know efficient or how effective a Dialogue: 0,0:56:13.20,0:56:16.24,Default,,0000,0000,0000,,particular employee is right so the Dialogue: 0,0:56:16.24,0:56:19.88,Default,,0000,0000,0000,,KPIs for your machine learning models Dialogue: 0,0:56:19.88,0:56:24.24,Default,,0000,0000,0000,,are ROC curve F1 score recall accuracy Dialogue: 0,0:56:24.24,0:56:26.60,Default,,0000,0000,0000,,okay and your confusion matrix so Dialogue: 0,0:56:26.60,0:56:29.84,Default,,0000,0000,0000,,fundamentally after I have built right Dialogue: 0,0:56:29.84,0:56:33.36,Default,,0000,0000,0000,,so here I've built my four different Dialogue: 0,0:56:33.36,0:56:35.24,Default,,0000,0000,0000,,models so after I built these four Dialogue: 0,0:56:35.24,0:56:37.64,Default,,0000,0000,0000,,different models I'm going to check and Dialogue: 0,0:56:37.64,0:56:39.68,Default,,0000,0000,0000,,evaluate them using all those different Dialogue: 0,0:56:39.68,0:56:42.44,Default,,0000,0000,0000,,metrics like for example the F1 score Dialogue: 0,0:56:42.44,0:56:44.84,Default,,0000,0000,0000,,the precision score the recall score all Dialogue: 0,0:56:44.84,0:56:47.32,Default,,0000,0000,0000,,right so for this model I can check out Dialogue: 0,0:56:47.32,0:56:50.04,Default,,0000,0000,0000,,the ROC score the F1 score the precision Dialogue: 0,0:56:50.04,0:56:52.12,Default,,0000,0000,0000,,score the recall score then for this Dialogue: 0,0:56:52.12,0:56:54.80,Default,,0000,0000,0000,,model this is the ROC score the F1 score Dialogue: 0,0:56:54.80,0:56:56.84,Default,,0000,0000,0000,,the precision score the recall score Dialogue: 0,0:56:56.84,0:56:59.68,Default,,0000,0000,0000,,then for this model and so on so for Dialogue: 0,0:56:59.68,0:57:03.24,Default,,0000,0000,0000,,every single model I've created using my Dialogue: 0,0:57:03.24,0:57:05.84,Default,,0000,0000,0000,,training data set I will have all my set Dialogue: 0,0:57:05.84,0:57:08.00,Default,,0000,0000,0000,,of evaluation metrics that I can use to Dialogue: 0,0:57:08.00,0:57:11.84,Default,,0000,0000,0000,,evaluate how good this model is okay
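A sketch of computing this set of evaluation metrics for every hypothetical model trained in the earlier sketch, using scikit-learn's metric functions.

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# the same set of evaluation metrics for every trained model
for name, clf in models.items():
    y_pred = clf.predict(X_test)
    y_prob = clf.predict_proba(X_test)[:, 1]  # probability of the failure class
    print(name,
          "accuracy", round(accuracy_score(y_test, y_pred), 3),
          "precision", round(precision_score(y_test, y_pred), 3),
          "recall", round(recall_score(y_test, y_pred), 3),
          "F1", round(f1_score(y_test, y_pred), 3),
          "ROC AUC", round(roc_auc_score(y_test, y_prob), 3))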
Dialogue: 0,0:57:11.84,0:57:13.12,Default,,0000,0000,0000,,same thing here I've got a confusion Dialogue: 0,0:57:13.12,0:57:15.08,Default,,0000,0000,0000,,matrix here right so I can use that Dialogue: 0,0:57:15.08,0:57:18.12,Default,,0000,0000,0000,,again to evaluate between all these four Dialogue: 0,0:57:18.12,0:57:20.20,Default,,0000,0000,0000,,different models and then I kind of Dialogue: 0,0:57:20.20,0:57:22.24,Default,,0000,0000,0000,,summarize it up here so we can see from Dialogue: 0,0:57:22.24,0:57:25.44,Default,,0000,0000,0000,,this summary here that actually the top Dialogue: 0,0:57:25.44,0:57:27.60,Default,,0000,0000,0000,,two models right so as a Dialogue: 0,0:57:27.60,0:57:29.44,Default,,0000,0000,0000,,data scientist I'm now Dialogue: 0,0:57:29.44,0:57:31.12,Default,,0000,0000,0000,,going to just focus on these two models Dialogue: 0,0:57:31.12,0:57:33.44,Default,,0000,0000,0000,,so these two models are the bagging Dialogue: 0,0:57:33.44,0:57:36.00,Default,,0000,0000,0000,,classifier and the random forest classifier Dialogue: 0,0:57:36.00,0:57:38.48,Default,,0000,0000,0000,,they have the highest values of F1 score Dialogue: 0,0:57:38.48,0:57:40.48,Default,,0000,0000,0000,,and the highest values of the ROC curve Dialogue: 0,0:57:40.48,0:57:42.64,Default,,0000,0000,0000,,score okay so we can say these are the Dialogue: 0,0:57:42.64,0:57:45.84,Default,,0000,0000,0000,,top two models in terms of accuracy okay Dialogue: 0,0:57:45.84,0:57:48.92,Default,,0000,0000,0000,,using the F1 evaluation metric and the Dialogue: 0,0:57:48.92,0:57:53.72,Default,,0000,0000,0000,,ROC AUC evaluation metric okay so these Dialogue: 0,0:57:53.72,0:57:57.48,Default,,0000,0000,0000,,results are kind of summarized here and Dialogue: 0,0:57:57.48,0:57:59.08,Default,,0000,0000,0000,,then we use different sampling Dialogue: 0,0:57:59.08,0:58:00.88,Default,,0000,0000,0000,,techniques okay so just now I talked Dialogue: 0,0:58:00.88,0:58:03.68,Default,,0000,0000,0000,,about um different kinds of sampling Dialogue: 0,0:58:03.68,0:58:06.40,Default,,0000,0000,0000,,techniques and so the idea of different Dialogue: 0,0:58:06.40,0:58:08.32,Default,,0000,0000,0000,,kinds of sampling techniques is to just Dialogue: 0,0:58:08.32,0:58:11.32,Default,,0000,0000,0000,,get a different feel for different Dialogue: 0,0:58:11.32,0:58:13.72,Default,,0000,0000,0000,,distributions of the data in different Dialogue: 0,0:58:13.72,0:58:16.36,Default,,0000,0000,0000,,areas of your data set so that you want Dialogue: 0,0:58:16.36,0:58:20.00,Default,,0000,0000,0000,,to just kind of make sure that your Dialogue: 0,0:58:20.00,0:58:22.80,Default,,0000,0000,0000,,evaluation of accuracy is actually Dialogue: 0,0:58:22.80,0:58:27.08,Default,,0000,0000,0000,,statistically correct right so we can um Dialogue: 0,0:58:27.08,0:58:29.60,Default,,0000,0000,0000,,do what is called oversampling and Dialogue: 0,0:58:29.60,0:58:30.88,Default,,0000,0000,0000,,undersampling which is very useful when Dialogue: 0,0:58:30.88,0:58:32.28,Default,,0000,0000,0000,,you're working with an imbalanced data Dialogue: 0,0:58:32.28,0:58:35.04,Default,,0000,0000,0000,,set so this is an example of doing that
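A sketch of the two resampling ideas using the imbalanced-learn library; BorderlineSMOTE and TomekLinks are assumed stand-ins for the oversampling and undersampling techniques shown in the notebook, applied to the hypothetical training split from the earlier sketches.

from imblearn.over_sampling import BorderlineSMOTE
from imblearn.under_sampling import TomekLinks
from sklearn.ensemble import RandomForestClassifier

# oversample the minority (failure) class with borderline SMOTE ...
X_over, y_over = BorderlineSMOTE(random_state=42).fit_resample(X_train, y_train)

# ... or undersample the majority class by removing Tomek links
X_under, y_under = TomekLinks().fit_resample(X_train, y_train)

# the candidate models are then retrained on the resampled data, for example
models["random_forest_smote"] = RandomForestClassifier(random_state=42).fit(X_over, y_over)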
Dialogue: 0,0:58:35.04,0:58:37.24,Default,,0000,0000,0000,,and then here we again check out the Dialogue: 0,0:58:37.24,0:58:38.80,Default,,0000,0000,0000,,results for all these different Dialogue: 0,0:58:38.80,0:58:41.68,Default,,0000,0000,0000,,techniques we use uh the F1 score the AUC Dialogue: 0,0:58:41.68,0:58:43.60,Default,,0000,0000,0000,,score all right these are the two key Dialogue: 0,0:58:43.60,0:58:46.76,Default,,0000,0000,0000,,measures of accuracy right and then Dialogue: 0,0:58:46.76,0:58:47.92,Default,,0000,0000,0000,,we can check out the scores for the Dialogue: 0,0:58:47.92,0:58:50.48,Default,,0000,0000,0000,,different approaches okay so we can see Dialogue: 0,0:58:50.48,0:58:53.12,Default,,0000,0000,0000,,oh well overall the models have a lower Dialogue: 0,0:58:53.12,0:58:55.72,Default,,0000,0000,0000,,ROC AUC score but they have a much Dialogue: 0,0:58:55.72,0:58:58.28,Default,,0000,0000,0000,,higher F1 score the bagging classifier Dialogue: 0,0:58:58.28,0:59:00.84,Default,,0000,0000,0000,,had the highest ROC AUC score Dialogue: 0,0:59:00.84,0:59:04.12,Default,,0000,0000,0000,,but its F1 score was too low okay then in Dialogue: 0,0:59:04.12,0:59:06.52,Default,,0000,0000,0000,,the data scientist's opinion the random Dialogue: 0,0:59:06.52,0:59:08.52,Default,,0000,0000,0000,,forest with this particular technique of Dialogue: 0,0:59:08.52,0:59:10.76,Default,,0000,0000,0000,,sampling has a good equilibrium between the F1 Dialogue: 0,0:59:10.76,0:59:14.48,Default,,0000,0000,0000,,and ROC AUC score so the first takeaway Dialogue: 0,0:59:14.48,0:59:16.68,Default,,0000,0000,0000,,is that the macro F1 score improves Dialogue: 0,0:59:16.68,0:59:18.48,Default,,0000,0000,0000,,dramatically using these sampling Dialogue: 0,0:59:18.48,0:59:20.16,Default,,0000,0000,0000,,techniques so these models might be better Dialogue: 0,0:59:20.16,0:59:22.44,Default,,0000,0000,0000,,compared to the balanced ones all right Dialogue: 0,0:59:22.44,0:59:26.28,Default,,0000,0000,0000,,so based on all this uh evaluation the Dialogue: 0,0:59:26.28,0:59:27.68,Default,,0000,0000,0000,,data scientist says they're going to Dialogue: 0,0:59:27.68,0:59:29.92,Default,,0000,0000,0000,,continue to work with these two models Dialogue: 0,0:59:29.92,0:59:31.44,Default,,0000,0000,0000,,all right and the balanced bagging one Dialogue: 0,0:59:31.44,0:59:33.08,Default,,0000,0000,0000,,and then continue to make further Dialogue: 0,0:59:33.08,0:59:35.04,Default,,0000,0000,0000,,comparisons all right so then we Dialogue: 0,0:59:35.04,0:59:37.08,Default,,0000,0000,0000,,continue to keep refining our Dialogue: 0,0:59:37.08,0:59:38.60,Default,,0000,0000,0000,,evaluation work here we're going to Dialogue: 0,0:59:38.60,0:59:41.00,Default,,0000,0000,0000,,train the models one more time again so Dialogue: 0,0:59:41.00,0:59:43.04,Default,,0000,0000,0000,,we again do a train test split and Dialogue: 0,0:59:43.04,0:59:44.80,Default,,0000,0000,0000,,then we do that for this particular uh Dialogue: 0,0:59:44.80,0:59:47.04,Default,,0000,0000,0000,,approach and model and then we print Dialogue: 0,0:59:47.04,0:59:48.20,Default,,0000,0000,0000,,out what is called a Dialogue: 0,0:59:48.20,0:59:50.96,Default,,0000,0000,0000,,classification report and this is Dialogue: 0,0:59:50.96,0:59:53.40,Default,,0000,0000,0000,,basically a summary of all those metrics Dialogue: 0,0:59:53.40,0:59:55.36,Default,,0000,0000,0000,,that I talked about just now so just now Dialogue: 0,0:59:55.36,0:59:57.52,Default,,0000,0000,0000,,remember I said there were Dialogue: 0,0:59:57.52,0:59:59.68,Default,,0000,0000,0000,,several evaluation metrics right so uh Dialogue: 0,0:59:59.68,1:00:01.48,Default,,0000,0000,0000,,we had the confusion matrix the Dialogue: 0,1:00:01.48,1:00:04.12,Default,,0000,0000,0000,,accuracy the precision the recall the Dialogue: 0,1:00:04.12,1:00:08.12,Default,,0000,0000,0000,,AUC score so here with the um classification Dialogue: 0,1:00:08.12,1:00:09.88,Default,,0000,0000,0000,,report I can get a summary of all of Dialogue: 0,1:00:09.88,1:00:11.76,Default,,0000,0000,0000,,that so I can see all the values here okay
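A minimal sketch of printing such a classification report for one of the hypothetical models from the earlier sketches.

from sklearn.metrics import classification_report

# per-class precision, recall, F1 and support in one summary table
y_pred = models["balanced_bagging"].predict(X_test)
print(classification_report(y_test, y_pred, digits=3))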
Dialogue: 0,1:00:11.76,1:00:14.64,Default,,0000,0000,0000,,for this particular model bagging with Dialogue: 0,1:00:14.64,1:00:17.16,Default,,0000,0000,0000,,Tomek links and then I can do that for Dialogue: 0,1:00:17.16,1:00:18.64,Default,,0000,0000,0000,,another model the random forest with Dialogue: 0,1:00:18.64,1:00:20.60,Default,,0000,0000,0000,,borderline SMOTE and then I can do that Dialogue: 0,1:00:20.60,1:00:22.20,Default,,0000,0000,0000,,for another model which is the balanced Dialogue: 0,1:00:22.20,1:00:25.16,Default,,0000,0000,0000,,bagging so again we see a lot of Dialogue: 0,1:00:25.16,1:00:27.08,Default,,0000,0000,0000,,comparison between different models Dialogue: 0,1:00:27.08,1:00:28.64,Default,,0000,0000,0000,,trying to figure out what all these Dialogue: 0,1:00:28.64,1:00:30.72,Default,,0000,0000,0000,,evaluation metrics are telling us all Dialogue: 0,1:00:30.72,1:00:32.96,Default,,0000,0000,0000,,right then again we have a confusion Dialogue: 0,1:00:32.96,1:00:35.88,Default,,0000,0000,0000,,matrix so we generate a confusion matrix Dialogue: 0,1:00:35.88,1:00:38.88,Default,,0000,0000,0000,,for the bagging with the Tomek links Dialogue: 0,1:00:38.88,1:00:40.72,Default,,0000,0000,0000,,undersampling for the random forest Dialogue: 0,1:00:40.72,1:00:42.68,Default,,0000,0000,0000,,with the borderline SMOTE oversampling Dialogue: 0,1:00:42.68,1:00:44.96,Default,,0000,0000,0000,,and just balanced bagging by itself then Dialogue: 0,1:00:44.96,1:00:47.72,Default,,0000,0000,0000,,again we compare between these three uh Dialogue: 0,1:00:47.72,1:00:50.80,Default,,0000,0000,0000,,models uh using the confusion matrix Dialogue: 0,1:00:50.80,1:00:52.60,Default,,0000,0000,0000,,evaluation metric and then we can kind Dialogue: 0,1:00:52.60,1:00:55.68,Default,,0000,0000,0000,,of come to some conclusions all right so Dialogue: 0,1:00:55.68,1:00:58.16,Default,,0000,0000,0000,,now we look at all the data Dialogue: 0,1:00:58.16,1:01:01.20,Default,,0000,0000,0000,,then we move on and look at another um Dialogue: 0,1:01:01.20,1:01:03.16,Default,,0000,0000,0000,,kind of evaluation metric which Dialogue: 0,1:01:03.16,1:01:06.72,Default,,0000,0000,0000,,is the ROC score right so this is one of Dialogue: 0,1:01:06.72,1:01:08.68,Default,,0000,0000,0000,,the other evaluation metrics I talked Dialogue: 0,1:01:08.68,1:01:11.20,Default,,0000,0000,0000,,about so this one is a kind of a curve Dialogue: 0,1:01:11.20,1:01:12.52,Default,,0000,0000,0000,,you look at it to see the area Dialogue: 0,1:01:12.52,1:01:14.36,Default,,0000,0000,0000,,underneath the curve this is called the Dialogue: 0,1:01:14.36,1:01:18.08,Default,,0000,0000,0000,,ROC AUC sorry the AUC the Dialogue: 0,1:01:18.08,1:01:19.88,Default,,0000,0000,0000,,area under the curve all right so the Dialogue: 0,1:01:19.88,1:01:21.84,Default,,0000,0000,0000,,area under the curve Dialogue: 0,1:01:21.84,1:01:24.32,Default,,0000,0000,0000,,score will give us some idea about the Dialogue: 0,1:01:24.32,1:01:25.60,Default,,0000,0000,0000,,threshold that we're going to use for Dialogue: 0,1:01:25.60,1:01:27.68,Default,,0000,0000,0000,,classification so we can examine this Dialogue: 0,1:01:27.68,1:01:29.20,Default,,0000,0000,0000,,for the bagging classifier for the Dialogue: 0,1:01:29.20,1:01:30.96,Default,,0000,0000,0000,,random forest classifier for the balanced Dialogue: 0,1:01:30.96,1:01:33.60,Default,,0000,0000,0000,,bagging classifier okay then we can also Dialogue: 0,1:01:33.60,1:01:36.20,Default,,0000,0000,0000,,again do that uh finally we can check Dialogue: 0,1:01:36.20,1:01:37.88,Default,,0000,0000,0000,,the classification report of this Dialogue: 0,1:01:37.88,1:01:39.68,Default,,0000,0000,0000,,particular model so we keep doing this Dialogue: 0,1:01:39.68,1:01:43.20,Default,,0000,0000,0000,,over and over again evaluating Dialogue: 0,1:01:43.20,1:01:45.72,Default,,0000,0000,0000,,the accuracy metrics the Dialogue: 0,1:01:45.72,1:01:46.88,Default,,0000,0000,0000,,evaluation metrics for all these Dialogue: 0,1:01:46.88,1:01:48.88,Default,,0000,0000,0000,,different models so we keep doing this Dialogue: 0,1:01:48.88,1:01:50.52,Default,,0000,0000,0000,,over and over again for different Dialogue: 0,1:01:50.52,1:01:53.44,Default,,0000,0000,0000,,thresholds for classification
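A sketch of the ROC AUC score and of moving the decision threshold away from the default 0.5, continuing with the hypothetical models from the earlier sketches.

from sklearn.metrics import roc_curve, roc_auc_score, recall_score, precision_score

clf = models["balanced_bagging"]
y_prob = clf.predict_proba(X_test)[:, 1]

# the ROC curve traces the true positive rate against the false positive rate
# over every possible decision threshold; the AUC summarizes it in one number
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
print("ROC AUC:", round(roc_auc_score(y_test, y_prob), 3))

# instead of the default 0.5 cut-off, call it a failure at a 0.6 threshold
y_pred_06 = (y_prob >= 0.6).astype(int)
print("recall at 0.6:", round(recall_score(y_test, y_pred_06), 3))
print("precision at 0.6:", round(precision_score(y_test, y_pred_06), 3))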
Dialogue: 0,1:01:53.44,1:01:56.88,Default,,0000,0000,0000,,and so as we keep drilling into these we kind Dialogue: 0,1:01:56.88,1:02:00.84,Default,,0000,0000,0000,,of get more and more understanding of Dialogue: 0,1:02:00.84,1:02:02.80,Default,,0000,0000,0000,,all these different models which one is Dialogue: 0,1:02:02.80,1:02:04.76,Default,,0000,0000,0000,,the best one that gives the best Dialogue: 0,1:02:04.76,1:02:08.52,Default,,0000,0000,0000,,performance for our data set okay so Dialogue: 0,1:02:08.52,1:02:11.44,Default,,0000,0000,0000,,finally we come to this conclusion this Dialogue: 0,1:02:11.44,1:02:13.52,Default,,0000,0000,0000,,particular model is not able to push Dialogue: 0,1:02:13.52,1:02:15.28,Default,,0000,0000,0000,,the recall on failure higher than Dialogue: 0,1:02:15.28,1:02:17.52,Default,,0000,0000,0000,,95.8% on the other hand balanced bagging Dialogue: 0,1:02:17.52,1:02:19.40,Default,,0000,0000,0000,,with a decision threshold of 0.6 is able Dialogue: 0,1:02:19.40,1:02:21.52,Default,,0000,0000,0000,,to have a better recall blah blah blah Dialogue: 0,1:02:21.52,1:02:25.32,Default,,0000,0000,0000,,etc so finally after having done all of Dialogue: 0,1:02:25.32,1:02:27.48,Default,,0000,0000,0000,,these evaluations Dialogue: 0,1:02:27.48,1:02:31.12,Default,,0000,0000,0000,,okay this is the conclusion Dialogue: 0,1:02:31.12,1:02:33.96,Default,,0000,0000,0000,,so right now we Dialogue: 0,1:02:33.96,1:02:35.28,Default,,0000,0000,0000,,have gone through all the steps of the Dialogue: 0,1:02:35.28,1:02:37.76,Default,,0000,0000,0000,,machine learning life cycle which Dialogue: 0,1:02:37.76,1:02:40.24,Default,,0000,0000,0000,,means we have right now or the data Dialogue: 0,1:02:40.24,1:02:41.96,Default,,0000,0000,0000,,scientist right now has gone through all Dialogue: 0,1:02:41.96,1:02:43.00,Default,,0000,0000,0000,,these Dialogue: 0,1:02:43.00,1:02:47.08,Default,,0000,0000,0000,,steps uh which is now we have done this Dialogue: 0,1:02:47.08,1:02:48.64,Default,,0000,0000,0000,,validation so we have done the cleaning Dialogue: 0,1:02:48.64,1:02:50.56,Default,,0000,0000,0000,,exploration preparation transformation
Dialogue: 0,1:02:50.56,1:02:52.60,Default,,0000,0000,0000,,the feature engineering we have developed Dialogue: 0,1:02:52.60,1:02:54.36,Default,,0000,0000,0000,,and trained multiple models we have Dialogue: 0,1:02:54.36,1:02:56.48,Default,,0000,0000,0000,,evaluated all these different models so Dialogue: 0,1:02:56.48,1:02:58.60,Default,,0000,0000,0000,,right now we have reached this stage so Dialogue: 0,1:02:58.60,1:03:02.72,Default,,0000,0000,0000,,at this stage we as the data scientist Dialogue: 0,1:03:02.72,1:03:05.48,Default,,0000,0000,0000,,kind of have completed our job so we've Dialogue: 0,1:03:05.48,1:03:08.12,Default,,0000,0000,0000,,come to some very useful conclusions Dialogue: 0,1:03:08.12,1:03:09.64,Default,,0000,0000,0000,,which we can now share with our Dialogue: 0,1:03:09.64,1:03:13.24,Default,,0000,0000,0000,,colleagues all right and based on these Dialogue: 0,1:03:13.24,1:03:15.40,Default,,0000,0000,0000,,uh conclusions or recommendations Dialogue: 0,1:03:15.40,1:03:17.16,Default,,0000,0000,0000,,somebody is going to choose an Dialogue: 0,1:03:17.16,1:03:19.16,Default,,0000,0000,0000,,appropriate model and that model is Dialogue: 0,1:03:19.16,1:03:22.64,Default,,0000,0000,0000,,going to get deployed for real-time use Dialogue: 0,1:03:22.64,1:03:25.32,Default,,0000,0000,0000,,in a real life production environment Dialogue: 0,1:03:25.32,1:03:27.24,Default,,0000,0000,0000,,okay and that decision is going to be Dialogue: 0,1:03:27.24,1:03:29.36,Default,,0000,0000,0000,,made based on the recommendations coming Dialogue: 0,1:03:29.36,1:03:30.88,Default,,0000,0000,0000,,from the data scientist at the end of Dialogue: 0,1:03:30.88,1:03:33.48,Default,,0000,0000,0000,,this phase okay so at the end of this Dialogue: 0,1:03:33.48,1:03:35.08,Default,,0000,0000,0000,,phase the data scientist is going to Dialogue: 0,1:03:35.08,1:03:36.88,Default,,0000,0000,0000,,come up with these conclusions so Dialogue: 0,1:03:36.88,1:03:41.76,Default,,0000,0000,0000,,the conclusions are okay if the engineering Dialogue: 0,1:03:41.76,1:03:44.52,Default,,0000,0000,0000,,team is looking okay the Dialogue: 0,1:03:44.52,1:03:46.12,Default,,0000,0000,0000,,engineering team right if the engineering Dialogue: 0,1:03:46.12,1:03:48.72,Default,,0000,0000,0000,,team is looking for the highest Dialogue: 0,1:03:48.72,1:03:51.84,Default,,0000,0000,0000,,failure detection rate possible then Dialogue: 0,1:03:51.84,1:03:54.48,Default,,0000,0000,0000,,they should go with this particular Dialogue: 0,1:03:54.48,1:03:56.52,Default,,0000,0000,0000,,model okay Dialogue: 0,1:03:56.52,1:03:58.68,Default,,0000,0000,0000,,and if they want a balance between Dialogue: 0,1:03:58.68,1:04:01.04,Default,,0000,0000,0000,,precision and recall then they should Dialogue: 0,1:04:01.04,1:04:03.24,Default,,0000,0000,0000,,choose between the bagging model with a Dialogue: 0,1:04:03.24,1:04:05.96,Default,,0000,0000,0000,,0.4 decision threshold or the random Dialogue: 0,1:04:05.96,1:04:09.60,Default,,0000,0000,0000,,forest model with a 0.5 threshold but if Dialogue: 0,1:04:09.60,1:04:11.88,Default,,0000,0000,0000,,they don't care so much about predicting Dialogue: 0,1:04:11.88,1:04:14.48,Default,,0000,0000,0000,,every failure and they want the highest Dialogue: 0,1:04:14.48,1:04:16.76,Default,,0000,0000,0000,,precision possible then they should opt Dialogue: 0,1:04:16.76,1:04:19.80,Default,,0000,0000,0000,,for the bagging Tomek links classifier Dialogue: 0,1:04:19.80,1:04:23.16,Default,,0000,0000,0000,,with a bit higher decision threshold and
Dialogue: 0,1:04:23.16,1:04:26.16,Default,,0000,0000,0000,,so this is the key thing that the data Dialogue: 0,1:04:26.16,1:04:28.32,Default,,0000,0000,0000,,scientist is going to give right this is Dialogue: 0,1:04:28.32,1:04:30.76,Default,,0000,0000,0000,,the key takeaway this is the kind of the Dialogue: 0,1:04:30.76,1:04:32.68,Default,,0000,0000,0000,,end result of the entire machine Dialogue: 0,1:04:32.68,1:04:34.68,Default,,0000,0000,0000,,learning life cycle right now the data Dialogue: 0,1:04:34.68,1:04:36.40,Default,,0000,0000,0000,,scientist is going to tell the Dialogue: 0,1:04:36.40,1:04:38.60,Default,,0000,0000,0000,,engineering team all right you guys Dialogue: 0,1:04:38.60,1:04:41.16,Default,,0000,0000,0000,,which is more important for you point A Dialogue: 0,1:04:41.16,1:04:45.04,Default,,0000,0000,0000,,point B or point C make your decision so Dialogue: 0,1:04:45.04,1:04:47.40,Default,,0000,0000,0000,,the engineering team will then discuss Dialogue: 0,1:04:47.40,1:04:48.96,Default,,0000,0000,0000,,among themselves and say hey you know Dialogue: 0,1:04:48.96,1:04:52.28,Default,,0000,0000,0000,,what we want is we want to get the Dialogue: 0,1:04:52.28,1:04:54.72,Default,,0000,0000,0000,,highest failure detection possible Dialogue: 0,1:04:54.72,1:04:58.36,Default,,0000,0000,0000,,because any kind of failure of that Dialogue: 0,1:04:58.36,1:05:00.40,Default,,0000,0000,0000,,machine or the product on the assembly Dialogue: 0,1:05:00.40,1:05:03.12,Default,,0000,0000,0000,,line is really going to screw us up big Dialogue: 0,1:05:03.12,1:05:05.64,Default,,0000,0000,0000,,time so what we're looking for is the Dialogue: 0,1:05:05.64,1:05:08.08,Default,,0000,0000,0000,,model that will give us the highest Dialogue: 0,1:05:08.08,1:05:10.88,Default,,0000,0000,0000,,failure detection rate we don't care Dialogue: 0,1:05:10.88,1:05:13.48,Default,,0000,0000,0000,,about precision but we want to make Dialogue: 0,1:05:13.48,1:05:15.44,Default,,0000,0000,0000,,sure that if there's a failure we are Dialogue: 0,1:05:15.44,1:05:17.72,Default,,0000,0000,0000,,going to catch it right so that's what Dialogue: 0,1:05:17.72,1:05:19.60,Default,,0000,0000,0000,,they want and so the data scientist will Dialogue: 0,1:05:19.60,1:05:22.20,Default,,0000,0000,0000,,say hey you go for the balanced bagging Dialogue: 0,1:05:22.20,1:05:24.88,Default,,0000,0000,0000,,model okay then the data scientist saves Dialogue: 0,1:05:24.88,1:05:27.72,Default,,0000,0000,0000,,this all right uh and then once you have Dialogue: 0,1:05:27.72,1:05:30.00,Default,,0000,0000,0000,,saved this uh you can then go right Dialogue: 0,1:05:30.00,1:05:32.32,Default,,0000,0000,0000,,ahead and deploy that so you can go Dialogue: 0,1:05:32.32,1:05:33.52,Default,,0000,0000,0000,,right ahead and deploy that to Dialogue: 0,1:05:33.52,1:05:37.16,Default,,0000,0000,0000,,production okay
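A minimal sketch of saving the chosen model so it can be deployed; joblib is assumed here as the persistence mechanism (a common choice for scikit-learn models, not necessarily what the presenter used), and the file name is hypothetical.

import joblib

# persist the model the engineering team picked ...
joblib.dump(models["balanced_bagging"], "balanced_bagging_model.joblib")

# ... and load it back later, for example on the production server
chosen_model = joblib.load("balanced_bagging_model.joblib")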
Dialogue: 0,1:05:37.16,1:05:38.84,Default,,0000,0000,0000,,and so if you want to continue we can actually further Dialogue: 0,1:05:38.84,1:05:41.12,Default,,0000,0000,0000,,continue this modeling problem so just Dialogue: 0,1:05:41.12,1:05:43.48,Default,,0000,0000,0000,,now I modeled this problem as a binary Dialogue: 0,1:05:43.48,1:05:46.72,Default,,0000,0000,0000,,classification problem uh sorry just now I Dialogue: 0,1:05:46.72,1:05:48.24,Default,,0000,0000,0000,,modeled this problem as a binary Dialogue: 0,1:05:48.24,1:05:49.52,Default,,0000,0000,0000,,classification which means it's either Dialogue: 0,1:05:49.52,1:05:51.68,Default,,0000,0000,0000,,zero or one either fail or not fail but Dialogue: 0,1:05:51.68,1:05:53.60,Default,,0000,0000,0000,,we can also model it as a multiclass Dialogue: 0,1:05:53.60,1:05:55.64,Default,,0000,0000,0000,,classification problem right because as Dialogue: 0,1:05:55.64,1:05:57.64,Default,,0000,0000,0000,,I said earlier just now for the Dialogue: 0,1:05:57.64,1:06:00.20,Default,,0000,0000,0000,,target variable column which is sorry for Dialogue: 0,1:06:00.20,1:06:02.52,Default,,0000,0000,0000,,the failure type column you actually Dialogue: 0,1:06:02.52,1:06:04.84,Default,,0000,0000,0000,,have multiple kinds of failures right Dialogue: 0,1:06:04.84,1:06:07.56,Default,,0000,0000,0000,,for example you may have a power failure Dialogue: 0,1:06:07.56,1:06:10.00,Default,,0000,0000,0000,,uh you may have a tool wear failure uh you Dialogue: 0,1:06:10.00,1:06:12.92,Default,,0000,0000,0000,,may have an overstrain failure so now we Dialogue: 0,1:06:12.92,1:06:14.84,Default,,0000,0000,0000,,can model the problem slightly Dialogue: 0,1:06:14.84,1:06:17.24,Default,,0000,0000,0000,,differently so we can model it as a Dialogue: 0,1:06:17.24,1:06:19.68,Default,,0000,0000,0000,,multiclass classification problem and Dialogue: 0,1:06:19.68,1:06:21.16,Default,,0000,0000,0000,,then we go through the entire same Dialogue: 0,1:06:21.16,1:06:22.68,Default,,0000,0000,0000,,process that we went through just now so Dialogue: 0,1:06:22.68,1:06:24.88,Default,,0000,0000,0000,,we create different models we test this Dialogue: 0,1:06:24.88,1:06:26.72,Default,,0000,0000,0000,,out but now the confusion matrix is for Dialogue: 0,1:06:26.72,1:06:30.12,Default,,0000,0000,0000,,a multiclass classification issue right Dialogue: 0,1:06:30.12,1:06:30.96,Default,,0000,0000,0000,,so we're going Dialogue: 0,1:06:30.96,1:06:34.04,Default,,0000,0000,0000,,to check them out we're going to again Dialogue: 0,1:06:34.04,1:06:36.08,Default,,0000,0000,0000,,uh try different algorithms or models Dialogue: 0,1:06:36.08,1:06:38.04,Default,,0000,0000,0000,,again train and test our data set do the Dialogue: 0,1:06:38.04,1:06:39.76,Default,,0000,0000,0000,,train test split uh on these Dialogue: 0,1:06:39.76,1:06:42.00,Default,,0000,0000,0000,,different models all right so we have Dialogue: 0,1:06:42.00,1:06:43.40,Default,,0000,0000,0000,,like for example a random Dialogue: 0,1:06:43.40,1:06:46.16,Default,,0000,0000,0000,,forest a balanced random forest a grid search Dialogue: 0,1:06:46.16,1:06:47.72,Default,,0000,0000,0000,,then you train the models using what is Dialogue: 0,1:06:47.72,1:06:49.68,Default,,0000,0000,0000,,called hyperparameter tuning then you Dialogue: 0,1:06:49.68,1:06:51.08,Default,,0000,0000,0000,,get the scores all right so you get the Dialogue: 0,1:06:51.08,1:06:53.16,Default,,0000,0000,0000,,same evaluation scores again you check Dialogue: 0,1:06:53.16,1:06:54.60,Default,,0000,0000,0000,,out the evaluation scores compare Dialogue: 0,1:06:54.60,1:06:57.08,Default,,0000,0000,0000,,between them generate a confusion matrix Dialogue: 0,1:06:57.08,1:06:59.96,Default,,0000,0000,0000,,so this is a multiclass confusion matrix
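A sketch of hyperparameter tuning with a grid search for this multiclass variant; BalancedRandomForestClassifier from imbalanced-learn and the parameter grid are assumptions for illustration, and in this setting y_train would hold the multiclass failure type labels rather than the binary target.

from sklearn.model_selection import GridSearchCV
from imblearn.ensemble import BalancedRandomForestClassifier

# grid search over a balanced random forest, scored with the macro-averaged F1
# across all classes; here y_train is assumed to be the failure type labels
grid = GridSearchCV(
    BalancedRandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 200], "max_depth": [None, 10]},
    scoring="f1_macro", cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)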
Dialogue: 0,1:06:59.96,1:07:02.40,Default,,0000,0000,0000,,and then you come to the final Dialogue: 0,1:07:02.40,1:07:05.76,Default,,0000,0000,0000,,conclusion so now if you are interested Dialogue: 0,1:07:05.76,1:07:09.00,Default,,0000,0000,0000,,to frame your problem domain as a Dialogue: 0,1:07:09.00,1:07:11.36,Default,,0000,0000,0000,,multiclass classification problem all Dialogue: 0,1:07:11.36,1:07:13.84,Default,,0000,0000,0000,,right then these are the recommendations Dialogue: 0,1:07:13.84,1:07:15.48,Default,,0000,0000,0000,,from the data scientist so the data Dialogue: 0,1:07:15.48,1:07:17.24,Default,,0000,0000,0000,,scientist will say you know what I'm Dialogue: 0,1:07:17.24,1:07:19.56,Default,,0000,0000,0000,,going to pick this particular model the Dialogue: 0,1:07:19.56,1:07:22.04,Default,,0000,0000,0000,,balanced bagging classifier and these are Dialogue: 0,1:07:22.04,1:07:24.52,Default,,0000,0000,0000,,all the reasons that the data scientist Dialogue: 0,1:07:24.52,1:07:27.28,Default,,0000,0000,0000,,is going to give as a rationale for Dialogue: 0,1:07:27.28,1:07:29.40,Default,,0000,0000,0000,,selecting this particular Dialogue: 0,1:07:29.40,1:07:32.04,Default,,0000,0000,0000,,model and then once that's done you save Dialogue: 0,1:07:32.04,1:07:35.00,Default,,0000,0000,0000,,the model and that's it Dialogue: 0,1:07:35.00,1:07:38.92,Default,,0000,0000,0000,,so that's all done now and so then the Dialogue: 0,1:07:38.92,1:07:41.04,Default,,0000,0000,0000,,uh the machine learning model Dialogue: 0,1:07:41.04,1:07:43.72,Default,,0000,0000,0000,,now you can put it live run it on the Dialogue: 0,1:07:43.72,1:07:45.28,Default,,0000,0000,0000,,server and now the machine learning Dialogue: 0,1:07:45.28,1:07:47.20,Default,,0000,0000,0000,,model is ready to work which means it's Dialogue: 0,1:07:47.20,1:07:48.92,Default,,0000,0000,0000,,ready to generate predictions right Dialogue: 0,1:07:48.92,1:07:50.28,Default,,0000,0000,0000,,that's the main job of the machine Dialogue: 0,1:07:50.28,1:07:52.04,Default,,0000,0000,0000,,learning model you have picked the best Dialogue: 0,1:07:52.04,1:07:53.68,Default,,0000,0000,0000,,machine learning model with the best Dialogue: 0,1:07:53.68,1:07:55.80,Default,,0000,0000,0000,,evaluation metrics for whatever Dialogue: 0,1:07:55.80,1:07:57.76,Default,,0000,0000,0000,,accuracy goal you're trying to achieve and Dialogue: 0,1:07:57.76,1:07:59.64,Default,,0000,0000,0000,,now you're going to run it on a server Dialogue: 0,1:07:59.64,1:08:00.80,Default,,0000,0000,0000,,and now you're going to get all this Dialogue: 0,1:08:00.80,1:08:02.96,Default,,0000,0000,0000,,real time data that's coming from your Dialogue: 0,1:08:02.96,1:08:04.52,Default,,0000,0000,0000,,sensors you're going to pump that into Dialogue: 0,1:08:04.52,1:08:06.36,Default,,0000,0000,0000,,your machine learning model your machine Dialogue: 0,1:08:06.36,1:08:07.88,Default,,0000,0000,0000,,learning model will pump out a whole Dialogue: 0,1:08:07.88,1:08:09.52,Default,,0000,0000,0000,,bunch of predictions and we're going to Dialogue: 0,1:08:09.52,1:08:12.80,Default,,0000,0000,0000,,use those predictions in real time to Dialogue: 0,1:08:12.80,1:08:15.40,Default,,0000,0000,0000,,make real-time real-world decision Dialogue: 0,1:08:15.40,1:08:17.56,Default,,0000,0000,0000,,making right you're going to say okay Dialogue: 0,1:08:17.56,1:08:19.60,Default,,0000,0000,0000,,I'm predicting that this machine is Dialogue: 0,1:08:19.60,1:08:23.20,Default,,0000,0000,0000,,going to fail on Thursday at 5:00 p.m. Dialogue: 0,1:08:23.20,1:08:25.52,Default,,0000,0000,0000,,so you better get your service folks in Dialogue: 0,1:08:25.52,1:08:28.64,Default,,0000,0000,0000,,to service it on Thursday at 2:00 p.m. or you Dialogue: 0,1:08:28.64,1:08:31.64,Default,,0000,0000,0000,,know whatever so you can you know uh Dialogue: 0,1:08:31.64,1:08:33.48,Default,,0000,0000,0000,,make decisions on when you want to do Dialogue: 0,1:08:33.48,1:08:35.32,Default,,0000,0000,0000,,your maintenance you know and make Dialogue: 0,1:08:35.32,1:08:37.64,Default,,0000,0000,0000,,the best decisions to optimize the cost Dialogue: 0,1:08:37.64,1:08:41.16,Default,,0000,0000,0000,,of maintenance etc etc
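A minimal sketch of how the deployed model might be used on the server to turn an incoming sensor reading into a prediction; the function name, the column names and the example reading are hypothetical, and the reading is assumed to already be encoded and scaled like the training data.

import joblib
import pandas as pd

# load the persisted model once when the serving process starts
model = joblib.load("balanced_bagging_model.joblib")
feature_cols = ["Type", "Air temperature [K]", "Process temperature [K]",
                "Rotational speed [rpm]", "Torque [Nm]", "Tool wear [min]"]

def predict_failure(sensor_reading):
    # sensor_reading: dict of already feature-engineered values for one machine
    row = pd.DataFrame([sensor_reading], columns=feature_cols)
    return int(model.predict(row)[0])

# hypothetical scaled reading streamed from the shop-floor sensors
reading = {"Type": 1.0, "Air temperature [K]": 0.42, "Process temperature [K]": 0.55,
           "Rotational speed [rpm]": 0.31, "Torque [Nm]": 0.64, "Tool wear [min]": 0.77}
print(predict_failure(reading))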
Dialogue: 0,1:08:41.16,1:08:42.12,Default,,0000,0000,0000,,and then based on the Dialogue: 0,1:08:42.12,1:08:45.00,Default,,0000,0000,0000,,results that are coming back from the Dialogue: 0,1:08:45.00,1:08:46.76,Default,,0000,0000,0000,,predictions so the predictions may be Dialogue: 0,1:08:46.76,1:08:49.12,Default,,0000,0000,0000,,good the predictions may be lousy the Dialogue: 0,1:08:49.12,1:08:51.36,Default,,0000,0000,0000,,predictions may be average right so we Dialogue: 0,1:08:51.36,1:08:53.72,Default,,0000,0000,0000,,are constantly monitoring how good Dialogue: 0,1:08:53.72,1:08:55.44,Default,,0000,0000,0000,,or how useful are the predictions Dialogue: 0,1:08:55.44,1:08:57.76,Default,,0000,0000,0000,,generated by this real-time model that's Dialogue: 0,1:08:57.76,1:08:59.88,Default,,0000,0000,0000,,running on the server and based on our Dialogue: 0,1:08:59.88,1:09:02.68,Default,,0000,0000,0000,,monitoring we will then take some new Dialogue: 0,1:09:02.68,1:09:05.32,Default,,0000,0000,0000,,data and then repeat this entire life Dialogue: 0,1:09:05.32,1:09:07.04,Default,,0000,0000,0000,,cycle again so this is basically a Dialogue: 0,1:09:07.04,1:09:09.24,Default,,0000,0000,0000,,workflow that's iterative and we are Dialogue: 0,1:09:09.24,1:09:11.12,Default,,0000,0000,0000,,constantly or the data scientist is Dialogue: 0,1:09:11.12,1:09:13.32,Default,,0000,0000,0000,,constantly getting in all these new data Dialogue: 0,1:09:13.32,1:09:15.28,Default,,0000,0000,0000,,points and then refining the model Dialogue: 0,1:09:15.28,1:09:17.96,Default,,0000,0000,0000,,picking maybe a new model deploying the Dialogue: 0,1:09:17.96,1:09:21.68,Default,,0000,0000,0000,,new model onto the server and so on all Dialogue: 0,1:09:21.68,1:09:23.92,Default,,0000,0000,0000,,right and so that's it so that is Dialogue: 0,1:09:23.92,1:09:26.40,Default,,0000,0000,0000,,basically your machine learning workflow Dialogue: 0,1:09:26.40,1:09:29.48,Default,,0000,0000,0000,,in a nutshell okay so for this Dialogue: 0,1:09:29.48,1:09:32.08,Default,,0000,0000,0000,,particular approach we have used a bunch Dialogue: 0,1:09:32.08,1:09:34.56,Default,,0000,0000,0000,,of uh data science libraries from Python Dialogue: 0,1:09:34.56,1:09:36.52,Default,,0000,0000,0000,,so we have used pandas which is the most Dialogue: 0,1:09:36.52,1:09:38.56,Default,,0000,0000,0000,,basic data science library that Dialogue: 0,1:09:38.56,1:09:40.28,Default,,0000,0000,0000,,provides all the tools to work with raw Dialogue: 0,1:09:40.28,1:09:42.52,Default,,0000,0000,0000,,data we have used NumPy which is a high Dialogue: 0,1:09:42.52,1:09:44.08,Default,,0000,0000,0000,,performance library for implementing Dialogue: 0,1:09:44.08,1:09:46.44,Default,,0000,0000,0000,,complex array and matrix operations we have Dialogue: 0,1:09:46.44,1:09:49.56,Default,,0000,0000,0000,,used matplotlib and seaborn which are used Dialogue: 0,1:09:49.56,1:09:52.44,Default,,0000,0000,0000,,for doing the EDA the Dialogue: 0,1:09:52.44,1:09:55.56,Default,,0000,0000,0000,,exploratory data analysis phase of machine
Dialogue: 0,1:09:55.56,1:09:57.04,Default,,0000,0000,0000,,learning where you visualize all your Dialogue: 0,1:09:57.04,1:09:59.04,Default,,0000,0000,0000,,data we have used scikit-learn which is Dialogue: 0,1:09:59.04,1:10:01.28,Default,,0000,0000,0000,,the machine learning library to do all Dialogue: 0,1:10:01.28,1:10:02.92,Default,,0000,0000,0000,,your implementation for all your core Dialogue: 0,1:10:02.92,1:10:06.00,Default,,0000,0000,0000,,machine learning algorithms uh we Dialogue: 0,1:10:06.00,1:10:08.00,Default,,0000,0000,0000,,have not used these because this is not a Dialogue: 0,1:10:08.00,1:10:11.04,Default,,0000,0000,0000,,deep learning uh problem but if you are Dialogue: 0,1:10:11.04,1:10:12.80,Default,,0000,0000,0000,,working with a deep learning problem Dialogue: 0,1:10:12.80,1:10:15.36,Default,,0000,0000,0000,,like image classification image Dialogue: 0,1:10:15.36,1:10:17.84,Default,,0000,0000,0000,,recognition object detection okay Dialogue: 0,1:10:17.84,1:10:20.20,Default,,0000,0000,0000,,natural language processing text Dialogue: 0,1:10:20.20,1:10:21.92,Default,,0000,0000,0000,,classification well then you're going to Dialogue: 0,1:10:21.92,1:10:24.36,Default,,0000,0000,0000,,use these libraries from Python which are Dialogue: 0,1:10:24.36,1:10:28.96,Default,,0000,0000,0000,,TensorFlow okay and also Dialogue: 0,1:10:28.96,1:10:32.68,Default,,0000,0000,0000,,PyTorch and then lastly that whole thing that Dialogue: 0,1:10:32.68,1:10:34.72,Default,,0000,0000,0000,,whole data science project that you saw Dialogue: 0,1:10:34.72,1:10:36.80,Default,,0000,0000,0000,,just now this entire data science Dialogue: 0,1:10:36.80,1:10:38.88,Default,,0000,0000,0000,,project is actually developed in Dialogue: 0,1:10:38.88,1:10:41.08,Default,,0000,0000,0000,,something called a Jupyter notebook so Dialogue: 0,1:10:41.08,1:10:44.04,Default,,0000,0000,0000,,all this Python code along with all the Dialogue: 0,1:10:44.04,1:10:46.36,Default,,0000,0000,0000,,observations from the data Dialogue: 0,1:10:46.36,1:10:48.68,Default,,0000,0000,0000,,scientist okay for this entire data Dialogue: 0,1:10:48.68,1:10:50.44,Default,,0000,0000,0000,,science project was actually run in Dialogue: 0,1:10:50.44,1:10:53.36,Default,,0000,0000,0000,,something called a Jupyter notebook so Dialogue: 0,1:10:53.36,1:10:55.76,Default,,0000,0000,0000,,that is uh the Dialogue: 0,1:10:55.76,1:10:59.08,Default,,0000,0000,0000,,most widely used tool for interactively Dialogue: 0,1:10:59.08,1:11:02.36,Default,,0000,0000,0000,,developing and presenting data science Dialogue: 0,1:11:02.36,1:11:04.64,Default,,0000,0000,0000,,projects okay so that brings me to the Dialogue: 0,1:11:04.64,1:11:07.40,Default,,0000,0000,0000,,end of this entire presentation I hope Dialogue: 0,1:11:07.40,1:11:10.36,Default,,0000,0000,0000,,that you find it useful and that Dialogue: 0,1:11:10.36,1:11:13.20,Default,,0000,0000,0000,,you can appreciate the importance of Dialogue: 0,1:11:13.20,1:11:15.28,Default,,0000,0000,0000,,machine learning and how it can be Dialogue: 0,1:11:15.28,1:11:19.80,Default,,0000,0000,0000,,applied in a real life use case in a Dialogue: 0,1:11:19.80,1:11:23.36,Default,,0000,0000,0000,,typical production environment all right Dialogue: 0,1:11:23.36,1:11:27.24,Default,,0000,0000,0000,,thank you all so much for watching