Hello everyone, my name is Victor. I'm your friendly neighborhood data scientist from DreamCatcher. In this presentation, I would like to talk about a specific industry use case of AI, or machine learning, which is predictive maintenance.

I will be covering these topics, and feel free to jump forward to the specific part of the video where I talk about each of them. I'm going to start off with a general overview of AI and machine learning. Then I'll discuss the use case, which is predictive maintenance. I'll talk about the basics of machine learning and the machine learning workflow, and then we will come to the meat of this presentation, which is essentially a demonstration of the machine learning workflow from end to end on a real-life predictive maintenance domain problem. All right, so without any further ado, let's jump into it.

So let's start off with a quick overview of AI and machine learning. Well, AI is a very general term: it encompasses the entire area of science and engineering related to creating software programs and machines that are capable of performing tasks that would normally require human intelligence. But AI is a catch-all term, so really, when we talk about applied AI — how we use AI in our daily work — we are really going to be talking about machine learning.

So machine learning is the design and application of software algorithms that are capable of learning on their own, without any explicit human intervention. The primary purpose of these algorithms is to optimize performance in a specific task, and the main task you want to optimize performance in is making accurate predictions about future outcomes based on the analysis of historical data. So essentially, machine learning is about making predictions about the future, or what we call predictive analytics.

There are many different kinds of algorithms available in machine learning, under the three primary categories of supervised learning, unsupervised learning, and reinforcement learning. And here we can see some of these different kinds of algorithms and their use cases in various areas of industry. So we have various domain use cases for all these different kinds of algorithms, and we can see that different algorithms are suited to different use cases.

Deep learning is an advanced form of machine learning that is based on something called an artificial neural network, or ANN for short. This essentially simulates the structure of the human brain, whereby neurons interconnect and work together to process and learn new information.
So DL, deep learning, is the foundational technology for most of the popular AI tools that you have probably heard of today. I'm sure you have heard of ChatGPT, if you haven't been living in a cave for the past two years. ChatGPT is an example of what we call a large language model, and that's based on this technology called deep learning. Also, all the modern computer vision applications, where a computer program can classify, detect, or recognize images on its own — we call these computer vision applications — also use this particular form of machine learning called deep learning.

So this is an example of an artificial neural network. Here I have an image of a bird that's fed into the artificial neural network, and the output from the network is a classification of this image into one of three potential categories. So in this case, if the ANN has been trained properly, when we feed in this image, the ANN should correctly classify it as a bird. This is an image classification problem, which is a classic use case for an artificial neural network in the field of computer vision. And just like in the case of machine learning, there are a variety of algorithms available for deep learning, under the categories of supervised learning and also unsupervised learning.

All right, so this is how we can categorize all of this: you can think of AI as the general area of smart systems, machine learning is basically applied AI, and deep learning is a sub-specialization of machine learning using a particular architecture called an artificial neural network. And generative AI — so if we talk about ChatGPT, Google Gemini, Microsoft Copilot, all these examples of generative AI — they are basically large language models, and they are a further subcategory within the area of deep learning.
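As a rough illustration of the image-classification idea described above, here is a minimal sketch in Python of a small artificial neural network that maps an image to one of three categories. This is not code from the presentation: the library (scikit-learn), the 64x64 image size, and the placeholder data and labels are my own assumptions for illustration only.

    # Minimal sketch only: a small ANN that classifies a 64x64 RGB image into
    # one of three hypothetical categories (0=bird, 1=cat, 2=dog).
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    # Placeholder "training images": 100 images flattened into feature vectors,
    # each with a label. Real training would use actual labelled photographs.
    X_train = np.random.rand(100, 64 * 64 * 3)
    y_train = np.random.randint(0, 3, size=100)

    ann = MLPClassifier(hidden_layer_sizes=(128,), max_iter=50, random_state=0)
    ann.fit(X_train, y_train)              # "training" the network on labelled images

    # Feed in a new image: the network outputs its predicted category.
    new_image = np.random.rand(1, 64 * 64 * 3)
    print(ann.predict(new_image))          # e.g. [0] -> predicted category "bird"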
There are many applications of machine learning in industry right now, so pick whichever particular industry you are involved in, and these are the specific areas of application. I'm going to guess that the vast majority of you watching this video are probably coming from the manufacturing industry, and in the manufacturing industry some of the standard use cases for machine learning and deep learning are: predicting potential problems — sometimes called predictive maintenance, where you want to predict when a problem is going to happen and address it before it happens — monitoring systems, automating your manufacturing assembly line or production line, smart scheduling, and detecting anomalies on your production line.

Okay, so let's talk about the use case here, which is predictive maintenance. What is predictive maintenance? Well, here's the long definition: predictive maintenance is an equipment maintenance strategy that relies on real-time monitoring of equipment conditions and data to predict equipment failures in advance. It uses advanced data models, analytics, and machine learning, whereby we can reliably assess when failures are more likely to occur, including which components are more likely to be affected, on your production or assembly line.

So where does predictive maintenance fit into the overall scheme of things? Let's talk about the standard way that factories — production lines or assembly lines in factories — tended to handle maintenance issues, say, 10 or 20 years ago. What you would probably start off with is the most basic mode, which is reactive maintenance: you just wait until your machine breaks down, and then you repair it. The simplest approach, but of course, if you have worked on a production line for any period of time, you know that reactive maintenance can give you a whole bunch of headaches, especially if the machine breaks down just before a critical delivery deadline — then you're going to have a backlog of orders and you're going to run into a lot of problems.

So we move on to preventive maintenance, where you regularly schedule maintenance of your production machines to reduce the failure rate. You might do maintenance once every month, once every two weeks, whatever. This is better, but the problem then is that sometimes you're doing too much maintenance that isn't really necessary, and it still doesn't totally prevent a failure of the machine that occurs outside of your planned maintenance. So a bit of improvement, but not that much better.

And then the last two categories are where we bring in AI and machine learning. With machine learning, we're going to use sensors to do real-time monitoring of the data, and then, using that data, we're going to build a machine learning model which helps us to predict, with a reasonable level of accuracy, when the next failure is going to happen on your assembly or production line, for a specific component or a specific machine. You want to be able to predict, to a high level of accuracy — maybe to the specific day, even the specific hour or minute — when you expect that particular component or machine to fail.
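To give a feel for what such a model could look like in code, here is a minimal sketch in Python using pandas and scikit-learn. The sensor columns (temperature, vibration, pressure), the values, and the failure labels are hypothetical placeholders I have invented for illustration; they are not data from the presentation.

    # Minimal sketch only: predicting machine failure from sensor readings.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    # Historical sensor readings; each row is labelled with whether a failure followed.
    history = pd.DataFrame({
        "temperature": [70, 85, 90, 65, 95, 72, 88, 60],
        "vibration":   [0.2, 0.7, 0.9, 0.1, 1.1, 0.3, 0.8, 0.2],
        "pressure":    [30, 35, 40, 28, 42, 31, 38, 27],
        "failure":     [0, 1, 1, 0, 1, 0, 1, 0],   # 1 = machine failed soon after
    })

    model = RandomForestClassifier(random_state=0)
    model.fit(history[["temperature", "vibration", "pressure"]], history["failure"])

    # A new real-time reading from the sensors: is this machine about to fail?
    latest = pd.DataFrame({"temperature": [92], "vibration": [1.0], "pressure": [41]})
    print(model.predict(latest))   # e.g. [1] -> failure predicted, schedule maintenance now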
All right, so these are the advantages of predictive maintenance: it minimizes the occurrence of unscheduled downtime, it gives you a real-time overview of the current condition of your assets, it ensures minimal disruption to productivity, it optimizes the time spent on maintenance work, it optimizes the use of spare parts, and so on. And of course there are some disadvantages, the primary one being that you need a specialized set of skills among your engineers to understand and create machine learning models that can work on the real-time data that you're getting.

Okay, so we're going to take a look at some real-life use cases. These are a bunch of links here, and if you navigate to them you'll be able to look at some real-life use cases of machine learning in predictive maintenance. The IBM website gives you a look at five use cases, so you can click on these links and follow up with them if you want to read more: waste management, manufacturing, building services, renewable energy, and also mining. These are all use cases you can read up on and follow from this website.

And this website — this is a pretty good website, and I would really encourage you to look through it if you're interested in predictive maintenance — tells you about an industry survey of predictive maintenance. We can see that a large portion of the manufacturing industry agreed that predictive maintenance is a real need to stay competitive, and that predictive maintenance is essential for the manufacturing industry and will gain additional strength in the future. This is a survey that was done quite some time ago, and these were the results that came back: the vast majority of key industry players in the manufacturing sector consider predictive maintenance to be a very important activity that they want to incorporate into their workflow. And we can see here the kind of ROI that we can expect on an investment in predictive maintenance: a 45% reduction in downtime, 25% growth in productivity, 75% fault elimination, and a 30% reduction in maintenance cost.

And best of all, if you really want to take a look at examples, there are all these different companies that have significantly invested in predictive maintenance technology in their manufacturing processes.
So PepsiCo — we have Frito-Lay — General Motors, and the Mondi plant. You can jump over here and take a look at some of these use cases. Let me try to open this one up, for example Mondi: you can see that Mondi has used this particular piece of software called MATLAB — from MathWorks, sorry — to do predictive maintenance for their manufacturing processes using machine learning, and you can study how they have used it and how it works: what their challenge was, the problems they were facing, and the solution they built using MathWorks Consulting and the data that they had collected in an Oracle database. So using MathWorks tools, with MATLAB, they were able to create a deep learning model to solve this particular issue for their domain. If you're interested, I strongly encourage you to read up on all these real-life customer stories that showcase use cases for predictive maintenance. Okay, so that's it for real-life use cases of predictive maintenance.

Now in this topic I'm going to talk about machine learning basics: what is actually involved in machine learning. I'm going to give a very quick, conceptual, high-level overview of machine learning. There are several categories of machine learning: supervised, unsupervised, semi-supervised, reinforcement, and deep learning. Let's talk about the most common and widely used category of machine learning, which is called supervised learning. The particular use case that I'm going to be discussing here, predictive maintenance, is basically a form of supervised learning.

So how does supervised learning work? Well, in supervised learning you create a machine learning model by providing what is called a labeled data set as input to a machine learning program or algorithm. This data set is going to contain what are called independent or feature variables — so this will be a set of variables — and there will be one dependent or target variable, which we also call the label. The idea is that the independent or feature variables are the attributes or properties of your data set that influence the dependent or target variable.

This process that I've just described is called training the machine learning model, and the model is fundamentally a mathematical function that best approximates the relationship between the independent variables and the dependent variable. All right, that's quite a mouthful, so let's jump into a diagram that maybe illustrates this more clearly.
So let's say you have a data set here, an Excel spreadsheet, and this spreadsheet has a bunch of columns and a bunch of rows. These rows represent what we call observations, or samples, or data points in our data set. Let's assume this data set was gathered by a marketing manager at a retail mall, so they've got all this information about the customers who purchase products at this mall. Some of the information they've collected about the customers is their gender, their age, their income, and their number of children. All this information about the customers is what we call the independent or feature variables. And based on all this information about the customer, we also record how much the customer spends. This information — these numbers here — is what we call the target variable, or the dependent variable. So a single row, one single data point, contains all the data for the feature variables and one single value for the label, or target variable.

The primary purpose of the machine learning model is to create a mapping from all your feature variables to your target variable. So somehow there's going to be a function — a mathematical function — that maps the values of your feature variables to the value of your target variable. In other words, this function represents the relationship between your feature variables and your target variable. This training process is what we call fitting the model, and the target variable or label — this column here, these values here — is critical for providing the context to do the fitting or training of the model. Once you've got a trained and fitted model, you can then use the model to make an accurate prediction of the target values corresponding to new feature values that the model has yet to encounter or see. And this, as I've already said earlier, is called predictive analytics.

Okay, so let's see what's actually happening here. You take your training data — this whole data set here, consisting of a thousand rows of data, or ten thousand rows of data — and you feed the entire data set into your machine learning algorithm, and a couple of hours later your machine learning algorithm comes out with a model. The model is essentially a function that maps all your feature variables, which are these four columns here, to your target variable, which is this one single column here.

Once you have the model, you can put in a new data point. The new data point represents data about a new customer, a customer that you have never seen before. Let's say you've already got information about 10,000 customers that have visited this mall, and how much each of these 10,000 customers spent when they were at the mall. Now you have a totally new customer that comes into the mall — this customer has never been to this mall before — and what we know about this customer is that he is male, his age is 50, his income is 18, and he has nine children. Now when you take this data and pump it into your model, your model is going to make a prediction. It's going to say: hey, based on everything I have been trained on before, and based on the model I've developed, I predict that a customer who is male, aged 50, with an income of 18 and nine children, is going to spend 25 ringgit at the mall.

And this is it — this is what you want, right there. That is the final output of your machine learning model: it makes a prediction about something that it has never seen before. That is essentially the core of machine learning: predictive analytics, making predictions about the future based on a historical data set.
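To show what this looks like in practice, here is a minimal sketch in Python with pandas and scikit-learn, using a tiny made-up version of the mall data set described above. Only the column meanings (gender, age, income, children as features; amount spent as the target) come from the presentation; the training numbers, the gender encoding, and the choice of a linear model are my own assumptions for illustration.

    # Minimal sketch only: fit a model that maps the four feature variables to
    # the target variable (amount spent), then predict for a brand-new customer.
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    data = pd.DataFrame({
        "gender":   [1, 0, 1, 0, 1, 0],          # 1 = male, 0 = female (encoded as numbers)
        "age":      [25, 40, 31, 58, 47, 36],
        "income":   [10, 22, 15, 30, 18, 25],
        "children": [0, 2, 1, 4, 3, 2],
        "spend":    [12, 35, 20, 48, 30, 38],    # target / label column
    })

    X = data[["gender", "age", "income", "children"]]   # feature variables
    y = data["spend"]                                   # target variable

    model = LinearRegression().fit(X, y)    # "training" / "fitting" the model

    # A new customer the model has never seen: male, age 50, income 18, nine children.
    new_customer = pd.DataFrame({"gender": [1], "age": [50], "income": [18], "children": [9]})
    print(model.predict(new_customer))      # the model's predicted spend for this customer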
Okay, so there are two areas of supervised learning: regression and classification. Regression is used to predict a numerical target variable, such as the price of a house or the salary of an employee, whereas classification is used to predict a categorical target variable, or class label.

For classification, you can have either binary or multiclass. Binary would be just true or false, zero or one — so, is your machine going to fail or is it not going to fail: just two classes, two possible outcomes. Or, is the customer going to make a purchase or not going to make a purchase. We call this binary classification. Multiclass is when there are more than two classes or types of values.

So, for example, this here would be a classification problem: you have a data set with information about your customers — the gender of the customer, the age of the customer, the salary of the customer — and you also have a record of whether the customer made a purchase or not. You can take this data set to train a classification model, and the classification model can then make a prediction about a new customer: it will predict zero, which means the customer didn't make a purchase, or one, which means the customer made a purchase.
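A minimal classification sketch along the same lines, again in Python with scikit-learn and invented numbers (only the column meanings come from the presentation): gender, age, and salary are the features, and "purchased" (0 or 1) is the class label.

    # Minimal sketch only: binary classification of whether a customer purchases.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    data = pd.DataFrame({
        "gender":    [0, 1, 1, 0, 1, 0, 0, 1],
        "age":       [22, 45, 35, 52, 29, 41, 60, 33],
        "salary":    [18, 40, 28, 55, 20, 38, 62, 30],
        "purchased": [0, 1, 0, 1, 0, 1, 1, 0],   # class label: 1 = made a purchase
    })

    clf = LogisticRegression()
    clf.fit(data[["gender", "age", "salary"]], data["purchased"])

    new_customer = pd.DataFrame({"gender": [1], "age": [38], "salary": [45]})
    print(clf.predict(new_customer))   # [1] means the model predicts a purchase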
And this is regression. Let's say you want to predict the wind speed, and you've got historical data for four other independent or feature variables: you have recorded the temperature, the pressure, the relative humidity, and the wind direction for the past 10 or 15 days, or whatever. Now you train your machine learning model using this data set, and the target variable column — this column here, the label — is basically a number. So with this number, this is a regression model. And now you can put in a new data point — a new data point means a new set of values for temperature, pressure, relative humidity, and wind direction — and your machine learning model will then predict the wind speed for that new data point. Okay, so that's a regression model.
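For completeness, here is the wind-speed version as a minimal regression sketch in Python with scikit-learn. The weather values are invented placeholders; only the feature and target names come from the presentation, and the choice of estimator is mine.

    # Minimal sketch only: regression of wind speed on four feature variables.
    import pandas as pd
    from sklearn.ensemble import GradientBoostingRegressor

    weather = pd.DataFrame({
        "temperature":    [30, 28, 33, 26, 31, 29],
        "pressure":       [1010, 1008, 1005, 1015, 1007, 1012],
        "rel_humidity":   [70, 82, 65, 90, 75, 80],
        "wind_direction": [120, 200, 90, 310, 150, 45],
        "wind_speed":     [12.0, 18.5, 9.8, 22.1, 14.3, 11.0],   # numerical target
    })

    reg = GradientBoostingRegressor(random_state=0)
    reg.fit(weather.drop(columns="wind_speed"), weather["wind_speed"])

    # A new day's readings -> predicted wind speed for that data point.
    today = pd.DataFrame({"temperature": [27], "pressure": [1011],
                          "rel_humidity": [85], "wind_direction": [210]})
    print(reg.predict(today))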
All right, so in this topic I'm going to talk about the workflow that's involved in machine learning. In the previous slides I talked about developing the model, but that's just one part of the entire workflow. In real life, when you use machine learning, there's an end-to-end workflow involved. The first thing, of course, is that you need to get your data, then you need to clean your data, and then you need to explore your data — you need to see what's going on in your data set. And real-life data sets are not trivial: they are hundreds of rows, thousands of rows, sometimes millions or billions of rows. We're talking about millions or billions of data points, especially if you're using IoT sensors to get data in real time. So you've got all these very large data sets; you need to clean them and explore them, and then you need to prepare them into the right format so that you can put them into the training process to create your machine learning model.

Then you check how good the model is: how accurate is the model in terms of its ability to generate predictions for the future — how accurate are the predictions coming out of your machine learning model. That's validating, or evaluating, your model. And then, if you determine that your model is of adequate accuracy to meet whatever your domain use case requirements are — say the accuracy required for your domain use case is 85%: if my machine learning model can give an 85% accuracy rate, I think it's good enough — then I'm going to deploy it into a real-world use case. Here, the machine learning model gets deployed on a server, data from other sources is captured, that data is pumped into the machine learning model, the model generates predictions, and those predictions are then used to make decisions on the factory floor in real time, or in any other particular scenario. And then you constantly monitor and update the model, you get more new data, and the entire cycle repeats itself. So that's your machine learning workflow, in a nutshell.
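To tie those steps together, here is one possible shape of that end-to-end workflow as a minimal sketch in Python with pandas and scikit-learn. The CSV file name, the column names, and the use of an 85% accuracy threshold as the deployment gate are placeholders/assumptions for illustration, not details from the presentation.

    # Minimal sketch only: one possible end-to-end workflow (get -> clean -> explore
    # -> prepare -> train -> evaluate -> deploy/monitor). File and columns are hypothetical.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # 1. Get the data (hypothetical data source)
    df = pd.read_csv("sensor_readings.csv")

    # 2. Clean the data: drop duplicate rows and rows with missing values
    df = df.drop_duplicates().dropna()

    # 3. Explore the data: quick look at statistics and distributions
    print(df.describe())

    # 4. Prepare the data: split features / target, then train / test sets
    X = df.drop(columns="failure")       # "failure" is an assumed label column
    y = df["failure"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # 5. Train the model
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    # 6. Evaluate; deploy only if it meets the domain requirement (say, 85% accuracy)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    if accuracy >= 0.85:
        print("Good enough: deploy, then keep monitoring and retraining with new data.")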
Here's another example of this — the same thing, maybe in a slightly different format. Again, you have your data collection and preparation; here we talk more about the different kinds of algorithms that are available to create a model, and I'll talk about this in more detail when we look at the real-world example of an end-to-end machine learning workflow for the predictive maintenance use case. Once you have chosen the appropriate algorithm, you train your model, and then you select the most appropriate trained model among the multiple models — you are probably going to develop multiple models from multiple algorithms, you're going to evaluate them all, and then you're going to say: hey, after I've evaluated and tested, I've chosen the best model, and I'm going to deploy that model. This is for real-life production use: real-life sensor data is pumped into my model, my model generates predictions, the predicted data is used immediately, in real time, for real-life decision making, and then I monitor the results. So somebody is using the predictions from my model: if the predictions are lousy, the monitoring system captures that; if the predictions are fantastic, well, that's also captured by the monitoring system, and that gets fed back again into the next cycle of my machine learning pipeline.
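The "develop several models, evaluate them all, keep the best" step might look something like this sketch in Python with scikit-learn. The synthetic data, the three candidate algorithms, and the use of cross-validation accuracy as the selection criterion are my own assumptions for illustration.

    # Minimal sketch only: evaluate several candidate algorithms and keep the best one.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic data stands in for the real prepared sensor data.
    X, y = make_classification(n_samples=500, n_features=6, random_state=0)

    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "decision_tree":       DecisionTreeClassifier(random_state=0),
        "random_forest":       RandomForestClassifier(random_state=0),
    }

    # 5-fold cross-validation gives each algorithm an average accuracy score.
    scores = {name: cross_val_score(model, X, y, cv=5).mean()
              for name, model in candidates.items()}

    best_name = max(scores, key=scores.get)
    best_model = candidates[best_name].fit(X, y)   # retrain the winner on all the data
    print(best_name, scores[best_name])
    # best_model is what gets deployed; its live predictions are then monitored and
    # the results fed back into the next cycle of the pipeline.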
0,0:23:29.80,0:23:32.36,Default,,0000,0000,0000,,feedback again to the next cycle of my Dialogue: 0,0:23:32.36,0:23:33.68,Default,,0000,0000,0000,,machine learning Dialogue: 0,0:23:33.68,0:23:35.96,Default,,0000,0000,0000,,pipeline okay so that's the kind of Dialogue: 0,0:23:35.96,0:23:38.36,Default,,0000,0000,0000,,overall View and here are the kind of Dialogue: 0,0:23:38.36,0:23:41.56,Default,,0000,0000,0000,,key phases of your workflow so one of Dialogue: 0,0:23:41.56,0:23:43.96,Default,,0000,0000,0000,,the important phases is called Eda Dialogue: 0,0:23:43.96,0:23:47.52,Default,,0000,0000,0000,,exploratory data analysis and in this Dialogue: 0,0:23:47.52,0:23:49.88,Default,,0000,0000,0000,,particular uh phase uh you're going to Dialogue: 0,0:23:49.88,0:23:53.12,Default,,0000,0000,0000,,do a lot of stuff primarily just to Dialogue: 0,0:23:53.12,0:23:54.88,Default,,0000,0000,0000,,understand your data set so like I said Dialogue: 0,0:23:54.88,0:23:56.56,Default,,0000,0000,0000,,real life data sets they tend to be very Dialogue: 0,0:23:56.56,0:23:59.32,Default,,0000,0000,0000,,complex and they tend to have various Dialogue: 0,0:23:59.32,0:24:01.04,Default,,0000,0000,0000,,statistical properties all right Dialogue: 0,0:24:01.04,0:24:02.68,Default,,0000,0000,0000,,statistics is a very important component Dialogue: 0,0:24:02.68,0:24:05.60,Default,,0000,0000,0000,,of machine learning so an Eda helps you Dialogue: 0,0:24:05.60,0:24:07.48,Default,,0000,0000,0000,,to kind of get an overview of your data Dialogue: 0,0:24:07.48,0:24:09.68,Default,,0000,0000,0000,,set get an overview of any problems in Dialogue: 0,0:24:09.68,0:24:11.52,Default,,0000,0000,0000,,your data set like any data that's Dialogue: 0,0:24:11.52,0:24:13.44,Default,,0000,0000,0000,,missing the statistical properties your Dialogue: 0,0:24:13.44,0:24:15.16,Default,,0000,0000,0000,,data set the distribution of your data Dialogue: 0,0:24:15.16,0:24:17.28,Default,,0000,0000,0000,,set the statistical correlation of Dialogue: 0,0:24:17.28,0:24:19.64,Default,,0000,0000,0000,,variables in your data set etc Dialogue: 0,0:24:19.64,0:24:23.40,Default,,0000,0000,0000,,etc okay then we have data cleaning or Dialogue: 0,0:24:23.40,0:24:25.28,Default,,0000,0000,0000,,sometimes you call it data cleansing and Dialogue: 0,0:24:25.28,0:24:27.60,Default,,0000,0000,0000,,in this phase what you want to do is Dialogue: 0,0:24:27.60,0:24:29.44,Default,,0000,0000,0000,,primarily you want to kind of do things Dialogue: 0,0:24:29.44,0:24:31.96,Default,,0000,0000,0000,,like remove duplicate records or rows in Dialogue: 0,0:24:31.96,0:24:33.68,Default,,0000,0000,0000,,your table you want to make sure that Dialogue: 0,0:24:33.68,0:24:36.80,Default,,0000,0000,0000,,there I your your data or your data Dialogue: 0,0:24:36.80,0:24:39.40,Default,,0000,0000,0000,,points your samples have appropriate IDs Dialogue: 0,0:24:39.40,0:24:41.08,Default,,0000,0000,0000,,and most importantly you want to make Dialogue: 0,0:24:41.08,0:24:43.04,Default,,0000,0000,0000,,sure there's not too many missing values Dialogue: 0,0:24:43.04,0:24:44.88,Default,,0000,0000,0000,,in your data set so what I mean by Dialogue: 0,0:24:44.88,0:24:46.32,Default,,0000,0000,0000,,missing values are things like that Dialogue: 0,0:24:46.32,0:24:48.20,Default,,0000,0000,0000,,right you have got a data set and for Dialogue: 0,0:24:48.20,0:24:51.64,Default,,0000,0000,0000,,some reason there are some cells or Dialogue: 0,0:24:51.64,0:24:54.56,Default,,0000,0000,0000,,locations in your data set which are 
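As a rough illustration of those first EDA and cleaning checks, here is a small pandas sketch (my own, on a tiny toy table, not the presenter's code) showing how duplicates, missing values, statistical properties and correlations are usually inspected.

import pandas as pd
import numpy as np

# tiny toy table standing in for a real data set
df = pd.DataFrame({
    "id":    [1, 2, 2, 3],
    "temp":  [300.1, 301.4, 301.4, np.nan],
    "label": [0, 0, 0, 1],
})

print(df.duplicated().sum())        # duplicate rows (here the repeated id 2)
print(df.isna().sum())              # missing values per column
print(df.describe())                # statistical properties of each numeric column
print(df.corr(numeric_only=True))   # correlations between the numeric columns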
Dialogue: 0,0:24:54.56,0:24:56.52,Default,,0000,0000,0000,,missing values right and if you have a Dialogue: 0,0:24:56.52,0:24:58.68,Default,,0000,0000,0000,,lot of these missing values then you've Dialogue: 0,0:24:58.68,0:25:00.44,Default,,0000,0000,0000,,got a poor quality data set and you're Dialogue: 0,0:25:00.44,0:25:02.20,Default,,0000,0000,0000,,not going to be able to build a good Dialogue: 0,0:25:02.20,0:25:04.16,Default,,0000,0000,0000,,model from this data set you're not Dialogue: 0,0:25:04.16,0:25:06.00,Default,,0000,0000,0000,,going to be able to train a good machine Dialogue: 0,0:25:06.00,0:25:08.12,Default,,0000,0000,0000,,learning model from a data set with a Dialogue: 0,0:25:08.12,0:25:10.20,Default,,0000,0000,0000,,lot of missing values like this so you Dialogue: 0,0:25:10.20,0:25:11.88,Default,,0000,0000,0000,,have to figure out whether there are a Dialogue: 0,0:25:11.88,0:25:13.40,Default,,0000,0000,0000,,lot of missing values in your data set Dialogue: 0,0:25:13.40,0:25:15.40,Default,,0000,0000,0000,,how do you handle them another thing Dialogue: 0,0:25:15.40,0:25:16.92,Default,,0000,0000,0000,,that's important in data cleansing is Dialogue: 0,0:25:16.92,0:25:18.80,Default,,0000,0000,0000,,figuring out the outliers in your data Dialogue: 0,0:25:18.80,0:25:21.92,Default,,0000,0000,0000,,set so uh outliers are things like this Dialogue: 0,0:25:21.92,0:25:24.04,Default,,0000,0000,0000,,you know data points are very far from Dialogue: 0,0:25:24.04,0:25:26.44,Default,,0000,0000,0000,,the general trend of data points in your Dialogue: 0,0:25:26.44,0:25:29.56,Default,,0000,0000,0000,,data set right and and so there are also Dialogue: 0,0:25:29.56,0:25:31.92,Default,,0000,0000,0000,,several ways to detect outliers in your Dialogue: 0,0:25:31.92,0:25:34.20,Default,,0000,0000,0000,,data set and there are several ways to Dialogue: 0,0:25:34.20,0:25:36.64,Default,,0000,0000,0000,,handle outliers in your data set Dialogue: 0,0:25:36.64,0:25:38.20,Default,,0000,0000,0000,,similarly as well there are several ways Dialogue: 0,0:25:38.20,0:25:39.96,Default,,0000,0000,0000,,to handle missing values in your data Dialogue: 0,0:25:39.96,0:25:42.88,Default,,0000,0000,0000,,set so handling missing values handling Dialogue: 0,0:25:42.88,0:25:45.68,Default,,0000,0000,0000,,outliers those are really two very key Dialogue: 0,0:25:45.68,0:25:47.28,Default,,0000,0000,0000,,importance of data Dialogue: 0,0:25:47.28,0:25:49.12,Default,,0000,0000,0000,,cleansing and there are many many Dialogue: 0,0:25:49.12,0:25:50.76,Default,,0000,0000,0000,,techniques to handle this so a data Dialogue: 0,0:25:50.76,0:25:52.00,Default,,0000,0000,0000,,scientist needs to be acquainted with Dialogue: 0,0:25:52.00,0:25:55.36,Default,,0000,0000,0000,,all of this all right why do I need to Dialogue: 0,0:25:55.36,0:25:58.00,Default,,0000,0000,0000,,do data cleansing well here is the key Dialogue: 0,0:25:58.00,0:25:59.36,Default,,0000,0000,0000,,point Dialogue: 0,0:25:59.36,0:26:02.80,Default,,0000,0000,0000,,if you have a very poor quality data set Dialogue: 0,0:26:02.80,0:26:04.88,Default,,0000,0000,0000,,which means youve got a lot of outliers Dialogue: 0,0:26:04.88,0:26:06.72,Default,,0000,0000,0000,,which are errors in your data set or you Dialogue: 0,0:26:06.72,0:26:08.16,Default,,0000,0000,0000,,got a lot of missing values in your data Dialogue: 0,0:26:08.16,0:26:10.84,Default,,0000,0000,0000,,set even though youve got a fantastic Dialogue: 0,0:26:10.84,0:26:13.04,Default,,0000,0000,0000,,algorithm you've got a 
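A minimal sketch of two of the common techniques just mentioned, assuming a pandas DataFrame: filling missing values with the column mean, and flagging outliers with the usual 1.5 x IQR rule. Other strategies exist, and the right choice is domain dependent.

import pandas as pd
import numpy as np

df = pd.DataFrame({"speed": [1500, 1520, np.nan, 1490, 9000, 1510]})

# handle missing values: here, impute with the column mean
df["speed"] = df["speed"].fillna(df["speed"].mean())

# detect outliers with the 1.5 * IQR rule
q1, q3 = df["speed"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = (df["speed"] < q1 - 1.5 * iqr) | (df["speed"] > q3 + 1.5 * iqr)
print(df[outliers])      # the 9000 reading shows up as an outlier

# one possible handling: drop them (alternatives: cap them, or keep them)
df_clean = df[~outliers]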
fantastic model Dialogue: 0,0:26:13.04,0:26:15.72,Default,,0000,0000,0000,,the predictions that your model is going Dialogue: 0,0:26:15.72,0:26:18.96,Default,,0000,0000,0000,,to give are absolutely rubbish it's kind Dialogue: 0,0:26:18.96,0:26:22.08,Default,,0000,0000,0000,,of like taking water and putting water Dialogue: 0,0:26:22.08,0:26:26.00,Default,,0000,0000,0000,,into the tank of a Mercedes-Benz so Dialogue: 0,0:26:26.00,0:26:28.44,Default,,0000,0000,0000,,Mercedes-Benz is a great car but if you Dialogue: 0,0:26:28.44,0:26:30.08,Default,,0000,0000,0000,,take water and put it into your Dialogue: 0,0:26:30.08,0:26:33.40,Default,,0000,0000,0000,,Mercedes-Benz it will just die right your Dialogue: 0,0:26:33.40,0:26:36.52,Default,,0000,0000,0000,,car will just die it can't run on water Dialogue: 0,0:26:36.52,0:26:38.28,Default,,0000,0000,0000,,right on the other hand if you have a Dialogue: 0,0:26:38.28,0:26:41.56,Default,,0000,0000,0000,,Myvi a Myvi is just a lousy car but if Dialogue: 0,0:26:41.56,0:26:44.84,Default,,0000,0000,0000,,you take high octane good petrol and Dialogue: 0,0:26:44.84,0:26:47.24,Default,,0000,0000,0000,,you pour it into a Myvi the Myvi will just go at Dialogue: 0,0:26:47.24,0:26:49.48,Default,,0000,0000,0000,,you know 100 miles an hour which just Dialogue: 0,0:26:49.48,0:26:51.16,Default,,0000,0000,0000,,completely destroys the Mercedes-Benz in Dialogue: 0,0:26:51.16,0:26:53.36,Default,,0000,0000,0000,,terms of performance so it doesn't it Dialogue: 0,0:26:53.36,0:26:54.80,Default,,0000,0000,0000,,doesn't really matter what model you're Dialogue: 0,0:26:54.80,0:26:57.08,Default,,0000,0000,0000,,using right so you can be using the most Dialogue: 0,0:26:57.08,0:26:58.68,Default,,0000,0000,0000,,fantastic model like the Dialogue: 0,0:26:58.68,0:27:01.20,Default,,0000,0000,0000,,Mercedes-Benz of machine learning but if Dialogue: 0,0:27:01.20,0:27:03.08,Default,,0000,0000,0000,,your data is lousy quality your Dialogue: 0,0:27:03.08,0:27:06.48,Default,,0000,0000,0000,,predictions are also going to be rubbish Dialogue: 0,0:27:06.48,0:27:10.00,Default,,0000,0000,0000,,okay so cleansing the data set is in fact Dialogue: 0,0:27:10.00,0:27:11.88,Default,,0000,0000,0000,,probably the most important thing that Dialogue: 0,0:27:11.88,0:27:13.64,Default,,0000,0000,0000,,data scientists need to do and that's Dialogue: 0,0:27:13.64,0:27:15.52,Default,,0000,0000,0000,,what they spend most of the time doing Dialogue: 0,0:27:15.52,0:27:17.60,Default,,0000,0000,0000,,right building the model training the Dialogue: 0,0:27:17.60,0:27:20.24,Default,,0000,0000,0000,,model getting the right algorithms and Dialogue: 0,0:27:20.24,0:27:23.24,Default,,0000,0000,0000,,so on that's really a small portion of Dialogue: 0,0:27:23.24,0:27:25.20,Default,,0000,0000,0000,,the actual machine learning workflow Dialogue: 0,0:27:25.20,0:27:27.36,Default,,0000,0000,0000,,right the actual machine learning Dialogue: 0,0:27:27.36,0:27:29.68,Default,,0000,0000,0000,,workflow the vast majority of time is on Dialogue: 0,0:27:29.68,0:27:31.56,Default,,0000,0000,0000,,cleaning and organizing your Dialogue: 0,0:27:31.56,0:27:33.36,Default,,0000,0000,0000,,data then you have something called Dialogue: 0,0:27:33.36,0:27:35.08,Default,,0000,0000,0000,,feature engineering which is you Dialogue: 0,0:27:35.08,0:27:37.00,Default,,0000,0000,0000,,pre-process the feature variables of Dialogue: 0,0:27:37.00,0:27:38.92,Default,,0000,0000,0000,,your original data set prior to using Dialogue:
0,0:27:38.92,0:27:40.60,Default,,0000,0000,0000,,them to train the model and this is Dialogue: 0,0:27:40.60,0:27:41.96,Default,,0000,0000,0000,,either through addition deletion Dialogue: 0,0:27:41.96,0:27:43.60,Default,,0000,0000,0000,,combination or transformation of these Dialogue: 0,0:27:43.60,0:27:45.40,Default,,0000,0000,0000,,variables and then the idea is you want Dialogue: 0,0:27:45.40,0:27:47.00,Default,,0000,0000,0000,,to improve the predictive accuracy of Dialogue: 0,0:27:47.00,0:27:49.32,Default,,0000,0000,0000,,the model and also because some models Dialogue: 0,0:27:49.32,0:27:51.08,Default,,0000,0000,0000,,can only work with numeric data so you Dialogue: 0,0:27:51.08,0:27:53.72,Default,,0000,0000,0000,,need to transform categorical data into Dialogue: 0,0:27:53.72,0:27:57.04,Default,,0000,0000,0000,,numeric data all right so just now um in Dialogue: 0,0:27:57.04,0:27:58.80,Default,,0000,0000,0000,,the earlier slides I showed you that you Dialogue: 0,0:27:58.80,0:28:00.76,Default,,0000,0000,0000,,take your original data set you pum it Dialogue: 0,0:28:00.76,0:28:03.20,Default,,0000,0000,0000,,into algorithm and then couple of hours Dialogue: 0,0:28:03.20,0:28:05.20,Default,,0000,0000,0000,,later you get a machine learning model Dialogue: 0,0:28:05.20,0:28:08.64,Default,,0000,0000,0000,,right so you didn't do anything to your Dialogue: 0,0:28:08.64,0:28:10.16,Default,,0000,0000,0000,,data set to the feature variables in Dialogue: 0,0:28:10.16,0:28:12.16,Default,,0000,0000,0000,,your data set before you pump it into a Dialogue: 0,0:28:12.16,0:28:14.40,Default,,0000,0000,0000,,machine machine learning algorithm so Dialogue: 0,0:28:14.40,0:28:15.84,Default,,0000,0000,0000,,what I showed you earlier is you just Dialogue: 0,0:28:15.84,0:28:18.92,Default,,0000,0000,0000,,take the data set exactly as it is and Dialogue: 0,0:28:18.92,0:28:20.80,Default,,0000,0000,0000,,you just pump it into the algorithm Dialogue: 0,0:28:20.80,0:28:23.12,Default,,0000,0000,0000,,couple of hours later you get the model Dialogue: 0,0:28:23.12,0:28:27.64,Default,,0000,0000,0000,,right uh but that's not what generally Dialogue: 0,0:28:27.64,0:28:29.60,Default,,0000,0000,0000,,happens in in real life in real life Dialogue: 0,0:28:29.60,0:28:31.56,Default,,0000,0000,0000,,you're going to take all the original Dialogue: 0,0:28:31.56,0:28:34.32,Default,,0000,0000,0000,,feature variables from your data set and Dialogue: 0,0:28:34.32,0:28:36.72,Default,,0000,0000,0000,,you're going to transform them in some Dialogue: 0,0:28:36.72,0:28:38.96,Default,,0000,0000,0000,,way so you can see here these are the Dialogue: 0,0:28:38.96,0:28:42.12,Default,,0000,0000,0000,,colums of data from my original data set Dialogue: 0,0:28:42.12,0:28:46.04,Default,,0000,0000,0000,,and before I actually put all these data Dialogue: 0,0:28:46.04,0:28:48.24,Default,,0000,0000,0000,,points from my original data set into my Dialogue: 0,0:28:48.24,0:28:50.72,Default,,0000,0000,0000,,algorithm to train and get my model I Dialogue: 0,0:28:50.72,0:28:54.96,Default,,0000,0000,0000,,will actually transform them okay so the Dialogue: 0,0:28:54.96,0:28:57.60,Default,,0000,0000,0000,,transformation of these feature variable Dialogue: 0,0:28:57.60,0:29:00.60,Default,,0000,0000,0000,,values we call this feature engineering Dialogue: 0,0:29:00.60,0:29:02.44,Default,,0000,0000,0000,,and there are many many techniques to do Dialogue: 0,0:29:02.44,0:29:04.96,Default,,0000,0000,0000,,feature engineering so one hot encoding Dialogue: 
0,0:29:04.96,0:29:08.28,Default,,0000,0000,0000,,scaling log transformation Dialogue: 0,0:29:08.28,0:29:10.48,Default,,0000,0000,0000,,discretization date extraction Boolean Dialogue: 0,0:29:10.48,0:29:12.04,Default,,0000,0000,0000,,logic etc Dialogue: 0,0:29:12.04,0:29:14.88,Default,,0000,0000,0000,,etc okay then finally we do something Dialogue: 0,0:29:14.88,0:29:16.80,Default,,0000,0000,0000,,called a train test split so where we Dialogue: 0,0:29:16.80,0:29:19.44,Default,,0000,0000,0000,,take our original data set right so this Dialogue: 0,0:29:19.44,0:29:21.36,Default,,0000,0000,0000,,was the original data set and we break Dialogue: 0,0:29:21.36,0:29:23.72,Default,,0000,0000,0000,,it into two parts so one is called the Dialogue: 0,0:29:23.72,0:29:25.76,Default,,0000,0000,0000,,training data set and the other is Dialogue: 0,0:29:25.76,0:29:28.12,Default,,0000,0000,0000,,called the test data set and the primary Dialogue: 0,0:29:28.12,0:29:30.00,Default,,0000,0000,0000,,purpose for this is when we feed and Dialogue: 0,0:29:30.00,0:29:31.40,Default,,0000,0000,0000,,train the machine learning model we're Dialogue: 0,0:29:31.40,0:29:32.64,Default,,0000,0000,0000,,going to use what is called the training Dialogue: 0,0:29:32.64,0:29:35.56,Default,,0000,0000,0000,,data set and then when we want to evaluate Dialogue: 0,0:29:35.56,0:29:37.40,Default,,0000,0000,0000,,the accuracy of the model we use the test data set right so this Dialogue: 0,0:29:37.40,0:29:40.96,Default,,0000,0000,0000,,is the key part of your machine learning Dialogue: 0,0:29:40.96,0:29:43.64,Default,,0000,0000,0000,,life cycle because you are not only just Dialogue: 0,0:29:43.64,0:29:45.44,Default,,0000,0000,0000,,going to have one possible model Dialogue: 0,0:29:45.44,0:29:47.72,Default,,0000,0000,0000,,because there are a vast range of Dialogue: 0,0:29:47.72,0:29:50.08,Default,,0000,0000,0000,,algorithms that you can use to create a Dialogue: 0,0:29:50.08,0:29:53.00,Default,,0000,0000,0000,,model so fundamentally you have a wide Dialogue: 0,0:29:53.00,0:29:55.68,Default,,0000,0000,0000,,range of choices right like a wide range Dialogue: 0,0:29:55.68,0:29:57.64,Default,,0000,0000,0000,,of cars right you want to buy a car you Dialogue: 0,0:29:57.64,0:30:00.56,Default,,0000,0000,0000,,can buy a Myvi you can buy a Perodua Dialogue: 0,0:30:00.56,0:30:02.64,Default,,0000,0000,0000,,you can buy a Honda you can buy a Dialogue: 0,0:30:02.64,0:30:05.04,Default,,0000,0000,0000,,Mercedes-Benz you can buy an Audi you can Dialogue: 0,0:30:05.04,0:30:07.76,Default,,0000,0000,0000,,buy a Beemer many many different cars Dialogue: 0,0:30:07.76,0:30:09.24,Default,,0000,0000,0000,,that are available for you if you want Dialogue: 0,0:30:09.24,0:30:11.68,Default,,0000,0000,0000,,to buy a car right same thing with a Dialogue: 0,0:30:11.68,0:30:14.36,Default,,0000,0000,0000,,machine learning model there are a vast Dialogue: 0,0:30:14.36,0:30:16.72,Default,,0000,0000,0000,,variety of algorithms that you can Dialogue: 0,0:30:16.72,0:30:19.48,Default,,0000,0000,0000,,choose from in order to create a model Dialogue: 0,0:30:19.48,0:30:21.52,Default,,0000,0000,0000,,and so once you create a model from a Dialogue: 0,0:30:21.52,0:30:24.48,Default,,0000,0000,0000,,given algorithm you need to say hey how Dialogue: 0,0:30:24.48,0:30:26.44,Default,,0000,0000,0000,,accurate is this model that I have created Dialogue: 0,0:30:26.44,0:30:28.64,Default,,0000,0000,0000,,from this algorithm and different Dialogue: 0,0:30:28.64,0:30:30.40,Default,,0000,0000,0000,,algorithms are going to
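Here is a small pandas sketch of a few of the feature engineering techniques just listed (my own illustration, with made-up column names): one-hot encoding a categorical column, log-transforming a skewed one, discretizing a numeric one into bins, and extracting parts of a date.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    "type":   ["L", "M", "H", "L"],
    "torque": [40.2, 55.1, 38.7, 61.3],
    "installed": pd.to_datetime(["2021-01-05", "2021-06-20", "2022-03-11", "2022-09-30"]),
})

# one-hot encoding: categorical -> numeric indicator columns
df = pd.get_dummies(df, columns=["type"])

# log transformation of a skewed numeric feature
df["torque_log"] = np.log1p(df["torque"])

# discretization: bucket a continuous value into categories
df["torque_band"] = pd.cut(df["torque"], bins=[0, 45, 60, np.inf], labels=["low", "mid", "high"])

# date extraction: pull usable numeric parts out of a timestamp
df["installed_year"] = df["installed"].dt.year
print(df)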
create different Dialogue: 0,0:30:30.40,0:30:33.72,Default,,0000,0000,0000,,models with different rates of accuracy Dialogue: 0,0:30:33.72,0:30:35.68,Default,,0000,0000,0000,,and so the primary purpose of the test Dialogue: 0,0:30:35.68,0:30:38.20,Default,,0000,0000,0000,,data set is to evaluate the ACC accuracy Dialogue: 0,0:30:38.20,0:30:41.48,Default,,0000,0000,0000,,of the model to see hey is this model Dialogue: 0,0:30:41.48,0:30:43.36,Default,,0000,0000,0000,,that I've created using this algorithm Dialogue: 0,0:30:43.36,0:30:45.88,Default,,0000,0000,0000,,is it adequate for me to use in a real Dialogue: 0,0:30:45.88,0:30:48.60,Default,,0000,0000,0000,,life production use case Okay so that's Dialogue: 0,0:30:48.60,0:30:52.32,Default,,0000,0000,0000,,what it's all about okay so this is my Dialogue: 0,0:30:52.32,0:30:54.28,Default,,0000,0000,0000,,original data set I break it into my Dialogue: 0,0:30:54.28,0:30:56.56,Default,,0000,0000,0000,,feature data uh feature data set and Dialogue: 0,0:30:56.56,0:30:58.52,Default,,0000,0000,0000,,also my target variable colum so my Dialogue: 0,0:30:58.52,0:31:00.64,Default,,0000,0000,0000,,feature variable uh colums the target Dialogue: 0,0:31:00.64,0:31:02.20,Default,,0000,0000,0000,,variable colums and then I further break Dialogue: 0,0:31:02.20,0:31:04.24,Default,,0000,0000,0000,,it into a training data set and a test Dialogue: 0,0:31:04.24,0:31:06.60,Default,,0000,0000,0000,,data set the training data set is to use Dialogue: 0,0:31:06.60,0:31:08.32,Default,,0000,0000,0000,,the train to create the machine learning Dialogue: 0,0:31:08.32,0:31:10.48,Default,,0000,0000,0000,,model and then once the machine learning Dialogue: 0,0:31:10.48,0:31:12.20,Default,,0000,0000,0000,,model is created I then use the test Dialogue: 0,0:31:12.20,0:31:15.08,Default,,0000,0000,0000,,data set to evaluate the accuracy of the Dialogue: 0,0:31:15.08,0:31:16.28,Default,,0000,0000,0000,,machine learning Dialogue: 0,0:31:16.28,0:31:21.00,Default,,0000,0000,0000,,model all right and then finally we can Dialogue: 0,0:31:21.00,0:31:23.20,Default,,0000,0000,0000,,see what are the different parts or Dialogue: 0,0:31:23.20,0:31:26.08,Default,,0000,0000,0000,,aspects that go into a successful model Dialogue: 0,0:31:26.08,0:31:29.52,Default,,0000,0000,0000,,so Eda about 10% data cleansing about Dialogue: 0,0:31:29.52,0:31:32.36,Default,,0000,0000,0000,,20% feature engineering about Dialogue: 0,0:31:32.36,0:31:36.32,Default,,0000,0000,0000,,25% selecting a specific algorithm about Dialogue: 0,0:31:36.32,0:31:39.12,Default,,0000,0000,0000,,10% and then training the model from Dialogue: 0,0:31:39.12,0:31:41.64,Default,,0000,0000,0000,,that algorithm about 15% and then Dialogue: 0,0:31:41.64,0:31:43.68,Default,,0000,0000,0000,,finally evaluating the model deciding Dialogue: 0,0:31:43.68,0:31:45.96,Default,,0000,0000,0000,,which is the best model with the highest Dialogue: 0,0:31:45.96,0:31:50.68,Default,,0000,0000,0000,,accuracy rate that's about Dialogue: 0,0:31:54.08,0:31:56.92,Default,,0000,0000,0000,,20% all right so we have reached the Dialogue: 0,0:31:56.92,0:31:58.88,Default,,0000,0000,0000,,most interesting part of this Dialogue: 0,0:31:58.88,0:32:01.04,Default,,0000,0000,0000,,presentation which is the demonstration Dialogue: 0,0:32:01.04,0:32:03.76,Default,,0000,0000,0000,,of an endtoend machine learning workflow Dialogue: 0,0:32:03.76,0:32:06.08,Default,,0000,0000,0000,,on a real life data set that Dialogue: 0,0:32:06.08,0:32:10.08,Default,,0000,0000,0000,,demonstrates 
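To illustrate that idea of training several candidate models and picking the best one, here is a small scikit-learn sketch (my own, on synthetic data): split once into training and test sets, fit a few different algorithms, and compare their test accuracy.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree":       DecisionTreeClassifier(random_state=0),
    "random forest":       RandomForestClassifier(random_state=0),
}

# train each candidate on the training set, score it on the unseen test set
for name, model in candidates.items():
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 3))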
the use case of predictive Dialogue: 0,0:32:10.08,0:32:13.52,Default,,0000,0000,0000,,maintenance so for the data set for Dialogue: 0,0:32:13.52,0:32:16.24,Default,,0000,0000,0000,,this particular use case I've used a Dialogue: 0,0:32:16.24,0:32:19.20,Default,,0000,0000,0000,,data set from Kaggle so for those of you Dialogue: 0,0:32:19.20,0:32:21.40,Default,,0000,0000,0000,,who are not aware of this Kaggle is the Dialogue: 0,0:32:21.40,0:32:24.88,Default,,0000,0000,0000,,world's largest open-source community Dialogue: 0,0:32:24.88,0:32:28.08,Default,,0000,0000,0000,,for data science and AI and they have a Dialogue: 0,0:32:28.08,0:32:31.16,Default,,0000,0000,0000,,large collection of data sets from all Dialogue: 0,0:32:31.16,0:32:34.44,Default,,0000,0000,0000,,various areas of industry and human Dialogue: 0,0:32:34.44,0:32:37.04,Default,,0000,0000,0000,,endeavor and they also have a large Dialogue: 0,0:32:37.04,0:32:38.84,Default,,0000,0000,0000,,collection of models that have been Dialogue: 0,0:32:38.84,0:32:42.88,Default,,0000,0000,0000,,developed using these data sets so here Dialogue: 0,0:32:42.88,0:32:47.04,Default,,0000,0000,0000,,we have a data set for the particular Dialogue: 0,0:32:47.04,0:32:50.52,Default,,0000,0000,0000,,use case predictive maintenance okay so Dialogue: 0,0:32:50.52,0:32:52.92,Default,,0000,0000,0000,,this is some information about the data Dialogue: 0,0:32:52.92,0:32:56.44,Default,,0000,0000,0000,,set so in case you do not know how Dialogue: 0,0:32:56.44,0:32:59.20,Default,,0000,0000,0000,,to get there this is the URL to click Dialogue: 0,0:32:59.20,0:33:02.24,Default,,0000,0000,0000,,on okay to get to that data set so once Dialogue: 0,0:33:02.24,0:33:05.12,Default,,0000,0000,0000,,you are at the page for Dialogue: 0,0:33:05.12,0:33:07.40,Default,,0000,0000,0000,,this data set you can see Dialogue: 0,0:33:07.40,0:33:09.96,Default,,0000,0000,0000,,all the information about this data set Dialogue: 0,0:33:09.96,0:33:13.04,Default,,0000,0000,0000,,and you can download the data set in a Dialogue: 0,0:33:13.04,0:33:14.16,Default,,0000,0000,0000,,CSV Dialogue: 0,0:33:14.16,0:33:16.36,Default,,0000,0000,0000,,format okay so let's take a look at the Dialogue: 0,0:33:16.36,0:33:19.56,Default,,0000,0000,0000,,data set so this data set has a total of Dialogue: 0,0:33:19.56,0:33:23.44,Default,,0000,0000,0000,,10,000 samples okay and these are the Dialogue: 0,0:33:23.44,0:33:26.28,Default,,0000,0000,0000,,feature variables the type the product Dialogue: 0,0:33:26.28,0:33:28.44,Default,,0000,0000,0000,,ID the air temperature process Dialogue: 0,0:33:28.44,0:33:31.00,Default,,0000,0000,0000,,temperature rotational speed torque tool Dialogue: 0,0:33:31.00,0:33:34.80,Default,,0000,0000,0000,,wear and this is the target variable Dialogue: 0,0:33:34.80,0:33:36.72,Default,,0000,0000,0000,,all right so the target variable is what Dialogue: 0,0:33:36.72,0:33:38.16,Default,,0000,0000,0000,,we are interested in what we are Dialogue: 0,0:33:38.16,0:33:40.96,Default,,0000,0000,0000,,interested in using to train the machine Dialogue: 0,0:33:40.96,0:33:42.60,Default,,0000,0000,0000,,learning model and also what we are Dialogue: 0,0:33:42.60,0:33:45.28,Default,,0000,0000,0000,,interested to predict okay so these are Dialogue: 0,0:33:45.28,0:33:47.96,Default,,0000,0000,0000,,the feature variables they describe or Dialogue: 0,0:33:47.96,0:33:49.96,Default,,0000,0000,0000,,they provide information about this
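If you want to follow along, loading the downloaded CSV looks roughly like this; the file name and the exact column labels below are assumptions based on the Kaggle page, so check them against your own download.

import pandas as pd

df = pd.read_csv("predictive_maintenance.csv")   # the CSV downloaded from Kaggle

print(df.shape)             # expect roughly (10000, 10): 10,000 samples
print(df.columns.tolist())  # assumed: UDI, Product ID, Type, Air temperature [K],
                            # Process temperature [K], Rotational speed [rpm],
                            # Torque [Nm], Tool wear [min], Target, Failure Type
print(df.head())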
Dialogue: 0,0:33:49.96,0:33:52.88,Default,,0000,0000,0000,,particular machine on the production Dialogue: 0,0:33:52.88,0:33:55.08,Default,,0000,0000,0000,,line on the assembly line so you might Dialogue: 0,0:33:55.08,0:33:56.80,Default,,0000,0000,0000,,know the product ID the type the air Dialogue: 0,0:33:56.80,0:33:58.12,Default,,0000,0000,0000,,temperature process temperature Dialogue: 0,0:33:58.12,0:34:00.48,Default,,0000,0000,0000,,rotational speed torque tool wear right so Dialogue: 0,0:34:00.48,0:34:03.16,Default,,0000,0000,0000,,let's say you've got an IoT sensor system Dialogue: 0,0:34:03.16,0:34:06.12,Default,,0000,0000,0000,,that's basically capturing all this data Dialogue: 0,0:34:06.12,0:34:08.36,Default,,0000,0000,0000,,about a product or a machine on your Dialogue: 0,0:34:08.36,0:34:10.68,Default,,0000,0000,0000,,production or assembly line okay and Dialogue: 0,0:34:10.68,0:34:13.92,Default,,0000,0000,0000,,you've also captured information about Dialogue: 0,0:34:13.92,0:34:17.20,Default,,0000,0000,0000,,whether for a specific sample Dialogue: 0,0:34:17.20,0:34:19.84,Default,,0000,0000,0000,,whether that sample experienced a Dialogue: 0,0:34:19.84,0:34:23.04,Default,,0000,0000,0000,,failure or not okay so the target value Dialogue: 0,0:34:23.04,0:34:25.52,Default,,0000,0000,0000,,of zero okay indicates that there's no Dialogue: 0,0:34:25.52,0:34:28.00,Default,,0000,0000,0000,,failure so zero means no failure and we Dialogue: 0,0:34:28.00,0:34:30.20,Default,,0000,0000,0000,,can see that the vast majority of data Dialogue: 0,0:34:30.20,0:34:32.52,Default,,0000,0000,0000,,points in this data set are no failure Dialogue: 0,0:34:32.52,0:34:34.00,Default,,0000,0000,0000,,and here we can see an example here Dialogue: 0,0:34:34.00,0:34:36.72,Default,,0000,0000,0000,,where you have a case of a failure so a Dialogue: 0,0:34:36.72,0:34:40.16,Default,,0000,0000,0000,,failure is marked as a one positive and Dialogue: 0,0:34:40.16,0:34:42.64,Default,,0000,0000,0000,,no failure is marked as zero negative Dialogue: 0,0:34:42.64,0:34:44.88,Default,,0000,0000,0000,,all right so here we have one type of a Dialogue: 0,0:34:44.88,0:34:47.04,Default,,0000,0000,0000,,failure it's called a power failure and Dialogue: 0,0:34:47.04,0:34:49.00,Default,,0000,0000,0000,,if you scroll down the data set you see Dialogue: 0,0:34:49.00,0:34:50.40,Default,,0000,0000,0000,,there are also other kinds of failures Dialogue: 0,0:34:50.40,0:34:52.84,Default,,0000,0000,0000,,like a tool wear Dialogue: 0,0:34:52.84,0:34:56.96,Default,,0000,0000,0000,,failure we have an overstrain failure Dialogue: 0,0:34:56.96,0:34:58.68,Default,,0000,0000,0000,,here for example Dialogue: 0,0:34:58.68,0:35:00.76,Default,,0000,0000,0000,,we also have a power failure again Dialogue: 0,0:35:00.76,0:35:02.20,Default,,0000,0000,0000,,and so on so if you scroll down through Dialogue: 0,0:35:02.20,0:35:04.16,Default,,0000,0000,0000,,these 10,000 data points or if Dialogue: 0,0:35:04.16,0:35:06.04,Default,,0000,0000,0000,,you're familiar with using Excel to Dialogue: 0,0:35:06.04,0:35:08.84,Default,,0000,0000,0000,,filter out values in a column you can Dialogue: 0,0:35:08.84,0:35:12.28,Default,,0000,0000,0000,,see that in this particular column here Dialogue: 0,0:35:12.28,0:35:14.48,Default,,0000,0000,0000,,which is the so-called target variable Dialogue: 0,0:35:14.48,0:35:16.96,Default,,0000,0000,0000,,column you are going to have the vast Dialogue: 0,0:35:16.96,0:35:18.92,Default,,0000,0000,0000,,majority of values as zero which means
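A quick way to see that imbalance without scrolling through Excel is value_counts in pandas; a rough sketch, continuing from the df loaded above and assuming the two label columns are named 'Target' and 'Failure Type' as on the Kaggle page.

# distribution of the binary label: 0 = no failure, 1 = failure
print(df["Target"].value_counts())

# distribution of the specific failure categories
print(df["Failure Type"].value_counts())

# share of failures, to quantify how imbalanced the data set is
print(df["Target"].mean())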
Dialogue: 0,0:35:18.92,0:35:22.76,Default,,0000,0000,0000,,no failure and some of the rows or the Dialogue: 0,0:35:22.76,0:35:24.04,Default,,0000,0000,0000,,data points you are going to have a Dialogue: 0,0:35:24.04,0:35:26.36,Default,,0000,0000,0000,,value of one and for those rows that you Dialogue: 0,0:35:26.36,0:35:28.12,Default,,0000,0000,0000,,have a value of one for example Dialogue: 0,0:35:28.12,0:35:31.28,Default,,0000,0000,0000,,here for example here you Dialogue: 0,0:35:31.28,0:35:32.84,Default,,0000,0000,0000,,are going to have different types of Dialogue: 0,0:35:32.84,0:35:34.64,Default,,0000,0000,0000,,failure so like I said just now power Dialogue: 0,0:35:34.64,0:35:38.96,Default,,0000,0000,0000,,failure tool wear failure etc etc so we are Dialogue: 0,0:35:38.96,0:35:40.64,Default,,0000,0000,0000,,going to go through the entire machine Dialogue: 0,0:35:40.64,0:35:43.60,Default,,0000,0000,0000,,learning workflow process with this data Dialogue: 0,0:35:43.60,0:35:46.64,Default,,0000,0000,0000,,set so to see an example of that we are Dialogue: 0,0:35:46.64,0:35:50.40,Default,,0000,0000,0000,,going to go to the Dialogue: 0,0:35:50.40,0:35:52.28,Default,,0000,0000,0000,,code section here all right so if I Dialogue: 0,0:35:52.28,0:35:54.28,Default,,0000,0000,0000,,click on the code section here and right Dialogue: 0,0:35:54.28,0:35:56.40,Default,,0000,0000,0000,,down here we can see what is called a Dialogue: 0,0:35:56.40,0:35:59.36,Default,,0000,0000,0000,,data set notebook so this is basically a Dialogue: 0,0:35:59.36,0:36:02.32,Default,,0000,0000,0000,,Jupyter notebook Jupyter is basically a Dialogue: 0,0:36:02.32,0:36:05.28,Default,,0000,0000,0000,,Python application which allows you to Dialogue: 0,0:36:05.28,0:36:09.24,Default,,0000,0000,0000,,create a Python machine learning Dialogue: 0,0:36:09.24,0:36:11.68,Default,,0000,0000,0000,,program that basically builds your Dialogue: 0,0:36:11.68,0:36:14.52,Default,,0000,0000,0000,,machine learning model assesses or Dialogue: 0,0:36:14.52,0:36:16.48,Default,,0000,0000,0000,,evaluates its accuracy and generates Dialogue: 0,0:36:16.48,0:36:19.04,Default,,0000,0000,0000,,predictions from it okay so here we have Dialogue: 0,0:36:19.04,0:36:21.68,Default,,0000,0000,0000,,a whole bunch of Jupyter notebooks that Dialogue: 0,0:36:21.68,0:36:24.56,Default,,0000,0000,0000,,are available and you can select any one Dialogue: 0,0:36:24.56,0:36:26.00,Default,,0000,0000,0000,,of them all these notebooks are Dialogue: 0,0:36:26.00,0:36:28.72,Default,,0000,0000,0000,,essentially going to process the data Dialogue: 0,0:36:28.72,0:36:31.72,Default,,0000,0000,0000,,from this particular data set so if I go Dialogue: 0,0:36:31.72,0:36:34.72,Default,,0000,0000,0000,,to this code page here I've actually Dialogue: 0,0:36:34.72,0:36:37.32,Default,,0000,0000,0000,,selected a specific notebook that I'm Dialogue: 0,0:36:37.32,0:36:39.96,Default,,0000,0000,0000,,going to run through to demonstrate an Dialogue: 0,0:36:39.96,0:36:42.84,Default,,0000,0000,0000,,end-to-end machine learning workflow using Dialogue: 0,0:36:42.84,0:36:45.56,Default,,0000,0000,0000,,various machine learning libraries from Dialogue: 0,0:36:45.56,0:36:49.80,Default,,0000,0000,0000,,the Python programming language okay so Dialogue: 0,0:36:49.80,0:36:52.44,Default,,0000,0000,0000,,the particular notebook I'm going to Dialogue: 0,0:36:52.44,0:36:55.16,Default,,0000,0000,0000,,use is this particular notebook here and Dialogue:
0,0:36:55.16,0:36:57.16,Default,,0000,0000,0000,,you can also get the URL for that Dialogue: 0,0:36:57.16,0:37:00.44,Default,,0000,0000,0000,,particular The Notebook from Dialogue: 0,0:37:00.44,0:37:03.76,Default,,0000,0000,0000,,here okay so let's quickly do a quick Dialogue: 0,0:37:03.76,0:37:06.00,Default,,0000,0000,0000,,revision again what are we trying to do Dialogue: 0,0:37:06.00,0:37:08.00,Default,,0000,0000,0000,,here we're trying to build a machine Dialogue: 0,0:37:08.00,0:37:11.36,Default,,0000,0000,0000,,learning classification model right so Dialogue: 0,0:37:11.36,0:37:12.96,Default,,0000,0000,0000,,we said there are two primary areas of Dialogue: 0,0:37:12.96,0:37:14.56,Default,,0000,0000,0000,,supervised learning one is regression Dialogue: 0,0:37:14.56,0:37:16.20,Default,,0000,0000,0000,,which is used to predict a numerical Dialogue: 0,0:37:16.20,0:37:18.64,Default,,0000,0000,0000,,Target variable and the second kind of Dialogue: 0,0:37:18.64,0:37:21.36,Default,,0000,0000,0000,,supervised learning is classification Dialogue: 0,0:37:21.36,0:37:23.08,Default,,0000,0000,0000,,which is what we're doing here we're Dialogue: 0,0:37:23.08,0:37:25.84,Default,,0000,0000,0000,,trying to predict a categorical Target Dialogue: 0,0:37:25.84,0:37:29.68,Default,,0000,0000,0000,,variable okay so in this particular Dialogue: 0,0:37:29.68,0:37:32.12,Default,,0000,0000,0000,,example we actually have two kinds of Dialogue: 0,0:37:32.12,0:37:34.48,Default,,0000,0000,0000,,ways we can classify either a binary Dialogue: 0,0:37:34.48,0:37:37.56,Default,,0000,0000,0000,,classification or a multiclass Dialogue: 0,0:37:37.56,0:37:39.52,Default,,0000,0000,0000,,classification so for binary Dialogue: 0,0:37:39.52,0:37:41.44,Default,,0000,0000,0000,,classification we are only going to Dialogue: 0,0:37:41.44,0:37:43.40,Default,,0000,0000,0000,,classify the product or machine as Dialogue: 0,0:37:43.40,0:37:47.16,Default,,0000,0000,0000,,either it failed or it did not fail okay Dialogue: 0,0:37:47.16,0:37:48.88,Default,,0000,0000,0000,,so if we go back to the data set that I Dialogue: 0,0:37:48.88,0:37:50.84,Default,,0000,0000,0000,,showed you just now if you look at this Dialogue: 0,0:37:50.84,0:37:52.68,Default,,0000,0000,0000,,target variable colume there are only Dialogue: 0,0:37:52.68,0:37:54.52,Default,,0000,0000,0000,,two possible values here they either Dialogue: 0,0:37:54.52,0:37:58.28,Default,,0000,0000,0000,,zero or one zero means there's no fi Dialogue: 0,0:37:58.28,0:38:01.24,Default,,0000,0000,0000,,one means that's a failure okay so this Dialogue: 0,0:38:01.24,0:38:03.44,Default,,0000,0000,0000,,is an example of a binary classification Dialogue: 0,0:38:03.44,0:38:07.24,Default,,0000,0000,0000,,only two possible outcomes zero or one Dialogue: 0,0:38:07.24,0:38:10.12,Default,,0000,0000,0000,,didn't fail or fail all right two Dialogue: 0,0:38:10.12,0:38:13.08,Default,,0000,0000,0000,,possible outcomes and then we can also Dialogue: 0,0:38:13.08,0:38:15.48,Default,,0000,0000,0000,,for the same data set we can extend it Dialogue: 0,0:38:15.48,0:38:18.08,Default,,0000,0000,0000,,and make it a multiclass classification Dialogue: 0,0:38:18.08,0:38:20.88,Default,,0000,0000,0000,,problem all right so if we kind of want Dialogue: 0,0:38:20.88,0:38:23.72,Default,,0000,0000,0000,,to drill down further we can say that Dialogue: 0,0:38:23.72,0:38:26.80,Default,,0000,0000,0000,,not only is there a failure we can Dialogue: 0,0:38:26.80,0:38:29.20,Default,,0000,0000,0000,,actually say that are different types 
of Dialogue: 0,0:38:29.20,0:38:32.44,Default,,0000,0000,0000,,failures okay so we have one category of Dialogue: 0,0:38:32.44,0:38:35.60,Default,,0000,0000,0000,,class that is basically no failure okay Dialogue: 0,0:38:35.60,0:38:37.40,Default,,0000,0000,0000,,then we have a category for the Dialogue: 0,0:38:37.40,0:38:40.40,Default,,0000,0000,0000,,different types of failures right so you Dialogue: 0,0:38:40.40,0:38:43.92,Default,,0000,0000,0000,,can have a power failure you could have Dialogue: 0,0:38:43.92,0:38:46.40,Default,,0000,0000,0000,,a tool Weare Dialogue: 0,0:38:46.40,0:38:48.92,Default,,0000,0000,0000,,failure uh you could have let's go down Dialogue: 0,0:38:48.92,0:38:50.88,Default,,0000,0000,0000,,here you could have a over strain Dialogue: 0,0:38:50.88,0:38:53.76,Default,,0000,0000,0000,,failure and etc etc so you can have Dialogue: 0,0:38:53.76,0:38:57.16,Default,,0000,0000,0000,,multiple classes of failure in addition Dialogue: 0,0:38:57.16,0:39:00.52,Default,,0000,0000,0000,,to the general overall or the majority Dialogue: 0,0:39:00.52,0:39:04.32,Default,,0000,0000,0000,,class of no failure and that would be a Dialogue: 0,0:39:04.32,0:39:06.68,Default,,0000,0000,0000,,multiclass classification problem so Dialogue: 0,0:39:06.68,0:39:08.40,Default,,0000,0000,0000,,with this data set we are going to see Dialogue: 0,0:39:08.40,0:39:11.04,Default,,0000,0000,0000,,how to make it a binary classification Dialogue: 0,0:39:11.04,0:39:12.80,Default,,0000,0000,0000,,problem and also a multiclass Dialogue: 0,0:39:12.80,0:39:15.08,Default,,0000,0000,0000,,classification problem okay so let's Dialogue: 0,0:39:15.08,0:39:16.88,Default,,0000,0000,0000,,look at the workflow so let's say we've Dialogue: 0,0:39:16.88,0:39:18.88,Default,,0000,0000,0000,,already got the data so right now we do Dialogue: 0,0:39:18.88,0:39:20.84,Default,,0000,0000,0000,,have the data set this is the data set Dialogue: 0,0:39:20.84,0:39:22.72,Default,,0000,0000,0000,,that we have so let's assume we've Dialogue: 0,0:39:22.72,0:39:24.56,Default,,0000,0000,0000,,somehow managed to get this data set Dialogue: 0,0:39:24.56,0:39:26.88,Default,,0000,0000,0000,,from some iot sensors that are Dialogue: 0,0:39:26.88,0:39:29.12,Default,,0000,0000,0000,,monitoring realtime data in our Dialogue: 0,0:39:29.12,0:39:31.08,Default,,0000,0000,0000,,production environment on the assembly Dialogue: 0,0:39:31.08,0:39:32.80,Default,,0000,0000,0000,,line on the production line we've got Dialogue: 0,0:39:32.80,0:39:34.68,Default,,0000,0000,0000,,sensors reading data that gives us all Dialogue: 0,0:39:34.68,0:39:37.96,Default,,0000,0000,0000,,these data that we have in this CSV file Dialogue: 0,0:39:37.96,0:39:40.08,Default,,0000,0000,0000,,Okay so we've already got the data we've Dialogue: 0,0:39:40.08,0:39:41.60,Default,,0000,0000,0000,,retrieved the data now we're going to go Dialogue: 0,0:39:41.60,0:39:45.00,Default,,0000,0000,0000,,on to the cleaning and exploration part Dialogue: 0,0:39:45.00,0:39:47.52,Default,,0000,0000,0000,,of your machine learning life cycle all Dialogue: 0,0:39:47.52,0:39:49.80,Default,,0000,0000,0000,,right so let's look at the data cleaning Dialogue: 0,0:39:49.80,0:39:51.40,Default,,0000,0000,0000,,part so the data cleaning part we Dialogue: 0,0:39:51.40,0:39:53.72,Default,,0000,0000,0000,,interested in uh checking for missing Dialogue: 0,0:39:53.72,0:39:56.20,Default,,0000,0000,0000,,values and maybe removing the rows you Dialogue: 0,0:39:56.20,0:39:58.08,Default,,0000,0000,0000,,missing values okay 
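In code, the two framings differ only in which column you treat as the label; a minimal sketch (my own), again assuming the 'Target' and 'Failure Type' column names and the feature labels from the Kaggle page.

feature_cols = ["Type", "Air temperature [K]", "Process temperature [K]",
                "Rotational speed [rpm]", "Torque [Nm]", "Tool wear [min]"]

X = df[feature_cols]

y_binary = df["Target"]        # binary classification: 0 = no failure, 1 = failure
y_multi  = df["Failure Type"]  # multiclass: no failure, power failure, tool wear failure, ...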
Dialogue: 0,0:39:58.08,0:39:59.76,Default,,0000,0000,0000,,uh so the kind of things we can sorry Dialogue: 0,0:39:59.76,0:40:01.00,Default,,0000,0000,0000,,the kind of things we can do in missing Dialogue: 0,0:40:01.00,0:40:02.88,Default,,0000,0000,0000,,values we can remove the row missing Dialogue: 0,0:40:02.88,0:40:05.84,Default,,0000,0000,0000,,values we can put in some new values uh Dialogue: 0,0:40:05.84,0:40:08.00,Default,,0000,0000,0000,,some replacement values which could be a Dialogue: 0,0:40:08.00,0:40:09.88,Default,,0000,0000,0000,,average of all the values in that that Dialogue: 0,0:40:09.88,0:40:12.88,Default,,0000,0000,0000,,particular colume etc etc we also try to Dialogue: 0,0:40:12.88,0:40:15.48,Default,,0000,0000,0000,,identify outliers in our data set and Dialogue: 0,0:40:15.48,0:40:17.48,Default,,0000,0000,0000,,also there are a variety of ways to deal Dialogue: 0,0:40:17.48,0:40:19.48,Default,,0000,0000,0000,,with that so this is called Data Dialogue: 0,0:40:19.48,0:40:21.36,Default,,0000,0000,0000,,cleansing which is a really important Dialogue: 0,0:40:21.36,0:40:23.32,Default,,0000,0000,0000,,part of your machine learning workflow Dialogue: 0,0:40:23.32,0:40:25.52,Default,,0000,0000,0000,,right so that's where we are now at Dialogue: 0,0:40:25.52,0:40:26.84,Default,,0000,0000,0000,,we're doing cleansing and then we're Dialogue: 0,0:40:26.84,0:40:28.84,Default,,0000,0000,0000,,going to follow up with Dialogue: 0,0:40:28.84,0:40:31.16,Default,,0000,0000,0000,,exploration so let's look at the actual Dialogue: 0,0:40:31.16,0:40:33.16,Default,,0000,0000,0000,,code that does the cleansing here so Dialogue: 0,0:40:33.16,0:40:35.80,Default,,0000,0000,0000,,here we are right at the start of the uh Dialogue: 0,0:40:35.80,0:40:38.40,Default,,0000,0000,0000,,machine learning uh life cycle here so Dialogue: 0,0:40:38.40,0:40:40.84,Default,,0000,0000,0000,,this is a Jupiter notebook so here we Dialogue: 0,0:40:40.84,0:40:43.36,Default,,0000,0000,0000,,have a brief description of the problem Dialogue: 0,0:40:43.36,0:40:45.92,Default,,0000,0000,0000,,statement all right so this data set Dialogue: 0,0:40:45.92,0:40:47.64,Default,,0000,0000,0000,,reflects real life predictive Dialogue: 0,0:40:47.64,0:40:49.24,Default,,0000,0000,0000,,maintenance enounter industry with Dialogue: 0,0:40:49.24,0:40:50.48,Default,,0000,0000,0000,,measurements from real equipment the Dialogue: 0,0:40:50.48,0:40:52.40,Default,,0000,0000,0000,,features description is taken directly Dialogue: 0,0:40:52.40,0:40:54.52,Default,,0000,0000,0000,,from the data source set so here we have Dialogue: 0,0:40:54.52,0:40:57.40,Default,,0000,0000,0000,,a description of the six key features in Dialogue: 0,0:40:57.40,0:40:59.60,Default,,0000,0000,0000,,our data set type which is the quality Dialogue: 0,0:40:59.60,0:41:02.52,Default,,0000,0000,0000,,of the product the air temperature the Dialogue: 0,0:41:02.52,0:41:04.68,Default,,0000,0000,0000,,process temperature the rotational speed Dialogue: 0,0:41:04.68,0:41:06.60,Default,,0000,0000,0000,,the talk and the towar all right so Dialogue: 0,0:41:06.60,0:41:08.88,Default,,0000,0000,0000,,these are the six feature variables and Dialogue: 0,0:41:08.88,0:41:11.32,Default,,0000,0000,0000,,there are the two target variables so Dialogue: 0,0:41:11.32,0:41:13.12,Default,,0000,0000,0000,,just now I showed you just now there's Dialogue: 0,0:41:13.12,0:41:15.12,Default,,0000,0000,0000,,one target variable which only has two Dialogue: 
0,0:41:15.12,0:41:17.44,Default,,0000,0000,0000,,possible values either zero or one okay Dialogue: 0,0:41:17.44,0:41:20.08,Default,,0000,0000,0000,,zero or one means failure or no failure Dialogue: 0,0:41:20.08,0:41:23.08,Default,,0000,0000,0000,,so that will be this colume here right Dialogue: 0,0:41:23.08,0:41:24.88,Default,,0000,0000,0000,,so let me go all the way back up to here Dialogue: 0,0:41:24.88,0:41:26.64,Default,,0000,0000,0000,,so this colume here we already saw it Dialogue: 0,0:41:26.64,0:41:29.44,Default,,0000,0000,0000,,only has two I values is either zero or Dialogue: 0,0:41:29.44,0:41:32.68,Default,,0000,0000,0000,,one and then we also have this column Dialogue: 0,0:41:32.68,0:41:35.04,Default,,0000,0000,0000,,here and this column here is basically Dialogue: 0,0:41:35.04,0:41:38.08,Default,,0000,0000,0000,,the failure type and so the we have as I Dialogue: 0,0:41:38.08,0:41:40.80,Default,,0000,0000,0000,,already demonstrated just now we do have Dialogue: 0,0:41:40.80,0:41:43.44,Default,,0000,0000,0000,,uh several categories of or types of Dialogue: 0,0:41:43.44,0:41:45.56,Default,,0000,0000,0000,,failure and so here we call this Dialogue: 0,0:41:45.56,0:41:47.08,Default,,0000,0000,0000,,multiclass Dialogue: 0,0:41:47.08,0:41:50.00,Default,,0000,0000,0000,,classification so we can either build a Dialogue: 0,0:41:50.00,0:41:51.84,Default,,0000,0000,0000,,binary classification model for this Dialogue: 0,0:41:51.84,0:41:53.52,Default,,0000,0000,0000,,problem domain or we can build a Dialogue: 0,0:41:53.52,0:41:55.08,Default,,0000,0000,0000,,multiclass Dialogue: 0,0:41:55.08,0:41:58.12,Default,,0000,0000,0000,,classification problem all right so this Dialogue: 0,0:41:58.12,0:41:59.84,Default,,0000,0000,0000,,jupyter notebook is going to demonstrate Dialogue: 0,0:41:59.84,0:42:02.32,Default,,0000,0000,0000,,both approaches to us so first step we Dialogue: 0,0:42:02.32,0:42:04.80,Default,,0000,0000,0000,,are going to write all this python code Dialogue: 0,0:42:04.80,0:42:06.88,Default,,0000,0000,0000,,that's going to import all the libraries Dialogue: 0,0:42:06.88,0:42:09.08,Default,,0000,0000,0000,,that we need to use okay so this is Dialogue: 0,0:42:09.08,0:42:12.32,Default,,0000,0000,0000,,basically python code okay and it's Dialogue: 0,0:42:12.32,0:42:15.12,Default,,0000,0000,0000,,importing the relevant machine learn Dialogue: 0,0:42:15.12,0:42:17.96,Default,,0000,0000,0000,,oops we are importing the relevant Dialogue: 0,0:42:17.96,0:42:20.60,Default,,0000,0000,0000,,machine learning libraries related to Dialogue: 0,0:42:20.60,0:42:23.52,Default,,0000,0000,0000,,our domain use case okay then we load in Dialogue: 0,0:42:23.52,0:42:26.44,Default,,0000,0000,0000,,our data set okay so this our data set Dialogue: 0,0:42:26.44,0:42:28.32,Default,,0000,0000,0000,,we describe it we have some quick Dialogue: 0,0:42:28.32,0:42:30.92,Default,,0000,0000,0000,,insights into the data set um and then Dialogue: 0,0:42:30.92,0:42:32.84,Default,,0000,0000,0000,,we just take a look at all the variables Dialogue: 0,0:42:32.84,0:42:36.00,Default,,0000,0000,0000,,of the feature variables Etc and so on Dialogue: 0,0:42:36.00,0:42:38.00,Default,,0000,0000,0000,,we just what we're doing now is just Dialogue: 0,0:42:38.00,0:42:39.80,Default,,0000,0000,0000,,doing a quick overview of the data set Dialogue: 0,0:42:39.80,0:42:41.56,Default,,0000,0000,0000,,so this all this python code here they Dialogue: 0,0:42:41.56,0:42:43.76,Default,,0000,0000,0000,,were writing is allowing us the data Dialogue: 
0,0:42:43.76,0:42:45.36,Default,,0000,0000,0000,,scientist to get a quick overview of our Dialogue: 0,0:42:45.36,0:42:48.36,Default,,0000,0000,0000,,data set right okay like Dialogue: 0,0:42:48.36,0:42:50.24,Default,,0000,0000,0000,,how many rows are there how many columns Dialogue: 0,0:42:50.24,0:42:51.76,Default,,0000,0000,0000,,are there what are the data types of the Dialogue: 0,0:42:51.76,0:42:53.44,Default,,0000,0000,0000,,columns what are the names of the columns Dialogue: 0,0:42:53.44,0:42:57.36,Default,,0000,0000,0000,,etc etc okay then we zoom in on to the Dialogue: 0,0:42:57.36,0:42:58.84,Default,,0000,0000,0000,,target variables so we look at the Dialogue: 0,0:42:58.84,0:43:02.00,Default,,0000,0000,0000,,target variables how many counts Dialogue: 0,0:43:02.00,0:43:04.52,Default,,0000,0000,0000,,there are of this target variable and Dialogue: 0,0:43:04.52,0:43:06.44,Default,,0000,0000,0000,,so on how many different types of Dialogue: 0,0:43:06.44,0:43:08.24,Default,,0000,0000,0000,,failures there are then you want to Dialogue: 0,0:43:08.24,0:43:09.00,Default,,0000,0000,0000,,check whether there are any Dialogue: 0,0:43:09.00,0:43:10.76,Default,,0000,0000,0000,,inconsistencies between the Target and Dialogue: 0,0:43:10.76,0:43:13.56,Default,,0000,0000,0000,,the Failure Type etc okay so when you do Dialogue: 0,0:43:13.56,0:43:15.12,Default,,0000,0000,0000,,all this checking you're going to Dialogue: 0,0:43:15.12,0:43:16.96,Default,,0000,0000,0000,,discover there are some discrepancies in Dialogue: 0,0:43:16.96,0:43:20.28,Default,,0000,0000,0000,,your data set so using a specific Python Dialogue: 0,0:43:20.28,0:43:21.84,Default,,0000,0000,0000,,code to do checking you're going to say Dialogue: 0,0:43:21.84,0:43:23.48,Default,,0000,0000,0000,,hey you know what there's some errors Dialogue: 0,0:43:23.48,0:43:25.00,Default,,0000,0000,0000,,here right there are nine values that Dialogue: 0,0:43:25.00,0:43:26.60,Default,,0000,0000,0000,,are classified as failure in the Target variable Dialogue: 0,0:43:26.60,0:43:28.20,Default,,0000,0000,0000,,but as no failure in the Failure Type Dialogue: 0,0:43:28.20,0:43:29.72,Default,,0000,0000,0000,,variable so that means there's a Dialogue: 0,0:43:29.72,0:43:33.20,Default,,0000,0000,0000,,discrepancy in your data point right so Dialogue: 0,0:43:33.20,0:43:34.76,Default,,0000,0000,0000,,these are all the ones that Dialogue: 0,0:43:34.76,0:43:36.36,Default,,0000,0000,0000,,are discrepancies because the target Dialogue: 0,0:43:36.36,0:43:39.00,Default,,0000,0000,0000,,variable says one and we already know Dialogue: 0,0:43:39.00,0:43:41.24,Default,,0000,0000,0000,,that Target variable one is supposed to Dialogue: 0,0:43:41.24,0:43:43.24,Default,,0000,0000,0000,,mean that it's a failure right Target Dialogue: 0,0:43:43.24,0:43:44.88,Default,,0000,0000,0000,,variable one is supposed to mean that it is Dialogue: 0,0:43:44.88,0:43:47.12,Default,,0000,0000,0000,,a failure so we are kind of expecting to Dialogue: 0,0:43:47.12,0:43:49.68,Default,,0000,0000,0000,,see the failure classification but some Dialogue: 0,0:43:49.68,0:43:51.40,Default,,0000,0000,0000,,rows actually say there's no failure Dialogue: 0,0:43:51.40,0:43:53.80,Default,,0000,0000,0000,,although the target value is one so here Dialogue: 0,0:43:53.80,0:43:55.92,Default,,0000,0000,0000,,is a classic example of an error that Dialogue: 0,0:43:55.92,0:43:58.64,Default,,0000,0000,0000,,can very well occur in a data set so now
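The kind of consistency check being described can be expressed in a couple of pandas lines; a sketch of my own, assuming the same 'Target' and 'Failure Type' column names.

# rows flagged as a failure in Target but labelled "No Failure" in Failure Type
inconsistent = df[(df["Target"] == 1) & (df["Failure Type"] == "No Failure")]
print(len(inconsistent))   # the notebook reports nine such rows

# one reasonable handling, as the notebook's author chose: drop them
df = df.drop(inconsistent.index)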
Dialogue: 0,0:43:58.64,0:44:00.56,Default,,0000,0000,0000,,the question is what do you do with Dialogue: 0,0:44:00.56,0:44:04.72,Default,,0000,0000,0000,,these errors in your data set right so Dialogue: 0,0:44:04.72,0:44:06.24,Default,,0000,0000,0000,,here the data scientist says I think it Dialogue: 0,0:44:06.24,0:44:07.52,Default,,0000,0000,0000,,would make sense to remove those Dialogue: 0,0:44:07.52,0:44:09.92,Default,,0000,0000,0000,,instances and so they write some code Dialogue: 0,0:44:09.92,0:44:12.68,Default,,0000,0000,0000,,then to remove those instances or those Dialogue: 0,0:44:12.68,0:44:14.92,Default,,0000,0000,0000,,rows or data points from the overall Dialogue: 0,0:44:14.92,0:44:17.28,Default,,0000,0000,0000,,data set and same thing we can again Dialogue: 0,0:44:17.28,0:44:19.24,Default,,0000,0000,0000,,check for other issues so we find there's Dialogue: 0,0:44:19.24,0:44:21.16,Default,,0000,0000,0000,,another issue here with our data set which Dialogue: 0,0:44:21.16,0:44:24.08,Default,,0000,0000,0000,,is another warning so again we can Dialogue: 0,0:44:24.08,0:44:26.24,Default,,0000,0000,0000,,possibly remove them so you're going to Dialogue: 0,0:44:26.24,0:44:31.28,Default,,0000,0000,0000,,remove 27 instances or rows from your Dialogue: 0,0:44:31.28,0:44:34.44,Default,,0000,0000,0000,,overall data set so your data set has Dialogue: 0,0:44:34.44,0:44:37.08,Default,,0000,0000,0000,,10,000 rows or data points you're Dialogue: 0,0:44:37.08,0:44:40.16,Default,,0000,0000,0000,,removing 27 which is only 0.27% of the Dialogue: 0,0:44:40.16,0:44:42.24,Default,,0000,0000,0000,,entire data set and these were the Dialogue: 0,0:44:42.24,0:44:45.72,Default,,0000,0000,0000,,reasons why you remove them okay so if Dialogue: 0,0:44:45.72,0:44:48.16,Default,,0000,0000,0000,,you're just removing 0.27% of the Dialogue: 0,0:44:48.16,0:44:50.80,Default,,0000,0000,0000,,entire data set no big deal right still Dialogue: 0,0:44:50.80,0:44:53.08,Default,,0000,0000,0000,,okay but you needed to remove them Dialogue: 0,0:44:53.08,0:44:55.72,Default,,0000,0000,0000,,because these errors right these Dialogue: 0,0:44:55.72,0:44:58.04,Default,,0000,0000,0000,,27 Dialogue: 0,0:44:58.04,0:45:00.56,Default,,0000,0000,0000,,errors okay data points with errors in Dialogue: 0,0:45:00.56,0:45:02.96,Default,,0000,0000,0000,,your data set could really affect the Dialogue: 0,0:45:02.96,0:45:05.00,Default,,0000,0000,0000,,training of your machine learning model Dialogue: 0,0:45:05.00,0:45:08.64,Default,,0000,0000,0000,,so we need to do your data cleansing Dialogue: 0,0:45:08.64,0:45:11.72,Default,,0000,0000,0000,,right so we are actually cleansing now Dialogue: 0,0:45:11.72,0:45:15.20,Default,,0000,0000,0000,,some kind of data that is Dialogue: 0,0:45:15.20,0:45:17.52,Default,,0000,0000,0000,,incorrect or erroneous in your original Dialogue: 0,0:45:17.52,0:45:21.44,Default,,0000,0000,0000,,data set okay so then we go on to the Dialogue: 0,0:45:21.44,0:45:23.84,Default,,0000,0000,0000,,next part which is called EDA right so Dialogue: 0,0:45:23.84,0:45:28.88,Default,,0000,0000,0000,,EDA is where we kind of explore our data Dialogue: 0,0:45:28.88,0:45:31.72,Default,,0000,0000,0000,,and we want to kind of get a visual Dialogue: 0,0:45:31.72,0:45:34.24,Default,,0000,0000,0000,,overview of our data as a whole and also Dialogue: 0,0:45:34.24,0:45:35.88,Default,,0000,0000,0000,,take a look at the statistical Dialogue: 0,0:45:35.88,0:45:38.16,Default,,0000,0000,0000,,properties of the data the statistical Dialogue:
0,0:45:38.16,0:45:40.48,Default,,0000,0000,0000,,distribution of the data in all the Dialogue: 0,0:45:40.48,0:45:43.08,Default,,0000,0000,0000,,various columns the correlation between Dialogue: 0,0:45:43.08,0:45:44.64,Default,,0000,0000,0000,,the variables between the feature Dialogue: 0,0:45:44.64,0:45:46.68,Default,,0000,0000,0000,,variables in different columns and also the Dialogue: 0,0:45:46.68,0:45:48.60,Default,,0000,0000,0000,,feature variable and the target variable Dialogue: 0,0:45:48.60,0:45:52.04,Default,,0000,0000,0000,,so all of this is called EDA and EDA in Dialogue: 0,0:45:52.04,0:45:54.08,Default,,0000,0000,0000,,a machine learning workflow is typically Dialogue: 0,0:45:54.08,0:45:57.16,Default,,0000,0000,0000,,done through visualization Dialogue: 0,0:45:57.16,0:45:58.84,Default,,0000,0000,0000,,all right so let's go back here and take Dialogue: 0,0:45:58.84,0:46:00.60,Default,,0000,0000,0000,,a look right so for example here we are Dialogue: 0,0:46:00.60,0:46:03.40,Default,,0000,0000,0000,,looking at correlation so we plot the Dialogue: 0,0:46:03.40,0:46:05.68,Default,,0000,0000,0000,,values of all the various feature Dialogue: 0,0:46:05.68,0:46:07.60,Default,,0000,0000,0000,,variables against each other and look Dialogue: 0,0:46:07.60,0:46:10.80,Default,,0000,0000,0000,,for potential correlations and patterns Dialogue: 0,0:46:10.80,0:46:13.36,Default,,0000,0000,0000,,and so on and all the different shapes Dialogue: 0,0:46:13.36,0:46:17.28,Default,,0000,0000,0000,,that you see here in this pair plot okay Dialogue: 0,0:46:17.28,0:46:18.40,Default,,0000,0000,0000,,will have a different meaning a Dialogue: 0,0:46:18.40,0:46:20.00,Default,,0000,0000,0000,,statistical meaning and so the data Dialogue: 0,0:46:20.00,0:46:21.80,Default,,0000,0000,0000,,scientist has to kind of visually Dialogue: 0,0:46:21.80,0:46:23.76,Default,,0000,0000,0000,,inspect this pair plot and make some Dialogue: 0,0:46:23.76,0:46:25.56,Default,,0000,0000,0000,,interpretations of these different Dialogue: 0,0:46:25.56,0:46:27.68,Default,,0000,0000,0000,,patterns that he sees here all right so Dialogue: 0,0:46:27.68,0:46:30.48,Default,,0000,0000,0000,,these are some of the insights that Dialogue: 0,0:46:30.48,0:46:32.84,Default,,0000,0000,0000,,can be deduced from looking at these Dialogue: 0,0:46:32.84,0:46:34.32,Default,,0000,0000,0000,,patterns so for example the torque and Dialogue: 0,0:46:34.32,0:46:36.28,Default,,0000,0000,0000,,rotational speed are highly correlated Dialogue: 0,0:46:36.28,0:46:38.04,Default,,0000,0000,0000,,the process temperature and air Dialogue: 0,0:46:38.04,0:46:39.92,Default,,0000,0000,0000,,temperature are also highly correlated that Dialogue: 0,0:46:39.92,0:46:41.56,Default,,0000,0000,0000,,failures occur for extreme values of Dialogue: 0,0:46:41.56,0:46:44.52,Default,,0000,0000,0000,,some features etc etc then you can plot Dialogue: 0,0:46:44.52,0:46:45.96,Default,,0000,0000,0000,,certain kinds of charts this is called a Dialogue: 0,0:46:45.96,0:46:48.48,Default,,0000,0000,0000,,violin chart to again get new insights Dialogue: 0,0:46:48.48,0:46:49.84,Default,,0000,0000,0000,,for example regarding the torque and Dialogue: 0,0:46:49.84,0:46:51.48,Default,,0000,0000,0000,,rotational speed we can see again that Dialogue: 0,0:46:51.48,0:46:53.12,Default,,0000,0000,0000,,most failures are triggered for much Dialogue: 0,0:46:53.12,0:46:55.12,Default,,0000,0000,0000,,lower or much higher values than the Dialogue: 0,0:46:55.12,0:46:57.40,Default,,0000,0000,0000,,mean when they're not failing so all
Dialogue: 0,0:46:57.40,0:47:00.72,Default,,0000,0000,0000,,these visualizations they are there and Dialogue: 0,0:47:00.72,0:47:02.48,Default,,0000,0000,0000,,a trained data scientist can look at Dialogue: 0,0:47:02.48,0:47:05.08,Default,,0000,0000,0000,,them inspect them and make some kind of Dialogue: 0,0:47:05.08,0:47:08.40,Default,,0000,0000,0000,,insightful deductions from them okay Dialogue: 0,0:47:08.40,0:47:11.08,Default,,0000,0000,0000,,percentage of failure right uh the Dialogue: 0,0:47:11.08,0:47:13.64,Default,,0000,0000,0000,,correlation heat map okay between all Dialogue: 0,0:47:13.64,0:47:15.56,Default,,0000,0000,0000,,these different feature variables and Dialogue: 0,0:47:15.56,0:47:16.92,Default,,0000,0000,0000,,also the target Dialogue: 0,0:47:16.92,0:47:19.60,Default,,0000,0000,0000,,variable okay uh the product types Dialogue: 0,0:47:19.60,0:47:21.08,Default,,0000,0000,0000,,percentage of product types percentage Dialogue: 0,0:47:21.08,0:47:23.16,Default,,0000,0000,0000,,of failure with respect to the product Dialogue: 0,0:47:23.16,0:47:25.72,Default,,0000,0000,0000,,type so we can also kind of visualize Dialogue: 0,0:47:25.72,0:47:27.80,Default,,0000,0000,0000,,that as well so certain products have a Dialogue: 0,0:47:27.80,0:47:29.84,Default,,0000,0000,0000,,higher ratio of faure compared to other Dialogue: 0,0:47:29.84,0:47:33.24,Default,,0000,0000,0000,,product types Etc or for example uh M Dialogue: 0,0:47:33.24,0:47:35.80,Default,,0000,0000,0000,,tends to feel more than H products etc Dialogue: 0,0:47:35.80,0:47:38.88,Default,,0000,0000,0000,,etc so we can create a vast variety of Dialogue: 0,0:47:38.88,0:47:41.32,Default,,0000,0000,0000,,visualizations in the Eda stage so you Dialogue: 0,0:47:41.32,0:47:43.96,Default,,0000,0000,0000,,can see here and again the idea of this Dialogue: 0,0:47:43.96,0:47:46.36,Default,,0000,0000,0000,,visualization is just to give us some Dialogue: 0,0:47:46.36,0:47:49.68,Default,,0000,0000,0000,,insight some preliminary insight into Dialogue: 0,0:47:49.68,0:47:52.52,Default,,0000,0000,0000,,our data set that helps us to model it Dialogue: 0,0:47:52.52,0:47:54.12,Default,,0000,0000,0000,,more correctly so some more insights Dialogue: 0,0:47:54.12,0:47:56.20,Default,,0000,0000,0000,,that we get into our data set from all Dialogue: 0,0:47:56.20,0:47:57.60,Default,,0000,0000,0000,,this visualization Dialogue: 0,0:47:57.60,0:47:59.56,Default,,0000,0000,0000,,then we can plot the distribution so we Dialogue: 0,0:47:59.56,0:48:00.72,Default,,0000,0000,0000,,can see whether it's a normal Dialogue: 0,0:48:00.72,0:48:03.08,Default,,0000,0000,0000,,distribution or some other kind of Dialogue: 0,0:48:03.08,0:48:05.64,Default,,0000,0000,0000,,distribution uh we can have a box plot Dialogue: 0,0:48:05.64,0:48:07.76,Default,,0000,0000,0000,,to see whether there are any outliers in Dialogue: 0,0:48:07.76,0:48:10.40,Default,,0000,0000,0000,,your data set and so on right so we can Dialogue: 0,0:48:10.40,0:48:11.64,Default,,0000,0000,0000,,see from the box plots we can see Dialogue: 0,0:48:11.64,0:48:14.60,Default,,0000,0000,0000,,rotational speed and have outliers so we Dialogue: 0,0:48:14.60,0:48:16.88,Default,,0000,0000,0000,,already saw outliers are basically a Dialogue: 0,0:48:16.88,0:48:18.80,Default,,0000,0000,0000,,problem that you may need to kind of Dialogue: 0,0:48:18.80,0:48:22.52,Default,,0000,0000,0000,,tackle right so outliers are an isue uh Dialogue: 0,0:48:22.52,0:48:24.80,Default,,0000,0000,0000,,it's a it's a part of data cleansing and 
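For reference, the kinds of plots being described are essentially one-liners in seaborn; a rough sketch of my own, assuming the same DataFrame df and the same assumed column labels as above.

import seaborn as sns
import matplotlib.pyplot as plt

numeric_cols = ["Air temperature [K]", "Process temperature [K]",
                "Rotational speed [rpm]", "Torque [Nm]", "Tool wear [min]"]

sns.pairplot(df, vars=numeric_cols, hue="Target")                # pair plot, coloured by failure / no failure
plt.show()

sns.violinplot(data=df, x="Target", y="Rotational speed [rpm]")  # violin chart: failure vs no-failure distributions
plt.show()

sns.heatmap(df[numeric_cols + ["Target"]].corr(), annot=True)    # correlation heat map
plt.show()

sns.boxplot(data=df[numeric_cols])                               # box plots to spot outliers
plt.xticks(rotation=45)
plt.show()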
Dialogue: 0,0:48:24.80,0:48:26.96,Default,,0000,0000,0000,,so you may need to tackle this so we may Dialogue: 0,0:48:26.96,0:48:28.88,Default,,0000,0000,0000,,have to check okay well where are the Dialogue: 0,0:48:28.88,0:48:31.32,Default,,0000,0000,0000,,potential outliers so we can analyze Dialogue: 0,0:48:31.32,0:48:35.32,Default,,0000,0000,0000,,them from the box plot okay um but then Dialogue: 0,0:48:35.32,0:48:37.08,Default,,0000,0000,0000,,we can say well they are outliers but Dialogue: 0,0:48:37.08,0:48:38.80,Default,,0000,0000,0000,,maybe they're not really horrible Dialogue: 0,0:48:38.80,0:48:40.76,Default,,0000,0000,0000,,outliers so we can tolerate them or Dialogue: 0,0:48:40.76,0:48:42.88,Default,,0000,0000,0000,,maybe we want to remove them so we can Dialogue: 0,0:48:42.88,0:48:44.92,Default,,0000,0000,0000,,see what the mean and maximum values for Dialogue: 0,0:48:44.92,0:48:46.72,Default,,0000,0000,0000,,all these are with respect to product type Dialogue: 0,0:48:46.72,0:48:49.68,Default,,0000,0000,0000,,how many of them are above or highly Dialogue: 0,0:48:49.68,0:48:51.44,Default,,0000,0000,0000,,correlated with the product type in Dialogue: 0,0:48:51.44,0:48:54.24,Default,,0000,0000,0000,,terms of the maximum and minimum okay Dialogue: 0,0:48:54.24,0:48:56.96,Default,,0000,0000,0000,,and then so on so the insight is well we Dialogue: 0,0:48:56.96,0:48:59.60,Default,,0000,0000,0000,,got 4.8% of the instances as outliers Dialogue: 0,0:48:59.60,0:49:02.56,Default,,0000,0000,0000,,so maybe 4.87% is not really that much Dialogue: 0,0:49:02.56,0:49:04.92,Default,,0000,0000,0000,,the outliers are not horrible so we just Dialogue: 0,0:49:04.92,0:49:06.96,Default,,0000,0000,0000,,leave them in the data set now for a Dialogue: 0,0:49:06.96,0:49:08.52,Default,,0000,0000,0000,,different data set the data scientist Dialogue: 0,0:49:08.52,0:49:10.28,Default,,0000,0000,0000,,could come to a different conclusion so Dialogue: 0,0:49:10.28,0:49:12.28,Default,,0000,0000,0000,,then they would do whatever they've Dialogue: 0,0:49:12.28,0:49:15.40,Default,,0000,0000,0000,,deemed is appropriate to kind of cleanse Dialogue: 0,0:49:15.40,0:49:18.08,Default,,0000,0000,0000,,the data set okay
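As a rough illustration of the kind of EDA described above, here is a minimal Python sketch using pandas, seaborn and matplotlib; the file name and the column names (Type, Target, Rotational speed [rpm]) are assumptions for illustration, not the presenter's actual notebook code.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# hypothetical file and column names; the real notebook's names may differ
df = pd.read_csv("predictive_maintenance.csv")

# correlation heatmap between the numeric feature variables and the target
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.show()

# percentage of failures for each product type
print(df.groupby("Type")["Target"].mean() * 100)

# box plot to spot potential outliers in rotational speed
sns.boxplot(x=df["Rotational speed [rpm]"])
plt.show()

# share of instances flagged as outliers by the 1.5 * IQR rule
q1, q3 = df["Rotational speed [rpm]"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = (df["Rotational speed [rpm]"] < q1 - 1.5 * iqr) | (df["Rotational speed [rpm]"] > q3 + 1.5 * iqr)
print(f"{mask.mean() * 100:.2f}% of the instances are outliers")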
Dialogue: 0,0:49:18.08,0:49:20.00,Default,,0000,0000,0000,,so now that we have done all the EDA the next thing we're Dialogue: 0,0:49:20.00,0:49:23.16,Default,,0000,0000,0000,,going to do is we are going to do what Dialogue: 0,0:49:23.16,0:49:26.20,Default,,0000,0000,0000,,is called feature engineering so we are Dialogue: 0,0:49:26.20,0:49:28.76,Default,,0000,0000,0000,,going to transform our original feature Dialogue: 0,0:49:28.76,0:49:31.28,Default,,0000,0000,0000,,variables and these are our original Dialogue: 0,0:49:31.28,0:49:32.96,Default,,0000,0000,0000,,feature variables right these are our Dialogue: 0,0:49:32.96,0:49:35.04,Default,,0000,0000,0000,,original feature variables and we are Dialogue: 0,0:49:35.04,0:49:37.76,Default,,0000,0000,0000,,going to transform them all right we're Dialogue: 0,0:49:37.76,0:49:40.32,Default,,0000,0000,0000,,going to transform them in some sense uh Dialogue: 0,0:49:40.32,0:49:43.76,Default,,0000,0000,0000,,into some other form before we fit this Dialogue: 0,0:49:43.76,0:49:45.64,Default,,0000,0000,0000,,for training into our machine learning Dialogue: 0,0:49:45.64,0:49:48.60,Default,,0000,0000,0000,,algorithm all right so this is let's Dialogue: 0,0:49:48.60,0:49:51.60,Default,,0000,0000,0000,,say an example of an Dialogue: 0,0:49:51.60,0:49:55.20,Default,,0000,0000,0000,,original data set right and these Dialogue: 0,0:49:55.20,0:49:56.84,Default,,0000,0000,0000,,are some of the examples Dialogue: 0,0:49:56.84,0:49:58.04,Default,,0000,0000,0000,,you don't have to use all of them but Dialogue: 0,0:49:58.04,0:49:59.44,Default,,0000,0000,0000,,these are some examples of what we Dialogue: 0,0:49:59.44,0:50:00.84,Default,,0000,0000,0000,,call feature engineering which you can Dialogue: 0,0:50:00.84,0:50:03.56,Default,,0000,0000,0000,,then use to transform your original values in Dialogue: 0,0:50:03.56,0:50:05.28,Default,,0000,0000,0000,,your feature variables to all these Dialogue: 0,0:50:05.28,0:50:07.92,Default,,0000,0000,0000,,transformed values here so we're going to Dialogue: 0,0:50:07.92,0:50:09.68,Default,,0000,0000,0000,,pretty much do that here so we have an Dialogue: 0,0:50:09.68,0:50:12.60,Default,,0000,0000,0000,,ordinal encoding we do scaling of the Dialogue: 0,0:50:12.60,0:50:14.84,Default,,0000,0000,0000,,data so the data set is scaled we use Dialogue: 0,0:50:14.84,0:50:18.24,Default,,0000,0000,0000,,min-max scaling
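A minimal sketch of the two feature engineering steps just mentioned, ordinal encoding and min-max scaling, using scikit-learn; as before, the file and column names (and the L, M, H ordering of product types) are assumptions for illustration.

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder, MinMaxScaler

df = pd.read_csv("predictive_maintenance.csv")  # hypothetical file name

# ordinal-encode the categorical product type column (L < M < H is an assumed ordering)
df[["Type"]] = OrdinalEncoder(categories=[["L", "M", "H"]]).fit_transform(df[["Type"]])

# min-max scale the numeric feature variables into the [0, 1] range
numeric_cols = ["Air temperature [K]", "Process temperature [K]",
                "Rotational speed [rpm]", "Torque [Nm]", "Tool wear [min]"]
df[numeric_cols] = MinMaxScaler().fit_transform(df[numeric_cols])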
Dialogue: 0,0:50:18.24,0:50:21.72,Default,,0000,0000,0000,,and then finally we come to do the modeling so we have to split our Dialogue: 0,0:50:21.72,0:50:24.36,Default,,0000,0000,0000,,data set into a training data set and a Dialogue: 0,0:50:24.36,0:50:28.64,Default,,0000,0000,0000,,test data set so coming back to again um Dialogue: 0,0:50:28.64,0:50:32.16,Default,,0000,0000,0000,,we said that before you train your Dialogue: 0,0:50:32.16,0:50:33.80,Default,,0000,0000,0000,,model sorry before you train your model Dialogue: 0,0:50:33.80,0:50:35.60,Default,,0000,0000,0000,,you have to take your original data set Dialogue: 0,0:50:35.60,0:50:37.32,Default,,0000,0000,0000,,now this is a feature engineered data Dialogue: 0,0:50:37.32,0:50:38.84,Default,,0000,0000,0000,,set we're going to break it into two or Dialogue: 0,0:50:38.84,0:50:40.84,Default,,0000,0000,0000,,more subsets okay so one is called the Dialogue: 0,0:50:40.84,0:50:42.40,Default,,0000,0000,0000,,training data set that we use to fit Dialogue: 0,0:50:42.40,0:50:44.00,Default,,0000,0000,0000,,and train a machine learning model the Dialogue: 0,0:50:44.00,0:50:45.92,Default,,0000,0000,0000,,second is the test data set to evaluate the Dialogue: 0,0:50:45.92,0:50:47.96,Default,,0000,0000,0000,,accuracy of the model okay so we got Dialogue: 0,0:50:47.96,0:50:50.56,Default,,0000,0000,0000,,this training data set your test data Dialogue: 0,0:50:50.56,0:50:52.72,Default,,0000,0000,0000,,set and we also need Dialogue: 0,0:50:52.72,0:50:56.16,Default,,0000,0000,0000,,to sample so from our original data set Dialogue: 0,0:50:56.16,0:50:57.40,Default,,0000,0000,0000,,we need to sample some points Dialogue: 0,0:50:57.40,0:50:58.84,Default,,0000,0000,0000,,that go into your training data set some Dialogue: 0,0:50:58.84,0:51:00.56,Default,,0000,0000,0000,,points that go into your test data set so Dialogue: 0,0:51:00.56,0:51:02.72,Default,,0000,0000,0000,,there are many ways to do sampling one Dialogue: 0,0:51:02.72,0:51:04.92,Default,,0000,0000,0000,,way is to do stratified sampling where Dialogue: 0,0:51:04.92,0:51:06.72,Default,,0000,0000,0000,,we ensure the same proportion of data Dialogue: 0,0:51:06.72,0:51:09.00,Default,,0000,0000,0000,,from each stratum or class because right Dialogue: 0,0:51:09.00,0:51:10.96,Default,,0000,0000,0000,,now we have a multiclass classification Dialogue: 0,0:51:10.96,0:51:12.32,Default,,0000,0000,0000,,problem so you want to make sure the Dialogue: 0,0:51:12.32,0:51:13.96,Default,,0000,0000,0000,,same proportion of data from each Dialogue: 0,0:51:13.96,0:51:15.84,Default,,0000,0000,0000,,class is equally represented in the Dialogue: 0,0:51:15.84,0:51:17.92,Default,,0000,0000,0000,,training and test data set as in the Dialogue: 0,0:51:17.92,0:51:20.12,Default,,0000,0000,0000,,original data set which is very useful Dialogue: 0,0:51:20.12,0:51:21.64,Default,,0000,0000,0000,,for dealing with what is called an Dialogue: 0,0:51:21.64,0:51:24.32,Default,,0000,0000,0000,,imbalanced data set so here we have an Dialogue: 0,0:51:24.32,0:51:25.84,Default,,0000,0000,0000,,example of what is called an imbalanced Dialogue: 0,0:51:25.84,0:51:29.52,Default,,0000,0000,0000,,data set in the sense that you have the Dialogue: 0,0:51:29.52,0:51:32.76,Default,,0000,0000,0000,,vast majority of data points in your Dialogue: 0,0:51:32.76,0:51:34.96,Default,,0000,0000,0000,,data set they are going to have the Dialogue: 0,0:51:34.96,0:51:37.48,Default,,0000,0000,0000,,value of zero for their target variable Dialogue: 0,0:51:37.48,0:51:40.20,Default,,0000,0000,0000,,column so only an extremely small Dialogue: 0,0:51:40.20,0:51:43.12,Default,,0000,0000,0000,,minority of the data points in your data Dialogue: 0,0:51:43.12,0:51:45.32,Default,,0000,0000,0000,,set will actually have the value of one Dialogue: 0,0:51:45.32,0:51:48.72,Default,,0000,0000,0000,,for their target variable column okay so Dialogue: 0,0:51:48.72,0:51:51.04,Default,,0000,0000,0000,,a situation where you have your class or Dialogue: 0,0:51:51.04,0:51:52.52,Default,,0000,0000,0000,,your target variable column where the Dialogue: 0,0:51:52.52,0:51:54.48,Default,,0000,0000,0000,,vast majority of values are from one Dialogue: 0,0:51:54.48,0:51:58.12,Default,,0000,0000,0000,,class and a tiny minority are from Dialogue: 0,0:51:58.12,0:52:00.52,Default,,0000,0000,0000,,another class we call this an imbalanced Dialogue: 0,0:52:00.52,0:52:02.72,Default,,0000,0000,0000,,data set and for an imbalanced data set Dialogue: 0,0:52:02.72,0:52:04.32,Default,,0000,0000,0000,,typically we will have a specific Dialogue: 0,0:52:04.32,0:52:05.92,Default,,0000,0000,0000,,technique to do the train test split Dialogue: 0,0:52:05.92,0:52:08.12,Default,,0000,0000,0000,,which is called stratified sampling and Dialogue: 0,0:52:08.12,0:52:09.60,Default,,0000,0000,0000,,so that's exactly what's happening here Dialogue: 0,0:52:09.60,0:52:12.00,Default,,0000,0000,0000,,we're doing a stratified split here so Dialogue: 0,0:52:12.00,0:52:14.84,Default,,0000,0000,0000,,we are doing a train test split here uh Dialogue: 0,0:52:14.84,0:52:17.52,Default,,0000,0000,0000,,and we are doing a stratified split
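A minimal sketch of the stratified train test split, continuing from the hypothetical feature engineered DataFrame sketched above; stratify=y is what keeps the class proportions of the original data set in both subsets.

from sklearn.model_selection import train_test_split

# X holds the engineered feature variables, y the imbalanced target column
X = df.drop(columns=["Target"])
y = df["Target"]

# stratified split: the same proportion of each class in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)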
Dialogue: 0,0:52:17.52,0:52:20.36,Default,,0000,0000,0000,,and then now we actually develop the Dialogue: 0,0:52:20.36,0:52:23.36,Default,,0000,0000,0000,,models so now we've got the train test Dialogue: 0,0:52:23.36,0:52:25.48,Default,,0000,0000,0000,,split now here is where we actually Dialogue: 0,0:52:25.48,0:52:27.08,Default,,0000,0000,0000,,train the models Dialogue: 0,0:52:27.08,0:52:29.92,Default,,0000,0000,0000,,now in terms of classification there are Dialogue: 0,0:52:29.92,0:52:32.32,Default,,0000,0000,0000,,a whole bunch of Dialogue: 0,0:52:32.32,0:52:35.40,Default,,0000,0000,0000,,possibilities right that you can use Dialogue: 0,0:52:35.40,0:52:38.48,Default,,0000,0000,0000,,there are many many different algorithms Dialogue: 0,0:52:38.48,0:52:41.00,Default,,0000,0000,0000,,that we can use to create a Dialogue: 0,0:52:41.00,0:52:42.84,Default,,0000,0000,0000,,classification model so these are Dialogue: 0,0:52:42.84,0:52:45.08,Default,,0000,0000,0000,,examples of some of the more common ones Dialogue: 0,0:52:45.08,0:52:47.48,Default,,0000,0000,0000,,logistic regression support vector machine decision Dialogue: 0,0:52:47.48,0:52:49.52,Default,,0000,0000,0000,,trees random forest bagging balanced Dialogue: 0,0:52:49.52,0:52:52.72,Default,,0000,0000,0000,,bagging boosting and ensembles so all Dialogue: 0,0:52:52.72,0:52:55.04,Default,,0000,0000,0000,,these are different algorithms which Dialogue: 0,0:52:55.04,0:52:57.76,Default,,0000,0000,0000,,will create different kinds of models Dialogue: 0,0:52:57.76,0:53:01.60,Default,,0000,0000,0000,,which will result in different accuracy Dialogue: 0,0:53:01.60,0:53:05.40,Default,,0000,0000,0000,,measures okay so it's the goal of the Dialogue: 0,0:53:05.40,0:53:08.92,Default,,0000,0000,0000,,data scientist to find the best model Dialogue: 0,0:53:08.92,0:53:11.52,Default,,0000,0000,0000,,that gives the best accuracy for the Dialogue: 0,0:53:11.52,0:53:14.12,Default,,0000,0000,0000,,given data set for training on that Dialogue: 0,0:53:14.12,0:53:16.88,Default,,0000,0000,0000,,given data set so let's head back again Dialogue: 0,0:53:16.88,0:53:19.76,Default,,0000,0000,0000,,to uh our machine learning workflow so Dialogue: 0,0:53:19.76,0:53:21.52,Default,,0000,0000,0000,,here basically what I'm doing is I'm Dialogue: 0,0:53:21.52,0:53:23.52,Default,,0000,0000,0000,,creating a whole bunch of models here Dialogue: 0,0:53:23.52,0:53:25.52,Default,,0000,0000,0000,,all right so one is a random forest one Dialogue: 0,0:53:25.52,0:53:27.16,Default,,0000,0000,0000,,is balanced bagging one is a boosting Dialogue: 0,0:53:27.16,0:53:29.52,Default,,0000,0000,0000,,classifier one's the ensemble classifier Dialogue: 0,0:53:29.52,0:53:32.76,Default,,0000,0000,0000,,and using all of these I am going to Dialogue: 0,0:53:32.76,0:53:35.32,Default,,0000,0000,0000,,basically fit or train my model using Dialogue: 0,0:53:35.32,0:53:37.44,Default,,0000,0000,0000,,all these algorithms
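A sketch of fitting several candidate classifiers on the same training split from above; the specific estimator classes (including imbalanced-learn's BalancedBaggingClassifier standing in for the balanced bagging model named in the talk) are assumptions for illustration, not the presenter's exact choices.

from sklearn.ensemble import RandomForestClassifier, BaggingClassifier, GradientBoostingClassifier
from imblearn.ensemble import BalancedBaggingClassifier  # assumes imbalanced-learn is installed

# one model per algorithm family mentioned in the talk
models = {
    "bagging": BaggingClassifier(random_state=42),
    "random_forest": RandomForestClassifier(random_state=42),
    "balanced_bagging": BalancedBaggingClassifier(random_state=42),
    "boosting": GradientBoostingClassifier(random_state=42),
}

# fit every model on the same training data from the stratified split above
for name, clf in models.items():
    clf.fit(X_train, y_train)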
Dialogue: 0,0:53:37.44,0:53:39.80,Default,,0000,0000,0000,,and then I'm going to evaluate them okay I'm going to Dialogue: 0,0:53:39.80,0:53:42.48,Default,,0000,0000,0000,,evaluate how good each of these models Dialogue: 0,0:53:42.48,0:53:45.76,Default,,0000,0000,0000,,are and here you can see your Dialogue: 0,0:53:45.76,0:53:48.84,Default,,0000,0000,0000,,evaluation data right okay and this is Dialogue: 0,0:53:48.84,0:53:50.84,Default,,0000,0000,0000,,the confusion matrix which is another Dialogue: 0,0:53:50.84,0:53:54.28,Default,,0000,0000,0000,,way of evaluating so now we come to the Dialogue: 0,0:53:54.28,0:53:56.32,Default,,0000,0000,0000,,kind of the key part here which Dialogue: 0,0:53:56.32,0:53:58.52,Default,,0000,0000,0000,,is how do I distinguish between Dialogue: 0,0:53:58.52,0:54:00.08,Default,,0000,0000,0000,,all these models right I've got all Dialogue: 0,0:54:00.08,0:54:01.40,Default,,0000,0000,0000,,these different models which are built Dialogue: 0,0:54:01.40,0:54:03.04,Default,,0000,0000,0000,,with different algorithms which I'm Dialogue: 0,0:54:03.04,0:54:05.36,Default,,0000,0000,0000,,using to train on the same data set how Dialogue: 0,0:54:05.36,0:54:07.36,Default,,0000,0000,0000,,do I distinguish between all these Dialogue: 0,0:54:07.36,0:54:10.36,Default,,0000,0000,0000,,models okay and so for Dialogue: 0,0:54:10.36,0:54:13.88,Default,,0000,0000,0000,,that we actually have a whole bunch of Dialogue: 0,0:54:13.88,0:54:16.20,Default,,0000,0000,0000,,common evaluation metrics for Dialogue: 0,0:54:16.20,0:54:18.32,Default,,0000,0000,0000,,classification right so these evaluation Dialogue: 0,0:54:18.32,0:54:22.24,Default,,0000,0000,0000,,metrics tell us how good a model is in Dialogue: 0,0:54:22.24,0:54:24.32,Default,,0000,0000,0000,,terms of its accuracy in Dialogue: 0,0:54:24.32,0:54:27.00,Default,,0000,0000,0000,,classification so in terms of Dialogue: 0,0:54:27.00,0:54:29.44,Default,,0000,0000,0000,,accuracy we actually have many different Dialogue: 0,0:54:29.44,0:54:31.68,Default,,0000,0000,0000,,models uh sorry many different measures Dialogue: 0,0:54:31.68,0:54:33.44,Default,,0000,0000,0000,,right you might think well accuracy is Dialogue: 0,0:54:33.44,0:54:35.40,Default,,0000,0000,0000,,just accuracy well that's all right it's Dialogue: 0,0:54:35.40,0:54:36.88,Default,,0000,0000,0000,,just either it's accurate or it's not Dialogue: 0,0:54:36.88,0:54:39.32,Default,,0000,0000,0000,,accurate right but actually it's not Dialogue: 0,0:54:39.32,0:54:41.36,Default,,0000,0000,0000,,that simple there are many different Dialogue: 0,0:54:41.36,0:54:43.84,Default,,0000,0000,0000,,ways to measure the accuracy of a Dialogue: 0,0:54:43.84,0:54:45.48,Default,,0000,0000,0000,,classification model and these are some Dialogue: 0,0:54:45.48,0:54:48.28,Default,,0000,0000,0000,,of the more common ones so for example Dialogue: 0,0:54:48.28,0:54:51.00,Default,,0000,0000,0000,,the confusion matrix tells us how many Dialogue: 0,0:54:51.00,0:54:54.00,Default,,0000,0000,0000,,true positives that means the value is Dialogue: 0,0:54:54.00,0:54:55.88,Default,,0000,0000,0000,,positive the prediction is positive how Dialogue: 0,0:54:55.88,0:54:57.52,Default,,0000,0000,0000,,many false positives which means the Dialogue: 0,0:54:57.52,0:54:59.04,Default,,0000,0000,0000,,value is negative but the machine learning Dialogue: 0,0:54:59.04,0:55:01.84,Default,,0000,0000,0000,,model predicts positive how many false Dialogue: 0,0:55:01.84,0:55:03.84,Default,,0000,0000,0000,,negatives which means that the machine Dialogue: 0,0:55:03.84,0:55:05.56,Default,,0000,0000,0000,,learning model predicts negative but Dialogue: 0,0:55:05.56,0:55:07.48,Default,,0000,0000,0000,,it's actually positive and how many true Dialogue: 0,0:55:07.48,0:55:09.36,Default,,0000,0000,0000,,negatives there are which means that the Dialogue: 0,0:55:09.36,0:55:11.24,Default,,0000,0000,0000,,machine learning model Dialogue: 0,0:55:11.24,0:55:12.88,Default,,0000,0000,0000,,predicts negative and the true value is Dialogue: 0,0:55:12.88,0:55:14.76,Default,,0000,0000,0000,,also negative so this is called a Dialogue: 0,0:55:14.76,0:55:16.92,Default,,0000,0000,0000,,confusion matrix this is one way we Dialogue: 0,0:55:16.92,0:55:19.48,Default,,0000,0000,0000,,assess or evaluate the performance of a Dialogue: 0,0:55:19.48,0:55:20.52,Default,,0000,0000,0000,,classification Dialogue: 0,0:55:20.52,0:55:23.32,Default,,0000,0000,0000,,model okay this is for binary Dialogue: 0,0:55:23.32,0:55:24.68,Default,,0000,0000,0000,,classification we can also have Dialogue: 0,0:55:24.68,0:55:26.88,Default,,0000,0000,0000,,a multiclass confusion matrix
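A minimal sketch of generating the confusion matrix for one of the hypothetical models trained in the earlier sketch.

from sklearn.metrics import confusion_matrix

# rows are the true classes, columns the predicted classes:
# [[true negatives, false positives],
#  [false negatives, true positives]]
y_pred = models["random_forest"].predict(X_test)
print(confusion_matrix(y_test, y_pred))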
Dialogue: 0,0:55:26.88,0:55:29.00,Default,,0000,0000,0000,,and then we can also measure things like Dialogue: 0,0:55:29.00,0:55:31.72,Default,,0000,0000,0000,,accuracy so accuracy is the true Dialogue: 0,0:55:31.72,0:55:34.08,Default,,0000,0000,0000,,positives plus the true negatives which Dialogue: 0,0:55:34.08,0:55:35.44,Default,,0000,0000,0000,,is the total number of correct Dialogue: 0,0:55:35.44,0:55:37.84,Default,,0000,0000,0000,,predictions made by the model divided by Dialogue: 0,0:55:37.84,0:55:39.84,Default,,0000,0000,0000,,the total number of data points in your Dialogue: 0,0:55:39.84,0:55:42.60,Default,,0000,0000,0000,,data set and then you also have other Dialogue: 0,0:55:42.60,0:55:43.72,Default,,0000,0000,0000,,kinds of Dialogue: 0,0:55:43.72,0:55:46.60,Default,,0000,0000,0000,,measures uh such as recall and this is a Dialogue: 0,0:55:46.60,0:55:49.16,Default,,0000,0000,0000,,formula for recall this is a formula for Dialogue: 0,0:55:49.16,0:55:51.48,Default,,0000,0000,0000,,the F1 score okay and then there's Dialogue: 0,0:55:51.48,0:55:55.56,Default,,0000,0000,0000,,something called the ROC curve right so Dialogue: 0,0:55:55.56,0:55:57.04,Default,,0000,0000,0000,,without going too much into the detail of Dialogue: 0,0:55:57.04,0:55:59.00,Default,,0000,0000,0000,,what each of these entails essentially Dialogue: 0,0:55:59.00,0:56:00.64,Default,,0000,0000,0000,,these are all different ways these are Dialogue: 0,0:56:00.64,0:56:03.28,Default,,0000,0000,0000,,different KPIs right just like if you Dialogue: 0,0:56:03.28,0:56:06.12,Default,,0000,0000,0000,,work in a company you have different KPIs Dialogue: 0,0:56:06.12,0:56:08.08,Default,,0000,0000,0000,,right certain employees have certain KPIs Dialogue: 0,0:56:08.08,0:56:11.28,Default,,0000,0000,0000,,that measure how good or how uh you Dialogue: 0,0:56:11.28,0:56:13.20,Default,,0000,0000,0000,,know efficient or how effective a Dialogue: 0,0:56:13.20,0:56:16.24,Default,,0000,0000,0000,,particular employee is right so the Dialogue: 0,0:56:16.24,0:56:19.88,Default,,0000,0000,0000,,KPIs for your machine learning models Dialogue: 0,0:56:19.88,0:56:24.24,Default,,0000,0000,0000,,are ROC curve F1 score recall accuracy Dialogue: 0,0:56:24.24,0:56:26.60,Default,,0000,0000,0000,,okay and your confusion matrix so Dialogue: 0,0:56:26.60,0:56:29.84,Default,,0000,0000,0000,,fundamentally after I have built right Dialogue: 0,0:56:29.84,0:56:33.36,Default,,0000,0000,0000,,so here I've built my four different Dialogue: 0,0:56:33.36,0:56:35.24,Default,,0000,0000,0000,,models so after I built these four Dialogue: 0,0:56:35.24,0:56:37.64,Default,,0000,0000,0000,,different models I'm going to check and Dialogue: 0,0:56:37.64,0:56:39.68,Default,,0000,0000,0000,,evaluate them using all those different Dialogue: 0,0:56:39.68,0:56:42.44,Default,,0000,0000,0000,,metrics like for example the F1 score Dialogue: 0,0:56:42.44,0:56:44.84,Default,,0000,0000,0000,,the precision score the recall score all Dialogue: 0,0:56:44.84,0:56:47.32,Default,,0000,0000,0000,,right so for this model I can check out Dialogue: 0,0:56:47.32,0:56:50.04,Default,,0000,0000,0000,,the ROC score the F1 score the precision Dialogue: 0,0:56:50.04,0:56:52.12,Default,,0000,0000,0000,,score the recall score then for this Dialogue: 0,0:56:52.12,0:56:54.80,Default,,0000,0000,0000,,model this is the ROC score the F1 score Dialogue: 0,0:56:54.80,0:56:56.84,Default,,0000,0000,0000,,the precision score the recall score Dialogue: 0,0:56:56.84,0:56:59.68,Default,,0000,0000,0000,,then for this model and so on so for Dialogue: 0,0:56:59.68,0:57:03.24,Default,,0000,0000,0000,,every single model I've created using my Dialogue: 0,0:57:03.24,0:57:05.84,Default,,0000,0000,0000,,training data set I will have all my set Dialogue: 0,0:57:05.84,0:57:08.00,Default,,0000,0000,0000,,of evaluation metrics that I can use to Dialogue: 0,0:57:08.00,0:57:11.84,Default,,0000,0000,0000,,evaluate how good this model is okay
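A sketch of computing this set of evaluation metrics for every hypothetical model trained in the earlier sketch, using scikit-learn's metric functions.

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# the same set of evaluation metrics for every trained model
for name, clf in models.items():
    y_pred = clf.predict(X_test)
    y_prob = clf.predict_proba(X_test)[:, 1]  # probability of the failure class
    print(name,
          "accuracy", round(accuracy_score(y_test, y_pred), 3),
          "precision", round(precision_score(y_test, y_pred), 3),
          "recall", round(recall_score(y_test, y_pred), 3),
          "F1", round(f1_score(y_test, y_pred), 3),
          "ROC AUC", round(roc_auc_score(y_test, y_prob), 3))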
Dialogue: 0,0:57:11.84,0:57:13.12,Default,,0000,0000,0000,,same thing here I've got a confusion Dialogue: 0,0:57:13.12,0:57:15.08,Default,,0000,0000,0000,,matrix here right so I can use that Dialogue: 0,0:57:15.08,0:57:18.12,Default,,0000,0000,0000,,again to evaluate between all these four Dialogue: 0,0:57:18.12,0:57:20.20,Default,,0000,0000,0000,,different models and then I kind of Dialogue: 0,0:57:20.20,0:57:22.24,Default,,0000,0000,0000,,summarize it up here so we can see from Dialogue: 0,0:57:22.24,0:57:25.44,Default,,0000,0000,0000,,this summary here that actually the top Dialogue: 0,0:57:25.44,0:57:27.60,Default,,0000,0000,0000,,two models right so as a Dialogue: 0,0:57:27.60,0:57:29.44,Default,,0000,0000,0000,,data scientist I'm now Dialogue: 0,0:57:29.44,0:57:31.12,Default,,0000,0000,0000,,going to just focus on these two models Dialogue: 0,0:57:31.12,0:57:33.44,Default,,0000,0000,0000,,so these two models are the bagging Dialogue: 0,0:57:33.44,0:57:36.00,Default,,0000,0000,0000,,classifier and the random forest classifier Dialogue: 0,0:57:36.00,0:57:38.48,Default,,0000,0000,0000,,they have the highest values of F1 score Dialogue: 0,0:57:38.48,0:57:40.48,Default,,0000,0000,0000,,and the highest values of the ROC curve Dialogue: 0,0:57:40.48,0:57:42.64,Default,,0000,0000,0000,,score okay so we can say these are the Dialogue: 0,0:57:42.64,0:57:45.84,Default,,0000,0000,0000,,top two models in terms of accuracy okay Dialogue: 0,0:57:45.84,0:57:48.92,Default,,0000,0000,0000,,using the F1 evaluation metric and the Dialogue: 0,0:57:48.92,0:57:53.72,Default,,0000,0000,0000,,ROC AUC evaluation metric okay so these Dialogue: 0,0:57:53.72,0:57:57.48,Default,,0000,0000,0000,,results are kind of summarized here and Dialogue: 0,0:57:57.48,0:57:59.08,Default,,0000,0000,0000,,then we use different sampling Dialogue: 0,0:57:59.08,0:58:00.88,Default,,0000,0000,0000,,techniques okay so just now I talked Dialogue: 0,0:58:00.88,0:58:03.68,Default,,0000,0000,0000,,about um different kinds of sampling Dialogue: 0,0:58:03.68,0:58:06.40,Default,,0000,0000,0000,,techniques and so the idea of different Dialogue: 0,0:58:06.40,0:58:08.32,Default,,0000,0000,0000,,kinds of sampling techniques is to just Dialogue: 0,0:58:08.32,0:58:11.32,Default,,0000,0000,0000,,get a different feel for different Dialogue: 0,0:58:11.32,0:58:13.72,Default,,0000,0000,0000,,distributions of the data in different Dialogue: 0,0:58:13.72,0:58:16.36,Default,,0000,0000,0000,,areas of your data set so that you want Dialogue: 0,0:58:16.36,0:58:20.00,Default,,0000,0000,0000,,to just kind of make sure that your Dialogue: 0,0:58:20.00,0:58:22.80,Default,,0000,0000,0000,,evaluation of accuracy is actually Dialogue: 0,0:58:22.80,0:58:27.08,Default,,0000,0000,0000,,statistically correct right so we can um Dialogue: 0,0:58:27.08,0:58:29.60,Default,,0000,0000,0000,,do what is called oversampling and Dialogue: 0,0:58:29.60,0:58:30.88,Default,,0000,0000,0000,,undersampling which is very useful when Dialogue: 0,0:58:30.88,0:58:32.28,Default,,0000,0000,0000,,you're working with an imbalanced data Dialogue: 0,0:58:32.28,0:58:35.04,Default,,0000,0000,0000,,set so this is an example of doing that
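A sketch of the two resampling ideas using the imbalanced-learn library; BorderlineSMOTE and TomekLinks are assumed stand-ins for the oversampling and undersampling techniques shown in the notebook, applied to the hypothetical training split from the earlier sketches.

from imblearn.over_sampling import BorderlineSMOTE
from imblearn.under_sampling import TomekLinks
from sklearn.ensemble import RandomForestClassifier

# oversample the minority (failure) class with borderline SMOTE ...
X_over, y_over = BorderlineSMOTE(random_state=42).fit_resample(X_train, y_train)

# ... or undersample the majority class by removing Tomek links
X_under, y_under = TomekLinks().fit_resample(X_train, y_train)

# the candidate models are then retrained on the resampled data, for example
models["random_forest_smote"] = RandomForestClassifier(random_state=42).fit(X_over, y_over)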
Dialogue: 0,0:58:35.04,0:58:37.24,Default,,0000,0000,0000,,and then here we again check out the Dialogue: 0,0:58:37.24,0:58:38.80,Default,,0000,0000,0000,,results for all these different Dialogue: 0,0:58:38.80,0:58:41.68,Default,,0000,0000,0000,,techniques we use uh the F1 score the AUC Dialogue: 0,0:58:41.68,0:58:43.60,Default,,0000,0000,0000,,score all right these are the two key Dialogue: 0,0:58:43.60,0:58:46.76,Default,,0000,0000,0000,,measures of accuracy right and then Dialogue: 0,0:58:46.76,0:58:47.92,Default,,0000,0000,0000,,we can check out the scores for the Dialogue: 0,0:58:47.92,0:58:50.48,Default,,0000,0000,0000,,different approaches okay so we can see Dialogue: 0,0:58:50.48,0:58:53.12,Default,,0000,0000,0000,,oh well overall the models have a lower Dialogue: 0,0:58:53.12,0:58:55.72,Default,,0000,0000,0000,,ROC AUC score but they have a much Dialogue: 0,0:58:55.72,0:58:58.28,Default,,0000,0000,0000,,higher F1 score the bagging classifier Dialogue: 0,0:58:58.28,0:59:00.84,Default,,0000,0000,0000,,had the highest ROC AUC score Dialogue: 0,0:59:00.84,0:59:04.12,Default,,0000,0000,0000,,but its F1 score was too low okay then in Dialogue: 0,0:59:04.12,0:59:06.52,Default,,0000,0000,0000,,the data scientist's opinion the random Dialogue: 0,0:59:06.52,0:59:08.52,Default,,0000,0000,0000,,forest with this particular technique of Dialogue: 0,0:59:08.52,0:59:10.76,Default,,0000,0000,0000,,sampling has a good equilibrium between the F1 Dialogue: 0,0:59:10.76,0:59:14.48,Default,,0000,0000,0000,,and ROC AUC score so the first takeaway Dialogue: 0,0:59:14.48,0:59:16.68,Default,,0000,0000,0000,,is that the macro F1 score improves Dialogue: 0,0:59:16.68,0:59:18.48,Default,,0000,0000,0000,,dramatically using these sampling Dialogue: 0,0:59:18.48,0:59:20.16,Default,,0000,0000,0000,,techniques so these models might be better Dialogue: 0,0:59:20.16,0:59:22.44,Default,,0000,0000,0000,,compared to the balanced ones all right Dialogue: 0,0:59:22.44,0:59:26.28,Default,,0000,0000,0000,,so based on all this uh evaluation the Dialogue: 0,0:59:26.28,0:59:27.68,Default,,0000,0000,0000,,data scientist says they're going to Dialogue: 0,0:59:27.68,0:59:29.92,Default,,0000,0000,0000,,continue to work with these two models Dialogue: 0,0:59:29.92,0:59:31.44,Default,,0000,0000,0000,,all right and the balanced bagging one Dialogue: 0,0:59:31.44,0:59:33.08,Default,,0000,0000,0000,,and then continue to make further Dialogue: 0,0:59:33.08,0:59:35.04,Default,,0000,0000,0000,,comparisons all right so then we Dialogue: 0,0:59:35.04,0:59:37.08,Default,,0000,0000,0000,,continue to keep refining our Dialogue: 0,0:59:37.08,0:59:38.60,Default,,0000,0000,0000,,evaluation work here we're going to Dialogue: 0,0:59:38.60,0:59:41.00,Default,,0000,0000,0000,,train the models one more time again so Dialogue: 0,0:59:41.00,0:59:43.04,Default,,0000,0000,0000,,we again do a train test split and Dialogue: 0,0:59:43.04,0:59:44.80,Default,,0000,0000,0000,,then we do that for this particular uh Dialogue: 0,0:59:44.80,0:59:47.04,Default,,0000,0000,0000,,approach and model and then we print Dialogue: 0,0:59:47.04,0:59:48.20,Default,,0000,0000,0000,,out what is called a Dialogue: 0,0:59:48.20,0:59:50.96,Default,,0000,0000,0000,,classification report and this is Dialogue: 0,0:59:50.96,0:59:53.40,Default,,0000,0000,0000,,basically a summary of all those metrics Dialogue: 0,0:59:53.40,0:59:55.36,Default,,0000,0000,0000,,that I talked about just now so just now Dialogue: 0,0:59:55.36,0:59:57.52,Default,,0000,0000,0000,,remember I said there were Dialogue: 0,0:59:57.52,0:59:59.68,Default,,0000,0000,0000,,several evaluation metrics right so uh Dialogue: 0,0:59:59.68,1:00:01.48,Default,,0000,0000,0000,,we had the confusion matrix the Dialogue: 0,1:00:01.48,1:00:04.12,Default,,0000,0000,0000,,accuracy the precision the recall the Dialogue: 0,1:00:04.12,1:00:08.12,Default,,0000,0000,0000,,AUC score so here with the um classification Dialogue: 0,1:00:08.12,1:00:09.88,Default,,0000,0000,0000,,report I can get a summary of all of Dialogue: 0,1:00:09.88,1:00:11.76,Default,,0000,0000,0000,,that so I can see all the values here okay
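A minimal sketch of printing such a classification report for one of the hypothetical models from the earlier sketches.

from sklearn.metrics import classification_report

# per-class precision, recall, F1 and support in one summary table
y_pred = models["balanced_bagging"].predict(X_test)
print(classification_report(y_test, y_pred, digits=3))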
Dialogue: 0,1:00:11.76,1:00:14.64,Default,,0000,0000,0000,,for this particular model bagging with Dialogue: 0,1:00:14.64,1:00:17.16,Default,,0000,0000,0000,,Tomek links and then I can do that for Dialogue: 0,1:00:17.16,1:00:18.64,Default,,0000,0000,0000,,another model the random forest with Dialogue: 0,1:00:18.64,1:00:20.60,Default,,0000,0000,0000,,borderline SMOTE and then I can do that Dialogue: 0,1:00:20.60,1:00:22.20,Default,,0000,0000,0000,,for another model which is the balanced Dialogue: 0,1:00:22.20,1:00:25.16,Default,,0000,0000,0000,,bagging so again we see a lot of Dialogue: 0,1:00:25.16,1:00:27.08,Default,,0000,0000,0000,,comparison between different models Dialogue: 0,1:00:27.08,1:00:28.64,Default,,0000,0000,0000,,trying to figure out what all these Dialogue: 0,1:00:28.64,1:00:30.72,Default,,0000,0000,0000,,evaluation metrics are telling us all Dialogue: 0,1:00:30.72,1:00:32.96,Default,,0000,0000,0000,,right then again we have a confusion Dialogue: 0,1:00:32.96,1:00:35.88,Default,,0000,0000,0000,,matrix so we generate a confusion matrix Dialogue: 0,1:00:35.88,1:00:38.88,Default,,0000,0000,0000,,for the bagging with the Tomek links Dialogue: 0,1:00:38.88,1:00:40.72,Default,,0000,0000,0000,,undersampling for the random forest Dialogue: 0,1:00:40.72,1:00:42.68,Default,,0000,0000,0000,,with the borderline SMOTE oversampling Dialogue: 0,1:00:42.68,1:00:44.96,Default,,0000,0000,0000,,and just balanced bagging by itself then Dialogue: 0,1:00:44.96,1:00:47.72,Default,,0000,0000,0000,,again we compare between these three uh Dialogue: 0,1:00:47.72,1:00:50.80,Default,,0000,0000,0000,,models uh using the confusion matrix Dialogue: 0,1:00:50.80,1:00:52.60,Default,,0000,0000,0000,,evaluation metric and then we can kind Dialogue: 0,1:00:52.60,1:00:55.68,Default,,0000,0000,0000,,of come to some conclusions all right so Dialogue: 0,1:00:55.68,1:00:58.16,Default,,0000,0000,0000,,now we look at all the data Dialogue: 0,1:00:58.16,1:01:01.20,Default,,0000,0000,0000,,then we move on and look at another um Dialogue: 0,1:01:01.20,1:01:03.16,Default,,0000,0000,0000,,kind of evaluation metric which Dialogue: 0,1:01:03.16,1:01:06.72,Default,,0000,0000,0000,,is the ROC score right so this is one of Dialogue: 0,1:01:06.72,1:01:08.68,Default,,0000,0000,0000,,the other evaluation metrics I talked Dialogue: 0,1:01:08.68,1:01:11.20,Default,,0000,0000,0000,,about so this one is a kind of a curve Dialogue: 0,1:01:11.20,1:01:12.52,Default,,0000,0000,0000,,you look at it to see the area Dialogue: 0,1:01:12.52,1:01:14.36,Default,,0000,0000,0000,,underneath the curve this is called the Dialogue: 0,1:01:14.36,1:01:18.08,Default,,0000,0000,0000,,ROC AUC sorry the AUC the Dialogue: 0,1:01:18.08,1:01:19.88,Default,,0000,0000,0000,,area under the curve all right so the Dialogue: 0,1:01:19.88,1:01:21.84,Default,,0000,0000,0000,,area under the curve Dialogue: 0,1:01:21.84,1:01:24.32,Default,,0000,0000,0000,,score will give us some idea about the Dialogue: 0,1:01:24.32,1:01:25.60,Default,,0000,0000,0000,,threshold that we're going to use for Dialogue: 0,1:01:25.60,1:01:27.68,Default,,0000,0000,0000,,classification so we can examine this Dialogue: 0,1:01:27.68,1:01:29.20,Default,,0000,0000,0000,,for the bagging classifier for the Dialogue: 0,1:01:29.20,1:01:30.96,Default,,0000,0000,0000,,random forest classifier for the balanced Dialogue: 0,1:01:30.96,1:01:33.60,Default,,0000,0000,0000,,bagging classifier okay then we can also Dialogue: 0,1:01:33.60,1:01:36.20,Default,,0000,0000,0000,,again do that uh finally we can check Dialogue: 0,1:01:36.20,1:01:37.88,Default,,0000,0000,0000,,the classification report of this Dialogue: 0,1:01:37.88,1:01:39.68,Default,,0000,0000,0000,,particular model so we keep doing this Dialogue: 0,1:01:39.68,1:01:43.20,Default,,0000,0000,0000,,over and over again evaluating Dialogue: 0,1:01:43.20,1:01:45.72,Default,,0000,0000,0000,,the accuracy metrics the Dialogue: 0,1:01:45.72,1:01:46.88,Default,,0000,0000,0000,,evaluation metrics for all these Dialogue: 0,1:01:46.88,1:01:48.88,Default,,0000,0000,0000,,different models so we keep doing this Dialogue: 0,1:01:48.88,1:01:50.52,Default,,0000,0000,0000,,over and over again for different Dialogue: 0,1:01:50.52,1:01:53.44,Default,,0000,0000,0000,,thresholds for classification
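A sketch of the ROC AUC score and of moving the decision threshold away from the default 0.5, continuing with the hypothetical models from the earlier sketches.

from sklearn.metrics import roc_curve, roc_auc_score, recall_score, precision_score

clf = models["balanced_bagging"]
y_prob = clf.predict_proba(X_test)[:, 1]

# the ROC curve traces the true positive rate against the false positive rate
# over every possible decision threshold; the AUC summarizes it in one number
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
print("ROC AUC:", round(roc_auc_score(y_test, y_prob), 3))

# instead of the default 0.5 cut-off, call it a failure at a 0.6 threshold
y_pred_06 = (y_prob >= 0.6).astype(int)
print("recall at 0.6:", round(recall_score(y_test, y_pred_06), 3))
print("precision at 0.6:", round(precision_score(y_test, y_pred_06), 3))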
Dialogue: 0,1:01:53.44,1:01:56.88,Default,,0000,0000,0000,,and so as we keep drilling into these we kind Dialogue: 0,1:01:56.88,1:02:00.84,Default,,0000,0000,0000,,of get more and more understanding of Dialogue: 0,1:02:00.84,1:02:02.80,Default,,0000,0000,0000,,all these different models which one is Dialogue: 0,1:02:02.80,1:02:04.76,Default,,0000,0000,0000,,the best one that gives the best Dialogue: 0,1:02:04.76,1:02:08.52,Default,,0000,0000,0000,,performance for our data set okay so Dialogue: 0,1:02:08.52,1:02:11.44,Default,,0000,0000,0000,,finally we come to this conclusion this Dialogue: 0,1:02:11.44,1:02:13.52,Default,,0000,0000,0000,,particular model is not able to push Dialogue: 0,1:02:13.52,1:02:15.28,Default,,0000,0000,0000,,the recall on failure higher than Dialogue: 0,1:02:15.28,1:02:17.52,Default,,0000,0000,0000,,95.8% on the other hand balanced bagging Dialogue: 0,1:02:17.52,1:02:19.40,Default,,0000,0000,0000,,with a decision threshold of 0.6 is able Dialogue: 0,1:02:19.40,1:02:21.52,Default,,0000,0000,0000,,to have a better recall blah blah blah Dialogue: 0,1:02:21.52,1:02:25.32,Default,,0000,0000,0000,,etc so finally after having done all of Dialogue: 0,1:02:25.32,1:02:27.48,Default,,0000,0000,0000,,these evaluations Dialogue: 0,1:02:27.48,1:02:31.12,Default,,0000,0000,0000,,okay this is the conclusion Dialogue: 0,1:02:31.12,1:02:33.96,Default,,0000,0000,0000,,so right now we Dialogue: 0,1:02:33.96,1:02:35.28,Default,,0000,0000,0000,,have gone through all the steps of the Dialogue: 0,1:02:35.28,1:02:37.76,Default,,0000,0000,0000,,machine learning life cycle which Dialogue: 0,1:02:37.76,1:02:40.24,Default,,0000,0000,0000,,means we have right now or the data Dialogue: 0,1:02:40.24,1:02:41.96,Default,,0000,0000,0000,,scientist right now has gone through all Dialogue: 0,1:02:41.96,1:02:43.00,Default,,0000,0000,0000,,these Dialogue: 0,1:02:43.00,1:02:47.08,Default,,0000,0000,0000,,steps uh which is now we have done this Dialogue: 0,1:02:47.08,1:02:48.64,Default,,0000,0000,0000,,validation so we have done the cleaning Dialogue: 0,1:02:48.64,1:02:50.56,Default,,0000,0000,0000,,exploration preparation transformation
Dialogue: 0,1:02:50.56,1:02:52.60,Default,,0000,0000,0000,,the feature engineering we have developed Dialogue: 0,1:02:52.60,1:02:54.36,Default,,0000,0000,0000,,and trained multiple models we have Dialogue: 0,1:02:54.36,1:02:56.48,Default,,0000,0000,0000,,evaluated all these different models so Dialogue: 0,1:02:56.48,1:02:58.60,Default,,0000,0000,0000,,right now we have reached this stage so Dialogue: 0,1:02:58.60,1:03:02.72,Default,,0000,0000,0000,,at this stage we as the data scientist Dialogue: 0,1:03:02.72,1:03:05.48,Default,,0000,0000,0000,,kind of have completed our job so we've Dialogue: 0,1:03:05.48,1:03:08.12,Default,,0000,0000,0000,,come to some very useful conclusions Dialogue: 0,1:03:08.12,1:03:09.64,Default,,0000,0000,0000,,which we can now share with our Dialogue: 0,1:03:09.64,1:03:13.24,Default,,0000,0000,0000,,colleagues all right and based on these Dialogue: 0,1:03:13.24,1:03:15.40,Default,,0000,0000,0000,,uh conclusions or recommendations Dialogue: 0,1:03:15.40,1:03:17.16,Default,,0000,0000,0000,,somebody is going to choose an Dialogue: 0,1:03:17.16,1:03:19.16,Default,,0000,0000,0000,,appropriate model and that model is Dialogue: 0,1:03:19.16,1:03:22.64,Default,,0000,0000,0000,,going to get deployed for real-time use Dialogue: 0,1:03:22.64,1:03:25.32,Default,,0000,0000,0000,,in a real life production environment Dialogue: 0,1:03:25.32,1:03:27.24,Default,,0000,0000,0000,,okay and that decision is going to be Dialogue: 0,1:03:27.24,1:03:29.36,Default,,0000,0000,0000,,made based on the recommendations coming Dialogue: 0,1:03:29.36,1:03:30.88,Default,,0000,0000,0000,,from the data scientist at the end of Dialogue: 0,1:03:30.88,1:03:33.48,Default,,0000,0000,0000,,this phase okay so at the end of this Dialogue: 0,1:03:33.48,1:03:35.08,Default,,0000,0000,0000,,phase the data scientist is going to Dialogue: 0,1:03:35.08,1:03:36.88,Default,,0000,0000,0000,,come up with these conclusions so Dialogue: 0,1:03:36.88,1:03:41.76,Default,,0000,0000,0000,,the conclusions are okay if the engineering Dialogue: 0,1:03:41.76,1:03:44.52,Default,,0000,0000,0000,,team is looking okay the Dialogue: 0,1:03:44.52,1:03:46.12,Default,,0000,0000,0000,,engineering team right if the engineering Dialogue: 0,1:03:46.12,1:03:48.72,Default,,0000,0000,0000,,team is looking for the highest Dialogue: 0,1:03:48.72,1:03:51.84,Default,,0000,0000,0000,,failure detection rate possible then Dialogue: 0,1:03:51.84,1:03:54.48,Default,,0000,0000,0000,,they should go with this particular Dialogue: 0,1:03:54.48,1:03:56.52,Default,,0000,0000,0000,,model okay Dialogue: 0,1:03:56.52,1:03:58.68,Default,,0000,0000,0000,,and if they want a balance between Dialogue: 0,1:03:58.68,1:04:01.04,Default,,0000,0000,0000,,precision and recall then they should Dialogue: 0,1:04:01.04,1:04:03.24,Default,,0000,0000,0000,,choose between the bagging model with a Dialogue: 0,1:04:03.24,1:04:05.96,Default,,0000,0000,0000,,0.4 decision threshold or the random Dialogue: 0,1:04:05.96,1:04:09.60,Default,,0000,0000,0000,,forest model with a 0.5 threshold but if Dialogue: 0,1:04:09.60,1:04:11.88,Default,,0000,0000,0000,,they don't care so much about predicting Dialogue: 0,1:04:11.88,1:04:14.48,Default,,0000,0000,0000,,every failure and they want the highest Dialogue: 0,1:04:14.48,1:04:16.76,Default,,0000,0000,0000,,precision possible then they should opt Dialogue: 0,1:04:16.76,1:04:19.80,Default,,0000,0000,0000,,for the bagging Tomek links classifier Dialogue: 0,1:04:19.80,1:04:23.16,Default,,0000,0000,0000,,with a bit higher decision threshold and
Dialogue: 0,1:04:23.16,1:04:26.16,Default,,0000,0000,0000,,so this is the key thing that the data Dialogue: 0,1:04:26.16,1:04:28.32,Default,,0000,0000,0000,,scientist is going to give right this is Dialogue: 0,1:04:28.32,1:04:30.76,Default,,0000,0000,0000,,the key takeaway this is the kind of the Dialogue: 0,1:04:30.76,1:04:32.68,Default,,0000,0000,0000,,end result of the entire machine Dialogue: 0,1:04:32.68,1:04:34.68,Default,,0000,0000,0000,,learning life cycle right now the data Dialogue: 0,1:04:34.68,1:04:36.40,Default,,0000,0000,0000,,scientist is going to tell the Dialogue: 0,1:04:36.40,1:04:38.60,Default,,0000,0000,0000,,engineering team all right you guys Dialogue: 0,1:04:38.60,1:04:41.16,Default,,0000,0000,0000,,which is more important for you point A Dialogue: 0,1:04:41.16,1:04:45.04,Default,,0000,0000,0000,,point B or point C make your decision so Dialogue: 0,1:04:45.04,1:04:47.40,Default,,0000,0000,0000,,the engineering team will then discuss Dialogue: 0,1:04:47.40,1:04:48.96,Default,,0000,0000,0000,,among themselves and say hey you know Dialogue: 0,1:04:48.96,1:04:52.28,Default,,0000,0000,0000,,what we want is we want to get the Dialogue: 0,1:04:52.28,1:04:54.72,Default,,0000,0000,0000,,highest failure detection possible Dialogue: 0,1:04:54.72,1:04:58.36,Default,,0000,0000,0000,,because any kind of failure of that Dialogue: 0,1:04:58.36,1:05:00.40,Default,,0000,0000,0000,,machine or the product on the assembly Dialogue: 0,1:05:00.40,1:05:03.12,Default,,0000,0000,0000,,line is really going to screw us up big Dialogue: 0,1:05:03.12,1:05:05.64,Default,,0000,0000,0000,,time so what we're looking for is the Dialogue: 0,1:05:05.64,1:05:08.08,Default,,0000,0000,0000,,model that will give us the highest Dialogue: 0,1:05:08.08,1:05:10.88,Default,,0000,0000,0000,,failure detection rate we don't care Dialogue: 0,1:05:10.88,1:05:13.48,Default,,0000,0000,0000,,about precision but we want to make Dialogue: 0,1:05:13.48,1:05:15.44,Default,,0000,0000,0000,,sure that if there's a failure we are Dialogue: 0,1:05:15.44,1:05:17.72,Default,,0000,0000,0000,,going to catch it right so that's what Dialogue: 0,1:05:17.72,1:05:19.60,Default,,0000,0000,0000,,they want and so the data scientist will Dialogue: 0,1:05:19.60,1:05:22.20,Default,,0000,0000,0000,,say hey you go for the balanced bagging Dialogue: 0,1:05:22.20,1:05:24.88,Default,,0000,0000,0000,,model okay then the data scientist saves Dialogue: 0,1:05:24.88,1:05:27.72,Default,,0000,0000,0000,,this all right uh and then once you have Dialogue: 0,1:05:27.72,1:05:30.00,Default,,0000,0000,0000,,saved this uh you can then go right Dialogue: 0,1:05:30.00,1:05:32.32,Default,,0000,0000,0000,,ahead and deploy that so you can go Dialogue: 0,1:05:32.32,1:05:33.52,Default,,0000,0000,0000,,right ahead and deploy that to Dialogue: 0,1:05:33.52,1:05:37.16,Default,,0000,0000,0000,,production okay
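A minimal sketch of saving the chosen model so it can be deployed; joblib is assumed here as the persistence mechanism (a common choice for scikit-learn models, not necessarily what the presenter used), and the file name is hypothetical.

import joblib

# persist the model the engineering team picked ...
joblib.dump(models["balanced_bagging"], "balanced_bagging_model.joblib")

# ... and load it back later, for example on the production server
chosen_model = joblib.load("balanced_bagging_model.joblib")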
Dialogue: 0,1:05:37.16,1:05:38.84,Default,,0000,0000,0000,,and so if you want to continue we can actually further Dialogue: 0,1:05:38.84,1:05:41.12,Default,,0000,0000,0000,,continue this modeling problem so just Dialogue: 0,1:05:41.12,1:05:43.48,Default,,0000,0000,0000,,now I modeled this problem as a binary Dialogue: 0,1:05:43.48,1:05:46.72,Default,,0000,0000,0000,,classification problem uh sorry just now I Dialogue: 0,1:05:46.72,1:05:48.24,Default,,0000,0000,0000,,modeled this problem as a binary Dialogue: 0,1:05:48.24,1:05:49.52,Default,,0000,0000,0000,,classification which means it's either Dialogue: 0,1:05:49.52,1:05:51.68,Default,,0000,0000,0000,,zero or one either fail or not fail but Dialogue: 0,1:05:51.68,1:05:53.60,Default,,0000,0000,0000,,we can also model it as a multiclass Dialogue: 0,1:05:53.60,1:05:55.64,Default,,0000,0000,0000,,classification problem right because as Dialogue: 0,1:05:55.64,1:05:57.64,Default,,0000,0000,0000,,I said earlier just now for the Dialogue: 0,1:05:57.64,1:06:00.20,Default,,0000,0000,0000,,target variable column which is sorry for Dialogue: 0,1:06:00.20,1:06:02.52,Default,,0000,0000,0000,,the failure type column you actually Dialogue: 0,1:06:02.52,1:06:04.84,Default,,0000,0000,0000,,have multiple kinds of failures right Dialogue: 0,1:06:04.84,1:06:07.56,Default,,0000,0000,0000,,for example you may have a power failure Dialogue: 0,1:06:07.56,1:06:10.00,Default,,0000,0000,0000,,uh you may have a tool wear failure uh you Dialogue: 0,1:06:10.00,1:06:12.92,Default,,0000,0000,0000,,may have an overstrain failure so now we Dialogue: 0,1:06:12.92,1:06:14.84,Default,,0000,0000,0000,,can model the problem slightly Dialogue: 0,1:06:14.84,1:06:17.24,Default,,0000,0000,0000,,differently so we can model it as a Dialogue: 0,1:06:17.24,1:06:19.68,Default,,0000,0000,0000,,multiclass classification problem and Dialogue: 0,1:06:19.68,1:06:21.16,Default,,0000,0000,0000,,then we go through the entire same Dialogue: 0,1:06:21.16,1:06:22.68,Default,,0000,0000,0000,,process that we went through just now so Dialogue: 0,1:06:22.68,1:06:24.88,Default,,0000,0000,0000,,we create different models we test this Dialogue: 0,1:06:24.88,1:06:26.72,Default,,0000,0000,0000,,out but now the confusion matrix is for Dialogue: 0,1:06:26.72,1:06:30.12,Default,,0000,0000,0000,,a multiclass classification issue right Dialogue: 0,1:06:30.12,1:06:30.96,Default,,0000,0000,0000,,so we're going Dialogue: 0,1:06:30.96,1:06:34.04,Default,,0000,0000,0000,,to check them out we're going to again Dialogue: 0,1:06:34.04,1:06:36.08,Default,,0000,0000,0000,,uh try different algorithms or models Dialogue: 0,1:06:36.08,1:06:38.04,Default,,0000,0000,0000,,again train and test our data set do the Dialogue: 0,1:06:38.04,1:06:39.76,Default,,0000,0000,0000,,train test split uh on these Dialogue: 0,1:06:39.76,1:06:42.00,Default,,0000,0000,0000,,different models all right so we have Dialogue: 0,1:06:42.00,1:06:43.40,Default,,0000,0000,0000,,like for example a random Dialogue: 0,1:06:43.40,1:06:46.16,Default,,0000,0000,0000,,forest a balanced random forest a grid search Dialogue: 0,1:06:46.16,1:06:47.72,Default,,0000,0000,0000,,then you train the models using what is Dialogue: 0,1:06:47.72,1:06:49.68,Default,,0000,0000,0000,,called hyperparameter tuning then you Dialogue: 0,1:06:49.68,1:06:51.08,Default,,0000,0000,0000,,get the scores all right so you get the Dialogue: 0,1:06:51.08,1:06:53.16,Default,,0000,0000,0000,,same evaluation scores again you check Dialogue: 0,1:06:53.16,1:06:54.60,Default,,0000,0000,0000,,out the evaluation scores compare Dialogue: 0,1:06:54.60,1:06:57.08,Default,,0000,0000,0000,,between them generate a confusion matrix Dialogue: 0,1:06:57.08,1:06:59.96,Default,,0000,0000,0000,,so this is a multiclass confusion matrix
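A sketch of hyperparameter tuning with a grid search for this multiclass variant; BalancedRandomForestClassifier from imbalanced-learn and the parameter grid are assumptions for illustration, and in this setting y_train would hold the multiclass failure type labels rather than the binary target.

from sklearn.model_selection import GridSearchCV
from imblearn.ensemble import BalancedRandomForestClassifier

# grid search over a balanced random forest, scored with the macro-averaged F1
# across all classes; here y_train is assumed to be the failure type labels
grid = GridSearchCV(
    BalancedRandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [100, 200], "max_depth": [None, 10]},
    scoring="f1_macro", cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)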
Dialogue: 0,1:06:59.96,1:07:02.40,Default,,0000,0000,0000,,and then you come to the final Dialogue: 0,1:07:02.40,1:07:05.76,Default,,0000,0000,0000,,conclusion so now if you are interested Dialogue: 0,1:07:05.76,1:07:09.00,Default,,0000,0000,0000,,to frame your problem domain as a Dialogue: 0,1:07:09.00,1:07:11.36,Default,,0000,0000,0000,,multiclass classification problem all Dialogue: 0,1:07:11.36,1:07:13.84,Default,,0000,0000,0000,,right then these are the recommendations Dialogue: 0,1:07:13.84,1:07:15.48,Default,,0000,0000,0000,,from the data scientist so the data Dialogue: 0,1:07:15.48,1:07:17.24,Default,,0000,0000,0000,,scientist will say you know what I'm Dialogue: 0,1:07:17.24,1:07:19.56,Default,,0000,0000,0000,,going to pick this particular model the Dialogue: 0,1:07:19.56,1:07:22.04,Default,,0000,0000,0000,,balanced bagging classifier and these are Dialogue: 0,1:07:22.04,1:07:24.52,Default,,0000,0000,0000,,all the reasons that the data scientist Dialogue: 0,1:07:24.52,1:07:27.28,Default,,0000,0000,0000,,is going to give as a rationale for Dialogue: 0,1:07:27.28,1:07:29.40,Default,,0000,0000,0000,,selecting this particular Dialogue: 0,1:07:29.40,1:07:32.04,Default,,0000,0000,0000,,model and then once that's done you save Dialogue: 0,1:07:32.04,1:07:35.00,Default,,0000,0000,0000,,the model and that's it Dialogue: 0,1:07:35.00,1:07:38.92,Default,,0000,0000,0000,,so that's all done now and so then the Dialogue: 0,1:07:38.92,1:07:41.04,Default,,0000,0000,0000,,uh the machine learning model Dialogue: 0,1:07:41.04,1:07:43.72,Default,,0000,0000,0000,,now you can put it live run it on the Dialogue: 0,1:07:43.72,1:07:45.28,Default,,0000,0000,0000,,server and now the machine learning Dialogue: 0,1:07:45.28,1:07:47.20,Default,,0000,0000,0000,,model is ready to work which means it's Dialogue: 0,1:07:47.20,1:07:48.92,Default,,0000,0000,0000,,ready to generate predictions right Dialogue: 0,1:07:48.92,1:07:50.28,Default,,0000,0000,0000,,that's the main job of the machine Dialogue: 0,1:07:50.28,1:07:52.04,Default,,0000,0000,0000,,learning model you have picked the best Dialogue: 0,1:07:52.04,1:07:53.68,Default,,0000,0000,0000,,machine learning model with the best Dialogue: 0,1:07:53.68,1:07:55.80,Default,,0000,0000,0000,,evaluation metrics for whatever Dialogue: 0,1:07:55.80,1:07:57.76,Default,,0000,0000,0000,,accuracy goal you're trying to achieve and Dialogue: 0,1:07:57.76,1:07:59.64,Default,,0000,0000,0000,,now you're going to run it on a server Dialogue: 0,1:07:59.64,1:08:00.80,Default,,0000,0000,0000,,and now you're going to get all this Dialogue: 0,1:08:00.80,1:08:02.96,Default,,0000,0000,0000,,real time data that's coming from your Dialogue: 0,1:08:02.96,1:08:04.52,Default,,0000,0000,0000,,sensors you're going to pump that into Dialogue: 0,1:08:04.52,1:08:06.36,Default,,0000,0000,0000,,your machine learning model your machine Dialogue: 0,1:08:06.36,1:08:07.88,Default,,0000,0000,0000,,learning model will pump out a whole Dialogue: 0,1:08:07.88,1:08:09.52,Default,,0000,0000,0000,,bunch of predictions and we're going to Dialogue: 0,1:08:09.52,1:08:12.80,Default,,0000,0000,0000,,use those predictions in real time to Dialogue: 0,1:08:12.80,1:08:15.40,Default,,0000,0000,0000,,make real-time real-world decision Dialogue: 0,1:08:15.40,1:08:17.56,Default,,0000,0000,0000,,making right you're going to say okay Dialogue: 0,1:08:17.56,1:08:19.60,Default,,0000,0000,0000,,I'm predicting that this machine is Dialogue: 0,1:08:19.60,1:08:23.20,Default,,0000,0000,0000,,going to fail on Thursday at 5:00 p.m. Dialogue: 0,1:08:23.20,1:08:25.52,Default,,0000,0000,0000,,so you better get your service folks in Dialogue: 0,1:08:25.52,1:08:28.64,Default,,0000,0000,0000,,to service it on Thursday at 2:00 p.m. or you Dialogue: 0,1:08:28.64,1:08:31.64,Default,,0000,0000,0000,,know whatever so you can you know uh Dialogue: 0,1:08:31.64,1:08:33.48,Default,,0000,0000,0000,,make decisions on when you want to do Dialogue: 0,1:08:33.48,1:08:35.32,Default,,0000,0000,0000,,your maintenance you know and make Dialogue: 0,1:08:35.32,1:08:37.64,Default,,0000,0000,0000,,the best decisions to optimize the cost Dialogue: 0,1:08:37.64,1:08:41.16,Default,,0000,0000,0000,,of maintenance etc etc
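A minimal sketch of how the deployed model might be used on the server to turn an incoming sensor reading into a prediction; the function name, the column names and the example reading are hypothetical, and the reading is assumed to already be encoded and scaled like the training data.

import joblib
import pandas as pd

# load the persisted model once when the serving process starts
model = joblib.load("balanced_bagging_model.joblib")
feature_cols = ["Type", "Air temperature [K]", "Process temperature [K]",
                "Rotational speed [rpm]", "Torque [Nm]", "Tool wear [min]"]

def predict_failure(sensor_reading):
    # sensor_reading: dict of already feature-engineered values for one machine
    row = pd.DataFrame([sensor_reading], columns=feature_cols)
    return int(model.predict(row)[0])

# hypothetical scaled reading streamed from the shop-floor sensors
reading = {"Type": 1.0, "Air temperature [K]": 0.42, "Process temperature [K]": 0.55,
           "Rotational speed [rpm]": 0.31, "Torque [Nm]": 0.64, "Tool wear [min]": 0.77}
print(predict_failure(reading))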
Dialogue: 0,1:08:41.16,1:08:42.12,Default,,0000,0000,0000,,and then based on the Dialogue: 0,1:08:42.12,1:08:45.00,Default,,0000,0000,0000,,results that are coming back from the Dialogue: 0,1:08:45.00,1:08:46.76,Default,,0000,0000,0000,,predictions so the predictions may be Dialogue: 0,1:08:46.76,1:08:49.12,Default,,0000,0000,0000,,good the predictions may be lousy the Dialogue: 0,1:08:49.12,1:08:51.36,Default,,0000,0000,0000,,predictions may be average right so we Dialogue: 0,1:08:51.36,1:08:53.72,Default,,0000,0000,0000,,are constantly monitoring how good Dialogue: 0,1:08:53.72,1:08:55.44,Default,,0000,0000,0000,,or how useful are the predictions Dialogue: 0,1:08:55.44,1:08:57.76,Default,,0000,0000,0000,,generated by this real-time model that's Dialogue: 0,1:08:57.76,1:08:59.88,Default,,0000,0000,0000,,running on the server and based on our Dialogue: 0,1:08:59.88,1:09:02.68,Default,,0000,0000,0000,,monitoring we will then take some new Dialogue: 0,1:09:02.68,1:09:05.32,Default,,0000,0000,0000,,data and then repeat this entire life Dialogue: 0,1:09:05.32,1:09:07.04,Default,,0000,0000,0000,,cycle again so this is basically a Dialogue: 0,1:09:07.04,1:09:09.24,Default,,0000,0000,0000,,workflow that's iterative and we are Dialogue: 0,1:09:09.24,1:09:11.12,Default,,0000,0000,0000,,constantly or the data scientist is Dialogue: 0,1:09:11.12,1:09:13.32,Default,,0000,0000,0000,,constantly getting in all these new data Dialogue: 0,1:09:13.32,1:09:15.28,Default,,0000,0000,0000,,points and then refining the model Dialogue: 0,1:09:15.28,1:09:17.96,Default,,0000,0000,0000,,picking maybe a new model deploying the Dialogue: 0,1:09:17.96,1:09:21.68,Default,,0000,0000,0000,,new model onto the server and so on all Dialogue: 0,1:09:21.68,1:09:23.92,Default,,0000,0000,0000,,right and so that's it so that is Dialogue: 0,1:09:23.92,1:09:26.40,Default,,0000,0000,0000,,basically your machine learning workflow Dialogue: 0,1:09:26.40,1:09:29.48,Default,,0000,0000,0000,,in a nutshell okay so for this Dialogue: 0,1:09:29.48,1:09:32.08,Default,,0000,0000,0000,,particular approach we have used a bunch Dialogue: 0,1:09:32.08,1:09:34.56,Default,,0000,0000,0000,,of uh data science libraries from Python Dialogue: 0,1:09:34.56,1:09:36.52,Default,,0000,0000,0000,,so we have used pandas which is the most Dialogue: 0,1:09:36.52,1:09:38.56,Default,,0000,0000,0000,,basic data science library that Dialogue: 0,1:09:38.56,1:09:40.28,Default,,0000,0000,0000,,provides all the tools to work with raw Dialogue: 0,1:09:40.28,1:09:42.52,Default,,0000,0000,0000,,data we have used NumPy which is a high Dialogue: 0,1:09:42.52,1:09:44.08,Default,,0000,0000,0000,,performance library for implementing Dialogue: 0,1:09:44.08,1:09:46.44,Default,,0000,0000,0000,,complex array and matrix operations we have Dialogue: 0,1:09:46.44,1:09:49.56,Default,,0000,0000,0000,,used matplotlib and seaborn which are used Dialogue: 0,1:09:49.56,1:09:52.44,Default,,0000,0000,0000,,for doing the EDA the Dialogue: 0,1:09:52.44,1:09:55.56,Default,,0000,0000,0000,,exploratory data analysis phase of machine
Dialogue: 0,1:09:55.56,1:09:57.04,Default,,0000,0000,0000,,learning where you visualize all your Dialogue: 0,1:09:57.04,1:09:59.04,Default,,0000,0000,0000,,data we have used scikit-learn which is Dialogue: 0,1:09:59.04,1:10:01.28,Default,,0000,0000,0000,,the machine learning library to do all Dialogue: 0,1:10:01.28,1:10:02.92,Default,,0000,0000,0000,,your implementation for all your core Dialogue: 0,1:10:02.92,1:10:06.00,Default,,0000,0000,0000,,machine learning algorithms uh we Dialogue: 0,1:10:06.00,1:10:08.00,Default,,0000,0000,0000,,have not used these because this is not a Dialogue: 0,1:10:08.00,1:10:11.04,Default,,0000,0000,0000,,deep learning uh problem but if you are Dialogue: 0,1:10:11.04,1:10:12.80,Default,,0000,0000,0000,,working with a deep learning problem Dialogue: 0,1:10:12.80,1:10:15.36,Default,,0000,0000,0000,,like image classification image Dialogue: 0,1:10:15.36,1:10:17.84,Default,,0000,0000,0000,,recognition object detection okay Dialogue: 0,1:10:17.84,1:10:20.20,Default,,0000,0000,0000,,natural language processing text Dialogue: 0,1:10:20.20,1:10:21.92,Default,,0000,0000,0000,,classification well then you're going to Dialogue: 0,1:10:21.92,1:10:24.36,Default,,0000,0000,0000,,use these libraries from Python which are Dialogue: 0,1:10:24.36,1:10:28.96,Default,,0000,0000,0000,,TensorFlow okay and also Dialogue: 0,1:10:28.96,1:10:32.68,Default,,0000,0000,0000,,PyTorch and then lastly that whole thing that Dialogue: 0,1:10:32.68,1:10:34.72,Default,,0000,0000,0000,,whole data science project that you saw Dialogue: 0,1:10:34.72,1:10:36.80,Default,,0000,0000,0000,,just now this entire data science Dialogue: 0,1:10:36.80,1:10:38.88,Default,,0000,0000,0000,,project is actually developed in Dialogue: 0,1:10:38.88,1:10:41.08,Default,,0000,0000,0000,,something called a Jupyter notebook so Dialogue: 0,1:10:41.08,1:10:44.04,Default,,0000,0000,0000,,all this Python code along with all the Dialogue: 0,1:10:44.04,1:10:46.36,Default,,0000,0000,0000,,observations from the data Dialogue: 0,1:10:46.36,1:10:48.68,Default,,0000,0000,0000,,scientist okay for this entire data Dialogue: 0,1:10:48.68,1:10:50.44,Default,,0000,0000,0000,,science project was actually run in Dialogue: 0,1:10:50.44,1:10:53.36,Default,,0000,0000,0000,,something called a Jupyter notebook so Dialogue: 0,1:10:53.36,1:10:55.76,Default,,0000,0000,0000,,that is uh the Dialogue: 0,1:10:55.76,1:10:59.08,Default,,0000,0000,0000,,most widely used tool for interactively Dialogue: 0,1:10:59.08,1:11:02.36,Default,,0000,0000,0000,,developing and presenting data science Dialogue: 0,1:11:02.36,1:11:04.64,Default,,0000,0000,0000,,projects okay so that brings me to the Dialogue: 0,1:11:04.64,1:11:07.40,Default,,0000,0000,0000,,end of this entire presentation I hope Dialogue: 0,1:11:07.40,1:11:10.36,Default,,0000,0000,0000,,that you find it useful and that Dialogue: 0,1:11:10.36,1:11:13.20,Default,,0000,0000,0000,,you can appreciate the importance of Dialogue: 0,1:11:13.20,1:11:15.28,Default,,0000,0000,0000,,machine learning and how it can be Dialogue: 0,1:11:15.28,1:11:19.80,Default,,0000,0000,0000,,applied in a real life use case in a Dialogue: 0,1:11:19.80,1:11:23.36,Default,,0000,0000,0000,,typical production environment all right Dialogue: 0,1:11:23.36,1:11:27.24,Default,,0000,0000,0000,,thank you all so much for watching