0:00:01.920,0:00:04.072 [CARLOTTA]: Great, so I think we can start, 0:00:04.072,0:00:06.340 since the meeting is recorded, so if 0:00:06.340,0:00:10.090 anyone joins later, they 0:00:10.090,0:00:12.420 can watch the recording. 0:00:12.420,0:00:15.780 So, hi everyone, and welcome to this 0:00:15.780,0:00:18.000 Cloud Skills Challenge study session 0:00:18.000,0:00:20.880 on creating classification models 0:00:20.880,0:00:24.000 with Azure Machine Learning designer. 0:00:24.000,0:00:27.240 Today I'm thrilled to be here with 0:00:27.240,0:00:29.149 John. John, would you mind 0:00:29.149,0:00:31.619 briefly introducing yourself? 0:00:31.619,0:00:33.160 [JOHN]: Thank you, Carlotta. 0:00:33.160,0:00:34.160 Hello everyone. 0:00:34.160,0:00:38.080 Welcome to our workshop today. I hope 0:00:38.080,0:00:40.559 that you are all excited for it. I am 0:00:40.559,0:00:43.140 John Aziz, a Gold Microsoft Learn Student 0:00:43.140,0:00:47.460 Ambassador, and I will be here with 0:00:47.460,0:00:50.760 Carlotta to do the practical part 0:00:50.760,0:00:53.820 of this module of the Cloud Skills 0:00:53.820,0:00:56.623 Challenge. Thank you for having me. 0:00:56.623,0:00:58.219 [CARLOTTA]: Perfect, thanks John. 0:00:58.219,0:00:59.623 So for those who 0:00:59.623,0:01:03.440 don't know me, I'm Carlotta Castelluccio, 0:01:03.440,0:01:06.479 based in Italy and focused on AI and 0:01:06.479,0:01:08.760 machine learning technologies and 0:01:08.760,0:01:11.200 their use in education. 0:01:11.200,0:01:12.340 So, 0:01:12.737,0:01:14.537 this Cloud Skills Challenge study 0:01:14.537,0:01:17.117 session is based on a 0:01:17.120,0:01:21.080 dedicated Learn module. I sent you 0:01:21.320,0:01:23.939 the link to this module in the chat 0:01:23.939,0:01:25.619 so that you can follow along with the 0:01:25.619,0:01:28.680 module if you want, or just have a look at 0:01:28.680,0:01:32.470 the module later at your own pace.
0:01:32.470,0:01:33.780 Um... 0:01:33.780,0:01:37.020 So, before starting I would also like to 0:01:37.020,0:01:40.619 remind you of the code of 0:01:40.619,0:01:43.439 conduct and guidelines of our Student 0:01:43.439,0:01:47.510 Ambassadors community. So please, during this 0:01:47.510,0:01:51.000 meeting, be respectful and inclusive, 0:01:51.000,0:01:53.579 be friendly, open, and welcoming, and 0:01:53.579,0:01:56.159 be respectful of each other's 0:01:56.159,0:01:57.720 differences. 0:01:57.720,0:02:01.200 If you want to learn more about the code 0:02:01.200,0:02:03.390 of conduct, you can use this link in the 0:02:03.390,0:02:08.880 deck: aka.ms/SACoC. 0:02:09.660,0:02:11.730 And now 0:02:11.730,0:02:15.420 we are ready to start our session. 0:02:15.420,0:02:18.959 So, as we mentioned, we are going to 0:02:18.959,0:02:21.980 focus on classification models and Azure ML 0:02:21.980,0:02:24.900 today. So, first of all, we are going 0:02:24.900,0:02:28.430 to identify the kinds 0:02:28.430,0:02:31.080 of scenarios in which you should 0:02:31.080,0:02:34.490 choose to use a classification model. 0:02:34.490,0:02:36.660 We're going to introduce Azure Machine 0:02:36.660,0:02:39.060 Learning and Azure Machine Learning designer. 0:02:39.060,0:02:41.879 We're going to understand which 0:02:41.879,0:02:43.680 steps to follow to create a 0:02:43.680,0:02:46.200 classification model in Azure Machine 0:02:46.200,0:02:48.076 Learning, and then John will, 0:02:48.076,0:02:49.500 um, 0:02:49.500,0:02:52.219 lead an amazing demo about training and 0:02:52.219,0:02:54.300 publishing a classification model in 0:02:54.300,0:02:57.000 Azure ML designer. 0:02:57.000,0:02:59.819 So, let's start from the beginning. Let's 0:02:59.819,0:03:02.640 start by identifying classification 0:03:02.640,0:03:05.220 machine learning scenarios. 0:03:05.220,0:03:07.640 So, first of all, what is classification?
0:03:07.640,0:03:09.959 Classification is a form of machine 0:03:09.959,0:03:12.120 learning that is used to predict which 0:03:12.120,0:03:15.599 category, or class, an item belongs to. For 0:03:15.599,0:03:17.340 example, we might want to develop a 0:03:17.340,0:03:19.800 classifier able to identify whether an 0:03:19.800,0:03:22.200 incoming email should be filtered or not 0:03:22.200,0:03:25.080 according to its style, the sender, the 0:03:25.080,0:03:26.935 length of the email, etc. 0:03:26.935,0:03:28.140 In this case, the 0:03:28.140,0:03:30.060 characteristics of the email are the 0:03:30.060,0:03:31.080 features, 0:03:31.080,0:03:34.200 and the label is a classification of 0:03:34.200,0:03:38.099 either zero or one, representing spam 0:03:38.099,0:03:40.860 or non-spam for the incoming email. So 0:03:40.860,0:03:42.360 this is an example of a binary 0:03:42.360,0:03:44.400 classifier. If you want to assign 0:03:44.400,0:03:46.260 multiple categories to the incoming 0:03:46.260,0:03:48.959 email, like work letters, love letters, 0:03:48.959,0:03:52.080 complaints, or other categories, in this 0:03:52.080,0:03:54.000 case a binary classifier is no longer 0:03:54.000,0:03:55.739 enough, and we should develop a 0:03:55.739,0:03:58.319 multi-class classifier. Classification 0:03:58.319,0:04:00.599 is an example of what is called 0:04:00.599,0:04:02.519 supervised machine learning, 0:04:02.519,0:04:05.280 in which you train a model using data 0:04:05.280,0:04:07.080 that includes both the features and 0:04:07.080,0:04:08.879 known values for the label, 0:04:08.879,0:04:11.099 so that the model learns to fit the 0:04:11.099,0:04:13.560 feature combinations to the label. Then, 0:04:13.560,0:04:15.420 after training has been completed, you 0:04:15.420,0:04:17.040 can use the trained model to predict 0:04:17.040,0:04:19.500 labels for new items for which the 0:04:19.500,0:04:22.320 label is unknown.
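To make the features/label vocabulary concrete, here is a minimal Python sketch of supervised classification. It uses a nearest-centroid classifier, a deliberately simple technique chosen only for illustration (it is not what Azure Machine Learning designer uses), and the email feature values are invented. Training averages the feature vectors seen for each known label; prediction picks the label whose average is closest to a new item.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def train_nearest_centroid(features, labels):
    """Learn one centroid (the mean feature vector) per known label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        totals = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            totals[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [t / counts[y] for t in totals] for y, totals in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to the feature vector x."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


# Hypothetical email features: [number of links, number of exclamation marks].
X_train = [[0, 0], [1, 0], [8, 5], [9, 7]]
y_train = [0, 0, 1, 1]  # known labels: 0 = not spam, 1 = spam

model = train_nearest_centroid(X_train, y_train)
print(predict(model, [7, 6]))  # label for a new email whose label is unknown
```

Because labels are just dictionary keys, the same code also covers the multi-class case: passing labels such as "work", "complaint", and "other" produces one centroid per category.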
0:04:22.320,0:04:25.440 But let's see some examples of scenarios 0:04:25.440,0:04:27.120 for classification machine learning 0:04:27.120,0:04:29.160 models. We already mentioned an 0:04:29.160,0:04:31.020 example of a solution in which we would 0:04:31.020,0:04:33.660 need a classifier, but let's explore 0:04:33.660,0:04:35.699 other scenarios for classification in 0:04:35.699,0:04:37.979 other industries. For example, you can use 0:04:37.979,0:04:40.380 a classification model in a health 0:04:40.380,0:04:43.680 clinic scenario, and use clinical data to 0:04:43.680,0:04:45.720 predict whether a patient will become sick 0:04:45.720,0:04:47.060 or not. 0:04:47.060,0:04:49.553 You can use, um... 0:04:49.553,0:04:59.250 [NO AUDIO] 0:04:59.250,0:05:00.930 [JOHN]: Carlotta, you are muted. 0:05:03.780,0:05:07.700 [CARLOTTA]: Oh, sorry. So, was I muted for a 0:05:07.700,0:05:08.807 long time? 0:05:08.807,0:05:11.940 [JOHN]: "You can use 0:05:11.940,0:05:13.430 some models for classification. 0:05:13.430,0:05:14.729 For example, you can use..." 0:05:14.729,0:05:16.919 That's what you were saying. 0:05:16.919,0:05:20.020 [CARLOTTA]: So, was I on this slide, 0:05:20.020,0:05:21.660 or the previous one? 0:05:21.660,0:05:24.180 [JOHN]: This one. You have been muted 0:05:24.180,0:05:25.901 for one second [LAUGHS]. 0:05:25.901,0:05:28.018 [CARLOTTA]: Okay, okay, perfect, perfect. 0:05:28.018,0:05:30.419 Yeah, I was talking... sorry for 0:05:30.419,0:05:33.278 that. So, I was talking about the possible 0:05:33.278,0:05:34.560 scenarios in which 0:05:34.560,0:05:37.320 you can use a classification model, like 0:05:37.320,0:05:39.660 a health clinic scenario, a financial scenario, 0:05:39.660,0:05:41.699 or, the third one, a business type of 0:05:41.699,0:05:44.100 scenario.
You can use the characteristics of 0:05:44.100,0:05:45.900 a small business to predict whether a new 0:05:45.900,0:05:47.880 venture will succeed or not, for 0:05:47.880,0:05:49.560 example. And these are all types of 0:05:49.560,0:05:52.160 binary classification. 0:05:52.160,0:05:55.199 But today we are also going to talk 0:05:55.199,0:05:57.240 about Azure Machine Learning. So let's 0:05:57.240,0:05:58.139 see. 0:05:58.139,0:06:00.660 What is Azure Machine Learning? 0:06:00.660,0:06:02.160 Training and deploying an effective 0:06:02.160,0:06:04.199 machine learning model involves a lot of 0:06:04.199,0:06:06.539 work, much of it time-consuming and 0:06:06.539,0:06:08.880 resource-intensive. Azure Machine 0:06:08.880,0:06:11.039 Learning is a cloud-based service that 0:06:11.039,0:06:12.780 helps simplify some of the tasks it 0:06:12.780,0:06:15.720 takes to prepare data, train a model, and 0:06:15.720,0:06:18.060 also deploy it as a predictive service. 0:06:18.060,0:06:20.220 So it helps data scientists increase 0:06:20.220,0:06:22.380 their efficiency by automating many of 0:06:22.380,0:06:24.660 the time-consuming tasks associated with 0:06:24.660,0:06:27.539 creating and training a model. 0:06:27.539,0:06:29.520 It also enables them to use 0:06:29.520,0:06:31.740 cloud-based compute resources that scale 0:06:31.740,0:06:33.720 effectively to handle large volumes of 0:06:33.720,0:06:36.300 data while incurring costs only when 0:06:36.300,0:06:38.699 actually used.
0:06:38.699,0:06:41.220 To use Azure Machine Learning, 0:06:41.220,0:06:43.199 first things first, you need to create a 0:06:43.199,0:06:44.940 workspace resource in your Azure 0:06:44.940,0:06:47.520 subscription, and you can then use this 0:06:47.520,0:06:50.220 workspace to manage data, compute 0:06:50.220,0:06:52.440 resources, code, models, and other 0:06:52.440,0:06:54.959 artifacts. After you have created an 0:06:54.959,0:06:56.519 Azure Machine Learning workspace, 0:06:56.519,0:06:57.808 you can develop solutions with the 0:06:57.808,0:06:59.338 Azure Machine Learning service, 0:06:59.338,0:07:00.840 either with developer 0:07:00.840,0:07:02.580 tools or with the Azure Machine Learning 0:07:02.580,0:07:04.088 studio web portal. 0:07:04.088,0:07:06.440 In particular, Azure Machine Learning studio 0:07:06.440,0:07:07.800 is a web portal for machine 0:07:07.800,0:07:09.720 learning solutions in Azure, and it 0:07:09.720,0:07:11.639 includes a wide range of features and 0:07:11.639,0:07:13.800 capabilities that help data scientists 0:07:13.800,0:07:16.259 prepare data, train models, publish 0:07:16.259,0:07:18.479 predictive services, and also monitor 0:07:18.479,0:07:19.680 their usage. 0:07:19.680,0:07:22.139 So to begin using the web portal, you need 0:07:22.139,0:07:23.294 to assign the workspace 0:07:23.294,0:07:24.781 you created in the Azure portal 0:07:24.781,0:07:26.819 to Azure Machine 0:07:26.819,0:07:29.520 Learning studio. 0:07:29.520,0:07:31.800 At its core, Azure Machine Learning is a 0:07:31.800,0:07:33.720 service for training and managing 0:07:33.720,0:07:36.000 machine learning models, for which you 0:07:36.000,0:07:38.220 need compute resources on which to run 0:07:38.220,0:07:39.919 the training process. 0:07:39.919,0:07:44.280 Compute targets are one of the main 0:07:44.280,0:07:46.740 concepts of Azure Machine Learning.
0:07:46.740,0:07:48.780 They are cloud-based resources on which 0:07:48.780,0:07:50.639 you can run model training and data 0:07:50.639,0:07:53.220 exploration processes. 0:07:53.220,0:07:54.780 In Azure Machine Learning studio, you 0:07:54.780,0:07:56.759 can manage the compute targets for your 0:07:56.759,0:07:58.740 data science activities, and there are 0:07:58.740,0:08:03.240 four kinds of compute targets you can 0:08:03.240,0:08:05.940 create. We have compute instances, 0:08:05.940,0:08:09.539 which are virtual machines set up for 0:08:09.539,0:08:10.979 running machine learning code during 0:08:10.979,0:08:13.319 development, so they are not designed for 0:08:13.319,0:08:14.460 production. 0:08:14.460,0:08:17.099 Then we have compute clusters, which are 0:08:17.099,0:08:19.800 sets of virtual machines that can scale 0:08:19.800,0:08:22.199 up automatically based on demand. 0:08:22.199,0:08:24.599 We have inference clusters, which are 0:08:24.599,0:08:26.699 similar to compute clusters, but they are 0:08:26.699,0:08:29.340 designed for deployment, so they are 0:08:29.340,0:08:31.979 deployment targets for predictive 0:08:31.979,0:08:35.820 services that use trained models. 0:08:35.820,0:08:38.339 And finally, we have attached compute, 0:08:38.339,0:08:41.339 which is any compute target that you 0:08:41.339,0:08:44.159 manage yourself outside of Azure ML, like, 0:08:44.159,0:08:46.560 for example, virtual machines or Azure 0:08:46.560,0:08:49.700 Databricks clusters. 0:08:49.980,0:08:52.800 So we talked about Azure Machine 0:08:52.800,0:08:54.300 Learning, but we also 0:08:54.300,0:08:55.500 mentioned Azure Machine Learning 0:08:55.500,0:08:57.540 designer. What is Azure Machine Learning 0:08:57.540,0:09:00.120 designer? In Azure Machine Learning 0:09:00.120,0:09:02.880 studio, there are several ways to author 0:09:02.880,0:09:04.560 classification machine learning models.
0:09:04.560,0:09:08.100 One way is to use a visual interface, and 0:09:08.100,0:09:10.260 this visual interface is called designer, 0:09:10.260,0:09:13.140 and you can use it to train, test, and 0:09:13.140,0:09:15.540 also deploy machine learning models. 0:09:15.540,0:09:17.940 The drag-and-drop interface makes use of 0:09:17.940,0:09:20.279 clearly defined inputs and outputs that 0:09:20.279,0:09:22.680 can be shared, reused, and also 0:09:22.680,0:09:23.880 version-controlled. 0:09:23.880,0:09:25.920 Using the designer, you can identify 0:09:25.920,0:09:28.080 the building blocks, or components, needed 0:09:28.080,0:09:30.839 for your model, place and connect them on 0:09:30.839,0:09:33.120 your canvas, and run a machine learning 0:09:33.120,0:09:35.300 job. 0:09:35.399,0:09:36.779 So, 0:09:36.779,0:09:39.120 each designer project, that is, each project 0:09:39.120,0:09:42.360 in the designer, is known as a pipeline. 0:09:42.360,0:09:45.600 In the designer, we have a left panel 0:09:45.600,0:09:48.360 for navigation and a canvas on the 0:09:48.360,0:09:50.640 right-hand side in which you build your 0:09:50.640,0:09:53.940 pipeline visually. Pipelines let you 0:09:53.940,0:09:56.100 organize, manage, and reuse complex 0:09:56.100,0:09:58.260 machine learning workflows across 0:09:58.260,0:10:00.480 projects and users. 0:10:00.480,0:10:03.000 A pipeline starts with the dataset on 0:10:03.000,0:10:04.140 which you want to train the model, 0:10:04.140,0:10:05.880 because everything begins with data when 0:10:05.880,0:10:07.380 talking about data science and machine 0:10:07.380,0:10:09.540 learning. Each time you run a 0:10:09.540,0:10:10.980 pipeline, the configuration of the 0:10:10.980,0:10:12.959 pipeline and its results are stored in 0:10:12.959,0:10:17.339 your workspace as a pipeline job. 0:10:17.339,0:10:21.959 The second main concept of Azure 0:10:21.959,0:10:25.080 Machine Learning is the component.
So, going 0:10:25.080,0:10:28.440 hierarchically down from the pipeline, we can 0:10:28.440,0:10:30.540 say that each building block of a 0:10:30.540,0:10:32.920 pipeline is called a component. 0:10:32.920,0:10:34.120 In other words, an Azure Machine 0:10:34.120,0:10:36.959 Learning component encapsulates one step 0:10:36.959,0:10:39.420 in a machine learning pipeline. It's a 0:10:39.420,0:10:41.640 reusable piece of code with inputs and 0:10:41.640,0:10:44.100 outputs, something very similar to a 0:10:44.100,0:10:46.500 function in any programming language. 0:10:46.500,0:10:48.899 In a pipeline project, you can access 0:10:48.899,0:10:51.480 data assets and components from the left 0:10:51.480,0:10:52.700 panel's 0:10:52.700,0:10:56.279 Asset Library tab, as you can see 0:10:56.279,0:11:00.200 here in the screenshot in the deck. 0:11:00.300,0:11:03.360 You can create data assets using 0:11:03.360,0:11:08.339 a dedicated page called the Data page. A data 0:11:08.339,0:11:11.160 asset is a reference to a data source 0:11:11.160,0:11:12.480 location. 0:11:12.480,0:11:15.720 This data source location could be a 0:11:15.720,0:11:18.779 local file, a datastore, a web file, or 0:11:18.779,0:11:21.660 even an Azure Open Dataset. 0:11:21.660,0:11:23.880 These data assets will appear along 0:11:23.880,0:11:26.459 with the standard sample datasets in the 0:11:26.459,0:11:30.019 designer's Asset Library. 0:11:30.079,0:11:31.560 Um. 0:11:31.560,0:11:36.959 Another basic concept of Azure ML is 0:11:36.959,0:11:38.880 the Azure Machine Learning job. 0:11:38.880,0:11:43.519 Basically, when you submit a pipeline, 0:11:43.519,0:11:47.040 you create a job, which will run all the 0:11:47.040,0:11:49.920 steps in your pipeline. A job executes 0:11:49.920,0:11:52.800 a task against a specified compute 0:11:52.800,0:11:53.760 target.
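Carlotta's analogy of a component as a function with inputs and outputs, and of a job running every step of a pipeline, can be sketched in plain Python. This is a conceptual toy, not the Azure Machine Learning SDK: the components `select_columns` and `add_bmi_band`, the `run_pipeline` helper, and the data are all hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Component = Callable[[List[Row]], List[Row]]


def select_columns(columns: List[str]) -> Component:
    """Hypothetical parameterized component: keep only the named columns."""
    def step(rows: List[Row]) -> List[Row]:
        return [{c: row[c] for c in columns} for row in rows]
    return step


def add_bmi_band(rows: List[Row]) -> List[Row]:
    """Hypothetical component: derive a new column from an existing one."""
    return [{**row, "BMIBand": "high" if row["BMI"] >= 30 else "normal"} for row in rows]


def run_pipeline(steps: List[Component], data: List[Row]) -> List[Row]:
    """Run components in order, feeding each step's output into the next step,
    loosely like a pipeline job executing the steps connected on the canvas."""
    for step in steps:
        data = step(data)
    return data


dataset = [
    {"PatientID": 1, "Age": 50, "BMI": 33.1},
    {"PatientID": 2, "Age": 21, "BMI": 25.0},
]
result = run_pipeline([select_columns(["Age", "BMI"]), add_bmi_band], dataset)
print(result)
```

Each component only sees its input and returns a new output, which is what makes it reusable in other pipelines; the original `dataset` list is never modified.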
0:11:53.760,0:11:56.640 Jobs enable systematic tracking for your 0:11:56.640,0:11:58.560 machine learning experimentation in 0:11:58.560,0:11:59.880 Azure ML. 0:11:59.880,0:12:02.399 Once a job is created, Azure ML 0:12:02.399,0:12:05.459 maintains a run record for the 0:12:05.459,0:12:07.640 job. 0:12:07.877,0:12:12.180 But let's move on to the classification 0:12:12.180,0:12:14.040 steps. So, 0:12:14.040,0:12:17.160 let's introduce how to create a 0:12:17.160,0:12:21.360 classification model in Azure ML; you 0:12:21.360,0:12:23.640 will see it in more detail in a 0:12:23.640,0:12:26.339 hands-on demo that John will guide us 0:12:26.339,0:12:29.459 through in a few minutes. 0:12:29.459,0:12:32.220 You can think of the process to train 0:12:32.220,0:12:33.720 and evaluate a classification machine 0:12:33.720,0:12:36.660 learning model as four main steps. 0:12:36.660,0:12:38.459 First of all, you need to prepare your 0:12:38.459,0:12:41.100 data: you need to identify the 0:12:41.100,0:12:43.139 features and the label in your dataset, 0:12:43.139,0:12:46.139 and you need to pre-process it, that is, 0:12:46.139,0:12:48.839 clean and transform the data as needed. 0:12:48.839,0:12:51.120 Then, the second step, of course, is 0:12:51.120,0:12:52.740 training the model. 0:12:52.740,0:12:54.600 For training the model, you need to 0:12:54.600,0:12:57.060 split the data into two groups: a 0:12:57.060,0:12:59.519 training set and a validation set. 0:12:59.519,0:13:01.320 Then you train a machine learning model 0:13:01.320,0:13:03.540 using the training dataset, and you test 0:13:03.540,0:13:05.040 the machine learning model for 0:13:05.040,0:13:06.889 performance using the validation 0:13:06.889,0:13:08.100 dataset.
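The split into a training set and a validation set can be sketched in a few lines of Python. This is a stand-in for the idea, not the implementation of the designer's Split Data component: the 0.7 fraction and the seed value are arbitrary assumptions, and a fixed seed is used so that the same split comes out on every run.

```python
import random


def split_rows(rows, fraction=0.7, seed=123):
    """Randomly split rows into (training, validation) lists.

    fraction: share of rows assigned to the training set.
    seed: fixed random seed so the same split is produced on every run.
    """
    shuffled = list(rows)  # copy, so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # stand-ins for 10 data rows
train, validation = split_rows(rows)
print(len(train), len(validation))  # 7 3
```

Shuffling before cutting matters: if the source file happens to be sorted (for example by label), taking the first 70% of rows directly would give a training set that does not represent the data.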
0:13:08.100,0:13:12.180 The third step is performance evaluation, 0:13:12.180,0:13:14.519 which means comparing how close the 0:13:14.519,0:13:16.139 model's predictions are to the known 0:13:16.139,0:13:20.519 labels, and this leads us to compute some 0:13:20.519,0:13:23.279 evaluation performance metrics. 0:13:23.279,0:13:25.740 And then finally... 0:13:25.740,0:13:29.051 well, these three steps are 0:13:29.051,0:13:32.770 not performed every time in a 0:13:32.770,0:13:35.459 linear manner. It's more of an iterative 0:13:35.459,0:13:39.420 process. But once you achieve 0:13:39.420,0:13:42.959 a performance with which you are 0:13:42.959,0:13:45.779 satisfied, you are ready to, let's say, 0:13:45.779,0:13:48.660 go into production, and you can deploy 0:13:48.660,0:13:51.920 your trained model as a predictive service 0:13:51.920,0:13:55.980 to a real-time 0:13:55.980,0:13:58.019 endpoint. To do so, you need to 0:13:58.019,0:14:00.240 convert the training pipeline into a 0:14:00.240,0:14:02.820 real-time inference pipeline, and then 0:14:02.820,0:14:04.260 you can deploy the model as an 0:14:04.260,0:14:06.779 application on a server or device so 0:14:06.779,0:14:11.420 that others can consume this model. 0:14:11.459,0:14:14.279 So let's start with the first step, which 0:14:14.279,0:14:17.700 is preparing data. Real-world data can contain 0:14:17.700,0:14:19.920 many different issues that can affect 0:14:19.920,0:14:22.320 the utility of the data and our 0:14:22.320,0:14:24.959 interpretation of the results, and so also 0:14:24.959,0:14:26.579 the machine learning model that you 0:14:26.579,0:14:29.279 train using this data. For example, 0:14:29.279,0:14:31.440 real-world data can be affected by a bad 0:14:31.440,0:14:34.079 recording or a bad measurement, and it 0:14:34.079,0:14:36.480 can also contain missing values for some 0:14:36.480,0:14:38.880 parameters.
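Two typical preparation steps, dropping rows with missing values and rescaling a numeric column, can be sketched in plain Python. The function names and sample values are hypothetical, and this does not reproduce the designer's pre-built components. Dropping incomplete rows is only one strategy; filling in a replacement value such as the column mean is another.

```python
def drop_missing(rows):
    """Remove rows that contain any missing (None) value."""
    return [row for row in rows if all(v is not None for v in row.values())]


def min_max_normalize(rows, column):
    """Rescale one numeric column to the range 0..1 (min-max scaling)."""
    values = [row[column] for row in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid dividing by zero if all values are equal
    return [{**row, column: (row[column] - low) / span} for row in rows]


raw = [
    {"Age": 21, "BMI": 25.0},
    {"Age": 50, "BMI": None},  # e.g. a measurement that was never recorded
    {"Age": 61, "BMI": 35.0},
]
clean = min_max_normalize(drop_missing(raw), "Age")
print(clean)  # [{'Age': 0.0, 'BMI': 25.0}, {'Age': 1.0, 'BMI': 35.0}]
```

The order matters here: normalizing after cleaning means the minimum and maximum are computed only from the rows that will actually be used for training.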
Azure Machine Learning 0:14:38.880,0:14:40.860 designer has several pre-built 0:14:40.860,0:14:43.019 components that can be used to prepare 0:14:43.019,0:14:46.079 data for training. These components 0:14:46.079,0:14:48.300 enable you to clean data, normalize 0:14:48.300,0:14:52.940 features, join tables, and more. 0:14:53.000,0:14:57.120 Let's come to training. To train a 0:14:57.120,0:14:59.220 classification model, you need a dataset 0:14:59.220,0:15:02.160 that includes historical features, that is, the 0:15:02.160,0:15:03.899 characteristics of the entity for which 0:15:03.899,0:15:06.899 you want to make a prediction, and known label 0:15:06.899,0:15:09.779 values. The label is the class indicator 0:15:09.779,0:15:11.820 we want to train a model to predict. 0:15:11.820,0:15:13.920 It's common practice to train a 0:15:13.920,0:15:16.199 model using a subset of the data while 0:15:16.199,0:15:18.300 holding back some data with which to 0:15:18.300,0:15:20.760 test the trained model. This enables 0:15:20.760,0:15:22.440 you to compare the labels that the model 0:15:22.440,0:15:25.380 predicts with the actual known labels in 0:15:25.380,0:15:27.420 the original dataset. 0:15:27.420,0:15:29.880 This operation can be performed in the 0:15:29.880,0:15:32.100 designer using the Split Data component, 0:15:32.100,0:15:34.740 as shown by the screenshot here 0:15:34.740,0:15:36.660 in the deck. 0:15:36.660,0:15:39.540 There's also another component that you 0:15:39.540,0:15:40.980 should use, which is the Score Model 0:15:40.980,0:15:43.139 component, to generate the predicted 0:15:43.139,0:15:45.360 class label value using the validation 0:15:45.360,0:15:48.060 data as input.
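Scoring means passing the held-back validation rows through the trained model so that each predicted label sits next to the known one. A minimal sketch, assuming a toy glucose threshold rule as a stand-in for a real trained model; the `score_model` helper is hypothetical, and its `Scored Labels` column name is only meant to echo the designer's scored output.

```python
def score_model(model, rows, feature_columns):
    """Return copies of rows with a 'Scored Labels' column added.

    model: any callable that maps a list of feature values to a class label.
    """
    return [
        {**row, "Scored Labels": model([row[c] for c in feature_columns])}
        for row in rows
    ]


def glucose_threshold_model(features):
    """Hypothetical stand-in for a trained model: flag high glucose as diabetic."""
    return 1 if features[0] >= 140 else 0


validation = [
    {"PlasmaGlucose": 171, "Diabetic": 1},
    {"PlasmaGlucose": 92, "Diabetic": 0},
    {"PlasmaGlucose": 150, "Diabetic": 0},
]
for row in score_model(glucose_threshold_model, validation, ["PlasmaGlucose"]):
    print(row["Diabetic"], row["Scored Labels"])  # actual label vs. predicted label
```

The third row is a deliberate miss (actual 0, predicted 1); comparisons like this are exactly what the evaluation step tabulates.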
So once you connect all 0:15:48.060,0:15:49.800 these components, 0:15:49.800,0:15:52.440 the component specifying the 0:15:52.440,0:15:54.959 model we are going to use, the Split Data 0:15:54.959,0:15:57.060 component, the Train Model component, 0:15:57.060,0:16:00.300 and the Score Model component, you want 0:16:00.300,0:16:02.639 to run a new experiment in 0:16:02.639,0:16:05.760 Azure ML, which will use the dataset 0:16:05.760,0:16:09.600 on the canvas to train and score a model. 0:16:09.600,0:16:12.000 After training a model, it is important, 0:16:12.000,0:16:14.639 as we said, to evaluate its performance, to 0:16:14.639,0:16:17.060 understand how well 0:16:17.060,0:16:20.760 our model is performing. 0:16:20.760,0:16:22.680 There are many performance metrics 0:16:22.680,0:16:24.600 and methodologies for evaluating how 0:16:24.600,0:16:27.000 well a model makes predictions. The 0:16:27.000,0:16:29.160 component to use to perform evaluation 0:16:29.160,0:16:32.220 in Azure ML designer is called, 0:16:32.220,0:16:35.060 intuitively enough, Evaluate Model. 0:16:35.060,0:16:38.339 Once the training and evaluation job 0:16:38.339,0:16:40.740 for the model is completed, you can review 0:16:40.740,0:16:42.959 evaluation metrics on the completed job 0:16:42.959,0:16:45.860 page by right-clicking on the component. 0:16:45.860,0:16:48.480 In the evaluation results, you can also 0:16:48.480,0:16:51.000 find the so-called confusion matrix that 0:16:51.000,0:16:53.399 you can see here on the right side of 0:16:53.399,0:16:55.079 this slide. 0:16:55.079,0:16:57.420 A confusion matrix shows cases where 0:16:57.420,0:16:59.220 both the predicted and actual values 0:16:59.220,0:17:01.980 were one, the so-called true positives, 0:17:01.980,0:17:04.500 at the top left, and also cases where 0:17:04.500,0:17:06.600 both the predicted and the actual values 0:17:06.600,0:17:08.459 were zero, the so-called true negatives, 0:17:08.459,0:17:10.919 at the bottom right.
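The confusion matrix layout just described, and the metrics usually derived from it, can be computed by hand. A minimal Python sketch with made-up labels. Putting actual labels on the rows and predicted labels on the columns, with class 1 listed first, is an assumption of this example (tools differ in orientation), chosen so that true positives land at the top left and true negatives at the bottom right. The remaining cells hold the false negatives (actual 1, predicted 0) and the false positives (actual 0, predicted 1).

```python
def confusion_matrix(actual, predicted, labels):
    """Count (actual, predicted) pairs.

    Rows are actual labels and columns are predicted labels, both in the order
    given by `labels`. With labels=[1, 0], true positives land at the top left
    and true negatives at the bottom right.
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def binary_metrics(matrix):
    """Derive accuracy, precision, and recall from a [[TP, FN], [FP, TN]] matrix."""
    (tp, fn), (fp, tn) = matrix
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": tp / (tp + fp),  # of predicted positives, how many were right
        "recall": tp / (tp + fn),  # of actual positives, how many were found
    }


actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 1, 0, 1]
m = confusion_matrix(actual, predicted, labels=[1, 0])
print(m)  # [[3, 1], [2, 2]]
print(binary_metrics(m))  # {'accuracy': 0.625, 'precision': 0.6, 'recall': 0.75}
```

Passing three labels to `confusion_matrix` gives a three-by-three table, one row and one column per class, which is the multi-class case of the same tabulation.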
The other 0:17:10.919,0:17:13.679 cells show cases where the predicted 0:17:13.679,0:17:15.380 and actual values differ, 0:17:15.380,0:17:17.939 called false positives and false 0:17:17.939,0:17:19.919 negatives, and this is an example of a 0:17:19.919,0:17:23.579 confusion matrix for a binary classifier. 0:17:23.579,0:17:25.559 For a multi-class classification 0:17:25.559,0:17:28.079 model, the same approach is used to 0:17:28.079,0:17:30.120 tabulate each possible combination of 0:17:30.120,0:17:32.940 actual and predicted value counts. So, 0:17:32.940,0:17:34.740 for example, a model with three possible 0:17:34.740,0:17:37.559 classes would result in a 0:17:37.559,0:17:39.120 three-by-three matrix. 0:17:39.120,0:17:41.880 The confusion matrix is also useful for 0:17:41.880,0:17:43.860 the metrics that can be derived from it, 0:17:43.860,0:17:48.260 like accuracy, recall, or precision. 0:17:49.320,0:17:52.080 We said that the last step is 0:17:52.080,0:17:55.620 deploying the trained model to a real-time 0:17:55.620,0:17:59.280 endpoint as a predictive service. In 0:17:59.280,0:18:00.900 order to turn your model into a 0:18:00.900,0:18:02.760 service that makes continuous 0:18:02.760,0:18:04.980 predictions, you need, first of all, to 0:18:04.980,0:18:08.039 create and then deploy an 0:18:08.039,0:18:10.080 inference pipeline. The process of 0:18:10.080,0:18:11.940 converting the training pipeline into a 0:18:11.940,0:18:13.980 real-time inference pipeline removes 0:18:13.980,0:18:16.260 training components and adds web service 0:18:16.260,0:18:18.960 inputs and outputs to handle requests. 0:18:18.960,0:18:21.240 The inference pipeline performs the 0:18:21.240,0:18:22.679 same data transformations as the 0:18:22.679,0:18:26.160 first pipeline, but for new data. Then it 0:18:26.160,0:18:28.679 uses the trained model to infer, or predict, 0:18:28.679,0:18:32.539 label values based on their features.
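The idea behind the inference pipeline can be sketched in plain Python: the training-time transformations are reused on new data, the trained model predicts, and web service input and output wrap the call. Everything here is hypothetical. The scaling ranges, weights, and bias are invented, and the `{"data": [...]}` request shape is an assumption for illustration, not the format of an Azure ML endpoint. The key point is that `transform` reuses parameters fixed during training instead of re-fitting them on each request, and no training code runs when a request is handled.

```python
import json
import math

# Hypothetical values "learned" by a training pipeline, made up for this sketch.
MIN_MAX = {"PlasmaGlucose": (50.0, 200.0), "BMI": (18.0, 58.0)}
WEIGHTS = {"PlasmaGlucose": 4.0, "BMI": 2.0}
BIAS = -3.0


def transform(record):
    """Apply the same min-max scaling that was fitted during training."""
    return {name: (record[name] - low) / (high - low) for name, (low, high) in MIN_MAX.items()}


def predict(features):
    """Logistic-style score: probability of label 1, thresholded at 0.5."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    probability = 1.0 / (1.0 + math.exp(-z))
    return {"label": int(probability >= 0.5), "probability": round(probability, 3)}


def handle_request(body):
    """Web service input (JSON text) -> transform -> predict -> web service output."""
    records = json.loads(body)["data"]
    return json.dumps({"result": [predict(transform(r)) for r in records]})


request = json.dumps({"data": [{"PlasmaGlucose": 171, "BMI": 43.5}, {"PlasmaGlucose": 80, "BMI": 22}]})
print(handle_request(request))
```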
0:18:32.820,0:18:36.120 So, I think I've talked a lot for now. 0:18:36.120,0:18:40.380 I would like to let John show us 0:18:40.380,0:18:44.340 something in practice with 0:18:44.340,0:18:47.280 the hands-on demo, so please, John, go 0:18:47.280,0:18:49.860 ahead, share your screen and guide us 0:18:49.860,0:18:52.380 through this demo of creating a 0:18:52.380,0:18:53.425 classification model with 0:18:53.425,0:18:55.860 the Azure Machine Learning designer. 0:18:55.860,0:18:58.509 [JOHN]: Thank you so much, Carlotta, for 0:18:58.509,0:19:00.690 this interesting explanation of the 0:19:00.690,0:19:03.810 Azure ML designer. And now 0:19:03.810,0:19:07.500 I'm going to start with you on the 0:19:07.500,0:19:10.200 practical demo part, so if you want to 0:19:10.200,0:19:13.320 follow along, go to the link that Carlotta 0:19:13.320,0:19:18.380 sent in the chat so you can do 0:19:18.380,0:19:21.840 the demo, the practical part, with me. 0:19:21.840,0:19:25.260 I'm just going to share my screen... 0:19:25.260,0:19:27.140 and... 0:19:27.140,0:19:31.559 ...go here. So... 0:19:31.559,0:19:34.320 where am I right now? I'm inside the 0:19:34.320,0:19:36.960 Microsoft Learn documentation. This is 0:19:36.960,0:19:40.260 the exercise part of this module, and we 0:19:40.260,0:19:43.080 will start by setting up two things, which 0:19:43.080,0:19:45.299 are prerequisites for us to work inside 0:19:45.299,0:19:49.919 this module: the resource group 0:19:49.919,0:19:52.400 and the Azure Machine Learning workspace, 0:19:52.400,0:19:55.620 and something extra, which is the compute 0:19:55.620,0:19:59.760 cluster that Carlotta talked about. So I 0:19:59.760,0:20:02.100 just want to make sure that you all have 0:20:02.100,0:20:05.660 a resource group created inside your 0:20:05.660,0:20:08.039 portal, inside your Microsoft Azure 0:20:08.039,0:20:11.100 platform. So this is my resource group. 0:20:11.100,0:20:14.640 Inside this resource group,
I 0:20:14.640,0:20:17.299 have created an Azure Machine Learning 0:20:17.299,0:20:21.539 workspace. So I'm just going to access 0:20:21.539,0:20:24.000 the workspace that I have created 0:20:24.000,0:20:27.000 already from this link. I am going to 0:20:27.000,0:20:30.240 open it, which is the studio web URL, and 0:20:30.240,0:20:33.000 I will follow the steps. So what is this? 0:20:33.000,0:20:35.760 This is your machine learning workspace, 0:20:35.760,0:20:38.220 or machine learning studio. You can do a 0:20:38.220,0:20:40.080 lot of things here, but we are going to 0:20:40.080,0:20:42.419 focus mainly on the designer, the 0:20:42.419,0:20:46.080 data, and the compute. Another 0:20:46.080,0:20:49.140 prerequisite here: as Carlotta told you, 0:20:49.140,0:20:51.480 we need some resources to power the 0:20:51.480,0:20:54.299 classification processes that 0:20:54.299,0:20:55.140 will happen. 0:20:55.140,0:20:58.080 So, we have created this compute 0:20:58.080,0:20:59.100 cluster, 0:20:59.100,0:21:02.880 and we have set some presets for 0:21:02.880,0:21:04.140 it. So 0:21:04.140,0:21:07.080 where can you find these presets? You go 0:21:07.080,0:21:10.200 here. Under "Create compute", you'll 0:21:10.200,0:21:13.220 find everything that you need to do. 0:21:13.220,0:21:16.740 The size is Standard_DS11_v2, 0:21:16.740,0:21:19.799 and it's a CPU, not a GPU, because we 0:21:19.799,0:21:22.500 don't need a GPU. 0:21:22.500,0:21:25.799 It is ready for us to use. 0:21:25.799,0:21:30.900 The next thing that we will look into 0:21:30.900,0:21:33.600 is the designer. How can you access the 0:21:33.600,0:21:35.100 designer? 0:21:35.100,0:21:37.679 You can either click on this icon or 0:21:37.679,0:21:40.020 open the navigation menu and click 0:21:40.020,0:21:42.299 on Designer. 0:21:42.900,0:21:45.780 Now I am inside my designer.
0:21:45.780,0:21:47.640 What we are going to build now is the 0:21:47.640,0:21:50.280 pipeline that Carlotta told you about. 0:21:50.280,0:21:54.360 And where can you find these steps? If 0:21:54.360,0:21:57.120 you follow along in the Learn module, you 0:21:57.120,0:21:58.740 will find everything that I'm doing 0:21:58.740,0:22:02.340 right now in detail, with screenshots, 0:22:02.340,0:22:05.820 of course. So I'm going to create a new 0:22:05.820,0:22:09.120 pipeline, and I can do so by clicking on 0:22:09.120,0:22:10.980 this plus button. 0:22:10.980,0:22:13.740 It's going to redirect me to the 0:22:13.740,0:22:17.100 designer, to author the pipeline, where 0:22:17.100,0:22:19.500 I can drag and drop the data and components 0:22:19.500,0:22:21.780 that Carlotta explained the difference 0:22:21.780,0:22:22.980 between. 0:22:22.980,0:22:26.340 And here I am going to make some changes 0:22:26.340,0:22:29.100 to the settings. I am going to connect 0:22:29.100,0:22:31.860 this to the compute cluster that I 0:22:31.860,0:22:35.120 created previously so I can utilize it. 0:22:35.120,0:22:38.100 From here I'm going to choose this 0:22:38.100,0:22:40.380 compute cluster demo that I showed 0:22:40.380,0:22:42.600 you before in the clusters here, 0:22:42.600,0:22:45.900 and I am going to change the name to 0:22:45.900,0:22:47.820 something more meaningful. Instead of 0:22:47.820,0:22:50.580 "Pipeline" and today's date, I'm going 0:22:50.580,0:22:53.760 to name it Diabetes... 0:22:53.760,0:22:56.120 uh... 0:22:56.120,0:23:00.020 let me just check: Diabetes Training. 0:23:00.020,0:23:05.100 Let's say Training 0.1, or 01, okay? 0:23:05.100,0:23:09.360 And I am going to close this tab in 0:23:09.360,0:23:12.000 order to have a bigger space to work 0:23:12.000,0:23:14.700 in, because this is where we will 0:23:14.700,0:23:17.220 work, where everything will happen.
So I 0:23:17.220,0:23:19.559 will click on Close from here, 0:23:19.559,0:23:23.460 and I will go to Data, and I will 0:23:23.460,0:23:25.620 create a new dataset. 0:23:25.620,0:23:27.900 How can I create a new dataset? There are 0:23:27.900,0:23:29.880 multiple options here: from 0:23:29.880,0:23:31.799 local files, from a datastore, from web 0:23:31.799,0:23:34.020 files, from open datasets, but I'm going 0:23:34.020,0:23:36.539 to choose "From web files", as this is the 0:23:36.539,0:23:40.280 way we're going to create our data. 0:23:40.280,0:23:43.380 The information for my dataset 0:23:43.380,0:23:47.340 I'm going to get from the Microsoft 0:23:47.340,0:23:50.820 Learn module. So if we go to the step 0:23:50.820,0:23:52.860 that says "Create a dataset", 0:23:52.860,0:23:55.020 under it, it illustrates that you can 0:23:55.020,0:23:57.720 access the data from inside the asset 0:23:57.720,0:23:59.760 library, and inside your asset library, 0:23:59.760,0:24:01.679 you'll find the data and the 0:24:01.679,0:24:05.539 components. And I'm going to select 0:24:05.539,0:24:09.000 this link because this is where my data 0:24:09.000,0:24:12.000 is stored. If you open this link, you will 0:24:12.000,0:24:14.820 find this is a CSV file, I think. 0:24:14.820,0:24:17.400 Yeah. And you can see, like, all the data is 0:24:17.400,0:24:18.360 here. 0:24:18.360,0:24:21.079 Now let's get back... 0:24:21.079,0:24:22.149 Um... 0:24:26.880,0:24:28.200 And you should give it a name that is 0:24:28.200,0:24:29.880 meaningful, but because I have already 0:24:29.880,0:24:31.820 created it twice before, I'm going to 0:24:31.820,0:24:34.980 add a number to the name. 0:24:34.980,0:24:37.559 The dataset is tabular. There is also 0:24:37.559,0:24:39.360 the File type, but this is a table, so we're 0:24:39.360,0:24:40.760 going to choose Tabular 0:24:40.760,0:24:42.240 as the data type 0:24:42.240,0:24:43.740 for "Dataset type".
0:24:43.740,0:24:46.260 Now we will click on "Next". That's gonna 0:24:46.260,0:24:51.179 preview, or display for you, the content 0:24:51.179,0:24:54.020 of this file that you have 0:24:54.020,0:24:57.419 imported into this workspace. 0:24:57.419,0:25:01.559 And these settings are 0:25:01.559,0:25:03.720 related to our file format. 0:25:03.720,0:25:08.280 So this is a delimited file; it's not 0:25:08.280,0:25:11.400 plain text, it's not JSON. The delimiter 0:25:11.400,0:25:14.159 is comma, as we have seen that they 0:25:14.159,0:25:26.700 [INDISTINGUISHABLE] 0:25:26.700,0:25:29.039 So for column headers I'm choosing 0:25:29.039,0:25:32.900 that only the first file has headers... 0:25:32.900,0:25:34.880 [INDISTINGUISHABLE] 0:25:34.880,0:25:38.159 ...for example. Okay, uh, if you have any 0:25:38.159,0:25:39.960 doubts, if you have any problems, please 0:25:39.960,0:25:42.960 don't hesitate to write to me 0:25:42.960,0:25:45.020 in the chat 0:25:45.020,0:25:48.480 about what is blocking you, and 0:25:48.480,0:25:50.940 Carlotta and I will try to help you 0:25:50.940,0:25:53.220 whenever possible. 0:25:53.220,0:25:55.659 And now this is the new preview of my 0:25:55.659,0:25:57.840 data set. I can see that I have the 0:25:57.840,0:25:59.700 patient ID, I have pregnancies, I 0:25:59.700,0:26:02.220 have the age of the people, 0:26:02.220,0:26:05.720 I have the body mass index, and, I think, 0:26:05.720,0:26:08.460 whether they have diabetes or not, as a 0:26:08.460,0:26:10.679 zero or one. Zero indicates a negative, 0:26:10.679,0:26:14.159 the person doesn't have diabetes, and one 0:26:14.159,0:26:16.080 indicates a positive, that this person 0:26:16.080,0:26:18.299 has diabetes. Okay. 0:26:18.299,0:26:20.520 Now I'm going to click on "Next". Here I am 0:26:20.520,0:26:23.400 defining my schema: all the data types 0:26:23.400,0:26:25.380 of my columns, the column names, which 0:26:25.380,0:26:28.760 columns to include, and which to exclude.
And 0:26:28.760,0:26:31.500 here we will include everything except 0:26:31.500,0:26:35.580 the Path column. And we are 0:26:35.580,0:26:37.860 going to review the data types of each 0:26:37.860,0:26:40.440 column. So let's review this first one. 0:26:40.440,0:26:43.320 These are whole numbers, numbers, numbers, so it's 0:26:43.320,0:26:45.779 Integer. And this one is, 0:26:45.779,0:26:48.679 um, like, Decimal... 0:26:48.679,0:26:50.900 ...(dot)... 0:26:50.900,0:26:53.580 a decimal number. So we are going to choose 0:26:53.580,0:26:55.020 this data type. 0:26:55.020,0:26:57.200 And this one, 0:26:57.200,0:27:01.200 it says "Diabetic", and it's a zero or 0:27:01.200,0:27:02.460 one, so we are going to make it 0:27:02.460,0:27:04.460 Integer. 0:27:04.460,0:27:07.980 Now we are going to click on "Next" and 0:27:07.980,0:27:09.780 move on to reviewing everything. This is 0:27:09.780,0:27:11.569 everything that we have defined together. 0:27:11.569,0:27:13.500 I will click on "Create". 0:27:13.500,0:27:15.179 And... 0:27:15.179,0:27:17.940 now the first step has ended. We have 0:27:17.940,0:27:19.919 gotten our data ready. 0:27:19.919,0:27:22.440 Now... what now? We're going to use the 0:27:22.440,0:27:23.468 designer's... 0:27:23.468,0:27:26.820 um... power. We're going to drag and drop 0:27:26.820,0:27:29.820 our data set to create the pipeline. 0:27:29.820,0:27:33.179 So I have clicked on it and dragged it 0:27:33.179,0:27:35.640 to this space. It's gonna appear for you. 0:27:35.640,0:27:39.659 And we can inspect it by right-clicking and 0:27:39.659,0:27:42.179 choosing "Preview data" 0:27:42.179,0:27:46.200 to see what we have created together. 0:27:46.200,0:27:48.900 From here, you can see everything that we 0:27:48.900,0:27:50.700 have seen previously, but in more 0:27:50.700,0:27:53.100 detail. And we are just going to close 0:27:53.100,0:27:56.580 this. Now what? Now we are gonna do the 0:27:56.580,0:28:00.799 processing that Carlotta mentioned.
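The import settings John walks through (comma delimiter, headers taken from the first row, the Path column excluded, whole numbers typed as Integer and dotted decimals as Decimal) can be imitated locally. The sketch below is a minimal standard-library approximation, not what Azure ML does internally; `infer_type` and `load_table` are hypothetical helper names, the sample rows are made-up values, and the Path column is simulated as an ordinary extra CSV column.

```python
import csv
import io

# Made-up sample rows for illustration only. In Azure ML the Path column is
# added by the dataset import; here it is simulated as an ordinary CSV column.
SAMPLE = """PatientID,Pregnancies,BMI,Age,Diabetic,Path
1001,0,43.5,21,0,diabetes.csv
1002,8,21.24,34,1,diabetes.csv
"""


def infer_type(values):
    """Return int if every value parses as an integer, float if every value
    parses as a number, otherwise str (a rough stand-in for Integer / Decimal / String)."""
    for cast in (int, float):
        try:
            for v in values:
                cast(v)
            return cast
        except ValueError:
            continue
    return str


def load_table(text, exclude=("Path",)):
    """Parse comma-delimited text whose first row holds the column headers,
    drop the excluded columns, and convert each remaining column to its inferred type."""
    reader = csv.DictReader(io.StringIO(text), delimiter=",")
    rows = list(reader)
    columns = [c for c in reader.fieldnames if c not in exclude]
    schema = {c: infer_type([r[c] for r in rows]) for c in columns}
    typed_rows = [{c: schema[c](r[c]) for c in columns} for r in rows]
    return schema, typed_rows


schema, rows = load_table(SAMPLE)
print({c: t.__name__ for c, t in schema.items()})
# {'PatientID': 'int', 'Pregnancies': 'int', 'BMI': 'float', 'Age': 'int', 'Diabetic': 'int'}
```

Note that PatientID is inferred as an integer even though, as John explains next, it behaves like a categorical identifier; type inference alone cannot tell the difference.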
0:28:00.799,0:28:03.659 These are some instructions about the 0:28:03.659,0:28:05.460 data, about how you can look at it, how you 0:28:05.460,0:28:07.140 can open it, but we are going to move 0:28:07.140,0:28:09.720 on to the transformation, or the processing. 0:28:09.720,0:28:13.500 So as Carlotta told you, for any data 0:28:13.500,0:28:15.480 we work on, we have to do some 0:28:15.480,0:28:17.299 processing on it 0:28:17.299,0:28:20.159 to make it easier for the model to 0:28:20.159,0:28:23.279 be trained and easier to work with. So, uh, 0:28:23.279,0:28:25.860 we're gonna do normalization. And 0:28:25.860,0:28:29.159 normalization means, uh, 0:28:29.159,0:28:33.539 scaling our data, either down or up, but 0:28:33.539,0:28:35.400 we're going to scale it down, 0:28:35.400,0:28:38.820 and we are going to decrease, uh, 0:28:38.820,0:28:40.799 relatively decrease 0:28:40.799,0:28:44.640 all the values, to work 0:28:44.640,0:28:48.120 with smaller numbers on a common scale. If the features have 0:28:48.120,0:28:49.559 very different, large ranges, training can take 0:28:49.559,0:28:52.500 more time; with values scaled to a similar, smaller range, 0:28:52.500,0:28:54.779 training is usually easier and faster 0:28:54.779,0:28:59.159 to converge, and that's it. So 0:28:59.159,0:29:02.159 where can I find Normalize Data? I 0:29:02.159,0:29:04.260 can find it inside my components. 0:29:04.260,0:29:06.720 So I will choose "Component" and 0:29:06.720,0:29:09.659 search for "Normalize Data". 0:29:09.659,0:29:12.360 I will drag and drop it as usual, and I 0:29:12.360,0:29:14.820 will connect these two things 0:29:14.820,0:29:18.360 by clicking on this spot, this, uh, 0:29:18.360,0:29:20.159 circle, and 0:29:20.159,0:29:23.159 dragging it onto the next circle. 0:29:23.159,0:29:24.899 Now we are going to define our 0:29:24.899,0:29:27.419 normalization method. 0:29:27.419,0:29:31.080 So I'm going to double-click on 0:29:31.080,0:29:32.640 Normalize Data.
0:29:32.640,0:29:34.860 It's going to open the settings for the 0:29:34.860,0:29:36.480 normalization, 0:29:36.480,0:29:38.820 such as the transformation method, which is 0:29:38.820,0:29:40.500 the mathematical method 0:29:40.500,0:29:42.299 that our data is going to be scaled 0:29:42.299,0:29:44.520 according to. 0:29:44.520,0:29:47.760 We're going to choose MinMax, and for 0:29:47.760,0:29:51.539 this one, "Use 0 0:29:51.539,0:29:53.100 for constant columns", we are going to 0:29:53.100,0:29:54.480 choose "True", 0:29:54.480,0:29:56.880 and we are going to define which columns 0:29:56.880,0:29:58.860 to normalize. So we are not going to 0:29:58.860,0:30:01.080 normalize the whole data set. We are 0:30:01.080,0:30:02.760 going to choose a subset of the data 0:30:02.760,0:30:04.559 set to normalize. So we're going to 0:30:04.559,0:30:07.020 choose everything except for the patient 0:30:07.020,0:30:09.000 ID and Diabetic, the label we want to predict, because the patient 0:30:09.000,0:30:10.919 ID is a number, but it's categorical 0:30:10.919,0:30:13.740 data. It identifies a patient; it's not a 0:30:13.740,0:30:17.460 number that I can sum. I can't say "patient 0:30:17.460,0:30:20.159 ID number one plus patient ID number two". 0:30:20.159,0:30:21.720 No, this is one patient and another 0:30:21.720,0:30:23.399 patient; it's not a number that I can do 0:30:23.399,0:30:25.740 mathematical operations on, so I'm not 0:30:25.740,0:30:28.200 going to choose it. So we will choose 0:30:28.200,0:30:30.539 everything, as I said, except for 0:30:30.539,0:30:33.480 Diabetic and the patient ID. I will 0:30:33.480,0:30:34.860 click on "Save". 0:30:34.860,0:30:37.740 And it's not showing me a warning anymore; 0:30:37.740,0:30:39.480 everything is good. 0:30:39.480,0:30:41.880 Now I can click on "Submit" 0:30:41.880,0:30:46.799 and review my normalization output. 0:30:46.799,0:30:48.240 Um.
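The MinMax option configured above rescales each selected column with (x - min) / (max - min), so the column's smallest value maps to 0 and its largest to 1. The pure-Python sketch below assumes rows are dictionaries; `min_max_normalize` is a hypothetical helper, the rows are made-up values, and the constant-column branch only mirrors the "Use 0 for constant columns" option rather than reproducing the designer's exact implementation.

```python
def min_max_normalize(rows, columns):
    """Return copies of `rows` with each listed column scaled to [0, 1]
    via (x - min) / (max - min). A constant column (max == min) becomes
    all zeros, mirroring the "Use 0 for constant columns" option."""
    out = [dict(r) for r in rows]  # leave the input rows untouched
    for c in columns:
        values = [r[c] for r in rows]
        lo, hi = min(values), max(values)
        for r in out:
            r[c] = 0.0 if hi == lo else (r[c] - lo) / (hi - lo)
    return out


rows = [
    {"PatientID": 1001, "Age": 21, "BMI": 43.5, "Diabetic": 0},
    {"PatientID": 1002, "Age": 34, "BMI": 21.0, "Diabetic": 0},
    {"PatientID": 1003, "Age": 60, "BMI": 30.0, "Diabetic": 1},
]
# PatientID (an identifier) and Diabetic (the label) are left out, as in the demo.
scaled = min_max_normalize(rows, ["Age", "BMI"])
print([round(r["Age"], 3) for r in scaled])  # [0.0, 0.333, 1.0]
```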
0:30:48.240,0:30:51.659 So, if you click on "Submit" here, 0:30:51.659,0:30:54.659 you will choose "Create new" and 0:30:54.659,0:30:56.460 set the name that is mentioned here 0:30:56.460,0:30:59.899 in the module. So it tells you 0:30:59.899,0:31:03.419 to create a job and name it, name 0:31:03.419,0:31:05.460 the experiment "MS Learn Diabetes 0:31:05.460,0:31:06.720 Training", because you will continue 0:31:06.720,0:31:10.160 working on it and building components later. 0:31:10.160,0:31:13.020 I have already created it, so, uh, 0:31:13.020,0:31:16.919 we can review it together. So let 0:31:16.919,0:31:19.860 me just open this in another tab. I think 0:31:19.860,0:31:21.000 I have it... 0:31:21.000,0:31:23.659 here. 0:31:25.679,0:31:28.220 Okay. 0:31:30.720,0:31:34.740 So, these are all the jobs that I have 0:31:34.740,0:31:37.340 created. 0:31:37.860,0:31:40.119 All the jobs are there. Let's go over them. 0:31:40.119,0:31:42.059 These are all the jobs that I have 0:31:42.059,0:31:43.679 submitted previously. 0:31:43.679,0:31:45.840 And I think this one is the 0:31:45.840,0:31:48.360 normalization job, so let's see its 0:31:48.360,0:31:50.100 output. 0:31:50.100,0:31:54.120 As you can see, it shows, uh, a check mark, yes, 0:31:54.120,0:31:56.640 which means that it worked, and we can 0:31:56.640,0:31:59.399 preview it. How can I do that? Right-click 0:31:59.399,0:32:02.539 on it, choose "Preview data", 0:32:02.539,0:32:06.659 and as you can see, all the data is 0:32:06.659,0:32:08.399 scaled down, 0:32:08.399,0:32:10.980 so everything is between zero 0:32:10.980,0:32:15.860 and, uh, one, I think. 0:32:15.860,0:32:18.899 So everything is good for us. Now we 0:32:18.899,0:32:21.840 can move forward to the next step, 0:32:21.840,0:32:26.939 which is to create the whole pipeline.
0:32:26.939,0:32:30.840 So, uh, Carlotta told you that 0:32:30.840,0:32:33.179 we're going to use a classification 0:32:33.179,0:32:37.260 model on this data set, so let 0:32:37.260,0:32:40.620 me just drag and drop everything 0:32:40.620,0:32:43.140 to save time, and we're doing 0:32:43.140,0:32:46.489 [INDISTINGUISHABLE] 0:32:46.489,0:32:48.469 about everything by 0:32:48.469,0:32:51.419 [INDISTINGUISHABLE] 0:32:51.419,0:32:52.919 So, 0:32:52.919,0:32:55.593 as a result, we are going to explain 0:32:55.593,0:32:59.760 [INDISTINGUISHABLE] 0:32:59.760,0:33:03.600 Yeah. So, I'm going to get this Split 0:33:03.600,0:33:06.070 Data. I'm going to take the 0:33:06.070,0:33:08.880 transformed data to Split Data and 0:33:08.880,0:33:10.380 connect it like that. 0:33:10.380,0:33:12.299 I'm going to get a Train Model 0:33:12.299,0:33:15.240 component because I want to train my 0:33:15.240,0:33:16.679 model, 0:33:16.679,0:33:19.740 and I'm going to put it right here. 0:33:19.740,0:33:21.740 Okay. 0:33:21.740,0:33:24.419 Let's just move it down there. Okay. 0:33:24.419,0:33:27.059 And we are going to use a classification 0:33:27.059,0:33:28.620 model, 0:33:28.620,0:33:31.880 a Two-Class 0:33:32.240,0:33:35.399 Logistic Regression model. 0:33:35.399,0:33:38.159 So I'm going to get this algorithm to 0:33:38.159,0:33:41.480 enable my model to work. 0:33:41.820,0:33:45.960 This is the untrained model, this is... 0:33:45.960,0:33:48.059 here. 0:33:48.059,0:33:51.120 The left... 0:33:51.120,0:33:52.860 the left, uh, circle, I'm going to 0:33:52.860,0:33:54.819 connect it to the data set, and the right 0:33:54.819,0:33:56.940 one, we are going to connect it to 0:33:56.940,0:33:59.700 Evaluate Model. 0:33:59.700,0:34:02.640 Evaluate Model... so let's search for 0:34:02.640,0:34:05.220 "Evaluate Model" here. 0:34:05.220,0:34:07.440 Because we want to do what? We want to 0:34:07.440,0:34:10.800 evaluate our model and see how it has 0:34:10.800,0:34:13.790 been doing.
Is it good, is it bad? 0:34:13.790,0:34:18.200 Um, sorry... 0:34:19.980,0:34:22.820 This is... 0:34:23.460,0:34:25.560 this is down there, 0:34:25.560,0:34:28.139 after the Score Model. 0:34:28.139,0:34:31.320 So we have to get the Score Model first, 0:34:31.320,0:34:33.960 so let's get it. 0:34:33.960,0:34:36.119 And this will take the trained model and 0:34:36.119,0:34:37.260 the data set 0:34:37.260,0:34:39.419 to score our model and see if it's 0:34:39.419,0:34:42.179 performing well or badly. 0:34:42.179,0:34:44.409 And... 0:34:44.409,0:34:47.159 um... 0:34:47.159,0:34:49.080 after that, we have finished 0:34:49.080,0:34:51.920 everything. Now, we are going to do what? 0:34:52.139,0:34:54.359 The settings for everything. 0:34:54.359,0:34:56.820 As a start, we will be splitting our 0:34:56.820,0:34:58.920 data. So 0:34:58.920,0:35:01.140 how are we going to do this, according to 0:35:01.140,0:35:03.780 what? Split Rows. So I'm going to 0:35:03.780,0:35:05.940 double-click on it and choose "Split Rows". 0:35:05.940,0:35:09.420 And the percentage is 0:35:09.420,0:35:11.780 70 percent for the training 0:35:11.780,0:35:12.780 and 30 percent of the 0:35:12.780,0:35:14.820 data for 0:35:14.820,0:35:18.420 the evaluation, or for the scoring, okay? 0:35:18.420,0:35:20.880 I'm going to make it randomized, so 0:35:20.880,0:35:22.980 I'm going to split the data randomly, and the 0:35:22.980,0:35:26.060 seed is, uh, 0:35:26.060,0:35:29.339 123, I think... yeah. 0:35:29.339,0:35:32.520 And I think that's it. 0:35:32.520,0:35:35.040 Stratified split stays False, and that's 0:35:35.040,0:35:36.240 good. 0:35:36.240,0:35:39.540 Now for the next one, which is the Train 0:35:39.540,0:35:42.000 Model, we are going to connect it as 0:35:42.000,0:35:43.500 mentioned here. 0:35:43.500,0:35:48.660 And we have done that and... then what warning 0:35:48.660,0:35:50.700 am I having here? Let's double-click 0:35:50.700,0:35:54.660 on it... yeah.
It has... it needs the 0:35:54.660,0:35:57.180 label column that I am trying to predict. 0:35:57.180,0:35:58.680 So from here, I'm going to choose 0:35:58.680,0:36:01.380 Diabetic. I'm going to save. 0:36:01.380,0:36:05.180 I'm going to close this one. 0:36:05.520,0:36:07.380 So it says here, 0:36:07.380,0:36:10.619 Diabetic is the label; the model will 0:36:10.619,0:36:12.300 predict zero or one, because this is 0:36:12.300,0:36:14.700 a binary classification algorithm, so 0:36:14.700,0:36:16.260 it's going to predict either this or 0:36:16.260,0:36:17.520 that. 0:36:17.520,0:36:18.460 And... 0:36:18.460,0:36:20.160 um... 0:36:20.160,0:36:23.880 I think that's everything to run the 0:36:23.880,0:36:25.859 pipeline. 0:36:25.859,0:36:29.040 So everything is done, everything is good 0:36:29.040,0:36:31.200 for this one. We're just gonna leave it 0:36:31.200,0:36:34.140 for now, because this is the next 0:36:34.140,0:36:35.620 step. 0:36:35.620,0:36:39.839 Um, this will go after the 0:36:39.839,0:36:43.520 Score Model, but let's... 0:36:44.099,0:36:46.920 let's delete it for now. 0:36:46.920,0:36:49.500 Okay. 0:36:49.500,0:36:52.920 Now we have to submit the job in order 0:36:52.920,0:36:55.680 to see its output. So I can click 0:36:55.680,0:36:59.280 on "Submit" and choose the previous job, 0:36:59.280,0:37:01.200 which is the one that I showed you 0:37:01.200,0:37:02.460 before. 0:37:02.460,0:37:05.460 And then let's review its output 0:37:05.460,0:37:06.960 together here. 0:37:06.960,0:37:09.960 So if I go to the jobs, 0:37:09.960,0:37:15.119 if I go to MS Learn... maybe it's this "training" one? 0:37:15.119,0:37:18.180 I think it's the one that lasted the 0:37:18.180,0:37:20.640 longest, this one here. 0:37:20.640,0:37:23.700 So here I can see 0:37:23.700,0:37:27.079 the job output, what happened inside 0:37:27.079,0:37:30.420 the model, as you can see.
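The Split Rows settings used in the demo (70 percent to the first output, randomized, seed 123) can be approximated as a seeded shuffle followed by a cut. This is a sketch with a hypothetical `split_rows` helper; Python's `random.Random` will not reproduce the exact rows that Azure ML assigns for the same seed, it only shows why a fixed seed makes the split repeatable from run to run.

```python
import random


def split_rows(rows, fraction=0.7, seed=123):
    """Randomized split: the first list gets `fraction` of the rows, the second the rest."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> the same split on every run
    cut = int(round(len(shuffled) * fraction))
    return shuffled[:cut], shuffled[cut:]


train_rows, test_rows = split_rows(range(10))
print(len(train_rows), len(test_rows))  # 7 3
```

The first list plays the role of the output wired into Train Model, and the second the held-out 30 percent that goes to Score Model.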
0:37:30.420,0:37:33.839 So the normalization we have seen 0:37:33.839,0:37:36.540 before; the split data, I can preview it: 0:37:36.540,0:37:39.359 result one or result two, as it 0:37:39.359,0:37:41.760 splits the data into 70 percent here and 0:37:41.760,0:37:43.639 30 percent here. 0:37:43.639,0:37:46.859 Um, I can see the Score Model, which is 0:37:46.859,0:37:49.140 something that we need 0:37:49.140,0:37:51.530 to review. 0:37:51.530,0:37:56.820 Inside the Score Model, uh, from 0:37:56.820,0:37:57.960 here, 0:37:57.960,0:38:00.960 we can see that... 0:38:00.960,0:38:04.460 let's get back here. 0:38:05.940,0:38:08.220 This is the data that the model has 0:38:08.220,0:38:11.579 scored, and this is the scoring output. 0:38:11.579,0:38:15.300 So it says "Scored Labels" is true, but he is 0:38:15.300,0:38:17.370 not diabetic, so this is, 0:38:17.370,0:38:19.200 um, 0:38:19.200,0:38:21.839 a wrong prediction, let's say. 0:38:21.839,0:38:23.880 For this one it's true and true, and this 0:38:23.880,0:38:26.880 is a good, like, what do you say, 0:38:26.880,0:38:29.460 prediction, and the probabilities of this 0:38:29.460,0:38:30.420 score, 0:38:30.420,0:38:33.119 which means the certainty of our model 0:38:33.119,0:38:36.620 that this is really true: it's 80 percent. 0:38:36.620,0:38:38.780 For this one it's 75 percent. 0:38:38.780,0:38:42.599 So these are some cool metrics that we 0:38:42.599,0:38:45.359 can review to understand how our model 0:38:45.359,0:38:47.580 is performing. It's performing well for 0:38:47.580,0:38:48.540 now. 0:38:48.540,0:38:53.180 Let's check our Evaluate Model. 0:38:53.180,0:38:56.700 So this is the extra one that I told you 0:38:56.700,0:38:59.579 about. Instead of only the 0:38:59.579,0:39:01.800 Score Model, we are going to add 0:39:01.800,0:39:04.260 Evaluate Model 0:39:04.260,0:39:06.900 after it.
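The Score Model output previewed above has a Scored Labels column and a Scored Probabilities column. For a two-class logistic regression, the probability comes from applying the sigmoid function to a weighted sum of the features, and the label comes from comparing that probability with a threshold (0.5 by default, as Carlotta explains later). The weights, bias, and `score` helper below are hypothetical; a trained model learns its own coefficients, and treating a probability exactly equal to the threshold as positive is an assumption of this sketch.

```python
import math


def score(features, weights, bias, threshold=0.5):
    """Return (scored_label, scored_probability) for one row of features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    probability = 1.0 / (1.0 + math.exp(-z))  # sigmoid maps z into (0, 1)
    label = 1 if probability >= threshold else 0  # tie goes to 1 (assumption)
    return label, probability


# Hypothetical coefficients for two already-normalized features.
weights, bias = [2.0, -1.0], 0.0
print(score([1.0, 0.5], weights, bias))  # label 1, probability about 0.8176
```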
So here 0:39:06.900,0:39:09.420 we're going to go to our asset library, 0:39:09.420,0:39:12.180 and we are going to choose Evaluate 0:39:12.180,0:39:14.940 Model, 0:39:14.940,0:39:17.760 and we are going to put it here, and we 0:39:17.760,0:39:20.220 are going to connect it, and we are going 0:39:20.220,0:39:23.099 to submit the job using the same name as 0:39:23.099,0:39:25.140 the job that we used previously. 0:39:25.140,0:39:29.520 Let's review it. So, after it 0:39:29.520,0:39:33.300 finishes, you will find it here. I have 0:39:33.300,0:39:35.280 already done it before; this is how I'm 0:39:35.280,0:39:37.380 able to see the output. 0:39:37.380,0:39:40.320 So let's see 0:39:40.320,0:39:43.280 what the output of this 0:39:43.280,0:39:45.660 evaluation process is. 0:39:45.660,0:39:49.800 Here it tells you that there are 0:39:49.800,0:39:51.480 some metrics, 0:39:51.480,0:39:54.839 like the confusion matrix, which Carlotta 0:39:54.839,0:39:57.060 told you about; there is the accuracy, the 0:39:57.060,0:39:59.760 precision, the recall, and the F1 score. 0:39:59.760,0:40:02.339 Every metric gives us some insight about 0:40:02.339,0:40:04.920 our model. It helps us to understand it 0:40:04.920,0:40:08.579 more, and, um, 0:40:08.579,0:40:10.560 understand if it's overfitting, if 0:40:10.560,0:40:12.240 it's good, if it's bad, and really, 0:40:12.240,0:40:16.339 like, understand how it's working. 0:40:17.060,0:40:20.400 Now I'm just waiting for the job to load. 0:40:20.400,0:40:22.710 Until it loads, 0:40:22.710,0:40:23.640 um, 0:40:23.640,0:40:26.040 we can continue 0:40:26.040,0:40:28.740 to work on our 0:40:28.740,0:40:31.800 model. So I will go to my designer. I'm 0:40:31.800,0:40:34.740 just going to confirm this. 0:40:34.740,0:40:38.280 And I'm going to continue working on it 0:40:38.280,0:40:39.780 from 0:40:39.780,0:40:42.119 where we stopped. Where did we 0:40:42.119,0:40:43.560 stop?
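The Evaluate Model metrics John lists can all be derived from the four cells of the confusion matrix: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). The sketch below uses the standard definitions (accuracy = (TP + TN) / total, precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = the harmonic mean of precision and recall); `classification_metrics` is a hypothetical helper and the label lists are made up.

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix counts plus accuracy, precision, recall, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up actual labels and predicted labels (1 = diabetic).
m = classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
print(m["tp"], m["fp"], m["tn"], m["fn"])  # 2 1 2 1
```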
0:40:43.560,0:40:46.440 We stopped at the Evaluate Model. So 0:40:46.440,0:40:48.960 I'm going to choose this one. 0:40:48.960,0:40:53.420 And it says here 0:40:54.180,0:40:56.940 "select experiment", "create inference 0:40:56.940,0:40:58.200 pipeline", so 0:40:58.200,0:41:01.079 I am going to go to the jobs, 0:41:01.079,0:41:04.680 I'm going to select my experiment. 0:41:04.680,0:41:06.660 I hope this works. 0:41:06.660,0:41:09.720 Okay. Finally, now we have our 0:41:09.720,0:41:12.180 Evaluate Model output. 0:41:12.180,0:41:15.480 Let's preview the evaluation results 0:41:15.480,0:41:18.660 and, uh... 0:41:18.660,0:41:22.220 come on. 0:41:25.500,0:41:28.020 Finally. Now we can create our inference 0:41:28.020,0:41:31.020 pipeline. So, 0:41:31.020,0:41:33.510 I think it says that... 0:41:33.510,0:41:35.280 um... 0:41:35.280,0:41:38.160 select the experiment, then select MS 0:41:38.160,0:41:39.359 Learn. So, 0:41:39.359,0:41:43.320 I am just going to select it, 0:41:43.320,0:41:48.300 and... finally. Now we can see the ROC curve; we 0:41:48.300,0:41:51.000 can see the true positive rate 0:41:51.000,0:41:53.760 and the false positive rate. The false 0:41:53.760,0:41:56.660 positive rate increases along the curve as the threshold goes down, 0:41:56.660,0:42:01.020 and so does the true positive rate. A true 0:42:01.020,0:42:03.540 positive is when it predicted 0:42:03.540,0:42:06.960 that someone is, uh, positive, that they have diabetes, 0:42:06.960,0:42:09.480 and it's really... it's really true: 0:42:09.480,0:42:12.599 the person really has diabetes. Okay. And 0:42:12.599,0:42:14.760 for a false positive, it predicted that 0:42:14.760,0:42:17.579 someone has diabetes, but that person doesn't 0:42:17.579,0:42:20.960 have it. This is what true positive and 0:42:20.960,0:42:24.960 false positive mean. This is the precision-recall 0:42:24.960,0:42:28.020 curve, so we can review the metrics 0:42:28.020,0:42:32.160 of our model. This is the lift curve.
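Each point on the ROC curve is a (true positive rate, false positive rate) pair for one threshold: TPR = TP / (TP + FN) and FPR = FP / (FP + TN). Lowering the threshold labels more rows as positive, so both rates can only stay the same or go up, which is why they rise together along the curve. A minimal sketch with made-up labels and probabilities, using a hypothetical `rates_at_threshold` helper:

```python
def rates_at_threshold(y_true, probabilities, threshold):
    """Return (true_positive_rate, false_positive_rate) when every probability
    >= threshold is predicted as 1 (positive, has diabetes) and the rest as 0."""
    predictions = [1 if p >= threshold else 0 for p in probabilities]
    tp = sum(1 for y, p in zip(y_true, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, predictions) if y == 0 and p == 1)
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    return tp / positives, fp / negatives


# Made-up labels (1 = diabetic) and scored probabilities.
y_true = [1, 0, 1, 0]
probs = [0.9, 0.6, 0.4, 0.2]
for t in (0.95, 0.5, 0.3):
    print(t, rates_at_threshold(y_true, probs, t))
# 0.95 (0.0, 0.0)
# 0.5 (0.5, 0.5)
# 0.3 (1.0, 0.5)
```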
I 0:42:32.160,0:42:36.000 can change the threshold of my confusion 0:42:36.000,0:42:37.740 matrix here, 0:42:37.740,0:42:39.119 and if Carlotta wants to add 0:42:39.119,0:42:43.920 anything about the... the graphs, 0:42:43.920,0:42:47.000 she can do so. 0:42:50.440,0:42:52.558 [CARLOTTA]: Um, yeah, so I just 0:42:52.558,0:42:54.540 wanted to... if you go... yeah. 0:42:54.540,0:42:57.119 I just wanted to comment on the 0:42:57.119,0:43:00.480 ROC curve: actually, from this 0:43:00.480,0:43:03.900 graph, the metric which we usually 0:43:03.900,0:43:06.960 compute is the area 0:43:06.960,0:43:09.900 under the curve. And this coefficient, or 0:43:09.900,0:43:12.240 metric, 0:43:12.240,0:43:15.060 it's a coefficient... 0:43:15.060,0:43:18.420 it's a value that can range from 0:43:18.420,0:43:23.480 zero to one, and the higher it is... 0:43:23.480,0:43:25.970 ...the higher the score. 0:43:25.970,0:43:29.220 So the closer to one, 0:43:29.220,0:43:32.760 so the larger the amount of 0:43:32.760,0:43:35.280 area under this curve, 0:43:35.280,0:43:40.500 the higher the performance 0:43:40.500,0:43:42.886 we've got from our model. 0:43:42.886,0:43:46.440 And another thing is what John is 0:43:46.440,0:43:49.680 playing with. So this threshold for 0:43:49.680,0:43:51.380 the logistic 0:43:51.380,0:43:55.610 regression is the threshold used by the 0:43:55.610,0:43:59.520 model to, um, 0:43:59.520,0:44:02.880 to predict if the category is zero or 0:44:02.880,0:44:05.220 one. So if the probability score 0:44:05.220,0:44:08.599 is above the threshold, 0:44:08.599,0:44:11.579 then the category will be predicted as 0:44:11.579,0:44:15.359 one, while if the probability is 0:44:15.359,0:44:17.460 below the threshold, in this case, for 0:44:17.460,0:44:21.300 example, 0.5, the category is predicted 0:44:21.300,0:44:23.579 as zero.
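The area under the ROC curve that Carlotta describes can be approximated by sweeping the threshold over every distinct scored probability, collecting the (FPR, TPR) points, and applying the trapezoidal rule. This is a sketch under those assumptions (hypothetical `roc_auc` helper, made-up inputs, at least one positive and one negative label), not the exact routine Evaluate Model uses. A value of 1.0 means every positive is scored above every negative; random guessing gives about 0.5 on average.

```python
def roc_auc(y_true, probabilities):
    """Area under the ROC curve via the trapezoidal rule.
    Assumes y_true contains at least one 1 and at least one 0."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(probabilities), reverse=True):
        tp = sum(1 for y, p in zip(y_true, probabilities) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(y_true, probabilities) if p >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    # The lowest threshold predicts everything positive, so the last point is (1, 1).
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


print(roc_auc([1, 0, 1, 0], [0.9, 0.6, 0.4, 0.2]))  # 0.75
print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0
```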
So that's why it's very 0:44:23.579,0:44:26.473 important to choose the threshold, 0:44:26.473,0:44:28.699 because the performance really can vary, 0:44:28.699,0:44:30.560 um, 0:44:30.560,0:44:34.380 with this threshold value. 0:44:34.380,0:44:41.099 [JOHN]: Thank you so much, Carlotta. And 0:44:41.400,0:44:44.400 as I mentioned, now we are going to 0:44:44.400,0:44:46.560 create our inference pipeline. So we are 0:44:46.560,0:44:48.540 going to select the latest one, which I 0:44:48.540,0:44:50.819 already have open here. This is the 0:44:50.819,0:44:52.859 one that we were reviewing together. This 0:44:52.859,0:44:55.500 is where we stopped, and we're going 0:44:55.500,0:44:57.599 to create an inference pipeline. We are 0:44:57.599,0:44:59.520 going to choose a real-time inference 0:44:59.520,0:45:02.520 pipeline, okay? 0:45:02.520,0:45:05.080 Where can I find this? Here, as it 0:45:05.080,0:45:08.099 says, "Real-time inference pipeline". 0:45:08.099,0:45:10.680 So it's gonna add some things to my 0:45:10.680,0:45:12.240 workspace. It's going to add the 0:45:12.240,0:45:13.713 Web Service Input, it's gonna 0:45:13.713,0:45:15.071 have the Web Service Output, 0:45:15.071,0:45:16.490 because we will be creating 0:45:16.490,0:45:18.180 it as a web service to access 0:45:18.180,0:45:19.740 it from the internet. 0:45:19.740,0:45:21.770 What are we going to do? We're going 0:45:21.770,0:45:24.720 to remove this diabetes data, okay? 0:45:24.720,0:45:27.540 And we are going to get a component 0:45:27.540,0:45:29.359 called "Web 0:45:29.359,0:45:33.180 input"... let me check... 0:45:33.180,0:45:35.940 it's "Enter Data Manually". 0:45:35.940,0:45:38.400 We already have the Web Service Input 0:45:38.400,0:45:39.540 present. 0:45:39.540,0:45:42.119 So we are going to get Enter Data 0:45:42.119,0:45:43.200 Manually, 0:45:43.200,0:45:45.420 and we're going to connect 0:45:45.420,0:45:49.560 it as it was connected before, like that.
0:45:49.560,0:45:53.040 And also, I am not going to directly connect 0:45:53.040,0:45:55.260 the Score Model to 0:45:55.260,0:45:57.839 the Web Service Output like that. 0:45:57.839,0:46:00.240 I'm going to delete this, 0:46:00.240,0:46:03.960 and I'm going to execute a Python script 0:46:03.960,0:46:05.880 before 0:46:05.880,0:46:09.500 I display my result. 0:46:10.680,0:46:12.060 So, 0:46:12.060,0:46:17.480 this will be connected like... 0:46:19.260,0:46:20.400 So... 0:46:20.400,0:46:23.599 the other way around. 0:46:23.599,0:46:27.660 And from here, I am going to connect this 0:46:27.660,0:46:30.960 with that, and there is some data that 0:46:30.960,0:46:33.480 we will be getting from the notes, or from 0:46:33.480,0:46:37.680 the explanation here, and this is the 0:46:37.680,0:46:40.740 data that will be entered into our 0:46:40.740,0:46:44.400 web service manually. Okay? This is instead of 0:46:44.400,0:46:47.460 the data that we have been getting from 0:46:47.460,0:46:49.740 the data set that we created. So I'm just 0:46:49.740,0:46:51.960 going to double-click on it and choose 0:46:51.960,0:46:55.579 CSV, and I will choose "HasHeader", 0:46:55.579,0:47:00.839 and I will copy this content and 0:47:00.839,0:47:02.819 put it there, okay? 0:47:02.819,0:47:05.700 So let's do it. 0:47:05.700,0:47:07.920 I think I have to click on "Edit code"; now 0:47:07.920,0:47:10.680 I can click on "Save", and I can close it. 0:47:10.680,0:47:13.079 The other thing is the Python script 0:47:13.079,0:47:16.700 that we will be executing. 0:47:17.099,0:47:17.900 Um, yeah. We 0:47:17.900,0:47:19.380 are going to remove this, also. 0:47:19.380,0:47:20.930 We don't need Evaluate Model 0:47:20.930,0:47:24.319 anymore, so we are going to remove it. 0:47:24.319,0:47:25.582 The Python script 0:47:25.582,0:47:28.579 that I will be executing, 0:47:28.579,0:47:32.599 I can find it here. 0:47:32.699,0:47:35.760 Um, yeah.
0:47:35.760,0:47:38.640 This is the Python script that we will 0:47:38.640,0:47:41.520 execute. And it tells you that this 0:47:41.520,0:47:43.619 code selects only the patient ID, 0:47:43.619,0:47:45.000 the scored label, and the scored 0:47:45.000,0:47:47.700 probability, and returns them to 0:47:47.700,0:47:49.980 the Web Service Output. So we don't want 0:47:49.980,0:47:51.960 to return all the columns, as we have 0:47:51.960,0:47:53.339 seen previously, 0:47:53.339,0:47:55.560 the way the Score Model returns everything, 0:47:55.560,0:47:56.940 so 0:47:56.940,0:47:59.040 we want to return certain stuff, the 0:47:59.040,0:48:02.940 stuff that we will use in our 0:48:02.940,0:48:05.640 endpoint. So I'm just going to select 0:48:05.640,0:48:07.980 everything and delete it, and 0:48:07.980,0:48:11.060 paste the code that I have gotten from 0:48:11.060,0:48:14.280 the, uh, 0:48:14.280,0:48:16.500 the Microsoft Learn docs. 0:48:16.500,0:48:19.079 Now I can click on "Save", and I can close 0:48:19.079,0:48:20.280 this. 0:48:20.280,0:48:21.470 Let me check something, 0:48:21.470,0:48:22.950 I don't think it saved. 0:48:22.950,0:48:24.940 It's saved, but the display is 0:48:24.940,0:48:26.160 wrong, okay. 0:48:26.160,0:48:30.300 And now I think everything is good to go. 0:48:30.300,0:48:32.640 I'm just gonna double-check everything. 0:48:32.640,0:48:36.359 So, uh, yeah. We are gonna change the name 0:48:36.359,0:48:38.640 of this 0:48:38.640,0:48:40.800 pipeline, and we are gonna call it 0:48:40.800,0:48:42.780 "Predict 0:48:42.780,0:48:46.319 Diabetes", okay? 0:48:46.319,0:48:50.339 Now let's close it, and 0:48:50.339,0:48:56.269 I think that we are good to go. So, 0:48:56.269,0:48:59.300 um, 0:48:59.720,0:49:04.460 okay, I think everything is good for us. 0:49:06.210,0:49:08.108 I just want to make sure of something. 0:49:08.108,0:49:09.209 Is the data... 0:49:09.209,0:49:12.420 it's correct, the data is... yeah, 0:49:12.420,0:49:13.560 it's correct.
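The Execute Python Script step keeps only the patient ID and the two score columns before they reach the Web Service Output. The designer calls an entry-point function named `azureml_main` with up to two pandas DataFrames. The sketch below assumes the Score Model output uses the column names `PatientID`, `Scored Labels`, and `Scored Probabilities`, and it is an approximation rather than a copy of the script in the Learn module; returning a one-element tuple follows the shape of the designer's default template, which is worth checking against the component documentation.

```python
import pandas as pd


def azureml_main(dataframe1=None, dataframe2=None):
    """Entry point called by the designer's Execute Python Script component.
    Keeps only the columns the endpoint should return."""
    # Column names assumed from the Score Model output shown in the demo.
    scored_results = dataframe1[["PatientID", "Scored Labels", "Scored Probabilities"]]
    return (scored_results,)


# Local check with a made-up Score Model output.
scored = pd.DataFrame(
    {
        "PatientID": [1001, 1002],
        "Pregnancies": [0, 8],
        "Scored Labels": [1, 0],
        "Scored Probabilities": [0.81, 0.25],
    }
)
(result,) = azureml_main(scored)
print(list(result.columns))  # ['PatientID', 'Scored Labels', 'Scored Probabilities']
```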
0:49:13.560,0:49:16.319 Okay, now I can run the pipeline. Let's 0:49:16.319,0:49:17.640 submit. 0:49:17.640,0:49:21.000 Select an "Existing" experiment, and we're 0:49:21.000,0:49:21.870 going to choose 0:49:21.870,0:49:23.529 "ms-learn-diabetes-training", 0:49:23.529,0:49:24.599 which is the pipeline 0:49:24.599,0:49:27.060 that we have been working on 0:49:27.060,0:49:31.619 from the beginning of this module. 0:49:31.619,0:49:33.839 I don't think that this is going to take 0:49:33.839,0:49:36.060 much time. So we have submitted the job, 0:49:36.060,0:49:37.319 and it's running. 0:49:37.319,0:49:40.140 Until the job ends, we are going to set up 0:49:40.140,0:49:41.720 everything 0:49:41.720,0:49:45.599 for deploying a service. 0:49:45.599,0:49:49.070 In order to deploy a service, 0:49:49.070,0:49:50.520 um, 0:49:50.520,0:49:54.000 I have to have the job ready, so 0:49:54.000,0:49:55.980 until it's ready, you can't deploy it. So 0:49:55.980,0:49:58.319 let's go to the job details from 0:49:58.319,0:50:01.319 here, okay? 0:50:01.319,0:50:05.119 And until it finishes, 0:50:05.119,0:50:07.260 Carlotta, do you think that we can take 0:50:07.260,0:50:09.240 the questions, and then we can get back 0:50:09.240,0:50:12.859 to the job and deploy it? 0:50:13.700,0:50:15.119 [CARLOTTA]: Yeah, yeah, yeah. 0:50:15.119,0:50:17.279 So yeah, guys, if you 0:50:17.279,0:50:18.980 have any questions 0:50:18.980,0:50:24.119 on what you just saw here 0:50:24.119,0:50:26.940 or on the introduction, feel free. This is 0:50:26.940,0:50:30.300 a good moment; we can... we can discuss 0:50:30.300,0:50:33.900 now, while we wait for this job to 0:50:33.900,0:50:36.260 finish. 0:50:36.260,0:50:38.760 [JOHN]: Uh, and... 0:50:38.760,0:50:40.220 can... 0:50:40.220,0:50:45.000 can we do the knowledge check? Or, like, 0:50:45.000,0:50:46.360 what do you think? 0:50:46.360,0:50:47.956 [CARLOTTA]: Yeah, we can also go 0:50:47.956,0:50:49.680 to the knowledge check.
0:50:49.680,0:50:50.940 Um... 0:50:50.940,0:50:56.339 Yeah, okay. So let me share my screen. 0:50:56.339,0:50:58.980 Yeah, so if you don't have any questions 0:50:58.980,0:51:01.619 for us, we can maybe propose some 0:51:01.619,0:51:04.959 questions to you, so that you can, 0:51:04.959,0:51:06.240 um, 0:51:06.240,0:51:09.450 check your knowledge so far, and you 0:51:09.450,0:51:12.900 can maybe answer these questions 0:51:12.900,0:51:15.420 via chat. 0:51:15.420,0:51:18.300 So we have... do you see my screen, can 0:51:18.300,0:51:19.859 you see my screen? 0:51:19.859,0:51:21.650 [JOHN]: Yes. 0:51:21.650,0:51:24.440 [CARLOTTA]: So, John, I think I will 0:51:24.440,0:51:25.440 read this 0:51:25.440,0:51:29.040 question aloud and ask it to you, okay? So 0:51:29.040,0:51:32.040 are you ready to answer? 0:51:32.040,0:51:33.660 [JOHN]: Yes, I am. 0:51:33.660,0:51:35.460 [CARLOTTA]: So... 0:51:35.460,0:51:37.260 you're using Azure Machine Learning 0:51:37.260,0:51:39.780 designer to create a training pipeline 0:51:39.780,0:51:42.540 for a binary classification model, so 0:51:42.540,0:51:45.300 what we were doing in our demo, 0:51:45.300,0:51:48.059 right? And you have added a data set 0:51:48.059,0:51:51.660 containing features and labels, a Two- 0:51:51.660,0:51:54.359 Class Decision Forest module... so we used 0:51:54.359,0:51:56.819 a logistic regression model in our... 0:51:56.819,0:51:57.877 um, in our example. 0:51:57.877,0:51:59.019 Here, we're using a Two- 0:51:59.019,0:52:01.260 Class Decision Forest model. 0:52:01.260,0:52:04.500 And, of course, a Train Model module. You 0:52:04.500,0:52:07.200 now plan to use Score Model and Evaluate 0:52:07.200,0:52:09.480 Model modules to test the trained model 0:52:09.480,0:52:11.640 with the subset of the data set that 0:52:11.640,0:52:13.500 wasn't used for training. 0:52:13.500,0:52:15.960 But what are we missing? So what's 0:52:15.960,0:52:18.780 another module you should add?
We have 0:52:18.780,0:52:21.660 three options: we have Join Data, we have 0:52:21.660,0:52:25.200 Split Data, or we have Select Columns 0:52:25.200,0:52:26.819 in Dataset. 0:52:26.819,0:52:28.260 So 0:52:28.260,0:52:32.040 while John thinks about the answer, 0:52:32.040,0:52:33.599 go ahead and, 0:52:33.599,0:52:34.800 um, 0:52:34.800,0:52:37.800 answer it yourselves. So give us your 0:52:37.800,0:52:39.540 guess. 0:52:39.540,0:52:41.940 Put it in the chat, or just come off mute 0:52:41.940,0:52:44.900 and answer. 0:52:46.740,0:52:47.785 "A", "B". 0:52:47.785,0:52:49.769 [JOHN]: Yeah, what do you think 0:52:49.769,0:52:50.509 is the correct 0:52:50.509,0:52:53.579 answer for this one? I need something to 0:52:53.579,0:52:56.579 uh...I have to score my model, and I 0:52:56.579,0:53:00.359 have to evaluate it, so I need 0:53:00.359,0:53:03.119 something to enable me to do these two 0:53:03.119,0:53:05.359 things. 0:53:06.579,0:53:08.233 [CARLOTTA]: I think it's something 0:53:08.233,0:53:10.640 you showed us in your pipeline, 0:53:10.640,0:53:13.260 right John? 0:53:13.260,0:53:16.819 [JOHN]: Of course I did. 0:53:23.460,0:53:25.122 [CARLOTTA]: Uh, we have no guesses 0:53:25.122,0:53:28.020 in the chat? 0:53:28.020,0:53:30.070 [JOHN]: Can someone... 0:53:30.070,0:53:32.280 Does someone want to guess? 0:53:32.280,0:53:35.579 [CARLOTTA]: We have a "B". 0:53:35.579,0:53:38.760 [JOHN]: Uh, maybe. 0:53:38.760,0:53:43.260 So, in order to do this, 0:53:43.260,0:53:46.200 I mentioned 0:53:46.200,0:53:49.380 the module that is going to help me 0:53:49.380,0:53:52.728 to divide my data into two parts: 0:53:52.728,0:53:53.819 70 percent for 0:53:53.819,0:53:56.220 training and 30 percent for 0:53:56.220,0:53:59.339 evaluation. So what did I use? I used 0:53:59.339,0:54:01.859 Split Data, because this is what is going 0:54:01.859,0:54:05.280 to split my data randomly into training 0:54:05.280,0:54:08.459 data and validation data.
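The Split Data step John describes (a random 70/30 cut into training and validation rows) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the designer module itself: the rows are assumed to already be in memory as a list, and the function name `split_rows`, the 0.7 fraction, and the seed value are illustrative choices.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_rows(rows: Sequence[T], fraction: float = 0.7, seed: int = 123) -> Tuple[List[T], List[T]]:
    """Randomly split rows into (training, validation) lists.

    A hypothetical stand-in for the designer's Split Data module:
    `fraction` is the share of rows sent to the first (training) output.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed -> repeatable random split
    cut = round(len(shuffled) * fraction)  # number of rows for the training output
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    patients = list(range(10))  # placeholder rows standing in for the diabetes data set
    train, validation = split_rows(patients)
    print(len(train), len(validation))  # 7 3
```

Shuffling before cutting is the important part: if the file happened to be sorted (for example, by the label), taking the first 70% of rows without shuffling would give a biased training set. The fixed seed only makes the split repeatable between runs.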
So the correct 0:54:08.459,0:54:12.240 answer is "B", and good job. Thank you 0:54:12.240,0:54:13.980 for participating. 0:54:13.980,0:54:17.400 Next question, please. 0:54:17.400,0:54:19.339 [CARLOTTA]: Yes, "B" is the correct 0:54:19.339,0:54:22.559 answer, so thanks, John, 0:54:22.559,0:54:26.040 for explaining the correct 0:54:26.040,0:54:26.940 one to us. 0:54:26.940,0:54:30.420 Do we want to go on to question two? 0:54:30.420,0:54:33.180 [JOHN]: Yeah, so, I'm going to ask you now, 0:54:33.180,0:54:35.880 Carlotta. You use Azure Machine Learning 0:54:35.880,0:54:38.280 designer to create a training pipeline 0:54:38.280,0:54:40.500 for your classification model. 0:54:40.500,0:54:44.099 What must you do before you deploy this 0:54:44.099,0:54:45.870 model as a service? You have to do 0:54:45.870,0:54:46.634 something before 0:54:46.634,0:54:47.439 you deploy it. 0:54:47.439,0:54:49.740 What do you think is the correct answer? 0:54:49.740,0:54:52.740 Is it "A", "B", or "C"? 0:54:52.740,0:54:55.020 Share your thoughts 0:54:55.020,0:54:56.690 with us in the chat, and 0:54:56.690,0:55:00.180 I'm also going to give you a few 0:55:00.180,0:55:02.940 minutes to think about it before I 0:55:02.940,0:55:06.020 tell you the answer. 0:55:06.020,0:55:07.765 [CARLOTTA]: Yeah, so let me go 0:55:07.765,0:55:09.000 through the possible 0:55:09.000,0:55:12.359 answers, right? So we have A: "Create an 0:55:12.359,0:55:14.940 inference pipeline from the training 0:55:14.940,0:55:16.020 pipeline"; 0:55:16.020,0:55:19.260 B: "Add an Evaluate Model 0:55:19.260,0:55:22.380 module to the training pipeline"; and then 0:55:22.380,0:55:25.079 C: "Clone the training 0:55:25.079,0:55:28.380 pipeline with a different name". 0:55:29.520,0:55:31.559 So what do you think is the correct 0:55:31.559,0:55:33.960 answer? "A", "B", or "C"?
0:55:33.960,0:55:36.660 This time too, I think it's something 0:55:36.660,0:55:39.300 we mentioned both in the deck and in 0:55:39.300,0:55:41.960 the demo, right? 0:55:42.599,0:55:44.819 [JOHN]: Yes it is, 0:55:44.819,0:55:46.793 it's something that I did 0:55:46.793,0:55:50.410 like two, like five minutes ago. 0:55:51.800,0:55:57.200 It's real-time, real-time. 0:55:57.200,0:55:58.760 [CARLOTTA]: Um, 0:55:58.760,0:56:02.040 yeah, so, think about it...you need to deploy 0:56:02.040,0:56:05.460 the model as a service. So if I'm 0:56:05.460,0:56:07.980 going to deploy the model, 0:56:07.980,0:56:10.380 I cannot evaluate the model 0:56:10.380,0:56:12.839 after deploying it, right, because I 0:56:12.839,0:56:14.940 cannot go into production if I'm not 0:56:14.940,0:56:17.579 sure, if I'm not satisfied with my model, and 0:56:17.579,0:56:19.500 I'm not sure that my model is performing 0:56:19.500,0:56:20.280 well. 0:56:20.280,0:56:22.900 So that's why I would go with, 0:56:22.900,0:56:24.319 um, 0:56:24.319,0:56:30.480 I would...exclude "B" from my 0:56:30.480,0:56:31.520 answer. 0:56:31.520,0:56:33.419 As for 0:56:33.419,0:56:36.960 "C", uh, I don't see you...I 0:56:36.960,0:56:39.480 didn't see you, John, cloning the 0:56:39.480,0:56:41.420 training pipeline with a different name, 0:56:41.420,0:56:44.640 so I don't think this is the 0:56:44.640,0:56:46.920 right answer. 0:56:46.920,0:56:49.619 But I did see you create an 0:56:49.619,0:56:52.729 inference pipeline from the 0:56:52.729,0:56:54.830 training pipeline, and you just converted 0:56:54.830,0:56:59.280 it using a one-click button, right? 0:56:59.280,0:57:01.400 [JOHN]: Yeah, that's correct. 0:57:01.400,0:57:04.280 So this is the right answer. 0:57:04.280,0:57:07.460 Good job. So I created a real-time 0:57:07.460,0:57:11.280 inference pipeline, and it's done. 0:57:11.280,0:57:13.440 It finished...it finished, the job is 0:57:13.440,0:57:18.000 finished. So we can now deploy.
0:57:18.000,0:57:19.400 And... 0:57:19.400,0:57:21.500 Yeah [LAUGHS]. 0:57:21.500,0:57:25.339 Exactly, like, on time. 0:57:25.339,0:57:27.839 Like, it finished two seconds... 0:57:27.839,0:57:30.859 three, four seconds ago [LAUGHS]. 0:57:30.859,0:57:33.119 So, uh, 0:57:33.119,0:57:36.480 until, um... 0:57:36.480,0:57:39.839 This is my job overview, so 0:57:39.839,0:57:43.260 these are the details of the job that I 0:57:43.260,0:57:45.540 have already submitted, it's just opening, 0:57:45.540,0:57:47.459 and once it opens... 0:57:47.459,0:57:50.180 um... 0:57:50.400,0:57:52.740 I don't know why it's so heavy 0:57:52.740,0:57:56.780 today, it's not usually like that. 0:57:57.780,0:58:00.020 [CARLOTTA]: Yeah, it's probably because 0:58:00.020,0:58:01.020 you are also 0:58:01.020,0:58:06.000 sharing your screen on Teams, 0:58:06.000,0:58:08.160 so that's using the bandwidth of your 0:58:08.160,0:58:08.944 connection. 0:58:08.944,0:58:10.740 [JOHN]: Let me do something here 0:58:10.740,0:58:13.740 because...yeah, finally. 0:58:13.740,0:58:16.440 I can switch to my mobile internet if it 0:58:16.440,0:58:18.599 does it again. So I will click on "Deploy", 0:58:18.599,0:58:20.700 it's that simple. I'll just click on 0:58:20.700,0:58:23.040 "Deploy" and... 0:58:23.040,0:58:25.619 I am going to deploy a new real-time 0:58:25.619,0:58:27.960 endpoint. 0:58:27.960,0:58:30.300 So what am I going to name it? 0:58:30.300,0:58:31.870 The description and the compute type? 0:58:31.870,0:58:33.372 Everything is already specified 0:58:33.372,0:58:34.140 for me here, 0:58:34.140,0:58:36.240 so I'm just gonna copy and paste it, 0:58:36.240,0:58:38.940 because we...we are running 0:58:38.940,0:58:41.280 out of time. 0:58:41.280,0:58:44.230 So it's Azure Container Instance, 0:58:44.230,0:58:46.360 not Azure Kubernetes Service, 0:58:46.360,0:58:48.720 which is also a containerization service.
0:58:48.720,0:58:50.867 Both are for containerization, but this 0:58:50.867,0:58:53.613 gives you something, and this gives you something else. 0:58:53.613,0:58:54.960 For the advanced options, 0:58:54.960,0:58:57.420 it doesn't tell us to do anything, so 0:58:57.420,0:59:00.420 we are just gonna click on "Deploy", 0:59:00.420,0:59:05.220 and now we can test our endpoint from 0:59:05.220,0:59:07.859 the endpoints that we can find here, so 0:59:07.859,0:59:11.460 it's in progress. If I go here 0:59:11.460,0:59:13.799 under the assets, I can find something 0:59:13.799,0:59:16.680 called "Endpoints", and I can find the 0:59:16.680,0:59:18.599 real-time ones and the batch endpoints. 0:59:18.599,0:59:22.020 And we have created a real-time endpoint, 0:59:22.020,0:59:25.260 so we are going to find it under this 0:59:25.260,0:59:29.760 title. So if I click on it, I should 0:59:29.760,0:59:32.640 be able to test it once it's ready. 0:59:32.640,0:59:37.200 It's still loading, but this is the 0:59:37.200,0:59:40.980 input, and this is the output that we 0:59:40.980,0:59:44.652 will get back, so if I click on "Test"... 0:59:44.652,0:59:46.886 and from here, 0:59:46.886,0:59:49.810 I will input some data to the 0:59:49.810,0:59:50.900 endpoint, 0:59:50.900,0:59:54.599 which is the patient information: the 0:59:54.599,0:59:57.119 columns that we have already seen in our 0:59:57.119,1:00:00.380 data set, the patient ID, the pregnancies. 1:00:00.380,1:00:03.960 And of course, of course I'm not gonna 1:00:03.960,1:00:05.940 enter the label that I'm trying to 1:00:05.940,1:00:08.099 predict, so I'm not going to tell it whether 1:00:08.099,1:00:10.360 the patient is diabetic or not. This 1:00:10.360,1:00:12.665 endpoint is there to tell me that. 1:00:12.665,1:00:14.599 The endpoint, or the URL, 1:00:14.599,1:00:15.529 is going to give me 1:00:15.529,1:00:17.640 back this information, whether someone 1:00:17.640,1:00:22.680 has diabetes or not.
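Testing the endpoint from the studio's Test tab can also be approximated from Python with only the standard library. This is a hedged sketch, not the service's documented client: the JSON envelope (`Inputs` / `WebServiceInput0`), the placeholder URL and key, the `Bearer` authorization header, the `Diabetic` label name, the `Scored Labels` / `Scored Probabilities` source column names, and the `DiabetesPrediction` / `Probability` output names are all assumptions to check against your own endpoint. It illustrates two points from the demo: the label is left out of the input, and the result is reduced to three columns (patient ID, prediction, probability).

```python
import json
import urllib.request
from typing import Any, Dict, List

# Placeholders (assumptions): use the real REST URL and key shown for your endpoint.
ENDPOINT_URL = "https://<your-endpoint>/score"
API_KEY = "<your-key>"

# Assumed name of the label column; it is never sent, because the endpoint predicts it.
LABEL_COLUMN = "Diabetic"


def build_request(patients: List[Dict[str, Any]]) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying only the patient features.

    The {"Inputs": {"WebServiceInput0": [...]}} envelope is an assumption;
    compare it with the sample input on the endpoint's Test tab.
    """
    rows = [{k: v for k, v in p.items() if k != LABEL_COLUMN} for p in patients]
    payload = {"Inputs": {"WebServiceInput0": rows}, "GlobalParameters": {}}
    return urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {API_KEY}"},
        method="POST",
    )


def to_three_columns(scored_row: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce one scored row to patient ID, prediction, and probability.

    Both the input column names and the output names are assumed.
    """
    return {
        "PatientID": scored_row["PatientID"],
        "DiabetesPrediction": scored_row["Scored Labels"],
        "Probability": scored_row["Scored Probabilities"],
    }


if __name__ == "__main__":
    # Illustrative placeholder record; other feature columns are omitted for brevity.
    patient = {"PatientID": 1, "Pregnancies": 0, "Diabetic": 0}
    request = build_request([patient])
    print(request.get_method(), json.loads(request.data))
    # Sending it needs a deployed endpoint: urllib.request.urlopen(request)
```

Keeping the label out of the request mirrors what the deployed inference pipeline expects: at prediction time the model only sees features, and the predicted label comes back in the response.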
So if I input 1:00:22.680,1:00:24.780 this data, I'm just going to copy it, 1:00:24.780,1:00:27.780 and go to my endpoint, and click on 1:00:27.780,1:00:30.180 "Test", I'm going to get the result back, 1:00:30.180,1:00:32.359 which is the three columns that we have 1:00:32.359,1:00:35.520 defined inside our Python script: the 1:00:35.520,1:00:37.859 patient ID, the diabetic prediction, and 1:00:37.859,1:00:41.040 the probability, the certainty of whether 1:00:41.040,1:00:45.720 someone is diabetic or not based on the... 1:00:45.720,1:00:49.090 uh...based on the prediction. 1:00:49.090,1:00:50.660 So that's it. 1:00:50.660,1:00:54.359 And, uh, I think that this is a really 1:00:54.359,1:00:56.729 simple step to do, you can do it on your 1:00:56.729,1:00:58.380 own, you can test it. 1:00:58.380,1:01:01.140 And I think that I have finished, so 1:01:01.140,1:01:03.020 thank you. 1:01:03.020,1:01:04.206 [CARLOTTA]: Uh, yes, 1:01:04.206,1:01:06.069 we are running out of time. 1:01:06.069,1:01:09.780 I just wanted to thank you, John, for 1:01:09.780,1:01:12.299 this demo, for going through all these 1:01:12.299,1:01:13.429 steps to 1:01:13.429,1:01:16.740 um, create and train a classification model, 1:01:16.740,1:01:19.680 and also deploy it as a predictive 1:01:19.680,1:01:22.880 service. And I encourage you all to go 1:01:22.880,1:01:25.079 back to the Learn module 1:01:25.079,1:01:28.260 and, um, go deeper into all these topics 1:01:28.260,1:01:31.760 at your own pace, and also maybe 1:01:31.760,1:01:34.799 uh do this demo on your own, on your 1:01:34.799,1:01:37.140 Azure for Students 1:01:37.140,1:01:39.359 subscription. Um... 1:01:39.359,1:01:43.200 And I would also like to remind you that 1:01:43.200,1:01:46.140 this is part of a series of 1:01:46.140,1:01:49.500 Cloud Skills Challenge study 1:01:49.500,1:01:51.059 sessions, 1:01:51.059,1:01:54.059 so you will have more in the...
1:01:54.059,1:01:57.540 in the following days, and this is for 1:01:57.540,1:02:00.480 you to prepare, let's say, to help you 1:02:00.480,1:02:04.880 in taking the Cloud Skills Challenge, 1:02:04.880,1:02:07.040 which collects 1:02:07.040,1:02:10.599 very interesting Learn modules that you 1:02:10.599,1:02:14.540 can use to skill up on various topics, 1:02:14.540,1:02:18.359 and some of them are focused on AI and 1:02:18.359,1:02:20.819 ML. So if you are interested in these 1:02:20.819,1:02:23.099 topics, you can select these Learn 1:02:23.099,1:02:24.780 modules. 1:02:24.780,1:02:27.660 So let me also copy 1:02:27.660,1:02:29.669 the link, the short link to the 1:02:29.669,1:02:32.420 challenge, in the chat. Remember that 1:02:32.420,1:02:34.980 you have until the 13th of 1:02:34.980,1:02:37.980 September to take the challenge. And also 1:02:37.980,1:02:40.440 remember that in October, on the 7th of 1:02:40.440,1:02:43.020 October, you can join the 1:02:43.020,1:02:46.619 Student Developer Summit, 1:02:46.619,1:02:50.480 which will be a virtual or, 1:02:50.480,1:02:53.220 in some cases, a hybrid 1:02:53.220,1:02:55.880 event, so stay tuned, because you will 1:02:55.880,1:02:58.559 have some surprises in the following 1:02:58.559,1:03:01.260 days. And if you want to learn more about 1:03:01.260,1:03:03.480 this event, you can check the Microsoft 1:03:03.480,1:03:08.099 Imagine Cup Twitter page and stay tuned. 1:03:08.099,1:03:11.230 So thank you everyone for joining 1:03:11.230,1:03:12.989 this session today, and thank you very 1:03:12.989,1:03:16.500 much, John, for co-hosting 1:03:16.500,1:03:20.359 this session with me. It was a pleasure. 1:03:21.227,1:03:22.838 [JOHN]: Thank you so much, 1:03:22.838,1:03:23.969 Carlotta, for having me 1:03:23.969,1:03:26.249 with you today, and thank you for 1:03:26.249,1:03:27.670 giving me this opportunity to 1:03:27.670,1:03:30.180 be with you here.
1:03:30.180,1:03:32.070 [CARLOTTA]: Great, thank you. 1:03:32.070,1:03:33.420 [JOHN]: Yeah, I hope that we 1:03:33.420,1:03:35.390 work together again in the future. 1:03:35.390,1:03:37.880 [CARLOTTA]: Sure, I hope so as well. 1:03:37.880,1:03:40.700 Um, so, thank you everyone. 1:03:40.700,1:03:43.749 And have a nice rest of your day. 1:03:44.099,1:03:46.500 Bye-bye. Speak to you soon. 1:03:46.500,1:03:48.920 [JOHN]: Bye.