1 00:00:01,920 --> 00:00:04,072 [CARLOTTA]: Great, so I think we can start 2 00:00:04,072 --> 00:00:06,340 since the meeting is recorded, so if 3 00:00:06,340 --> 00:00:10,090 anyone, uh, jumps in later, they 4 00:00:10,090 --> 00:00:12,420 can watch the recording. 5 00:00:12,420 --> 00:00:15,780 So, hi everyone and welcome to this 6 00:00:15,780 --> 00:00:18,000 um, Cloud Skills Challenge study session 7 00:00:18,000 --> 00:00:20,880 around Create classification models 8 00:00:20,880 --> 00:00:24,000 with Azure Machine Learning designer. 9 00:00:24,000 --> 00:00:27,240 So today I'm thrilled to be here with 10 00:00:27,240 --> 00:00:29,149 John. Uh, John, do you mind 11 00:00:29,149 --> 00:00:31,619 briefly introducing yourself? 12 00:00:31,619 --> 00:00:33,160 [JOHN]: Uh, thank you Carlotta. 13 00:00:33,160 --> 00:00:34,160 Hello everyone. 14 00:00:34,160 --> 00:00:38,080 Welcome to our workshop today. I hope 15 00:00:38,080 --> 00:00:40,559 that you are all excited for it. I am 16 00:00:40,559 --> 00:00:43,140 John Aziz, a Gold Microsoft Learn Student 17 00:00:43,140 --> 00:00:47,460 Ambassador, and I will be here with, uh, 18 00:00:47,460 --> 00:00:50,760 Carlotta to do the practical part 19 00:00:50,760 --> 00:00:53,820 about this module of the Cloud Skills 20 00:00:53,820 --> 00:00:56,623 Challenge. Thank you for having me. 21 00:00:56,623 --> 00:00:58,219 [CARLOTTA]: Perfect, thanks John. 22 00:00:58,219 --> 00:00:59,623 So for those who 23 00:00:59,623 --> 00:01:03,440 don't know me, I'm Carlotta Castelluccio, 24 00:01:03,440 --> 00:01:06,479 based in Italy and focused on AI and 25 00:01:06,479 --> 00:01:08,760 machine learning technologies and 26 00:01:08,760 --> 00:01:11,200 their use in education. 27 00:01:11,200 --> 00:01:12,340 Um, so, 28 00:01:12,737 --> 00:01:14,537 um, this Cloud Skills Challenge study 29 00:01:14,537 --> 00:01:17,117 session is based on a Learn module, a 30 00:01:17,120 --> 00:01:21,080 dedicated Learn module. 
I sent to you, uh 31 00:01:21,320 --> 00:01:23,939 the link to this module, uh, in the chat 32 00:01:23,939 --> 00:01:25,619 so that you can follow along with the 33 00:01:25,619 --> 00:01:28,680 module if you want, or just have a look at 34 00:01:28,680 --> 00:01:32,470 the module later at your own pace. 35 00:01:32,470 --> 00:01:33,780 Um... 36 00:01:33,780 --> 00:01:37,020 So, before starting I would also like to 37 00:01:37,020 --> 00:01:40,619 remind you of, uh, the code of 38 00:01:40,619 --> 00:01:43,439 conduct and guidelines of our student 39 00:01:43,439 --> 00:01:47,510 ambassadors community. So please during this 40 00:01:47,510 --> 00:01:51,000 meeting be respectful and inclusive and 41 00:01:51,000 --> 00:01:53,579 be friendly, open, and welcoming and 42 00:01:53,579 --> 00:01:56,159 respectful of each other's 43 00:01:56,159 --> 00:01:57,720 differences. 44 00:01:57,720 --> 00:02:01,200 If you want to learn more about the code 45 00:02:01,200 --> 00:02:03,390 of conduct, you can use this link in the 46 00:02:03,390 --> 00:02:08,880 deck: aka.ms/SACoC. 47 00:02:09,660 --> 00:02:11,730 And now we are, 48 00:02:11,730 --> 00:02:15,420 um, we are ready to start our session. 49 00:02:15,420 --> 00:02:18,959 So, as we mentioned, we are going to 50 00:02:18,959 --> 00:02:21,980 focus on classification models and Azure ML, 51 00:02:21,980 --> 00:02:24,900 uh, today. So, first of all, we are going 52 00:02:24,900 --> 00:02:28,430 to, um, identify, uh, the kind of 53 00:02:28,430 --> 00:02:31,080 um, of scenarios in which you should 54 00:02:31,080 --> 00:02:34,490 choose to use a classification model. 55 00:02:34,490 --> 00:02:36,660 We're going to introduce Azure Machine 56 00:02:36,660 --> 00:02:39,060 Learning and Azure Machine Learning designer. 
57 00:02:39,060 --> 00:02:41,879 We're going to understand, uh, which are 58 00:02:41,879 --> 00:02:43,680 the steps to follow to create a 59 00:02:43,680 --> 00:02:46,200 classification model in Azure Machine 60 00:02:46,200 --> 00:02:48,076 Learning, and then John will, 61 00:02:48,076 --> 00:02:49,500 um, 62 00:02:49,500 --> 00:02:52,219 lead an amazing demo about training and 63 00:02:52,219 --> 00:02:54,300 publishing a classification model in 64 00:02:54,300 --> 00:02:57,000 Azure ML Designer. 65 00:02:57,000 --> 00:02:59,819 So, let's start from the beginning. Let's 66 00:02:59,819 --> 00:03:02,640 start from identifying classification 67 00:03:02,640 --> 00:03:05,220 machine learning scenarios. 68 00:03:05,220 --> 00:03:07,640 So, first of all, what is classification? 69 00:03:07,640 --> 00:03:09,959 Classification is a form of machine 70 00:03:09,959 --> 00:03:12,120 learning that is used to predict which 71 00:03:12,120 --> 00:03:15,599 category or class an item belongs to. For 72 00:03:15,599 --> 00:03:17,340 example, we might want to develop a 73 00:03:17,340 --> 00:03:19,800 classifier able to identify if an 74 00:03:19,800 --> 00:03:22,200 incoming email should be filtered or not 75 00:03:22,200 --> 00:03:25,080 according to the style, the sender, the 76 00:03:25,080 --> 00:03:26,935 length of the email, etc. 77 00:03:26,935 --> 00:03:28,140 In this case, the 78 00:03:28,140 --> 00:03:30,060 characteristics of the email are the 79 00:03:30,060 --> 00:03:31,080 features. 80 00:03:31,080 --> 00:03:34,200 And the label is a classification of 81 00:03:34,200 --> 00:03:38,099 either a zero or one, representing spam 82 00:03:38,099 --> 00:03:40,860 or non-spam for the incoming email. So 83 00:03:40,860 --> 00:03:42,360 this is an example of a binary 84 00:03:42,360 --> 00:03:44,400 classifier. 
If you want to assign 85 00:03:44,400 --> 00:03:46,260 multiple categories to the incoming 86 00:03:46,260 --> 00:03:48,959 email like work letters, love letters, 87 00:03:48,959 --> 00:03:52,080 complaints, or other categories, in this 88 00:03:52,080 --> 00:03:54,000 case a binary classifier is no longer 89 00:03:54,000 --> 00:03:55,739 enough, and we should develop a 90 00:03:55,739 --> 00:03:58,319 multi-class classifier. So classification 91 00:03:58,319 --> 00:04:00,599 is an example of what is called 92 00:04:00,599 --> 00:04:02,519 supervised machine learning 93 00:04:02,519 --> 00:04:05,280 in which you train a model using data 94 00:04:05,280 --> 00:04:07,080 that includes both the features and 95 00:04:07,080 --> 00:04:08,879 known values for the label 96 00:04:08,879 --> 00:04:11,099 so that the model learns to fit the 97 00:04:11,099 --> 00:04:13,560 feature combinations to the label. Then, 98 00:04:13,560 --> 00:04:15,420 after training has been completed, you 99 00:04:15,420 --> 00:04:17,040 can use the trained model to predict 100 00:04:17,040 --> 00:04:19,500 labels for new items for which the 101 00:04:19,500 --> 00:04:22,320 label is unknown. 102 00:04:22,320 --> 00:04:25,440 But let's see some examples of scenarios 103 00:04:25,440 --> 00:04:27,120 for classification machine learning 104 00:04:27,120 --> 00:04:29,160 models. So, we already mentioned an 105 00:04:29,160 --> 00:04:31,020 example of a solution in which we would 106 00:04:31,020 --> 00:04:33,660 need a classifier, but let's explore 107 00:04:33,660 --> 00:04:35,699 other scenarios for classification in 108 00:04:35,699 --> 00:04:37,979 other industries. For example, you can use 109 00:04:37,979 --> 00:04:40,380 a classification model for a health 110 00:04:40,380 --> 00:04:43,680 clinic scenario, and use clinical data to 111 00:04:43,680 --> 00:04:45,720 predict whether a patient will become sick 112 00:04:45,720 --> 00:04:47,060 or not. 
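The supervised learning idea described above (train on features with known labels, then predict labels for new items) can be sketched in a few lines of plain Python. This is only an illustration, not what Azure Machine Learning does internally: the nearest-centroid technique, the two email features, and the convention that 1 means spam are all assumptions made for the example.

```python
from statistics import mean


def train(features, labels):
    """Learn one centroid (mean feature vector) per label."""
    centroids = {}
    for label in set(labels):
        rows = [f for f, l in zip(features, labels) if l == label]
        centroids[label] = [mean(column) for column in zip(*rows)]
    return centroids


def predict(centroids, item):
    """Return the label whose centroid is closest to the item."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: squared_distance(centroids[label], item))


# Hypothetical email features: [number of links, number of exclamation marks]
features = [[8, 5], [7, 6], [9, 4], [0, 1], [1, 0], [0, 0]]
labels = [1, 1, 1, 0, 0, 0]  # assumed convention: 1 = spam, 0 = non-spam

model = train(features, labels)
print(predict(model, [6, 5]))  # a new email whose label is unknown
```

A multi-class classifier follows the same pattern: the labels list would simply contain more than two distinct values.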
113 00:04:47,060 --> 00:04:49,553 You can use, um... 114 00:04:49,553 --> 00:04:59,250 [NO AUDIO] 115 00:04:59,250 --> 00:05:00,930 [JOHN]: Carlotta, you are muted. 116 00:05:03,780 --> 00:05:07,700 [CARLOTTA]: Oh, sorry. So, when I became muted, it's a 117 00:05:07,700 --> 00:05:08,807 long time, or? 118 00:05:08,807 --> 00:05:11,940 [JOHN]: You can use-you can use, uh 119 00:05:11,940 --> 00:05:13,430 some models for classification. 120 00:05:13,430 --> 00:05:14,729 For example, you can use... 121 00:05:14,729 --> 00:05:16,919 You were saying this. 122 00:05:16,919 --> 00:05:20,020 [CARLOTTA]: Uh, so I was in this deck, 123 00:05:20,020 --> 00:05:21,660 or the previous one? 124 00:05:21,660 --> 00:05:24,180 [JOHN]: This one, you have been muted 125 00:05:24,180 --> 00:05:25,901 for, uh, one second [LAUGHS]. 126 00:05:25,901 --> 00:05:28,018 [CARLOTTA]: Okay, okay perfect, perfect. 127 00:05:28,018 --> 00:05:30,419 Uh, yeah I was talking...sorry for 128 00:05:30,419 --> 00:05:33,278 that. So, I was talking about the possible 129 00:05:33,278 --> 00:05:34,560 scenarios in which you, 130 00:05:34,560 --> 00:05:37,320 you can use a classification model. Like 131 00:05:37,320 --> 00:05:39,660 health clinic scenario, financial scenario, 132 00:05:39,660 --> 00:05:41,699 or the third one is business type of 133 00:05:41,699 --> 00:05:44,100 scenario. You can use characteristics of 134 00:05:44,100 --> 00:05:45,900 a small business to predict if a new 135 00:05:45,900 --> 00:05:47,880 venture will succeed or not, for 136 00:05:47,880 --> 00:05:49,560 example. And these are all types of 137 00:05:49,560 --> 00:05:52,160 binary classification. 138 00:05:52,160 --> 00:05:55,199 Uh, but today we are also going to talk 139 00:05:55,199 --> 00:05:57,240 about Azure Machine Learning. So let's 140 00:05:57,240 --> 00:05:58,139 see. 141 00:05:58,139 --> 00:06:00,660 What is Azure Machine Learning? 
So 142 00:06:00,660 --> 00:06:02,160 training and deploying an effective 143 00:06:02,160 --> 00:06:04,199 machine learning model involves a lot of 144 00:06:04,199 --> 00:06:06,539 work, much of it time-consuming and 145 00:06:06,539 --> 00:06:08,880 resource-intensive. So, Azure Machine 146 00:06:08,880 --> 00:06:11,039 Learning is a cloud-based service that 147 00:06:11,039 --> 00:06:12,780 helps simplify some of the tasks it 148 00:06:12,780 --> 00:06:15,720 takes to prepare data, train a model, and 149 00:06:15,720 --> 00:06:18,060 also deploy it as a predictive service. 150 00:06:18,060 --> 00:06:20,220 So it helps data scientists increase 151 00:06:20,220 --> 00:06:22,380 their efficiency by automating many of 152 00:06:22,380 --> 00:06:24,660 the time-consuming tasks associated with 153 00:06:24,660 --> 00:06:27,539 creating and training a model. 154 00:06:27,539 --> 00:06:29,520 And it enables them also to use 155 00:06:29,520 --> 00:06:31,740 cloud-based compute resources that scale 156 00:06:31,740 --> 00:06:33,720 effectively to handle large volumes of 157 00:06:33,720 --> 00:06:36,300 data while incurring costs only when 158 00:06:36,300 --> 00:06:38,699 actually used. 
159 00:06:38,699 --> 00:06:41,220 To use Azure Machine Learning, you, 160 00:06:41,220 --> 00:06:43,199 first things first, you need to create a 161 00:06:43,199 --> 00:06:44,940 workspace resource in your Azure 162 00:06:44,940 --> 00:06:47,520 subscription, and you can then use this 163 00:06:47,520 --> 00:06:50,220 workspace to manage data, compute 164 00:06:50,220 --> 00:06:52,440 resources, code, models, and other 165 00:06:52,440 --> 00:06:54,959 artifacts. After you have created an 166 00:06:54,959 --> 00:06:56,519 Azure Machine Learning workspace, 167 00:06:56,519 --> 00:06:57,808 you can develop solutions with the 168 00:06:57,808 --> 00:06:59,338 Azure Machine Learning service, 169 00:06:59,338 --> 00:07:00,840 either with developer 170 00:07:00,840 --> 00:07:02,580 tools or the Azure Machine Learning 171 00:07:02,580 --> 00:07:04,088 studio web portal. 172 00:07:04,088 --> 00:07:06,440 In particular, Azure Machine Learning studio 173 00:07:06,440 --> 00:07:07,800 is a web portal for machine 174 00:07:07,800 --> 00:07:09,720 learning solutions in Azure, and it 175 00:07:09,720 --> 00:07:11,639 includes a wide range of features and 176 00:07:11,639 --> 00:07:13,800 capabilities that help data scientists 177 00:07:13,800 --> 00:07:16,259 prepare data, train models, publish 178 00:07:16,259 --> 00:07:18,479 predictive services, and monitor also 179 00:07:18,479 --> 00:07:19,680 their usage. 180 00:07:19,680 --> 00:07:22,139 So to begin using the web portal, you need 181 00:07:22,139 --> 00:07:23,294 to assign the workspace 182 00:07:23,294 --> 00:07:24,781 you created in the Azure portal 183 00:07:24,781 --> 00:07:26,819 to the Azure Machine 184 00:07:26,819 --> 00:07:29,520 Learning studio. 
185 00:07:29,520 --> 00:07:31,800 At its core, Azure Machine Learning is a 186 00:07:31,800 --> 00:07:33,720 service for training and managing 187 00:07:33,720 --> 00:07:36,000 machine learning models for which you 188 00:07:36,000 --> 00:07:38,220 need compute resources on which to run 189 00:07:38,220 --> 00:07:39,919 the training process. 190 00:07:39,919 --> 00:07:44,280 Compute targets are, um, one of the main 191 00:07:44,280 --> 00:07:46,740 basic concepts of Azure Machine Learning. 192 00:07:46,740 --> 00:07:48,780 They are cloud-based resources on which 193 00:07:48,780 --> 00:07:50,639 you can run model training and data 194 00:07:50,639 --> 00:07:53,220 exploration processes. 195 00:07:53,220 --> 00:07:54,780 So in Azure Machine Learning studio, you 196 00:07:54,780 --> 00:07:56,759 can manage the compute targets for your 197 00:07:56,759 --> 00:07:58,740 data science activities, and there are 198 00:07:58,740 --> 00:08:03,240 four kinds of compute targets you can 199 00:08:03,240 --> 00:08:05,940 create. We have the compute instances, 200 00:08:05,940 --> 00:08:09,539 which are virtual machines set up for 201 00:08:09,539 --> 00:08:10,979 running machine learning code during 202 00:08:10,979 --> 00:08:13,319 development, so they are not designed for 203 00:08:13,319 --> 00:08:14,460 production. 204 00:08:14,460 --> 00:08:17,099 Then we have compute clusters, which are 205 00:08:17,099 --> 00:08:19,800 a set of virtual machines that can scale 206 00:08:19,800 --> 00:08:22,199 up automatically based on traffic. 207 00:08:22,199 --> 00:08:24,599 We have inference clusters, which are 208 00:08:24,599 --> 00:08:26,699 similar to compute clusters, but they are 209 00:08:26,699 --> 00:08:29,340 designed for deployment, so they are 210 00:08:29,340 --> 00:08:31,979 deployment targets for predictive 211 00:08:31,979 --> 00:08:35,820 services that use trained models. 
212 00:08:35,820 --> 00:08:38,339 And finally, we have attached compute, 213 00:08:38,339 --> 00:08:41,339 which is any compute target that you 214 00:08:41,339 --> 00:08:44,159 manage yourself outside of Azure ML, like, 215 00:08:44,159 --> 00:08:46,560 for example, virtual machines or Azure 216 00:08:46,560 --> 00:08:49,700 Databricks clusters. 217 00:08:49,980 --> 00:08:52,800 So we talked about Azure Machine 218 00:08:52,800 --> 00:08:54,300 Learning, but we also 219 00:08:54,300 --> 00:08:55,500 mentioned Azure Machine Learning 220 00:08:55,500 --> 00:08:57,540 designer. What is Azure Machine Learning 221 00:08:57,540 --> 00:09:00,120 designer? So, in Azure Machine Learning 222 00:09:00,120 --> 00:09:02,880 studio, there are several ways to author 223 00:09:02,880 --> 00:09:04,560 classification machine learning models. 224 00:09:04,560 --> 00:09:08,100 One way is to use a visual interface, and 225 00:09:08,100 --> 00:09:10,260 this visual interface is called designer, 226 00:09:10,260 --> 00:09:13,140 and you can use it to train, test, and 227 00:09:13,140 --> 00:09:15,540 also deploy machine learning models. And 228 00:09:15,540 --> 00:09:17,940 the drag-and-drop interface makes use of 229 00:09:17,940 --> 00:09:20,279 clearly defined inputs and outputs that 230 00:09:20,279 --> 00:09:22,680 can be shared, reused, and also version 231 00:09:22,680 --> 00:09:23,880 controlled. 232 00:09:23,880 --> 00:09:25,920 And using the designer, you can identify 233 00:09:25,920 --> 00:09:28,080 the building blocks or components needed 234 00:09:28,080 --> 00:09:30,839 for your model, place and connect them on 235 00:09:30,839 --> 00:09:33,120 your canvas, and run a machine learning 236 00:09:33,120 --> 00:09:35,300 job. 237 00:09:35,399 --> 00:09:36,779 So, 238 00:09:36,779 --> 00:09:39,120 each designer project, so each project 239 00:09:39,120 --> 00:09:42,360 in the designer is known as a pipeline. 
240 00:09:42,360 --> 00:09:45,600 And in the designer, we have a left panel 241 00:09:45,600 --> 00:09:48,360 for navigation and a canvas on your 242 00:09:48,360 --> 00:09:50,640 right hand side in which you build your 243 00:09:50,640 --> 00:09:53,940 pipeline visually. So pipelines let you 244 00:09:53,940 --> 00:09:56,100 organize, manage, and reuse complex 245 00:09:56,100 --> 00:09:58,260 machine learning workflows across 246 00:09:58,260 --> 00:10:00,480 projects and users. 247 00:10:00,480 --> 00:10:03,000 A pipeline starts with the data set from 248 00:10:03,000 --> 00:10:04,140 which you want to train the model 249 00:10:04,140 --> 00:10:05,880 because it all begins with data when 250 00:10:05,880 --> 00:10:07,380 talking about data science and machine 251 00:10:07,380 --> 00:10:09,540 learning. And each time you run a 252 00:10:09,540 --> 00:10:10,980 pipeline, the configuration of the 253 00:10:10,980 --> 00:10:12,959 pipeline and its results are stored in 254 00:10:12,959 --> 00:10:17,339 your workspace as a pipeline job. 255 00:10:17,339 --> 00:10:21,959 So the second main concept of Azure 256 00:10:21,959 --> 00:10:25,080 Machine Learning is a component. So, going 257 00:10:25,080 --> 00:10:28,440 hierarchically from the pipeline, we can 258 00:10:28,440 --> 00:10:30,540 say that each building block of a 259 00:10:30,540 --> 00:10:32,920 pipeline is called a component. 260 00:10:32,920 --> 00:10:34,120 In other words, an Azure Machine 261 00:10:34,120 --> 00:10:36,959 Learning component encapsulates one step 262 00:10:36,959 --> 00:10:39,420 in a machine learning pipeline. So, it's a 263 00:10:39,420 --> 00:10:41,640 reusable piece of code with inputs and 264 00:10:41,640 --> 00:10:44,100 outputs, something very similar to a 265 00:10:44,100 --> 00:10:46,500 function in any programming language. 
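The comparison between a component and a function can be made concrete with a small Python sketch. It only illustrates the idea of chaining steps with inputs and outputs; the function names clean_data, normalize, and run_pipeline are hypothetical and are not part of the designer or of any Azure SDK.

```python
def clean_data(rows):
    """Component: drop rows that contain a missing value."""
    return [row for row in rows if None not in row]


def normalize(rows):
    """Component: rescale every column to the range 0..1."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def run_pipeline(data, components):
    """Pipeline: feed the output of each component into the next one."""
    for component in components:
        data = component(data)
    return data


raw = [[1, 10], [None, 20], [3, 30]]
result = run_pipeline(raw, [clean_data, normalize])
print(result)
```

Because each step only depends on its inputs, the same component can be reused in another pipeline, which is the property the designer relies on.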
266 00:10:46,500 --> 00:10:48,899 And in a pipeline project, you can access 267 00:10:48,899 --> 00:10:51,480 data assets and components from the left 268 00:10:51,480 --> 00:10:52,700 panel's 269 00:10:52,700 --> 00:10:56,279 Asset Library tab, as you can see 270 00:10:56,279 --> 00:11:00,200 here in the screenshot in the deck. 271 00:11:00,300 --> 00:11:03,360 So you can create data assets using 272 00:11:03,360 --> 00:11:08,339 an ad hoc page called the Data page. And a data 273 00:11:08,339 --> 00:11:11,160 asset is a reference to a data source 274 00:11:11,160 --> 00:11:12,480 location. 275 00:11:12,480 --> 00:11:15,720 So this data source location could be a 276 00:11:15,720 --> 00:11:18,779 local file, a data store, a web file or 277 00:11:18,779 --> 00:11:21,660 even an Azure Open Dataset. 278 00:11:21,660 --> 00:11:23,880 And these data assets will appear along 279 00:11:23,880 --> 00:11:26,459 with standard sample data sets in the 280 00:11:26,459 --> 00:11:30,019 designer's Asset Library. 281 00:11:30,079 --> 00:11:31,560 Um. 282 00:11:31,560 --> 00:11:36,959 Another basic concept of Azure ML is 283 00:11:36,959 --> 00:11:38,880 Azure Machine Learning jobs. 284 00:11:38,880 --> 00:11:43,519 So, basically, when you submit a pipeline, 285 00:11:43,519 --> 00:11:47,040 you create a job which will run all the 286 00:11:47,040 --> 00:11:49,920 steps in your pipeline. So a job executes 287 00:11:49,920 --> 00:11:52,800 a task against a specified compute 288 00:11:52,800 --> 00:11:53,760 target. 289 00:11:53,760 --> 00:11:56,640 Jobs enable systematic tracking for your 290 00:11:56,640 --> 00:11:58,560 machine learning experimentation in 291 00:11:58,560 --> 00:11:59,880 Azure ML. 292 00:11:59,880 --> 00:12:02,399 And once a job is created, Azure ML 293 00:12:02,399 --> 00:12:05,459 maintains a run record, uh, for the 294 00:12:05,459 --> 00:12:07,640 job. 
295 00:12:07,877 --> 00:12:12,180 Um, but, let's move to the classification 296 00:12:12,180 --> 00:12:14,040 steps. So, 297 00:12:14,040 --> 00:12:17,160 um, let's introduce how to create a 298 00:12:17,160 --> 00:12:21,360 classification model in Azure ML, but you 299 00:12:21,360 --> 00:12:23,640 will see it in more detail in a 300 00:12:23,640 --> 00:12:26,339 hands-on demo that John will guide us 301 00:12:26,339 --> 00:12:29,459 through in a few minutes. 302 00:12:29,459 --> 00:12:32,220 So, you can think of the steps to train 303 00:12:32,220 --> 00:12:33,720 and evaluate a classification machine 304 00:12:33,720 --> 00:12:36,660 learning model as four main steps. So 305 00:12:36,660 --> 00:12:38,459 first of all, you need to prepare your 306 00:12:38,459 --> 00:12:41,100 data. So, you need to identify the 307 00:12:41,100 --> 00:12:43,139 features and the label in your data set, 308 00:12:43,139 --> 00:12:46,139 you need to pre-process, so you need to 309 00:12:46,139 --> 00:12:48,839 clean and transform the data as needed. 310 00:12:48,839 --> 00:12:51,120 Then, the second step, of course, is 311 00:12:51,120 --> 00:12:52,740 training the model. 312 00:12:52,740 --> 00:12:54,600 And for training the model, you need to 313 00:12:54,600 --> 00:12:57,060 split the data into two groups: a 314 00:12:57,060 --> 00:12:59,519 training and a validation set. 315 00:12:59,519 --> 00:13:01,320 Then you train a machine learning model 316 00:13:01,320 --> 00:13:03,540 using the training data set and you test 317 00:13:03,540 --> 00:13:05,040 the machine learning model for 318 00:13:05,040 --> 00:13:06,889 performance using the validation data 319 00:13:06,889 --> 00:13:08,100 set. 
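Splitting the data into a training set and a validation set, as described above, can be sketched in plain Python. This is an illustration of the idea only: the 70/30 ratio, the fixed seed, and the function name split_data are assumptions for the example, not settings taken from the session.

```python
import random


def split_data(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows, then cut it into training and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # stand-in for ten labeled records
train_rows, validation_rows = split_data(rows)
print(len(train_rows), len(validation_rows))
```

The model is fitted on train_rows only, so validation_rows remain unseen data against which its predictions can be checked.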
320 00:13:08,100 --> 00:13:12,180 The third step is performance evaluation, 321 00:13:12,180 --> 00:13:14,519 which means comparing how close the 322 00:13:14,519 --> 00:13:16,139 model's predictions are to the known 323 00:13:16,139 --> 00:13:20,519 labels, and this leads us to compute some 324 00:13:20,519 --> 00:13:23,279 evaluation performance metrics. 325 00:13:23,279 --> 00:13:25,740 And then finally... 326 00:13:25,740 --> 00:13:29,051 So, these three steps are not, 327 00:13:29,051 --> 00:13:32,770 um, not performed every time in a 328 00:13:32,770 --> 00:13:35,459 linear manner. It's more an iterative 329 00:13:35,459 --> 00:13:39,420 process. But once you obtain, you achieve 330 00:13:39,420 --> 00:13:42,959 a performance with which you are 331 00:13:42,959 --> 00:13:45,779 satisfied, you are ready to, let's say 332 00:13:45,779 --> 00:13:48,660 go into production, and you can deploy 333 00:13:48,660 --> 00:13:51,920 your trained model as a predictive service 334 00:13:51,920 --> 00:13:55,980 into a real-time, uh, to a real-time 335 00:13:55,980 --> 00:13:58,019 endpoint. And to do so, you need to 336 00:13:58,019 --> 00:14:00,240 convert the training pipeline into a 337 00:14:00,240 --> 00:14:02,820 real-time inference pipeline, and then 338 00:14:02,820 --> 00:14:04,260 you can deploy the model as an 339 00:14:04,260 --> 00:14:06,779 application on a server or device so 340 00:14:06,779 --> 00:14:11,420 that others can consume this model. 341 00:14:11,459 --> 00:14:14,279 So let's start with the first step, which 342 00:14:14,279 --> 00:14:17,700 is prepare data. Real-world data can contain 343 00:14:17,700 --> 00:14:19,920 many different issues that can affect 344 00:14:19,920 --> 00:14:22,320 the utility of the data and our 345 00:14:22,320 --> 00:14:24,959 interpretation of the results. So also 346 00:14:24,959 --> 00:14:26,579 the machine learning model that you 347 00:14:26,579 --> 00:14:29,279 train using this data. 
For example, real- 348 00:14:29,279 --> 00:14:31,440 world data can be affected by a bad 349 00:14:31,440 --> 00:14:34,079 recording or a bad measurement, and it 350 00:14:34,079 --> 00:14:36,480 can also contain missing values for some 351 00:14:36,480 --> 00:14:38,880 parameters. And Azure Machine Learning 352 00:14:38,880 --> 00:14:40,860 designer has several pre-built 353 00:14:40,860 --> 00:14:43,019 components that can be used to prepare 354 00:14:43,019 --> 00:14:46,079 data for training. These components 355 00:14:46,079 --> 00:14:48,300 enable you to clean data, normalize 356 00:14:48,300 --> 00:14:52,940 features, join tables, and more. 357 00:14:53,000 --> 00:14:57,120 Let's come to training. So, to train a 358 00:14:57,120 --> 00:14:59,220 classification model you need a data set 359 00:14:59,220 --> 00:15:02,160 that includes historical features, so the 360 00:15:02,160 --> 00:15:03,899 characteristics of the entity for which 361 00:15:03,899 --> 00:15:06,899 you want to make a prediction, and known label 362 00:15:06,899 --> 00:15:09,779 values. The label is the class indicator 363 00:15:09,779 --> 00:15:11,820 we want to train a model to predict. 364 00:15:11,820 --> 00:15:13,920 And it's common practice to train a 365 00:15:13,920 --> 00:15:16,199 model using a subset of the data while 366 00:15:16,199 --> 00:15:18,300 holding back some data with which to 367 00:15:18,300 --> 00:15:20,760 test the trained model. And this enables 368 00:15:20,760 --> 00:15:22,440 you to compare the labels that the model 369 00:15:22,440 --> 00:15:25,380 predicts with the actual known labels in 370 00:15:25,380 --> 00:15:27,420 the original data set. 371 00:15:27,420 --> 00:15:29,880 This operation can be performed in the 372 00:15:29,880 --> 00:15:32,100 designer using the Split Data component 373 00:15:32,100 --> 00:15:34,740 as shown by the screenshot here in the... 374 00:15:34,740 --> 00:15:36,660 in the deck. 
375 00:15:36,660 --> 00:15:39,540 There's also another component that you 376 00:15:39,540 --> 00:15:40,980 should use, which is the Score Model 377 00:15:40,980 --> 00:15:43,139 component to generate the predicted 378 00:15:43,139 --> 00:15:45,360 class label value using the validation 379 00:15:45,360 --> 00:15:48,060 data as input. So once you connect all 380 00:15:48,060 --> 00:15:49,800 these components, 381 00:15:49,800 --> 00:15:52,440 the component specifying the 382 00:15:52,440 --> 00:15:54,959 model we are going to use, the Split Data 383 00:15:54,959 --> 00:15:57,060 component, the Train Model component, 384 00:15:57,060 --> 00:16:00,300 and the Score Model component, you want 385 00:16:00,300 --> 00:16:02,639 to run a new experiment in 386 00:16:02,639 --> 00:16:05,760 Azure ML, which will use the data set 387 00:16:05,760 --> 00:16:09,600 on the canvas to train and score a model. 388 00:16:09,600 --> 00:16:12,000 After training a model, it is important, 389 00:16:12,000 --> 00:16:14,639 as we said, to evaluate its performance, to 390 00:16:14,639 --> 00:16:17,060 understand how bad-how good sorry 391 00:16:17,060 --> 00:16:20,760 our model is performing. 392 00:16:20,760 --> 00:16:22,680 And there are many performance metrics 393 00:16:22,680 --> 00:16:24,600 and methodologies for evaluating how 394 00:16:24,600 --> 00:16:27,000 well a model makes predictions. The 395 00:16:27,000 --> 00:16:29,160 component to use to perform evaluation 396 00:16:29,160 --> 00:16:32,220 in Azure ML designer is called, as 397 00:16:32,220 --> 00:16:35,060 intuitive as it is, Evaluate Model. 398 00:16:35,060 --> 00:16:38,339 Once the job of training and evaluation 399 00:16:38,339 --> 00:16:40,740 of the model is completed, you can review 400 00:16:40,740 --> 00:16:42,959 evaluation metrics on the completed job 401 00:16:42,959 --> 00:16:45,860 page by right-clicking on the component. 
402 00:16:45,860 --> 00:16:48,480 In the evaluation results, you can also 403 00:16:48,480 --> 00:16:51,000 find the so-called confusion matrix that 404 00:16:51,000 --> 00:16:53,399 you can see here in the right side of 405 00:16:53,399 --> 00:16:55,079 this deck. 406 00:16:55,079 --> 00:16:57,420 A confusion matrix shows cases where 407 00:16:57,420 --> 00:16:59,220 both the predicted and actual values 408 00:16:59,220 --> 00:17:01,980 were one, the so-called true positives 409 00:17:01,980 --> 00:17:04,500 at the top left and also cases where 410 00:17:04,500 --> 00:17:06,600 both the predicted and the actual values 411 00:17:06,600 --> 00:17:08,459 were zero, the so-called true negatives 412 00:17:08,459 --> 00:17:10,919 at the bottom right. While the other 413 00:17:10,919 --> 00:17:13,679 cells show cases where the predicted 414 00:17:13,679 --> 00:17:15,380 and actual values differ, 415 00:17:15,380 --> 00:17:17,939 called false positives and false 416 00:17:17,939 --> 00:17:19,919 negatives, and this is an example of a 417 00:17:19,919 --> 00:17:23,579 confusion matrix for a binary classifier. 418 00:17:23,579 --> 00:17:25,559 While for a multi-class classification 419 00:17:25,559 --> 00:17:28,079 model the same approach is used to 420 00:17:28,079 --> 00:17:30,120 tabulate each possible combination of 421 00:17:30,120 --> 00:17:32,940 actual and predicted value counts. So 422 00:17:32,940 --> 00:17:34,740 for example, a model with three possible 423 00:17:34,740 --> 00:17:37,559 classes would result in a three-by-three 424 00:17:37,559 --> 00:17:39,120 matrix. 425 00:17:39,120 --> 00:17:41,880 The confusion matrix is also useful for 426 00:17:41,880 --> 00:17:43,860 the metrics that can be derived from it, 427 00:17:43,860 --> 00:17:48,260 like accuracy, recall, or precision. 
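How accuracy, precision, and recall follow from the four cells of a binary confusion matrix can be sketched in plain Python. The actual and predicted label lists below are made-up example data, and the helper names are hypothetical; the Evaluate Model component reports these metrics for you.

```python
def confusion_counts(actual, predicted):
    """Count the four cells of a binary confusion matrix (labels are 0 or 1)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Derive accuracy, precision, and recall from the four counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Made-up labels for ten validation records
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(actual, predicted)
accuracy, precision, recall = metrics(*counts)
print(counts, accuracy, precision, recall)
```

Accuracy is the share of all predictions that were correct, precision is the share of predicted positives that were really positive, and recall is the share of real positives that the model found.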
428 00:17:49,320 --> 00:17:52,080 We said that the last step is 429 00:17:52,080 --> 00:17:55,620 deploying the trained model to a real-time 430 00:17:55,620 --> 00:17:59,280 endpoint as a predictive service. And in 431 00:17:59,280 --> 00:18:00,900 order to automate your model into a 432 00:18:00,900 --> 00:18:02,760 service that makes continuous 433 00:18:02,760 --> 00:18:04,980 predictions, you need, first of all, to 434 00:18:04,980 --> 00:18:08,039 create and then deploy an 435 00:18:08,039 --> 00:18:10,080 inference pipeline. The process of 436 00:18:10,080 --> 00:18:11,940 converting the training pipeline into a 437 00:18:11,940 --> 00:18:13,980 real-time inference pipeline removes 438 00:18:13,980 --> 00:18:16,260 training components and adds web service 439 00:18:16,260 --> 00:18:18,960 inputs and outputs to handle requests. 440 00:18:18,960 --> 00:18:21,240 And the inference pipeline performs the 441 00:18:21,240 --> 00:18:22,679 same data transformations as the 442 00:18:22,679 --> 00:18:26,160 first pipeline, but for new data. Then it 443 00:18:26,160 --> 00:18:28,679 uses the trained model to infer or predict 444 00:18:28,679 --> 00:18:32,539 label values based on its features. 445 00:18:32,820 --> 00:18:36,120 So, I think I've talked a lot for now. 446 00:18:36,120 --> 00:18:40,380 I would like to let John show us 447 00:18:40,380 --> 00:18:44,340 something in practice with 448 00:18:44,340 --> 00:18:47,280 the hands-on demo, so please, John, go 449 00:18:47,280 --> 00:18:49,860 ahead, share your screen and guide us 450 00:18:49,860 --> 00:18:52,380 through this demo of creating a 451 00:18:52,380 --> 00:18:53,425 classification model with 452 00:18:53,425 --> 00:18:55,860 the Azure Machine Learning designer. 453 00:18:55,860 --> 00:18:58,509 [JOHN]: Thank you so much Carlotta for 454 00:18:58,509 --> 00:19:00,690 this interesting explanation of the 455 00:19:00,690 --> 00:19:03,810 Azure ML designer. 
And now, 456 00:19:03,810 --> 00:19:07,500 um, I'm going to start with you in the 457 00:19:07,500 --> 00:19:10,200 practical demo part, so if you want to 458 00:19:10,200 --> 00:19:13,320 follow along, go to the link that Carlotta 459 00:19:13,320 --> 00:19:18,380 sent in the chat so you can do 460 00:19:18,380 --> 00:19:21,840 the demo or the practical part with me. 461 00:19:21,840 --> 00:19:25,260 I'm just going to share my screen... 462 00:19:25,260 --> 00:19:27,140 and... 463 00:19:27,140 --> 00:19:31,559 ...go here. So, uh... 464 00:19:31,559 --> 00:19:34,320 Where am I right now? I'm inside the 465 00:19:34,320 --> 00:19:36,960 Microsoft Learn documentation. This is 466 00:19:36,960 --> 00:19:40,260 the exercise part of this module, and we 467 00:19:40,260 --> 00:19:43,080 will start by setting two things, which 468 00:19:43,080 --> 00:19:45,299 are prerequisites for us to work inside 469 00:19:45,299 --> 00:19:49,919 this module, which are the resource group 470 00:19:49,919 --> 00:19:52,400 and the Azure Machine Learning workspace, 471 00:19:52,400 --> 00:19:55,620 and something extra which is the compute 472 00:19:55,620 --> 00:19:59,760 cluster that Carlotta talked about. So I 473 00:19:59,760 --> 00:20:02,100 just want to make sure that you all have 474 00:20:02,100 --> 00:20:05,660 a resource group created inside your 475 00:20:05,660 --> 00:20:08,039 portal inside your Microsoft Azure 476 00:20:08,039 --> 00:20:11,100 platform. So this is my resource group. 477 00:20:11,100 --> 00:20:14,640 Inside this resource group, I 478 00:20:14,640 --> 00:20:17,299 have created an Azure Machine Learning 479 00:20:17,299 --> 00:20:21,539 workspace. So I'm just going to access 480 00:20:21,539 --> 00:20:24,000 the workspace that I have created 481 00:20:24,000 --> 00:20:27,000 already from this link. I am going to 482 00:20:27,000 --> 00:20:30,240 open it, which is the studio web URL, and 483 00:20:30,240 --> 00:20:33,000 I will follow the steps. 
So what is this? 484 00:20:33,000 --> 00:20:35,760 This is your machine learning workspace, 485 00:20:35,760 --> 00:20:38,220 or machine learning studio. You can do a 486 00:20:38,220 --> 00:20:40,080 lot of things here, but we are going to 487 00:20:40,080 --> 00:20:42,419 focus mainly on the designer and the 488 00:20:42,419 --> 00:20:46,080 data and the compute. So another 489 00:20:46,080 --> 00:20:49,140 prerequisite here, as Carlotta told you, 490 00:20:49,140 --> 00:20:51,480 we need some resources to power up the 491 00:20:51,480 --> 00:20:54,299 classification, the processes that 492 00:20:54,299 --> 00:20:55,140 will happen. 493 00:20:55,140 --> 00:20:58,080 So, we have created this computing 494 00:20:58,080 --> 00:20:59,100 cluster, 495 00:20:59,100 --> 00:21:02,880 and we have set some presets for 496 00:21:02,880 --> 00:21:04,140 it. So 497 00:21:04,140 --> 00:21:07,080 where can you find this preset? You go 498 00:21:07,080 --> 00:21:10,200 here. Under the create compute, you'll 499 00:21:10,200 --> 00:21:13,220 find everything that you need to do. So 500 00:21:13,220 --> 00:21:16,740 the size is the Standard DS11 Version 2, 501 00:21:16,740 --> 00:21:19,799 and it's a CPU not GPU, because we don't 502 00:21:19,799 --> 00:21:22,500 know the GPU, and we don't need a GPU. 503 00:21:22,500 --> 00:21:25,799 Uh, it is ready for us to use. 504 00:21:25,799 --> 00:21:30,900 The next thing which we will look into 505 00:21:30,900 --> 00:21:33,600 is the designer. How can you access the 506 00:21:33,600 --> 00:21:35,100 designer? 507 00:21:35,100 --> 00:21:37,679 You can either click on this icon or 508 00:21:37,679 --> 00:21:40,020 click on the navigation menu and click 509 00:21:40,020 --> 00:21:42,299 on the designer for me. 510 00:21:42,900 --> 00:21:45,780 Now I am inside my designer. 511 00:21:45,780 --> 00:21:47,640 What we are going to do now is the 512 00:21:47,640 --> 00:21:50,280 pipeline that Carlotta told you about. 
513 00:21:50,280 --> 00:21:54,360 And from where can I know these steps? If 514 00:21:54,360 --> 00:21:57,120 you follow along in the learn module, you 515 00:21:57,120 --> 00:21:58,740 will find everything that I'm doing 516 00:21:58,740 --> 00:22:02,340 right now in detail, with screenshots 517 00:22:02,340 --> 00:22:05,820 of course. So I'm going to create a new 518 00:22:05,820 --> 00:22:09,120 pipeline, and I can do so by clicking on 519 00:22:09,120 --> 00:22:10,980 this plus button. 520 00:22:10,980 --> 00:22:13,740 It's going to redirect me to the 521 00:22:13,740 --> 00:22:17,100 designer authoring the pipeline, uh, where 522 00:22:17,100 --> 00:22:19,500 I can drag and drop data and components 523 00:22:19,500 --> 00:22:21,780 that Carlotta told you the difference 524 00:22:21,780 --> 00:22:22,980 between. 525 00:22:22,980 --> 00:22:26,340 And here I am going to do some changes 526 00:22:26,340 --> 00:22:29,100 to the settings. I am going to connect 527 00:22:29,100 --> 00:22:31,860 this with my compute cluster that I 528 00:22:31,860 --> 00:22:35,120 created previously so I can utilize it. 529 00:22:35,120 --> 00:22:38,100 From here I'm going to choose this 530 00:22:38,100 --> 00:22:40,380 compute cluster demo that I have shown 531 00:22:40,380 --> 00:22:42,600 you before in the clusters here, 532 00:22:42,600 --> 00:22:45,900 and I am going to change the name to 533 00:22:45,900 --> 00:22:47,820 something more meaningful. Instead of 534 00:22:47,820 --> 00:22:50,580 "Pipeline" and the date of today I'm going 535 00:22:50,580 --> 00:22:53,760 to name it Diabetes... 536 00:22:53,760 --> 00:22:56,120 uh... 537 00:22:56,120 --> 00:23:00,020 let's just check this training. 538 00:23:00,020 --> 00:23:05,100 Let's say Training 0.1 or 01, okay?
539 00:23:05,100 --> 00:23:09,360 And I am going to close this tab in 540 00:23:09,360 --> 00:23:12,000 order to have a bigger place to work 541 00:23:12,000 --> 00:23:14,700 inside because this is where we will 542 00:23:14,700 --> 00:23:17,220 work, where everything will happen. So I 543 00:23:17,220 --> 00:23:19,559 will click on close from here, 544 00:23:19,559 --> 00:23:23,460 and I will go to the data and I will 545 00:23:23,460 --> 00:23:25,620 create a new data set. 546 00:23:25,620 --> 00:23:27,900 How can I create a new data set? There is 547 00:23:27,900 --> 00:23:29,880 multiple options here you can find, from 548 00:23:29,880 --> 00:23:31,799 local files, from data store, from web 549 00:23:31,799 --> 00:23:34,020 files, from open data set, but I'm going 550 00:23:34,020 --> 00:23:36,539 to choose from web files, as this is the 551 00:23:36,539 --> 00:23:40,280 way we're going to create our data. 552 00:23:40,280 --> 00:23:43,380 From here, the information of my data set 553 00:23:43,380 --> 00:23:47,340 I'm going to get them from the Microsoft 554 00:23:47,340 --> 00:23:50,820 Learn module. So if we go to the step 555 00:23:50,820 --> 00:23:52,860 that says "Create a dataset", 556 00:23:52,860 --> 00:23:55,020 under it, it illustrates that you can 557 00:23:55,020 --> 00:23:57,720 access the data from inside the asset 558 00:23:57,720 --> 00:23:59,760 library, and inside your asset library, 559 00:23:59,760 --> 00:24:01,679 you'll find the data and find the 560 00:24:01,679 --> 00:24:05,539 component. And I'm going to select 561 00:24:05,539 --> 00:24:09,000 this link because this is where my data 562 00:24:09,000 --> 00:24:12,000 is stored. If you open this link, you will 563 00:24:12,000 --> 00:24:14,820 find this is a CSV file, I think. 564 00:24:14,820 --> 00:24:17,400 Yeah. And you can...like, all the data are 565 00:24:17,400 --> 00:24:18,360 here. 566 00:24:18,360 --> 00:24:21,079 Now let's get back.. 567 00:24:21,079 --> 00:24:22,149 Um... 
568 00:24:26,880 --> 00:24:28,200 And you are going to name it something 569 00:24:28,200 --> 00:24:29,880 meaningful, but because I have already 570 00:24:29,880 --> 00:24:31,820 created it before twice, I'm gonna 571 00:24:31,820 --> 00:24:34,980 add a number to the name. 572 00:24:34,980 --> 00:24:37,559 The data set type is either tabular or 573 00:24:37,559 --> 00:24:39,360 file, but this is a table, so we're 574 00:24:39,360 --> 00:24:40,760 going to choose the tabular 575 00:24:40,760 --> 00:24:42,240 data type 576 00:24:42,240 --> 00:24:43,740 for the data set type. 577 00:24:43,740 --> 00:24:46,260 Now we will click on "Next". That's gonna 578 00:24:46,260 --> 00:24:51,179 review, or display for you the content 579 00:24:51,179 --> 00:24:54,020 of this file that you have 580 00:24:54,020 --> 00:24:57,419 imported to this workspace. 581 00:24:57,419 --> 00:25:01,559 And for these settings, these are 582 00:25:01,559 --> 00:25:03,720 related to our file format. 583 00:25:03,720 --> 00:25:08,280 So this is a delimited file, and it's not 584 00:25:08,280 --> 00:25:11,400 plain text, it's not JSON. The delimiter 585 00:25:11,400 --> 00:25:14,159 is comma, as we have seen that they 586 00:25:14,159 --> 00:25:26,700 [INDISTINGUISHABLE] 587 00:25:26,700 --> 00:25:29,039 So I'm choosing comma 588 00:25:29,039 --> 00:25:32,900 errors because the only the first five... 589 00:25:32,900 --> 00:25:34,880 [INDISTINGUISHABLE] 590 00:25:34,880 --> 00:25:38,159 ...for example. Okay, uh, if you have any 591 00:25:38,159 --> 00:25:39,960 doubts, if you have any problems, please 592 00:25:39,960 --> 00:25:42,960 don't hesitate to write to me 593 00:25:42,960 --> 00:25:45,020 in the chat, 594 00:25:45,020 --> 00:25:48,480 like, what is blocking you, and 595 00:25:48,480 --> 00:25:50,940 me and Carlotta will try to help you, 596 00:25:50,940 --> 00:25:53,220 like whenever possible.
597 00:25:53,220 --> 00:25:55,659 And now this is the new preview for my 598 00:25:55,659 --> 00:25:57,840 data set. I can see that I have an ID, I 599 00:25:57,840 --> 00:25:59,700 have patient ID, I have pregnancies, I 600 00:25:59,700 --> 00:26:02,220 have the age of the people, 601 00:26:02,220 --> 00:26:05,720 I have the body mass, and I think 602 00:26:05,720 --> 00:26:08,460 whether they have diabetes or not, as a 603 00:26:08,460 --> 00:26:10,679 zero and one. Zero indicates a negative, 604 00:26:10,679 --> 00:26:14,159 the person doesn't have diabetes, and one 605 00:26:14,159 --> 00:26:16,080 indicates a positive, that this person 606 00:26:16,080 --> 00:26:18,299 has diabetes. Okay. 607 00:26:18,299 --> 00:26:20,520 Now I'm going to click on "Next". Here I am 608 00:26:20,520 --> 00:26:23,400 defining my schema. All the data types 609 00:26:23,400 --> 00:26:25,380 inside my columns, the column names, which 610 00:26:25,380 --> 00:26:28,760 columns to include, which to exclude. And 611 00:26:28,760 --> 00:26:31,500 here we will include everything except 612 00:26:31,500 --> 00:26:35,580 the Path column. And we are 613 00:26:35,580 --> 00:26:37,860 going to review the data types of each 614 00:26:37,860 --> 00:26:40,440 column. So let's review this first one. 615 00:26:40,440 --> 00:26:43,320 This is numbers, numbers, numbers, then it's the 616 00:26:43,320 --> 00:26:45,779 integer. And this is, 617 00:26:45,779 --> 00:26:48,679 um, like decimal.. 618 00:26:48,679 --> 00:26:50,900 ...dotted... 619 00:26:50,900 --> 00:26:53,580 decimal number. So we are going to choose 620 00:26:53,580 --> 00:26:55,020 this data type. 621 00:26:55,020 --> 00:26:57,200 And for this one 622 00:26:57,200 --> 00:27:01,200 it says diabetic, and it's a zero and 623 00:27:01,200 --> 00:27:02,460 one, and we are going to make it as 624 00:27:02,460 --> 00:27:04,460 integers.
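[NOTE]: The schema step above (a comma-delimited file with headers, integer columns, a decimal column, and a 0/1 label) can be mimicked outside the designer. The following is only a minimal sketch using Python's standard library, not what the designer runs internally. The column names are assumptions based on the preview John describes, and the three data rows are invented for illustration.

```python
import csv
import io

# Invented sample rows; only the general shape (comma-delimited, header row)
# follows the file shown in the demo.
SAMPLE_CSV = """PatientID,Pregnancies,Age,BMI,Diabetic
1001,0,21,43.5,0
1002,8,23,21.2,0
1003,7,43,41.5,1
"""

# Assumed schema: column name -> Python type used to parse the text value.
SCHEMA = {
    "PatientID": int,
    "Pregnancies": int,
    "Age": int,
    "BMI": float,    # the "dotted" decimal column
    "Diabetic": int, # label: 0 = not diabetic, 1 = diabetic
}


def load_rows(text, schema, delimiter=","):
    """Parse delimited text with a header row and cast each column per the schema."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [{name: cast(row[name]) for name, cast in schema.items()} for row in reader]


rows = load_rows(SAMPLE_CSV, SCHEMA)
print(rows[0])
```

Columns that are not listed in the schema are simply dropped, which loosely mirrors choosing which columns to include or exclude in the dialog.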
625 00:27:04,460 --> 00:27:07,980 Now we are going to click on "Next" and 626 00:27:07,980 --> 00:27:09,780 move to reviewing everything. This is 627 00:27:09,780 --> 00:27:11,569 everything that we have defined together. 628 00:27:11,569 --> 00:27:13,500 I will click on "Create". 629 00:27:13,500 --> 00:27:15,179 And... 630 00:27:15,179 --> 00:27:17,940 now the first step has ended. We have 631 00:27:17,940 --> 00:27:19,919 gotten our data ready. 632 00:27:19,919 --> 00:27:22,440 Now...what now? We're going to utilize the 633 00:27:22,440 --> 00:27:23,468 designer... 634 00:27:23,468 --> 00:27:26,820 um...power. We're going to drag and drop 635 00:27:26,820 --> 00:27:29,820 our data set to create the pipeline. 636 00:27:29,820 --> 00:27:33,179 So I have clicked on it and dragged it 637 00:27:33,179 --> 00:27:35,640 to this space. It's gonna appear to you. 638 00:27:35,640 --> 00:27:39,659 And we can inspect it by right clicking and 639 00:27:39,659 --> 00:27:42,179 choosing "Preview data" 640 00:27:42,179 --> 00:27:46,200 to see what we have created together. 641 00:27:46,200 --> 00:27:48,900 From here, you can see everything that we 642 00:27:48,900 --> 00:27:50,700 have seen previously, but in more 643 00:27:50,700 --> 00:27:53,100 detail. And we are just going to close 644 00:27:53,100 --> 00:27:56,580 this. Now what? Now we are gonna do the 645 00:27:56,580 --> 00:28:00,799 processing that Carlotta mentioned. 646 00:28:00,799 --> 00:28:03,659 These are some instructions about the 647 00:28:03,659 --> 00:28:05,460 data, about how you can look at them, how you 648 00:28:05,460 --> 00:28:07,140 can open them but we are going to move 649 00:28:07,140 --> 00:28:09,720 to the transformation or the processing.
650 00:28:09,720 --> 00:28:13,500 So as Carlotta told you, like any data 651 00:28:13,500 --> 00:28:15,480 for us to work on we have to do some 652 00:28:15,480 --> 00:28:17,299 processing to it 653 00:28:17,299 --> 00:28:20,159 to make it easier for the model to 654 00:28:20,159 --> 00:28:23,279 be trained and easier to work with. So, uh, 655 00:28:23,279 --> 00:28:25,860 we're gonna do the normalization. And 656 00:28:25,860 --> 00:28:29,159 normalization means, uh, 657 00:28:29,159 --> 00:28:33,539 to scale our data, either down or up, but 658 00:28:33,539 --> 00:28:35,400 we're going to scale them down, 659 00:28:35,400 --> 00:28:38,820 and we are going to decrease, uh, 660 00:28:38,820 --> 00:28:40,799 relatively decrease 661 00:28:40,799 --> 00:28:44,640 the values, all the values, to work 662 00:28:44,640 --> 00:28:48,120 with lower numbers. And if we are working 663 00:28:48,120 --> 00:28:49,559 with larger numbers, it's going to take 664 00:28:49,559 --> 00:28:52,500 more time. If we're working with smaller 665 00:28:52,500 --> 00:28:54,779 numbers, it's going to take less time to 666 00:28:54,779 --> 00:28:59,159 calculate them, and that's it. So 667 00:28:59,159 --> 00:29:02,159 where can I find the Normalize Data component? I 668 00:29:02,159 --> 00:29:04,260 can find it inside my components. 669 00:29:04,260 --> 00:29:06,720 So I will choose the components and 670 00:29:06,720 --> 00:29:09,659 search for "Normalize Data". 671 00:29:09,659 --> 00:29:12,360 I will drag and drop it as usual and I 672 00:29:12,360 --> 00:29:14,820 will connect between these two things 673 00:29:14,820 --> 00:29:18,360 by clicking on this spot, this, uh, 674 00:29:18,360 --> 00:29:20,159 circuit, and 675 00:29:20,159 --> 00:29:23,159 drag and drop onto the next circuit. 676 00:29:23,159 --> 00:29:24,899 Now we are going to define our 677 00:29:24,899 --> 00:29:27,419 normalization method.
678 00:29:27,419 --> 00:29:31,080 So I'm going to double click on the 679 00:29:31,080 --> 00:29:32,640 Normalize Data component. 680 00:29:32,640 --> 00:29:34,860 It's going to open the settings for the 681 00:29:34,860 --> 00:29:36,480 normalization, 682 00:29:36,480 --> 00:29:38,820 such as the transformation method, which is 683 00:29:38,820 --> 00:29:40,500 the mathematical way 684 00:29:40,500 --> 00:29:42,299 that our data is going to be scaled 685 00:29:42,299 --> 00:29:44,520 according to. 686 00:29:44,520 --> 00:29:47,760 We're going to choose min-max, and for 687 00:29:47,760 --> 00:29:51,539 this one, "Use 0 688 00:29:51,539 --> 00:29:53,100 for constant columns", we are going to 689 00:29:53,100 --> 00:29:54,480 choose "True", 690 00:29:54,480 --> 00:29:56,880 and we are going to define which columns 691 00:29:56,880 --> 00:29:58,860 to normalize. So we are not going to 692 00:29:58,860 --> 00:30:01,080 normalize the whole data set. We are 693 00:30:01,080 --> 00:30:02,760 going to choose a subset from the data 694 00:30:02,760 --> 00:30:04,559 set to normalize. So we're going to 695 00:30:04,559 --> 00:30:07,020 choose everything except for the patient 696 00:30:07,020 --> 00:30:09,000 ID and the diabetic, because the patient 697 00:30:09,000 --> 00:30:10,919 ID is a number, but it's categorical 698 00:30:10,919 --> 00:30:13,740 data. It describes a patient, it's not a 699 00:30:13,740 --> 00:30:17,460 number that I can sum. I can't say "patient 700 00:30:17,460 --> 00:30:20,159 ID number one plus patient ID number two". 701 00:30:20,159 --> 00:30:21,720 No, this is a patient and another 702 00:30:21,720 --> 00:30:23,399 patient, it's not a number that I can do 703 00:30:23,399 --> 00:30:25,740 mathematical operations on, so I'm not 704 00:30:25,740 --> 00:30:28,200 going to choose it. So we will choose 705 00:30:28,200 --> 00:30:30,539 everything as I said, except for the 706 00:30:30,539 --> 00:30:33,480 diabetic and the patient ID.
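[NOTE]: To make the min-max idea concrete, here is a minimal sketch in plain Python. It is an illustration of the general technique, not the code behind the designer component. Each selected column is rescaled with (value - min) / (max - min), a constant column is set to 0, and the identifier and label columns are left untouched. The column names and the three patient records are made up.

```python
def min_max_normalize(rows, columns):
    """Return new rows with the given columns rescaled to the range [0, 1]."""
    # First pass: find the minimum and maximum of every selected column.
    ranges = {}
    for col in columns:
        values = [row[col] for row in rows]
        ranges[col] = (min(values), max(values))

    # Second pass: rescale, leaving all other columns as they are.
    normalized = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            lo, hi = ranges[col]
            # A constant column has hi == lo; use 0.0 instead of dividing by zero.
            new_row[col] = 0.0 if hi == lo else (row[col] - lo) / (hi - lo)
        normalized.append(new_row)
    return normalized


# Invented records. PatientID is an identifier and Diabetic is the label,
# so neither is passed in the list of columns to normalize.
patients = [
    {"PatientID": 1001, "Age": 20, "BMI": 20.0, "Diabetic": 0},
    {"PatientID": 1002, "Age": 40, "BMI": 30.0, "Diabetic": 1},
    {"PatientID": 1003, "Age": 60, "BMI": 40.0, "Diabetic": 1},
]
scaled = min_max_normalize(patients, ["Age", "BMI"])
for row in scaled:
    print(row)
```

After scaling, the smallest value in each selected column becomes 0 and the largest becomes 1, which matches what John later sees in the preview of the normalized output.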
I will 707 00:30:33,480 --> 00:30:34,860 click on "Save". 708 00:30:34,860 --> 00:30:37,740 And it's not showing me a warning again, 709 00:30:37,740 --> 00:30:39,480 everything is good. 710 00:30:39,480 --> 00:30:41,880 Now I can click on "Submit" 711 00:30:41,880 --> 00:30:46,799 and review my normalization output. 712 00:30:46,799 --> 00:30:48,240 Um. 713 00:30:48,240 --> 00:30:51,659 So, if you click on "Submit" here, 714 00:30:51,659 --> 00:30:54,659 you will choose "Create new" and 715 00:30:54,659 --> 00:30:56,460 set the name that is mentioned here 716 00:30:56,460 --> 00:30:59,899 inside the notebook. So it tells you 717 00:30:59,899 --> 00:31:03,419 to create a job and name it, name 718 00:31:03,419 --> 00:31:05,460 the experiment "MS Learn Diabetes 719 00:31:05,460 --> 00:31:06,720 Training", because you will continue 720 00:31:06,720 --> 00:31:10,160 working on and building component later. 721 00:31:10,160 --> 00:31:13,020 I have it already created, I am the, uh, 722 00:31:13,020 --> 00:31:16,919 we can review it together. So let 723 00:31:16,919 --> 00:31:19,860 me just open this in another tab. I think 724 00:31:19,860 --> 00:31:21,000 I have it... 725 00:31:21,000 --> 00:31:23,659 here. 726 00:31:25,679 --> 00:31:28,220 Okay. 727 00:31:30,720 --> 00:31:34,740 So, these are all the jobs that I have 728 00:31:34,740 --> 00:31:37,340 created. 729 00:31:37,860 --> 00:31:40,119 All the jobs there. Let's do this over. 730 00:31:40,119 --> 00:31:42,059 These are all the jobs that I have 731 00:31:42,059 --> 00:31:43,679 submitted previously. 732 00:31:43,679 --> 00:31:45,840 And I think this one is the 733 00:31:45,840 --> 00:31:48,360 normalization job, so let's see the 734 00:31:48,360 --> 00:31:50,100 output of it. 735 00:31:50,100 --> 00:31:54,120 As you can see, it says, uh, "Check mark", yes, 736 00:31:54,120 --> 00:31:56,640 which means that it worked, and we can 737 00:31:56,640 --> 00:31:59,399 preview it. How can I do that? 
Right click 738 00:31:59,399 --> 00:32:02,539 on it, choose "Preview data", 739 00:32:02,539 --> 00:32:06,659 and as you can see all the data are 740 00:32:06,659 --> 00:32:08,399 scaled down 741 00:32:08,399 --> 00:32:10,980 so everything is between zero 742 00:32:10,980 --> 00:32:15,860 and, uh, one I think. 743 00:32:15,860 --> 00:32:18,899 So everything is good for us. Now we 744 00:32:18,899 --> 00:32:21,840 can move forward to the next step 745 00:32:21,840 --> 00:32:26,939 which is to create the whole pipeline. 746 00:32:26,939 --> 00:32:30,840 So, uh, Carlotta told you that 747 00:32:30,840 --> 00:32:33,179 we're going to use a classification 748 00:32:33,179 --> 00:32:37,260 model to create this data set, so let 749 00:32:37,260 --> 00:32:40,620 me just drag and drop everything 750 00:32:40,620 --> 00:32:43,140 to get runtime and we're doing 751 00:32:43,140 --> 00:32:46,489 [INDISTINGUISHABLE] 752 00:32:46,489 --> 00:32:48,469 about everything by 753 00:32:48,469 --> 00:32:51,419 [INDISTINGUISHABLE] 754 00:32:51,419 --> 00:32:52,919 So, 755 00:32:52,919 --> 00:32:55,593 as a result, we are going to explain 756 00:32:55,593 --> 00:32:59,760 [INDISTINGUISHABLE] 757 00:32:59,760 --> 00:33:03,600 Yeah. So, I'm going to get the Split 758 00:33:03,600 --> 00:33:06,070 Data component. I'm going to take the 759 00:33:06,070 --> 00:33:08,880 transformed data to Split Data and 760 00:33:08,880 --> 00:33:10,380 connect it like that. 761 00:33:10,380 --> 00:33:12,299 I'm going to get the Train Model 762 00:33:12,299 --> 00:33:15,240 component because I want to train my 763 00:33:15,240 --> 00:33:16,679 model, 764 00:33:16,679 --> 00:33:19,740 and I'm going to put it right here. 765 00:33:19,740 --> 00:33:21,740 Okay. 766 00:33:21,740 --> 00:33:24,419 Let's just move it down there. Okay.
767 00:33:24,419 --> 00:33:27,059 And we are going to use a classification 768 00:33:27,059 --> 00:33:28,620 model, 769 00:33:28,620 --> 00:33:31,880 a two class 770 00:33:32,240 --> 00:33:35,399 logistic regression model. 771 00:33:35,399 --> 00:33:38,159 So I'm going to give this algorithm to 772 00:33:38,159 --> 00:33:41,480 enable my model to work 773 00:33:41,820 --> 00:33:45,960 This is the untrained model, this is... 774 00:33:45,960 --> 00:33:48,059 here. 775 00:33:48,059 --> 00:33:51,120 The left... 776 00:33:51,120 --> 00:33:52,860 the left, uh, circuit, I'm going to 777 00:33:52,860 --> 00:33:54,819 connect it to the data set, and the right 778 00:33:54,819 --> 00:33:56,940 one, we are going to connect it to 779 00:33:56,940 --> 00:33:59,700 evaluate model. 780 00:33:59,700 --> 00:34:02,640 Evaluate model...so let's search for 781 00:34:02,640 --> 00:34:05,220 "Evaluate model" here. 782 00:34:05,220 --> 00:34:07,440 So because we want to do what...we want to 783 00:34:07,440 --> 00:34:10,800 evaluate our model and see how it it has 784 00:34:10,800 --> 00:34:13,790 been doing. Is it good, is it bad? 785 00:34:13,790 --> 00:34:18,200 Um, sorry... 786 00:34:19,980 --> 00:34:22,820 This is... 787 00:34:23,460 --> 00:34:25,560 this is down there 788 00:34:25,560 --> 00:34:28,139 after the score model. 789 00:34:28,139 --> 00:34:31,320 So we have to get the score model first, 790 00:34:31,320 --> 00:34:33,960 so let's get it. 791 00:34:33,960 --> 00:34:36,119 And this will take the trained model and 792 00:34:36,119 --> 00:34:37,260 the data set 793 00:34:37,260 --> 00:34:39,419 to score our model and see if it's 794 00:34:39,419 --> 00:34:42,179 performing good or bad. 795 00:34:42,179 --> 00:34:44,409 And... 796 00:34:44,409 --> 00:34:47,159 um... 797 00:34:47,159 --> 00:34:49,080 after that, we have finished 798 00:34:49,080 --> 00:34:51,920 everything. Now, we are going to do the what? 799 00:34:52,139 --> 00:34:54,359 The presets for everything. 
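[NOTE]: For readers curious about what a two-class logistic regression does, the sketch below trains one with plain gradient descent on a toy data set. This is an illustration of the general algorithm under simple assumptions (one already-normalized feature, invented labels, an arbitrary learning rate and epoch count). It is not the implementation used by the designer component.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, bias, x):
    """Probability that the row x belongs to class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(features, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_rows = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = predict_probability(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        # Step against the average gradient.
        weights = [w - learning_rate * g / n_rows for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_rows
    return weights, bias


# Toy data: low feature values are labelled 0, high values are labelled 1.
X = [[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]]
y = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic_regression(X, y)
print(predict_probability(weights, bias, [0.1]))
print(predict_probability(weights, bias, [0.9]))
```

The trained model outputs a probability for class 1, and a separate threshold turns that probability into the 0 or 1 label.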
800 00:34:54,359 --> 00:34:56,820 As a starter, we will be splitting our 801 00:34:56,820 --> 00:34:58,920 data. So 802 00:34:58,920 --> 00:35:01,140 how are we going to do this, according to 803 00:35:01,140 --> 00:35:03,780 what? To the "Split Rows" mode. So I'm going to 804 00:35:03,780 --> 00:35:05,940 double-click on it and choose "Split Rows". 805 00:35:05,940 --> 00:35:09,420 And the percentage is 806 00:35:09,420 --> 00:35:11,780 70 percent for the [INDISTINGUISHABLE] 807 00:35:11,780 --> 00:35:12,780 and 30 percent of the 808 00:35:12,780 --> 00:35:14,820 data for 809 00:35:14,820 --> 00:35:18,420 the evaluation or for the scoring, okay? 810 00:35:18,420 --> 00:35:20,880 I'm going to make it a randomization, so 811 00:35:20,880 --> 00:35:22,980 I'm going to split data randomly and the 812 00:35:22,980 --> 00:35:26,060 seed is, uh, 813 00:35:26,060 --> 00:35:29,339 132, uh 23 I think...yeah. 814 00:35:29,339 --> 00:35:32,520 And I think that's it. 815 00:35:32,520 --> 00:35:35,040 The split says why this holds, and that's 816 00:35:35,040 --> 00:35:36,240 good. 817 00:35:36,240 --> 00:35:39,540 Now for the next one, which is the train 818 00:35:39,540 --> 00:35:42,000 model we are going to connect it as 819 00:35:42,000 --> 00:35:43,500 mentioned here. 820 00:35:43,500 --> 00:35:48,660 And we have done that and...then why 821 00:35:48,660 --> 00:35:50,700 am I having here? Let's double click 822 00:35:50,700 --> 00:35:54,660 on it...yeah. It has...it needs the 823 00:35:54,660 --> 00:35:57,180 label column that I am trying to predict. 824 00:35:57,180 --> 00:35:58,680 So from here, I'm going to choose 825 00:35:58,680 --> 00:36:01,380 diabetic. I'm going to save. 826 00:36:01,380 --> 00:36:05,180 I'm going to close this one.
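[NOTE]: A randomized 70/30 split with a fixed seed can be sketched in a few lines of standard-library Python. This is only an illustration of the idea. The seed value 42 is a placeholder chosen for this example, not the value used in the demo, and the ten "rows" are just the numbers 0 to 9.

```python
import random


def split_rows(rows, fraction=0.7, seed=0):
    """Shuffle the rows with a fixed seed and cut them into two parts."""
    shuffled = list(rows)                   # copy, so the input is not modified
    random.Random(seed).shuffle(shuffled)   # same seed, same shuffle every run
    cut = int(round(len(shuffled) * fraction))
    return shuffled[:cut], shuffled[cut:]


all_rows = list(range(10))
train_rows, test_rows = split_rows(all_rows, fraction=0.7, seed=42)
print(len(train_rows), len(test_rows))
```

Fixing the seed makes the split reproducible, so running the pipeline again trains and scores on the same rows.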
827 00:36:05,520 --> 00:36:07,380 So it says here, 828 00:36:07,380 --> 00:36:10,619 the diabetic label, the model, it will 829 00:36:10,619 --> 00:36:12,300 predict the zero and one, because this is 830 00:36:12,300 --> 00:36:14,700 a binary classification algorithm, so 831 00:36:14,700 --> 00:36:16,260 it's going to predict either this or 832 00:36:16,260 --> 00:36:17,520 that. 833 00:36:17,520 --> 00:36:18,460 And... 834 00:36:18,460 --> 00:36:20,160 um... 835 00:36:20,160 --> 00:36:23,880 I think that's everything to run the the 836 00:36:23,880 --> 00:36:25,859 pipeline. 837 00:36:25,859 --> 00:36:29,040 So everything is done, everything is good 838 00:36:29,040 --> 00:36:31,200 for this one. We're just gonna leave it 839 00:36:31,200 --> 00:36:34,140 for now, because this is the next 840 00:36:34,140 --> 00:36:35,620 step. 841 00:36:35,620 --> 00:36:39,839 Um, this will be put instead of the 842 00:36:39,839 --> 00:36:43,520 score model, but let's... 843 00:36:44,099 --> 00:36:46,920 let's delete it for now. 844 00:36:46,920 --> 00:36:49,500 Okay. 845 00:36:49,500 --> 00:36:52,920 Now we have to submit the job in order 846 00:36:52,920 --> 00:36:55,680 to see the output of it. So I can click 847 00:36:55,680 --> 00:36:59,280 on "Submit" and choose the previous job 848 00:36:59,280 --> 00:37:01,200 which is the one that I have showed you 849 00:37:01,200 --> 00:37:02,460 before. 850 00:37:02,460 --> 00:37:05,460 And then let's review its output 851 00:37:05,460 --> 00:37:06,960 together here. 852 00:37:06,960 --> 00:37:09,960 So if I go to the jobs, 853 00:37:09,960 --> 00:37:15,119 if I go to MS Learn, maybe it is training? 854 00:37:15,119 --> 00:37:18,180 I think it's the one that lasted the 855 00:37:18,180 --> 00:37:20,640 longest, this one here. 856 00:37:20,640 --> 00:37:23,700 So here I can see 857 00:37:23,700 --> 00:37:27,079 the job output, what happened inside 858 00:37:27,079 --> 00:37:30,420 the model, as you can see. 
859 00:37:30,420 --> 00:37:33,839 So the normalization we have seen 860 00:37:33,839 --> 00:37:36,540 before, the split data, I can preview it. 861 00:37:36,540 --> 00:37:39,359 The result one or the result two as it 862 00:37:39,359 --> 00:37:41,760 splits the data to 70 here and 863 00:37:41,760 --> 00:37:43,639 thirty percent here. 864 00:37:43,639 --> 00:37:46,859 Um, I can see the score model, which is 865 00:37:46,859 --> 00:37:49,140 something that we need 866 00:37:49,140 --> 00:37:51,530 to review. 867 00:37:51,530 --> 00:37:56,820 Inside the score model, uh, from 868 00:37:56,820 --> 00:37:57,960 here, 869 00:37:57,960 --> 00:38:00,960 we can see that... 870 00:38:00,960 --> 00:38:04,460 let's get back here. 871 00:38:05,940 --> 00:38:08,220 This is the data that the model has 872 00:38:08,220 --> 00:38:11,579 been scored on and this is the scoring output. 873 00:38:11,579 --> 00:38:15,300 So it says "scored label true", and he is 874 00:38:15,300 --> 00:38:17,370 not diabetic, so this is, 875 00:38:17,370 --> 00:38:19,200 um, 876 00:38:19,200 --> 00:38:21,839 a wrong prediction, let's say. 877 00:38:21,839 --> 00:38:23,880 For this one it's true and true, and this 878 00:38:23,880 --> 00:38:26,880 is a good, like, what do you say, 879 00:38:26,880 --> 00:38:29,460 prediction, and the scored 880 00:38:29,460 --> 00:38:30,420 probabilities, 881 00:38:30,420 --> 00:38:33,119 which means the certainty of our model 882 00:38:33,119 --> 00:38:36,620 that this is really true. It's 80 percent. 883 00:38:36,620 --> 00:38:38,780 For this one it's 75 percent. 884 00:38:38,780 --> 00:38:42,599 So these are some cool metrics that we 885 00:38:42,599 --> 00:38:45,359 can review to understand how our model 886 00:38:45,359 --> 00:38:47,580 is performing. It's performing well for 887 00:38:47,580 --> 00:38:48,540 now. 888 00:38:48,540 --> 00:38:53,180 Let's check our evaluation model.
889 00:38:53,180 --> 00:38:56,700 So this is the extra one that I told you 890 00:38:56,700 --> 00:38:59,579 about. Instead of the 891 00:38:59,579 --> 00:39:01,800 score model only, we are going to add 892 00:39:01,800 --> 00:39:04,260 the evaluate model 893 00:39:04,260 --> 00:39:06,900 after it. So here 894 00:39:06,900 --> 00:39:09,420 we're going to go to our Asset Library 895 00:39:09,420 --> 00:39:12,180 and we are going to choose the evaluate 896 00:39:12,180 --> 00:39:14,940 model, 897 00:39:14,940 --> 00:39:17,760 and we are going to put it here, and we 898 00:39:17,760 --> 00:39:20,220 are going to connect it, and we are going 899 00:39:20,220 --> 00:39:23,099 to submit the job using the same name of 900 00:39:23,099 --> 00:39:25,140 the job that we used previously. 901 00:39:25,140 --> 00:39:29,520 Let's review it also. So, after it 902 00:39:29,520 --> 00:39:33,300 finishes, you will find it here. So I have 903 00:39:33,300 --> 00:39:35,280 already done it before, this is how I'm 904 00:39:35,280 --> 00:39:37,380 able to see the output. 905 00:39:37,380 --> 00:39:40,320 So let's see 906 00:39:40,320 --> 00:39:43,280 what is the output of this 907 00:39:43,280 --> 00:39:45,660 evaluation process. 908 00:39:45,660 --> 00:39:49,800 Here it mentions to you that there are 909 00:39:49,800 --> 00:39:51,480 some metrics, 910 00:39:51,480 --> 00:39:54,839 like the confusion matrix, which Carlotta 911 00:39:54,839 --> 00:39:57,060 told you about, there is the accuracy, the 912 00:39:57,060 --> 00:39:59,760 precision, the recall, and F1 Score. 913 00:39:59,760 --> 00:40:02,339 Every metric gives us some insight about 914 00:40:02,339 --> 00:40:04,920 our model. It helps us to understand it 915 00:40:04,920 --> 00:40:08,579 more, and, um, 916 00:40:08,579 --> 00:40:10,560 understand if it's overfitting, if 917 00:40:10,560 --> 00:40:12,240 it's good, if it's bad, and really really, 918 00:40:12,240 --> 00:40:16,339 like, understand how it's working.
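[NOTE]: The metrics listed above all derive from the four cells of the confusion matrix. The sketch below computes them from scratch using the standard textbook definitions. The actual and predicted labels are invented for illustration and are not results from the demo.

```python
def confusion_counts(actual, predicted):
    """Count true positives, false positives, true negatives, false negatives."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(actual, predicted):
    """Accuracy, precision, recall and F1 score for binary labels."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of actual positives, how many were found
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented labels: 1 = diabetic, 0 = not diabetic.
actual_labels = [1, 1, 1, 1, 0, 0, 0, 0]
predicted_labels = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = classification_metrics(actual_labels, predicted_labels)
print(confusion_counts(actual_labels, predicted_labels))
print(metrics)
```

Looking at precision and recall separately is useful because a model can score well on accuracy while still missing many of the positive cases.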
919 00:40:17,060 --> 00:40:20,400 Now I'm just waiting for the job to load. 920 00:40:20,400 --> 00:40:22,710 Until it loads, 921 00:40:22,710 --> 00:40:23,640 um, 922 00:40:23,640 --> 00:40:26,040 we can continue 923 00:40:26,040 --> 00:40:28,740 to work on our 924 00:40:28,740 --> 00:40:31,800 model. So I will go to my designer. I'm 925 00:40:31,800 --> 00:40:34,740 just going to confirm this. 926 00:40:34,740 --> 00:40:38,280 And I'm going to continue working on it 927 00:40:38,280 --> 00:40:39,780 from 928 00:40:39,780 --> 00:40:42,119 where we have stopped. Where have we 929 00:40:42,119 --> 00:40:43,560 stopped? 930 00:40:43,560 --> 00:40:46,440 we have stopped on the evaluate model. So 931 00:40:46,440 --> 00:40:48,960 I'm going to choose this one. 932 00:40:48,960 --> 00:40:53,420 And it says here 933 00:40:54,180 --> 00:40:56,940 "select experiment", "create inference 934 00:40:56,940 --> 00:40:58,200 pipeline", so 935 00:40:58,200 --> 00:41:01,079 I am going to go to the jobs, 936 00:41:01,079 --> 00:41:04,680 I'm going to select my experiment. 937 00:41:04,680 --> 00:41:06,660 I hope this works. 938 00:41:06,660 --> 00:41:09,720 Okay. Finally, now we have our 939 00:41:09,720 --> 00:41:12,180 evaluate model output. 940 00:41:12,180 --> 00:41:15,480 Let's preview evaluation results 941 00:41:15,480 --> 00:41:18,660 and, uh... 942 00:41:18,660 --> 00:41:22,220 come on. 943 00:41:25,500 --> 00:41:28,020 Finally. Now we can create our inference 944 00:41:28,020 --> 00:41:31,020 pipeline. So, 945 00:41:31,020 --> 00:41:33,510 I think it says that... 946 00:41:33,510 --> 00:41:35,280 um... 947 00:41:35,280 --> 00:41:38,160 select the experiment, then select MS 948 00:41:38,160 --> 00:41:39,359 Learn. So, 949 00:41:39,359 --> 00:41:43,320 I am just going to select it, 950 00:41:43,320 --> 00:41:48,300 and finally. 
Now we can, the ROC curve, we 951 00:41:48,300 --> 00:41:51,000 can see it, that the true positive rate 952 00:41:51,000 --> 00:41:53,760 and the false positive rate. The false 953 00:41:53,760 --> 00:41:56,660 positive rate is increasing with time, 954 00:41:56,660 --> 00:42:01,020 and also the true positive rate. True 955 00:42:01,020 --> 00:42:03,540 positive is something that it predicted, 956 00:42:03,540 --> 00:42:06,960 that it is, uh, positive, it has diabetes, 957 00:42:06,960 --> 00:42:09,480 and it's really...it's really true. 958 00:42:09,480 --> 00:42:12,599 The person really has diabetes. Okay. And 959 00:42:12,599 --> 00:42:14,760 for the false positive, it predicted that 960 00:42:14,760 --> 00:42:17,579 someone has diabetes, but that person doesn't 961 00:42:17,579 --> 00:42:20,960 have it. This is what true positive and 962 00:42:20,960 --> 00:42:24,960 false positive means. This is the recall 963 00:42:24,960 --> 00:42:28,020 curve, so we can review the metrics 964 00:42:28,020 --> 00:42:32,160 of our model. This is the lift curve. I 965 00:42:32,160 --> 00:42:36,000 can change the threshold of my confusion 966 00:42:36,000 --> 00:42:37,740 matrix here 967 00:42:37,740 --> 00:42:39,119 and if Carlotta wants to add 968 00:42:39,119 --> 00:42:43,920 anything about the...the graphs, 969 00:42:43,920 --> 00:42:47,000 you can do so. 970 00:42:50,440 --> 00:42:52,558 [CARLOTTA]: Um, yeah, so I just 971 00:42:52,558 --> 00:42:54,540 wanted to...if you go...yeah. 972 00:42:54,540 --> 00:42:57,119 I just wanted to comment for the 973 00:42:57,119 --> 00:43:00,480 ROC curve, that actually from this 974 00:43:00,480 --> 00:43:03,900 graph, the metric which usually we're 975 00:43:03,900 --> 00:43:06,960 going to compute is the area 976 00:43:06,960 --> 00:43:09,900 under the curve.
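[NOTE]: The ROC curve plots the true positive rate against the false positive rate as the decision threshold is lowered, and the area under it can be approximated with the trapezoid rule. The sketch below is a minimal from-scratch illustration. The labels and scores are invented, and the code assumes at least one positive and one negative example.

```python
def roc_points(actual, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoid-rule area under a curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Invented example: true labels and the model's predicted probabilities.
true_labels = [1, 1, 0, 1, 0, 0]
probabilities = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
curve = roc_points(true_labels, probabilities)
auc = area_under_curve(curve)
print(curve)
print(auc)
```

An area of 1.0 would mean every positive is ranked above every negative, while 0.5 corresponds to random guessing.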
And this coefficient or 977 00:43:09,900 --> 00:43:12,240 metric, 978 00:43:12,240 --> 00:43:15,060 it's a coefficient... 979 00:43:15,060 --> 00:43:18,420 it's a value that could span from 980 00:43:18,420 --> 00:43:23,480 zero to one and the higher it is... 981 00:43:23,480 --> 00:43:25,970 ...the higher is the score. 982 00:43:25,970 --> 00:43:29,220 So the closer to one, 983 00:43:29,220 --> 00:43:32,760 so the higher is the amount of 984 00:43:32,760 --> 00:43:35,280 area under this curve, 985 00:43:35,280 --> 00:43:40,500 the higher the performance 986 00:43:40,500 --> 00:43:42,886 we've got from our model. 987 00:43:42,886 --> 00:43:46,440 And another thing is what John is 988 00:43:46,440 --> 00:43:49,680 playing with. So this threshold for 989 00:43:49,680 --> 00:43:51,380 the logistic 990 00:43:51,380 --> 00:43:55,610 regression is the threshold used by the 991 00:43:55,610 --> 00:43:59,520 model to, um, 992 00:43:59,520 --> 00:44:02,880 to predict if the category is zero or 993 00:44:02,880 --> 00:44:05,220 one. So if the probability, the 994 00:44:05,220 --> 00:44:08,599 probability score is above the threshold, 995 00:44:08,599 --> 00:44:11,579 then the category will be predicted as 996 00:44:11,579 --> 00:44:15,359 one, while if the probability is 997 00:44:15,359 --> 00:44:17,460 below the threshold, in this case, for 998 00:44:17,460 --> 00:44:21,300 example, 0.5, the category is predicted 999 00:44:21,300 --> 00:44:23,579 as zero. So that's why it's very 1000 00:44:23,579 --> 00:44:26,473 important to choose the threshold, 1001 00:44:26,473 --> 00:44:28,699 because the performance really can vary, 1002 00:44:28,699 --> 00:44:30,560 um, 1003 00:44:30,560 --> 00:44:34,380 with this threshold value. 1004 00:44:34,380 --> 00:44:41,099 [JOHN]: Thank you so much, Carlotta, and 1005 00:44:41,400 --> 00:44:44,400 as I mentioned now, we are going to 1006 00:44:44,400 --> 00:44:46,560 create our inference pipeline.
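The threshold and area-under-the-curve ideas Carlotta describes can be sketched in a few lines of plain Python. This is only a rough illustration, not what the designer runs internally: the labels and probabilities below are made up, and the helper names (`predict`, `confusion_counts`, `rates`, `roc_auc`) are hypothetical. Each threshold gives one (true positive rate, false positive rate) point, and the ROC curve is the set of those points as the threshold varies.

```python
# Made-up labels (1 = diabetic) and predicted probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1]


def predict(probabilities, threshold=0.5):
    """Predict 1 when the probability is above the threshold, otherwise 0."""
    return [1 if p > threshold else 0 for p in probabilities]


def confusion_counts(labels, predictions):
    """Return the confusion matrix cells as (TP, FP, TN, FN)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp, fp, tn, fn


def rates(labels, probabilities, threshold):
    """Return (true positive rate, false positive rate) at one threshold."""
    tp, fp, tn, fn = confusion_counts(labels, predict(probabilities, threshold))
    return tp / (tp + fn), fp / (fp + tn)


def roc_auc(labels, probabilities):
    """Area under the ROC curve, computed as the chance that a random
    positive example gets a higher score than a random negative one
    (ties count as half)."""
    pos = [p for y, p in zip(labels, probabilities) if y == 1]
    neg = [p for y, p in zip(labels, probabilities) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


print(rates(y_true, y_prob, 0.5))   # one point on the ROC curve
print(rates(y_true, y_prob, 0.25))  # lowering the threshold raises both rates
print(roc_auc(y_true, y_prob))      # between 0 and 1, closer to 1 is better
```

Moving the threshold changes the confusion matrix, which is why the metrics shift when the slider is moved in the evaluation view.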
So we are 1007 00:44:46,560 --> 00:44:48,540 going to select the latest one, which I 1008 00:44:48,540 --> 00:44:50,819 already have opened here. This is the 1009 00:44:50,819 --> 00:44:52,859 one that we were reviewing together. This 1010 00:44:52,859 --> 00:44:55,500 is where we have stopped, and we're going 1011 00:44:55,500 --> 00:44:57,599 to create an inference pipeline. We are 1012 00:44:57,599 --> 00:44:59,520 going to choose a real-time inference 1013 00:44:59,520 --> 00:45:02,520 pipeline, okay? 1014 00:45:02,520 --> 00:45:05,080 Where can I find this? Here, as it 1015 00:45:05,080 --> 00:45:08,099 says, "Real-time inference pipeline". 1016 00:45:08,099 --> 00:45:10,680 So it's gonna add some things to my 1017 00:45:10,680 --> 00:45:12,240 workspace. It's going to add the 1018 00:45:12,240 --> 00:45:13,713 web service input, it's gonna 1019 00:45:13,713 --> 00:45:15,071 have the web service output, 1020 00:45:15,071 --> 00:45:16,490 because we will be creating 1021 00:45:16,490 --> 00:45:18,180 it as a web service to access 1022 00:45:18,180 --> 00:45:19,740 it from the internet. 1023 00:45:19,740 --> 00:45:21,770 What are we going to do? We're going 1024 00:45:21,770 --> 00:45:24,720 to remove this diabetes data, okay? 1025 00:45:24,720 --> 00:45:27,540 And we are going to get a component 1026 00:45:27,540 --> 00:45:29,359 called "Web 1027 00:45:29,359 --> 00:45:33,180 input" and...let me check, 1028 00:45:33,180 --> 00:45:35,940 it's "Enter Data Manually". 1029 00:45:35,940 --> 00:45:38,400 We have...we already have the web input 1030 00:45:38,400 --> 00:45:39,540 present. 1031 00:45:39,540 --> 00:45:42,119 So we are going to get the Enter Data 1032 00:45:42,119 --> 00:45:43,200 Manually, 1033 00:45:43,200 --> 00:45:45,420 and we're going to collect it...to connect 1034 00:45:45,420 --> 00:45:49,560 it as it was connected before, like that.
1035 00:45:49,560 --> 00:45:53,040 And also, I am not going to directly take 1036 00:45:53,040 --> 00:45:55,260 the web service...sorry, the Score Model to 1037 00:45:55,260 --> 00:45:57,839 the web service output like that. 1038 00:45:57,839 --> 00:46:00,240 I'm going to delete this 1039 00:46:00,240 --> 00:46:03,960 and I'm going to execute a Python script 1040 00:46:03,960 --> 00:46:05,880 before 1041 00:46:05,880 --> 00:46:09,500 I display my result. 1042 00:46:10,680 --> 00:46:12,060 So, 1043 00:46:12,060 --> 00:46:17,480 this will be connected like... 1044 00:46:19,260 --> 00:46:20,400 So... 1045 00:46:20,400 --> 00:46:23,599 the other way around. 1046 00:46:23,599 --> 00:46:27,660 And from here, I am going to connect this 1047 00:46:27,660 --> 00:46:30,960 with that and there is some data that 1048 00:46:30,960 --> 00:46:33,480 we will be getting from the node, or from 1049 00:46:33,480 --> 00:46:37,680 the explanation here, and this is the 1050 00:46:37,680 --> 00:46:40,740 data that will be entered to our 1051 00:46:40,740 --> 00:46:44,400 web service manually. Okay? This is instead of 1052 00:46:44,400 --> 00:46:47,460 the data that we have been getting from 1053 00:46:47,460 --> 00:46:49,740 our data set that we created. So I'm just 1054 00:46:49,740 --> 00:46:51,960 going to double click on it and choose 1055 00:46:51,960 --> 00:46:55,579 CSV, and I will choose "it has headers", 1056 00:46:55,579 --> 00:47:00,839 and I will take or copy this content and 1057 00:47:00,839 --> 00:47:02,819 put it there, okay? 1058 00:47:02,819 --> 00:47:05,700 So let's do it. 1059 00:47:05,700 --> 00:47:07,920 I think I have to click on edit code, now 1060 00:47:07,920 --> 00:47:10,680 I can click on "Save", and I can close it. 1061 00:47:10,680 --> 00:47:13,079 Another thing is the Python script 1062 00:47:13,079 --> 00:47:16,700 that we will be executing. 1063 00:47:17,099 --> 00:47:17,900 Um, yeah. We 1064 00:47:17,900 --> 00:47:19,380 are going to remove this, also.
1065 00:47:19,380 --> 00:47:20,930 We don't need the Evaluate Model 1066 00:47:20,930 --> 00:47:24,319 anymore, so we are going to remove it. 1067 00:47:24,319 --> 00:47:25,582 The Python script 1068 00:47:25,582 --> 00:47:28,579 that I will be executing, 1069 00:47:28,579 --> 00:47:32,599 I can find it here. 1070 00:47:32,699 --> 00:47:35,760 Um, yeah. 1071 00:47:35,760 --> 00:47:38,640 This is the Python script that we will 1072 00:47:38,640 --> 00:47:41,520 execute. And it says to you that this 1073 00:47:41,520 --> 00:47:43,619 code selects only the patient ID, 1074 00:47:43,619 --> 00:47:45,000 the scored label, and the scored 1075 00:47:45,000 --> 00:47:47,700 probability, and returns them to 1076 00:47:47,700 --> 00:47:49,980 the web service output. So we don't want 1077 00:47:49,980 --> 00:47:51,960 to return all the columns, as we have 1078 00:47:51,960 --> 00:47:53,339 seen previously, 1079 00:47:53,339 --> 00:47:55,560 that returns everything, 1080 00:47:55,560 --> 00:47:56,940 so 1081 00:47:56,940 --> 00:47:59,040 we want to return certain stuff, the 1082 00:47:59,040 --> 00:48:02,940 stuff that we will use inside our 1083 00:48:02,940 --> 00:48:05,640 endpoint. So I'm just going to select 1084 00:48:05,640 --> 00:48:07,980 everything and delete it, and 1085 00:48:07,980 --> 00:48:11,060 paste the code that I have gotten from 1086 00:48:11,060 --> 00:48:14,280 the, uh, 1087 00:48:14,280 --> 00:48:16,500 the Microsoft Learn docs. 1088 00:48:16,500 --> 00:48:19,079 Now I can click on "Save", and I can close 1089 00:48:19,079 --> 00:48:20,280 this. 1090 00:48:20,280 --> 00:48:21,470 Let me check something, 1091 00:48:21,470 --> 00:48:22,950 I don't think it saved. 1092 00:48:22,950 --> 00:48:24,940 It's saved, but the display is 1093 00:48:24,940 --> 00:48:26,160 wrong, okay. 1094 00:48:26,160 --> 00:48:30,300 And now I think everything is good to go. 1095 00:48:30,300 --> 00:48:32,640 I'm just gonna double-check everything.
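The script John pastes is not shown in the recording, so the following is only a sketch of what such an Execute Python Script step might look like. It assumes the designer calls an entry function named `azureml_main` with pandas DataFrames, and that the Score Model output uses the column names `Scored Labels` and `Scored Probabilities`; the output names `DiabetesPrediction` and `Probability` are likewise assumptions chosen to match the three columns John mentions.

```python
import pandas as pd


def azureml_main(dataframe1=None, dataframe2=None):
    # dataframe1 is assumed to be the Score Model output: every input
    # column plus the scored label and scored probability columns.
    scored_results = dataframe1[["PatientID", "Scored Labels", "Scored Probabilities"]]

    # Rename to friendlier names for the web service output
    # (these output names are assumptions, not taken from the recording).
    scored_results = scored_results.rename(
        columns={
            "Scored Labels": "DiabetesPrediction",
            "Scored Probabilities": "Probability",
        }
    )

    # Return a sequence of DataFrames; only the first output is used here.
    return (scored_results,)
```

Returning only these three columns keeps the endpoint response small, instead of echoing every input feature back to the caller.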
1096 00:48:32,640 --> 00:48:36,359 So, uh, yeah. We are gonna change the name 1097 00:48:36,359 --> 00:48:38,640 of this 1098 00:48:38,640 --> 00:48:40,800 pipeline, and we are gonna call it 1099 00:48:40,800 --> 00:48:42,780 "Predict 1100 00:48:42,780 --> 00:48:46,319 diabetes", okay? 1101 00:48:46,319 --> 00:48:50,339 Now let's close it, and 1102 00:48:50,339 --> 00:48:56,269 I think that we are good to go. So, 1103 00:48:56,269 --> 00:48:59,300 um, 1104 00:48:59,720 --> 00:49:04,460 Okay, I think everything is good for us. 1105 00:49:06,210 --> 00:49:08,108 I just want to make sure of something. 1106 00:49:08,108 --> 00:49:09,209 Is the data... 1107 00:49:09,209 --> 00:49:12,420 it's correct, the data is...yeah, 1108 00:49:12,420 --> 00:49:13,560 it's correct. 1109 00:49:13,560 --> 00:49:16,319 Okay, now I can run the pipeline. Let's 1110 00:49:16,319 --> 00:49:17,640 submit. 1111 00:49:17,640 --> 00:49:21,000 Select an "existing" pipeline, and we're 1112 00:49:21,000 --> 00:49:21,870 going to choose 1113 00:49:21,870 --> 00:49:23,529 the "ms-learn-diabetes-training", 1114 00:49:23,529 --> 00:49:24,599 which is the pipeline 1115 00:49:24,599 --> 00:49:27,060 that we have been working on 1116 00:49:27,060 --> 00:49:31,619 from the beginning of this module. 1117 00:49:31,619 --> 00:49:33,839 I don't think that this is going to take 1118 00:49:33,839 --> 00:49:36,060 much time. So we have submitted the job 1119 00:49:36,060 --> 00:49:37,319 and it's running. 1120 00:49:37,319 --> 00:49:40,140 Until the job ends, we are going to set 1121 00:49:40,140 --> 00:49:41,720 everything 1122 00:49:41,720 --> 00:49:45,599 for deploying a service. 1123 00:49:45,599 --> 00:49:49,070 In order to deploy a service, 1124 00:49:49,070 --> 00:49:50,520 um, 1125 00:49:50,520 --> 00:49:54,000 I have to have the job ready, so 1126 00:49:54,000 --> 00:49:55,980 until it's ready, you can't deploy it. 
So 1127 00:49:55,980 --> 00:49:58,319 let's go to the job...the job details from 1128 00:49:58,319 --> 00:50:01,319 here, okay? 1129 00:50:01,319 --> 00:50:05,119 And until it finishes, 1130 00:50:05,119 --> 00:50:07,260 Carlotta, do you think that we can have 1131 00:50:07,260 --> 00:50:09,240 the questions, and then we can get back 1132 00:50:09,240 --> 00:50:12,859 to the job and deploying it? 1133 00:50:13,700 --> 00:50:15,119 [CARLOTTA]: Yeah, yeah, yeah. 1134 00:50:15,119 --> 00:50:17,279 So yeah, guys, if you 1135 00:50:17,279 --> 00:50:18,980 have any questions 1136 00:50:18,980 --> 00:50:24,119 on what you just saw here 1137 00:50:24,119 --> 00:50:26,940 or in the introduction, feel free. This is 1138 00:50:26,940 --> 00:50:30,300 a good moment, we can...we can discuss 1139 00:50:30,300 --> 00:50:33,900 now, while we wait for this job to 1140 00:50:33,900 --> 00:50:36,260 finish. 1141 00:50:36,260 --> 00:50:38,760 [JOHN]: Uh, and... 1142 00:50:38,760 --> 00:50:40,220 can... 1143 00:50:40,220 --> 00:50:45,000 we have the knowledge check one? Or, like, 1144 00:50:45,000 --> 00:50:46,360 what do you think? 1145 00:50:46,360 --> 00:50:47,956 [CARLOTTA]: Yeah, we can also go 1146 00:50:47,956 --> 00:50:49,680 to the knowledge check. 1147 00:50:49,680 --> 00:50:50,940 Um... 1148 00:50:50,940 --> 00:50:56,339 Yeah, okay. So let me share my screen. 1149 00:50:56,339 --> 00:50:58,980 Yeah, so if you don't have any questions 1150 00:50:58,980 --> 00:51:01,619 for us, we can maybe propose some 1151 00:51:01,619 --> 00:51:04,959 questions to you that you can, 1152 00:51:04,959 --> 00:51:06,240 um, 1153 00:51:06,240 --> 00:51:09,450 check your knowledge so far and you 1154 00:51:09,450 --> 00:51:12,900 can maybe answer these questions 1155 00:51:12,900 --> 00:51:15,420 via chat. 1156 00:51:15,420 --> 00:51:18,300 So we have...do you see my screen, can 1157 00:51:18,300 --> 00:51:19,859 you see my screen? 1158 00:51:19,859 --> 00:51:21,650 [JOHN]: Yes.
1159 00:51:21,650 --> 00:51:24,440 [CARLOTTA]: So, John, I think I will 1160 00:51:24,440 --> 00:51:25,440 read this 1161 00:51:25,440 --> 00:51:29,040 question aloud and ask it to you, okay? So 1162 00:51:29,040 --> 00:51:32,040 are you ready to answer? 1163 00:51:32,040 --> 00:51:33,660 [JOHN]: Yes I am. 1164 00:51:33,660 --> 00:51:35,460 [CARLOTTA]: So... 1165 00:51:35,460 --> 00:51:37,260 you're using Azure Machine Learning 1166 00:51:37,260 --> 00:51:39,780 designer to create a training pipeline 1167 00:51:39,780 --> 00:51:42,540 for a binary classification model, so 1168 00:51:42,540 --> 00:51:45,300 what we were doing in our demo, 1169 00:51:45,300 --> 00:51:48,059 right? And you have added a data set 1170 00:51:48,059 --> 00:51:51,660 containing features and labels, a Two- 1171 00:51:51,660 --> 00:51:54,359 Class Decision Forest module. So we used 1172 00:51:54,359 --> 00:51:56,819 a logistic regression model our... 1173 00:51:56,819 --> 00:51:57,877 um, in our example. 1174 00:51:57,877 --> 00:51:59,019 Here, we're using a Two- 1175 00:51:59,019 --> 00:52:01,260 Class Decision Forest model. 1176 00:52:01,260 --> 00:52:04,500 And, of course, a Train Model module. You 1177 00:52:04,500 --> 00:52:07,200 plan now to use Score Model and Evaluate 1178 00:52:07,200 --> 00:52:09,480 Model modules to test the trained model 1179 00:52:09,480 --> 00:52:11,640 with the subset of the data set that 1180 00:52:11,640 --> 00:52:13,500 wasn't used for training. 1181 00:52:13,500 --> 00:52:15,960 But what are we missing? So what's 1182 00:52:15,960 --> 00:52:18,780 another module you should add? We have 1183 00:52:18,780 --> 00:52:21,660 three options: we have Join Data, we have 1184 00:52:21,660 --> 00:52:25,200 Split Data, or we have Select Columns 1185 00:52:25,200 --> 00:52:26,819 in Dataset.
1186 00:52:26,819 --> 00:52:28,260 So 1187 00:52:28,260 --> 00:52:32,040 while John thinks about the answer, 1188 00:52:32,040 --> 00:52:33,599 go ahead and, 1189 00:52:33,599 --> 00:52:34,800 um, 1190 00:52:34,800 --> 00:52:37,800 answer yourself. So give us your 1191 00:52:37,800 --> 00:52:39,540 guess. 1192 00:52:39,540 --> 00:52:41,940 Put it in the chat, or just come off mute 1193 00:52:41,940 --> 00:52:44,900 and answer. 1194 00:52:46,740 --> 00:52:47,785 "A", "B". 1195 00:52:47,785 --> 00:52:49,769 [JOHN]: Yeah, what do you think 1196 00:52:49,769 --> 00:52:50,509 is the correct 1197 00:52:50,509 --> 00:52:53,579 answer for this one? I need something to 1198 00:52:53,579 --> 00:52:56,579 uh...I have to score my model, and I 1199 00:52:56,579 --> 00:53:00,359 have to evaluate it, so I need 1200 00:53:00,359 --> 00:53:03,119 something to enable me to do these two 1201 00:53:03,119 --> 00:53:05,359 things. 1202 00:53:06,579 --> 00:53:08,233 [CARLOTTA]: I think it's something 1203 00:53:08,233 --> 00:53:10,640 you showed us in your pipeline, 1204 00:53:10,640 --> 00:53:13,260 right, John? 1205 00:53:13,260 --> 00:53:16,819 [JOHN]: Of course I did. 1206 00:53:23,460 --> 00:53:25,122 [CARLOTTA]: Uh, we have no guesses 1207 00:53:25,122 --> 00:53:28,020 in the chat? 1208 00:53:28,020 --> 00:53:30,070 [JOHN]: Can someone... 1209 00:53:30,070 --> 00:53:32,280 Someone want to guess? 1210 00:53:32,280 --> 00:53:35,579 [CARLOTTA]: We have a "B". 1211 00:53:35,579 --> 00:53:38,760 [JOHN]: Uh, maybe. 1212 00:53:38,760 --> 00:53:43,260 So, in order to do this, 1213 00:53:43,260 --> 00:53:46,200 I mentioned 1214 00:53:46,200 --> 00:53:49,380 the module that is going to help me 1215 00:53:49,380 --> 00:53:52,728 to divide my data into two things: 1216 00:53:52,728 --> 00:53:53,819 70 percent for 1217 00:53:53,819 --> 00:53:56,220 the training and 30 percent for the 1218 00:53:56,220 --> 00:53:59,339 evaluation. So what did I use?
I used 1219 00:53:59,339 --> 00:54:01,859 split data, because this is what is going 1220 00:54:01,859 --> 00:54:05,280 to split my data randomly into training 1221 00:54:05,280 --> 00:54:08,459 data and validation data. So the correct 1222 00:54:08,459 --> 00:54:12,240 answer is "B", and good job. Thank you 1223 00:54:12,240 --> 00:54:13,980 for participating. 1224 00:54:13,980 --> 00:54:17,400 Next question, please. 1225 00:54:17,400 --> 00:54:19,339 [CARLOTTA]: Yes, "B" is the correct 1226 00:54:19,339 --> 00:54:22,559 answer, so thanks, John, 1227 00:54:22,559 --> 00:54:26,040 for explaining to us the correct 1228 00:54:26,040 --> 00:54:26,940 one. 1229 00:54:26,940 --> 00:54:30,420 And we want to go with question two? 1230 00:54:30,420 --> 00:54:33,180 [JOHN]: Yeah, so, I'm going to ask you now, 1231 00:54:33,180 --> 00:54:35,880 Carlotta. You use Azure Machine Learning 1232 00:54:35,880 --> 00:54:38,280 designer to create a training pipeline 1233 00:54:38,280 --> 00:54:40,500 for your classification model. 1234 00:54:40,500 --> 00:54:44,099 What must you do before you deploy this 1235 00:54:44,099 --> 00:54:45,870 model as a service? You have to do 1236 00:54:45,870 --> 00:54:46,634 something before 1237 00:54:46,634 --> 00:54:47,439 you deploy it. 1238 00:54:47,439 --> 00:54:49,740 What do you think is the correct answer? 1239 00:54:49,740 --> 00:54:52,740 Is it "A", "B", or "C"? 1240 00:54:52,740 --> 00:54:55,020 Share your thoughts with— 1241 00:54:55,020 --> 00:54:56,690 with us in the chat and 1242 00:54:56,690 --> 00:55:00,180 and I'm also going to give you some 1243 00:55:00,180 --> 00:55:02,940 minutes to think of it before I 1244 00:55:02,940 --> 00:55:06,020 tell you about it. 1245 00:55:06,020 --> 00:55:07,765 [CARLOTTA]: Yeah so let me go 1246 00:55:07,765 --> 00:55:09,000 through the possible 1247 00:55:09,000 --> 00:55:12,359 answers, right? 
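The 70/30 random split John describes in his answer to question one can be sketched in plain Python. This is a minimal illustration, not how the Split Data module is implemented: the function name `split_data`, the default seed, and the use of Python's `random` module are all assumptions made for the example.

```python
import random


def split_data(rows, train_fraction=0.7, seed=123):
    """Randomly split rows into a training set and a validation set."""
    shuffled = list(rows)
    # A fixed seed makes the shuffle repeatable, so every run gets the same split.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Ten stand-in rows: seven go to training, three are held back for evaluation.
train_rows, validation_rows = split_data(range(10))
print(len(train_rows), len(validation_rows))
```

The held-back rows are the ones the Score Model and Evaluate Model steps use, which is why the pipeline in the question is incomplete without a split.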
So we have A: "Create an 1248 00:55:12,359 --> 00:55:14,940 inference pipeline from the training 1249 00:55:14,940 --> 00:55:16,020 pipeline"; 1250 00:55:16,020 --> 00:55:19,260 B: we have "Add an Evaluate Model 1251 00:55:19,260 --> 00:55:22,380 module to the training pipeline"; and then 1252 00:55:22,380 --> 00:55:25,079 C, we have "Clone the training 1253 00:55:25,079 --> 00:55:28,380 pipeline with a different name". 1254 00:55:29,520 --> 00:55:31,559 So what do you think is the correct 1255 00:55:31,559 --> 00:55:33,960 answer? "A", "B", or "C"? 1256 00:55:33,960 --> 00:55:36,660 Also this time, I think it's something 1257 00:55:36,660 --> 00:55:39,300 we mentioned both in the deck and in 1258 00:55:39,300 --> 00:55:41,960 the demo, right? 1259 00:55:42,599 --> 00:55:44,819 [JOHN]: Yes it is, 1260 00:55:44,819 --> 00:55:46,793 it's something that I have done 1261 00:55:46,793 --> 00:55:50,410 like two, like five minutes ago. 1262 00:55:51,800 --> 00:55:57,200 It's real-time, real-time. 1263 00:55:57,200 --> 00:55:58,760 [CARLOTTA]: Um, 1264 00:55:58,760 --> 00:56:02,040 yeah, so, think about...you need to deploy 1265 00:56:02,040 --> 00:56:05,460 the model as a service. So if I'm 1266 00:56:05,460 --> 00:56:07,980 going to deploy the model, 1267 00:56:07,980 --> 00:56:10,380 I cannot evaluate the model 1268 00:56:10,380 --> 00:56:12,839 after deploying it, right, because I 1269 00:56:12,839 --> 00:56:14,940 cannot go into production if I'm not 1270 00:56:14,940 --> 00:56:17,579 sure, I'm not satisfied with my model, and 1271 00:56:17,579 --> 00:56:19,500 I'm not sure that my model is performing 1272 00:56:19,500 --> 00:56:20,280 well. 1273 00:56:20,280 --> 00:56:22,900 So that's why I would go with, 1274 00:56:22,900 --> 00:56:24,319 um, 1275 00:56:24,319 --> 00:56:30,480 I would...exclude "B" from my 1276 00:56:30,480 --> 00:56:31,520 answer.
1277 00:56:31,520 --> 00:56:33,419 While 1278 00:56:33,419 --> 00:56:36,960 thinking about "C", uh, I don't see you—I 1279 00:56:36,960 --> 00:56:39,480 didn't see you, John, cloning the 1280 00:56:39,480 --> 00:56:41,420 training Pipeline with a different name, 1281 00:56:41,420 --> 00:56:44,640 so I don't think this is the 1282 00:56:44,640 --> 00:56:46,920 right answer. 1283 00:56:46,920 --> 00:56:49,619 While I've seen you creating an 1284 00:56:49,619 --> 00:56:52,729 inference pipeline from the 1285 00:56:52,729 --> 00:56:54,830 training pipeline, and you just converted 1286 00:56:54,830 --> 00:56:59,280 it using a one-click button, right? 1287 00:56:59,280 --> 00:57:01,400 [JOHN]: Yeah, that's correct. 1288 00:57:01,400 --> 00:57:04,280 So this is the right answer. 1289 00:57:04,280 --> 00:57:07,460 Good job. So I created an inference 1290 00:57:07,460 --> 00:57:11,280 real-time pipeline, and it has done. 1291 00:57:11,280 --> 00:57:13,440 It finished—it finished, the job is 1292 00:57:13,440 --> 00:57:18,000 finished. So we can now deploy. 1293 00:57:18,000 --> 00:57:19,400 And... 1294 00:57:19,400 --> 00:57:21,500 Yeah [LAUGHS]. 1295 00:57:21,500 --> 00:57:25,339 Exactly, like, on time. 1296 00:57:25,339 --> 00:57:27,839 Like, it finished two seconds... 1297 00:57:27,839 --> 00:57:30,859 three, four seconds ago [LAUGHS]. 1298 00:57:30,859 --> 00:57:33,119 So, uh, 1299 00:57:33,119 --> 00:57:36,480 until, um... 1300 00:57:36,480 --> 00:57:39,839 This is my job review, so 1301 00:57:39,839 --> 00:57:43,260 this is the job details that I 1302 00:57:43,260 --> 00:57:45,540 have already submitted, it's just opening, 1303 00:57:45,540 --> 00:57:47,459 and once it opens... 1304 00:57:47,459 --> 00:57:50,180 um... 1305 00:57:50,400 --> 00:57:52,740 I don't know why it's so heavy 1306 00:57:52,740 --> 00:57:56,780 today, it's not like that usually. 
1307 00:57:57,780 --> 00:58:00,020 [CARLOTTA]: Yeah, it's probably because 1308 00:58:00,020 --> 00:58:01,020 you are also 1309 00:58:01,020 --> 00:58:06,000 showing your screen on Teams, 1310 00:58:06,000 --> 00:58:08,160 so that's the bandwidth of your 1311 00:58:08,160 --> 00:58:08,944 connection. 1312 00:58:08,944 --> 00:58:10,740 [JOHN]: Let me do something here 1313 00:58:10,740 --> 00:58:13,740 because...yeah, finally. 1314 00:58:13,740 --> 00:58:16,440 I can switch to my mobile internet if it 1315 00:58:16,440 --> 00:58:18,599 does it again. So I will click on "Deploy", 1316 00:58:18,599 --> 00:58:20,700 it's that simple. I'll just click on 1317 00:58:20,700 --> 00:58:23,040 "Deploy" and... 1318 00:58:23,040 --> 00:58:25,619 I am going to deploy a new real-time 1319 00:58:25,619 --> 00:58:27,960 endpoint. 1320 00:58:27,960 --> 00:58:30,300 So what am I going to name it? 1321 00:58:30,300 --> 00:58:31,870 Description and the compute type. 1322 00:58:31,870 --> 00:58:33,372 Everything is already mentioned 1323 00:58:33,372 --> 00:58:34,140 for me here, 1324 00:58:34,140 --> 00:58:36,240 so I'm just gonna copy and paste it, 1325 00:58:36,240 --> 00:58:38,940 because we...we are running 1326 00:58:38,940 --> 00:58:41,280 out of time. 1327 00:58:41,280 --> 00:58:44,230 So it's an Azure Container Instance, 1328 00:58:44,230 --> 00:58:46,360 not Azure Kubernetes Service, 1329 00:58:46,360 --> 00:58:48,720 which is a containerization service also. 1330 00:58:48,720 --> 00:58:50,867 Both are for containerization, but this 1331 00:58:50,867 --> 00:58:53,613 gives you something, and this gives you something else.
1332 00:58:53,613 --> 00:58:54,960 For the advanced options, 1333 00:58:54,960 --> 00:58:57,420 it doesn't say for us to do anything, so 1334 00:58:57,420 --> 00:59:00,420 we are just gonna click on "Deploy", 1335 00:59:00,420 --> 00:59:05,220 and now we can test our endpoint from 1336 00:59:05,220 --> 00:59:07,859 the endpoints that we can find here, so 1337 00:59:07,859 --> 00:59:11,460 it's in progress. If I go here 1338 00:59:11,460 --> 00:59:13,799 under the assets, I can find something 1339 00:59:13,799 --> 00:59:16,680 called "Endpoints", and I can find the 1340 00:59:16,680 --> 00:59:18,599 real-time ones and the batch endpoints. 1341 00:59:18,599 --> 00:59:22,020 And we have created a real-time endpoint, 1342 00:59:22,020 --> 00:59:25,260 so we are going to find it under this 1343 00:59:25,260 --> 00:59:29,760 title. So if I click on it, I should 1344 00:59:29,760 --> 00:59:32,640 be able to test it once it's ready. 1345 00:59:32,640 --> 00:59:37,200 It's still loading, but this is the 1346 00:59:37,200 --> 00:59:40,980 input, and this is the output that we 1347 00:59:40,980 --> 00:59:44,652 will get back, so if I click on "Test"... 1348 00:59:44,652 --> 00:59:46,886 and from here, 1349 00:59:46,886 --> 00:59:49,810 I will input some data to the 1350 00:59:49,810 --> 00:59:50,900 endpoint, 1351 00:59:50,900 --> 00:59:54,599 which are: the patient information; the 1352 00:59:54,599 --> 00:59:57,119 columns that we have already seen in our 1353 00:59:57,119 --> 01:00:00,380 data set; the patient ID; the pregnancies. 1354 01:00:00,380 --> 01:00:03,960 And of course, of course I'm not gonna 1355 01:00:03,960 --> 01:00:05,940 enter the label that I'm trying to 1356 01:00:05,940 --> 01:00:08,099 predict, so I'm not going to give him if 1357 01:00:08,099 --> 01:00:10,360 the patient is diabetic or not. This 1358 01:00:10,360 --> 01:00:12,665 endpoint is to tell me this. 
1359 01:00:12,665 --> 01:00:14,599 The endpoint, or the URL, 1360 01:00:14,599 --> 01:00:15,529 is going to give me 1361 01:00:15,529 --> 01:00:17,640 back this information, whether someone 1362 01:00:17,640 --> 01:00:22,680 has diabetes, or he doesn't. So if I input 1363 01:00:22,680 --> 01:00:24,780 this data, I'm just going to copy it, 1364 01:00:24,780 --> 01:00:27,780 and go to my endpoint, and click on 1365 01:00:27,780 --> 01:00:30,180 "Test", I'm gonna get the result back, 1366 01:00:30,180 --> 01:00:32,359 which are the three columns that we have 1367 01:00:32,359 --> 01:00:35,520 defined inside our Python script: the 1368 01:00:35,520 --> 01:00:37,859 patient ID, the diabetic prediction, and 1369 01:00:37,859 --> 01:00:41,040 the probability, the certainty of whether 1370 01:00:41,040 --> 01:00:45,720 someone is diabetic or not based on the... 1371 01:00:45,720 --> 01:00:49,090 uh...based on the prediction. 1372 01:00:49,090 --> 01:00:50,660 So that's it. 1373 01:00:50,660 --> 01:00:54,359 And, uh, I think that this is a really 1374 01:00:54,359 --> 01:00:56,729 simple step to do, you can do it on your 1375 01:00:56,729 --> 01:00:58,380 own, you can test it. 1376 01:00:58,380 --> 01:01:01,140 And I think that I have finished, so 1377 01:01:01,140 --> 01:01:03,020 thank you. 1378 01:01:03,020 --> 01:01:04,206 [CARLOTTA]: Uh, yes, 1379 01:01:04,206 --> 01:01:06,069 we are running out of time. 1380 01:01:06,069 --> 01:01:09,780 I just wanted to thank you, John, for 1381 01:01:09,780 --> 01:01:12,299 this demo, for going through all these 1382 01:01:12,299 --> 01:01:13,429 steps to 1383 01:01:13,429 --> 01:01:16,740 um, create, train a classification model, 1384 01:01:16,740 --> 01:01:19,680 and also deploy it as a predictive 1385 01:01:19,680 --> 01:01:22,880 service.
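The test input John pastes into the endpoint is not shown in the recording, so this is only a sketch of how such a request body might be assembled. The wrapper keys (`Inputs`, `input1`, `GlobalParameters`), the label column name `Diabetic`, and the patient values are all assumptions for illustration; the real names come from the deployed pipeline. The one point taken from the demo is that the label being predicted is left out of the input.

```python
import json

LABEL_COLUMN = "Diabetic"  # hypothetical name of the label column


def build_request(patient, input_name="input1"):
    """Wrap one patient's feature values in a JSON request body,
    leaving out the label the endpoint is supposed to predict."""
    features = {name: value for name, value in patient.items() if name != LABEL_COLUMN}
    body = {"Inputs": {input_name: [features]}, "GlobalParameters": {}}
    return json.dumps(body)


# Illustrative values only; the remaining feature columns are omitted here.
patient = {"PatientID": 1001, "Pregnancies": 2, "Diabetic": 0}
print(build_request(patient))
```

The response would then carry the three columns selected by the Python script in the pipeline: the patient ID, the prediction, and the probability.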
And I encourage you all to go 1386 01:01:22,880 --> 01:01:25,079 back to the learn module 1387 01:01:25,079 --> 01:01:28,260 and, um, deepen all these topics 1388 01:01:28,260 --> 01:01:31,760 at your own pace, and also maybe 1389 01:01:31,760 --> 01:01:34,799 uh do this demo on your own, on your 1390 01:01:34,799 --> 01:01:37,140 subscription, on your Azure for Students 1391 01:01:37,140 --> 01:01:39,359 subscription. Um... 1392 01:01:39,359 --> 01:01:43,200 And I would also like to remind you that 1393 01:01:43,200 --> 01:01:46,140 this is part of a series of study 1394 01:01:46,140 --> 01:01:49,500 sessions, of Cloud Skill Challenge study 1395 01:01:49,500 --> 01:01:51,059 sessions, 1396 01:01:51,059 --> 01:01:54,059 so you will have more in the... 1397 01:01:54,059 --> 01:01:57,540 in the following days, and this is for 1398 01:01:57,540 --> 01:02:00,480 you to prepare, let's say, to help you 1399 01:02:00,480 --> 01:02:04,880 in taking the Cloud Skills Challenge, 1400 01:02:04,880 --> 01:02:07,040 which collects 1401 01:02:07,040 --> 01:02:10,599 very interesting learn modules that you 1402 01:02:10,599 --> 01:02:14,540 can use to skill up on various topics, 1403 01:02:14,540 --> 01:02:18,359 and some of them are focused on AI and 1404 01:02:18,359 --> 01:02:20,819 ML. So if you are interested in these 1405 01:02:20,819 --> 01:02:23,099 topics, you can select these learn 1406 01:02:23,099 --> 01:02:24,780 modules. 1407 01:02:24,780 --> 01:02:27,660 So let me also copy 1408 01:02:27,660 --> 01:02:29,669 the link, the short link to the 1409 01:02:29,669 --> 01:02:32,420 challenge in the chat. Remember that 1410 01:02:32,420 --> 01:02:34,980 you have time until the 13th of 1411 01:02:34,980 --> 01:02:37,980 September to take the challenge.
And also 1412 01:02:37,980 --> 01:02:40,440 remember that in October, on the 7th of 1413 01:02:40,440 --> 01:02:43,020 October, you have the...you can join the 1414 01:02:43,020 --> 01:02:46,619 student...the Student Developer Summit, 1415 01:02:46,619 --> 01:02:50,480 which is, uh, which will be a virtual or 1416 01:02:50,480 --> 01:02:53,220 in...for some cases a hybrid 1417 01:02:53,220 --> 01:02:55,880 event, so stay tuned, because you will 1418 01:02:55,880 --> 01:02:58,559 have some surprises in the following 1419 01:02:58,559 --> 01:03:01,260 days. And if you want to learn more about 1420 01:03:01,260 --> 01:03:03,480 this event you can check the Microsoft 1421 01:03:03,480 --> 01:03:08,099 Imagine Cup Twitter page and stay tuned. 1422 01:03:08,099 --> 01:03:11,230 So thank you everyone for joining 1423 01:03:11,230 --> 01:03:12,989 this session today, and thank you very 1424 01:03:12,989 --> 01:03:16,500 much, John, for co-hosting this 1425 01:03:16,500 --> 01:03:20,359 session with me. It was a pleasure. 1426 01:03:21,227 --> 01:03:22,838 [JOHN]: Thank you so much, 1427 01:03:22,838 --> 01:03:23,969 Carlotta, for having me 1428 01:03:23,969 --> 01:03:26,249 with you today, and thank you for 1429 01:03:26,249 --> 01:03:27,670 giving me this opportunity to 1430 01:03:27,670 --> 01:03:30,180 be with you here. 1431 01:03:30,180 --> 01:03:32,070 [CARLOTTA]: Great, thank you. 1432 01:03:32,070 --> 01:03:33,420 [JOHN]: Yeah, I hope that we 1433 01:03:33,420 --> 01:03:35,390 work again in the future. 1434 01:03:35,390 --> 01:03:37,880 [CARLOTTA]: Sure, I hope so as well. 1435 01:03:37,880 --> 01:03:40,700 Um, so, thank you everyone. 1436 01:03:40,700 --> 01:03:43,749 And have a nice rest of your day. 1437 01:03:44,099 --> 01:03:46,500 Bye-bye. Speak to you soon. 1438 01:03:46,500 --> 01:03:48,920 [JOHN]: Bye.