1
00:00:01,920 --> 00:00:04,072
[CARLOTTA]: Great, so I think we can start

2
00:00:04,072 --> 00:00:06,340
since the meeting is recorded, so if

3
00:00:06,340 --> 00:00:10,090
everyone, uh, jump-jumps in later, they

4
00:00:10,090 --> 00:00:12,420
can watch the recording.

5
00:00:12,420 --> 00:00:15,780
So, hi everyone and welcome to this

6
00:00:15,780 --> 00:00:18,000
um, Cloud Skills Challenge study session

7
00:00:18,000 --> 00:00:20,880
around Create classification models

8
00:00:20,880 --> 00:00:24,000
with Azure Machine Learning designer.

9
00:00:24,000 --> 00:00:27,240
So today I'm thrilled to be here with

10
00:00:27,240 --> 00:00:29,149
John. Uh, John do you mind

11
00:00:29,149 --> 00:00:31,619
briefly introducing yourself?

12
00:00:31,619 --> 00:00:33,160
[JOHN]: Uh, thank you Carlotta.

13
00:00:33,160 --> 00:00:34,160
Hello everyone.

14
00:00:34,160 --> 00:00:38,080
Welcome to our workshop today. I hope

15
00:00:38,080 --> 00:00:40,559
that you are all excited for it. I am

16
00:00:40,559 --> 00:00:43,140
John Aziz, a Gold Microsoft Learn Student

17
00:00:43,140 --> 00:00:47,460
Ambassador, and I will be here with, uh,

18
00:00:47,460 --> 00:00:50,760
Carlotta to do the practical part

19
00:00:50,760 --> 00:00:53,820
about this module of the Cloud Skills

20
00:00:53,820 --> 00:00:56,623
Challenge. Thank you for having me.

21
00:00:56,623 --> 00:00:58,219
[CARLOTTA]: Perfect, thanks John.

22
00:00:58,219 --> 00:00:59,623
So for those who

23
00:00:59,623 --> 00:01:03,440
don't know me, I'm Carlotta Castelluccio,

24
00:01:03,440 --> 00:01:06,479
based in Italy and focused on AI and

25
00:01:06,479 --> 00:01:08,760
machine learning technologies and

26
00:01:08,760 --> 00:01:11,200
their use in education.

27
00:01:11,200 --> 00:01:12,340
Um, so,

28
00:01:12,737 --> 00:01:14,537
um, this Cloud Skills Challenge study

29
00:01:14,537 --> 00:01:17,117
session is based on a Learn module, a

30
00:01:17,120 --> 00:01:21,080
dedicated Learn module. I sent you, uh,

31
00:01:21,320 --> 00:01:23,939
the link to this module, uh, in the chat

32
00:01:23,939 --> 00:01:25,619
so that you can follow along with the

33
00:01:25,619 --> 00:01:28,680
module if you want, or just have a look at

34
00:01:28,680 --> 00:01:32,470
the module later at your own pace.

35
00:01:32,470 --> 00:01:33,780
Um...

36
00:01:33,780 --> 00:01:37,020
So, before starting I would also like to

37
00:01:37,020 --> 00:01:40,619
remind you of, uh, the code of

38
00:01:40,619 --> 00:01:43,439
conduct and guidelines of our student

39
00:01:43,439 --> 00:01:47,510
ambassadors community. So please during this

40
00:01:47,510 --> 00:01:51,000
meeting be respectful and inclusive and

41
00:01:51,000 --> 00:01:53,579
be friendly, open, and welcoming and

42
00:01:53,579 --> 00:01:56,159
respectful of each other's

43
00:01:56,159 --> 00:01:57,720
differences.

44
00:01:57,720 --> 00:02:01,200
If you want to learn more about the code

45
00:02:01,200 --> 00:02:03,390
of conduct, you can use this link in the

46
00:02:03,390 --> 00:02:08,880
deck: aka.ms/SACoC.

47
00:02:09,660 --> 00:02:11,730
And now we are,

48
00:02:11,730 --> 00:02:15,420
um, we are ready to start our session.

49
00:02:15,420 --> 00:02:18,959
So, as we mentioned, we are going to

50
00:02:18,959 --> 00:02:21,980
focus on classification models and Azure ML,

51
00:02:21,980 --> 00:02:24,900
uh, today. So, first of all, we are going

52
00:02:24,900 --> 00:02:28,430
to, um, identify, uh, the kind of

53
00:02:28,430 --> 00:02:31,080
um, of scenarios in which you should

54
00:02:31,080 --> 00:02:34,490
choose to use a classification model.

55
00:02:34,490 --> 00:02:36,660
We're going to introduce Azure Machine

56
00:02:36,660 --> 00:02:39,060
Learning and Azure Machine Learning designer.

57
00:02:39,060 --> 00:02:41,879
We're going to understand, uh, which are

58
00:02:41,879 --> 00:02:43,680
the steps to follow, to create a

59
00:02:43,680 --> 00:02:46,200
classification model in Azure Machine

60
00:02:46,200 --> 00:02:48,076
Learning, and then John will,

61
00:02:48,076 --> 00:02:49,500
um,

62
00:02:49,500 --> 00:02:52,219
lead an amazing demo about training and

63
00:02:52,219 --> 00:02:54,300
publishing a classification model in

64
00:02:54,300 --> 00:02:57,000
Azure ML Designer.

65
00:02:57,000 --> 00:02:59,819
So, let's start from the beginning. Let's

66
00:02:59,819 --> 00:03:02,640
start from identifying classification

67
00:03:02,640 --> 00:03:05,220
machine learning scenarios.

68
00:03:05,220 --> 00:03:07,640
So, first of all, what is classification?

69
00:03:07,640 --> 00:03:09,959
Classification is a form of machine

70
00:03:09,959 --> 00:03:12,120
learning that is used to predict which

71
00:03:12,120 --> 00:03:15,599
category or class an item belongs to. For

72
00:03:15,599 --> 00:03:17,340
example, we might want to develop a

73
00:03:17,340 --> 00:03:19,800
classifier able to identify if an

74
00:03:19,800 --> 00:03:22,200
incoming email should be filtered or not

75
00:03:22,200 --> 00:03:25,080
according to the style, the sender, the

76
00:03:25,080 --> 00:03:26,935
length of the email, etc.

77
00:03:26,935 --> 00:03:28,140
In this case, the

78
00:03:28,140 --> 00:03:30,060
characteristics of the email are the

79
00:03:30,060 --> 00:03:31,080
features.

80
00:03:31,080 --> 00:03:34,200
And the label is a classification of

81
00:03:34,200 --> 00:03:38,099
either zero or one, representing spam

82
00:03:38,099 --> 00:03:40,860
or non-spam for the incoming email. So

83
00:03:40,860 --> 00:03:42,360
this is an example of a binary

84
00:03:42,360 --> 00:03:44,400
classifier. If you want to assign

85
00:03:44,400 --> 00:03:46,260
multiple categories to the incoming

86
00:03:46,260 --> 00:03:48,959
email like work letters, love letters,

87
00:03:48,959 --> 00:03:52,080
complaints, or other categories, in this

88
00:03:52,080 --> 00:03:54,000
case a binary classifier is no longer

89
00:03:54,000 --> 00:03:55,739
enough, and we should develop a

90
00:03:55,739 --> 00:03:58,319
multi-class classifier. So classification

91
00:03:58,319 --> 00:04:00,599
is an example of what is called

92
00:04:00,599 --> 00:04:02,519
supervised machine learning

93
00:04:02,519 --> 00:04:05,280
in which you train a model using data

94
00:04:05,280 --> 00:04:07,080
that includes both the features and

95
00:04:07,080 --> 00:04:08,879
known values for the label

96
00:04:08,879 --> 00:04:11,099
so that the model learns to fit the

97
00:04:11,099 --> 00:04:13,560
feature combinations to the label. Then,

98
00:04:13,560 --> 00:04:15,420
after training has been completed, you

99
00:04:15,420 --> 00:04:17,040
can use the trained model to predict

100
00:04:17,040 --> 00:04:19,500
labels for new items for-for which the

101
00:04:19,500 --> 00:04:22,320
label is unknown.

102
00:04:22,320 --> 00:04:25,440
But let's see some examples of scenarios

103
00:04:25,440 --> 00:04:27,120
for classification machine learning

104
00:04:27,120 --> 00:04:29,160
models. So, we already mentioned an

105
00:04:29,160 --> 00:04:31,020
example of a solution in which we would

106
00:04:31,020 --> 00:04:33,660
need a classifier, but let's explore

107
00:04:33,660 --> 00:04:35,699
other scenarios for classification in

108
00:04:35,699 --> 00:04:37,979
other industries. For example, you can use

109
00:04:37,979 --> 00:04:40,380
a classification model for a health

110
00:04:40,380 --> 00:04:43,680
clinic scenario, and use clinical data to

111
00:04:43,680 --> 00:04:45,720
predict whether a patient will become sick

112
00:04:45,720 --> 00:04:47,060
or not.

113
00:04:47,060 --> 00:04:49,553
You can use, um...

114
00:04:49,553 --> 00:04:59,250
[NO AUDIO]

115
00:04:59,250 --> 00:05:00,930
[JOHN]: Carlotta, you are muted.

116
00:05:03,780 --> 00:05:07,700
[CARLOTTA]: Oh, sorry. 
So, when I became muted, it's a

117
00:05:07,700 --> 00:05:08,807
long time, or?

118
00:05:08,807 --> 00:05:11,940
[JOHN]: You can use-you can use, uh

119
00:05:11,940 --> 00:05:13,430
some models for classification.

120
00:05:13,430 --> 00:05:14,729
For example, you can use...

121
00:05:14,729 --> 00:05:16,919
You were saying this.

122
00:05:16,919 --> 00:05:20,020
[CARLOTTA]: Uh, so I was in this deck,

123
00:05:20,020 --> 00:05:21,660
or the previous one?

124
00:05:21,660 --> 00:05:24,180
[JOHN]: This one, you have been muted

125
00:05:24,180 --> 00:05:25,901
for, uh, one second [LAUGHS].

126
00:05:25,901 --> 00:05:28,018
[CARLOTTA]: Okay, okay perfect, perfect.

127
00:05:28,018 --> 00:05:30,419
Uh, yeah I was talking...sorry for

128
00:05:30,419 --> 00:05:33,278
that. So, I was talking about the possible

129
00:05:33,278 --> 00:05:34,560
scenarios in which you,

130
00:05:34,560 --> 00:05:37,320
you can use a classification model. Like

131
00:05:37,320 --> 00:05:39,660
health clinic scenario, financial scenario,

132
00:05:39,660 --> 00:05:41,699
or the third one is business type of

133
00:05:41,699 --> 00:05:44,100
scenario. You can use characteristics of

134
00:05:44,100 --> 00:05:45,900
small business to predict if a new

135
00:05:45,900 --> 00:05:47,880
venture will succeed or not, for

136
00:05:47,880 --> 00:05:49,560
example. And these are all types of

137
00:05:49,560 --> 00:05:52,160
binary classification.

138
00:05:52,160 --> 00:05:55,199
Uh, but today we are also going to talk

139
00:05:55,199 --> 00:05:57,240
about Azure Machine Learning. So let's

140
00:05:57,240 --> 00:05:58,139
see.

141
00:05:58,139 --> 00:06:00,660
What is Azure Machine Learning? So

142
00:06:00,660 --> 00:06:02,160
training and deploying an effective

143
00:06:02,160 --> 00:06:04,199
machine learning model involves a lot of

144
00:06:04,199 --> 00:06:06,539
work, much of it time-consuming and

145
00:06:06,539 --> 00:06:08,880
resource intensive. So, Azure Machine

146
00:06:08,880 --> 00:06:11,039
Learning is a cloud-based service that

147
00:06:11,039 --> 00:06:12,780
helps simplify some of the tasks it

148
00:06:12,780 --> 00:06:15,720
takes to prepare data, train a model, and

149
00:06:15,720 --> 00:06:18,060
also deploy it as a predictive service.

150
00:06:18,060 --> 00:06:20,220
So it helps data scientists increase

151
00:06:20,220 --> 00:06:22,380
their efficiency by automating many of

152
00:06:22,380 --> 00:06:24,660
the time-consuming tasks associated with

153
00:06:24,660 --> 00:06:27,539
creating and training a model.

154
00:06:27,539 --> 00:06:29,520
And it enables them also to use

155
00:06:29,520 --> 00:06:31,740
cloud-based compute resources that scale

156
00:06:31,740 --> 00:06:33,720
effectively to handle large volumes of

157
00:06:33,720 --> 00:06:36,300
data while incurring costs only when

158
00:06:36,300 --> 00:06:38,699
actually used.

159
00:06:38,699 --> 00:06:41,220
To use Azure Machine Learning, you,

160
00:06:41,220 --> 00:06:43,199
first things first, you need to create a

161
00:06:43,199 --> 00:06:44,940
workspace resource in your Azure

162
00:06:44,940 --> 00:06:47,520
subscription, and you can then use this

163
00:06:47,520 --> 00:06:50,220
workspace to manage data, compute

164
00:06:50,220 --> 00:06:52,440
resources, code, models, and other

165
00:06:52,440 --> 00:06:54,959
artifacts. After you have created an

166
00:06:54,959 --> 00:06:56,519
Azure Machine Learning workspace,

167
00:06:56,519 --> 00:06:57,808
you can develop solutions with the

168
00:06:57,808 --> 00:06:59,338
Azure Machine Learning service,

169
00:06:59,338 --> 00:07:00,840
either with developer

170
00:07:00,840 --> 00:07:02,580
tools or the Azure Machine Learning

171
00:07:02,580 --> 00:07:04,088
studio web portal.

172
00:07:04,088 --> 00:07:06,440
In particular, 
Azure Machine Learning studio

173
00:07:06,440 --> 00:07:07,800
is a web portal for machine

174
00:07:07,800 --> 00:07:09,720
learning solutions in Azure, and it

175
00:07:09,720 --> 00:07:11,639
includes a wide range of features and

176
00:07:11,639 --> 00:07:13,800
capabilities that help data scientists

177
00:07:13,800 --> 00:07:16,259
prepare data, train models, publish

178
00:07:16,259 --> 00:07:18,479
predictive services, and monitor also

179
00:07:18,479 --> 00:07:19,680
their usage.

180
00:07:19,680 --> 00:07:22,139
So to begin using the web portal, you need

181
00:07:22,139 --> 00:07:23,294
to assign the workspace

182
00:07:23,294 --> 00:07:24,781
you created in the Azure portal

183
00:07:24,781 --> 00:07:26,819
to the Azure Machine

184
00:07:26,819 --> 00:07:29,520
Learning studio.

185
00:07:29,520 --> 00:07:31,800
At its core, Azure Machine Learning is a

186
00:07:31,800 --> 00:07:33,720
service for training and managing

187
00:07:33,720 --> 00:07:36,000
machine learning models for which you

188
00:07:36,000 --> 00:07:38,220
need compute resources on which to run

189
00:07:38,220 --> 00:07:39,919
the training process.

190
00:07:39,919 --> 00:07:44,280
Compute targets are, um, one of the main

191
00:07:44,280 --> 00:07:46,740
basic concepts of Azure Machine Learning.

192
00:07:46,740 --> 00:07:48,780
They are cloud-based resources on which

193
00:07:48,780 --> 00:07:50,639
you can run model training and data

194
00:07:50,639 --> 00:07:53,220
exploration processes.

195
00:07:53,220 --> 00:07:54,780
So in Azure Machine Learning studio, you

196
00:07:54,780 --> 00:07:56,759
can manage the compute targets for your

197
00:07:56,759 --> 00:07:58,740
data science activities, and there are

198
00:07:58,740 --> 00:08:03,240
four kinds of compute targets you can

199
00:08:03,240 --> 00:08:05,940
create. We have the compute instances,

200
00:08:05,940 --> 00:08:09,539
which are virtual machines set up for

201
00:08:09,539 --> 00:08:10,979
running machine learning code during

202
00:08:10,979 --> 00:08:13,319
development, so they are not designed for

203
00:08:13,319 --> 00:08:14,460
production.

204
00:08:14,460 --> 00:08:17,099
Then we have compute clusters, which are

205
00:08:17,099 --> 00:08:19,800
a set of virtual machines that can scale

206
00:08:19,800 --> 00:08:22,199
up automatically based on traffic.

207
00:08:22,199 --> 00:08:24,599
We have inference clusters, which are

208
00:08:24,599 --> 00:08:26,699
similar to compute clusters, but they are

209
00:08:26,699 --> 00:08:29,340
designed for deployment, so they are

210
00:08:29,340 --> 00:08:31,979
deployment targets for predictive

211
00:08:31,979 --> 00:08:35,820
services that use trained models.

212
00:08:35,820 --> 00:08:38,339
And finally, we have attached compute,

213
00:08:38,339 --> 00:08:41,339
which is any compute target that you

214
00:08:41,339 --> 00:08:44,159
manage yourself outside of Azure ML, like,

215
00:08:44,159 --> 00:08:46,560
for example, virtual machines or Azure

216
00:08:46,560 --> 00:08:49,700
Databricks clusters.

217
00:08:49,980 --> 00:08:52,800
So we talked about Azure Machine

218
00:08:52,800 --> 00:08:54,300
Learning, but we also mentioned-

219
00:08:54,300 --> 00:08:55,500
mentioned Azure Machine Learning

220
00:08:55,500 --> 00:08:57,540
designer. What is Azure Machine Learning

221
00:08:57,540 --> 00:09:00,120
designer? So, in Azure Machine Learning

222
00:09:00,120 --> 00:09:02,880
Studio, there are several ways to author

223
00:09:02,880 --> 00:09:04,560
classification machine learning models.

224
00:09:04,560 --> 00:09:08,100
One way is to use a visual interface, and

225
00:09:08,100 --> 00:09:10,260
this visual interface is called designer,

226
00:09:10,260 --> 00:09:13,140
and you can use it to train, test, and

227
00:09:13,140 --> 00:09:15,540
also deploy machine learning models. And

228
00:09:15,540 --> 00:09:17,940
the drag-and-drop interface makes use of

229
00:09:17,940 --> 00:09:20,279
clearly defined inputs and outputs that

230
00:09:20,279 --> 00:09:22,680
can be shared, reused, and also version

231
00:09:22,680 --> 00:09:23,880
controlled.

232
00:09:23,880 --> 00:09:25,920
And using the designer, you can identify

233
00:09:25,920 --> 00:09:28,080
the building blocks or components needed

234
00:09:28,080 --> 00:09:30,839
for your model, place and connect them on

235
00:09:30,839 --> 00:09:33,120
your canvas, and run a machine learning

236
00:09:33,120 --> 00:09:35,300
job.

237
00:09:35,399 --> 00:09:36,779
So,

238
00:09:36,779 --> 00:09:39,120
each designer project, so each project

239
00:09:39,120 --> 00:09:42,360
in the designer is known as a pipeline.

240
00:09:42,360 --> 00:09:45,600
And in the designer, we have a left panel

241
00:09:45,600 --> 00:09:48,360
for navigation and a canvas on your

242
00:09:48,360 --> 00:09:50,640
right hand side in which you build your

243
00:09:50,640 --> 00:09:53,940
pipeline visually. So pipelines let you

244
00:09:53,940 --> 00:09:56,100
organize, manage, and reuse complex

245
00:09:56,100 --> 00:09:58,260
machine learning workflows across

246
00:09:58,260 --> 00:10:00,480
projects and users.

247
00:10:00,480 --> 00:10:03,000
A pipeline starts with the data set from

248
00:10:03,000 --> 00:10:04,140
which you want to train the model

249
00:10:04,140 --> 00:10:05,880
because it all begins with data when

250
00:10:05,880 --> 00:10:07,380
talking about data science and machine

251
00:10:07,380 --> 00:10:09,540
learning. And each time you run a

252
00:10:09,540 --> 00:10:10,980
pipeline, the configuration of the

253
00:10:10,980 --> 00:10:12,959
pipeline and its results are stored in

254
00:10:12,959 --> 00:10:17,339
your workspace as a pipeline job.

255
00:10:17,339 --> 00:10:21,959
So the second main concept of Azure

256
00:10:21,959 --> 00:10:25,080
Machine Learning is a component. So, going

257
00:10:25,080 --> 00:10:28,440
hierarchically from the pipeline, we can

258
00:10:28,440 --> 00:10:30,540
say that each building block of a

259
00:10:30,540 --> 00:10:32,920
pipeline is called a component.

260
00:10:32,920 --> 00:10:34,120
In other words, an Azure Machine

261
00:10:34,120 --> 00:10:36,959
Learning component encapsulates one step

262
00:10:36,959 --> 00:10:39,420
in a machine learning pipeline. So, it's a

263
00:10:39,420 --> 00:10:41,640
reusable piece of code with inputs and

264
00:10:41,640 --> 00:10:44,100
outputs, something very similar to a

265
00:10:44,100 --> 00:10:46,500
function in any programming language.

266
00:10:46,500 --> 00:10:48,899
And in a pipeline project, you can access

267
00:10:48,899 --> 00:10:51,480
data assets and components from the left

268
00:10:51,480 --> 00:10:52,700
panel's

269
00:10:52,700 --> 00:10:56,279
Asset Library tab, as you can see

270
00:10:56,279 --> 00:11:00,200
here in the screenshot in the deck.

271
00:11:00,300 --> 00:11:03,360
So you can create data assets using

272
00:11:03,360 --> 00:11:08,339
an ad hoc page called the Data page. And a data

273
00:11:08,339 --> 00:11:11,160
asset is a reference to a data source

274
00:11:11,160 --> 00:11:12,480
location.

275
00:11:12,480 --> 00:11:15,720
So this data source location could be a

276
00:11:15,720 --> 00:11:18,779
local file, a data store, a web file or

277
00:11:18,779 --> 00:11:21,660
even an Azure Open Dataset.

278
00:11:21,660 --> 00:11:23,880
And these data assets will appear along

279
00:11:23,880 --> 00:11:26,459
with standard sample data sets in the

280
00:11:26,459 --> 00:11:30,019
designer's Asset Library.

281
00:11:30,079 --> 00:11:31,560
Um.

282
00:11:31,560 --> 00:11:36,959
Another basic concept of Azure ML is

283
00:11:36,959 --> 00:11:38,880
Azure Machine Learning jobs.

284
00:11:38,880 --> 00:11:43,519
So, basically, when you submit a pipeline,

285
00:11:43,519 --> 00:11:47,040
you create a job which will run all the

286
00:11:47,040 --> 00:11:49,920
steps in your pipeline. So a job executes

287
00:11:49,920 --> 00:11:52,800
a task against a specified compute

288
00:11:52,800 --> 00:11:53,760
target.

289
00:11:53,760 --> 00:11:56,640
Jobs enable systematic tracking for your

290
00:11:56,640 --> 00:11:58,560
machine learning experimentation in

291
00:11:58,560 --> 00:11:59,880
Azure ML.

292
00:11:59,880 --> 00:12:02,399
And once a job is created, Azure ML

293
00:12:02,399 --> 00:12:05,459
maintains a run record, uh, for the

294
00:12:05,459 --> 00:12:07,640
job.

295
00:12:07,877 --> 00:12:12,180
Um, but, let's move to the classification

296
00:12:12,180 --> 00:12:14,040
steps. So,

297
00:12:14,040 --> 00:12:17,160
um, let's introduce how to create a

298
00:12:17,160 --> 00:12:21,360
classification model in Azure ML, but you

299
00:12:21,360 --> 00:12:23,640
will see it in more detail in a

300
00:12:23,640 --> 00:12:26,339
hands-on demo that John will guide

301
00:12:26,339 --> 00:12:29,459
through in a few minutes.

302
00:12:29,459 --> 00:12:32,220
So, you can think of the steps to train

303
00:12:32,220 --> 00:12:33,720
and evaluate a classification machine

304
00:12:33,720 --> 00:12:36,660
learning model as four main steps. So

305
00:12:36,660 --> 00:12:38,459
first of all, you need to prepare your

306
00:12:38,459 --> 00:12:41,100
data. So, you need to identify the

307
00:12:41,100 --> 00:12:43,139
features and the label in your data set,

308
00:12:43,139 --> 00:12:46,139
you need to pre-process, so you need to

309
00:12:46,139 --> 00:12:48,839
clean and transform the data as needed.

310
00:12:48,839 --> 00:12:51,120
Then, the second step, of course, is

311
00:12:51,120 --> 00:12:52,740
training the model.

312
00:12:52,740 --> 00:12:54,600
And for training the model, you need to

313
00:12:54,600 --> 00:12:57,060
split the data into two groups: a

314
00:12:57,060 --> 00:12:59,519
training and a validation set.

315
00:12:59,519 --> 00:13:01,320
Then you train a machine learning model

316
00:13:01,320 --> 00:13:03,540
using the training data set and you test

317
00:13:03,540 --> 00:13:05,040
the machine learning model for

318
00:13:05,040 --> 00:13:06,889
performance using the validation data

319
00:13:06,889 --> 00:13:08,100
set.

320
00:13:08,100 --> 00:13:12,180
The third step is performance evaluation,

321
00:13:12,180 --> 00:13:14,519
which means comparing how close the

322
00:13:14,519 --> 00:13:16,139
model's predictions are to the known

323
00:13:16,139 --> 00:13:20,519
labels, and this leads us to compute some

324
00:13:20,519 --> 00:13:23,279
evaluation performance metrics.

325
00:13:23,279 --> 00:13:25,740
And then finally...

326
00:13:25,740 --> 00:13:29,051
So, these three steps are not,

327
00:13:29,051 --> 00:13:32,770
um, not performed every time in a

328
00:13:32,770 --> 00:13:35,459
linear manner. It's more an iterative

329
00:13:35,459 --> 00:13:39,420
process. But once you obtain, you achieve

330
00:13:39,420 --> 00:13:42,959
a performance with which you are

331
00:13:42,959 --> 00:13:45,779
satisfied, so you are ready to, let's say

332
00:13:45,779 --> 00:13:48,660
go into production, and you can deploy

333
00:13:48,660 --> 00:13:51,920
your trained model as a predictive service

334
00:13:51,920 --> 00:13:55,980
into a real-time, uh, to a real-time

335
00:13:55,980 --> 00:13:58,019
endpoint. And to do so, you need to

336
00:13:58,019 --> 00:14:00,240
convert the training pipeline into a

337
00:14:00,240 --> 00:14:02,820
real-time inference pipeline, and then

338
00:14:02,820 --> 00:14:04,260
you can deploy the model as an

339
00:14:04,260 --> 00:14:06,779
application on a server or device so

340
00:14:06,779 --> 00:14:11,420
that others can consume this model.

341
00:14:11,459 --> 00:14:14,279
So let's start with the first step, which

342
00:14:14,279 --> 00:14:17,700
is prepare data. Real-world data can contain

343
00:14:17,700 --> 00:14:19,920
many different issues that can affect

344
00:14:19,920 --> 00:14:22,320
the utility of the data and our

345
00:14:22,320 --> 00:14:24,959
interpretation of the results. So also

346
00:14:24,959 --> 00:14:26,579
the machine learning model that you

347
00:14:26,579 --> 00:14:29,279
train using this data. For example, real-

348
00:14:29,279 --> 00:14:31,440
world data can be affected by a bad

349
00:14:31,440 --> 00:14:34,079
recording or a bad measurement, and it

350
00:14:34,079 --> 00:14:36,480
can also contain missing values for some

351
00:14:36,480 --> 00:14:38,880
parameters. And Azure Machine Learning

352
00:14:38,880 --> 00:14:40,860
designer has several pre-built

353
00:14:40,860 --> 00:14:43,019
components that can be used to prepare

354
00:14:43,019 --> 00:14:46,079
data for training. These components

355
00:14:46,079 --> 00:14:48,300
enable you to clean data, normalize

356
00:14:48,300 --> 00:14:52,940
features, join tables, and more.

357
00:14:53,000 --> 00:14:57,120
Let's come to training. So, to train a

358
00:14:57,120 --> 00:14:59,220
classification model you need a data set

359
00:14:59,220 --> 00:15:02,160
that includes historical features, so the

360
00:15:02,160 --> 00:15:03,899
characteristics of the entity for which

361
00:15:03,899 --> 00:15:06,899
you want to make a prediction, and known label

362
00:15:06,899 --> 00:15:09,779
values. The label is the class indicator

363
00:15:09,779 --> 00:15:11,820
we want to train a model to predict.

364
00:15:11,820 --> 00:15:13,920
And it's common practice to train a

365
00:15:13,920 --> 00:15:16,199
model using a subset of the data while

366
00:15:16,199 --> 00:15:18,300
holding back some data with which to

367
00:15:18,300 --> 00:15:20,760
test the trained model. And this enables

368
00:15:20,760 --> 00:15:22,440
you to compare the labels that the model

369
00:15:22,440 --> 00:15:25,380
predicts with the actual known labels in

370
00:15:25,380 --> 00:15:27,420
the original data set.

371
00:15:27,420 --> 00:15:29,880
This operation can be performed in the

372
00:15:29,880 --> 00:15:32,100
designer using the Split Data component

373
00:15:32,100 --> 00:15:34,740
as shown by the screenshot here in the...

374
00:15:34,740 --> 00:15:36,660
in the deck.

375
00:15:36,660 --> 00:15:39,540
There's also another component that you

376
00:15:39,540 --> 00:15:40,980
should use, which is the Score Model

377
00:15:40,980 --> 00:15:43,139
component to generate the predicted

378
00:15:43,139 --> 00:15:45,360
class label value using the validation

379
00:15:45,360 --> 00:15:48,060
data as input. So once you connect all

380
00:15:48,060 --> 00:15:49,800
these components,

381
00:15:49,800 --> 00:15:52,440
the component specifying the

382
00:15:52,440 --> 00:15:54,959
model we are going to use, the Split Data

383
00:15:54,959 --> 00:15:57,060
component, the Train Model component,

384
00:15:57,060 --> 00:16:00,300
and the Score Model component, you want

385
00:16:00,300 --> 00:16:02,639
to run a new experiment in

386
00:16:02,639 --> 00:16:05,760
Azure ML, which will use the data set

387
00:16:05,760 --> 00:16:09,600
on the canvas to train and score a model.

388
00:16:09,600 --> 00:16:12,000
After training a model, it is important,

389
00:16:12,000 --> 00:16:14,639
we say, to evaluate its performance, to

390
00:16:14,639 --> 00:16:17,060
understand how bad-how good sorry

391
00:16:17,060 --> 00:16:20,760
our model is performing.

392
00:16:20,760 --> 00:16:22,680
And there are many performance metrics

393
00:16:22,680 --> 00:16:24,600
and methodologies for evaluating how

394
00:16:24,600 --> 00:16:27,000
well a model makes predictions. The

395
00:16:27,000 --> 00:16:29,160
component to use to perform evaluation

396
00:16:29,160 --> 00:16:32,220
in Azure ML designer is called, as

397
00:16:32,220 --> 00:16:35,060
intuitive as it is, Evaluate Model.

398
00:16:35,060 --> 00:16:38,339
Once the job of training and evaluation

399
00:16:38,339 --> 00:16:40,740
of the model is completed, you can review

400
00:16:40,740 --> 00:16:42,959
evaluation metrics on the completed job

401
00:16:42,959 --> 00:16:45,860
page by right-clicking on the component.

402
00:16:45,860 --> 00:16:48,480
In the evaluation results, you can also

403
00:16:48,480 --> 00:16:51,000
find the so-called confusion matrix that

404
00:16:51,000 --> 00:16:53,399
you can see here on the right side of

405
00:16:53,399 --> 00:16:55,079
this deck.

406
00:16:55,079 --> 00:16:57,420
A confusion matrix shows cases where

407
00:16:57,420 --> 00:16:59,220
both the predicted and actual values

408
00:16:59,220 --> 00:17:01,980
were one, the so-called true positives

409
00:17:01,980 --> 00:17:04,500
at the top left and also cases where

410
00:17:04,500 --> 00:17:06,600
both the predicted and the actual values

411
00:17:06,600 --> 00:17:08,459
were zero, the so-called true negatives

412
00:17:08,459 --> 00:17:10,919
at the bottom right. While the other

413
00:17:10,919 --> 00:17:13,679
cells show cases where the predicted

414
00:17:13,679 --> 00:17:15,380
and actual values differ,

415
00:17:15,380 --> 00:17:17,939
called false positives and false

416
00:17:17,939 --> 00:17:19,919
negatives, and this is an example of a

417
00:17:19,919 --> 00:17:23,579
confusion matrix for a binary classifier.

418
00:17:23,579 --> 00:17:25,559
While for a multi-class classification

419
00:17:25,559 --> 00:17:28,079
model the same approach is used to

420
00:17:28,079 --> 00:17:30,120
tabulate each possible combination of

421
00:17:30,120 --> 00:17:32,940
actual and predicted value counts. So

422
00:17:32,940 --> 00:17:34,740
for example, a model with three possible

423
00:17:34,740 --> 00:17:37,559
classes would result in a three times

424
00:17:37,559 --> 00:17:39,120
three matrix.

425
00:17:39,120 --> 00:17:41,880
The confusion matrix is also useful for

426
00:17:41,880 --> 00:17:43,860
the metrics that can be derived from it,

427
00:17:43,860 --> 00:17:48,260
like accuracy, recall, or precision.

428
00:17:49,320 --> 00:17:52,080
We said that the last step is

429
00:17:52,080 --> 00:17:55,620
deploying the trained model to a real-time

430
00:17:55,620 --> 00:17:59,280
endpoint as a predictive service. And in

431
00:17:59,280 --> 00:18:00,900
order to automate your model into a

432
00:18:00,900 --> 00:18:02,760
service that makes continuous

433
00:18:02,760 --> 00:18:04,980
predictions, you need, first of all, to

434
00:18:04,980 --> 00:18:08,039
create and then deploy an

435
00:18:08,039 --> 00:18:10,080
inference pipeline. The process of

436
00:18:10,080 --> 00:18:11,940
converting the training pipeline into a

437
00:18:11,940 --> 00:18:13,980
real-time inference pipeline removes

438
00:18:13,980 --> 00:18:16,260
training components and adds web service

439
00:18:16,260 --> 00:18:18,960
inputs and outputs to handle requests.

440
00:18:18,960 --> 00:18:21,240
And the inference pipeline performs the

441
00:18:21,240 --> 00:18:22,679
same data transformations as the

442
00:18:22,679 --> 00:18:26,160
first pipeline, but for new data. Then it

443
00:18:26,160 --> 00:18:28,679
uses the trained model to infer or predict

444
00:18:28,679 --> 00:18:32,539
label values based on its features.

445
00:18:32,820 --> 00:18:36,120
So, I think I've talked a lot for now.

446
00:18:36,120 --> 00:18:40,380
I would like to let John show us

447
00:18:40,380 --> 00:18:44,340
something in practice with

448
00:18:44,340 --> 00:18:47,280
the hands-on demo, so please, John, go

449
00:18:47,280 --> 00:18:49,860
ahead, share your screen and guide us

450
00:18:49,860 --> 00:18:52,380
through this demo of creating a

451
00:18:52,380 --> 00:18:53,425
classification model with

452
00:18:53,425 --> 00:18:55,860
the Azure Machine Learning designer.

453
00:18:55,860 --> 00:18:58,509
[JOHN]: Thank you so much Carlotta for

454
00:18:58,509 --> 00:19:00,690
this interesting explanation of the

455
00:19:00,690 --> 00:19:03,810
Azure ML designer. And now,

456
00:19:03,810 --> 00:19:07,500
um, I'm going to start with you in the

457
00:19:07,500 --> 00:19:10,200
practical demo part, so if you want to

458
00:19:10,200 --> 00:19:13,320
follow along, go to the link that Carlotta

459
00:19:13,320 --> 00:19:18,380
sent in the chat so you can do

460
00:19:18,380 --> 00:19:21,840
the demo or the practical part with me.

461
00:19:21,840 --> 00:19:25,260
I'm just going to share my screen...

462
00:19:25,260 --> 00:19:27,140
and...

463
00:19:27,140 --> 00:19:31,559
...go here. So, uh...

464
00:19:31,559 --> 00:19:34,320
Where am I right now? I'm inside the

465
00:19:34,320 --> 00:19:36,960
Microsoft Learn documentation. This is

466
00:19:36,960 --> 00:19:40,260
the exercise part of this module, and we

467
00:19:40,260 --> 00:19:43,080
will start by setting two things, which

468
00:19:43,080 --> 00:19:45,299
are prerequisites for us to work inside

469
00:19:45,299 --> 00:19:49,919
this module, which are the resource group

470
00:19:49,919 --> 00:19:52,400
and the Azure Machine Learning workspace,

471
00:19:52,400 --> 00:19:55,620
and something extra which is the compute

472
00:19:55,620 --> 00:19:59,760
cluster that Carlotta talked about. So I

473
00:19:59,760 --> 00:20:02,100
just want to make sure that you all have

474
00:20:02,100 --> 00:20:05,660
a resource group created inside your

475
00:20:05,660 --> 00:20:08,039
portal inside your Microsoft Azure

476
00:20:08,039 --> 00:20:11,100
platform. So this is my resource group.

477
00:20:11,100 --> 00:20:14,640
Inside this resource group, I

478
00:20:14,640 --> 00:20:17,299
have created an Azure Machine Learning

479
00:20:17,299 --> 00:20:21,539
workspace. So I'm just going to access

480
00:20:21,539 --> 00:20:24,000
the workspace that I have created

481
00:20:24,000 --> 00:20:27,000
already from this link. I am going to

482
00:20:27,000 --> 00:20:30,240
open it, which is the studio web URL, and

483
00:20:30,240 --> 00:20:33,000
I will follow the steps. So what is this?

484
00:20:33,000 --> 00:20:35,760
This is your machine learning workspace,

485
00:20:35,760 --> 00:20:38,220
or machine learning studio. You can do a

486
00:20:38,220 --> 00:20:40,080
lot of things here, but we are going to

487
00:20:40,080 --> 00:20:42,419
focus mainly on the designer and the

488
00:20:42,419 --> 00:20:46,080
data and the compute. So another

489
00:20:46,080 --> 00:20:49,140
prerequisite here, as Carlotta told you,

490
00:20:49,140 --> 00:20:51,480
we need some resources to power up the

491
00:20:51,480 --> 00:20:54,299
classification, the processes that

492
00:20:54,299 --> 00:20:55,140
will happen.

493
00:20:55,140 --> 00:20:58,080
So, we have created this computing

494
00:20:58,080 --> 00:20:59,100
cluster,

495
00:20:59,100 --> 00:21:02,880
and we have set some presets for

496
00:21:02,880 --> 00:21:04,140
it. So

497
00:21:04,140 --> 00:21:07,080
where can you find this preset? You go

498
00:21:07,080 --> 00:21:10,200
here. Under the create compute, you'll

499
00:21:10,200 --> 00:21:13,220
find everything that you need to do. So

500
00:21:13,220 --> 00:21:16,740
the size is Standard_DS11_v2,

501
00:21:16,740 --> 00:21:19,799
and it's a CPU not GPU, because we don't

502
00:21:19,799 --> 00:21:22,500
know the GPU, and we don't need a GPU.

503
00:21:22,500 --> 00:21:25,799
Uh, it is ready for us to use.

504
00:21:25,799 --> 00:21:30,900
The next thing which we will look into

505
00:21:30,900 --> 00:21:33,600
is the designer. How can you access the

506
00:21:33,600 --> 00:21:35,100
designer?

507
00:21:35,100 --> 00:21:37,679
You can either click on this icon or

508
00:21:37,679 --> 00:21:40,020
click on the navigation menu and click

509
00:21:40,020 --> 00:21:42,299
on the designer for me.

510
00:21:42,900 --> 00:21:45,780
Now I am inside my designer.

511
00:21:45,780 --> 00:21:47,640
What we are going to do now is the

512
00:21:47,640 --> 00:21:50,280
pipeline that Carlotta told you about.

513
00:21:50,280 --> 00:21:54,360
And from where can I know these steps? If

514
00:21:54,360 --> 00:21:57,120
you follow along in the learn module, you

515
00:21:57,120 --> 00:21:58,740
will find everything that I'm doing

516
00:21:58,740 --> 00:22:02,340
right now in detail, with screenshots

517
00:22:02,340 --> 00:22:05,820
of course. So I'm going to create a new

518
00:22:05,820 --> 00:22:09,120
pipeline, and I can do so by clicking on

519
00:22:09,120 --> 00:22:10,980
this plus button.

520
00:22:10,980 --> 00:22:13,740
It's going to redirect me to the

521
00:22:13,740 --> 00:22:17,100
designer authoring the pipeline, uh, where

522
00:22:17,100 --> 00:22:19,500
I can drag and drop data and components

523
00:22:19,500 --> 00:22:21,780
that Carlotta told you the difference

524
00:22:21,780 --> 00:22:22,980
between.

525
00:22:22,980 --> 00:22:26,340
And here I am going to do some changes

526
00:22:26,340 --> 00:22:29,100
to the settings. I am going to connect

527
00:22:29,100 --> 00:22:31,860
this with my compute cluster that I

528
00:22:31,860 --> 00:22:35,120
created previously so I can utilize it.

529
00:22:35,120 --> 00:22:38,100
From here I'm going to choose this

530
00:22:38,100 --> 00:22:40,380
compute cluster demo that I have shown

531
00:22:40,380 --> 00:22:42,600
you before in the clusters here,

532
00:22:42,600 --> 00:22:45,900
and I am going to change the name to

533
00:22:45,900 --> 00:22:47,820
something more meaningful. Instead of

534
00:22:47,820 --> 00:22:50,580
"Pipeline" and the date of today, I'm going

535
00:22:50,580 --> 00:22:53,760
to name it Diabetes...

536
00:22:53,760 --> 00:22:56,120
uh...

537
00:22:56,120 --> 00:23:00,020
let's just check this training.

538
00:23:00,020 --> 00:23:05,100
Let's say Training 0.1 or 01, okay?

539
00:23:05,100 --> 00:23:09,360
And I am going to close this tab in

540
00:23:09,360 --> 00:23:12,000
order to have a bigger place to work

541
00:23:12,000 --> 00:23:14,700
inside because this is where we will

542
00:23:14,700 --> 00:23:17,220
work, where everything will happen. So I

543
00:23:17,220 --> 00:23:19,559
will click on close from here,

544
00:23:19,559 --> 00:23:23,460
and I will go to the data and I will

545
00:23:23,460 --> 00:23:25,620
create a new data set.

546
00:23:25,620 --> 00:23:27,900
How can I create a new data set? There are

547
00:23:27,900 --> 00:23:29,880
multiple options here you can find, from

548
00:23:29,880 --> 00:23:31,799
local files, from data store, from web

549
00:23:31,799 --> 00:23:34,020
files, from open data set, but I'm going

550
00:23:34,020 --> 00:23:36,539
to choose from web files, as this is the

551
00:23:36,539 --> 00:23:40,280
way we're going to create our data.

552
00:23:40,280 --> 00:23:43,380
From here, the information of my data set

553
00:23:43,380 --> 00:23:47,340
I'm going to get them from the Microsoft

554
00:23:47,340 --> 00:23:50,820
Learn module. So if we go to the step

555
00:23:50,820 --> 00:23:52,860
that says "Create a dataset",

556
00:23:52,860 --> 00:23:55,020
under it, it illustrates that you can

557
00:23:55,020 --> 00:23:57,720
access the data from inside the asset

558
00:23:57,720 --> 00:23:59,760
library, and inside your asset library,

559
00:23:59,760 --> 00:24:01,679
you'll find the data and find the

560
00:24:01,679 --> 00:24:05,539
component. And I'm going to select

561
00:24:05,539 --> 00:24:09,000
this link because this is where my data

562
00:24:09,000 --> 00:24:12,000
is stored. If you open this link, you will

563
00:24:12,000 --> 00:24:14,820
find this is a CSV file, I think.

564
00:24:14,820 --> 00:24:17,400
Yeah. And you can...like, all the data are

565
00:24:17,400 --> 00:24:18,360
here.

566
00:24:18,360 --> 00:24:21,079
Now let's get back...

567
00:24:21,079 --> 00:24:22,149
Um...

568
00:24:26,880 --> 00:24:28,200
And you are going to name it something

569
00:24:28,200 --> 00:24:29,880
meaningful, but because I have already

570
00:24:29,880 --> 00:24:31,820
created it before twice, so I'm gonna

571
00:24:31,820 --> 00:24:34,980
add a number to the name.

572
00:24:34,980 --> 00:24:37,559
The data set type is Tabular, and there is

573
00:24:37,559 --> 00:24:39,360
also the File type, but this is a table, so we're

574
00:24:39,360 --> 00:24:40,760
going to choose the Tabular

575
00:24:40,760 --> 00:24:42,240
data type

576
00:24:42,240 --> 00:24:43,740
for the data set type.

577
00:24:43,740 --> 00:24:46,260
Now we will click on "Next". That's gonna

578
00:24:46,260 --> 00:24:51,179
review, or display for you the content

579
00:24:51,179 --> 00:24:54,020
of this file that you have

580
00:24:54,020 --> 00:24:57,419
imported to this workspace.

581
00:24:57,419 --> 00:25:01,559
And for these settings, these are

582
00:25:01,559 --> 00:25:03,720
related to our file format.

583
00:25:03,720 --> 00:25:08,280
So this is a delimited file, and it's not

584
00:25:08,280 --> 00:25:11,400
plain text, it's not JSON. The delimiter

585
00:25:11,400 --> 00:25:14,159
is comma, as we have seen that they

586
00:25:14,159 --> 00:25:26,700
[INDISTINGUISHABLE]

587
00:25:26,700 --> 00:25:29,039
So I'm choosing comma

588
00:25:29,039 --> 00:25:32,900
errors because the only the first five...

589
00:25:32,900 --> 00:25:34,880
[INDISTINGUISHABLE]

590
00:25:34,880 --> 00:25:38,159
...for example. Okay, uh, if you have any

591
00:25:38,159 --> 00:25:39,960
doubts, if you have any problems, please

592
00:25:39,960 --> 00:25:42,960
don't hesitate to write me

593
00:25:42,960 --> 00:25:45,020
in the chat,

594
00:25:45,020 --> 00:25:48,480
like, what is blocking you, and

595
00:25:48,480 --> 00:25:50,940
me and Carlotta will try to help you,

596
00:25:50,940 --> 00:25:53,220
like whenever possible.

597
00:25:53,220 --> 00:25:55,659
And now this is the new preview for my

598
00:25:55,659 --> 00:25:57,840
data set. I can see that I have an ID, I

599
00:25:57,840 --> 00:25:59,700
have patient ID, I have pregnancies, I

600
00:25:59,700 --> 00:26:02,220
have the age of the people,

601
00:26:02,220 --> 00:26:05,720
I have the body mass, I think

602
00:26:05,720 --> 00:26:08,460
whether they have diabetes or not, as a

603
00:26:08,460 --> 00:26:10,679
zero and one. Zero indicates a negative,

604
00:26:10,679 --> 00:26:14,159
the person doesn't have diabetes, and one

605
00:26:14,159 --> 00:26:16,080
indicates a positive, that this person

606
00:26:16,080 --> 00:26:18,299
has diabetes. Okay.

607
00:26:18,299 --> 00:26:20,520
Now I'm going to click on "Next". Here I am

608
00:26:20,520 --> 00:26:23,400
defining my schema. All the data types

609
00:26:23,400 --> 00:26:25,380
inside my columns, the column names, which

610
00:26:25,380 --> 00:26:28,760
columns to include, which to exclude. And

611
00:26:28,760 --> 00:26:31,500
here we will include everything except

612
00:26:31,500 --> 00:26:35,580
the Path column. And we are

613
00:26:35,580 --> 00:26:37,860
going to review the data types of each

614
00:26:37,860 --> 00:26:40,440
column. So let's review this first one.

615
00:26:40,440 --> 00:26:43,320
This is numbers, numbers, numbers, then it's the

616
00:26:43,320 --> 00:26:45,779
integer. And this is,

617
00:26:45,779 --> 00:26:48,679
um, like decimal...

618
00:26:48,679 --> 00:26:50,900
...dotted...

619
00:26:50,900 --> 00:26:53,580
decimal number. So we are going to choose

620
00:26:53,580 --> 00:26:55,020
this data type.

621
00:26:55,020 --> 00:26:57,200
And for this one

622
00:26:57,200 --> 00:27:01,200
it says Diabetic, and it's a zero or

623
00:27:01,200 --> 00:27:02,460
one, and we are going to make it as

624
00:27:02,460 --> 00:27:04,460
integers.

625
00:27:04,460 --> 00:27:07,980
Now we are going to click on "Next" and

626
00:27:07,980 --> 00:27:09,780
move to reviewing everything. This is

627
00:27:09,780 --> 00:27:11,569
everything that we have defined together.

628
00:27:11,569 --> 00:27:13,500
I will click on "Create".

629
00:27:13,500 --> 00:27:15,179
And...

630
00:27:15,179 --> 00:27:17,940
now the first step has ended. We have

631
00:27:17,940 --> 00:27:19,919
gotten our data ready.

632
00:27:19,919 --> 00:27:22,440
Now...what now? We're going to utilize the

633
00:27:22,440 --> 00:27:23,468
designer...

634
00:27:23,468 --> 00:27:26,820
um...power. We're going to drag and drop

635
00:27:26,820 --> 00:27:29,820
our data set to create the pipeline.

636
00:27:29,820 --> 00:27:33,179
So I have clicked on it and dragged it

637
00:27:33,179 --> 00:27:35,640
to this space. It's gonna appear to you.

638
00:27:35,640 --> 00:27:39,659
And we can inspect it by right-clicking and

639
00:27:39,659 --> 00:27:42,179
choosing "Preview data"

640
00:27:42,179 --> 00:27:46,200
to see what we have created together.

641
00:27:46,200 --> 00:27:48,900
From here, you can see everything that we

642
00:27:48,900 --> 00:27:50,700
have seen previously, but in more

643
00:27:50,700 --> 00:27:53,100
detail. And we are just going to close

644
00:27:53,100 --> 00:27:56,580
this. Now what? Now we are gonna do the

645
00:27:56,580 --> 00:28:00,799
processing that Carlotta mentioned.

646
00:28:00,799 --> 00:28:03,659
These are some instructions about the

647
00:28:03,659 --> 00:28:05,460
data, about how you can look at them, how you

648
00:28:05,460 --> 00:28:07,140
can open them but we are going to move

649
00:28:07,140 --> 00:28:09,720
to the transformation or the processing.

650
00:28:09,720 --> 00:28:13,500
So as Carlotta told you, like any data

651
00:28:13,500 --> 00:28:15,480
for us to work on we have to do some

652
00:28:15,480 --> 00:28:17,299
processing to it

653
00:28:17,299 --> 00:28:20,159
to make it easier for the model to

654
00:28:20,159 --> 00:28:23,279
be trained and easier to work with. So, uh,

655
00:28:23,279 --> 00:28:25,860
we're gonna do the normalization. And

656
00:28:25,860 --> 00:28:29,159
normalization means, uh,

657
00:28:29,159 --> 00:28:33,539
to scale our data, either down or up, but

658
00:28:33,539 --> 00:28:35,400
we're going to scale them down,

659
00:28:35,400 --> 00:28:38,820
and we are going to decrease, uh,

660
00:28:38,820 --> 00:28:40,799
relatively decrease

661
00:28:40,799 --> 00:28:44,640
the values, all the values, to work

662
00:28:44,640 --> 00:28:48,120
with lower numbers. And if we are working

663
00:28:48,120 --> 00:28:49,559
with larger numbers, it's going to take

664
00:28:49,559 --> 00:28:52,500
more time. If we're working with smaller

665
00:28:52,500 --> 00:28:54,779
numbers, it's going to take less time to

666
00:28:54,779 --> 00:28:59,159
calculate them, and that's it. So

667
00:28:59,159 --> 00:29:02,159
where can I find "Normalize Data"? I

668
00:29:02,159 --> 00:29:04,260
can find it inside my components.

669
00:29:04,260 --> 00:29:06,720
So I will choose the components and

670
00:29:06,720 --> 00:29:09,659
search for "Normalize Data".

671
00:29:09,659 --> 00:29:12,360
I will drag and drop it as usual and I

672
00:29:12,360 --> 00:29:14,820
will connect between these two things

673
00:29:14,820 --> 00:29:18,360
by clicking on this spot, this, uh,

674
00:29:18,360 --> 00:29:20,159
circle, and

675
00:29:20,159 --> 00:29:23,159
dragging and dropping onto the next circle.

676
00:29:23,159 --> 00:29:24,899
Now we are going to define our

677
00:29:24,899 --> 00:29:27,419
normalization method.

678
00:29:27,419 --> 00:29:31,080
So I'm going to double click on the

679
00:29:31,080 --> 00:29:32,640
Normalize Data component.

680
00:29:32,640 --> 00:29:34,860
It's going to open the settings for the

681
00:29:34,860 --> 00:29:36,480
normalization,

682
00:29:36,480 --> 00:29:38,820
such as the transformation method, which is

683
00:29:38,820 --> 00:29:40,500
the mathematical method

684
00:29:40,500 --> 00:29:42,299
that our data is going to be scaled

685
00:29:42,299 --> 00:29:44,520
according to.

686
00:29:44,520 --> 00:29:47,760
We're going to choose min-max, and for

687
00:29:47,760 --> 00:29:51,539
this one, "Use 0

688
00:29:51,539 --> 00:29:53,100
for constant columns", we are going to

689
00:29:53,100 --> 00:29:54,480
choose "True",

690
00:29:54,480 --> 00:29:56,880
and we are going to define which columns

691
00:29:56,880 --> 00:29:58,860
to normalize. So we are not going to

692
00:29:58,860 --> 00:30:01,080
normalize the whole data set. We are

693
00:30:01,080 --> 00:30:02,760
going to choose a subset from the data

694
00:30:02,760 --> 00:30:04,559
set to normalize. So we're going to

695
00:30:04,559 --> 00:30:07,020
choose everything except for the patient

696
00:30:07,020 --> 00:30:09,000
ID and the diabetic, because the patient

697
00:30:09,000 --> 00:30:10,919
ID is a number, but it's categorical

698
00:30:10,919 --> 00:30:13,740
data. It describes a patient, it's not a

699
00:30:13,740 --> 00:30:17,460
number that I can sum. I can't say "patient

700
00:30:17,460 --> 00:30:20,159
ID number one plus patient ID number two".

701
00:30:20,159 --> 00:30:21,720
No, this is a patient and another

702
00:30:21,720 --> 00:30:23,399
patient, it's not a number that I can do

703
00:30:23,399 --> 00:30:25,740
mathematical operations on, so I'm not

704
00:30:25,740 --> 00:30:28,200
going to choose it. So we will choose

705
00:30:28,200 --> 00:30:30,539
everything as I said, except for the

706
00:30:30,539 --> 00:30:33,480
diabetic and the patient ID. I will

707
00:30:33,480 --> 00:30:34,860
click on "Save".

708
00:30:34,860 --> 00:30:37,740
And it's not showing me a warning again,

709
00:30:37,740 --> 00:30:39,480
everything is good.

710
00:30:39,480 --> 00:30:41,880
Now I can click on "Submit"

711
00:30:41,880 --> 00:30:46,799
and review my normalization output.

712
00:30:46,799 --> 00:30:48,240
Um.

713
00:30:48,240 --> 00:30:51,659
So, if you click on "Submit" here,

714
00:30:51,659 --> 00:30:54,659
you will choose "Create new" and

715
00:30:54,659 --> 00:30:56,460
set the name that is mentioned here

716
00:30:56,460 --> 00:30:59,899
inside the notebook. So it tells you

717
00:30:59,899 --> 00:31:03,419
to create a job and name it, name

718
00:31:03,419 --> 00:31:05,460
the experiment "MS Learn Diabetes

719
00:31:05,460 --> 00:31:06,720
Training", because you will continue

720
00:31:06,720 --> 00:31:10,160
working on it and building components later.

721
00:31:10,160 --> 00:31:13,020
I have it already created, so, uh,

722
00:31:13,020 --> 00:31:16,919
we can review it together. So let

723
00:31:16,919 --> 00:31:19,860
me just open this in another tab. I think

724
00:31:19,860 --> 00:31:21,000
I have it...

725
00:31:21,000 --> 00:31:23,659
here.

726
00:31:25,679 --> 00:31:28,220
Okay.

727
00:31:30,720 --> 00:31:34,740
So, these are all the jobs that I have

728
00:31:34,740 --> 00:31:37,340
created.

729
00:31:37,860 --> 00:31:40,119
All the jobs there. Let's do this over.

730
00:31:40,119 --> 00:31:42,059
These are all the jobs that I have

731
00:31:42,059 --> 00:31:43,679
submitted previously.

732
00:31:43,679 --> 00:31:45,840
And I think this one is the

733
00:31:45,840 --> 00:31:48,360
normalization job, so let's see the

734
00:31:48,360 --> 00:31:50,100
output of it.

735
00:31:50,100 --> 00:31:54,120
As you can see, it says, uh, "Check mark", yes,

736
00:31:54,120 --> 00:31:56,640
which means that it worked, and we can

737
00:31:56,640 --> 00:31:59,399
preview it. How can I do that? Right click

738
00:31:59,399 --> 00:32:02,539
on it, choose "Preview data",

739
00:32:02,539 --> 00:32:06,659
and as you can see all the data are

740
00:32:06,659 --> 00:32:08,399
scaled down

741
00:32:08,399 --> 00:32:10,980
so everything is between zero

742
00:32:10,980 --> 00:32:15,860
and, uh, one I think.

743
00:32:15,860 --> 00:32:18,899
So everything is good for us. Now we

744
00:32:18,899 --> 00:32:21,840
can move forward to the next step

745
00:32:21,840 --> 00:32:26,939
which is to create the whole pipeline.

746
00:32:26,939 --> 00:32:30,840
So, uh, Carlotta told you that

747
00:32:30,840 --> 00:32:33,179
we're going to use a classification

748
00:32:33,179 --> 00:32:37,260
model with this data set, so let

749
00:32:37,260 --> 00:32:40,620
me just drag and drop everything

750
00:32:40,620 --> 00:32:43,140
to get runtime and we're doing

751
00:32:43,140 --> 00:32:46,489
[INDISTINGUISHABLE]

752
00:32:46,489 --> 00:32:48,469
about everything by

753
00:32:48,469 --> 00:32:51,419
[INDISTINGUISHABLE]

754
00:32:51,419 --> 00:32:52,919
So,

755
00:32:52,919 --> 00:32:55,593
as a result, we are going to explain

756
00:32:55,593 --> 00:32:59,760
[INDISTINGUISHABLE]

757
00:32:59,760 --> 00:33:03,600
Yeah. So, I'm going to get the Split

758
00:33:03,600 --> 00:33:06,070
Data component. I'm going to take the

759
00:33:06,070 --> 00:33:08,880
transformed data to Split Data and

760
00:33:08,880 --> 00:33:10,380
connect it like that.

761
00:33:10,380 --> 00:33:12,299
I'm going to get the Train Model

762
00:33:12,299 --> 00:33:15,240
component because I want to train my

763
00:33:15,240 --> 00:33:16,679
model,

764
00:33:16,679 --> 00:33:19,740
and I'm going to put it right here.

765
00:33:19,740 --> 00:33:21,740
Okay.

766
00:33:21,740 --> 00:33:24,419
Let's just move it down there. Okay.

767
00:33:24,419 --> 00:33:27,059
And we are going to use a classification

768
00:33:27,059 --> 00:33:28,620
model,

769
00:33:28,620 --> 00:33:31,880
a Two-Class

770
00:33:32,240 --> 00:33:35,399
Logistic Regression model.

771
00:33:35,399 --> 00:33:38,159
So I'm going to give this algorithm to

772
00:33:38,159 --> 00:33:41,480
enable my model to work.

773
00:33:41,820 --> 00:33:45,960
This is the untrained model, this is...

774
00:33:45,960 --> 00:33:48,059
here.

775
00:33:48,059 --> 00:33:51,120
The left...

776
00:33:51,120 --> 00:33:52,860
the left, uh, circle, I'm going to

777
00:33:52,860 --> 00:33:54,819
connect it to the data set, and the right

778
00:33:54,819 --> 00:33:56,940
one, we are going to connect it to

779
00:33:56,940 --> 00:33:59,700
evaluate model.

780
00:33:59,700 --> 00:34:02,640
Evaluate model...so let's search for

781
00:34:02,640 --> 00:34:05,220
"Evaluate model" here.

782
00:34:05,220 --> 00:34:07,440
So because we want to do what...we want to

783
00:34:07,440 --> 00:34:10,800
evaluate our model and see how it has

784
00:34:10,800 --> 00:34:13,790
been doing. Is it good, is it bad?

785
00:34:13,790 --> 00:34:18,200
Um, sorry...

786
00:34:19,980 --> 00:34:22,820
This is...

787
00:34:23,460 --> 00:34:25,560
this is down there

788
00:34:25,560 --> 00:34:28,139
after the score model.

789
00:34:28,139 --> 00:34:31,320
So we have to get the score model first,

790
00:34:31,320 --> 00:34:33,960
so let's get it.

791
00:34:33,960 --> 00:34:36,119
And this will take the trained model and

792
00:34:36,119 --> 00:34:37,260
the data set

793
00:34:37,260 --> 00:34:39,419
to score our model and see if it's

794
00:34:39,419 --> 00:34:42,179
performing good or bad.

795
00:34:42,179 --> 00:34:44,409
And...

796
00:34:44,409 --> 00:34:47,159
um...

797
00:34:47,159 --> 00:34:49,080
after that, we have finished

798
00:34:49,080 --> 00:34:51,920
everything. Now, we are going to do the what?

799
00:34:52,139 --> 00:34:54,359
The presets for everything.

800
00:34:54,359 --> 00:34:56,820
As a starter, we will be splitting our

801
00:34:56,820 --> 00:34:58,920
data. So

802
00:34:58,920 --> 00:35:01,140
how are we going to do this, according to

803
00:35:01,140 --> 00:35:03,780
what? To the split rules. So I'm going to

804
00:35:03,780 --> 00:35:05,940
double-click on it and choose "Split rules".

805
00:35:05,940 --> 00:35:09,420
And the percentage is

806
00:35:09,420 --> 00:35:11,780
70 percent for the [INDISTINGUISHABLE]

807
00:35:11,780 --> 00:35:12,780
and 30 percent of the

808
00:35:12,780 --> 00:35:14,820
data for

809
00:35:14,820 --> 00:35:18,420
the evaluation or for the scoring, okay?

810
00:35:18,420 --> 00:35:20,880
I'm going to make it a randomization, so

811
00:35:20,880 --> 00:35:22,980
I'm going to split data randomly and the

812
00:35:22,980 --> 00:35:26,060
seed is, uh,

813
00:35:26,060 --> 00:35:29,339
132, uh 23 I think...yeah.

814
00:35:29,339 --> 00:35:32,520
And I think that's it.

815
00:35:32,520 --> 00:35:35,040
The stratified split is false, and that's

816
00:35:35,040 --> 00:35:36,240
good.

817
00:35:36,240 --> 00:35:39,540
Now for the next one, which is the train

818
00:35:39,540 --> 00:35:42,000
model we are going to connect it as

819
00:35:42,000 --> 00:35:43,500
mentioned here.

820
00:35:43,500 --> 00:35:48,660
And we have done that and...then why

821
00:35:48,660 --> 00:35:50,700
am I having here? Let's double click

822
00:35:50,700 --> 00:35:54,660
on it...yeah. It has...it needs the

823
00:35:54,660 --> 00:35:57,180
label column that I am trying to predict.

824
00:35:57,180 --> 00:35:58,680
So from here, I'm going to choose

825
00:35:58,680 --> 00:36:01,380
diabetic. I'm going to save.

826
00:36:01,380 --> 00:36:05,180
I'm going to close this one.

827
00:36:05,520 --> 00:36:07,380
So it says here,

828
00:36:07,380 --> 00:36:10,619
the diabetic label, the model, it will

829
00:36:10,619 --> 00:36:12,300
predict the zero and one, because this is

830
00:36:12,300 --> 00:36:14,700
a binary classification algorithm, so

831
00:36:14,700 --> 00:36:16,260
it's going to predict either this or

832
00:36:16,260 --> 00:36:17,520
that.

833
00:36:17,520 --> 00:36:18,460
And...

834
00:36:18,460 --> 00:36:20,160
um...

835
00:36:20,160 --> 00:36:23,880
I think that's everything to run the

836
00:36:23,880 --> 00:36:25,859
pipeline.

837
00:36:25,859 --> 00:36:29,040
So everything is done, everything is good

838
00:36:29,040 --> 00:36:31,200
for this one. We're just gonna leave it

839
00:36:31,200 --> 00:36:34,140
for now, because this is the next

840
00:36:34,140 --> 00:36:35,620
step.

841
00:36:35,620 --> 00:36:39,839
Um, this will be put instead of the

842
00:36:39,839 --> 00:36:43,520
score model, but let's...

843
00:36:44,099 --> 00:36:46,920
let's delete it for now.

844
00:36:46,920 --> 00:36:49,500
Okay.

845
00:36:49,500 --> 00:36:52,920
Now we have to submit the job in order

846
00:36:52,920 --> 00:36:55,680
to see the output of it. So I can click

847
00:36:55,680 --> 00:36:59,280
on "Submit" and choose the previous job

848
00:36:59,280 --> 00:37:01,200
which is the one that I have shown you

849
00:37:01,200 --> 00:37:02,460
before.

850
00:37:02,460 --> 00:37:05,460
And then let's review its output

851
00:37:05,460 --> 00:37:06,960
together here.

852
00:37:06,960 --> 00:37:09,960
So if I go to the jobs,

853
00:37:09,960 --> 00:37:15,119
if I go to MS Learn, maybe it is training?

854
00:37:15,119 --> 00:37:18,180
I think it's the one that lasted the

855
00:37:18,180 --> 00:37:20,640
longest, this one here.

856
00:37:20,640 --> 00:37:23,700
So here I can see

857
00:37:23,700 --> 00:37:27,079
the job output, what happened inside

858
00:37:27,079 --> 00:37:30,420
the model, as you can see.

859
00:37:30,420 --> 00:37:33,839
So the normalization we have seen

860
00:37:33,839 --> 00:37:36,540
before, the split data, I can preview it.

861
00:37:36,540 --> 00:37:39,359
The result one or the result two as it

862
00:37:39,359 --> 00:37:41,760
splits the data to 70 percent here and

863
00:37:41,760 --> 00:37:43,639
30 percent here.

864
00:37:43,639 --> 00:37:46,859
Um, I can see the score model, which is

865
00:37:46,859 --> 00:37:49,140
something that we need

866
00:37:49,140 --> 00:37:51,530
to review.

867
00:37:51,530 --> 00:37:56,820
Inside the score model, uh, from

868
00:37:56,820 --> 00:37:57,960
here,

869
00:37:57,960 --> 00:38:00,960
we can see that...

870
00:38:00,960 --> 00:38:04,460
let's get back here.

871
00:38:05,940 --> 00:38:08,220
This is the data that the model has

872
00:38:08,220 --> 00:38:11,579
been scored on, and this is the scoring output.

873
00:38:11,579 --> 00:38:15,300
So it says "Scored Labels" true, and he is

874
00:38:15,300 --> 00:38:17,370
not diabetic, so this is,

875
00:38:17,370 --> 00:38:19,200
um,

876
00:38:19,200 --> 00:38:21,839
a wrong prediction, let's say.

877
00:38:21,839 --> 00:38:23,880
For this one it's true and true, and this

878
00:38:23,880 --> 00:38:26,880
is a good, like, what do you say,

879
00:38:26,880 --> 00:38:29,460
prediction, and the probabilities of this

880
00:38:29,460 --> 00:38:30,420
score,

881
00:38:30,420 --> 00:38:33,119
which means the certainty of our model

882
00:38:33,119 --> 00:38:36,620
that this is really true. It's 80 percent.

883
00:38:36,620 --> 00:38:38,780
For this one it's 75 percent.

884
00:38:38,780 --> 00:38:42,599
So these are some cool metrics that we

885
00:38:42,599 --> 00:38:45,359
can review to understand how our model

886
00:38:45,359 --> 00:38:47,580
is performing. It's performing good for

887
00:38:47,580 --> 00:38:48,540
now.

888
00:38:48,540 --> 00:38:53,180
Let's check our evaluation model.

889
00:38:53,180 --> 00:38:56,700
So this is the extra one that I told you

890
00:38:56,700 --> 00:38:59,579
about. Instead of the

891
00:38:59,579 --> 00:39:01,800
score model only, we are going to add

892
00:39:01,800 --> 00:39:04,260
the Evaluate Model

893
00:39:04,260 --> 00:39:06,900
after it. So here

894
00:39:06,900 --> 00:39:09,420
we're going to go to our Asset Library

895
00:39:09,420 --> 00:39:12,180
and we are going to choose the evaluate

896
00:39:12,180 --> 00:39:14,940
model,

897
00:39:14,940 --> 00:39:17,760
and we are going to put it here, and we

898
00:39:17,760 --> 00:39:20,220
are going to connect it, and we are going

899
00:39:20,220 --> 00:39:23,099
to submit the job using the same name of

900
00:39:23,099 --> 00:39:25,140
the job that we used previously.

901
00:39:25,140 --> 00:39:29,520
Let's review it. Also, so, after it

902
00:39:29,520 --> 00:39:33,300
finishes, you will find it here. So I have

903
00:39:33,300 --> 00:39:35,280
already done it before, this is how I'm

904
00:39:35,280 --> 00:39:37,380
able to see the output.

905
00:39:37,380 --> 00:39:40,320
So let's see

906
00:39:40,320 --> 00:39:43,280
what is the output of this

907
00:39:43,280 --> 00:39:45,660
evaluation process.

908
00:39:45,660 --> 00:39:49,800
Here it mentioned to you that there are

909
00:39:49,800 --> 00:39:51,480
some metrics,

910
00:39:51,480 --> 00:39:54,839
like the confusion matrix, which Carlotta

911
00:39:54,839 --> 00:39:57,060
told you about, there is the accuracy, the

912
00:39:57,060 --> 00:39:59,760
precision, the recall, and F1 Score.

913
00:39:59,760 --> 00:40:02,339
Every metric gives us some insight about

914
00:40:02,339 --> 00:40:04,920
our model. It helps us to understand it

915
00:40:04,920 --> 00:40:08,579
more, and, um,

916
00:40:08,579 --> 00:40:10,560
understand if it's overfitting, if

917
00:40:10,560 --> 00:40:12,240
it's good, if it's bad, and really really,

918
00:40:12,240 --> 00:40:16,339
like, understand how it's working.
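
To make the metrics just listed concrete, here is a small sketch that computes the confusion-matrix counts, accuracy, precision, recall, and F1 score from actual and predicted labels. The function names and the eight sample labels are assumptions made up for illustration; they are not values from the demo's job output.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, tn, fn) for binary labels encoded as 0/1."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1  # predicted diabetic, really diabetic
        elif p == 1 and a == 0:
            fp += 1  # predicted diabetic, actually not
        elif p == 0 and a == 0:
            tn += 1  # predicted not diabetic, correct
        else:
            fn += 1  # predicted not diabetic, actually diabetic
    return tp, fp, tn, fn


def classification_metrics(actual, predicted):
    """Compute accuracy, precision, recall and F1 from the counts."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    accuracy = (tp + tn) / len(actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = diabetic, 0 = not diabetic.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(actual, predicted)
print(confusion_counts(actual, predicted), metrics)
```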

919
00:40:17,060 --> 00:40:20,400
Now I'm just waiting for the job to load.

920
00:40:20,400 --> 00:40:22,710
Until it loads,

921
00:40:22,710 --> 00:40:23,640
um,

922
00:40:23,640 --> 00:40:26,040
we can continue

923
00:40:26,040 --> 00:40:28,740
to work on our

924
00:40:28,740 --> 00:40:31,800
model. So I will go to my designer. I'm

925
00:40:31,800 --> 00:40:34,740
just going to confirm this.

926
00:40:34,740 --> 00:40:38,280
And I'm going to continue working on it

927
00:40:38,280 --> 00:40:39,780
from

928
00:40:39,780 --> 00:40:42,119
where we have stopped. Where have we

929
00:40:42,119 --> 00:40:43,560
stopped?

930
00:40:43,560 --> 00:40:46,440
We have stopped on the evaluate model. So

931
00:40:46,440 --> 00:40:48,960
I'm going to choose this one.

932
00:40:48,960 --> 00:40:53,420
And it says here

933
00:40:54,180 --> 00:40:56,940
"select experiment", "create inference

934
00:40:56,940 --> 00:40:58,200
pipeline", so

935
00:40:58,200 --> 00:41:01,079
I am going to go to the jobs,

936
00:41:01,079 --> 00:41:04,680
I'm going to select my experiment.

937
00:41:04,680 --> 00:41:06,660
I hope this works.

938
00:41:06,660 --> 00:41:09,720
Okay. Finally, now we have our

939
00:41:09,720 --> 00:41:12,180
evaluate model output.

940
00:41:12,180 --> 00:41:15,480
Let's preview evaluation results

941
00:41:15,480 --> 00:41:18,660
and, uh...

942
00:41:18,660 --> 00:41:22,220
come on.

943
00:41:25,500 --> 00:41:28,020
Finally. Now we can create our inference

944
00:41:28,020 --> 00:41:31,020
pipeline. So,

945
00:41:31,020 --> 00:41:33,510
I think it says that...

946
00:41:33,510 --> 00:41:35,280
um...

947
00:41:35,280 --> 00:41:38,160
select the experiment, then select MS

948
00:41:38,160 --> 00:41:39,359
Learn. So,

949
00:41:39,359 --> 00:41:43,320
I am just going to select it,

950
00:41:43,320 --> 00:41:48,300
and finally. Now we can, the ROC curve, we

951
00:41:48,300 --> 00:41:51,000
can see it, that the true positive rate

952
00:41:51,000 --> 00:41:53,760
and the false positive rate. The false

953
00:41:53,760 --> 00:41:56,660
positive rate is increasing with time,

954
00:41:56,660 --> 00:42:01,020
and also the true positive rate. True

955
00:42:01,020 --> 00:42:03,540
positive is something that it predicted,

956
00:42:03,540 --> 00:42:06,960
that it is, uh, positive it has diabetes,

957
00:42:06,960 --> 00:42:09,480
and it's really...it's really true.

958
00:42:09,480 --> 00:42:12,599
The person really has diabetes. Okay. And

959
00:42:12,599 --> 00:42:14,760
for the false positive, it predicted that

960
00:42:14,760 --> 00:42:17,579
someone has diabetes and someone doesn't

961
00:42:17,579 --> 00:42:20,960
have it. This is what true positive and

962
00:42:20,960 --> 00:42:24,960
false positive means. This is the recall

963
00:42:24,960 --> 00:42:28,020
curve, so we can review the metrics

964
00:42:28,020 --> 00:42:32,160
of our model. This is the lift curve. I

965
00:42:32,160 --> 00:42:36,000
can change the threshold of my confusion

966
00:42:36,000 --> 00:42:37,740
matrix here

967
00:42:37,740 --> 00:42:39,119
and if Carlotta wants to add

968
00:42:39,119 --> 00:42:43,920
anything about the...the graphs,

969
00:42:43,920 --> 00:42:47,000
you can do so.

970
00:42:50,440 --> 00:42:52,558
[CARLOTTA]: Um, yeah, so I just

971
00:42:52,558 --> 00:42:54,540
wanted to...if you go...yeah.

972
00:42:54,540 --> 00:42:57,119
I just wanted to comment for the

973
00:42:57,119 --> 00:43:00,480
ROC curve, that actually from this

974
00:43:00,480 --> 00:43:03,900
graph, the metric which usually we're

975
00:43:03,900 --> 00:43:06,960
going to compute is the area under

976
00:43:06,960 --> 00:43:09,900
the curve. And this coefficient or

977
00:43:09,900 --> 00:43:12,240
metric,

978
00:43:12,240 --> 00:43:15,060
it's a coefficient—

979
00:43:15,060 --> 00:43:18,420
it's a value that could span from

980
00:43:18,420 --> 00:43:23,480
zero to one, and the highest is...

981
00:43:23,480 --> 00:43:25,970
...the highest is the score.

982
00:43:25,970 --> 00:43:29,220
So the closest to one,

983
00:43:29,220 --> 00:43:32,760
so the highest is the amount of

984
00:43:32,760 --> 00:43:35,280
area under this curve.

985
00:43:35,280 --> 00:43:40,500
The highest performance

986
00:43:40,500 --> 00:43:42,886
we've got from our model.
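
The area under the ROC curve mentioned here can be sketched as follows: sweep the threshold over the scores to collect (false positive rate, true positive rate) points, then sum the area with the trapezoid rule. The function names and the six sample scores are assumptions for illustration, and the sketch assumes both classes are present in the labels.

```python
def roc_points(actual, scores):
    """Return (fpr, tpr) points, sweeping the threshold from high to low."""
    positives = sum(actual)
    negatives = len(actual) - positives  # assumes both classes occur
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores)
                 if s >= threshold and a == 1)
        fp = sum(1 for a, s in zip(actual, scores)
                 if s >= threshold and a == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoid-rule area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and scored probabilities.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = area_under_curve(roc_points(actual, scores))
print(round(auc, 3))
```

A value close to one means positives are almost always scored above negatives; a value near 0.5 is what random guessing would give.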

987
00:43:42,886 --> 00:43:46,440
And another thing is what John is

988
00:43:46,440 --> 00:43:49,680
playing with. So this threshold for

989
00:43:49,680 --> 00:43:51,380
the logistic

990
00:43:51,380 --> 00:43:55,610
regression is the threshold used by the

991
00:43:55,610 --> 00:43:59,520
model to, um,

992
00:43:59,520 --> 00:44:02,880
to predict if the category is zero or

993
00:44:02,880 --> 00:44:05,220
one. So if the probability—the

994
00:44:05,220 --> 00:44:08,599
probability score is above the threshold,

995
00:44:08,599 --> 00:44:11,579
then the category will be predicted as

996
00:44:11,579 --> 00:44:15,359
one, while if the probability is

997
00:44:15,359 --> 00:44:17,460
below the threshold, in this case, for

998
00:44:17,460 --> 00:44:21,300
example, 0.5, the category is predicted

999
00:44:21,300 --> 00:44:23,579
as zero. So that's why it's very

1000
00:44:23,579 --> 00:44:26,473
important to choose the threshold,

1001
00:44:26,473 --> 00:44:28,699
because the performance really can vary,

1002
00:44:28,699 --> 00:44:30,560
um,

1003
00:44:30,560 --> 00:44:34,380
with this threshold value.
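
A short sketch of the thresholding rule described here: a probability score above the threshold becomes category one, otherwise zero. The function name and the sample probabilities are assumptions for illustration, and treating a score exactly equal to the threshold as one is also an assumption.

```python
def predict_labels(probabilities, threshold=0.5):
    """Map each probability score to category 1 or 0 using the threshold."""
    # Assumption: a score exactly at the threshold counts as category 1.
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical scored probabilities for five patients.
probabilities = [0.80, 0.75, 0.55, 0.35, 0.10]

for threshold in (0.3, 0.5, 0.7):
    print(threshold, predict_labels(probabilities, threshold))
```

The same scores produce different predicted labels as the threshold moves, which is why the confusion matrix and the metrics derived from it change when the slider is adjusted.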

1004
00:44:34,380 --> 00:44:41,099
[JOHN]: Thank you so much, Carlotta, and

1005
00:44:41,400 --> 00:44:44,400
as I mentioned now, we are going to

1006
00:44:44,400 --> 00:44:46,560
create our inference pipeline. So we are

1007
00:44:46,560 --> 00:44:48,540
going to select the latest one, which I

1008
00:44:48,540 --> 00:44:50,819
already have it opened here. This is the

1009
00:44:50,819 --> 00:44:52,859
one that we were reviewing together. This

1010
00:44:52,859 --> 00:44:55,500
is where we have stopped, and we're going

1011
00:44:55,500 --> 00:44:57,599
to create an inference pipeline. We are

1012
00:44:57,599 --> 00:44:59,520
going to choose a real-time inference

1013
00:44:59,520 --> 00:45:02,520
pipeline, okay?

1014
00:45:02,520 --> 00:45:05,080
From where I can find this? Here, as it

1015
00:45:05,080 --> 00:45:08,099
says, "Real-time inference pipeline".

1016
00:45:08,099 --> 00:45:10,680
So it's gonna add some things to my

1017
00:45:10,680 --> 00:45:12,240
workspace. It's going to add the

1018
00:45:12,240 --> 00:45:13,713
web service input, it's gonna

1019
00:45:13,713 --> 00:45:15,071
have the web service output,

1020
00:45:15,071 --> 00:45:16,490
because we will be creating

1021
00:45:16,490 --> 00:45:18,180
it as a web service to access

1022
00:45:18,180 --> 00:45:19,740
it from the internet.

1023
00:45:19,740 --> 00:45:21,770
What are we going to do? We're going

1024
00:45:21,770 --> 00:45:24,720
to remove this diabetes data, okay?

1025
00:45:24,720 --> 00:45:27,540
And we are going to get a component

1026
00:45:27,540 --> 00:45:29,359
called "Web

1027
00:45:29,359 --> 00:45:33,180
input" and...let me check

1028
00:45:33,180 --> 00:45:35,940
it's "enter data manually".

1029
00:45:35,940 --> 00:45:38,400
We have...we already have the web input

1030
00:45:38,400 --> 00:45:39,540
present.

1031
00:45:39,540 --> 00:45:42,119
So we are going to get the Enter Data

1032
00:45:42,119 --> 00:45:43,200
Manually,

1033
00:45:43,200 --> 00:45:45,420
and we're going to collect it—to connect

1034
00:45:45,420 --> 00:45:49,560
it as it was connected before, like that.

1035
00:45:49,560 --> 00:45:53,040
And also, I am not going to directly take

1036
00:45:53,040 --> 00:45:55,260
the web service...sorry, the Score Model to

1037
00:45:55,260 --> 00:45:57,839
the web service output like that.

1038
00:45:57,839 --> 00:46:00,240
I'm going to delete this

1039
00:46:00,240 --> 00:46:03,960
and I'm going to execute a python script

1040
00:46:03,960 --> 00:46:05,880
before

1041
00:46:05,880 --> 00:46:09,500
I display my result.

1042
00:46:10,680 --> 00:46:12,060
So,

1043
00:46:12,060 --> 00:46:17,480
this will be connected like...

1044
00:46:19,260 --> 00:46:20,400
So...

1045
00:46:20,400 --> 00:46:23,599
the other way around.

1046
00:46:23,599 --> 00:46:27,660
And from here, I am going to connect this

1047
00:46:27,660 --> 00:46:30,960
with that and there is some data that

1048
00:46:30,960 --> 00:46:33,480
we will be getting from the node, or from

1049
00:46:33,480 --> 00:46:37,680
the explanation here, and this is the

1050
00:46:37,680 --> 00:46:40,740
data that will be entered to our

1051
00:46:40,740 --> 00:46:44,400
website manually. Okay? This is instead of

1052
00:46:44,400 --> 00:46:47,460
the data that we have been getting from

1053
00:46:47,460 --> 00:46:49,740
our data set that we created. So I'm just

1054
00:46:49,740 --> 00:46:51,960
going to double click on it and choose

1055
00:46:51,960 --> 00:46:55,579
CSV, and I will choose "it has headers",

1056
00:46:55,579 --> 00:47:00,839
and I will take or copy this content and

1057
00:47:00,839 --> 00:47:02,819
put it there, okay?

1058
00:47:02,819 --> 00:47:05,700
So let's do it.

1059
00:47:05,700 --> 00:47:07,920
I think I have to click on edit code, now

1060
00:47:07,920 --> 00:47:10,680
I can click on "Save", and I can close it.

1061
00:47:10,680 --> 00:47:13,079
Another thing which is the python script

1062
00:47:13,079 --> 00:47:16,700
that we will be executing.

1063
00:47:17,099 --> 00:47:17,900
Um, yeah. We

1064
00:47:17,900 --> 00:47:19,380
are going to remove this, also.

1065
00:47:19,380 --> 00:47:20,930
We don't need the evaluate model

1066
00:47:20,930 --> 00:47:24,319
anymore, so we are going to remove it.

1067
00:47:24,319 --> 00:47:25,582
The python script

1068
00:47:25,582 --> 00:47:28,579
that I will be executing,

1069
00:47:28,579 --> 00:47:32,599
I can find it here.

1070
00:47:32,699 --> 00:47:35,760
Um, yeah.

1071
00:47:35,760 --> 00:47:38,640
This is the python script that we will

1072
00:47:38,640 --> 00:47:41,520
execute. And it says to you that this

1073
00:47:41,520 --> 00:47:43,619
code selects only the patient ID,

1074
00:47:43,619 --> 00:47:45,000
the scored label, and the scored

1075
00:47:45,000 --> 00:47:47,700
probability, and returns them to

1076
00:47:47,700 --> 00:47:49,980
the web service output. So we don't want

1077
00:47:49,980 --> 00:47:51,960
to return all the columns, as we have

1078
00:47:51,960 --> 00:47:53,339
seen previously,

1079
00:47:53,339 --> 00:47:55,560
that determines everything,

1080
00:47:55,560 --> 00:47:56,940
so

1081
00:47:56,940 --> 00:47:59,040
we want to return certain stuff, the

1082
00:47:59,040 --> 00:48:02,940
stuff that we will use inside our

1083
00:48:02,940 --> 00:48:05,640
endpoint. So I'm just going to select

1084
00:48:05,640 --> 00:48:07,980
everything and delete it, and

1085
00:48:07,980 --> 00:48:11,060
paste the code that I have gotten from

1086
00:48:11,060 --> 00:48:14,280
the, uh,

1087
00:48:14,280 --> 00:48:16,500
the Microsoft Learn docs.
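
The idea behind the script, keeping only three columns and giving them friendlier names, can be illustrated with a plain-Python stand-in. This is not the code pasted in the demo: the designer's Execute Python Script component works on pandas DataFrames through its own entry function, whereas this sketch uses a list of dictionaries so it runs without extra packages. The function name, the output names `DiabetesPrediction` and `Probability`, and the sample row are assumptions.

```python
# Assumed mapping from scored-output column names to output names.
COLUMN_MAP = {
    "PatientID": "PatientID",
    "Scored Labels": "DiabetesPrediction",
    "Scored Probabilities": "Probability",
}


def select_and_rename(rows):
    """Keep only the mapped columns of each row and rename them."""
    return [
        {new: row[old] for old, new in COLUMN_MAP.items()}
        for row in rows
    ]


# Hypothetical scored row with extra feature columns.
scored = [
    {"PatientID": 1001, "Pregnancies": 2, "Age": 43,
     "Scored Labels": 1, "Scored Probabilities": 0.8},
]
result = select_and_rename(scored)
print(result)
```

Returning only these fields keeps the web service response small and hides the input features that the caller already knows.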

1088
00:48:16,500 --> 00:48:19,079
Now I can click on "Save", and I can close

1089
00:48:19,079 --> 00:48:20,280
this.

1090
00:48:20,280 --> 00:48:21,470
Let me check something,

1091
00:48:21,470 --> 00:48:22,950
I don't think it saved.

1092
00:48:22,950 --> 00:48:24,940
It's saved, but the display is

1093
00:48:24,940 --> 00:48:26,160
wrong, okay.

1094
00:48:26,160 --> 00:48:30,300
And now I think everything is good to go.

1095
00:48:30,300 --> 00:48:32,640
I'm just gonna double-check everything.

1096
00:48:32,640 --> 00:48:36,359
So, uh, yeah. We are gonna change the name

1097
00:48:36,359 --> 00:48:38,640
of this

1098
00:48:38,640 --> 00:48:40,800
pipeline, and we are gonna call it

1099
00:48:40,800 --> 00:48:42,780
"Predict

1100
00:48:42,780 --> 00:48:46,319
diabetes", okay?

1101
00:48:46,319 --> 00:48:50,339
Now let's close it, and

1102
00:48:50,339 --> 00:48:56,269
I think that we are good to go. So,

1103
00:48:56,269 --> 00:48:59,300
um,

1104
00:48:59,720 --> 00:49:04,460
Okay, I think everything is good for us.

1105
00:49:06,210 --> 00:49:08,108
I just want to make sure of something.

1106
00:49:08,108 --> 00:49:09,209
Is the data...

1107
00:49:09,209 --> 00:49:12,420
it's correct, the data is...yeah,

1108
00:49:12,420 --> 00:49:13,560
it's correct.

1109
00:49:13,560 --> 00:49:16,319
Okay, now I can run the pipeline. Let's

1110
00:49:16,319 --> 00:49:17,640
submit.

1111
00:49:17,640 --> 00:49:21,000
Select an "existing" pipeline, and we're

1112
00:49:21,000 --> 00:49:21,870
going to choose

1113
00:49:21,870 --> 00:49:23,529
the "ms-learn-diabetes-training",

1114
00:49:23,529 --> 00:49:24,599
which is the pipeline

1115
00:49:24,599 --> 00:49:27,060
that we have been working on

1116
00:49:27,060 --> 00:49:31,619
from the beginning of this module.

1117
00:49:31,619 --> 00:49:33,839
I don't think that this is going to take

1118
00:49:33,839 --> 00:49:36,060
much time. So we have submitted the job

1119
00:49:36,060 --> 00:49:37,319
and it's running.

1120
00:49:37,319 --> 00:49:40,140
Until the job ends, we are going to set

1121
00:49:40,140 --> 00:49:41,720
everything

1122
00:49:41,720 --> 00:49:45,599
for deploying a service.

1123
00:49:45,599 --> 00:49:49,070
In order to deploy a service,

1124
00:49:49,070 --> 00:49:50,520
um,

1125
00:49:50,520 --> 00:49:54,000
I have to have the job ready, so

1126
00:49:54,000 --> 00:49:55,980
until it's ready, you can't deploy it. So

1127
00:49:55,980 --> 00:49:58,319
let's go to the job—the job details from

1128
00:49:58,319 --> 00:50:01,319
here, okay?

1129
00:50:01,319 --> 00:50:05,119
And until it finishes,

1130
00:50:05,119 --> 00:50:07,260
Carlotta, do you think that we can have

1131
00:50:07,260 --> 00:50:09,240
the questions, and then we can get back

1132
00:50:09,240 --> 00:50:12,859
to the job and deploying it?

1133
00:50:13,700 --> 00:50:15,119
[CARLOTTA]: Yeah, yeah, yeah.

1134
00:50:15,119 --> 00:50:17,279
So yeah, guys, if you

1135
00:50:17,279 --> 00:50:18,980
have any questions

1136
00:50:18,980 --> 00:50:24,119
on what you just saw here

1137
00:50:24,119 --> 00:50:26,940
or in the introduction, feel free. This is

1138
00:50:26,940 --> 00:50:30,300
a good moment, we can...we can discuss

1139
00:50:30,300 --> 00:50:33,900
now, while we wait for this job to

1140
00:50:33,900 --> 00:50:36,260
finish.

1141
00:50:36,260 --> 00:50:38,760
[JOHN]: Uh, and....

1142
00:50:38,760 --> 00:50:40,220
can...

1143
00:50:40,220 --> 00:50:45,000
we have the knowledge check one? Or, like,

1144
00:50:45,000 --> 00:50:46,360
what do you think?

1145
00:50:46,360 --> 00:50:47,956
[CARLOTTA]: Yeah, we can also go

1146
00:50:47,956 --> 00:50:49,680
to the knowledge check.

1147
00:50:49,680 --> 00:50:50,940
Um...

1148
00:50:50,940 --> 00:50:56,339
Yeah, okay. So let me share my screen.

1149
00:50:56,339 --> 00:50:58,980
Yeah, so if you have not any questions

1150
00:50:58,980 --> 00:51:01,619
for us, we can maybe propose some

1151
00:51:01,619 --> 00:51:04,959
questions to you that you can,

1152
00:51:04,959 --> 00:51:06,240
um,

1153
00:51:06,240 --> 00:51:09,450
check your knowledge so far and you

1154
00:51:09,450 --> 00:51:12,900
can maybe answer to these questions

1155
00:51:12,900 --> 00:51:15,420
via chat.

1156
00:51:15,420 --> 00:51:18,300
So we have...do you see my screen, can

1157
00:51:18,300 --> 00:51:19,859
you see my screen?

1158
00:51:19,859 --> 00:51:21,650
[JOHN]: Yes.

1159
00:51:21,650 --> 00:51:24,440
[CARLOTTA]: So, John, I think I will

1160
00:51:24,440 --> 00:51:25,440
read this

1161
00:51:25,440 --> 00:51:29,040
question aloud and ask it to you, okay? So

1162
00:51:29,040 --> 00:51:32,040
are you ready to answer?

1163
00:51:32,040 --> 00:51:33,660
[JOHN]: Yes, I am.

1164
00:51:33,660 --> 00:51:35,460
[CARLOTTA]: So...

1165
00:51:35,460 --> 00:51:37,260
you're using Azure Machine Learning

1166
00:51:37,260 --> 00:51:39,780
designer to create a training pipeline

1167
00:51:39,780 --> 00:51:42,540
for a binary classification model, so

1168
00:51:42,540 --> 00:51:45,300
what we were doing in our demo,

1169
00:51:45,300 --> 00:51:48,059
right? And you have added a data set

1170
00:51:48,059 --> 00:51:51,660
containing features and labels, a Two-

1171
00:51:51,660 --> 00:51:54,359
Class Decision Forest module. So we used

1172
00:51:54,359 --> 00:51:56,819
a logistic regression model our...

1173
00:51:56,819 --> 00:51:57,877
um, in our example.

1174
00:51:57,877 --> 00:51:59,019
Here, we're using a Two-

1175
00:51:59,019 --> 00:52:01,260
Class Decision Forest model.

1176
00:52:01,260 --> 00:52:04,500
And, of course, a Train Model module. You

1177
00:52:04,500 --> 00:52:07,200
plan now to use score model and evaluate

1178
00:52:07,200 --> 00:52:09,480
model modules to test the trained model

1179
00:52:09,480 --> 00:52:11,640
with the subset of the data set that

1180
00:52:11,640 --> 00:52:13,500
wasn't used for training.

1181
00:52:13,500 --> 00:52:15,960
But what are we missing? So what's

1182
00:52:15,960 --> 00:52:18,780
another module you should add? We have

1183
00:52:18,780 --> 00:52:21,660
three options: we have Join Data, we have

1184
00:52:21,660 --> 00:52:25,200
Split Data, or we have Select Columns

1185
00:52:25,200 --> 00:52:26,819
in Dataset.

1186
00:52:26,819 --> 00:52:28,260
So

1187
00:52:28,260 --> 00:52:32,040
while John thinks about the answer,

1188
00:52:32,040 --> 00:52:33,599
go ahead and,

1189
00:52:33,599 --> 00:52:34,800
um,

1190
00:52:34,800 --> 00:52:37,800
answer yourself. So give us your

1191
00:52:37,800 --> 00:52:39,540
guess.

1192
00:52:39,540 --> 00:52:41,940
Put it in the chat, or just come off mute

1193
00:52:41,940 --> 00:52:44,900
and answer.

1194
00:52:46,740 --> 00:52:47,785
"A", "B".

1195
00:52:47,785 --> 00:52:49,769
[JOHN]: Yeah, what do you

1196
00:52:49,769 --> 00:52:50,509
is the correct

1197
00:52:50,509 --> 00:52:53,579
answer for this one? I need something to

1198
00:52:53,579 --> 00:52:56,579
uh...I have to score my model, and I

1199
00:52:56,579 --> 00:53:00,359
have to evaluate it, so I need

1200
00:53:00,359 --> 00:53:03,119
something to enable me to do these two

1201
00:53:03,119 --> 00:53:05,359
things.

1202
00:53:06,579 --> 00:53:08,233
[CARLOTTA]: I think it's something

1203
00:53:08,233 --> 00:53:10,640
you showed us in your pipeline,

1204
00:53:10,640 --> 00:53:13,260
right John?

1205
00:53:13,260 --> 00:53:16,819
[JOHN]: Of course I did.

1206
00:53:23,460 --> 00:53:25,122
[CARLOTTA]: Uh, we have no guesses

1207
00:53:25,122 --> 00:53:28,020
in the chat?

1208
00:53:28,020 --> 00:53:30,070
[JOHN]: Can someone...

1209
00:53:30,070 --> 00:53:32,280
Someone want to guess?

1210
00:53:32,280 --> 00:53:35,579
[CARLOTTA]: We have a "B".

1211
00:53:35,579 --> 00:53:38,760
[JOHN]: Uh, maybe.

1212
00:53:38,760 --> 00:53:43,260
So, in order to do this,

1213
00:53:43,260 --> 00:53:46,200
I mentioned the

1214
00:53:46,200 --> 00:53:49,380
the module that is going to help me

1215
00:53:49,380 --> 00:53:52,728
to divide my data into two things:

1216
00:53:52,728 --> 00:53:53,819
70 percent for the

1217
00:53:53,819 --> 00:53:56,220
the training and 30 percent for the

1218
00:53:56,220 --> 00:53:59,339
evaluation. So what did I use? I used

1219
00:53:59,339 --> 00:54:01,859
split data, because this is what is going

1220
00:54:01,859 --> 00:54:05,280
to split my data randomly into training

1221
00:54:05,280 --> 00:54:08,459
data and validation data. So the correct

1222
00:54:08,459 --> 00:54:12,240
answer is "B", and good job. Thank you

1223
00:54:12,240 --> 00:54:13,980
for participating.

1224
00:54:13,980 --> 00:54:17,400
Next question, please.

1225
00:54:17,400 --> 00:54:19,339
[CARLOTTA]: Yes, "B" is the correct

1226
00:54:19,339 --> 00:54:22,559
answer, so thanks, John,

1227
00:54:22,559 --> 00:54:26,040
for explaining to us the correct

1228
00:54:26,040 --> 00:54:26,940
one.

1229
00:54:26,940 --> 00:54:30,420
And we want to go with question two?

1230
00:54:30,420 --> 00:54:33,180
[JOHN]: Yeah, so,
I'm going to ask you now,

1231
00:54:33,180 --> 00:54:35,880
Carlotta. You use Azure Machine Learning

1232
00:54:35,880 --> 00:54:38,280
designer to create a training pipeline

1233
00:54:38,280 --> 00:54:40,500
for your classification model.

1234
00:54:40,500 --> 00:54:44,099
What must you do before you deploy this

1235
00:54:44,099 --> 00:54:45,870
model as a service?
You have to do

1236
00:54:45,870 --> 00:54:46,634
something before

1237
00:54:46,634 --> 00:54:47,439
you deploy it.

1238
00:54:47,439 --> 00:54:49,740
What do you think is the correct answer?

1239
00:54:49,740 --> 00:54:52,740
Is it "A", "B", or "C"?

1240
00:54:52,740 --> 00:54:55,020
Share your thoughts with—

1241
00:54:55,020 --> 00:54:56,690
with us in the chat and

1242
00:54:56,690 --> 00:55:00,180
and I'm also going to give you some

1243
00:55:00,180 --> 00:55:02,940
minutes to think of it before I

1244
00:55:02,940 --> 00:55:06,020
tell you about it.

1245
00:55:06,020 --> 00:55:07,765
[CARLOTTA]: Yeah so let me go

1246
00:55:07,765 --> 00:55:09,000
through the possible

1247
00:55:09,000 --> 00:55:12,359
answers, right? So we have A: "Create an

1248
00:55:12,359 --> 00:55:14,940
inference pipeline from the training

1249
00:55:14,940 --> 00:55:16,020
pipeline";

1250
00:55:16,020 --> 00:55:19,260
B: we have "Add an Evaluate Model

1251
00:55:19,260 --> 00:55:22,380
module to the training pipeline"; and then

1252
00:55:22,380 --> 00:55:25,079
C, we have "Clone the training

1253
00:55:25,079 --> 00:55:28,380
pipeline with a different name".

1254
00:55:29,520 --> 00:55:31,559
So what do you think is the correct

1255
00:55:31,559 --> 00:55:33,960
answer? "A", "B", or "C"?

1256
00:55:33,960 --> 00:55:36,660
Also this time, I think it's something

1257
00:55:36,660 --> 00:55:39,300
we mentioned both in the decks and in

1258
00:55:39,300 --> 00:55:41,960
the demo right?

1259
00:55:42,599 --> 00:55:44,819
[JOHN]: Yes it is,

1260
00:55:44,819 --> 00:55:46,793
it's something that I have done

1261
00:55:46,793 --> 00:55:50,410
like two, like five minutes ago.

1262
00:55:51,800 --> 00:55:57,200
It's real-time, real-time.

1263
00:55:57,200 --> 00:55:58,760
[CARLOTTA]: Um,

1264
00:55:58,760 --> 00:56:02,040
yeah, so, think about...you need to deploy

1265
00:56:02,040 --> 00:56:05,460
the model as a service. So if I'm

1266
00:56:05,460 --> 00:56:07,980
going to deploy a model,

1267
00:56:07,980 --> 00:56:10,380
I cannot evaluate the model

1268
00:56:10,380 --> 00:56:12,839
after deploying it, right, because I

1269
00:56:12,839 --> 00:56:14,940
cannot go into production if I'm not

1270
00:56:14,940 --> 00:56:17,579
sure, I'm not satisfied with my model, and

1271
00:56:17,579 --> 00:56:19,500
I'm not sure that my model is performing

1272
00:56:19,500 --> 00:56:20,280
well.

1273
00:56:20,280 --> 00:56:22,900
So that's why I would go with,

1274
00:56:22,900 --> 00:56:24,319
um,

1275
00:56:24,319 --> 00:56:30,480
I would...exclude "B" from my

1276
00:56:30,480 --> 00:56:31,520
answer.

1277
00:56:31,520 --> 00:56:33,419
While

1278
00:56:33,419 --> 00:56:36,960
thinking about "C", uh, I don't see you—I

1279
00:56:36,960 --> 00:56:39,480
didn't see you, John, cloning the

1280
00:56:39,480 --> 00:56:41,420
training Pipeline with a different name,

1281
00:56:41,420 --> 00:56:44,640
so I don't think this is the

1282
00:56:44,640 --> 00:56:46,920
right answer.

1283
00:56:46,920 --> 00:56:49,619
While I've seen you creating an

1284
00:56:49,619 --> 00:56:52,729
inference pipeline from the

1285
00:56:52,729 --> 00:56:54,830
training pipeline, and you just converted

1286
00:56:54,830 --> 00:56:59,280
it using a one-click button, right?

1287
00:56:59,280 --> 00:57:01,400
[JOHN]: Yeah, that's correct.

1288
00:57:01,400 --> 00:57:04,280
So this is the right answer.

1289
00:57:04,280 --> 00:57:07,460
Good job. So I created an inference

1290
00:57:07,460 --> 00:57:11,280
real-time pipeline, and it has done.

1291
00:57:11,280 --> 00:57:13,440
It finished—it finished, the job is

1292
00:57:13,440 --> 00:57:18,000
finished. So we can now deploy.

1293
00:57:18,000 --> 00:57:19,400
And...

1294
00:57:19,400 --> 00:57:21,500
Yeah [LAUGHS].

1295
00:57:21,500 --> 00:57:25,339
Exactly, like, on time.

1296
00:57:25,339 --> 00:57:27,839
Like, it finished two seconds...

1297
00:57:27,839 --> 00:57:30,859
three, four seconds ago [LAUGHS].

1298
00:57:30,859 --> 00:57:33,119
So, uh,

1299
00:57:33,119 --> 00:57:36,480
until, um...

1300
00:57:36,480 --> 00:57:39,839
This is my job review, so

1301
00:57:39,839 --> 00:57:43,260
this is the job details that I

1302
00:57:43,260 --> 00:57:45,540
have already submitted, it's just opening,

1303
00:57:45,540 --> 00:57:47,459
and once it opens...

1304
00:57:47,459 --> 00:57:50,180
um...

1305
00:57:50,400 --> 00:57:52,740
I don't know why it's so heavy

1306
00:57:52,740 --> 00:57:56,780
today, it's not like that usually.

1307
00:57:57,780 --> 00:58:00,020
[CARLOTTA]: Yeah, it's probably because

1308
00:58:00,020 --> 00:58:01,020
you are also

1309
00:58:01,020 --> 00:58:06,000
showing your screen on Teams,

1310
00:58:06,000 --> 00:58:08,160
so that's the bandwidth of your

1311
00:58:08,160 --> 00:58:08,944
connection.

1312
00:58:08,944 --> 00:58:10,740
[JOHN]: Let me do something here

1313
00:58:10,740 --> 00:58:13,740
because...yeah finally.

1314
00:58:13,740 --> 00:58:16,440
I can switch to my mobile internet if it

1315
00:58:16,440 --> 00:58:18,599
did it again. So I will click on "Deploy",

1316
00:58:18,599 --> 00:58:20,700
it's that simple. I'll just click on

1317
00:58:20,700 --> 00:58:23,040
"Deploy" and...

1318
00:58:23,040 --> 00:58:25,619
I am going to deploy a new real-time

1319
00:58:25,619 --> 00:58:27,960
endpoint.

1320
00:58:27,960 --> 00:58:30,300
So what I'm going to name it?

1321
00:58:30,300 --> 00:58:31,870
Description and the compute type.

1322
00:58:31,870 --> 00:58:33,372
Everything is already mentioned

1323
00:58:33,372 --> 00:58:34,140
for me here,

1324
00:58:34,140 --> 00:58:36,240
so I'm just gonna copy and paste it,

1325
00:58:36,240 --> 00:58:38,940
because we...we are running

1326
00:58:38,940 --> 00:58:41,280
out of time.

1327
00:58:41,280 --> 00:58:44,230
So it's an Azure Container Instance,

1328
00:58:44,230 --> 00:58:46,360
not Azure Kubernetes Service,

1329
00:58:46,360 --> 00:58:48,720
which is a containerization service also.

1330
00:58:48,720 --> 00:58:50,867
Both are for containerization, but this

1331
00:58:50,867 --> 00:58:53,613
gives you something, and this gives you
something else.

1332
00:58:53,613 --> 00:58:54,960
For the advanced options,

1333
00:58:54,960 --> 00:58:57,420
it doesn't say for us to do anything, so

1334
00:58:57,420 --> 00:59:00,420
we are just gonna click on "Deploy",

1335
00:59:00,420 --> 00:59:05,220
and now we can test our endpoint from

1336
00:59:05,220 --> 00:59:07,859
the endpoints that we can find here, so

1337
00:59:07,859 --> 00:59:11,460
it's in progress. If I go here

1338
00:59:11,460 --> 00:59:13,799
under the assets, I can find something

1339
00:59:13,799 --> 00:59:16,680
called "Endpoints", and I can find the

1340
00:59:16,680 --> 00:59:18,599
real-time ones and the batch endpoints.

1341
00:59:18,599 --> 00:59:22,020
And we have created a real-time endpoint,

1342
00:59:22,020 --> 00:59:25,260
so we are going to find it under this

1343
00:59:25,260 --> 00:59:29,760
title. So if I click on it, I should

1344
00:59:29,760 --> 00:59:32,640
be able to test it once it's ready.

1345
00:59:32,640 --> 00:59:37,200
It's still loading, but this is the

1346
00:59:37,200 --> 00:59:40,980
input, and this is the output that we

1347
00:59:40,980 --> 00:59:44,652
will get back, so if I click on "Test"...

1348
00:59:44,652 --> 00:59:46,886
and from here,

1349
00:59:46,886 --> 00:59:49,810
I will input some data to the

1350
00:59:49,810 --> 00:59:50,900
endpoint,

1351
00:59:50,900 --> 00:59:54,599
which are: the patient information; the

1352
00:59:54,599 --> 00:59:57,119
columns that we have already seen in our

1353
00:59:57,119 --> 01:00:00,380
data set; the patient ID; the pregnancies.

1354
01:00:00,380 --> 01:00:03,960
And of course, of course I'm not gonna

1355
01:00:03,960 --> 01:00:05,940
enter the label that I'm trying to

1356
01:00:05,940 --> 01:00:08,099
predict, so I'm not going to give him if

1357
01:00:08,099 --> 01:00:10,360
the patient is diabetic or not. This

1358
01:00:10,360 --> 01:00:12,665
endpoint is to tell me this.

1359
01:00:12,665 --> 01:00:14,599
The endpoint, or the URL,

1360
01:00:14,599 --> 01:00:15,529
is going to give me

1361
01:00:15,529 --> 01:00:17,640
back this information, whether someone

1362
01:00:17,640 --> 01:00:22,680
has diabetes, or he doesn't. So if I input

1363
01:00:22,680 --> 01:00:24,780
this data, I'm just going to copy it,

1364
01:00:24,780 --> 01:00:27,780
and go to my endpoint, and click on

1365
01:00:27,780 --> 01:00:30,180
"Test", I'm gonna get the result back,

1366
01:00:30,180 --> 01:00:32,359
which are the three columns that we have

1367
01:00:32,359 --> 01:00:35,520
defined inside our Python script: the

1368
01:00:35,520 --> 01:00:37,859
patient ID, the diabetic prediction, and

1369
01:00:37,859 --> 01:00:41,040
the probability—the certainty of whether

1370
01:00:41,040 --> 01:00:45,720
someone is diabetic or not based on the...

1371
01:00:45,720 --> 01:00:49,090
uh...based on the prediction.

1372
01:00:49,090 --> 01:00:50,660
So that's it.

1373
01:00:50,660 --> 01:00:54,359
And, uh, I think that this is a really

1374
01:00:54,359 --> 01:00:56,729
simple step to do, you can do it on your

1375
01:00:56,729 --> 01:00:58,380
own, you can test it.

1376
01:00:58,380 --> 01:01:01,140
And I think that I have finished, so

1377
01:01:01,140 --> 01:01:03,020
thank you.

1378
01:01:03,020 --> 01:01:04,206
[CARLOTTA]: Uh, yes,

1379
01:01:04,206 --> 01:01:06,069
we are running out of time.

1380
01:01:06,069 --> 01:01:09,780
I just wanted to thank you, John, for

1381
01:01:09,780 --> 01:01:12,299
this demo, for going through all these

1382
01:01:12,299 --> 01:01:13,429
steps to

1383
01:01:13,429 --> 01:01:16,740
um, create, train a classification model,

1384
01:01:16,740 --> 01:01:19,680
and also deploy it as a predictive

1385
01:01:19,680 --> 01:01:22,880
service. And I encourage you all to go

1386
01:01:22,880 --> 01:01:25,079
back to the Learn module

1387
01:01:25,079 --> 01:01:28,260
and, um, deepen all these topics

1388
01:01:28,260 --> 01:01:31,760
at your own pace, and also maybe

1389
01:01:31,760 --> 01:01:34,799
uh do this demo on your own, on your

1390
01:01:34,799 --> 01:01:37,140
subscription, on your Azure for Students

1391
01:01:37,140 --> 01:01:39,359
subscription. Um...

1392
01:01:39,359 --> 01:01:43,200
And I would also like to recall that

1393
01:01:43,200 --> 01:01:46,140
this is part of a series of study

1394
01:01:46,140 --> 01:01:49,500
sessions of Cloud Skill Challenge study

1395
01:01:49,500 --> 01:01:51,059
sessions,

1396
01:01:51,059 --> 01:01:54,059
so you will have more in the...

1397
01:01:54,059 --> 01:01:57,540
in the following days, and this is for

1398
01:01:57,540 --> 01:02:00,480
you to prepare, let's say, to help you

1399
01:02:00,480 --> 01:02:04,880
in taking the Cloud Skills Challenge,

1400
01:02:04,880 --> 01:02:07,040
which collects

1401
01:02:07,040 --> 01:02:10,599
very interesting Learn modules that you

1402
01:02:10,599 --> 01:02:14,540
can use to skill up on various topics,

1403
01:02:14,540 --> 01:02:18,359
and some of them are focused on AI and

1404
01:02:18,359 --> 01:02:20,819
ML. So if you are interested in these

1405
01:02:20,819 --> 01:02:23,099
topics, you can select these Learn

1406
01:02:23,099 --> 01:02:24,780
modules.

1407
01:02:24,780 --> 01:02:27,660
So let me also copy

1408
01:02:27,660 --> 01:02:29,669
the link, the short link to the

1409
01:02:29,669 --> 01:02:32,420
challenge in the chat. Remember that

1410
01:02:32,420 --> 01:02:34,980
you have time until the 13th of

1411
01:02:34,980 --> 01:02:37,980
September to take the challenge. And also

1412
01:02:37,980 --> 01:02:40,440
remember that in October, on the 7th of

1413
01:02:40,440 --> 01:02:43,020
October, you have the—you can join the

1414
01:02:43,020 --> 01:02:46,619
student—the Student Developer Summit,

1415
01:02:46,619 --> 01:02:50,480
which is, uh, which will be a virtual or

1416
01:02:50,480 --> 01:02:53,220
in...for some...for some cases a hybrid

1417
01:02:53,220 --> 01:02:55,880
event, so stay tuned, because you will

1418
01:02:55,880 --> 01:02:58,559
have some surprises in the following

1419
01:02:58,559 --> 01:03:01,260
days. And if you want to learn more about

1420
01:03:01,260 --> 01:03:03,480
this event you can check the Microsoft

1421
01:03:03,480 --> 01:03:08,099
Imagine Cup Twitter page and stay tuned.

1422
01:03:08,099 --> 01:03:11,230
So thank you everyone for joining

1423
01:03:11,230 --> 01:03:12,989
this session today, and thank you very

1424
01:03:12,989 --> 01:03:16,500
much, John, for co-hosting this

1425
01:03:16,500 --> 01:03:20,359
session with me. It was a pleasure.

1426
01:03:21,227 --> 01:03:22,838
[JOHN]: Thank you so much,

1427
01:03:22,838 --> 01:03:23,969
Carlotta, for having me

1428
01:03:23,969 --> 01:03:26,249
with you today, and thank you for

1429
01:03:26,249 --> 01:03:27,670
giving me this opportunity to

1430
01:03:27,670 --> 01:03:30,180
be with you here.

1431
01:03:30,180 --> 01:03:32,070
[CARLOTTA]: Great, thank you.

1432
01:03:32,070 --> 01:03:33,420
[JOHN]: Yeah, I hope that we

1433
01:03:33,420 --> 01:03:35,390
work again in the future.

1434
01:03:35,390 --> 01:03:37,880
[CARLOTTA]: Sure, I hope so as well.

1435
01:03:37,880 --> 01:03:40,700
Um, so, thank you everyone.

1436
01:03:40,700 --> 01:03:43,749
And have a nice rest of your day.

1437
01:03:44,099 --> 01:03:46,500
Bye-bye. Speak to you soon.

1438
01:03:46,500 --> 01:03:48,920
[JOHN]: Bye.