0:00:01.920,0:00:04.072
[CARLOTTA]: Great, so I think we can start

0:00:04.072,0:00:06.340
since the meeting is recorded, so if

0:00:06.340,0:00:10.090
anyone, uh, jumps in later, they

0:00:10.090,0:00:12.420
can watch the recording.

0:00:12.420,0:00:15.780
So, hi everyone and welcome to this

0:00:15.780,0:00:18.000
um, Cloud Skills Challenge study session

0:00:18.000,0:00:20.880
around creating classification models

0:00:20.880,0:00:24.000
with Azure Machine Learning designer.

0:00:24.000,0:00:27.240
So today I'm thrilled to be here with

0:00:27.240,0:00:29.149
John. Uh, John do you mind

0:00:29.149,0:00:31.619
briefly introducing yourself?

0:00:31.619,0:00:33.160
[JOHN]: Uh, thank you Carlotta.

0:00:33.160,0:00:34.160
Hello everyone.

0:00:34.160,0:00:38.080
Welcome to our workshop today. I hope

0:00:38.080,0:00:40.559
that you are all excited for it. I am

0:00:40.559,0:00:43.140
John Aziz, a Gold Microsoft Learn Student

0:00:43.140,0:00:47.460
Ambassador, and I will be here with, uh,

0:00:47.460,0:00:50.760
Carlotta to do the practical part

0:00:50.760,0:00:53.820
about this module of the Cloud Skills

0:00:53.820,0:00:56.623
Challenge. Thank you for having me.

0:00:56.623,0:00:58.219
[CARLOTTA]: Perfect, thanks John.

0:00:58.219,0:00:59.623
So for those who

0:00:59.623,0:01:03.440
don't know me, I'm Carlotta Castelluccio,

0:01:03.440,0:01:06.479
based in Italy and focused on AI

0:01:06.479,0:01:08.760
machine learning technologies and

0:01:08.760,0:01:11.200
their use in education.

0:01:11.200,0:01:12.340
Um, so,

0:01:12.737,0:01:14.537
um, this Cloud Skills Challenge study

0:01:14.537,0:01:17.117
session is based on a learn module, a

0:01:17.120,0:01:21.080
dedicated learn module. I sent to you, uh

0:01:21.320,0:01:23.939
the link to this module, uh, in the chat

0:01:23.939,0:01:25.619
in a way that you can follow along the

0:01:25.619,0:01:28.680
module if you want, or just have a look at

0:01:28.680,0:01:32.470
the module later at your own pace.

0:01:32.470,0:01:33.780
Um...

0:01:33.780,0:01:37.020
So, before starting I would also like to

0:01:37.020,0:01:40.619
remind you, uh, of the code of

0:01:40.619,0:01:43.439
conduct and guidelines of our student

0:01:43.439,0:01:47.510
ambassadors community. So please during this

0:01:47.510,0:01:51.000
meeting be respectful and inclusive and

0:01:51.000,0:01:53.579
be friendly, open, and welcoming and

0:01:53.579,0:01:56.159
respectful of each other's

0:01:56.159,0:01:57.720
differences.

0:01:57.720,0:02:01.200
If you want to learn more about the code

0:02:01.200,0:02:03.390
of conduct, you can use this link in the

0:02:03.390,0:02:08.880
deck: aka.ms/SACoC.

0:02:09.660,0:02:11.730
And now we are,

0:02:11.730,0:02:15.420
um, we are ready to to start our session.

0:02:15.420,0:02:18.959
So, as we mentioned, we are going to

0:02:18.959,0:02:21.980
focus on classification models and Azure ML,

0:02:21.980,0:02:24.900
uh, today. So, first of all, we are going

0:02:24.900,0:02:28.430
to, um, identify, uh, the kind of

0:02:28.430,0:02:31.080
um, of scenarios in which you should

0:02:31.080,0:02:34.490
choose to use a classification model.

0:02:34.490,0:02:36.660
We're going to introduce Azure Machine

0:02:36.660,0:02:39.060
Learning and Azure Machine Learning designer.

0:02:39.060,0:02:41.879
We're going to understand, uh, which are

0:02:41.879,0:02:43.680
the steps to follow, to create a

0:02:43.680,0:02:46.200
classification model in Azure Machine

0:02:46.200,0:02:48.076
Learning, and then John will,

0:02:48.076,0:02:49.500
um,

0:02:49.500,0:02:52.219
lead an amazing demo about training and

0:02:52.219,0:02:54.300
publishing a classification model in

0:02:54.300,0:02:57.000
Azure ML Designer.

0:02:57.000,0:02:59.819
So, let's start from the beginning. Let's

0:02:59.819,0:03:02.640
start from identifying classification

0:03:02.640,0:03:05.220
machine learning scenarios.

0:03:05.220,0:03:07.640
So, first of all, what is classification?

0:03:07.640,0:03:09.959
Classification is a form of machine

0:03:09.959,0:03:12.120
learning that is used to predict which

0:03:12.120,0:03:15.599
category or class an item belongs to. For

0:03:15.599,0:03:17.340
example, we might want to develop a

0:03:17.340,0:03:19.800
classifier able to identify if an

0:03:19.800,0:03:22.200
incoming email should be filtered or not

0:03:22.200,0:03:25.080
according to the style, the sender, the

0:03:25.080,0:03:26.935
length of the email, etc.

0:03:26.935,0:03:28.140
In this case, the

0:03:28.140,0:03:30.060
characteristics of the email are the

0:03:30.060,0:03:31.080
features.

0:03:31.080,0:03:34.200
And the label is a classification of

0:03:34.200,0:03:38.099
either a zero or one, representing a spam

0:03:38.099,0:03:40.860
or non-spam for the incoming email. So

0:03:40.860,0:03:42.360
this is an example of a binary

0:03:42.360,0:03:44.400
classifier. If you want to assign

0:03:44.400,0:03:46.260
multiple categories to the incoming

0:03:46.260,0:03:48.959
email like work letters, love letters,

0:03:48.959,0:03:52.080
complaints, or other categories, in this

0:03:52.080,0:03:54.000
case a binary classifier is no longer

0:03:54.000,0:03:55.739
enough, and we should develop a

0:03:55.739,0:03:58.319
multi-class classifier. So classification

0:03:58.319,0:04:00.599
is an example of what is called

0:04:00.599,0:04:02.519
supervised machine learning

0:04:02.519,0:04:05.280
in which you train a model using data

0:04:05.280,0:04:07.080
that includes both the features and

0:04:07.080,0:04:08.879
known values for the label

0:04:08.879,0:04:11.099
so that the model learns to fit the

0:04:11.099,0:04:13.560
feature combinations to the label. Then,

0:04:13.560,0:04:15.420
after training has been completed, you

0:04:15.420,0:04:17.040
can use the trained model to predict

0:04:17.040,0:04:19.500
labels for new items for which the

0:04:19.500,0:04:22.320
label is unknown.
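
The train-then-predict idea described above can be sketched in a few lines of plain Python. This is only an illustration, not how Azure Machine Learning trains a model: the two email features (number of links, number of exclamation marks), the convention that 1 means spam and 0 means not spam, and the nearest-centroid rule are all assumptions made for the example.

```python
from statistics import mean


def train(features, labels):
    """Learn one centroid (mean feature vector) per class label."""
    centroids = {}
    for label in set(labels):
        rows = [f for f, y in zip(features, labels) if y == label]
        centroids[label] = [mean(column) for column in zip(*rows)]
    return centroids


def predict(centroids, item):
    """Return the label whose centroid is closest to the item."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], item))


# Hypothetical training data: [links, exclamation marks].
# Assumed label convention: 1 = spam, 0 = not spam.
train_features = [[8, 5], [7, 6], [9, 4], [0, 0], [1, 1], [0, 1]]
train_labels = [1, 1, 1, 0, 0, 0]

model = train(train_features, train_labels)

# A new email whose label is unknown.
new_email = [6, 5]
predicted_label = predict(model, new_email)
print(predicted_label)
```

The model only ever sees features and known labels during training; the label of `new_email` is produced by the model afterwards.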

0:04:22.320,0:04:25.440
But let's see some examples of scenarios

0:04:25.440,0:04:27.120
for classification machine learning

0:04:27.120,0:04:29.160
models. So, we already mentioned an

0:04:29.160,0:04:31.020
example of a solution in which we would

0:04:31.020,0:04:33.660
need a classifier, but let's explore

0:04:33.660,0:04:35.699
other scenarios for classification in

0:04:35.699,0:04:37.979
other industries. For example, you can use

0:04:37.979,0:04:40.380
a classification model for a health

0:04:40.380,0:04:43.680
clinic scenario, and use clinical data to

0:04:43.680,0:04:45.720
predict whether a patient will become sick

0:04:45.720,0:04:47.060
or not.

0:04:47.060,0:04:49.553
You can use, um...

0:04:49.553,0:04:59.250
[NO AUDIO]

0:04:59.250,0:05:00.930
[JOHN]: Carlotta, you are muted.

0:05:03.780,0:05:07.700
[CARLOTTA]: Oh, sorry. So, when I became muted, it's a

0:05:07.700,0:05:08.807
long time, or?

0:05:08.807,0:05:11.940
[JOHN]: You can use-you can use, uh

0:05:11.940,0:05:13.430
some models for classification.

0:05:13.430,0:05:14.729
For example, you can use...

0:05:14.729,0:05:16.919
You were saying this.

0:05:16.919,0:05:20.020
[CARLOTTA]: Uh, so I was in this deck,

0:05:20.020,0:05:21.660
or the previous one?

0:05:21.660,0:05:24.180
[JOHN]: This one, you have been muted

0:05:24.180,0:05:25.901
for, uh, one second [LAUGHS].

0:05:25.901,0:05:28.018
[CARLOTTA]: Okay, okay perfect, perfect.

0:05:28.018,0:05:30.419
Uh, yeah I was talking...sorry for

0:05:30.419,0:05:33.278
that. So, I was talking about the possible

0:05:33.278,0:05:34.560
scenarios in which you,

0:05:34.560,0:05:37.320
you can use a classification model. Like

0:05:37.320,0:05:39.660
health clinic scenario, financial scenario,

0:05:39.660,0:05:41.699
or the third one is business type of

0:05:41.699,0:05:44.100
scenario. You can use characteristics of

0:05:44.100,0:05:45.900
small business to predict if a new

0:05:45.900,0:05:47.880
venture will succeed or not, for

0:05:47.880,0:05:49.560
example. And these are all types of

0:05:49.560,0:05:52.160
binary classification.

0:05:52.160,0:05:55.199
Uh, but today we are also going to talk

0:05:55.199,0:05:57.240
about Azure Machine Learning. So let's

0:05:57.240,0:05:58.139
see.

0:05:58.139,0:06:00.660
What is Azure Machine Learning? So

0:06:00.660,0:06:02.160
training and deploying an effective

0:06:02.160,0:06:04.199
machine learning model involves a lot of

0:06:04.199,0:06:06.539
work, much of it time-consuming and

0:06:06.539,0:06:08.880
resource intensive. So, Azure Machine

0:06:08.880,0:06:11.039
Learning is a cloud-based service that

0:06:11.039,0:06:12.780
helps simplify some of the tasks it

0:06:12.780,0:06:15.720
takes to prepare data, train a model, and

0:06:15.720,0:06:18.060
also deploy it as a predictive service.

0:06:18.060,0:06:20.220
So it helps data scientists increase

0:06:20.220,0:06:22.380
their efficiency by automating many of

0:06:22.380,0:06:24.660
the time-consuming tasks associated with

0:06:24.660,0:06:27.539
creating and training a model.

0:06:27.539,0:06:29.520
And it enables them also to use

0:06:29.520,0:06:31.740
cloud-based compute resources that scale

0:06:31.740,0:06:33.720
effectively to handle large volumes of

0:06:33.720,0:06:36.300
data while incurring costs only when

0:06:36.300,0:06:38.699
actually used.

0:06:38.699,0:06:41.220
To use Azure Machine Learning, you,

0:06:41.220,0:06:43.199
first things first, you need to create a

0:06:43.199,0:06:44.940
workspace resource in your Azure

0:06:44.940,0:06:47.520
subscription, and you can then use this

0:06:47.520,0:06:50.220
workspace to manage data, compute

0:06:50.220,0:06:52.440
resources, code, models, and other

0:06:52.440,0:06:54.959
artifacts. After you have created an

0:06:54.959,0:06:56.519
Azure Machine Learning workspace,

0:06:56.519,0:06:57.808
you can develop solutions with the

0:06:57.808,0:06:59.338
Azure Machine Learning service,

0:06:59.338,0:07:00.840
either with developer

0:07:00.840,0:07:02.580
tools or the Azure Machine Learning

0:07:02.580,0:07:04.088
studio web portal.

0:07:04.088,0:07:06.440
In particular, Azure Machine Learning studio

0:07:06.440,0:07:07.800
is a web portal for machine

0:07:07.800,0:07:09.720
learning solutions in Azure, and it

0:07:09.720,0:07:11.639
includes a wide range of features and

0:07:11.639,0:07:13.800
capabilities that help data scientists

0:07:13.800,0:07:16.259
prepare data, train models, publish

0:07:16.259,0:07:18.479
predictive services, and monitor also

0:07:18.479,0:07:19.680
their usage.

0:07:19.680,0:07:22.139
So to begin using the web portal, you need

0:07:22.139,0:07:23.294
to assign the workspace

0:07:23.294,0:07:24.781
you created in the Azure portal

0:07:24.781,0:07:26.819
to the Azure Machine

0:07:26.819,0:07:29.520
Learning studio.

0:07:29.520,0:07:31.800
At its core, Azure Machine Learning is a

0:07:31.800,0:07:33.720
service for training and managing

0:07:33.720,0:07:36.000
machine learning models for which you

0:07:36.000,0:07:38.220
need compute resources on which to run

0:07:38.220,0:07:39.919
the training process.

0:07:39.919,0:07:44.280
Compute targets are, um, one of the main

0:07:44.280,0:07:46.740
basic concepts of Azure Machine Learning.

0:07:46.740,0:07:48.780
They are cloud-based resources on which

0:07:48.780,0:07:50.639
you can run model training and data

0:07:50.639,0:07:53.220
exploration processes.

0:07:53.220,0:07:54.780
So in Azure Machine Learning studio, you

0:07:54.780,0:07:56.759
can manage the compute targets for your

0:07:56.759,0:07:58.740
data science activities, and there are

0:07:58.740,0:08:03.240
four kinds of of compute targets you can

0:08:03.240,0:08:05.940
create. We have the compute instances,

0:08:05.940,0:08:09.539
which are virtual machines set up for

0:08:09.539,0:08:10.979
running machine learning code during

0:08:10.979,0:08:13.319
development, so they are not designed for

0:08:13.319,0:08:14.460
production.

0:08:14.460,0:08:17.099
Then we have compute clusters, which are

0:08:17.099,0:08:19.800
a set of virtual machines that can scale

0:08:19.800,0:08:22.199
up automatically based on traffic.

0:08:22.199,0:08:24.599
We have inference clusters, which are

0:08:24.599,0:08:26.699
similar to compute clusters, but they are

0:08:26.699,0:08:29.340
designed for deployment, so they are

0:08:29.340,0:08:31.979
deployment targets for predictive

0:08:31.979,0:08:35.820
services that use trained models.

0:08:35.820,0:08:38.339
And finally, we have attached compute,

0:08:38.339,0:08:41.339
which are any compute target that you

0:08:41.339,0:08:44.159
manage yourself outside of Azure ML, like,

0:08:44.159,0:08:46.560
for example, virtual machines or Azure

0:08:46.560,0:08:49.700
Databricks clusters.

0:08:49.980,0:08:52.800
So we talked about Azure Machine

0:08:52.800,0:08:54.300
Learning, but we also mentioned

0:08:54.300,0:08:55.500
Azure Machine Learning

0:08:55.500,0:08:57.540
designer. What is Azure Machine Learning

0:08:57.540,0:09:00.120
designer? So, in Azure Machine Learning

0:09:00.120,0:09:02.880
Studio, there are several ways to author

0:09:02.880,0:09:04.560
classification machine learning models.

0:09:04.560,0:09:08.100
One way is to use a visual interface, and

0:09:08.100,0:09:10.260
this visual interface is called designer,

0:09:10.260,0:09:13.140
and you can use it to train, test, and

0:09:13.140,0:09:15.540
also deploy machine learning models. And

0:09:15.540,0:09:17.940
the drag-and-drop interface makes use of

0:09:17.940,0:09:20.279
clearly defined inputs and outputs that

0:09:20.279,0:09:22.680
can be shared, reused, and also version

0:09:22.680,0:09:23.880
controlled.

0:09:23.880,0:09:25.920
And using the designer, you can identify

0:09:25.920,0:09:28.080
the building blocks or components needed

0:09:28.080,0:09:30.839
for your model, place and connect them on

0:09:30.839,0:09:33.120
your canvas, and run a machine learning

0:09:33.120,0:09:35.300
job.

0:09:35.399,0:09:36.779
So,

0:09:36.779,0:09:39.120
each designer project, so each project

0:09:39.120,0:09:42.360
in the designer is known as a pipeline.

0:09:42.360,0:09:45.600
And in the designer, we have a left panel

0:09:45.600,0:09:48.360
for navigation and a canvas on your

0:09:48.360,0:09:50.640
right hand side in which you build your

0:09:50.640,0:09:53.940
pipeline visually. So pipelines let you

0:09:53.940,0:09:56.100
organize, manage, and reuse complex

0:09:56.100,0:09:58.260
machine learning workflows across

0:09:58.260,0:10:00.480
projects and users.

0:10:00.480,0:10:03.000
A pipeline starts with the data set from

0:10:03.000,0:10:04.140
which you want to train the model

0:10:04.140,0:10:05.880
because it all begins with data when

0:10:05.880,0:10:07.380
talking about data science and machine

0:10:07.380,0:10:09.540
learning. And each time you run a

0:10:09.540,0:10:10.980
pipeline, the configuration of the

0:10:10.980,0:10:12.959
pipeline and its results are stored in

0:10:12.959,0:10:17.339
your workspace as a pipeline job.

0:10:17.339,0:10:21.959
So the second main concept of Azure

0:10:21.959,0:10:25.080
Machine Learning is a component. So, going

0:10:25.080,0:10:28.440
hierarchically from the pipeline, we can

0:10:28.440,0:10:30.540
say that each building block of a

0:10:30.540,0:10:32.920
pipeline is called a component.

0:10:32.920,0:10:34.120
In other words, an Azure Machine

0:10:34.120,0:10:36.959
Learning component encapsulates one step

0:10:36.959,0:10:39.420
in a machine learning pipeline. So, it's a

0:10:39.420,0:10:41.640
reusable piece of code with inputs and

0:10:41.640,0:10:44.100
outputs, something very similar to a

0:10:44.100,0:10:46.500
function in any programming language.
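
To make the function analogy concrete, here is a minimal sketch in plain Python. It does not use the Azure Machine Learning SDK; `clean_missing`, `normalize`, and `run_pipeline` are hypothetical names invented for this illustration, each standing in for one building block on the canvas.

```python
def clean_missing(rows):
    """Hypothetical component: drop rows that contain a missing value."""
    return [row for row in rows if None not in row]


def normalize(rows):
    """Hypothetical component: rescale each column to the 0-1 range."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def run_pipeline(data, components):
    """Feed the output of each component into the next one, in order."""
    for component in components:
        data = component(data)
    return data


raw = [[1, 10], [None, 20], [3, 30]]
result = run_pipeline(raw, [clean_missing, normalize])
print(result)
```

Each component has one input and one output, so the same `normalize` step could be reused in another pipeline without changes.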

0:10:46.500,0:10:48.899
And in a pipeline project, you can access

0:10:48.899,0:10:51.480
data assets and components from the left

0:10:51.480,0:10:52.700
panel's

0:10:52.700,0:10:56.279
Asset Library tab, as you can see

0:10:56.279,0:11:00.200
here in the screenshot in the deck.

0:11:00.300,0:11:03.360
So you can create data assets using

0:11:03.360,0:11:08.339
an ad hoc page called the Data page. And a data

0:11:08.339,0:11:11.160
asset is a reference to a data source

0:11:11.160,0:11:12.480
location.

0:11:12.480,0:11:15.720
So this data source location could be a

0:11:15.720,0:11:18.779
local file, a data store, a web file or

0:11:18.779,0:11:21.660
even an Azure Open Dataset.

0:11:21.660,0:11:23.880
And these data assets will appear along

0:11:23.880,0:11:26.459
with standard sample datasets in the

0:11:26.459,0:11:30.019
designer's Asset Library.

0:11:30.079,0:11:31.560
Um.

0:11:31.560,0:11:36.959
Another basic concept of Azure ML is

0:11:36.959,0:11:38.880
Azure Machine Learning jobs.

0:11:38.880,0:11:43.519
So, basically, when you submit a pipeline,

0:11:43.519,0:11:47.040
you create a job which will run all the

0:11:47.040,0:11:49.920
steps in your pipeline. So a job executes

0:11:49.920,0:11:52.800
a task against a specified compute

0:11:52.800,0:11:53.760
target.

0:11:53.760,0:11:56.640
Jobs enable systematic tracking for your

0:11:56.640,0:11:58.560
machine learning experimentation in

0:11:58.560,0:11:59.880
Azure ML.

0:11:59.880,0:12:02.399
And once a job is created, Azure ML

0:12:02.399,0:12:05.459
maintains a run record, uh, for the

0:12:05.459,0:12:07.640
job.

0:12:07.877,0:12:12.180
Um, but, let's move to the classification

0:12:12.180,0:12:14.040
steps. So,

0:12:14.040,0:12:17.160
um, let's introduce how to create a

0:12:17.160,0:12:21.360
classification model in Azure ML, but you

0:12:21.360,0:12:23.640
will see it in more detail in a

0:12:23.640,0:12:26.339
hands-on demo that John will guide

0:12:26.339,0:12:29.459
through in a few minutes.

0:12:29.459,0:12:32.220
So, you can think of the steps to train

0:12:32.220,0:12:33.720
and evaluate a classification machine

0:12:33.720,0:12:36.660
learning model as four main steps. So

0:12:36.660,0:12:38.459
first of all, you need to prepare your

0:12:38.459,0:12:41.100
data. So, you need to identify the

0:12:41.100,0:12:43.139
features and the label in your data set,

0:12:43.139,0:12:46.139
you need to pre-process, so you need to

0:12:46.139,0:12:48.839
clean and transform the data as needed.

0:12:48.839,0:12:51.120
Then, the second step, of course, is

0:12:51.120,0:12:52.740
training the model.

0:12:52.740,0:12:54.600
And for training the model, you need to

0:12:54.600,0:12:57.060
split the data into two groups: a

0:12:57.060,0:12:59.519
training and a validation set.

0:12:59.519,0:13:01.320
Then you train a machine learning model

0:13:01.320,0:13:03.540
using the training data set and you test

0:13:03.540,0:13:05.040
the machine learning model for

0:13:05.040,0:13:06.889
performance using the validation data

0:13:06.889,0:13:08.100
set.
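
The split into a training set and a validation set can be sketched in plain Python as follows. In the designer this is done by a pre-built component rather than by code; the 70/30 ratio, the fixed seed, and the function name `split_data` are assumptions chosen for the illustration.

```python
import random


def split_data(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and split it into training and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))
training_set, validation_set = split_data(rows)
print(len(training_set), len(validation_set))
```

Every row lands in exactly one of the two sets, so the validation set contains only data the model has not seen during training.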

0:13:08.100,0:13:12.180
The third step is performance evaluation,

0:13:12.180,0:13:14.519
which means comparing how close the

0:13:14.519,0:13:16.139
model's predictions are to the known

0:13:16.139,0:13:20.519
labels, and this leads us to compute some

0:13:20.519,0:13:23.279
evaluation performance metrics.

0:13:23.279,0:13:25.740
And then finally...

0:13:25.740,0:13:29.051
So, these three steps are not,

0:13:29.051,0:13:32.770
um, not performed every time in a

0:13:32.770,0:13:35.459
linear manner. It's more an iterative

0:13:35.459,0:13:39.420
process. But once you obtain, you achieve

0:13:39.420,0:13:42.959
a performance with which you are

0:13:42.959,0:13:45.779
satisfied, so you are ready to, let's say

0:13:45.779,0:13:48.660
go into production, and you can deploy

0:13:48.660,0:13:51.920
your trained model as a predictive service

0:13:51.920,0:13:55.980
into a real-time, uh, to a real-time

0:13:55.980,0:13:58.019
endpoint. And to do so, you need to

0:13:58.019,0:14:00.240
convert the training pipeline into a

0:14:00.240,0:14:02.820
real-time inference pipeline, and then

0:14:02.820,0:14:04.260
you can deploy the model as an

0:14:04.260,0:14:06.779
application on a server or device so

0:14:06.779,0:14:11.420
that others can consume this model.

0:14:11.459,0:14:14.279
So let's start with the first step, which

0:14:14.279,0:14:17.700
is prepare data. Real-world data can contain

0:14:17.700,0:14:19.920
many different issues that can affect

0:14:19.920,0:14:22.320
the utility of the data and our

0:14:22.320,0:14:24.959
interpretation of the results. So also

0:14:24.959,0:14:26.579
the machine learning model that you

0:14:26.579,0:14:29.279
train using this data. For example, real-

0:14:29.279,0:14:31.440
world data can be affected by a bad

0:14:31.440,0:14:34.079
recording or a bad measurement, and it

0:14:34.079,0:14:36.480
can also contain missing values for some

0:14:36.480,0:14:38.880
parameters. And Azure Machine Learning

0:14:38.880,0:14:40.860
designer has several pre-built

0:14:40.860,0:14:43.019
components that can be used to prepare

0:14:43.019,0:14:46.079
data for training. These components

0:14:46.079,0:14:48.300
enable you to clean data, normalize

0:14:48.300,0:14:52.940
features, join tables, and more.

0:14:53.000,0:14:57.120
Let's come to training. So, to train a

0:14:57.120,0:14:59.220
classification model you need a data set

0:14:59.220,0:15:02.160
that includes historical features, so the

0:15:02.160,0:15:03.899
characteristics of the entity for which

0:15:03.899,0:15:06.899
you want to make a prediction, and known label

0:15:06.899,0:15:09.779
values. The label is the class indicator

0:15:09.779,0:15:11.820
we want to train a model to predict.

0:15:11.820,0:15:13.920
And it's common practice to train a

0:15:13.920,0:15:16.199
model using a subset of the data while

0:15:16.199,0:15:18.300
holding back some data with which to

0:15:18.300,0:15:20.760
test the trained model. And this enables

0:15:20.760,0:15:22.440
you to compare the labels that the model

0:15:22.440,0:15:25.380
predicts with the actual known labels in

0:15:25.380,0:15:27.420
the original data set.

0:15:27.420,0:15:29.880
This operation can be performed in the

0:15:29.880,0:15:32.100
designer using the split data component

0:15:32.100,0:15:34.740
as shown by the screenshot here in the...

0:15:34.740,0:15:36.660
in the deck.

0:15:36.660,0:15:39.540
There's also another component that you

0:15:39.540,0:15:40.980
should use, which is the score model

0:15:40.980,0:15:43.139
component to generate the predicted

0:15:43.139,0:15:45.360
class label value using the validation

0:15:45.360,0:15:48.060
data as input. So once you connect all

0:15:48.060,0:15:49.800
these components,

0:15:49.800,0:15:52.440
the component specifying the

0:15:52.440,0:15:54.959
model we are going to use, the split data

0:15:54.959,0:15:57.060
component, the Train Model component,

0:15:57.060,0:16:00.300
and the score model component, you want

0:16:00.300,0:16:02.639
to run a new experiment in

0:16:02.639,0:16:05.760
Azure ML, which will use the data set

0:16:05.760,0:16:09.600
on the canvas to train and score a model.

0:16:09.600,0:16:12.000
After training a model, it is important,

0:16:12.000,0:16:14.639
we say, to evaluate its performance, to

0:16:14.639,0:16:17.060
understand how bad-how good sorry

0:16:17.060,0:16:20.760
our model is performing.

0:16:20.760,0:16:22.680
And there are many performance metrics

0:16:22.680,0:16:24.600
and methodologies for evaluating how

0:16:24.600,0:16:27.000
well a model makes predictions. The

0:16:27.000,0:16:29.160
component to use to perform evaluation

0:16:29.160,0:16:32.220
in Azure ML designer is called, as

0:16:32.220,0:16:35.060
intuitive as it is, Evaluate Model.

0:16:35.060,0:16:38.339
Once the job of training and evaluation

0:16:38.339,0:16:40.740
of the model is completed, you can review

0:16:40.740,0:16:42.959
evaluation metrics on the completed job

0:16:42.959,0:16:45.860
page by right clicking on the component.

0:16:45.860,0:16:48.480
In the evaluation results, you can also

0:16:48.480,0:16:51.000
find the so-called confusion matrix that

0:16:51.000,0:16:53.399
you can see here on the right side of

0:16:53.399,0:16:55.079
this deck.

0:16:55.079,0:16:57.420
A confusion matrix shows cases where

0:16:57.420,0:16:59.220
both the predicted and actual values

0:16:59.220,0:17:01.980
were one, the so-called true positives

0:17:01.980,0:17:04.500
at the top left and also cases where

0:17:04.500,0:17:06.600
both the predicted and the actual values

0:17:06.600,0:17:08.459
were zero, the so-called true negatives

0:17:08.459,0:17:10.919
at the bottom right. While the other

0:17:10.919,0:17:13.679
cells show cases where the predicted

0:17:13.679,0:17:15.380
and actual values differ,

0:17:15.380,0:17:17.939
called false positives and false

0:17:17.939,0:17:19.919
negatives, and this is an example of a

0:17:19.919,0:17:23.579
confusion matrix for a binary classifier.

0:17:23.579,0:17:25.559
While for a multi-class classification

0:17:25.559,0:17:28.079
model the same approach is used to

0:17:28.079,0:17:30.120
tabulate each possible combination of

0:17:30.120,0:17:32.940
actual and predicted value counts. So

0:17:32.940,0:17:34.740
for example, a model with three possible

0:17:34.740,0:17:37.559
classes would result in a three-by-three

0:17:37.559,0:17:39.120
matrix.

0:17:39.120,0:17:41.880
The confusion matrix is also useful for

0:17:41.880,0:17:43.860
the metrics that can be derived from it,

0:17:43.860,0:17:48.260
like accuracy, recall, or precision.
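
As a rough sketch of how those numbers come out of the matrix, the plain-Python example below tabulates a confusion matrix and derives accuracy, precision, and recall from it. The ten actual and predicted labels are made up, and putting actual values in rows and predicted values in columns is an assumption; the layout shown in the studio may differ.

```python
def confusion_matrix(actual, predicted, classes):
    """Count each (actual, predicted) pair; rows are actual, columns are predicted."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Made-up labels for ten items (1 = positive class, 0 = negative class).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# Listing class 1 first puts true positives at the top left
# and true negatives at the bottom right.
matrix = confusion_matrix(actual, predicted, classes=[1, 0])
(tp, fn), (fp, tn) = matrix

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that were correct
precision = tp / (tp + fp)                  # share of predicted positives that were right
recall = tp / (tp + fn)                     # share of actual positives that were found
print(matrix, accuracy, precision, recall)
```

Passing three class labels to the same `confusion_matrix` function gives a three-by-three table, which matches the multi-class case described above.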

0:17:49.320,0:17:52.080
We say that the last step is

0:17:52.080,0:17:55.620
deploying the trained model to a real-time

0:17:55.620,0:17:59.280
endpoint as a predictive service. And in

0:17:59.280,0:18:00.900
order to automate your model into a

0:18:00.900,0:18:02.760
service that makes continuous

0:18:02.760,0:18:04.980
predictions, you need, first of all, to

0:18:04.980,0:18:08.039
create and then deploy an

0:18:08.039,0:18:10.080
inference pipeline. The process of

0:18:10.080,0:18:11.940
converting the training pipeline into a

0:18:11.940,0:18:13.980
real-time inference pipeline removes

0:18:13.980,0:18:16.260
training components and adds web service

0:18:16.260,0:18:18.960
inputs and outputs to handle requests.

0:18:18.960,0:18:21.240
And the inference pipeline performs the

0:18:21.240,0:18:22.679
same data transformations as the

0:18:22.679,0:18:26.160
first pipeline, but for new data. Then it

0:18:26.160,0:18:28.679
uses the trained model to infer or predict

0:18:28.679,0:18:32.539
label values based on its features.
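
The point that inference reuses the training-time transformations can be sketched in plain Python. This is not the designer's real-time inference pipeline; `fit_scaler`, `transform`, `make_inference_function`, and the threshold rule standing in for a trained model are all hypothetical names and choices made for this illustration.

```python
def fit_scaler(rows):
    """Learn per-column minimum and maximum from the training data."""
    columns = list(zip(*rows))
    return [min(column) for column in columns], [max(column) for column in columns]


def transform(row, scaler):
    """Apply the min-max scaling learned during training to one row."""
    lows, highs = scaler
    return [(v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs)]


def trained_model(features):
    """Stand-in for a trained classifier: a fixed threshold on the mean feature."""
    return 1 if sum(features) / len(features) > 0.5 else 0


def make_inference_function(scaler, model):
    """Bundle the training-time transformation with the trained model."""
    def infer(raw_row):
        return model(transform(raw_row, scaler))

    return infer


# Hypothetical training data; the scaler is fitted once, at training time.
training_rows = [[20, 100], [60, 200]]
scaler = fit_scaler(training_rows)

infer = make_inference_function(scaler, trained_model)
print(infer([50, 180]))
```

New data goes through the scaler fitted on the training data, not a freshly fitted one, so the model receives features on the same scale it was trained on.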

0:18:32.820,0:18:36.120
So, I think I've talked a lot for now

0:18:36.120,0:18:40.380
I would like to let John show us

0:18:40.380,0:18:44.340
something in practice with

0:18:44.340,0:18:47.280
the hands-on demo, so please, John, go

0:18:47.280,0:18:49.860
ahead, share your screen and guide us

0:18:49.860,0:18:52.380
through this demo of creating a

0:18:52.380,0:18:53.425
classification model with

0:18:53.425,0:18:55.860
the Azure Machine Learning designer.

0:18:55.860,0:18:58.509
[JOHN]: Thank you so much Carlotta for

0:18:58.509,0:19:00.690
this interesting explanation of the

0:19:00.690,0:19:03.810
Azure ML designer. And now,

0:19:03.810,0:19:07.500
um, I'm going to start with you in the

0:19:07.500,0:19:10.200
practical demo part, so if you want to

0:19:10.200,0:19:13.320
follow along, go to the link that Carlotta

0:19:13.320,0:19:18.380
sent in the chat so you can do

0:19:18.380,0:19:21.840
the demo or the practical part with me.

0:19:21.840,0:19:25.260
I'm just going to share my screen...

0:19:25.260,0:19:27.140
and...

0:19:27.140,0:19:31.559
...go here. So, uh...

0:19:31.559,0:19:34.320
Where am I right now? I'm inside the

0:19:34.320,0:19:36.960
Microsoft Learn documentation. This is

0:19:36.960,0:19:40.260
the exercise part of this module, and we

0:19:40.260,0:19:43.080
will start by setting two things, which

0:19:43.080,0:19:45.299
are a prerequisite for us to work inside

0:19:45.299,0:19:49.919
this module, which are the resource group

0:19:49.919,0:19:52.400
and the Azure Machine Learning workspace,

0:19:52.400,0:19:55.620
and something extra which is the compute

0:19:55.620,0:19:59.760
cluster that Carlotta talked about. So I

0:19:59.760,0:20:02.100
just want to make sure that you all have

0:20:02.100,0:20:05.660
a resource group created inside your

0:20:05.660,0:20:08.039
portal inside your Microsoft Azure

0:20:08.039,0:20:11.100
platform. So this is my resource group.

0:20:11.100,0:20:14.640
Inside this resource group, I

0:20:14.640,0:20:17.299
have created an Azure Machine Learning

0:20:17.299,0:20:21.539
workspace. So I'm just going to access

0:20:21.539,0:20:24.000
the workspace that I have created

0:20:24.000,0:20:27.000
already from this link. I am going to

0:20:27.000,0:20:30.240
open it, which is the studio web URL, and

0:20:30.240,0:20:33.000
I will follow the steps. So what is this?

0:20:33.000,0:20:35.760
This is your machine learning workspace,

0:20:35.760,0:20:38.220
or machine learning studio. You can do a

0:20:38.220,0:20:40.080
lot of things here, but we are going to

0:20:40.080,0:20:42.419
focus mainly on the designer and the

0:20:42.419,0:20:46.080
data and the compute. So another

0:20:46.080,0:20:49.140
prerequisite here, as Carlotta told you,

0:20:49.140,0:20:51.480
we need some resources to power up the

0:20:51.480,0:20:54.299
classification, the processes that

0:20:54.299,0:20:55.140
will happen.

0:20:55.140,0:20:58.080
So, we have created this computing

0:20:58.080,0:20:59.100
cluster,

0:20:59.100,0:21:02.880
and we have set some presets for

0:21:02.880,0:21:04.140
it. So

0:21:04.140,0:21:07.080
where can you find this preset? You go

0:21:07.080,0:21:10.200
here. Under the create compute, you'll

0:21:10.200,0:21:13.220
find everything that you need to do. So

0:21:13.220,0:21:16.740
the size is the Standard DS11 Version 2,

0:21:16.740,0:21:19.799
and it's a CPU not GPU, because we don't

0:21:19.799,0:21:22.500
know the GPU, and we don't need a GPU.

0:21:22.500,0:21:25.799
Uh, it is ready for us to use.

0:21:25.799,0:21:30.900
The next thing which we will look into

0:21:30.900,0:21:33.600
is the designer. How can you access the

0:21:33.600,0:21:35.100
designer?

0:21:35.100,0:21:37.679
You can either click on this icon or

0:21:37.679,0:21:40.020
click on the navigation menu and click

0:21:40.020,0:21:42.299
on the designer.

0:21:42.900,0:21:45.780
Now I am inside my designer.

0:21:45.780,0:21:47.640
What we are going to do now is the

0:21:47.640,0:21:50.280
pipeline that Carlotta told you about.

0:21:50.280,0:21:54.360
And where can I find these steps? If

0:21:54.360,0:21:57.120
you follow along in the learn module, you

0:21:57.120,0:21:58.740
will find everything that I'm doing

0:21:58.740,0:22:02.340
right now in detail, with screenshots

0:22:02.340,0:22:05.820
of course. So I'm going to create a new

0:22:05.820,0:22:09.120
pipeline, and I can do so by clicking on

0:22:09.120,0:22:10.980
this plus button.

0:22:10.980,0:22:13.740
It's going to redirect me to the

0:22:13.740,0:22:17.100
designer authoring the pipeline, uh, where

0:22:17.100,0:22:19.500
I can drag and drop data and components

0:22:19.500,0:22:21.780
that Carlotta told you the difference

0:22:21.780,0:22:22.980
between.

0:22:22.980,0:22:26.340
And here I am going to make some changes

0:22:26.340,0:22:29.100
to the settings. I am going to connect

0:22:29.100,0:22:31.860
this with my compute cluster that I

0:22:31.860,0:22:35.120
created previously so I can utilize it.

0:22:35.120,0:22:38.100
From here I'm going to choose this

0:22:38.100,0:22:40.380
compute cluster demo that I have shown

0:22:40.380,0:22:42.600
you before in the clusters here,

0:22:42.600,0:22:45.900
and I am going to change the name to

0:22:45.900,0:22:47.820
something more meaningful. Instead of

0:22:47.820,0:22:50.580
"Pipeline" and today's date, I'm going

0:22:50.580,0:22:53.760
to name it Diabetes...

0:22:53.760,0:22:56.120
uh...

0:22:56.120,0:23:00.020
let's just check this training.

0:23:00.020,0:23:05.100
Let's say Training 0.1 or 01, okay?

0:23:05.100,0:23:09.360
And I am going to close this tab in

0:23:09.360,0:23:12.000
order to have a bigger place to work

0:23:12.000,0:23:14.700
inside because this is where we will

0:23:14.700,0:23:17.220
work, where everything will happen. So I

0:23:17.220,0:23:19.559
will click on close from here,

0:23:19.559,0:23:23.460
and I will go to the data and I will

0:23:23.460,0:23:25.620
create a new data set.

0:23:25.620,0:23:27.900
How can I create a new data set? There are

0:23:27.900,0:23:29.880
multiple options you can find here: from

0:23:29.880,0:23:31.799
local files, from data store, from web

0:23:31.799,0:23:34.020
files, from open data sets, but I'm going

0:23:34.020,0:23:36.539
to choose from web files, as this is the

0:23:36.539,0:23:40.280
way we're going to create our data.

0:23:40.280,0:23:43.380
Here, I'm going to get the information for

0:23:43.380,0:23:47.340
my data set from the Microsoft

0:23:47.340,0:23:50.820
Learn module. So if we go to the step

0:23:50.820,0:23:52.860
that says "Create a dataset",

0:23:52.860,0:23:55.020
under it, it illustrates that you can

0:23:55.020,0:23:57.720
access the data from inside the asset

0:23:57.720,0:23:59.760
library, and inside your asset library,

0:23:59.760,0:24:01.679
you'll find the data and find the

0:24:01.679,0:24:05.539
component. And I'm going to select

0:24:05.539,0:24:09.000
this link because this is where my data

0:24:09.000,0:24:12.000
is stored. If you open this link, you will

0:24:12.000,0:24:14.820
find this is a CSV file, I think.

0:24:14.820,0:24:17.400
Yeah. And you can...like, all the data are

0:24:17.400,0:24:18.360
here.

0:24:18.360,0:24:21.079
Now let's get back...

0:24:21.079,0:24:22.149
Um...

0:24:26.880,0:24:28.200
And you are going to name it something

0:24:28.200,0:24:29.880
meaningful, but because I have already

0:24:29.880,0:24:31.820
created it twice before, I'm gonna

0:24:31.820,0:24:34.980
add a number to the name.

0:24:34.980,0:24:37.559
The data set type can be tabular or

0:24:37.559,0:24:39.360
file, but this is a table, so we're

0:24:39.360,0:24:40.760
going to choose the tabular

0:24:40.760,0:24:42.240
option

0:24:42.240,0:24:43.740
for the data set type.

0:24:43.740,0:24:46.260
Now we will click on "Next". That's gonna

0:24:46.260,0:24:51.179
preview, or display for you, the content

0:24:51.179,0:24:54.020
of this file that you have

0:24:54.020,0:24:57.419
imported to this workspace.

0:24:57.419,0:25:01.559
And for these settings, these are

0:25:01.559,0:25:03.720
related to our file format.

0:25:03.720,0:25:08.280
So this is a delimited file, and it's not

0:25:08.280,0:25:11.400
plain text, it's not JSON. The delimiter

0:25:11.400,0:25:14.159
is comma, as we have seen that they

0:25:14.159,0:25:26.700
[INDISTINGUISHABLE]

0:25:26.700,0:25:29.039
So I'm choosing column

0:25:29.039,0:25:32.900
headers, because only the first file...

0:25:32.900,0:25:34.880
[INDISTINGUISHABLE]

0:25:34.880,0:25:38.159
...for example. Okay, uh, if you have any

0:25:38.159,0:25:39.960
doubts, if you have any problems, please

0:25:39.960,0:25:42.960
don't hesitate to write to me

0:25:42.960,0:25:45.020
in the chat,

0:25:45.020,0:25:48.480
like, what is blocking you, and

0:25:48.480,0:25:50.940
Carlotta and I will try to help you,

0:25:50.940,0:25:53.220
like whenever possible.

0:25:53.220,0:25:55.659
And now this is the new preview for my

0:25:55.659,0:25:57.840
data set. I can see that I have an ID, I

0:25:57.840,0:25:59.700
have patient ID, I have pregnancies, I

0:25:59.700,0:26:02.220
have the age of the people,

0:26:02.220,0:26:05.720
I have the body mass, I think, and

0:26:05.720,0:26:08.460
whether they have diabetes or not, as a

0:26:08.460,0:26:10.679
zero and one. Zero indicates a negative,

0:26:10.679,0:26:14.159
the person doesn't have diabetes, and one

0:26:14.159,0:26:16.080
indicates a positive, that this person

0:26:16.080,0:26:18.299
has diabetes. Okay.

0:26:18.299,0:26:20.520
Now I'm going to click on "Next". Here I am

0:26:20.520,0:26:23.400
defining my schema. All the data types

0:26:23.400,0:26:25.380
inside my columns, the column names, which

0:26:25.380,0:26:28.760
columns to include, which to exclude. And

0:26:28.760,0:26:31.500
here we will include everything except

0:26:31.500,0:26:35.580
the Path column. And we are

0:26:35.580,0:26:37.860
going to review the data types of each

0:26:37.860,0:26:40.440
column. So let's review this first one.

0:26:40.440,0:26:43.320
This is numbers, numbers, numbers, so it's an

0:26:43.320,0:26:45.779
integer. And this is,

0:26:45.779,0:26:48.679
um, like decimal...

0:26:48.679,0:26:50.900
...dotted...

0:26:50.900,0:26:53.580
decimal number. So we are going to choose

0:26:53.580,0:26:55.020
this data type.

0:26:55.020,0:26:57.200
And for this one

0:26:57.200,0:27:01.200
it says diabetic, and it's a zero or

0:27:01.200,0:27:02.460
one, and we are going to make it an

0:27:02.460,0:27:04.460
integer.
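
The file format and schema choices described above (comma delimiter, a header row, integer and decimal columns) can also be tried outside the studio. What follows is only a minimal sketch in Python: the three sample rows are invented for illustration, and the exact column spellings (PatientID, Pregnancies, Age, BMI, Diabetic) are assumptions based on what the preview shows, not a copy of the real file.

```python
import csv
import io

# Hypothetical sample standing in for the diabetes CSV (values are invented).
SAMPLE_CSV = """PatientID,Pregnancies,Age,BMI,Diabetic
1001,0,21,43.5,0
1002,8,23,21.2,0
1003,7,43,41.5,1
"""


def load_rows(text):
    """Parse comma-delimited text whose first line holds the column headers."""
    reader = csv.DictReader(io.StringIO(text), delimiter=",")
    rows = []
    for raw in reader:
        rows.append({
            "PatientID": int(raw["PatientID"]),
            "Pregnancies": int(raw["Pregnancies"]),
            "Age": int(raw["Age"]),
            "BMI": float(raw["BMI"]),          # decimal column
            "Diabetic": int(raw["Diabetic"]),  # 0 = negative, 1 = positive
        })
    return rows


rows = load_rows(SAMPLE_CSV)
positives = sum(row["Diabetic"] for row in rows)
print(len(rows), "rows,", positives, "positive")
```

Converting each column explicitly plays the same role as reviewing the data types in the schema step: a value that does not fit its declared type fails early instead of silently becoming text.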

0:27:04.460,0:27:07.980
Now we are going to click on "Next" and

0:27:07.980,0:27:09.780
move to reviewing everything. This is

0:27:09.780,0:27:11.569
everything that we have defined together.

0:27:11.569,0:27:13.500
I will click on "Create".

0:27:13.500,0:27:15.179
And...

0:27:15.179,0:27:17.940
now the first step has ended. We have

0:27:17.940,0:27:19.919
gotten our data ready.

0:27:19.919,0:27:22.440
Now...what now? We're going to utilize the

0:27:22.440,0:27:23.468
designer...

0:27:23.468,0:27:26.820
um...power. We're going to drag and drop

0:27:26.820,0:27:29.820
our data set to create the pipeline.

0:27:29.820,0:27:33.179
So I have clicked on it and dragged it

0:27:33.179,0:27:35.640
to this space. It's gonna appear to you.

0:27:35.640,0:27:39.659
And we can inspect it by right-clicking and

0:27:39.659,0:27:42.179
choosing "Preview data"

0:27:42.179,0:27:46.200
to see what we have created together.

0:27:46.200,0:27:48.900
From here, you can see everything that we

0:27:48.900,0:27:50.700
have seen previously, but in more

0:27:50.700,0:27:53.100
detail. And we are just going to close

0:27:53.100,0:27:56.580
this. Now what? Now we are gonna do the

0:27:56.580,0:28:00.799
processing that Carlotta mentioned.

0:28:00.799,0:28:03.659
These are some instructions about the

0:28:03.659,0:28:05.460
data, about how you can look at them, how you

0:28:05.460,0:28:07.140
can open them but we are going to move

0:28:07.140,0:28:09.720
to the transformation or the processing.

0:28:09.720,0:28:13.500
So as Carlotta told you, for any data

0:28:13.500,0:28:15.480
we work on, we have to do some

0:28:15.480,0:28:17.299
processing on it

0:28:17.299,0:28:20.159
to make it easier for the model to

0:28:20.159,0:28:23.279
be trained and easier to work with. So, uh,

0:28:23.279,0:28:25.860
we're gonna do the normalization. And

0:28:25.860,0:28:29.159
normalization means, uh,

0:28:29.159,0:28:33.539
to scale our data, either down or up, but

0:28:33.539,0:28:35.400
we're going to scale them down,

0:28:35.400,0:28:38.820
and we are going to decrease, uh,

0:28:38.820,0:28:40.799
relatively decrease

0:28:40.799,0:28:44.640
the values, all the values, to work

0:28:44.640,0:28:48.120
with lower numbers. And if we are working

0:28:48.120,0:28:49.559
with larger numbers, it's going to take

0:28:49.559,0:28:52.500
more time. If we're working with smaller

0:28:52.500,0:28:54.779
numbers, it's going to take less time to

0:28:54.779,0:28:59.159
calculate them, and that's it. So

0:28:59.159,0:29:02.159
where can I find Normalize Data? I

0:29:02.159,0:29:04.260
can find it inside my components.

0:29:04.260,0:29:06.720
So I will choose the components and

0:29:06.720,0:29:09.659
search for "Normalize Data".

0:29:09.659,0:29:12.360
I will drag and drop it as usual and I

0:29:12.360,0:29:14.820
will connect between these two things

0:29:14.820,0:29:18.360
by clicking on this spot, this, uh,

0:29:18.360,0:29:20.159
circle, and

0:29:20.159,0:29:23.159
dragging and dropping onto the next circle.

0:29:23.159,0:29:24.899
Now we are going to define our

0:29:24.899,0:29:27.419
normalization method.

0:29:27.419,0:29:31.080
So I'm going to double click on the

0:29:31.080,0:29:32.640
Normalize Data component.

0:29:32.640,0:29:34.860
It's going to open the settings for the

0:29:34.860,0:29:36.480
normalization,

0:29:36.480,0:29:38.820
such as the transformation method, which is

0:29:38.820,0:29:40.500
the mathematical way

0:29:40.500,0:29:42.299
that our data is going to be scaled

0:29:42.299,0:29:44.520
according to.

0:29:44.520,0:29:47.760
We're going to choose MinMax, and for

0:29:47.760,0:29:51.539
this one, "Use 0

0:29:51.539,0:29:53.100
for constant columns", we are going to

0:29:53.100,0:29:54.480
choose "True",

0:29:54.480,0:29:56.880
and we are going to define which columns

0:29:56.880,0:29:58.860
to normalize. So we are not going to

0:29:58.860,0:30:01.080
normalize the whole data set. We are

0:30:01.080,0:30:02.760
going to choose a subset from the data

0:30:02.760,0:30:04.559
set to normalize. So we're going to

0:30:04.559,0:30:07.020
choose everything except for the patient

0:30:07.020,0:30:09.000
ID and the diabetic, because the patient

0:30:09.000,0:30:10.919
ID is a number, but it's categorical

0:30:10.919,0:30:13.740
data. It describes a patient, it's not a

0:30:13.740,0:30:17.460
number that I can sum. I can't say "patient

0:30:17.460,0:30:20.159
ID number one plus patient ID number two".

0:30:20.159,0:30:21.720
No, this is a patient and another

0:30:21.720,0:30:23.399
patient, it's not a number that I can do

0:30:23.399,0:30:25.740
mathematical operations on, so I'm not

0:30:25.740,0:30:28.200
going to choose it. So we will choose

0:30:28.200,0:30:30.539
everything as I said, except for the

0:30:30.539,0:30:33.480
diabetic and the patient ID. I will

0:30:33.480,0:30:34.860
click on "Save".

0:30:34.860,0:30:37.740
And it's not showing me a warning anymore,

0:30:37.740,0:30:39.480
everything is good.
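
As a rough illustration of what the min-max setting does to the selected columns, here is a minimal Python sketch. It assumes the usual min-max formula, (x - min) / (max - min), and mirrors the "use 0 for constant columns" choice; the function name and sample values are hypothetical, and the designer component itself may differ in details.

```python
def min_max_scale(values):
    """Rescale numbers to the range [0, 1]; a constant column becomes all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: mirror the "use 0" choice instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical rows; PatientID and Diabetic are left untouched on purpose,
# because one is an identifier and the other is the label to predict.
table = {
    "PatientID": [1001, 1002, 1003],
    "Age": [21, 43, 65],
    "BMI": [20.0, 30.0, 40.0],
    "Diabetic": [0, 0, 1],
}
skip = {"PatientID", "Diabetic"}
normalized = {
    name: (col if name in skip else min_max_scale(col))
    for name, col in table.items()
}
print(normalized["Age"])
```

After scaling, every selected column shares the same 0 to 1 range, so a column with large raw values (such as age) no longer outweighs one with small raw values.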

0:30:39.480,0:30:41.880
Now I can click on "Submit"

0:30:41.880,0:30:46.799
and review my normalization output.

0:30:46.799,0:30:48.240
Um.

0:30:48.240,0:30:51.659
So, if you click on "Submit" here,

0:30:51.659,0:30:54.659
you will choose "Create new" and

0:30:54.659,0:30:56.460
set the name that is mentioned here

0:30:56.460,0:30:59.899
inside the notebook. So it tells you

0:30:59.899,0:31:03.419
to create a job and name it, name

0:31:03.419,0:31:05.460
the experiment "MS Learn Diabetes

0:31:05.460,0:31:06.720
Training", because you will continue

0:31:06.720,0:31:10.160
working on it and building components later.

0:31:10.160,0:31:13.020
I have already created it, so, uh,

0:31:13.020,0:31:16.919
we can review it together. So let

0:31:16.919,0:31:19.860
me just open this in another tab. I think

0:31:19.860,0:31:21.000
I have it...

0:31:21.000,0:31:23.659
here.

0:31:25.679,0:31:28.220
Okay.

0:31:30.720,0:31:34.740
So, these are all the jobs that I have

0:31:34.740,0:31:37.340
created.

0:31:37.860,0:31:40.119
All the jobs there. Let's do this over.

0:31:40.119,0:31:42.059
These are all the jobs that I have

0:31:42.059,0:31:43.679
submitted previously.

0:31:43.679,0:31:45.840
And I think this one is the

0:31:45.840,0:31:48.360
normalization job, so let's see the

0:31:48.360,0:31:50.100
output of it.

0:31:50.100,0:31:54.120
As you can see, it shows, uh, a check mark, yes,

0:31:54.120,0:31:56.640
which means that it worked, and we can

0:31:56.640,0:31:59.399
preview it. How can I do that? Right click

0:31:59.399,0:32:02.539
on it, choose "Preview data",

0:32:02.539,0:32:06.659
and as you can see all the data are

0:32:06.659,0:32:08.399
scaled down

0:32:08.399,0:32:10.980
so everything is between zero

0:32:10.980,0:32:15.860
and, uh, one I think.

0:32:15.860,0:32:18.899
So everything is good for us. Now we

0:32:18.899,0:32:21.840
can move forward to the next step

0:32:21.840,0:32:26.939
which is to create the whole pipeline.

0:32:26.939,0:32:30.840
So, uh, Carlotta told you that

0:32:30.840,0:32:33.179
we're going to use a classification

0:32:33.179,0:32:37.260
model on this data set, so let

0:32:37.260,0:32:40.620
me just drag and drop everything

0:32:40.620,0:32:43.140
to get runtime and we're doing

0:32:43.140,0:32:46.489
[INDISTINGUISHABLE]

0:32:46.489,0:32:48.469
about everything by

0:32:48.469,0:32:51.419
[INDISTINGUISHABLE]

0:32:51.419,0:32:52.919
So,

0:32:52.919,0:32:55.593
as a result, we are going to explain

0:32:55.593,0:32:59.760
[INDISTINGUISHABLE]

0:32:59.760,0:33:03.600
Yeah. So, I'm going to get the Split

0:33:03.600,0:33:06.070
Data component. I'm going to take the

0:33:06.070,0:33:08.880
transformed data to Split Data and

0:33:08.880,0:33:10.380
connect it like that.

0:33:10.380,0:33:12.299
I'm going to get the Train Model

0:33:12.299,0:33:15.240
component because I want to train my

0:33:15.240,0:33:16.679
model,

0:33:16.679,0:33:19.740
and I'm going to put it right here.

0:33:19.740,0:33:21.740
Okay.

0:33:21.740,0:33:24.419
Let's just move it down there. Okay.

0:33:24.419,0:33:27.059
And we are going to use a classification

0:33:27.059,0:33:28.620
model,

0:33:28.620,0:33:31.880
a two-class

0:33:32.240,0:33:35.399
logistic regression model.

0:33:35.399,0:33:38.159
So I'm going to give this algorithm to

0:33:38.159,0:33:41.480
enable my model to work.

0:33:41.820,0:33:45.960
This is the untrained model, this is...

0:33:45.960,0:33:48.059
here.

0:33:48.059,0:33:51.120
The left...

0:33:51.120,0:33:52.860
the left, uh, circle, I'm going to

0:33:52.860,0:33:54.819
connect it to the data set, and the right

0:33:54.819,0:33:56.940
one, we are going to connect it to

0:33:56.940,0:33:59.700
evaluate model.

0:33:59.700,0:34:02.640
Evaluate model...so let's search for

0:34:02.640,0:34:05.220
"Evaluate model" here.

0:34:05.220,0:34:07.440
So because we want to do what...we want to

0:34:07.440,0:34:10.800
evaluate our model and see how it has

0:34:10.800,0:34:13.790
been doing. Is it good, is it bad?

0:34:13.790,0:34:18.200
Um, sorry...

0:34:19.980,0:34:22.820
This is...

0:34:23.460,0:34:25.560
this is down there

0:34:25.560,0:34:28.139
after the score model.

0:34:28.139,0:34:31.320
So we have to get the score model first,

0:34:31.320,0:34:33.960
so let's get it.

0:34:33.960,0:34:36.119
And this will take the trained model and

0:34:36.119,0:34:37.260
the data set

0:34:37.260,0:34:39.419
to score our model and see if it's

0:34:39.419,0:34:42.179
performing well or badly.

0:34:42.179,0:34:44.409
And...

0:34:44.409,0:34:47.159
um...

0:34:47.159,0:34:49.080
after that, we have finished

0:34:49.080,0:34:51.920
everything. Now, we are going to do the what?

0:34:52.139,0:34:54.359
The presets for everything.

0:34:54.359,0:34:56.820
As a starter, we will be splitting our

0:34:56.820,0:34:58.920
data. So

0:34:58.920,0:35:01.140
how are we going to do this, according to

0:35:01.140,0:35:03.780
what? To the split rules. So I'm going to

0:35:03.780,0:35:05.940
double-click on it and choose "Split rules".

0:35:05.940,0:35:09.420
And the percentage is

0:35:09.420,0:35:11.780
70 percent for the [INDISTINGUISHABLE]

0:35:11.780,0:35:12.780
and 30 percent of the

0:35:12.780,0:35:14.820
data for

0:35:14.820,0:35:18.420
the evaluation or for the scoring, okay?

0:35:18.420,0:35:20.880
I'm going to make it a randomization, so

0:35:20.880,0:35:22.980
I'm going to split data randomly and the

0:35:22.980,0:35:26.060
seed is, uh,

0:35:26.060,0:35:29.339
132, uh 23 I think...yeah.

0:35:29.339,0:35:32.520
And I think that's it.

0:35:32.520,0:35:35.040
The stratified split is false, and that's

0:35:35.040,0:35:36.240
good.
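
The split settings above (a 70/30 split, randomized, with a fixed seed) can be pictured with a short Python sketch. This is an illustration under stated assumptions, not how the Split Data component is implemented: the function name is hypothetical, the seed value 123 is assumed, and Python's shuffle will not reproduce the exact rows the designer picks.

```python
import random


def split_rows(rows, fraction=0.7, seed=123):
    """Shuffle a copy of rows with a fixed seed, then cut it at the fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))  # stand-in for 100 data rows
train, test = split_rows(rows)
print(len(train), len(test))
```

Fixing the seed is what makes the split repeatable: calling the function again with the same seed returns the same two subsets, so later runs of the pipeline are compared on the same held-out rows.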

0:35:36.240,0:35:39.540
Now for the next one, which is the train

0:35:39.540,0:35:42.000
model we are going to connect it as

0:35:42.000,0:35:43.500
mentioned here.

0:35:43.500,0:35:48.660
And we have done that and...then why

0:35:48.660,0:35:50.700
am I having here? Let's double click

0:35:50.700,0:35:54.660
on it...yeah. It has...it needs the

0:35:54.660,0:35:57.180
label column that I am trying to predict.

0:35:57.180,0:35:58.680
So from here, I'm going to choose

0:35:58.680,0:36:01.380
diabetic. I'm going to save.

0:36:01.380,0:36:05.180
I'm going to close this one.

0:36:05.520,0:36:07.380
So it says here,

0:36:07.380,0:36:10.619
the diabetic label, the model, it will

0:36:10.619,0:36:12.300
predict the zero and one, because this is

0:36:12.300,0:36:14.700
a binary classification algorithm, so

0:36:14.700,0:36:16.260
it's going to predict either this or

0:36:16.260,0:36:17.520
that.
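
To make "either this or that" a little more concrete: a two-class logistic regression model turns a weighted sum of the input features into a probability between 0 and 1 with the sigmoid function. The sketch below is a hypothetical illustration only; the weights, bias, and feature values are invented, whereas a trained model learns them from the training split.

```python
import math


def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(features, weights, bias):
    """Probability of the positive class (label 1) for one row of features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Invented numbers: two normalized features and made-up model parameters.
weights = [2.0, 1.5]
bias = -1.0
probability = predict_probability([0.8, 0.6], weights, bias)
print(round(probability, 3))
```

The probability is what the scoring step later reports; the 0 or 1 label comes from comparing that probability with a threshold.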

0:36:17.520,0:36:18.460
And...

0:36:18.460,0:36:20.160
um...

0:36:20.160,0:36:23.880
I think that's everything to run the

0:36:23.880,0:36:25.859
pipeline.

0:36:25.859,0:36:29.040
So everything is done, everything is good

0:36:29.040,0:36:31.200
for this one. We're just gonna leave it

0:36:31.200,0:36:34.140
for now, because this is the next

0:36:34.140,0:36:35.620
step.

0:36:35.620,0:36:39.839
Um, this will be put instead of the

0:36:39.839,0:36:43.520
score model, but let's...

0:36:44.099,0:36:46.920
let's delete it for now.

0:36:46.920,0:36:49.500
Okay.

0:36:49.500,0:36:52.920
Now we have to submit the job in order

0:36:52.920,0:36:55.680
to see the output of it. So I can click

0:36:55.680,0:36:59.280
on "Submit" and choose the previous job

0:36:59.280,0:37:01.200
which is the one that I have shown you

0:37:01.200,0:37:02.460
before.

0:37:02.460,0:37:05.460
And then let's review its output

0:37:05.460,0:37:06.960
together here.

0:37:06.960,0:37:09.960
So if I go to the jobs,

0:37:09.960,0:37:15.119
if I go to MS Learn, maybe it is training?

0:37:15.119,0:37:18.180
I think it's the one that lasted the

0:37:18.180,0:37:20.640
longest, this one here.

0:37:20.640,0:37:23.700
So here I can see

0:37:23.700,0:37:27.079
the job output, what happened inside

0:37:27.079,0:37:30.420
the model, as you can see.

0:37:30.420,0:37:33.839
So the normalization we have seen

0:37:33.839,0:37:36.540
before, the split data, I can preview it.

0:37:36.540,0:37:39.359
The result one or the result two as it

0:37:39.359,0:37:41.760
splits the data to 70 percent here and

0:37:41.760,0:37:43.639
30 percent here.

0:37:43.639,0:37:46.859
Um, I can see the score model, which is

0:37:46.859,0:37:49.140
something that we need

0:37:49.140,0:37:51.530
to review.

0:37:51.530,0:37:56.820
Inside the score model, uh, from

0:37:56.820,0:37:57.960
here,

0:37:57.960,0:38:00.960
we can see that...

0:38:00.960,0:38:04.460
let's get back here.

0:38:05.940,0:38:08.220
This is the data that the model has

0:38:08.220,0:38:11.579
been scored on, and this is the scoring output.

0:38:11.579,0:38:15.300
So it says scored label "true", and he is

0:38:15.300,0:38:17.370
not diabetic, so this is,

0:38:17.370,0:38:19.200
um,

0:38:19.200,0:38:21.839
a wrong prediction, let's say.

0:38:21.839,0:38:23.880
For this one it's true and true, and this

0:38:23.880,0:38:26.880
is a good, like, what do you say,

0:38:26.880,0:38:29.460
prediction, and the probabilities of this

0:38:29.460,0:38:30.420
score,

0:38:30.420,0:38:33.119
which means the certainty of our model

0:38:33.119,0:38:36.620
that this is really true. It's 80 percent.

0:38:36.620,0:38:38.780
For this one it's 75 percent.

0:38:38.780,0:38:42.599
So these are some cool metrics that we

0:38:42.599,0:38:45.359
can review to understand how our model

0:38:45.359,0:38:47.580
is performing. It's performing well for

0:38:47.580,0:38:48.540
now.

0:38:48.540,0:38:53.180
Let's check our evaluation model.

0:38:53.180,0:38:56.700
So this is the extra one that I told you

0:38:56.700,0:38:59.579
about. Instead of the

0:38:59.579,0:39:01.800
score model only, we are going to add

0:39:01.800,0:39:04.260
what? Evaluate Model

0:39:04.260,0:39:06.900
after it. So here

0:39:06.900,0:39:09.420
we're going to go to our Asset Library

0:39:09.420,0:39:12.180
and we are going to choose the evaluate

0:39:12.180,0:39:14.940
model,

0:39:14.940,0:39:17.760
and we are going to put it here, and we

0:39:17.760,0:39:20.220
are going to connect it, and we are going

0:39:20.220,0:39:23.099
to submit the job using the same name of

0:39:23.099,0:39:25.140
the job that we used previously.

0:39:25.140,0:39:29.520
Let's review it also. So, after it

0:39:29.520,0:39:33.300
finishes, you will find it here. So I have

0:39:33.300,0:39:35.280
already done it before, this is how I'm

0:39:35.280,0:39:37.380
able to see the output.

0:39:37.380,0:39:40.320
So let's see

0:39:40.320,0:39:43.280
what is the output of this

0:39:43.280,0:39:45.660
evaluation process.

0:39:45.660,0:39:49.800
Here it mentions to you that there are

0:39:49.800,0:39:51.480
some metrics,

0:39:51.480,0:39:54.839
like the confusion matrix, which Carlotta

0:39:54.839,0:39:57.060
told you about, there is the accuracy, the

0:39:57.060,0:39:59.760
precision, the recall, and F1 Score.

0:39:59.760,0:40:02.339
Every metric gives us some insight about

0:40:02.339,0:40:04.920
our model. It helps us to understand it

0:40:04.920,0:40:08.579
more, and, um,

0:40:08.579,0:40:10.560
understand if it's overfitting, if

0:40:10.560,0:40:12.240
it's good, if it's bad, and really really,

0:40:12.240,0:40:16.339
like, understand how it's working.
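
The metrics listed above all come from the four counts in the confusion matrix. As a minimal sketch using the standard definitions, the Python below computes them for a small invented set of actual and predicted labels; the sample data and function names are hypothetical, not output from the workshop's pipeline.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 1:
            fp += 1
        elif a == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def evaluate(actual, predicted):
    """Return accuracy, precision, recall and F1 score as a dictionary."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Invented labels: 1 = diabetic, 0 = not diabetic.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
results = evaluate(actual, predicted)
print(results)
```

Precision asks "of the people predicted diabetic, how many really are?", while recall asks "of the people who really are diabetic, how many did the model find?"; F1 combines the two into one number.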

0:40:17.060,0:40:20.400
Now I'm just waiting for the job to load.

0:40:20.400,0:40:22.710
Until it loads,

0:40:22.710,0:40:23.640
um,

0:40:23.640,0:40:26.040
we can continue

0:40:26.040,0:40:28.740
to work on our

0:40:28.740,0:40:31.800
model. So I will go to my designer. I'm

0:40:31.800,0:40:34.740
just going to confirm this.

0:40:34.740,0:40:38.280
And I'm going to continue working on it

0:40:38.280,0:40:39.780
from

0:40:39.780,0:40:42.119
where we have stopped. Where have we

0:40:42.119,0:40:43.560
stopped?

0:40:43.560,0:40:46.440
We have stopped on the evaluate model. So

0:40:46.440,0:40:48.960
I'm going to choose this one.

0:40:48.960,0:40:53.420
And it says here

0:40:54.180,0:40:56.940
"select experiment", "create inference

0:40:56.940,0:40:58.200
pipeline", so

0:40:58.200,0:41:01.079
I am going to go to the jobs,

0:41:01.079,0:41:04.680
I'm going to select my experiment.

0:41:04.680,0:41:06.660
I hope this works.

0:41:06.660,0:41:09.720
Okay. Finally, now we have our

0:41:09.720,0:41:12.180
evaluate model output.

0:41:12.180,0:41:15.480
Let's preview evaluation results

0:41:15.480,0:41:18.660
and, uh...

0:41:18.660,0:41:22.220
come on.

0:41:25.500,0:41:28.020
Finally. Now we can create our inference

0:41:28.020,0:41:31.020
pipeline. So,

0:41:31.020,0:41:33.510
I think it says that...

0:41:33.510,0:41:35.280
um...

0:41:35.280,0:41:38.160
select the experiment, then select MS

0:41:38.160,0:41:39.359
Learn. So,

0:41:39.359,0:41:43.320
I am just going to select it,

0:41:43.320,0:41:48.300
and finally. Now we can, the ROC curve, we

0:41:48.300,0:41:51.000
can see it, that the true positive rate

0:41:51.000,0:41:53.760
and the false positive rate. The false

0:41:53.760,0:41:56.660
positive rate is increasing with time,

0:41:56.660,0:42:01.020
and also the true positive rate. True

0:42:01.020,0:42:03.540
positive is something that it predicted,

0:42:03.540,0:42:06.960
that it is, uh, positive it has diabetes,

0:42:06.960,0:42:09.480
and it's really...it's really true.

0:42:09.480,0:42:12.599
The person really has diabetes. Okay. And

0:42:12.599,0:42:14.760
for the false positive, it predicted that

0:42:14.760,0:42:17.579
someone has diabetes, but that person doesn't

0:42:17.579,0:42:20.960
have it. This is what true positive and

0:42:20.960,0:42:24.960
false positive mean. This is the recall

0:42:24.960,0:42:28.020
curve, so we can review the metrics

0:42:28.020,0:42:32.160
of our model. This is the lift curve. I

0:42:32.160,0:42:36.000
can change the threshold of my confusion

0:42:36.000,0:42:37.740
matrix here

0:42:37.740,0:42:39.119
and if Carlotta wants to add

0:42:39.119,0:42:43.920
anything about the...the graphs,

0:42:43.920,0:42:47.000
you can do so.

0:42:50.440,0:42:52.558
[CARLOTTA]: Um, yeah, so I just

0:42:52.558,0:42:54.540
wanted to...if you go...yeah.

0:42:54.540,0:42:57.119
I just wanted to comment for the

0:42:57.119,0:43:00.480
ROC curve, that actually from this

0:43:00.480,0:43:03.900
graph, the metric which usually we're

0:43:03.900,0:43:06.960
going to compute is the area under

0:43:06.960,0:43:09.900
the curve. And this coefficient or

0:43:09.900,0:43:12.240
metric,

0:43:12.240,0:43:15.060
it's a coefficient—

0:43:15.060,0:43:18.420
it's a value that could span from

0:43:18.420,0:43:23.480
zero to one, and the higher...

0:43:23.480,0:43:25.970
...the higher the score,

0:43:25.970,0:43:29.220
so the closer to one,

0:43:29.220,0:43:32.760
so the higher the amount of

0:43:32.760,0:43:35.280
area under this curve,

0:43:35.280,0:43:40.500
the higher the performance

0:43:40.500,0:43:42.886
we've got from our model.
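
The area under the ROC curve described here can be computed by hand for a small example. The sketch below is an illustration under stated assumptions: it builds one curve point per distinct score used as a threshold and sums trapezoids between points; the labels and scores are invented, the function names are hypothetical, and the data must contain at least one positive and one negative row.

```python
def roc_points(actual, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]
    # Lowering the threshold step by step moves the point toward (1, 1).
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Sum the trapezoids between consecutive curve points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Invented labels (1 = diabetic) and predicted probabilities.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = area_under_curve(roc_points(actual, scores))
print(round(auc, 3))
```

A model that ranks every positive row above every negative row reaches an area of 1, while random guessing tends toward 0.5, which is why a value closer to one indicates better performance.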

0:43:42.886,0:43:46.440
And another thing is what John is

0:43:46.440,0:43:49.680
playing with. So this threshold for

0:43:49.680,0:43:51.380
the logistic

0:43:51.380,0:43:55.610
regression is the threshold used by the

0:43:55.610,0:43:59.520
model to, um,

0:43:59.520,0:44:02.880
to predict if the category is zero or

0:44:02.880,0:44:05.220
one. So if the probability—the

0:44:05.220,0:44:08.599
probability score is above the threshold,

0:44:08.599,0:44:11.579
then the category will be predicted as

0:44:11.579,0:44:15.359
one, while if the probability is

0:44:15.359,0:44:17.460
below the threshold, in this case, for

0:44:17.460,0:44:21.300
example, 0.5, the category is predicted

0:44:21.300,0:44:23.579
as zero. So that's why it's very

0:44:23.579,0:44:26.473
important to choose the threshold,

0:44:26.473,0:44:28.699
because the performance really can vary,

0:44:28.699,0:44:30.560
um,

0:44:30.560,0:44:34.380
with this threshold value.
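
The effect of the threshold can be seen with a few lines of Python. This is a minimal sketch with invented probabilities; treating a probability exactly equal to the threshold as category one is an assumption made here, since the explanation above only covers the cases above and below it.

```python
def predict_label(probability, threshold=0.5):
    """Return 1 when the probability reaches the threshold, otherwise 0."""
    return 1 if probability >= threshold else 0


# Invented scored probabilities for four patients.
probabilities = [0.80, 0.75, 0.45, 0.30]

default_labels = [predict_label(p) for p in probabilities]
strict_labels = [predict_label(p, threshold=0.78) for p in probabilities]
print(default_labels, strict_labels)
```

Raising the threshold from 0.5 to 0.78 turns the second patient's prediction from one into zero without retraining anything, which is why the confusion matrix and the metrics change as the slider moves.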

0:44:34.380,0:44:41.099
[JOHN]: Thank you so much, Carlotta, and

0:44:41.400,0:44:44.400
as I mentioned now, we are going to

0:44:44.400,0:44:46.560
create our inference pipeline. So we are

0:44:46.560,0:44:48.540
going to select the latest one, which I

0:44:48.540,0:44:50.819
already have opened here. This is the

0:44:50.819,0:44:52.859
one that we were reviewing together. This

0:44:52.859,0:44:55.500
is where we have stopped, and we're going

0:44:55.500,0:44:57.599
to create an inference pipeline. We are

0:44:57.599,0:44:59.520
going to choose a real-time inference

0:44:59.520,0:45:02.520
pipeline, okay?

0:45:02.520,0:45:05.080
Where can I find this? Here, as it

0:45:05.080,0:45:08.099
says, "Real-time inference pipeline".

0:45:08.099,0:45:10.680
So it's gonna add some things to my

0:45:10.680,0:45:12.240
workspace. It's going to add the

0:45:12.240,0:45:13.713
web service input, it's gonna

0:45:13.713,0:45:15.071
add the web service output,

0:45:15.071,0:45:16.490
because we will be creating

0:45:16.490,0:45:18.180
it as a web service to access

0:45:18.180,0:45:19.740
it from the internet.

0:45:19.740,0:45:21.770
What are we going to do? We're going

0:45:21.770,0:45:24.720
to remove this diabetes data, okay?

0:45:24.720,0:45:27.540
And we are going to get a component

0:45:27.540,0:45:29.359
called "Web

0:45:29.359,0:45:33.180
input" and...let me check

0:45:33.180,0:45:35.940
it's "enter data manually".

0:45:35.940,0:45:38.400
We have...we already have the web input

0:45:38.400,0:45:39.540
present.

0:45:39.540,0:45:42.119
So we are going to get the Enter Data

0:45:42.119,0:45:43.200
Manually component,

0:45:43.200,0:45:45.420
and we're going to collect it—to connect

0:45:45.420,0:45:49.560
it as it was connected before, like that.

0:45:49.560,0:45:53.040
And also, I am not going to directly take

0:45:53.040,0:45:55.260
the web service...sorry, the score model to

0:45:55.260,0:45:57.839
the web service output like that.

0:45:57.839,0:46:00.240
I'm going to delete this

0:46:00.240,0:46:03.960
and I'm going to execute a Python script

0:46:03.960,0:46:05.880
before

0:46:05.880,0:46:09.500
I display my result.

0:46:10.680,0:46:12.060
So,

0:46:12.060,0:46:17.480
this will be connected like...

0:46:19.260,0:46:20.400
So...

0:46:20.400,0:46:23.599
the other way around.

0:46:23.599,0:46:27.660
And from here, I am going to connect this

0:46:27.660,0:46:30.960
with that and there is some data that

0:46:30.960,0:46:33.480
we will be getting from the node, or from

0:46:33.480,0:46:37.680
the explanation here, and this is the

0:46:37.680,0:46:40.740
data that will be entered to our

0:46:40.740,0:46:44.400
website manually. Okay? This is instead of

0:46:44.400,0:46:47.460
the data that we have been getting from

0:46:47.460,0:46:49.740
our data set that we created. So I'm just

0:46:49.740,0:46:51.960
going to double click on it and choose

0:46:51.960,0:46:55.579
CSV, and I will choose "it has headers",

0:46:55.579,0:47:00.839
and I will take or copy this content and

0:47:00.839,0:47:02.819
put it there, okay?
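
A minimal sketch of what "CSV with headers" means for the pasted content: the first line names the columns and every following line is one row. The column names and values here are placeholders for illustration, not the actual content pasted in the demo.

```python
import io

import pandas as pd

# Placeholder CSV text; the first line is the header row.
csv_text = """PatientID,Pregnancies,Age
1882185,9,43
1662484,6,23
"""

# header=0 tells pandas to read column names from the first line,
# which mirrors ticking the "has headers" option.
manual_data = pd.read_csv(io.StringIO(csv_text), header=0)

print(manual_data.columns.tolist())
print(len(manual_data), "rows")
```

Without the header option, the first line would be treated as a data row rather than as column names.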

0:47:02.819,0:47:05.700
So let's do it.

0:47:05.700,0:47:07.920
I think I have to click on edit code, now

0:47:07.920,0:47:10.680
I can click on "Save", and I can close it.

0:47:10.680,0:47:13.079
Another thing which is the Python script

0:47:13.079,0:47:16.700
that we will be executing.

0:47:17.099,0:47:17.900
Um, yeah. We

0:47:17.900,0:47:19.380
are going to remove this, also.

0:47:19.380,0:47:20.930
We don't need the evaluate model

0:47:20.930,0:47:24.319
anymore, so we are going to remove it.

0:47:24.319,0:47:25.582
The Python script

0:47:25.582,0:47:28.579
that I will be executing,

0:47:28.579,0:47:32.599
I can find it here.

0:47:32.699,0:47:35.760
Um, yeah.

0:47:35.760,0:47:38.640
This is the Python script that we will

0:47:38.640,0:47:41.520
execute. And it says to you that this

0:47:41.520,0:47:43.619
code selects only the patient's ID,

0:47:43.619,0:47:45.000
the scored label, the scored

0:47:45.000,0:47:47.700
probability and return—returns them to

0:47:47.700,0:47:49.980
the web service output. So we don't want

0:47:49.980,0:47:51.960
to return all the columns, as we have

0:47:51.960,0:47:53.339
seen previously,

0:47:53.339,0:47:55.560
that returns everything,

0:47:55.560,0:47:56.940
so

0:47:56.940,0:47:59.040
we want to return certain stuff, the

0:47:59.040,0:48:02.940
stuff that we will use inside our

0:48:02.940,0:48:05.640
endpoint. So I'm just going to select

0:48:05.640,0:48:07.980
everything and delete it, and

0:48:07.980,0:48:11.060
paste the code that I have gotten from

0:48:11.060,0:48:14.280
the, uh,

0:48:14.280,0:48:16.500
the Microsoft Learn docs.
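
A rough sketch of the kind of script described here, not necessarily the exact code pasted in the demo. It assumes the designer's Execute Python Script component calls an entry function named `azureml_main`, and that the scored data has columns named `PatientID`, `Scored Labels` and `Scored Probabilities`. The output names `DiabetesPrediction` and `Probability` are illustrative.

```python
import pandas as pd


def azureml_main(dataframe1=None, dataframe2=None):
    # Keep only the three columns the web service should return.
    keep = ["PatientID", "Scored Labels", "Scored Probabilities"]
    scored_results = dataframe1[keep].rename(
        columns={
            "Scored Labels": "DiabetesPrediction",  # illustrative output name
            "Scored Probabilities": "Probability",  # illustrative output name
        }
    )
    # Return a one-element sequence of DataFrames.
    return (scored_results,)
```

Every other column that Score Model passes through (the original patient features) is dropped, so the endpoint returns only what a caller needs.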

0:48:16.500,0:48:19.079
Now I can click on "Save", and I can close

0:48:19.079,0:48:20.280
this.

0:48:20.280,0:48:21.470
Let me check something,

0:48:21.470,0:48:22.950
I don't think it saved.

0:48:22.950,0:48:24.940
It's saved, but the display is

0:48:24.940,0:48:26.160
wrong, okay.

0:48:26.160,0:48:30.300
And now I think everything is good to go.

0:48:30.300,0:48:32.640
I'm just gonna double-check everything.

0:48:32.640,0:48:36.359
So, uh, yeah. We are gonna change the name

0:48:36.359,0:48:38.640
of this

0:48:38.640,0:48:40.800
pipeline, and we are gonna call it

0:48:40.800,0:48:42.780
"Predict

0:48:42.780,0:48:46.319
diabetes", okay?

0:48:46.319,0:48:50.339
Now let's close it, and

0:48:50.339,0:48:56.269
I think that we are good to go. So,

0:48:56.269,0:48:59.300
um,

0:48:59.720,0:49:04.460
Okay, I think everything is good for us.

0:49:06.210,0:49:08.108
I just want to make sure of something.

0:49:08.108,0:49:09.209
Is the data...

0:49:09.209,0:49:12.420
it's correct, the data is...yeah,

0:49:12.420,0:49:13.560
it's correct.

0:49:13.560,0:49:16.319
Okay, now I can run the pipeline. Let's

0:49:16.319,0:49:17.640
submit.

0:49:17.640,0:49:21.000
Select an "existing" pipeline, and we're

0:49:21.000,0:49:21.870
going to choose

0:49:21.870,0:49:23.529
the "ms-learn-diabetes-training",

0:49:23.529,0:49:24.599
which is the pipeline

0:49:24.599,0:49:27.060
that we have been working on

0:49:27.060,0:49:31.619
from the beginning of this module.

0:49:31.619,0:49:33.839
I don't think that this is going to take

0:49:33.839,0:49:36.060
much time. So we have submitted the job

0:49:36.060,0:49:37.319
and it's running.

0:49:37.319,0:49:40.140
Until the job ends, we are going to set

0:49:40.140,0:49:41.720
everything

0:49:41.720,0:49:45.599
for deploying a service.

0:49:45.599,0:49:49.070
In order to deploy a service,

0:49:49.070,0:49:50.520
um,

0:49:50.520,0:49:54.000
I have to have the job ready, so

0:49:54.000,0:49:55.980
until it's ready, you can't deploy it. So

0:49:55.980,0:49:58.319
let's go to the job—the job details from

0:49:58.319,0:50:01.319
here, okay?

0:50:01.319,0:50:05.119
And until it finishes,

0:50:05.119,0:50:07.260
Carlotta, do you think that we can have

0:50:07.260,0:50:09.240
the questions, and then we can get back

0:50:09.240,0:50:12.859
to the job and deploying it?

0:50:13.700,0:50:15.119
[CARLOTTA]: Yeah, yeah, yeah.

0:50:15.119,0:50:17.279
So yeah, guys, if you

0:50:17.279,0:50:18.980
have any questions

0:50:18.980,0:50:24.119
on what you just saw here

0:50:24.119,0:50:26.940
or in the introduction, feel free. This is

0:50:26.940,0:50:30.300
a good moment, we can...we can discuss

0:50:30.300,0:50:33.900
now, while we wait for this job to

0:50:33.900,0:50:36.260
finish.

0:50:36.260,0:50:38.760
[JOHN]: Uh, and....

0:50:38.760,0:50:40.220
can...

0:50:40.220,0:50:45.000
we have the knowledge check one? Or, like,

0:50:45.000,0:50:46.360
what do you think?

0:50:46.360,0:50:47.956
[CARLOTTA]: Yeah, we can also go

0:50:47.956,0:50:49.680
to the knowledge check.

0:50:49.680,0:50:50.940
Um...

0:50:50.940,0:50:56.339
Yeah, okay. So let me share my screen.

0:50:56.339,0:50:58.980
Yeah, so if you have not any questions

0:50:58.980,0:51:01.619
for us, we can maybe propose some

0:51:01.619,0:51:04.959
questions to you that you can,

0:51:04.959,0:51:06.240
um,

0:51:06.240,0:51:09.450
check your knowledge so far and you

0:51:09.450,0:51:12.900
can maybe answer to these questions

0:51:12.900,0:51:15.420
via chat.

0:51:15.420,0:51:18.300
So we have...do you see my screen, can

0:51:18.300,0:51:19.859
you see my screen?

0:51:19.859,0:51:21.650
[JOHN]: Yes.

0:51:21.650,0:51:24.440
[CARLOTTA]: So, John, I think I will

0:51:24.440,0:51:25.440
read this

0:51:25.440,0:51:29.040
question aloud and ask it to you, okay? So

0:51:29.040,0:51:32.040
are you ready to answer?

0:51:32.040,0:51:33.660
[JOHN]: Yes, I am.

0:51:33.660,0:51:35.460
[CARLOTTA]: So...

0:51:35.460,0:51:37.260
you're using Azure Machine Learning

0:51:37.260,0:51:39.780
designer to create a training pipeline

0:51:39.780,0:51:42.540
for a binary classification model, so

0:51:42.540,0:51:45.300
what we were doing in our demo,

0:51:45.300,0:51:48.059
right? And you have added a data set

0:51:48.059,0:51:51.660
containing features and labels, a Two-

0:51:51.660,0:51:54.359
Class Decision Forest module. So we used

0:51:54.359,0:51:56.819
a logistic regression model our...

0:51:56.819,0:51:57.877
um, in our example.

0:51:57.877,0:51:59.019
Here, we're using a Two-

0:51:59.019,0:52:01.260
Class Decision Forest model.

0:52:01.260,0:52:04.500
And, of course, a Train Model module. You

0:52:04.500,0:52:07.200
plan now to use Score Model and Evaluate

0:52:07.200,0:52:09.480
Model modules to test the trained model

0:52:09.480,0:52:11.640
with the subset of the data set that

0:52:11.640,0:52:13.500
wasn't used for training.

0:52:13.500,0:52:15.960
But what are we missing? So what's

0:52:15.960,0:52:18.780
another module you should add? We have

0:52:18.780,0:52:21.660
three options: we have Join Data, we have

0:52:21.660,0:52:25.200
Split Data, or we have Select Columns

0:52:25.200,0:52:26.819
in Dataset.

0:52:26.819,0:52:28.260
So

0:52:28.260,0:52:32.040
while John thinks about the answer,

0:52:32.040,0:52:33.599
go ahead and,

0:52:33.599,0:52:34.800
um,

0:52:34.800,0:52:37.800
answer yourself. So give us your

0:52:37.800,0:52:39.540
guess.

0:52:39.540,0:52:41.940
Put it in the chat, or just come off mute

0:52:41.940,0:52:44.900
and answer.

0:52:46.740,0:52:47.785
"A", "B".

0:52:47.785,0:52:49.769
[JOHN]: Yeah, what do you think

0:52:49.769,0:52:50.509
is the correct

0:52:50.509,0:52:53.579
answer for this one? I need something to

0:52:53.579,0:52:56.579
uh...I have to score my model, and I

0:52:56.579,0:53:00.359
have to evaluate it, so I need

0:53:00.359,0:53:03.119
something to enable me to do these two

0:53:03.119,0:53:05.359
things.

0:53:06.579,0:53:08.233
[CARLOTTA]: I think it's something

0:53:08.233,0:53:10.640
you showed us in your pipeline,

0:53:10.640,0:53:13.260
right John?

0:53:13.260,0:53:16.819
[JOHN]: Of course I did.

0:53:23.460,0:53:25.122
[CARLOTTA]: Uh, we have no guesses

0:53:25.122,0:53:28.020
in the chat?

0:53:28.020,0:53:30.070
[JOHN]: Can someone...

0:53:30.070,0:53:32.280
Someone want to guess?

0:53:32.280,0:53:35.579
[CARLOTTA]: We have a "B".

0:53:35.579,0:53:38.760
[JOHN]: Uh, maybe.

0:53:38.760,0:53:43.260
So, in order to do this,

0:53:43.260,0:53:46.200
I mentioned the

0:53:46.200,0:53:49.380
the module that is going to help me

0:53:49.380,0:53:52.728
to divide my data into two things:

0:53:52.728,0:53:53.819
70 percent for the

0:53:53.819,0:53:56.220
the training and 30 percent for the

0:53:56.220,0:53:59.339
evaluation. So what did I use? I used

0:53:59.339,0:54:01.859
Split Data, because this is what is going

0:54:01.859,0:54:05.280
to split my data randomly into training

0:54:05.280,0:54:08.459
data and validation data. So the correct

0:54:08.459,0:54:12.240
answer is "B", and good job. Thank you

0:54:12.240,0:54:13.980
for participating.
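
The 70/30 random split described here can be sketched in plain pandas. This only illustrates the idea and is not what the designer's Split Data module runs internally; the function name `split_rows` and the seed are assumptions made for the example.

```python
import pandas as pd


def split_rows(df, train_fraction=0.7, seed=123):
    # Randomly pick the training rows; a fixed seed makes the split repeatable.
    train = df.sample(frac=train_fraction, random_state=seed)
    # Everything not picked for training is held out for evaluation.
    # This relies on the DataFrame index being unique.
    held_out = df.drop(train.index)
    return train, held_out


# Toy data: ten rows, so a 0.7 fraction gives 7 training rows and 3 held out.
toy = pd.DataFrame({"PatientID": range(10), "Diabetic": [0, 1] * 5})
train_rows, validation_rows = split_rows(toy)
print(len(train_rows), len(validation_rows))
```

Score Model and Evaluate Model then work on the held-out rows, which the model never saw during training.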

0:54:13.980,0:54:17.400
Next question, please.

0:54:17.400,0:54:19.339
[CARLOTTA]: Yes, "B" is the correct

0:54:19.339,0:54:22.559
answer, so thanks, John,

0:54:22.559,0:54:26.040
for explaining to us the correct

0:54:26.040,0:54:26.940
one.

0:54:26.940,0:54:30.420
And we want to go with question two?

0:54:30.420,0:54:33.180
[JOHN]: Yeah, so, I'm going to ask you now,

0:54:33.180,0:54:35.880
Carlotta. You use Azure Machine Learning

0:54:35.880,0:54:38.280
designer to create a training pipeline

0:54:38.280,0:54:40.500
for your classification model.

0:54:40.500,0:54:44.099
What must you do before you deploy this

0:54:44.099,0:54:45.870
model as a service? You have to do

0:54:45.870,0:54:46.634
something before

0:54:46.634,0:54:47.439
you deploy it.

0:54:47.439,0:54:49.740
What do you think is the correct answer?

0:54:49.740,0:54:52.740
Is it "A", "B", or "C"?

0:54:52.740,0:54:55.020
Share your thoughts with—

0:54:55.020,0:54:56.690
with us in the chat and

0:54:56.690,0:55:00.180
and I'm also going to give you some

0:55:00.180,0:55:02.940
minutes to think of it before I

0:55:02.940,0:55:06.020
tell you about it.

0:55:06.020,0:55:07.765
[CARLOTTA]: Yeah so let me go

0:55:07.765,0:55:09.000
through the possible

0:55:09.000,0:55:12.359
answers, right? So we have A: "Create an

0:55:12.359,0:55:14.940
inference pipeline from the training

0:55:14.940,0:55:16.020
pipeline";

0:55:16.020,0:55:19.260
B: we have "Add an Evaluate Model

0:55:19.260,0:55:22.380
module to the training pipeline"; and then

0:55:22.380,0:55:25.079
three, we have "Clone the training

0:55:25.079,0:55:28.380
pipeline with a different name".

0:55:29.520,0:55:31.559
So what do you think is the correct

0:55:31.559,0:55:33.960
answer? "A", "B", or "C"?

0:55:33.960,0:55:36.660
Also this time, I think it's something

0:55:36.660,0:55:39.300
we mentioned both in the decks and in

0:55:39.300,0:55:41.960
the demo right?

0:55:42.599,0:55:44.819
[JOHN]: Yes it is,

0:55:44.819,0:55:46.793
it's something that I have done

0:55:46.793,0:55:50.410
like two, like five minutes ago.

0:55:51.800,0:55:57.200
It's real-time, real-time.

0:55:57.200,0:55:58.760
[CARLOTTA]: Um,

0:55:58.760,0:56:02.040
yeah, so, think about...you need to deploy

0:56:02.040,0:56:05.460
the model as a service. So if I'm

0:56:05.460,0:56:07.980
going to deploy model,

0:56:07.980,0:56:10.380
I cannot evaluate the model

0:56:10.380,0:56:12.839
after deploying it, right, because I

0:56:12.839,0:56:14.940
cannot go into production if I'm not

0:56:14.940,0:56:17.579
sure, I'm not satisfied with my model, and

0:56:17.579,0:56:19.500
I'm not sure that my model is performing

0:56:19.500,0:56:20.280
well.

0:56:20.280,0:56:22.900
So that's why I would go with,

0:56:22.900,0:56:24.319
um,

0:56:24.319,0:56:30.480
I would...exclude "B" from my

0:56:30.480,0:56:31.520
answer.

0:56:31.520,0:56:33.419
While

0:56:33.419,0:56:36.960
thinking about "C", uh, I don't see you—I

0:56:36.960,0:56:39.480
didn't see you, John, cloning the

0:56:39.480,0:56:41.420
training pipeline with a different name,

0:56:41.420,0:56:44.640
so I don't think this is the

0:56:44.640,0:56:46.920
right answer.

0:56:46.920,0:56:49.619
While I've seen you creating an

0:56:49.619,0:56:52.729
inference pipeline from the

0:56:52.729,0:56:54.830
training pipeline, and you just converted

0:56:54.830,0:56:59.280
it using a one-click button, right?

0:56:59.280,0:57:01.400
[JOHN]: Yeah, that's correct.

0:57:01.400,0:57:04.280
So this is the right answer.

0:57:04.280,0:57:07.460
Good job. So I created an inference

0:57:07.460,0:57:11.280
real-time pipeline, and it has done.

0:57:11.280,0:57:13.440
It finished—it finished, the job is

0:57:13.440,0:57:18.000
finished. So we can now deploy.

0:57:18.000,0:57:19.400
And...

0:57:19.400,0:57:21.500
Yeah [LAUGHS].

0:57:21.500,0:57:25.339
Exactly, like, on time.

0:57:25.339,0:57:27.839
Like, it finished two seconds...

0:57:27.839,0:57:30.859
three, four seconds ago [LAUGHS].

0:57:30.859,0:57:33.119
So, uh,

0:57:33.119,0:57:36.480
until, um...

0:57:36.480,0:57:39.839
This is my job review, so

0:57:39.839,0:57:43.260
this is the job details that I

0:57:43.260,0:57:45.540
have already submitted, it's just opening,

0:57:45.540,0:57:47.459
and once it opens...

0:57:47.459,0:57:50.180
um...

0:57:50.400,0:57:52.740
I don't know why it's so heavy

0:57:52.740,0:57:56.780
today, it's not like that usually.

0:57:57.780,0:58:00.020
[CARLOTTA]: Yeah, it's probably because

0:58:00.020,0:58:01.020
you are also

0:58:01.020,0:58:06.000
showing your screen on Teams,

0:58:06.000,0:58:08.160
so that's the bandwidth of your

0:58:08.160,0:58:08.944
connection.

0:58:08.944,0:58:10.740
[JOHN]: Let me do something here

0:58:10.740,0:58:13.740
because...yeah finally.

0:58:13.740,0:58:16.440
I can switch to my mobile internet if it

0:58:16.440,0:58:18.599
did it again. So I will click on "Deploy",

0:58:18.599,0:58:20.700
it's that simple. I'll just click on

0:58:20.700,0:58:23.040
"Deploy" and...

0:58:23.040,0:58:25.619
I am going to deploy a new real-time

0:58:25.619,0:58:27.960
endpoint.

0:58:27.960,0:58:30.300
So what I'm going to name it?

0:58:30.300,0:58:31.870
Description and the compute type.

0:58:31.870,0:58:33.372
Everything is already mentioned

0:58:33.372,0:58:34.140
for me here,

0:58:34.140,0:58:36.240
so I'm just gonna copy and paste it,

0:58:36.240,0:58:38.940
because we...we are running

0:58:38.940,0:58:41.280
out of time.

0:58:41.280,0:58:44.230
So it's all Azure Container Instance,

0:58:44.230,0:58:46.360
not Azure Kubernetes Service,

0:58:46.360,0:58:48.720
which is a containerization service also.

0:58:48.720,0:58:50.867
Both are for containerization, but this

0:58:50.867,0:58:53.613
gives you something, and this gives you something else.

0:58:53.613,0:58:54.960
For the advanced options,

0:58:54.960,0:58:57.420
it doesn't say for us to do anything, so

0:58:57.420,0:59:00.420
we are just gonna click on "Deploy",

0:59:00.420,0:59:05.220
and now we can test our endpoint from

0:59:05.220,0:59:07.859
the endpoints that we can find here, so

0:59:07.859,0:59:11.460
it's in progress. If I go here

0:59:11.460,0:59:13.799
under the assets, I can find something

0:59:13.799,0:59:16.680
called "Endpoints", and I can find the

0:59:16.680,0:59:18.599
real-time ones and the batch endpoints.

0:59:18.599,0:59:22.020
And we have created a real-time endpoint,

0:59:22.020,0:59:25.260
so we are going to find it under this

0:59:25.260,0:59:29.760
title. So if I click on it, I should

0:59:29.760,0:59:32.640
be able to test it once it's ready.

0:59:32.640,0:59:37.200
It's still loading, but this is the

0:59:37.200,0:59:40.980
input, and this is the output that we

0:59:40.980,0:59:44.652
will get back, so if I click on "Test"...

0:59:44.652,0:59:46.886
and from here,

0:59:46.886,0:59:49.810
I will input some data to the

0:59:49.810,0:59:50.900
endpoint,

0:59:50.900,0:59:54.599
which are: the patient information; the

0:59:54.599,0:59:57.119
columns that we have already seen in our

0:59:57.119,1:00:00.380
data set; the patient ID; the pregnancies.

1:00:00.380,1:00:03.960
And of course, of course I'm not gonna

1:00:03.960,1:00:05.940
enter the label that I'm trying to

1:00:05.940,1:00:08.099
predict, so I'm not going to give him if

1:00:08.099,1:00:10.360
the patient is diabetic or not. This

1:00:10.360,1:00:12.665
endpoint is to tell me this.

1:00:12.665,1:00:14.599
The endpoint, or the URL,

1:00:14.599,1:00:15.529
is going to give me

1:00:15.529,1:00:17.640
back this information, whether someone

1:00:17.640,1:00:22.680
has diabetes, or he doesn't. So if I input

1:00:22.680,1:00:24.780
this data, I'm just going to copy it,

1:00:24.780,1:00:27.780
and go to my endpoint, and click on

1:00:27.780,1:00:30.180
"Test", I'm gonna get the result back,

1:00:30.180,1:00:32.359
which are the three columns that we have

1:00:32.359,1:00:35.520
defined inside our Python script: the

1:00:35.520,1:00:37.859
patient ID, the diabetic prediction, and

1:00:37.859,1:00:41.040
the probability—the certainty of whether

1:00:41.040,1:00:45.720
someone is diabetic or not based on the...

1:00:45.720,1:00:49.090
uh...based on the prediction.
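
Outside the studio's Test tab, the same call can be sketched with the Python standard library. Several things here are assumptions rather than details from the session: the request and response key names (`Inputs`, `WebServiceInput0`, `Results`, `WebServiceOutput0`), bearer-key authentication, the feature column names, and the URL and key, which are placeholders. The sketch only builds the request and parses a reply; it does not contact a real service.

```python
import json
import urllib.request


def build_request(url, key, patient):
    # Wrap one patient record in the envelope the endpoint is assumed to expect.
    body = {"Inputs": {"WebServiceInput0": [patient]}, "GlobalParameters": {}}
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + key,
    }
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers
    )


def parse_response(raw_json):
    # Pull the list of result rows out of the assumed response envelope.
    return json.loads(raw_json)["Results"]["WebServiceOutput0"]


# Placeholder input: features only, with no diabetic label included.
patient = {"PatientID": 1882185, "Pregnancies": 9, "Age": 43}
request = build_request("https://example.invalid/score", "PLACEHOLDER_KEY", patient)

# A real call would be urllib.request.urlopen(request).read().
# Here a hand-written reply with illustrative field names is parsed instead.
sample_reply = (
    '{"Results": {"WebServiceOutput0": '
    '[{"PatientID": 1882185, "DiabetesPrediction": 1, "Probability": 0.75}]}}'
)
rows = parse_response(sample_reply)
print(rows[0]["DiabetesPrediction"], rows[0]["Probability"])
```

The label is left out of the input on purpose, since predicting it is the endpoint's job.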

1:00:49.090,1:00:50.660
So that's it.

1:00:50.660,1:00:54.359
And, uh, I think that this is a really

1:00:54.359,1:00:56.729
simple step to do, you can do it on your

1:00:56.729,1:00:58.380
own, you can test it.

1:00:58.380,1:01:01.140
And I think that I have finished, so

1:01:01.140,1:01:03.020
thank you.

1:01:03.020,1:01:04.206
[CARLOTTA]: Uh, yes,

1:01:04.206,1:01:06.069
we are running out of time.

1:01:06.069,1:01:09.780
I just wanted to thank you, John, for

1:01:09.780,1:01:12.299
this demo, for going through all these

1:01:12.299,1:01:13.429
steps to

1:01:13.429,1:01:16.740
um, create, train a classification model,

1:01:16.740,1:01:19.680
and also deploy it as a predictive

1:01:19.680,1:01:22.880
service. And I encourage you all to go

1:01:22.880,1:01:25.079
back to the learn module

1:01:25.079,1:01:28.260
and, um, deepen all these topics

1:01:28.260,1:01:31.760
at your own pace, and also maybe

1:01:31.760,1:01:34.799
uh do this demo on your own, on your

1:01:34.799,1:01:37.140
subscription, on your Azure for Students

1:01:37.140,1:01:39.359
subscription. Um...

1:01:39.359,1:01:43.200
And I would also like to recall that

1:01:43.200,1:01:46.140
this is part of a series of study

1:01:46.140,1:01:49.500
sessions of Cloud Skill Challenge study

1:01:49.500,1:01:51.059
sessions,

1:01:51.059,1:01:54.059
so you will have more in the...

1:01:54.059,1:01:57.540
in the following days, and this is for

1:01:57.540,1:02:00.480
you to prepare, let's say, to help you

1:02:00.480,1:02:04.880
in taking the Cloud Skills Challenge,

1:02:04.880,1:02:07.040
which collects

1:02:07.040,1:02:10.599
very interesting learn modules that you

1:02:10.599,1:02:14.540
can use to skill up on various topics,

1:02:14.540,1:02:18.359
and some of them are focused on AI and

1:02:18.359,1:02:20.819
ML. So if you are interested in these

1:02:20.819,1:02:23.099
topics, you can select these learn

1:02:23.099,1:02:24.780
modules.

1:02:24.780,1:02:27.660
So let me also copy

1:02:27.660,1:02:29.669
the link, the short link to the

1:02:29.669,1:02:32.420
challenge in the chat. Remember that

1:02:32.420,1:02:34.980
you have time until the 13th of

1:02:34.980,1:02:37.980
September to take the challenge. And also

1:02:37.980,1:02:40.440
remember that in October, on the 7th of

1:02:40.440,1:02:43.020
October, you have the—you can join the

1:02:43.020,1:02:46.619
student—the Student Developer Summit,

1:02:46.619,1:02:50.480
which is, uh, which will be a virtual or

1:02:50.480,1:02:53.220
in...for some cases a hybrid

1:02:53.220,1:02:55.880
event, so stay tuned, because you will

1:02:55.880,1:02:58.559
have some surprises in the following

1:02:58.559,1:03:01.260
days. And if you want to learn more about

1:03:01.260,1:03:03.480
this event you can check the Microsoft

1:03:03.480,1:03:08.099
Imagine Cup Twitter page and stay tuned.

1:03:08.099,1:03:11.230
So thank you everyone for joining

1:03:11.230,1:03:12.989
this session today, and thank you very

1:03:12.989,1:03:16.500
much, John, for co-hosting this

1:03:16.500,1:03:20.359
session with me. It was a pleasure.

1:03:21.227,1:03:22.838
[JOHN]: Thank you so much,

1:03:22.838,1:03:23.969
Carlotta, for having me

1:03:23.969,1:03:26.249
with you today, and thank you for

1:03:26.249,1:03:27.670
giving me this opportunity to

1:03:27.670,1:03:30.180
be with you here.

1:03:30.180,1:03:32.070
[CARLOTTA]: Great, thank you.

1:03:32.070,1:03:33.420
[JOHN]: Yeah, I hope that we

1:03:33.420,1:03:35.390
work again in the future.

1:03:35.390,1:03:37.880
[CARLOTTA]: Sure, I hope so as well.

1:03:37.880,1:03:40.700
Um, so, thank you everyone.

1:03:40.700,1:03:43.749
And have a nice rest of your day.

1:03:44.099,1:03:46.500
Bye-bye. Speak to you soon.

1:03:46.500,1:03:48.920
[JOHN]: Bye.