Create a Classification Model with Azure Machine Learning Designer [English]

0:02 - 0:04

[CARLOTTA]: Great, so I think we can start
0:04 - 0:06

since the meeting is recorded, so if
0:06 - 0:10

anyone, uh, jumps in later, they
0:10 - 0:12

can watch the recording.
0:12 - 0:16

So, hi everyone and welcome to this
0:16 - 0:18

um, Cloud Skills Challenge study session
0:18 - 0:21

around creating classification models
0:21 - 0:24

with Azure Machine Learning designer.
0:24 - 0:27

So today I'm thrilled to be here with
0:27 - 0:29

John. Uh, John do you mind
0:29 - 0:32

briefly introducing yourself?
0:32 - 0:33

[JOHN]: Uh, thank you Carlotta.
0:33 - 0:34

Hello everyone.
0:34 - 0:38

Welcome to our workshop today. I hope
0:38 - 0:41

that you are all excited for it. I am
0:41 - 0:43

John Aziz, a Gold Microsoft Learn Student
0:43 - 0:47

Ambassador, and I will be here with, uh,
0:47 - 0:51

Carlotta to do the practical part
0:51 - 0:54

about this module of the Cloud Skills
0:54 - 0:57

Challenge. Thank you for having me.
0:57 - 0:58

[CARLOTTA]: Perfect, thanks John.
0:58 - 1:00

So for those who
1:00 - 1:03

don't know me, I'm Carlotta Castelluccio,
1:03 - 1:06

based in Italy and focused on AI and
1:06 - 1:09

machine learning technologies and
1:09 - 1:11

their use in education.
1:11 - 1:12

Um, so,
1:13 - 1:15

um, this Cloud Skills Challenge study
1:15 - 1:17

session is based on a Learn module, a
1:17 - 1:21

dedicated Learn module. I sent you, uh,
1:21 - 1:24

the link to this module, uh, in the chat
1:24 - 1:26

so that you can follow along with the
1:26 - 1:29

module if you want, or just have a look at
1:29 - 1:32

the module later at your own pace.
1:32 - 1:34

Um...
1:34 - 1:37

So, before starting I would also like to
1:37 - 1:41

remind you of, uh, the code of
1:41 - 1:43

conduct and guidelines of our student
1:43 - 1:48

ambassadors community. So please during this
1:48 - 1:51

meeting be respectful and inclusive and
1:51 - 1:54

be friendly, open, and welcoming and
1:54 - 1:56

respectful of each other's
1:56 - 1:58

differences.
1:58 - 2:01

If you want to learn more about the code
2:01 - 2:03

of conduct, you can use this link in the
2:03 - 2:09

deck: aka.ms/SACoC.
2:10 - 2:12

And now we are,
2:12 - 2:15

um, we are ready to start our session.
2:15 - 2:19

So, as we mentioned, we are going to
2:19 - 2:22

focus on classification models and Azure ML,
2:22 - 2:25

uh, today. So, first of all, we are going
2:25 - 2:28

to, um, identify, uh, the kind of
2:28 - 2:31

um, of scenarios in which you should
2:31 - 2:34

choose to use a classification model.
2:34 - 2:37

We're going to introduce Azure Machine
2:37 - 2:39

Learning and Azure Machine Learning designer.
2:39 - 2:42

We're going to understand, uh, which are
2:42 - 2:44

the steps to follow, to create a
2:44 - 2:46

classification model in Azure Machine
2:46 - 2:48

Learning, and then John will,
2:48 - 2:50

um,
2:50 - 2:52

lead an amazing demo about training and
2:52 - 2:54

publishing a classification model in
2:54 - 2:57

Azure ML Designer.
2:57 - 3:00

So, let's start from the beginning. Let's
3:00 - 3:03

start from identifying classification
3:03 - 3:05

machine learning scenarios.
3:05 - 3:08

So, first of all, what is classification?
3:08 - 3:10

Classification is a form of machine
3:10 - 3:12

learning that is used to predict which
3:12 - 3:16

category or class an item belongs to. For
3:16 - 3:17

example, we might want to develop a
3:17 - 3:20

classifier able to identify if an
3:20 - 3:22

incoming email should be filtered or not
3:22 - 3:25

according to the style, the sender, the
3:25 - 3:27

length of the email, etc.
3:27 - 3:28

In this case, the
3:28 - 3:30

characteristics of the email are the
3:30 - 3:31

features.
3:31 - 3:34

And the label is a classification of
3:34 - 3:38

either a zero or one, representing a spam
3:38 - 3:41

or non-spam for the incoming email. So
3:41 - 3:42

this is an example of a binary
3:42 - 3:44

classifier. If you want to assign
3:44 - 3:46

multiple categories to the incoming
3:46 - 3:49

email like work letters, love letters,
3:49 - 3:52

complaints, or other categories, in this
3:52 - 3:54

case a binary classifier is no longer
3:54 - 3:56

enough, and we should develop a
3:56 - 3:58

multi-class classifier. So classification
3:58 - 4:01

is an example of what is called
4:01 - 4:03

supervised machine learning
4:03 - 4:05

in which you train a model using data
4:05 - 4:07

that includes both the features and
4:07 - 4:09

known values for the label
4:09 - 4:11

so that the model learns to fit the
4:11 - 4:14

feature combinations to the label. Then,
4:14 - 4:15

after training has been completed, you
4:15 - 4:17

can use the trained model to predict
4:17 - 4:20

labels for new items for which the
4:20 - 4:22

label is unknown.
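
As a rough illustration of the supervised idea just described, the sketch below trains a tiny nearest-centroid classifier on labeled examples and then predicts labels for new items. The feature values, the choice of 1 for spam and 0 for non-spam, and the nearest-centroid technique itself are assumptions made for this example; they are not taken from the Learn module.

```python
from statistics import mean


def train_centroids(features, labels):
    """Learn one mean feature vector (centroid) per known label."""
    by_label = {}
    for row, label in zip(features, labels):
        by_label.setdefault(label, []).append(row)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }


def predict(centroids, row):
    """Return the label whose centroid is closest to the given row."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], row))


# Hypothetical features per email: [number of links, number of exclamation marks]
train_features = [[8, 5], [7, 6], [9, 4], [0, 1], [1, 0], [0, 0]]
train_labels = [1, 1, 1, 0, 0, 0]  # assumption: 1 = spam, 0 = not spam

model = train_centroids(train_features, train_labels)

# New items for which the label is unknown
new_emails = [[6, 5], [1, 1]]
predictions = [predict(model, email) for email in new_emails]
print(predictions)
```

The training step only sees rows that carry a known label; the prediction step is then applied to rows without one.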
4:22 - 4:25

But let's see some examples of scenarios
4:25 - 4:27

for classification machine learning
4:27 - 4:29

models. So, we already mentioned an
4:29 - 4:31

example of a solution in which we would
4:31 - 4:34

need a classifier, but let's explore
4:34 - 4:36

other scenarios for classification in
4:36 - 4:38

other industries. For example, you can use
4:38 - 4:40

a classification model for a health
4:40 - 4:44

clinic scenario, and use clinical data to
4:44 - 4:46

predict whether a patient will become sick
4:46 - 4:47

or not.
4:47 - 4:50

You can use, um...
4:50 - 4:59

[NO AUDIO]
4:59 - 5:01

[JOHN]: Carlotta, you are muted.
5:04 - 5:08

[CARLOTTA]: Oh, sorry.
So, when I became muted, it's a
5:08 - 5:09

long time, or?
5:09 - 5:12

[JOHN]: You can use-you can use, uh
5:12 - 5:13

some models for classification.
5:13 - 5:15

For example, you can use...
5:15 - 5:17

You were saying this.
5:17 - 5:20

[CARLOTTA]: Uh, so I was in this deck,
5:20 - 5:22

or the previous one?
5:22 - 5:24

[JOHN]: This one, you have been muted
5:24 - 5:26

for, uh, one second [LAUGHS].
5:26 - 5:28

[CARLOTTA]: Okay, okay perfect, perfect.
5:28 - 5:30

Uh, yeah I was talking...sorry for
5:30 - 5:33

that. So, I was talking about the possible
5:33 - 5:35

scenarios in which you,
5:35 - 5:37

you can use a classification model. Like
5:37 - 5:40

a health clinic scenario, a financial scenario,
5:40 - 5:42

or the third one is business type of
5:42 - 5:44

scenario. You can use characteristics of
5:44 - 5:46

a small business to predict if a new
5:46 - 5:48

venture will succeed or not, for
5:48 - 5:50

example. And these are all types of
5:50 - 5:52

binary classification.
5:52 - 5:55

Uh, but today we are also going to talk
5:55 - 5:57

about Azure Machine Learning. So let's
5:57 - 5:58

see.
5:58 - 6:01

What is Azure Machine Learning? So
6:01 - 6:02

training and deploying an effective
6:02 - 6:04

machine learning model involves a lot of
6:04 - 6:07

work, much of it time-consuming and
6:07 - 6:09

resource intensive. So, Azure Machine
6:09 - 6:11

Learning is a cloud-based service that
6:11 - 6:13

helps simplify some of the tasks it
6:13 - 6:16

takes to prepare data, train a model, and
6:16 - 6:18

also deploy it as a predictive service.
6:18 - 6:20

So it helps data scientists increase
6:20 - 6:22

their efficiency by automating many of
6:22 - 6:25

the time-consuming tasks associated with
6:25 - 6:28

creating and training a model.
6:28 - 6:30

And it enables them also to use
6:30 - 6:32

cloud-based compute resources that scale
6:32 - 6:34

effectively to handle large volumes of
6:34 - 6:36

data while incurring costs only when
6:36 - 6:39

actually used.
6:39 - 6:41

To use Azure Machine Learning, you,
6:41 - 6:43

first things first, you need to create a
6:43 - 6:45

workspace resource in your Azure
6:45 - 6:48

subscription, and you can then use this
6:48 - 6:50

workspace to manage data, compute
6:50 - 6:52

resources, code, models, and other
6:52 - 6:55

artifacts. After you have created an
6:55 - 6:57

Azure Machine Learning workspace,
6:57 - 6:58

you can develop solutions with the
6:58 - 6:59

Azure Machine Learning service,
6:59 - 7:01

either with developer
7:01 - 7:03

tools or the Azure Machine Learning
7:03 - 7:04

studio web portal.
7:04 - 7:06

In particular,
Azure Machine Learning studio
7:06 - 7:08

is a web portal for machine
7:08 - 7:10

learning solutions in Azure, and it
7:10 - 7:12

includes a wide range of features and
7:12 - 7:14

capabilities that help data scientists
7:14 - 7:16

prepare data, train models, publish
7:16 - 7:18

predictive services, and monitor also
7:18 - 7:20

their usage.
7:20 - 7:22

So to begin using the web portal, you need
7:22 - 7:23

to assign the workspace
7:23 - 7:25

you created in the Azure portal
7:25 - 7:27

to the Azure Machine
7:27 - 7:30

Learning studio.
7:30 - 7:32

At its core, Azure Machine Learning is a
7:32 - 7:34

service for training and managing
7:34 - 7:36

machine learning models for which you
7:36 - 7:38

need compute resources on which to run
7:38 - 7:40

the training process.
7:40 - 7:44

Compute targets are, um, one of the main
7:44 - 7:47

basic concepts of Azure Machine Learning.
7:47 - 7:49

They are cloud-based resources on which
7:49 - 7:51

you can run model training and data
7:51 - 7:53

exploration processes.
7:53 - 7:55

So in Azure Machine Learning studio, you
7:55 - 7:57

can manage the compute targets for your
7:57 - 7:59

data science activities, and there are
7:59 - 8:03

four kinds of compute targets you can
8:03 - 8:06

create. We have the compute instances,
8:06 - 8:10

which are virtual machines set up for
8:10 - 8:11

running machine learning code during
8:11 - 8:13

development, so they are not designed for
8:13 - 8:14

production.
8:14 - 8:17

Then we have compute clusters, which are
8:17 - 8:20

a set of virtual machines that can scale
8:20 - 8:22

up automatically based on traffic.
8:22 - 8:25

We have inference clusters, which are
8:25 - 8:27

similar to compute clusters, but they are
8:27 - 8:29

designed for deployment, so they are
8:29 - 8:32

deployment targets for predictive
8:32 - 8:36

services that use trained models.
8:36 - 8:38

And finally, we have attached compute,
8:38 - 8:41

which are any compute target that you
8:41 - 8:44

manage yourself outside of Azure ML, like,
8:44 - 8:47

for example, virtual machines or Azure
8:47 - 8:50

Databricks clusters.
8:50 - 8:53

So we talked about Azure Machine
8:53 - 8:54

Learning, but we also
8:54 - 8:56

mentioned Azure Machine Learning
8:56 - 8:58

designer. What is Azure Machine Learning
8:58 - 9:00

designer? So, in Azure Machine Learning
9:00 - 9:03

Studio, there are several ways to author
9:03 - 9:05

classification machine learning models.
9:05 - 9:08

One way is to use a visual interface, and
9:08 - 9:10

this visual interface is called designer,
9:10 - 9:13

and you can use it to train, test, and
9:13 - 9:16

also deploy machine learning models. And
9:16 - 9:18

the drag-and-drop interface makes use of
9:18 - 9:20

clearly defined inputs and outputs that
9:20 - 9:23

can be shared, reused, and also version
9:23 - 9:24

controlled.
9:24 - 9:26

And using the designer, you can identify
9:26 - 9:28

the building blocks or components needed
9:28 - 9:31

for your model, place and connect them on
9:31 - 9:33

your canvas, and run a machine learning
9:33 - 9:35

job.
9:35 - 9:37

So,
9:37 - 9:39

each designer project, so each project
9:39 - 9:42

in the designer is known as a pipeline.
9:42 - 9:46

And in the designer, we have a left panel
9:46 - 9:48

for navigation and a canvas on your
9:48 - 9:51

right hand side in which you build your
9:51 - 9:54

pipeline visually. So pipelines let you
9:54 - 9:56

organize, manage, and reuse complex
9:56 - 9:58

machine learning workflows across
9:58 - 10:00

projects and users.
10:00 - 10:03

A pipeline starts with the data set from
10:03 - 10:04

which you want to train the model
10:04 - 10:06

because it all begins with data when
10:06 - 10:07

talking about data science and machine
10:07 - 10:10

learning. And each time you run a
10:10 - 10:11

pipeline, the configuration of the
10:11 - 10:13

pipeline and its results are stored in
10:13 - 10:17

your workspace as a pipeline job.
10:17 - 10:22

So the second main concept of Azure
10:22 - 10:25

Machine Learning is a component. So, going
10:25 - 10:28

hierarchically from the pipeline, we can
10:28 - 10:31

say that each building block of a
10:31 - 10:33

pipeline is called a component.
10:33 - 10:34

In other words, an Azure Machine
10:34 - 10:37

Learning component encapsulates one step
10:37 - 10:39

in a machine learning pipeline. So, it's a
10:39 - 10:42

reusable piece of code with inputs and
10:42 - 10:44

outputs, something very similar to a
10:44 - 10:46

function in any programming language.
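
To make the function analogy concrete, here is a minimal sketch in which each component is a plain Python function with one input and one output, and a pipeline is an ordered list of such components. The component names and the `run_pipeline` helper are hypothetical and only illustrate the idea; they are not part of Azure Machine Learning.

```python
def drop_incomplete_rows(rows):
    """Component: keep only rows that have no missing (None) values."""
    return [row for row in rows if None not in row]


def add_row_total(rows):
    """Component: append the sum of each row as an extra column."""
    return [row + [sum(row)] for row in rows]


def run_pipeline(data, components):
    """Pipeline: feed the output of each component into the next one."""
    for component in components:
        data = component(data)
    return data


raw_rows = [[1, 2], [None, 3], [4, 5]]
result = run_pipeline(raw_rows, [drop_incomplete_rows, add_row_total])
print(result)
```

Because each component only depends on its input, the same function can be reused in another pipeline without changes.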
10:46 - 10:49

And in a pipeline project, you can access
10:49 - 10:51

data assets and components from the left
10:51 - 10:53

panel's
10:53 - 10:56

Asset Library tab, as you can see
10:56 - 11:00

here in the screenshot in the deck.
11:00 - 11:03

So you can create data assets using
11:03 - 11:08

an ad hoc page called the Data page. And a data
11:08 - 11:11

asset is a reference to a data source
11:11 - 11:12

location.
11:12 - 11:16

So this data source location could be a
11:16 - 11:19

local file, a data store, a web file or
11:19 - 11:22

even an Azure Open Dataset.
11:22 - 11:24

And these data assets will appear along
11:24 - 11:26

with standard sample data sets in the
11:26 - 11:30

designer's Asset Library.
11:30 - 11:32

Um.
11:32 - 11:37

Another basic concept of Azure ML is
11:37 - 11:39

Azure Machine Learning jobs.
11:39 - 11:44

So, basically, when you submit a pipeline,
11:44 - 11:47

you create a job which will run all the
11:47 - 11:50

steps in your pipeline. So a job executes
11:50 - 11:53

a task against a specified compute
11:53 - 11:54

target.
11:54 - 11:57

Jobs enable systematic tracking for your
11:57 - 11:59

machine learning experimentation in
11:59 - 12:00

Azure ML.
12:00 - 12:02

And once a job is created, Azure ML
12:02 - 12:05

maintains a run record, uh, for the
12:05 - 12:08

job.
12:08 - 12:12

Um, but, let's move to the classification
12:12 - 12:14

steps. So,
12:14 - 12:17

um, let's introduce how to create a
12:17 - 12:21

classification model in Azure ML, but you
12:21 - 12:24

will see it in more detail in a
12:24 - 12:26

hands-on demo that John will guide us
12:26 - 12:29

through in a few minutes.
12:29 - 12:32

So, you can think of the steps to train
12:32 - 12:34

and evaluate a classification machine
12:34 - 12:37

learning model as four main steps. So
12:37 - 12:38

first of all, you need to prepare your
12:38 - 12:41

data. So, you need to identify the
12:41 - 12:43

features and the label in your data set,
12:43 - 12:46

you need to pre-process, so you need to
12:46 - 12:49

clean and transform the data as needed.
12:49 - 12:51

Then, the second step, of course, is
12:51 - 12:53

training the model.
12:53 - 12:55

And for training the model, you need to
12:55 - 12:57

split the data into two groups: a
12:57 - 13:00

training and a validation set.
13:00 - 13:01

Then you train a machine learning model
13:01 - 13:04

using the training data set and you test
13:04 - 13:05

the machine learning model for
13:05 - 13:07

performance using the validation data
13:07 - 13:08

set.
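
The split described here can be sketched in a few lines of plain Python. The 70/30 ratio, the fixed random seed, and the `split_data` name are assumptions chosen for the example; they are not settings quoted in the session.

```python
import random


def split_data(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and cut it into training and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # stand-in for ten data rows
train_rows, validation_rows = split_data(rows)
print(len(train_rows), len(validation_rows))
```

Every row ends up in exactly one of the two sets, so the validation rows are never seen during training.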
13:08 - 13:12

The third step is performance evaluation,
13:12 - 13:15

which means comparing how close the
13:15 - 13:16

model's predictions are to the known
13:16 - 13:21

labels and these lead us to compute some
13:21 - 13:23

evaluation performance metrics.
13:23 - 13:26

And then finally...
13:26 - 13:29

So, these three steps are not,
13:29 - 13:33

um, not performed every time in a
13:33 - 13:35

linear manner. It's more an iterative
13:35 - 13:39

process. But once you obtain, you achieve
13:39 - 13:43

a performance with which you are
13:43 - 13:46

satisfied, you are ready to, let's say,
13:46 - 13:49

go into production, and you can deploy
13:49 - 13:52

your trained model as a predictive service
13:52 - 13:56

into a real-time, uh, to a real-time
13:56 - 13:58

endpoint. And to do so, you need to
13:58 - 14:00

convert the training pipeline into a
14:00 - 14:03

real-time inference pipeline, and then
14:03 - 14:04

you can deploy the model as an
14:04 - 14:07

application on a server or device so
14:07 - 14:11

that others can consume this model.
14:11 - 14:14

So let's start with the first step, which
14:14 - 14:18

is prepare data. Real-world data can contain
14:18 - 14:20

many different issues that can affect
14:20 - 14:22

the utility of the data and our
14:22 - 14:25

interpretation of the results. So also
14:25 - 14:27

the machine learning model that you
14:27 - 14:29

train using this data. For example, real-
14:29 - 14:31

world data can be affected by a bad
14:31 - 14:34

recording or a bad measurement, and it
14:34 - 14:36

can also contain missing values for some
14:36 - 14:39

parameters. And Azure Machine Learning
14:39 - 14:41

designer has several pre-built
14:41 - 14:43

components that can be used to prepare
14:43 - 14:46

data for training. These components
14:46 - 14:48

enable you to clean data, normalize
14:48 - 14:53

features, join tables, and more.
14:53 - 14:57

Let's come to training. So, to train a
14:57 - 14:59

classification model you need a data set
14:59 - 15:02

that includes historical features, so the
15:02 - 15:04

characteristics of the entity for which
15:04 - 15:07

you want to make a prediction, and known label
15:07 - 15:10

values. The label is the class indicator
15:10 - 15:12

we want to train a model to predict.
15:12 - 15:14

And it's common practice to train a
15:14 - 15:16

model using a subset of the data while
15:16 - 15:18

holding back some data with which to
15:18 - 15:21

test the trained model. And this enables
15:21 - 15:22

you to compare the labels that the model
15:22 - 15:25

predicts with the actual known labels in
15:25 - 15:27

the original data set.
15:27 - 15:30

This operation can be performed in the
15:30 - 15:32

designer using the Split Data component
15:32 - 15:35

as shown by the screenshot here in the...
15:35 - 15:37

in the deck.
15:37 - 15:40

There's also another component that you
15:40 - 15:41

should use, which is the Score Model
15:41 - 15:43

component to generate the predicted
15:43 - 15:45

class label value using the validation
15:45 - 15:48

data as input. So once you connect all
15:48 - 15:50

these components,
15:50 - 15:52

the component specifying the
15:52 - 15:55

model we are going to use, the Split Data
15:55 - 15:57

component, the Train Model component,
15:57 - 16:00

and the Score Model component, you want
16:00 - 16:03

to run a new experiment in
16:03 - 16:06

Azure ML, which will use the data set
16:06 - 16:10

on the canvas to train and score a model.
16:10 - 16:12

After training a model, it is important,
16:12 - 16:15

we say, to evaluate its performance, to
16:15 - 16:17

understand how well
16:17 - 16:21

our model is performing.
16:21 - 16:23

And there are many performance metrics
16:23 - 16:25

and methodologies for evaluating how
16:25 - 16:27

well a model makes predictions. The
16:27 - 16:29

component to use to perform evaluation
16:29 - 16:32

in Azure ML designer is called, as
16:32 - 16:35

intuitive as it is, Evaluate Model.
16:35 - 16:38

Once the job of training and evaluation
16:38 - 16:41

of the model is completed, you can review
16:41 - 16:43

evaluation metrics on the completed job
16:43 - 16:46

page by right-clicking on the component.
16:46 - 16:48

In the evaluation results, you can also
16:48 - 16:51

find the so-called confusion matrix that
16:51 - 16:53

you can see here in the right side of
16:53 - 16:55

this deck.
16:55 - 16:57

A confusion matrix shows cases where
16:57 - 16:59

both the predicted and actual values
16:59 - 17:02

were one, the so-called true positives
17:02 - 17:04

at the top left and also cases where
17:04 - 17:07

both the predicted and the actual values
17:07 - 17:08

were zero, the so-called true negatives
17:08 - 17:11

at the bottom right. While the other
17:11 - 17:14

cells show cases where the predicted
17:14 - 17:15

and actual values differ,
17:15 - 17:18

called false positives and false
17:18 - 17:20

negatives, and this is an example of a
17:20 - 17:24

confusion matrix for a binary classifier.
17:24 - 17:26

While for a multi-class classification
17:26 - 17:28

model the same approach is used to
17:28 - 17:30

tabulate each possible combination of
17:30 - 17:33

actual and predicted value counts. So
17:33 - 17:35

for example, a model with three possible
17:35 - 17:38

classes would result in a three by
17:38 - 17:39

three matrix.
17:39 - 17:42

The confusion matrix is also useful for
17:42 - 17:44

the metrics that can be derived from it,
17:44 - 17:48

like accuracy, recall, or precision.
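
As a small worked example, the sketch below counts the four cells of a binary confusion matrix and derives accuracy, precision, and recall from them. The actual and predicted label lists are made up for illustration.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # predicted 1, actually 1
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # predicted 0, actually 0
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # predicted 1, actually 0
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # predicted 0, actually 1
    return tp, tn, fp, fn


# Made-up labels for ten validation rows
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted positives that are correct
recall = tp / (tp + fn)                     # share of actual positives that were found

print(tp, tn, fp, fn)
print(accuracy, precision, recall)
```

With these labels there are 3 true positives, 5 true negatives, 1 false positive, and 1 false negative, which gives an accuracy of 0.8 and a precision and recall of 0.75 each.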
17:49 - 17:52

We say that the last step is
17:52 - 17:56

deploying the trained model to a real-time
17:56 - 17:59

endpoint as a predictive service. And in
17:59 - 18:01

order to automate your model into a
18:01 - 18:03

service that makes continuous
18:03 - 18:05

predictions, you need, first of all, to
18:05 - 18:08

create and then deploy an
18:08 - 18:10

inference pipeline. The process of
18:10 - 18:12

converting the training pipeline into a
18:12 - 18:14

real-time inference pipeline removes
18:14 - 18:16

training components and adds web service
18:16 - 18:19

inputs and outputs to handle requests.
18:19 - 18:21

And the inference pipeline performs the
18:21 - 18:23

same data transformations as the
18:23 - 18:26

first pipeline, but for new data. Then it
18:26 - 18:29

uses the trained model to infer or predict
18:29 - 18:33

label values based on its features.
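
The relationship between the two pipelines can be sketched as follows: the training step learns both a transformation and a model, and the inference step applies that same transformation to a new request before predicting. The centering transformation, the threshold model, and the request and response shapes are all assumptions made for this illustration; they do not describe what the designer generates.

```python
from statistics import mean


def train_pipeline(values, labels):
    """Training: learn a centering transformation and a simple threshold model."""
    center = mean(values)
    centered = [v - center for v in values]
    positive_mean = mean([c for c, y in zip(centered, labels) if y == 1])
    negative_mean = mean([c for c, y in zip(centered, labels) if y == 0])
    # Assumption: the positive class has the higher values
    threshold = (positive_mean + negative_mean) / 2
    return {"center": center, "threshold": threshold}


def inference_pipeline(artifacts, request):
    """Inference: request in, same transformation, prediction out."""
    centered = request["value"] - artifacts["center"]  # reuse the learned transformation
    label = 1 if centered > artifacts["threshold"] else 0
    return {"predicted_label": label}


# Made-up training data: one numeric feature and a 0/1 label
artifacts = train_pipeline([90, 100, 110, 150, 160, 170], [0, 0, 0, 1, 1, 1])

high = inference_pipeline(artifacts, {"value": 155})
low = inference_pipeline(artifacts, {"value": 95})
print(high, low)
```

The inference function contains no training logic; it only needs the stored artifacts, which is why the training components can be removed from the real-time pipeline.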
18:33 - 18:36

So, I think I've talked a lot for now.
18:36 - 18:40

I would like to let John show us
18:40 - 18:44

something in practice with
18:44 - 18:47

the hands-on demo, so please, John, go
18:47 - 18:50

ahead, share your screen and guide us
18:50 - 18:52

through this demo of creating a
18:52 - 18:53

classification model with
18:53 - 18:56

the Azure Machine Learning designer.
18:56 - 18:59

[JOHN]: Thank you so much Carlotta for
18:59 - 19:01

this interesting explanation of the
19:01 - 19:04

Azure ML designer. And now,
19:04 - 19:08

um, I'm going to start with you in the
19:08 - 19:10

practical demo part, so if you want to
19:10 - 19:13

follow along, go to the link that Carlotta
19:13 - 19:18

sent in the chat so you can do
19:18 - 19:22

the demo or the practical part with me.
19:22 - 19:25

I'm just going to share my screen...
19:25 - 19:27

and...
19:27 - 19:32

...go here. So, uh...
19:32 - 19:34

Where am I right now? I'm inside the
19:34 - 19:37

Microsoft Learn documentation. This is
19:37 - 19:40

the exercise part of this module, and we
19:40 - 19:43

will start by setting two things, which
19:43 - 19:45

are prerequisites for us to work inside
19:45 - 19:50

this module, which are the resource group
19:50 - 19:52

and the Azure Machine Learning workspace,
19:52 - 19:56

and something extra which is the compute
19:56 - 20:00

cluster that Carlotta talked about. So I
20:00 - 20:02

just want to make sure that you all have
20:02 - 20:06

a resource group created inside your
20:06 - 20:08

portal inside your Microsoft Azure
20:08 - 20:11

platform. So this is my resource group.
20:11 - 20:15

Inside this resource group, I
20:15 - 20:17

have created an Azure Machine Learning
20:17 - 20:22

workspace. So I'm just going to access
20:22 - 20:24

the workspace that I have created
20:24 - 20:27

already from this link. I am going to
20:27 - 20:30

open it, which is the studio web URL, and
20:30 - 20:33

I will follow the steps. So what is this?
20:33 - 20:36

This is your machine learning workspace,
20:36 - 20:38

or machine learning studio. You can do a
20:38 - 20:40

lot of things here, but we are going to
20:40 - 20:42

focus mainly on the designer and the
20:42 - 20:46

data and the compute. So another
20:46 - 20:49

prerequisite here, as Carlotta told you,
20:49 - 20:51

we need some resources to power up the
20:51 - 20:54

classification, the processes that
20:54 - 20:55

will happen.
20:55 - 20:58

So, we have created this computing
20:58 - 20:59

cluster,
20:59 - 21:03

and we have set some presets for
21:03 - 21:04

it. So
21:04 - 21:07

where can you find this preset? You go
21:07 - 21:10

here. Under the create compute, you'll
21:10 - 21:13

find everything that you need to do. So
21:13 - 21:17

the size is the Standard DS11 Version 2,
21:17 - 21:20

and it's a CPU not GPU, because we don't
21:20 - 21:22

know the GPU, and we don't need a GPU.
21:22 - 21:26

Uh, it is ready for us to use.
21:26 - 21:31

The next thing which we will look into
21:31 - 21:34

is the designer. How can you access the
21:34 - 21:35

designer?
21:35 - 21:38

You can either click on this icon or
21:38 - 21:40

click on the navigation menu and click
21:40 - 21:42

on the designer for me.
21:43 - 21:46

Now I am inside my designer.
21:46 - 21:48

What we are going to do now is the
21:48 - 21:50

pipeline that Carlotta told you about.
21:50 - 21:54

And from where can I know these steps? If
21:54 - 21:57

you follow along in the learn module, you
21:57 - 21:59

will find everything that I'm doing
21:59 - 22:02

right now in detail, with screenshots
22:02 - 22:06

of course. So I'm going to create a new
22:06 - 22:09

pipeline, and I can do so by clicking on
22:09 - 22:11

this plus button.
22:11 - 22:14

It's going to redirect me to the
22:14 - 22:17

designer authoring the pipeline, uh, where
22:17 - 22:20

I can drag and drop data and components
22:20 - 22:22

that Carlotta told you the difference
22:22 - 22:23

between.
22:23 - 22:26

And here I am going to do some changes
22:26 - 22:29

to the settings. I am going to connect
22:29 - 22:32

this with my compute cluster that I
22:32 - 22:35

created previously so I can utilize it.
22:35 - 22:38

From here I'm going to choose this
22:38 - 22:40

compute cluster demo that I have shown
22:40 - 22:43

you before in the clusters here,
22:43 - 22:46

and I am going to change the name to
22:46 - 22:48

something more meaningful. Instead of
22:48 - 22:51

Pipeline and today's date, I'm going
22:51 - 22:54

to name it Diabetes...
22:54 - 22:56

uh...
22:56 - 23:00

let's just check this training.
23:00 - 23:05

Let's say Training 0.1 or 01, okay?
23:05 - 23:09

And I am going to close this tab in
23:09 - 23:12

order to have a bigger place to work
23:12 - 23:15

inside because this is where we will
23:15 - 23:17

work, where everything will happen. So I
23:17 - 23:20

will click on close from here,
23:20 - 23:23

and I will go to the data and I will
23:23 - 23:26

create a new data set.
23:26 - 23:28

How can I create a new data set? There are
23:28 - 23:30

multiple options here you can find, from
23:30 - 23:32

local files, from data store, from web
23:32 - 23:34

files, from open data set, but I'm going
23:34 - 23:37

to choose from web files, as this is the
23:37 - 23:40

way we're going to create our data.
23:40 - 23:43

From here, the information of my data set
23:43 - 23:47

I'm going to get them from the Microsoft
23:47 - 23:51

Learn module. So if we go to the step
23:51 - 23:53

that says "Create a dataset",
23:53 - 23:55

under it, it illustrates that you can
23:55 - 23:58

access the data from inside the asset
23:58 - 24:00

library, and inside your asset library,
24:00 - 24:02

you'll find the data and find the
24:02 - 24:06

component. And I'm going to select
24:06 - 24:09

this link because this is where my data
24:09 - 24:12

is stored. If you open this link, you will
24:12 - 24:15

find this is a CSV file, I think.
24:15 - 24:17

Yeah. And you can...like, all the data are
24:17 - 24:18

here.
24:18 - 24:21

Now let's get back..
24:21 - 24:22

Um...
24:27 - 24:28

And you are going to do something
24:28 - 24:30

meaningful, but because I have already
24:30 - 24:32

created it before twice, so I'm gonna
24:32 - 24:35

add a number to the name.
24:35 - 24:38

There is the Tabular type and there is
24:38 - 24:39

the File type, but this is a table, so we're
24:39 - 24:41

going to choose the Tabular
24:41 - 24:42

data type
24:42 - 24:44

for the dataset type.
24:44 - 24:46

Now we will click on "Next". That's gonna
24:46 - 24:51

review, or display for you the content
24:51 - 24:54

of this file that you have
24:54 - 24:57

imported to this workspace.
24:57 - 25:02

And for these settings, these are
25:02 - 25:04

related to our file format.
25:04 - 25:08

So this is a delimited file, and it's not
25:08 - 25:11

plain text, it's not JSON. The delimiter
25:11 - 25:14

is comma, as we have seen that they
25:14 - 25:27

[INDISTINGUISHABLE]
25:27 - 25:29

So I'm choosing comma
25:29 - 25:33

errors because the only the first five...
25:33 - 25:35

[INDISTINGUISHABLE]
25:35 - 25:38

...for example. Okay, uh, if you have any
25:38 - 25:40

doubts, if you have any problems, please
25:40 - 25:43

don't hesitate to write to me
25:43 - 25:45

in the chat,
25:45 - 25:48

like, what is blocking you, and
25:48 - 25:51

me and Carlotta will try to help you,
25:51 - 25:53

like whenever possible.
25:53 - 25:56

And now this is the new preview for my
25:56 - 25:58

data set. I can see that I have an ID, I
25:58 - 26:00

have patient ID, I have pregnancies, I
26:00 - 26:02

have the age of the people,
26:02 - 26:06

I have the body mass, I think
26:06 - 26:08

whether they have diabetes or not, as a
26:08 - 26:11

zero and one. Zero indicates a negative,
26:11 - 26:14

the person doesn't have diabetes, and one
26:14 - 26:16

indicates a positive, that this person
26:16 - 26:18

has diabetes. Okay.
26:18 - 26:21

Now I'm going to click on "Next". Here I am
26:21 - 26:23

defining my schema. All the data types
26:23 - 26:25

inside my columns, the column names, which
26:25 - 26:29

columns to include, which to exclude. And
26:29 - 26:32

here we will include everything except
26:32 - 26:36

the Path column. And we are
26:36 - 26:38

going to review the data types of each
26:38 - 26:40

column. So let's review this first one.
26:40 - 26:43

This is numbers, numbers, numbers, then it's the
26:43 - 26:46

integer. And this is,
26:46 - 26:49

um, like decimal..
26:49 - 26:51

...dotted...
26:51 - 26:54

decimal number. So we are going to choose
26:54 - 26:55

this data type.
26:55 - 26:57

And for this one
26:57 - 27:01

it says diabetic, and it's a zero or
27:01 - 27:02

one, and we are going to make it as
27:02 - 27:04

integers.
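
For readers who want to see what these file-format and schema choices amount to, here is a rough sketch that reads comma-delimited text with a header row and converts each included column to its declared type. The column names and the two sample rows are made up for illustration, loosely based on the fields mentioned in the preview; they are not copied from the actual file.

```python
import csv
import io

# Made-up sample in the same spirit as the previewed file (header row, comma-delimited)
SAMPLE = """PatientID,Pregnancies,Age,BMI,Diabetic
1001,0,21,43.5,0
1002,8,46,21.2,1
"""

# Hypothetical schema: which columns to include and the data type of each
SCHEMA = {
    "PatientID": int,
    "Pregnancies": int,
    "Age": int,
    "BMI": float,    # decimal number
    "Diabetic": int, # 0 = negative, 1 = positive
}


def load_rows(text, schema, delimiter=","):
    """Parse delimited text and cast every included column to its schema type."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [
        {name: cast(row[name]) for name, cast in schema.items()}
        for row in reader
    ]


rows = load_rows(SAMPLE, SCHEMA)
print(rows)
```

Columns that are left out of the schema dictionary are simply not carried over, which mirrors excluding a column in the schema step.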
27:04 - 27:08

Now we are going to click on "Next" and
27:08 - 27:10

move to reviewing everything. This is
27:10 - 27:12

everything that we have defined together.
27:12 - 27:14

I will click on "Create".
27:14 - 27:15

And...
27:15 - 27:18

now the first step has ended. We have
27:18 - 27:20

gotten our data ready.
27:20 - 27:22

Now...what now? We're going to utilize the
27:22 - 27:23

designer...
27:23 - 27:27

um...power. We're going to drag and drop
27:27 - 27:30

our data set to create the pipeline.
27:30 - 27:33

So I have clicked on it and dragged it
27:33 - 27:36

to this space. It's gonna appear to you.
27:36 - 27:40

And we can inspect it by right-clicking and
27:40 - 27:42

choosing "Preview data"
27:42 - 27:46

to see what we have created together.
27:46 - 27:49

From here, you can see everything that we
27:49 - 27:51

have seen previously, but in more
27:51 - 27:53

detail. And we are just going to close
27:53 - 27:57

this. Now what? Now we are gonna do the
27:57 - 28:01

processing that Carlotta mentioned.
28:01 - 28:04

These are some instructions about the
28:04 - 28:05

data, about how you can look at them, how you
28:05 - 28:07

can open them but we are going to move
28:07 - 28:10

to the transformation or the processing.
28:10 - 28:14

So as Carlotta told you, like any data
28:14 - 28:15

for us to work on we have to do some
28:15 - 28:17

processing to it
28:17 - 28:20

to make it easier for the model to
28:20 - 28:23

be trained and easier to work with. So, uh,
28:23 - 28:26

we're gonna do the normalization. And
28:26 - 28:29

normalization meaning is, uh,
28:29 - 28:34

to scale our data, either down or up, but
28:34 - 28:35

we're going to scale them down,
28:35 - 28:39

and we are going to decrease, uh,
28:39 - 28:41

relatively decrease
28:41 - 28:45

the values, all the values, to work
28:45 - 28:48

with lower numbers. And if we are working
28:48 - 28:50

with larger numbers, it's going to take
28:50 - 28:52

more time. If we're working with smaller
28:52 - 28:55

numbers, it's going to take less time to
28:55 - 28:59

calculate them, and that's it. So
28:59 - 29:02

where can I find the normalized data? I
29:02 - 29:04

can find it inside my component.
29:04 - 29:07

So I will choose the component and
29:07 - 29:10

search for "Normalize Data".
29:10 - 29:12

I will drag and drop it as usual and I
29:12 - 29:15

will connect between these two things
29:15 - 29:18

by clicking on this spot, this, uh,
29:18 - 29:20

circle, and
29:20 - 29:23

drag and drop onto the next circle.
29:23 - 29:25

Now we are going to define our
29:25 - 29:27

normalization method.
29:27 - 29:31

So I'm going to double click on the
29:31 - 29:33

Normalize Data.
29:33 - 29:35

It's going to open the settings for the
29:35 - 29:36

normalization
29:36 - 29:39

and set the transformation method, which is
29:39 - 29:40

a mathematical way
29:40 - 29:42

that is going to scale our data
29:42 - 29:45

according to.
29:45 - 29:48

We're going to choose min-max, and for
29:48 - 29:52

this one, the "Use 0 for constant columns"
29:52 - 29:53

option, we are going to
29:53 - 29:54

choose "True",
29:54 - 29:57

and we are going to define which columns
29:57 - 29:59

to normalize. So we are not going to
29:59 - 30:01

normalize the whole data set. We are
30:01 - 30:03

going to choose a subset from the data
30:03 - 30:05

set to normalize. So we're going to
30:05 - 30:07

choose everything except for the patient
30:07 - 30:09

ID and the diabetic, because the patient
30:09 - 30:11

ID is a number, but it's categorical
30:11 - 30:14

data. It describes a patient, it's not a
30:14 - 30:17

number that I can sum. I can't say "patient
30:17 - 30:20

ID number one plus patient ID number two".
30:20 - 30:22

No, this is a patient and another
30:22 - 30:23

patient, it's not a number that I can do
30:23 - 30:26

mathematical operations on, so I'm not
30:26 - 30:28

going to choose it. So we will choose
30:28 - 30:31

everything as I said, except for the
30:31 - 30:33

diabetic and the patient ID. I will
30:33 - 30:35

click on "Save".
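
The min-max scaling chosen above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the designer's implementation: each selected column is rescaled to the range 0 to 1 with (value - min) / (max - min), a constant column is mapped to 0, and the identifier and label columns are left alone. The column names and values are hypothetical.

```python
def min_max_scale(rows, columns):
    """Rescale the named columns to [0, 1]; other columns are left untouched."""
    scaled = [dict(row) for row in rows]  # copy so the input is not modified
    for col in columns:
        values = [row[col] for row in rows]
        lo, hi = min(values), max(values)
        for row in scaled:
            # A constant column has hi == lo; map it to 0 to avoid dividing by zero.
            row[col] = 0.0 if hi == lo else (row[col] - lo) / (hi - lo)
    return scaled


# Hypothetical patient rows; the column names and values are illustrative only.
patients = [
    {"PatientID": 1001, "Age": 20, "BMI": 20.0, "Diabetic": 0},
    {"PatientID": 1002, "Age": 40, "BMI": 30.0, "Diabetic": 1},
    {"PatientID": 1003, "Age": 60, "BMI": 40.0, "Diabetic": 0},
]

# Normalize every feature except the identifier and the label.
normalized = min_max_scale(patients, columns=["Age", "BMI"])
```

The patient ID and the diabetic label are skipped because they are not quantities: one is an identifier and the other is the value to predict.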
30:35 - 30:38

And it's not showing me a warning again,
30:38 - 30:39

everything is good.
30:39 - 30:42

Now I can click on "Submit"
30:42 - 30:47

and review my normalization output.
30:47 - 30:48

Um.
30:48 - 30:52

So, if you click on "Submit" here,
30:52 - 30:55

you will choose "Create new" and
30:55 - 30:56

set the name that is mentioned here
30:56 - 31:00

inside the notebook. So it tells you
31:00 - 31:03

to create a job and name it, name
31:03 - 31:05

the experiment "MS Learn Diabetes
31:05 - 31:07

Training", because you will continue
31:07 - 31:10

working on it and building components later.
31:10 - 31:13

I have it already created, I am the, uh,
31:13 - 31:17

we can review it together. So let
31:17 - 31:20

me just open this in another tab. I think
31:20 - 31:21

I have it...
31:21 - 31:24

here.
31:26 - 31:28

Okay.
31:31 - 31:35

So, these are all the jobs that I have
31:35 - 31:37

created.
31:38 - 31:40

All the jobs there. Let's do this over.
31:40 - 31:42

These are all the jobs that I have
31:42 - 31:44

submitted previously.
31:44 - 31:46

And I think this one is the
31:46 - 31:48

normalization job, so let's see the
31:48 - 31:50

output of it.
31:50 - 31:54

As you can see, it says, uh, "Check mark", yes,
31:54 - 31:57

which means that it worked, and we can
31:57 - 31:59

preview it. How can I do that? Right click
31:59 - 32:03

on it, choose "Preview data",
32:03 - 32:07

and as you can see all the data are
32:07 - 32:08

scaled down
32:08 - 32:11

so everything is between zero
32:11 - 32:16

and, uh, one I think.
32:16 - 32:19

So everything is good for us. Now we
32:19 - 32:22

can move forward to the next step
32:22 - 32:27

which is to create the whole pipeline.
32:27 - 32:31

So, uh, Carlotta told you that
32:31 - 32:33

we're going to use a classification
32:33 - 32:37

model to create this data set, so let
32:37 - 32:41

me just drag and drop everything
32:41 - 32:43

to get runtime and we're doing
32:43 - 32:46

[INDISTINGUISHABLE]
32:46 - 32:48

about everything by
32:48 - 32:51

[INDISTINGUISHABLE]
32:51 - 32:53

So,
32:53 - 32:56

as a result, we are going to explain
32:56 - 33:00

[INDISTINGUISHABLE]
33:00 - 33:04

Yeah. So, I'm going to get the Split
33:04 - 33:06

Data component. I'm going to take the
33:06 - 33:09

transformed data to Split Data and
33:09 - 33:10

connect it like that.
33:10 - 33:12

I'm going to get the Train Model
33:12 - 33:15

component because I want to train my
33:15 - 33:17

model,
33:17 - 33:20

and I'm going to put it right here.
33:20 - 33:22

Okay.
33:22 - 33:24

Let's just move it down there. Okay.
33:24 - 33:27

And we are going to use a classification
33:27 - 33:29

model,
33:29 - 33:32

a two class
33:32 - 33:35

logistic regression model.
33:35 - 33:38

So I'm going to give this algorithm to
33:38 - 33:41

enable my model to work
33:42 - 33:46

This is the untrained model, this is...
33:46 - 33:48

here.
33:48 - 33:51

The left...
33:51 - 33:53

the left, uh, circle, I'm going to
33:53 - 33:55

connect it to the data set, and the right
33:55 - 33:57

one, we are going to connect it to
33:57 - 34:00

evaluate model.
34:00 - 34:03

Evaluate model...so let's search for
34:03 - 34:05

"Evaluate model" here.
34:05 - 34:07

So because we want to do what...we want to
34:07 - 34:11

evaluate our model and see how it has
34:11 - 34:14

been doing. Is it good, is it bad?
34:14 - 34:18

Um, sorry...
34:20 - 34:23

This is...
34:23 - 34:26

this is down there
34:26 - 34:28

after the score model.
34:28 - 34:31

So we have to get the score model first,
34:31 - 34:34

so let's get it.
34:34 - 34:36

And this will take the trained model and
34:36 - 34:37

the data set
34:37 - 34:39

to score our model and see if it's
34:39 - 34:42

performing good or bad.
34:42 - 34:44

And...
34:44 - 34:47

um...
34:47 - 34:49

after that, we have finished
34:49 - 34:52

everything. Now, we are going to do the what?
34:52 - 34:54

The presets for everything.
34:54 - 34:57

As a starter, we will be splitting our
34:57 - 34:59

data. So
34:59 - 35:01

how are we going to do this, according to
35:01 - 35:04

what? To the split rules. So I'm going to
35:04 - 35:06

double-click on it and choose "Split rules".
35:06 - 35:09

And the percentage is
35:09 - 35:12

70 percent for the [INDISTINGUISHABLE]
35:12 - 35:13

and 30 percent of the
35:13 - 35:15

data for
35:15 - 35:18

the evaluation or for the scoring, okay?
35:18 - 35:21

I'm going to make it a randomization, so
35:21 - 35:23

I'm going to split data randomly and the
35:23 - 35:26

seed is, uh,
35:26 - 35:29

132, uh 23 I think...yeah.
35:29 - 35:33

And I think that's it.
35:33 - 35:35

The stratified split is False, and that's
35:35 - 35:36

good.
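
A seeded random 70/30 split like the one configured here can be sketched as follows. This is a plain-Python illustration rather than the Split Data component itself, and the seed value in the example is arbitrary; the point is that a fixed seed makes the same split come out every time.

```python
import random


def split_rows(rows, fraction=0.7, seed=0):
    """Shuffle rows with a fixed seed, then cut them into two parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # same seed gives the same order
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


# 100 stand-in rows; 70 go to training and 30 are held back for scoring.
train_rows, test_rows = split_rows(range(100), fraction=0.7, seed=123)
```

Running the split again with the same seed reproduces the same two subsets, which keeps results comparable between jobs.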
35:36 - 35:40

Now for the next one, which is the train
35:40 - 35:42

model we are going to connect it as
35:42 - 35:44

mentioned here.
35:44 - 35:49

And we have done that and...then why
35:49 - 35:51

am I having here? Let's double click
35:51 - 35:55

on it...yeah. It has...it needs the
35:55 - 35:57

label column that I am trying to predict.
35:57 - 35:59

So from here, I'm going to choose
35:59 - 36:01

diabetic. I'm going to save.
36:01 - 36:05

I'm going to close this one.
36:06 - 36:07

So it says here,
36:07 - 36:11

the diabetic label, the model, it will
36:11 - 36:12

predict the zero and one, because this is
36:12 - 36:15

a binary classification algorithm, so
36:15 - 36:16

it's going to predict either this or
36:16 - 36:18

that.
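
To make the idea of a two-class logistic regression concrete, here is a minimal from-scratch sketch. It is not the algorithm configuration used by the designer component; the learning rate, epoch count, and toy data are assumptions for illustration. The model learns weights so that a sigmoid turns a weighted sum of the features into a probability for class 1.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, bias, x):
    """Probability that the row x belongs to class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(features, labels, learning_rate=0.5, epochs=2000):
    """Fit weights and bias with simple stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            error = predict_probability(weights, bias, x) - y
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Toy data: one already-normalized feature and a 0/1 label (illustrative only).
X = [[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]]
y = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic(X, y)
```

After training, low feature values get a probability below 0.5 and high values a probability above 0.5, which is what lets the model pick one of the two classes.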
36:18 - 36:18

And...
36:18 - 36:20

um...
36:20 - 36:24

I think that's everything to run the
36:24 - 36:26

pipeline.
36:26 - 36:29

So everything is done, everything is good
36:29 - 36:31

for this one. We're just gonna leave it
36:31 - 36:34

for now, because this is the next
36:34 - 36:36

step.
36:36 - 36:40

Um, this will be put instead of the
36:40 - 36:44

score model, but let's...
36:44 - 36:47

let's delete it for now.
36:47 - 36:50

Okay.
36:50 - 36:53

Now we have to submit the job in order
36:53 - 36:56

to see the output of it. So I can click
36:56 - 36:59

on "Submit" and choose the previous job
36:59 - 37:01

which is the one that I have shown you
37:01 - 37:02

before.
37:02 - 37:05

And then let's review its output
37:05 - 37:07

together here.
37:07 - 37:10

So if I go to the jobs,
37:10 - 37:15

if I go to MS Learn Diabetes Training,
37:15 - 37:18

I think it's the one that lasted the
37:18 - 37:21

longest, this one here.
37:21 - 37:24

So here I can see
37:24 - 37:27

the job output, what happened inside
37:27 - 37:30

the model, as you can see.
37:30 - 37:34

So the normalization we have seen
37:34 - 37:37

before, the split data, I can preview it.
37:37 - 37:39

The result one or the result two as it
37:39 - 37:42

splits the data to 70 here and
37:42 - 37:44

thirty percent here.
37:44 - 37:47

Um, I can see the score model, which is
37:47 - 37:49

something that we need
37:49 - 37:52

to review.
37:52 - 37:57

Inside the score model, uh, from
37:57 - 37:58

here,
37:58 - 38:01

we can see that...
38:01 - 38:04

let's get back here.
38:06 - 38:08

This is the data that the model has
38:08 - 38:12

been scored on, and this is the scoring output.
38:12 - 38:15

So it says "Scored Labels: true", and he is
38:15 - 38:17

not diabetic, so this is,
38:17 - 38:19

um,
38:19 - 38:22

a wrong prediction, let's say.
38:22 - 38:24

For this one it's true and true, and this
38:24 - 38:27

is a good, like, what do you say,
38:27 - 38:29

prediction, and the probabilities of this
38:29 - 38:30

score,
38:30 - 38:33

which means the certainty of our model
38:33 - 38:37

that this is really true. It's 80 percent.
38:37 - 38:39

For this one it's 75 percent.
38:39 - 38:43

So these are some cool metrics that we
38:43 - 38:45

can review to understand how our model
38:45 - 38:48

is performing. It's performing good for
38:48 - 38:49

now.
38:49 - 38:53

Let's check our evaluation model.
38:53 - 38:57

So this is the extra one that I told you
38:57 - 39:00

about. Instead of the
39:00 - 39:02

score model only, we are going to add
39:02 - 39:04

what? The Evaluate Model
39:04 - 39:07

after it. So here
39:07 - 39:09

we're going to go to our Asset Library
39:09 - 39:12

and we are going to choose the evaluate
39:12 - 39:15

model,
39:15 - 39:18

and we are going to put it here, and we
39:18 - 39:20

are going to connect it, and we are going
39:20 - 39:23

to submit the job using the same name of
39:23 - 39:25

the job that we used previously.
39:25 - 39:30

Let's review it. Also, so, after it
39:30 - 39:33

finishes, you will find it here. So I have
39:33 - 39:35

already done it before, this is how I'm
39:35 - 39:37

able to see the output.
39:37 - 39:40

So let's see
39:40 - 39:43

what is the output of this
39:43 - 39:46

evaluation process.
39:46 - 39:50

Here it mentioned to you that there are
39:50 - 39:51

some metrics,
39:51 - 39:55

like the confusion matrix, which Carlotta
39:55 - 39:57

told you about, there is the accuracy, the
39:57 - 40:00

precision, the recall, and F1 Score.
40:00 - 40:02

Every metric gives us some insight about
40:02 - 40:05

our model. It helps us to understand it
40:05 - 40:09

more, and, um,
40:09 - 40:11

understand if it's overfitting, if
40:11 - 40:12

it's good, if it's bad, and really really,
40:12 - 40:16

like, understand how it's working.
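
The metrics named here all come from the four cells of the confusion matrix. The sketch below computes them in plain Python from a list of true labels and a list of predicted labels; the ten example labels are made up for illustration and are not results from the session's model.

```python
def confusion_counts(actual, predicted):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(actual, predicted):
    """Accuracy, precision, recall and F1 score for binary labels."""
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 1 means diabetic, 0 means not diabetic.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(actual, predicted)
```

Accuracy counts every correct prediction, precision asks how many predicted positives were right, recall asks how many real positives were found, and F1 balances the last two.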
40:17 - 40:20

Now I'm just waiting for the job to load.
40:20 - 40:23

Until it loads,
40:23 - 40:24

um,
40:24 - 40:26

we can continue
40:26 - 40:29

to work on our
40:29 - 40:32

model. So I will go to my designer. I'm
40:32 - 40:35

just going to confirm this.
40:35 - 40:38

And I'm going to continue working on it
40:38 - 40:40

from
40:40 - 40:42

where we have stopped. Where have we
40:42 - 40:44

stopped?
40:44 - 40:46

We have stopped on the evaluate model. So
40:46 - 40:49

I'm going to choose this one.
40:49 - 40:53

And it says here
40:54 - 40:57

"select experiment", "create inference
40:57 - 40:58

pipeline", so
40:58 - 41:01

I am going to go to the jobs,
41:01 - 41:05

I'm going to select my experiment.
41:05 - 41:07

I hope this works.
41:07 - 41:10

Okay. Finally, now we have our
41:10 - 41:12

evaluate model output.
41:12 - 41:15

Let's preview evaluation results
41:15 - 41:19

and, uh...
41:19 - 41:22

come on.
41:26 - 41:28

Finally. Now we can create our inference
41:28 - 41:31

pipeline. So,
41:31 - 41:34

I think it says that...
41:34 - 41:35

um...
41:35 - 41:38

select the experiment, then select MS
41:38 - 41:39

Learn. So,
41:39 - 41:43

I am just going to select it,
41:43 - 41:48

and finally. Now we can, the ROC curve, we
41:48 - 41:51

can see it, that the true positive rate
41:51 - 41:54

and the false positive rate. The false
41:54 - 41:57

positive rate is increasing with time,
41:57 - 42:01

and also the true positive rate. True
42:01 - 42:04

positive is something that it predicted,
42:04 - 42:07

that it is, uh, positive it has diabetes,
42:07 - 42:09

and it's really...it's really true.
42:09 - 42:13

The person really has diabetes. Okay. And
42:13 - 42:15

for the false positive, it predicted that
42:15 - 42:18

someone has diabetes, but that someone doesn't
42:18 - 42:21

have it. This is what true positive and
42:21 - 42:25

false positive mean. This is the recall
42:25 - 42:28

curve, so we can review the metrics
42:28 - 42:32

of our model. This is the lift curve. I
42:32 - 42:36

can change the threshold of my confusion
42:36 - 42:38

matrix here
42:38 - 42:39

and if Carlotta wants to add
42:39 - 42:44

anything about the...the graphs,
42:44 - 42:47

you can do so.
42:50 - 42:53

[CARLOTTA]: Um, yeah, so I just
42:53 - 42:55

wanted to...if you go...yeah.
42:55 - 42:57

I just wanted to comment for the
42:57 - 43:00

ROC curve, that actually from this
43:00 - 43:04

graph, the metric which usually we're
43:04 - 43:07

going to compute is the area under
43:07 - 43:10

the curve. And this coefficient or
43:10 - 43:12

metric,
43:12 - 43:15

it's a coefficient—
43:15 - 43:18

it's a value that could span from
43:18 - 43:23

zero to one, and the higher it is...
43:23 - 43:26

...the higher it is, the better the score.
43:26 - 43:29

So the closer it is to one,
43:29 - 43:33

that is, the larger the amount of
43:33 - 43:35

area under this curve,
43:35 - 43:40

the higher the performance
43:40 - 43:43

we get from our model.
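
One way to see why the area under the ROC curve runs from zero to one is the pairwise view sketched below: the area equals the share of (positive, negative) pairs in which the positive case receives the higher probability, with ties counting as half. This is a small plain-Python illustration with made-up scores, not how the designer computes the value.

```python
def roc_auc(actual, probabilities):
    """Area under the ROC curve, computed by comparing every
    positive example's score with every negative example's score."""
    positives = [p for a, p in zip(actual, probabilities) if a == 1]
    negatives = [p for a, p in zip(actual, probabilities) if a == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0   # positive ranked above negative
            elif pos == neg:
                wins += 0.5   # a tie counts as half
    return wins / (len(positives) * len(negatives))


# Made-up true labels and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

A model that ranks every positive above every negative gets 1.0, while a model that ranks at random lands around 0.5.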
43:43 - 43:46

And another thing is what John is
43:46 - 43:50

playing with. So this threshold for
43:50 - 43:51

the logistic
43:51 - 43:56

regression is the threshold used by the
43:56 - 44:00

model to, um,
44:00 - 44:03

to predict if the category is zero or
44:03 - 44:05

one. So if the probability—the
44:05 - 44:09

probability score is above the threshold,
44:09 - 44:12

then the category will be predicted as
44:12 - 44:15

one, while if the probability is
44:15 - 44:17

below the threshold, in this case, for
44:17 - 44:21

example, 0.5, the category is predicted
44:21 - 44:24

as zero. So that's why it's very
44:24 - 44:26

important to choose the threshold,
44:26 - 44:29

because the performance really can vary,
44:29 - 44:31

um,
44:31 - 44:34

with this threshold value.
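
The effect of the threshold can be sketched with a few made-up probabilities. The code below turns probabilities into 0/1 predictions and then reports the true positive rate and false positive rate, so the change between two thresholds is visible. Treating a probability exactly equal to the threshold as class 1 is an assumption of this sketch.

```python
def apply_threshold(probabilities, threshold=0.5):
    """Predict 1 when the probability reaches the threshold, else 0."""
    return [1 if p >= threshold else 0 for p in probabilities]


def positive_rates(actual, predicted):
    """Return (true positive rate, false positive rate)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


# Made-up true labels and predicted probabilities.
true_labels = [1, 1, 0, 1, 0, 0]
probabilities = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

at_half = apply_threshold(probabilities, threshold=0.5)
at_lower = apply_threshold(probabilities, threshold=0.35)
rates_half = positive_rates(true_labels, at_half)
rates_lower = positive_rates(true_labels, at_lower)
```

Lowering the threshold in this example catches the one positive case that was missed at 0.5, so the true positive rate rises; on other data it could also raise the false positive rate.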
44:34 - 44:41

[JOHN]: Thank you so much, Carlotta, and
44:41 - 44:44

as I mentioned now, we are going to
44:44 - 44:47

create our inference pipeline. So we are
44:47 - 44:49

going to select the latest one, which I
44:49 - 44:51

already have it opened here. This is the
44:51 - 44:53

one that we were reviewing together. This
44:53 - 44:56

is where we have stopped, and we're going
44:56 - 44:58

to create an inference pipeline. We are
44:58 - 45:00

going to choose a real-time inference
45:00 - 45:03

pipeline, okay?
45:03 - 45:05

Where can I find this? Here, as it
45:05 - 45:08

says, "Real-time inference pipeline".
45:08 - 45:11

So it's gonna add some things to my
45:11 - 45:12

workspace. It's going to add the
45:12 - 45:14

web service input, it's gonna
45:14 - 45:15

have the web service output,
45:15 - 45:16

because we will be creating
45:16 - 45:18

it as a web service to access
45:18 - 45:20

it from the internet.
45:20 - 45:22

What are we going to do? We're going
45:22 - 45:25

to remove this diabetes data, okay?
45:25 - 45:28

And we are going to get a component
45:28 - 45:29

called "Web
45:29 - 45:33

input" and...let me check
45:33 - 45:36

it's "Enter Data Manually".
45:36 - 45:38

We have...we already have the web input
45:38 - 45:40

present.
45:40 - 45:42

So we are going to get the Enter Data
45:42 - 45:43

Manually component,
45:43 - 45:45

and we're going to collect it—to connect
45:45 - 45:50

it as it was connected before, like that.
45:50 - 45:53

And also, I am not going to directly take
45:53 - 45:55

the web service... sorry, the Score Model to
45:55 - 45:58

the web service output like that.
45:58 - 46:00

I'm going to delete this
46:00 - 46:04

and I'm going to execute a Python script
46:04 - 46:06

before
46:06 - 46:10

I display my result.
46:11 - 46:12

So,
46:12 - 46:17

this will be connected like...
46:19 - 46:20

So...
46:20 - 46:24

the other way around.
46:24 - 46:28

And from here, I am going to connect this
46:28 - 46:31

with that and there is some data that
46:31 - 46:33

we will be getting from the node, or from
46:33 - 46:38

the explanation here, and this is the
46:38 - 46:41

data that will be entered to our
46:41 - 46:44

web service manually. Okay? This is instead of
46:44 - 46:47

the data that we have been getting from
46:47 - 46:50

our data set that we created. So I'm just
46:50 - 46:52

going to double click on it and choose
46:52 - 46:56

CSV, and I will choose "it has headers",
46:56 - 47:01

and I will take or copy this content and
47:01 - 47:03

put it there, okay?
47:03 - 47:06

So let's do it.
47:06 - 47:08

I think I have to click on edit code, now
47:08 - 47:11

I can click on "Save", and I can close it.
47:11 - 47:13

Another thing is the Python script
47:13 - 47:17

that we will be executing.
47:17 - 47:18

Um, yeah. We
47:18 - 47:19

are going to remove this, also.
47:19 - 47:21

We don't need the evaluate model
47:21 - 47:24

anymore, so we are going to remove it.
47:24 - 47:26

The Python script
47:26 - 47:29

that I will be executing,
47:29 - 47:33

I can find it here.
47:33 - 47:36

Um, yeah.
47:36 - 47:39

This is the Python script that we will
47:39 - 47:42

execute. And it says to you that this
47:42 - 47:44

code selects only the patient's ID,
47:44 - 47:45

the score label, the score
47:45 - 47:48

probability and return—returns them to
47:48 - 47:50

the web service output. So we don't want
47:50 - 47:52

to return all the columns, as we have
47:52 - 47:53

seen previously,
47:53 - 47:56

that determines everything,
47:56 - 47:57

so
47:57 - 47:59

we want to return certain stuff, the
47:59 - 48:03

stuff that we will use inside our
48:03 - 48:06

endpoint. So I'm just going to select
48:06 - 48:08

everything and delete it, and
48:08 - 48:11

paste the code that I have gotten from
48:11 - 48:14

the, uh,
48:14 - 48:16

the Microsoft Learn docs.
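
The idea behind that script, keeping only three columns of the scored output, can be sketched as below. This is not the script from the Learn module: in the designer the component works on a table of scored rows, while this dependency-free sketch uses plain dictionaries. The input column names "Scored Labels" and "Scored Probabilities" and the output names are assumptions made for the illustration.

```python
def select_output_columns(scored_rows):
    """Keep only the ID, the predicted label and its probability."""
    return [
        {
            "PatientID": row["PatientID"],
            # ASSUMPTION: these input and output column names are illustrative.
            "DiabetesPrediction": row["Scored Labels"],
            "Probability": row["Scored Probabilities"],
        }
        for row in scored_rows
    ]


# Hypothetical scored row with extra feature columns that should be dropped.
scored = [
    {"PatientID": 1001, "Pregnancies": 2, "Age": 43,
     "Scored Labels": 1, "Scored Probabilities": 0.8},
]
output = select_output_columns(scored)
```

Returning only these fields keeps the web service response small and hides the feature columns the caller already sent.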
48:16 - 48:19

Now I can click on "Save", and I can close
48:19 - 48:20

this.
48:20 - 48:21

Let me check something,
48:21 - 48:23

I don't think it saved.
48:23 - 48:25

It's saved, but the display is
48:25 - 48:26

wrong, okay.
48:26 - 48:30

And now I think everything is good to go.
48:30 - 48:33

I'm just gonna double-check everything.
48:33 - 48:36

So, uh, yeah. We are gonna change the name
48:36 - 48:39

of this
48:39 - 48:41

pipeline, and we are gonna call it
48:41 - 48:43

"Predict
48:43 - 48:46

diabetes", okay?
48:46 - 48:50

Now let's close it, and
48:50 - 48:56

I think that we are good to go. So,
48:56 - 48:59

um,
49:00 - 49:04

Okay, I think everything is good for us.
49:06 - 49:08

I just want to make sure of something.
49:08 - 49:09

Is the data...
49:09 - 49:12

it's correct, the data is...yeah,
49:12 - 49:14

it's correct.
49:14 - 49:16

Okay, now I can run the pipeline. Let's
49:16 - 49:18

submit.
49:18 - 49:21

Select an "existing" pipeline, and we're
49:21 - 49:22

going to choose
49:22 - 49:24

the "ms-learn-diabetes-training",
49:24 - 49:25

which is the pipeline
49:25 - 49:27

that we have been working on
49:27 - 49:32

from the beginning of this module.
49:32 - 49:34

I don't think that this is going to take
49:34 - 49:36

much time. So we have submitted the job
49:36 - 49:37

and it's running.
49:37 - 49:40

Until the job ends, we are going to set
49:40 - 49:42

everything
49:42 - 49:46

for deploying a service.
49:46 - 49:49

In order to deploy a service,
49:49 - 49:51

um,
49:51 - 49:54

I have to have the job ready, so
49:54 - 49:56

until it's ready, you can't deploy it. So
49:56 - 49:58

let's go to the job—the job details from
49:58 - 50:01

here, okay?
50:01 - 50:05

And until it finishes,
50:05 - 50:07

Carlotta, do you think that we can have
50:07 - 50:09

the questions, and then we can get back
50:09 - 50:13

to the job and deploying it?
50:14 - 50:15

[CARLOTTA]: Yeah, yeah, yeah.
50:15 - 50:17

So yeah, guys, if you
50:17 - 50:19

have any questions
50:19 - 50:24

on what you just saw here
50:24 - 50:27

or in the introduction, feel free. This is
50:27 - 50:30

a good moment, we can...we can discuss
50:30 - 50:34

now, while we wait for this job to
50:34 - 50:36

finish.
50:36 - 50:39

[JOHN]: Uh, and....
50:39 - 50:40

can...
50:40 - 50:45

we have the knowledge check one? Or, like,
50:45 - 50:46

what do you think?
50:46 - 50:48

[CARLOTTA]: Yeah, we can also go
50:48 - 50:50

to the knowledge check.
50:50 - 50:51

Um...
50:51 - 50:56

Yeah, okay. So let me share my screen.
50:56 - 50:59

Yeah, so if you don't have any questions
50:59 - 51:02

for us, we can maybe propose some
51:02 - 51:05

questions to you that you can,
51:05 - 51:06

um,
51:06 - 51:09

check your knowledge so far, and you
51:09 - 51:13

can maybe answer these questions
51:13 - 51:15

via chat.
51:15 - 51:18

So we have...do you see my screen, can
51:18 - 51:20

you see my screen?
51:20 - 51:22

[JOHN]: Yes.
51:22 - 51:24

[CARLOTTA]: So, John, I think I will
51:24 - 51:25

read this
51:25 - 51:29

question aloud and ask it to you, okay? So
51:29 - 51:32

are you ready to answer?
51:32 - 51:34

[JOHN]: Yes, I am.
51:34 - 51:35

[CARLOTTA]: So...
51:35 - 51:37

you're using Azure Machine Learning
51:37 - 51:40

designer to create a training pipeline
51:40 - 51:43

for a binary classification model, so
51:43 - 51:45

what we were doing in our demo,
51:45 - 51:48

right? And you have added a data set
51:48 - 51:52

containing features and labels, a Two-
51:52 - 51:54

Class Decision Forest module. So we used
51:54 - 51:57

a logistic regression model our...
51:57 - 51:58

um, in our example.
51:58 - 51:59

Here, we're using a Two-
51:59 - 52:01

Class Decision Forest model.
52:01 - 52:04

And, of course, a Train Model module. You
52:04 - 52:07

plan now to use score model and evaluate
52:07 - 52:09

model modules to test the trained model
52:09 - 52:12

with the subset of the data set that
52:12 - 52:14

wasn't used for training.
52:14 - 52:16

But what are we missing? So what's
52:16 - 52:19

another module you should add? We have
52:19 - 52:22

three options: we have Join Data, we have
52:22 - 52:25

Split Data, or we have Select Columns
52:25 - 52:27

in Dataset.
52:27 - 52:28

So
52:28 - 52:32

while John thinks about the answer,
52:32 - 52:34

go ahead and,
52:34 - 52:35

um,
52:35 - 52:38

answer yourself. So give us your
52:38 - 52:40

guess.
52:40 - 52:42

Put it in the chat, or just come off mute
52:42 - 52:45

and answer.
52:47 - 52:48

"A", "B".
52:48 - 52:50

[JOHN]: Yeah, what do you
52:50 - 52:51

think is the correct
52:51 - 52:54

answer for this one? I need something to
52:54 - 52:57

uh...I have to score my model, and I
52:57 - 53:00

have to evaluate it, so I need
53:00 - 53:03

something to enable me to do these two
53:03 - 53:05

things.
53:07 - 53:08

[CARLOTTA]: I think it's something
53:08 - 53:11

you showed us in your pipeline,
53:11 - 53:13

right John?
53:13 - 53:17

[JOHN]: Of course I did.
53:23 - 53:25

[CARLOTTA]: Uh, we have no guesses
53:25 - 53:28

in the chat?
53:28 - 53:30

[JOHN]: Can someone...
53:30 - 53:32

Someone want to guess?
53:32 - 53:36

[CARLOTTA]: We have a "B".
53:36 - 53:39

[JOHN]: Uh, maybe.
53:39 - 53:43

So, in order to do this,
53:43 - 53:46

I mentioned the
53:46 - 53:49

the module that is going to help me
53:49 - 53:53

to divide my data into two things:
53:53 - 53:54

70 percent for the
53:54 - 53:56

the training and 30 percent for the
53:56 - 53:59

evaluation. So what did I use? I used
53:59 - 54:02

split data, because this is what is going
54:02 - 54:05

to split my data randomly into training
54:05 - 54:08

data and validation data. So the correct
54:08 - 54:12

answer is "B", and good job. Thank you
54:12 - 54:14

for participating.
54:14 - 54:17

Next question, please.
54:17 - 54:19

[CARLOTTA]: Yes, "B" is the correct
54:19 - 54:23

answer, so thanks, John,
54:23 - 54:26

for explaining to us the correct
54:26 - 54:27

one.
54:27 - 54:30

And we want to go with question two?
54:30 - 54:33

[JOHN]: Yeah, so,
I'm going to ask you now,
54:33 - 54:36

Carlotta. You use Azure Machine Learning
54:36 - 54:38

designer to create a training pipeline
54:38 - 54:40

for your classification model.
54:40 - 54:44

What must you do before you deploy this
54:44 - 54:46

model as a service?
You have to do
54:46 - 54:47

something before
54:47 - 54:47

you deploy it.
54:47 - 54:50

What do you think is the correct answer?
54:50 - 54:53

Is it "A", "B", or "C"?
54:53 - 54:55

Share your thoughts with—
54:55 - 54:57

with us in the chat and
54:57 - 55:00

I'm also going to give you some
55:00 - 55:03

minutes to think of it before I
55:03 - 55:06

tell you about it.
55:06 - 55:08

[CARLOTTA]: Yeah so let me go
55:08 - 55:09

through the possible
55:09 - 55:12

answers, right? So we have A: "Create an
55:12 - 55:15

inference pipeline from the training
55:15 - 55:16

pipeline";
55:16 - 55:19

B: we have "Add an Evaluate Model
55:19 - 55:22

module to the training pipeline"; and then
55:22 - 55:25

C, we have "Clone the training
55:25 - 55:28

pipeline with a different name".
55:30 - 55:32

So what do you think is the correct
55:32 - 55:34

answer? "A", "B", or "C"?
55:34 - 55:37

Also this time, I think it's something
55:37 - 55:39

we mentioned both in the decks and in
55:39 - 55:42

the demo right?
55:43 - 55:45

[JOHN]: Yes it is,
55:45 - 55:47

it's something that I have done
55:47 - 55:50

like two, like five minutes ago.
55:52 - 55:57

It's real-time, real-time.
55:57 - 55:59

[CARLOTTA]: Um,
55:59 - 56:02

yeah, so, think about...you need to deploy
56:02 - 56:05

the model as a service. So if I'm
56:05 - 56:08

going to deploy the model,
56:08 - 56:10

I cannot evaluate the model
56:10 - 56:13

after deploying it, right, because I
56:13 - 56:15

cannot go into production if I'm not
56:15 - 56:18

sure, I'm not satisfied with my model, and
56:18 - 56:20

I'm not sure that my model is performing
56:20 - 56:20

well.
56:20 - 56:23

So that's why I would go with,
56:23 - 56:24

um,
56:24 - 56:30

I would...exclude "B" from my
56:30 - 56:32

answer.
56:32 - 56:33

While
56:33 - 56:37

thinking about "C", uh, I don't see you—I
56:37 - 56:39

didn't see you, John, cloning the
56:39 - 56:41

training Pipeline with a different name,
56:41 - 56:45

so I don't think this is the
56:45 - 56:47

right answer.
56:47 - 56:50

While I've seen you creating an
56:50 - 56:53

inference pipeline from the
56:53 - 56:55

training pipeline, and you just converted
56:55 - 56:59

it using a one-click button, right?
56:59 - 57:01

[JOHN]: Yeah, that's correct.
57:01 - 57:04

So this is the right answer.
57:04 - 57:07

Good job. So I created an inference
57:07 - 57:11

real-time pipeline, and it is done.
57:11 - 57:13

It finished—it finished, the job is
57:13 - 57:18

finished. So we can now deploy.
57:18 - 57:19

And...
57:19 - 57:22

Yeah [LAUGHS].
57:22 - 57:25

Exactly, like, on time.
57:25 - 57:28

Like, it finished two seconds...
57:28 - 57:31

three, four seconds ago [LAUGHS].
57:31 - 57:33

So, uh,
57:33 - 57:36

until, um...
57:36 - 57:40

This is my job review, so
57:40 - 57:43

this is the job details that I
57:43 - 57:46

have already submitted, it's just opening,
57:46 - 57:47

and once it opens...
57:47 - 57:50

um...
57:50 - 57:53

I don't know why it's so heavy
57:53 - 57:57

today, it's not like that usually.
57:58 - 58:00

[CARLOTTA]: Yeah, it's probably because
58:00 - 58:01

you are also
58:01 - 58:06

showing your screen on Teams,
58:06 - 58:08

so that's the bandwidth of your
58:08 - 58:09

connection.
58:09 - 58:11

[JOHN]: Let me do something here
58:11 - 58:14

because...yeah finally.
58:14 - 58:16

I can switch to my mobile internet if it
58:16 - 58:19

did it again. So I will click on "Deploy",
58:19 - 58:21

it's that simple. I'll just click on
58:21 - 58:23

"Deploy" and...
58:23 - 58:26

I am going to deploy a new real-time
58:26 - 58:28

endpoint.
58:28 - 58:30

So what am I going to name it?
58:30 - 58:32

Description and the compute type.
58:32 - 58:33

Everything is already mentioned
58:33 - 58:34

for me here,
58:34 - 58:36

so I'm just gonna copy and paste it,
58:36 - 58:39

because we...we are running
58:39 - 58:41

out of time.
58:41 - 58:44

So it's an Azure Container Instance,
58:44 - 58:46

not Azure Kubernetes Service,
58:46 - 58:49

which is a containerization service also.
58:49 - 58:51

Both are for containerization, but this
58:51 - 58:54

gives you something, and this gives you
something else.
58:54 - 58:55

For the advanced options,
58:55 - 58:57

it doesn't say for us to do anything, so
58:57 - 59:00

we are just gonna click on "Deploy",
59:00 - 59:05

and now we can test our endpoint from
59:05 - 59:08

the endpoints that we can find here, so
59:08 - 59:11

it's in progress. If I go here
59:11 - 59:14

under the assets, I can find something
59:14 - 59:17

called "Endpoints", and I can find the
59:17 - 59:19

real-time ones and the batch endpoints.
59:19 - 59:22

And we have created a real-time endpoint,
59:22 - 59:25

so we are going to find it under this
59:25 - 59:30

title. So if I click on it, I should
59:30 - 59:33

be able to test it once it's ready.
59:33 - 59:37

It's still loading, but this is the
59:37 - 59:41

input, and this is the output that we
59:41 - 59:45

will get back, so if I click on "Test"...
59:45 - 59:47

and from here,
59:47 - 59:50

I will input some data to the
59:50 - 59:51

endpoint,
59:51 - 59:55

which are: the patient information; the
59:55 - 59:57

columns that we have already seen in our
59:57 - 60:00

data set; the patient ID; the pregnancies.
60:00 - 60:04

And of course, of course I'm not gonna
60:04 - 60:06

enter the label that I'm trying to
60:06 - 60:08

predict, so I'm not going to give it whether
60:08 - 60:10

the patient is diabetic or not. This
60:10 - 60:13

endpoint is to tell me this.
60:13 - 60:15

The endpoint, or the URL,
60:15 - 60:16

is going to give me
60:16 - 60:18

back this information, whether someone
60:18 - 60:23

has diabetes, or he doesn't. So if I input
60:23 - 60:25

this data, I'm just going to copy it,
60:25 - 60:28

and go to my endpoint, and click on
60:28 - 60:30

"Test", I'm gonna get the result back,
60:30 - 60:32

which are the three columns that we have
60:32 - 60:36

defined inside our Python script: the
60:36 - 60:38

patient ID, the diabetic prediction, and
60:38 - 60:41

the probability—the certainty of whether
60:41 - 60:46

someone is diabetic or not based on the...
60:46 - 60:49

uh...based on the prediction.
60:49 - 60:51

So that's it.
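
The test described above can also be sketched in code. This is a minimal Python sketch, not the module's official client. The URL, the key, the `WebServiceInput0` and `WebServiceOutput0` wrapper names, the `DiabetesPrediction` and `Probability` output names, and every input column other than the patient ID and pregnancies are assumptions for illustration, and the patient values are made up. The real values would come from the endpoint's own page in the studio.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with the URL and key of your own endpoint.
ENDPOINT_URL = "http://example.invalid/score"
API_KEY = "<your-endpoint-key>"


def build_request(url, key, patient):
    """Wrap one patient record in the (assumed) input envelope and build a POST request."""
    payload = {
        "Inputs": {"WebServiceInput0": [patient]},  # assumed input name
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + key,
        },
    )


def parse_prediction(body):
    """Return (patient ID, prediction, probability) from the (assumed) output envelope."""
    row = json.loads(body)["Results"]["WebServiceOutput0"][0]  # assumed output name
    return row["PatientID"], row["DiabetesPrediction"], row["Probability"]


# Made-up record with assumed column names. The label (diabetic or not) is left out
# on purpose, and a real request needs every feature column of the training data set.
patient = {"PatientID": 1000001, "Pregnancies": 2, "PlasmaGlucose": 110, "BMI": 28.5, "Age": 35}

request = build_request(ENDPOINT_URL, API_KEY, patient)

# Made-up response in the assumed shape, used here so the sketch runs offline.
sample_response = (
    '{"Results": {"WebServiceOutput0": '
    '[{"PatientID": 1000001, "DiabetesPrediction": 1, "Probability": 0.8}]}}'
)
print(parse_prediction(sample_response))

# To call a deployed endpoint (needs network access and a real URL and key):
# with urllib.request.urlopen(request) as response:
#     print(parse_prediction(response.read()))
```

The request is built separately from the call, so the payload can be inspected before anything is sent to the endpoint.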
60:51 - 60:54

And, uh, I think that this is a really
60:54 - 60:57

simple step to do, you can do it on your
60:57 - 60:58

own, you can test it.
60:58 - 61:01

And I think that I have finished, so
61:01 - 61:03

thank you.
61:03 - 61:04

[CARLOTTA]: Uh, yes,
61:04 - 61:06

we are running out of time.
61:06 - 61:10

I just wanted to thank you, John, for
61:10 - 61:12

this demo, for going through all these
61:12 - 61:13

steps to
61:13 - 61:17

um, create and train a classification model,
61:17 - 61:20

and also deploy it as a predictive
61:20 - 61:23

service. And I encourage you all to go
61:23 - 61:25

back to the learn module
61:25 - 61:28

and, um, go deeper into all these topics
61:28 - 61:32

at your own pace, and also maybe
61:32 - 61:35

uh, do this demo on your own, on your
61:35 - 61:37

subscription, on your Azure for Students
61:37 - 61:39

subscription. Um...
61:39 - 61:43

And I would also like to remind you that
61:43 - 61:46

this is part of a series of study
61:46 - 61:50

sessions of Cloud Skills Challenge study
61:50 - 61:51

sessions,
61:51 - 61:54

so you will have more in the...
61:54 - 61:58

in the following days, and this is for
61:58 - 62:00

you to prepare, let's say, to help you
62:00 - 62:05

in taking the Cloud Skills Challenge,
62:05 - 62:07

which collects
62:07 - 62:11

very interesting learn modules that you
62:11 - 62:15

can use to skill up on various topics,
62:15 - 62:18

and some of them are focused on AI and
62:18 - 62:21

ML. So if you are interested in these
62:21 - 62:23

topics, you can select these learn
62:23 - 62:25

modules.
62:25 - 62:28

So let me also copy
62:28 - 62:30

the link, the short link to the
62:30 - 62:32

challenge in the chat. Remember that
62:32 - 62:35

you have time until the 13th of
62:35 - 62:38

September to take the challenge. And also
62:38 - 62:40

remember that in October, on the 7th of
62:40 - 62:43

October, you have the—you can join the
62:43 - 62:47

student—the Student Developer Summit,
62:47 - 62:50

which is, uh, which will be a virtual or
62:50 - 62:53

in...for some cases, a hybrid
62:53 - 62:56

event, so stay tuned, because you will
62:56 - 62:59

have some surprises in the following
62:59 - 63:01

days. And if you want to learn more about
63:01 - 63:03

this event you can check the Microsoft
63:03 - 63:08

Imagine Cup Twitter page and stay tuned.
63:08 - 63:11

So thank you everyone for joining
63:11 - 63:13

this session today, and thank you very
63:13 - 63:16

much, John, for co-hosting this
63:16 - 63:20

session with me. It was a pleasure.
63:21 - 63:23

[JOHN]: Thank you so much,
63:23 - 63:24

Carlotta, for having me
63:24 - 63:26

with you today, and thank you for
63:26 - 63:28

giving me this opportunity to
63:28 - 63:30

be with you here.
63:30 - 63:32

[CARLOTTA]: Great, thank you.
63:32 - 63:33

[JOHN]: Yeah, I hope that we
63:33 - 63:35

work again in the future.
63:35 - 63:38

[CARLOTTA]: Sure, I hope so as well.
63:38 - 63:41

Um, so, thank you everyone.
63:41 - 63:44

And have a nice rest of your day.
63:44 - 63:46

Bye-bye. Speak to you soon.
63:46 - 63:49

[JOHN]: Bye.

Title:: Create a Classification Model with Azure Machine Learning Designer [English]
Video Language:: English
Duration:: 01:03:50

	OEVIDEOS edited English subtitles for Create a Classification Model with Azure Machine Learning Designer [English]