[CARLOTTA]: Great, so I think we can start
since the meeting is recorded, so if
anyone, uh, jumps in later, they
can watch the recording.
So, hi everyone and welcome to this
Cloud Skills Challenge study session
around the "Create classification models
with Azure Machine Learning designer" module.
So today I'm thrilled to be here with
John. Uh, John, do you mind
briefly introducing yourself?
[JOHN]: Uh, thank you Carlotta.
Hello everyone.
Welcome to our workshop today. I hope
that you are all excited for it. I am
John Aziz, a Gold Microsoft Learn Student
Ambassador, and I will be here with, uh,
Carlotta to do the practical part
about this module of the Cloud Skills
Challenge. Thank you for having me.
[CARLOTTA]: Perfect, thanks John.
So for those who
don't know me, I'm Carlotta Castelluccio,
based in Italy and focused on AI and
machine learning technologies and their
use in education.
Um, so,
this Cloud Skills Challenge study
session is based on a dedicated Learn
module. I sent you
the link to this module in the chat
so that you can follow along with the
module if you want, or just have a look at
the module later at your own pace.
Um...
So, before starting I would also like to
remind you of the code of
conduct and guidelines of our Student
Ambassadors community. So please, during this
meeting, be respectful and inclusive, and
be friendly, open, and welcoming, and
respectful of each other's
differences.
If you want to learn more about the code
of conduct, you can use this link in the
deck: aka.ms/SACoC.
And now we are
ready to start our session.
So, as we mentioned, we are going to
focus on classification models and Azure ML
today. So, first of all, we are going
to identify the kinds of
scenarios in which you should
choose to use a classification model.
We're going to introduce Azure Machine
Learning and Azure Machine Learning designer.
We're going to understand which
steps to follow to create a
classification model in Azure Machine
Learning, and then John will
lead an amazing demo about training and
publishing a classification model in
Azure ML designer.
So, let's start from the beginning. Let's
start from identifying classification
machine learning scenarios.
So, first of all, what is classification?
Classification is a form of machine
learning that is used to predict which
category or class an item belongs to. For
example, we might want to develop a
classifier able to identify if an
incoming email should be filtered or not
according to the style, the sender, the
length of the email, etc.
In this case, the
characteristics of the email are the
features.
And the label is a classification of
either zero or one, representing
spam or non-spam for the incoming email. So
this is an example of a binary
classifier. If you want to assign
multiple categories to the incoming
email like work letters, love letters,
complaints, or other categories, in this
case a binary classifier is no longer
enough, and we should develop a
multi-class classifier. So classification
is an example of what is called
supervised machine learning
in which you train a model using data
that includes both the features and
known values for the label
so that the model learns to fit the
feature combinations to the label. Then,
after training has been completed, you
can use the trained model to predict
labels for new items for which the
label is unknown.
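
As a rough illustration of the supervised workflow just described (train on features plus known labels, then predict labels for new items), here is a minimal Python sketch. The feature names, the toy data, and the nearest-centroid rule are all assumptions made for illustration; they are not what Azure Machine Learning uses internally.

```python
# Toy labelled data: each row is (features, label).
# Hypothetical features: [number of links, share of capital letters].
# Label: 1 = spam, 0 = not spam.
training_data = [
    ([8.0, 0.60], 1),
    ([6.0, 0.45], 1),
    ([7.0, 0.70], 1),
    ([1.0, 0.05], 0),
    ([0.0, 0.10], 0),
    ([2.0, 0.08], 0),
]


def train(rows):
    """Learn one centroid (mean feature vector) per label."""
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, features):
    """Return the label whose centroid is closest to the new item."""
    def squared_distance(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


model = train(training_data)
new_email = [5.0, 0.50]           # an item whose label is unknown
print(predict(model, new_email))  # 1 (closer to the spam centroid)
```

With only two labels this is a binary classifier; adding rows with a third label would turn the same code into a multi-class one.
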
But let's see some examples of scenarios
for classification machine learning
models. So, we already mentioned an
example of a solution in which we would
need a classifier, but let's explore
other scenarios for classification in
other industries. For example, you can use
a classification model for a health
clinic scenario, and use clinical data to
predict whether a patient will become sick
or not.
You can use, um...
[NO AUDIO]
[JOHN]: Carlotta, you are muted.
[CARLOTTA]: Oh, sorry.
So, when I became muted, it's a
long time, or?
[JOHN]: You can use-you can use, uh
some models for classification.
For example, you can use...
You were saying this.
[CARLOTTA]: Uh, so I was in this deck,
or the previous one?
[JOHN]: This one, you have been muted
for, uh, one second [LAUGHS].
[CARLOTTA]: Okay, okay perfect, perfect.
Uh, yeah I was talking...sorry for
that. So, I was talking about the possible
scenarios in which
you can use a classification model. Like
a health clinic scenario, a financial scenario,
or the third one is a business type of
scenario. You can use characteristics of
a small business to predict if a new
venture will succeed or not, for
example. And these are all types of
binary classification.
Uh, but today we are also going to talk
about Azure Machine Learning. So let's
see.
What is Azure Machine Learning? So
training and deploying an effective
machine learning model involves a lot of
work, much of it time-consuming and
resource intensive. So, Azure Machine
Learning is a cloud-based service that
helps simplify some of the tasks it
takes to prepare data, train a model, and
also deploy it as a predictive service.
So it helps data scientists increase
their efficiency by automating many of
the time-consuming tasks associated with
creating and training a model.
And it enables them also to use
cloud-based compute resources that scale
effectively to handle large volumes of
data while incurring costs only when
actually used.
To use Azure Machine Learning,
first things first, you need to create a
workspace resource in your Azure
subscription, and you can then use this
workspace to manage data, compute
resources, code, models, and other
artifacts. After you have created an
Azure Machine Learning workspace,
you can develop solutions with the
Azure Machine Learning service,
either with developer
tools or the Azure Machine Learning
studio web portal.
In particular,
Azure Machine Learning studio
is a web portal for machine
learning solutions in Azure, and it
includes a wide range of features and
capabilities that help data scientists
prepare data, train models, publish
predictive services, and monitor also
their usage.
So to begin using the web portal, you need
to assign the workspace
you created in the Azure portal
to the Azure Machine
Learning studio.
At its core, Azure Machine Learning is a
service for training and managing
machine learning models for which you
need compute resources on which to run
the training process.
Compute targets are, um, one of the main
basic concepts of Azure Machine Learning.
They are cloud-based resources on which
you can run model training and data
exploration processes.
So in Azure Machine Learning studio, you
can manage the compute targets for your
data science activities, and there are
four kinds of compute targets you can
create. We have the compute instances,
which are virtual machines set up for
running machine learning code during
development, so they are not designed for
production.
Then we have compute clusters, which are
a set of virtual machines that can scale
up automatically based on traffic.
We have inference clusters, which are
similar to compute clusters, but they are
designed for deployment, so they are
deployment targets for predictive
services that use trained models.
And finally, we have attached compute,
which is any compute target that you
manage yourself outside of Azure ML, like,
for example, virtual machines or Azure
Databricks clusters.
So we talked about Azure Machine
Learning, but we also
mentioned Azure Machine Learning
designer. What is Azure Machine Learning
designer? So, in Azure Machine Learning
studio, there are several ways to author
classification machine learning models.
One way is to use a visual interface, and
this visual interface is called designer,
and you can use it to train, test, and
also deploy machine learning models. And
the drag-and-drop interface makes use of
clearly defined inputs and outputs that
can be shared, reused, and also version
controlled.
And using the designer, you can identify
the building blocks or components needed
for your model, place and connect them on
your canvas, and run a machine learning
job.
So,
each project
in the designer is known as a pipeline.
And in the designer, we have a left panel
for navigation and a canvas on your
right-hand side in which you build your
pipeline visually. So pipelines let you
organize, manage, and reuse complex
machine learning workflows across
projects and users.
A pipeline starts with the data set from
which you want to train the model
because all begins with data when
talking about data science and machine
learning. And each time you run a
pipeline, the configuration of the
pipeline and its results are stored in
your workspace as a pipeline job.
So the second main concept of Azure
Machine Learning is a component. So, going
hierarchically from the pipeline, we can
say that each building block of a
pipeline is called a component.
In other words, an Azure Machine
Learning component encapsulates one step
in a machine learning pipeline. So, it's a
reusable piece of code with inputs and
outputs, something very similar to a
function in any programming language.
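
To make the "component is like a function" analogy concrete, here is a minimal Python sketch in which each component is a plain function and a pipeline is an ordered list of them. The function names and the toy data are hypothetical and only illustrate the idea; real designer components are configured on the canvas, not written like this.

```python
def clean_rows(rows):
    """Component 1: drop rows that contain a missing value."""
    return [row for row in rows if None not in row]


def select_first_column(rows):
    """Component 2: keep only the first column of each row."""
    return [row[0] for row in rows]


def run_pipeline(data, components):
    """Feed the output of each component into the next one."""
    for component in components:
        data = component(data)
    return data


raw = [(1.0, 2.0), (None, 3.0), (4.0, 5.0)]
result = run_pipeline(raw, [clean_rows, select_first_column])
print(result)  # [1.0, 4.0]
```

Because each step has clear inputs and outputs, the same component can be reused in another pipeline simply by putting it in a different list.
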
And in a pipeline project, you can access
data assets and components from the left
panel's
Asset Library tab, as you can see
here in the screenshot in the deck.
So you can create data assets using
an ad hoc page called the Data page. And a data
asset is a reference to a data source
location.
So this data source location could be a
local file, a datastore, a web file, or
even an Azure Open Dataset.
And these data assets will appear along
with standard sample datasets in the
designer's Asset Library.
Um.
Another basic concept of Azure ML is
Azure Machine Learning jobs.
So, basically, when you submit a pipeline,
you create a job which will run all the
steps in your pipeline. So a job executes
a task against a specified compute
target.
Jobs enable systematic tracking for your
machine learning experimentation in
Azure ML.
And once a job is created, Azure ML
maintains a run record, uh, for the
job.
Um, but, let's move to the classification
steps. So,
um, let's introduce how to create a
classification model in Azure ML, but you
will see it in more detail in a
hands-on demo that John will guide us
through in a few minutes.
So, you can think of the steps to train
and evaluate a classification machine
learning model as four main steps. So
first of all, you need to prepare your
data. So, you need to identify the
features and the label in your data set,
you need to pre-process, so you need to
clean and transform the data as needed.
Then, the second step, of course, is
training the model.
And for training the model, you need to
split the data into two groups: a
training and a validation set.
Then you train a machine learning model
using the training data set and you test
the machine learning model for
performance using the validation data
set.
The third step is performance evaluation,
which means comparing how close the
model's predictions are to the known
labels, and this leads us to compute some
performance evaluation metrics.
And then finally...
So, these three steps are not
performed every time in a
linear manner. It's more of an iterative
process. But once you achieve
a performance with which you are
satisfied, you are ready to, let's say,
go into production, and you can deploy
your trained model as a predictive service
to a real-time
endpoint. And to do so, you need to
convert the training pipeline into a
real-time inference pipeline, and then
you can deploy the model as an
application on a server or device so
that others can consume this model.
So let's start with the first step, which
is preparing data. Real-world data can contain
many different issues that can affect
the utility of the data and our
interpretation of the results, and so also
the machine learning model that you
train using this data. For example,
real-world data can be affected by a bad
recording or a bad measurement, and it
can also contain missing values for some
parameters. And Azure Machine Learning
designer has several pre-built
components that can be used to prepare
data for training. These components
enable you to clean data, normalize
features, join tables, and more.
Let's come to training. So, to train a
classification model you need a data set
that includes historical features, so the
characteristics of the entity for which
we want to make a prediction, and known label
values. The label is the class indicator
we want to train a model to predict.
And it's common practice to train a
model using a subset of the data while
holding back some data with which to
test the trained model. And this enables
you to compare the labels that the model
predicts with the actual known labels in
the original data set.
This operation can be performed in the
designer using the split data component
as shown by the screenshot here in the...
in the deck.
There's also another component that you
should use, which is the Score Model
component, to generate the predicted
class label value using the validation
data as input. So once you connect all
these components,
the component specifying the
model we are going to use, the Split Data
component, the Train Model component,
and the Score Model component, you want
to run a new experiment in
Azure ML, which will use the data set
on the canvas to train and score a model.
After training a model, it is important,
we say, to evaluate its performance, to
understand how good
our model is.
And there are many performance metrics
and methodologies for evaluating how
well a model makes predictions. The
component to use to perform evaluation
in Azure ML designer is called, as
intuitive as it is, Evaluate Model.
Once the job of training and evaluation
of the model is completed, you can review
evaluation metrics on the completed job
page by right clicking on the component.
In the evaluation results, you can also
find the so-called confusion matrix that
you can see here on the right side of
this deck.
A confusion matrix shows cases where
both the predicted and actual values
were one, the so-called true positives,
at the top left, and also cases where
both the predicted and the actual values
were zero, the so-called true negatives,
at the bottom right. The other
cells show cases where the predicted
and actual values differ,
called false positives and false
negatives, and this is an example of a
confusion matrix for a binary classifier.
For a multi-class classification
model, the same approach is used to
tabulate each possible combination of
actual and predicted value counts. So
for example, a model with three possible
classes would result in a three-by-three
matrix.
The confusion matrix is also useful for
the metrics that can be derived from it,
like accuracy, recall, or precision.
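
As a small worked example of how those metrics follow from the four cells of a binary confusion matrix, here is a Python sketch. The actual and predicted label lists are made-up values used only for illustration.

```python
def confusion_counts(actual, predicted):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def safe_ratio(numerator, denominator):
    """Avoid dividing by zero when a class is never predicted or never present."""
    return numerator / denominator if denominator else 0.0


# Made-up labels: 1 = diabetic, 0 = not diabetic.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
accuracy = safe_ratio(tp + tn, tp + fp + fn + tn)  # share of all predictions that are right
precision = safe_ratio(tp, tp + fp)                # share of predicted positives that are right
recall = safe_ratio(tp, tp + fn)                   # share of actual positives that were found

print(tp, fp, fn, tn)               # 3 1 1 5
print(accuracy, precision, recall)  # 0.8 0.75 0.75
```
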
We said that the last step is
deploying the trained model to a real-time
endpoint as a predictive service. And in
order to automate your model into a
service that makes continuous
predictions, you need, first of all, to
create and then deploy an
inference pipeline. The process of
converting the training pipeline into a
real-time inference pipeline removes
training components and adds web service
inputs and outputs to handle requests.
And the inference pipeline performs
the same data transformations as the
first pipeline, but for new data. Then it
uses the trained model to infer, or predict,
label values based on its features.
So, I think I've talked a lot for now.
I would like to let John show us
something in practice with
the hands-on demo, so please, John, go
ahead, share your screen and guide us
through this demo of creating a
classification model with
the Azure Machine Learning designer.
[JOHN]: Thank you so much Carlotta for
this interesting explanation of the
Azure ML designer. And now,
um, I'm going to start with you in the
practical demo part, so if you want to
follow along, go to the link that Carlotta
sent in the chat so you can do
the demo or the practical part with me.
I'm just going to share my screen...
and...
...go here. So, uh...
Where am I right now? I'm inside the
Microsoft Learn documentation. This is
the exercise part of this module, and we
will start by setting up two things, which
are a prerequisite for us to work inside
this module, which are the resource group
and the Azure Machine Learning workspace,
and something extra, which is the compute
cluster that Carlotta talked about. So I
just want to make sure that you all have
a resource group created inside your
portal, inside your Microsoft Azure
platform. So this is my resource group.
Inside this resource group, I
have created an Azure Machine Learning
workspace. So I'm just going to access
the workspace that I have created
already from this link. I am going to
open it, which is the studio web URL, and
I will follow the steps. So what is this?
This is your machine learning workspace,
or machine learning studio. You can do a
lot of things here, but we are going to
focus mainly on the designer and the
data and the compute. So another
prerequisite here, as Carlotta told you,
we need some resources to power up the
classification, the processes that
will happen.
So, we have created this compute
cluster,
and we have set some presets for
it. So
where can you find these presets? You go
here. Under "Create compute", you'll
find everything that you need to do. So
the size is Standard_DS11_v2,
and it's CPU, not GPU, because
we don't need a GPU.
Uh, it is ready for us to use.
The next thing which we will look into
is the designer. How can you access the
designer?
You can either click on this icon or
click on the navigation menu and click
on the designer for me.
Now I am inside my designer.
What we are going to do now is the
pipeline that Carlotta told you about.
And from where can I know these steps? If
you follow along in the learn module, you
will find everything that I'm doing
right now in detail, with screenshots
of course. So I'm going to create a new
pipeline, and I can do so by clicking on
this plus button.
It's going to redirect me to the
designer authoring the pipeline, uh, where
I can drag and drop data and components
that Carlotta told you the difference
between.
And here I am going to do some changes
to the settings. I am going to connect
this with my compute cluster that I
created previously so I can utilize it.
From here I'm going to choose this
compute cluster demo that I showed
you before in the clusters here,
and I am going to change the name to
something more meaningful. Instead of
"Pipeline" and today's date, I'm going
to name it Diabetes...
uh...
let's just check this training.
Let's say Training 0.1 or 01, okay?
And I am going to close this tab in
order to have a bigger place to work
inside because this is where we will
work, where everything will happen. So I
will click on close from here,
and I will go to the data and I will
create a new data set.
How can I create a new data set? There are
multiple options here you can find: from
local files, from datastore, from web
files, from Open Datasets, but I'm going
to choose from web files, as this is the
way we're going to create our data.
From here, the information of my data set
I'm going to get them from the Microsoft
Learn module. So if we go to the step
that says "Create a dataset",
under it, it illustrates that you can
access the data from inside the asset
library, and inside your asset library,
you'll find the data and find the
component. And I'm going to select
this link because this is where my data
is stored. If you open this link, you will
find this is a CSV file, I think.
Yeah. And you can...like, all the data are
here.
Now let's get back..
Um...
And you are going to name it something
meaningful, but because I have already
created it twice before, I'm gonna
add a number to the name.
The dataset type can be tabular or
file, but this is a table, so we're
going to choose tabular
for the dataset type.
Now we will click on "Next". That's gonna
review, or display for you the content
of this file that you have
imported to this workspace.
And for these settings, these are
related to our file format.
So this is a delimited file, and it's not
plain text, it's not JSON. The delimiter
is comma, as we have seen that they
[INDISTINGUISHABLE]
So I'm choosing common
errors because the only the first five...
[INDISTINGUISHABLE]
...for example. Okay, uh, if you have any
doubts, if you have any problems, please
don't hesitate to write me
in the chat,
like, what is blocking you, and
me and Carlotta will try to help you,
like whenever possible.
And now this is the new preview for my
data set. I can see that I have an ID, I
have patient ID, I have pregnancies, I
have the age of the people,
I have the body mass, I think
whether they have diabetes or not, as a
zero and one. Zero indicates a negative,
the person doesn't have diabetes, and one
indicates a positive, that this person
has diabetes. Okay.
Now I'm going to click on "Next". Here I am
defining my schema. All the data types
inside my columns, the column names, which
columns to include, which to exclude. And
here we will include everything except
the Path column. And we are
going to review the data types of each
column. So let's review this first one.
This is numbers, numbers, numbers, then it's the
integer. And this is,
um, like decimal..
...dotted...
decimal number. So we are going to choose
this data type.
And for this one
it says diabetic, and it's a zero or
one, and we are going to make it an
integer.
Now we are going to click on "Next" and
move to reviewing everything. This is
everything that we have defined together.
I will click on "Create".
And...
now the first step has ended. We have
gotten our data ready.
Now...what now? We're going to utilize the
designer...
um...power. We're going to drag and drop
our data set to create the pipeline.
So I have clicked on it and dragged it
to this space. It's gonna appear to you.
And we can inspect it by right clicking and
choose "Preview data"
to see what we have created together.
From here, you can see everything that we
have seen previously, but in more
details. And we are just going to close
this. Now what? Now we are gonna do the
processing that Carlotta mentioned.
These are some instructions about the
data, about how you can look at them, how you
can open them, but we are going to move
to the transformation or the processing.
So as Carlotta told you, for any data
for us to work on, we have to do some
processing on it
to make it easier for the model to
be trained and easier to work with. So, uh,
we're gonna do the normalization. And
normalization means
to scale our data, either down or up, but
we're going to scale them down,
and we are going to decrease, uh,
relatively decrease
the values, all the values, to work
with lower numbers. And if we are working
with larger numbers, it's going to take
more time. If we're working with smaller
numbers, it's going to take less time to
calculate them, and that's it. So
where can I find Normalize Data? I
can find it inside my components.
So I will choose the components and
search for "Normalize Data".
I will drag and drop it as usual and I
will connect these two things
by clicking on this spot, this, uh,
circle, and
dragging and dropping onto the next circle.
Now we are going to define our
normalization method.
So I'm going to double click on
Normalize Data.
It's going to open the settings for the
normalization,
such as the transformation method, which is
the mathematical function
that is going to be used to scale our
data.
We're going to choose MinMax, and for
"Use 0 for constant columns when checked"
we are going to
choose "True",
and we are going to define which columns
to normalize. So we are not going to
normalize the whole data set. We are
going to choose a subset from the data
set to normalize. So we're going to
choose everything except for the patient
ID and the diabetic, because the patient
ID is a number, but it's a categorical
data. It describes a patient, it's not a
number that I can sum. I can't say "patient
ID number one plus patient ID number two".
No, this is a patient and another
patient, it's not a number that I can do
mathematical operations on, so I'm not
going to choose it. So we will choose
everything as I said, except for the
diabetic and the patient ID. I will
click on "Save".
And it's not showing me a warning again,
everything is good.
Now I can click on "Submit"
and review my normalization output.
Um.
So, if you click on "Submit" here,
you will choose "Create new" and
set the name that is mentioned here
inside the notebook. So it tells you
to create a job and name
the experiment "MS Learn Diabetes
Training", because you will continue
working on it and building components later.
I have already created it, so
we can review it together. So let
me just open this in another tab. I think
I have it...
here.
Okay.
So, these are all the jobs that I have
created.
All the jobs there. Let's do this over.
These are all the jobs that I have
submitted previously.
And I think this one is the
normalization job, so let's see the
output of it.
As you can see, it says, uh, "Check mark", yes,
which means that it worked, and we can
preview it. How can I do that? Right click
on it, choose "Preview data",
and as you can see all the data are
scaled down
so everything is between zero
and, uh, one I think.
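
For reference, min-max scaling maps each value v in a column to (v - min) / (max - min), which is why every normalized value ends up between zero and one. A minimal Python sketch follows; the age values are made up, and returning zeros for a constant column is an assumption that mirrors the "use 0 for constant columns" choice made in the demo.

```python
def min_max_scale(values):
    """Rescale a numeric column to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # Constant column: there is no spread to scale, so return zeros.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


ages = [21, 35, 50, 61]     # made-up values for one numeric column
print(min_max_scale(ages))  # [0.0, 0.35, 0.725, 1.0]
```

An identifier column such as the patient ID is left out of this step for the reason given earlier: it is a label for a person, not a quantity.
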
So everything is good for us. Now we
can move forward to the next step
which is to create the whole pipeline.
So, uh, Carlotta told you that
we're going to use a classification
model to create this data set, so let
me just drag and drop everything
to get runtime and we're doing
[INDISTINGUISHABLE]
about everything by
[INDISTINGUISHABLE]
So,
as a result, we are going to explain
[INDISTINGUISHABLE]
Yeah. So, I'm going to get the Split
Data component. I'm going to take the
transformed dataset to Split Data and
connect it like that.
I'm going to get the Train Model
component because I want to train my
model,
and I'm going to put it right here.
Okay.
Let's just move it down there. Okay.
And we are going to use a classification
model,
a Two-Class
Logistic Regression model.
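
To give a feel for what a two-class logistic regression learns, here is a minimal single-feature Python sketch trained with batch gradient descent. The data, learning rate, and number of epochs are assumptions chosen for illustration; the designer component has its own training procedure and settings.

```python
import math


def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, learning_rate=0.5, epochs=2000):
    """Fit a weight and a bias for one feature with batch gradient descent."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(weight * x + bias) - y  # prediction minus true label
            grad_w += error * x
            grad_b += error
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias


# Made-up, already normalized feature values and their 0/1 labels.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

weight, bias = train_logistic(xs, ys)
high = sigmoid(weight * 0.85 + bias)  # probability of class 1 for a large value
low = sigmoid(weight * 0.15 + bias)   # probability of class 1 for a small value
print(high > 0.5, low < 0.5)          # True True
```

The model outputs a probability, and the class is decided by comparing that probability with a threshold.
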
So I'm going to give this algorithm to
enable my model to work.
This is the untrained model, this is...
here.
The left...
the left, uh, circle, I'm going to
connect it to the data set, and the right
one, we are going to connect it to
evaluate model.
Evaluate model...so let's search for
"Evaluate model" here.
So because we want to do what...we want to
evaluate our model and see how it it has
been doing. Is it good, is it bad?
Um, sorry...
This is...
this is down there
after the score model.
So we have to get the score model first,
so let's get it.
And this will take the trained model and
the data set
to score our model and see if it's
performing good or bad.
And...
um...
after that, we have finished
everything. Now, we are going to do the what?
The presets for everything.
As a starter, we will be splitting our
data. So
how are we going to do this, according to
what? To the split rows. So I'm going to
double-click on it and choose "Split Rows".
And the percentage is
70 percent for the [INDISTINGUISHABLE]
and 30 percent of the
data for
the validation or for the scoring, okay?
I'm going to make it a randomization, so
I'm going to split data randomly and the
seed is, uh,
132, uh 23 I think...yeah.
And I think that's it.
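
The same kind of seeded, random 70/30 split can be sketched in a few lines of Python. The seed value 123 and the stand-in rows are assumptions for illustration, and the function is hypothetical rather than the Split Data component's actual implementation.

```python
import random


def split_rows(rows, fraction=0.7, seed=123):
    """Shuffle a copy of the rows, then cut it into two parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # a fixed seed makes the split repeatable
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))  # stand-in for 100 patient records
train_rows, validation_rows = split_rows(rows)
print(len(train_rows), len(validation_rows))  # 70 30
```

Fixing the seed means that rerunning the job gives the same two subsets, so metrics from different runs stay comparable.
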
The stratified split is false, and that's
good.
Now for the next one, which is the train
model we are going to connect it as
mentioned here.
And we have done that and...then why
am I having here? Let's double click
on it...yeah. It has...it needs the
label column that I am trying to predict.
So from here, I'm going to choose
diabetic. I'm going to save.
I'm going to close this one.
So it says here,
the diabetic label, the model, it will
predict the zero and one, because this is
a binary classification algorithm, so
it's going to predict either this or
that.
And...
um...
I think that's everything to run the
pipeline.
So everything is done, everything is good
for this one. We're just gonna leave it
for now, because this is the next
step.
Um, this will be put instead of the
score model, but let's...
let's delete it for now.
Okay.
Now we have to submit the job in order
to see the output of it. So I can click
on "Submit" and choose the previous job
which is the one that I have showed you
before.
And then let's review its output
together here.
So if I go to the jobs,
if I go to MS Learn, maybe it is training?
I think it's the one that lasted the
longest, this one here.
So here I can see
the job output, what happened inside
the model, as you can see.
So the normalization we have seen
before, the split data, I can preview it.
The result one or the result two as it
splits the data to 70 here and
thirty percent here.
Um, I can see the score model, which is
something that we need
to review.
Inside the Score Model, uh, from
here,
we can see that...
let's get back here.
This is the data that the model has
scored, and this is the scoring output.
So it says "Scored Labels" true, and he is
not diabetic, so this is,
um,
a wrong prediction, let's say.
For this one it's true and true, and this
is a good, like, what do you say,
prediction, and the "Scored Probabilities",
which means the certainty of our model
that this is really true. It's 80 percent.
For this one it's 75 percent.
So these are some cool metrics that we
can review to understand how our model
is performing. It's performing good for
now.
Let's check our evaluation model.
So this is the extra one that I told you
about. Instead of the
Score Model only, we are going to add
what? Evaluate Model,
after it. So here
we're going to go to our Asset Library
and we are going to choose the evaluate
model,
and we are going to put it here, and we
are going to connect it, and we are going
to submit the job using the same name of
the job that we used previously.
Let's review it. Also, so, after it
finishes, you will find it here. So I have
already done it before, this is how I'm
able to see the output.
So let's see
what is the output of this
evaluation process.
Here it mentions that there are
some metrics,
like the confusion matrix, which Carlotta
told you about, there is the accuracy, the
precision, the recall, and F1 score.
Every metric gives us some insight about
our model. It helps us to understand it
more, and, um,
understand if it's overfitting, if
it's good, if it's bad, and really really,
like, understand how it's working.
Now I'm just waiting for the job to load.
Until it loads,
um,
we can continue
to work on our
model. So I will go to my designer. I'm
just going to confirm this.
And I'm going to continue working on it
from
where we have stopped. Where have we
stopped?
we have stopped on the evaluate model. So
I'm going to choose this one.
And it says here
"select experiment", "create inference
pipeline", so
I am going to go to the jobs,
I'm going to select my experiment.
I hope this works.
Okay. Finally, now we have our
evaluate model output.
Let's preview evaluation results
and, uh...
come on.
Finally. Now we can create our inference
pipeline. So,
I think it says that...
um...
select the experiment, then select MS
Learn. So,
I am just going to select it,
and finally. Now we can see the ROC
curve, with the true positive rate
and the false positive rate. The false
positive rate is increasing along the curve,
and also the true positive rate. True
positive is something that it predicted
as positive, that it has diabetes,
and it's really true.
The person really has diabetes. Okay. And
for the false positive, it predicted that
someone has diabetes, but that person doesn't
have it. This is what true positive and
false positive mean. This is the precision-recall
curve, so we can review the metrics
of our model. This is the lift curve. I
can change the threshold of my confusion
matrix here
and if Carlotta wants to add
anything about the...the graphs,
you can do so.
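
To make the true positive and false positive rates concrete, here is a minimal sketch of how the points of a ROC curve can be computed by sweeping a threshold over predicted probabilities. The labels, scores, and thresholds are invented for illustration, and the function name is hypothetical; this is not how the designer computes its chart internally.

```python
def roc_points(labels, scores, thresholds):
    """Return one (false positive rate, true positive rate) pair per threshold."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        # A case counts as predicted positive when its score reaches the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        tpr = tp / positives if positives else 0.0
        fpr = fp / negatives if negatives else 0.0
        points.append((fpr, tpr))
    return points


# Invented data: 1 = diabetic, 0 = not diabetic.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
points = roc_points(labels, scores, thresholds=[1.0, 0.75, 0.5, 0.35, 0.0])
print(points)
```

Lowering the threshold can only add predicted positives, so both rates rise together as the curve moves from the bottom-left corner to the top-right.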
[CARLOTTA]: Um, yeah, so I just
wanted to...if you go...yeah.
I just wanted to comment on the
ROC curve, that actually from this
graph, the metric which we usually
compute is the area under
the curve. And this coefficient or
metric,
it's a coefficient...
it's a value that can range from
zero to one, and the higher...
...the higher the score,
so the closer it is to one,
so the larger the amount of
area under this curve,
the higher the performance
we get from our model.
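
One common way to approximate the area under the curve is the trapezoidal rule over the ROC points. The sketch below assumes the curve is given as (false positive rate, true positive rate) pairs; the points are invented and the function name is hypothetical.

```python
def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points."""
    ordered = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        # Width of the slice times the average height of its two edges.
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Invented ROC points, for illustration only.
roc = [(0.0, 0.0), (0.0, 2 / 3), (1 / 3, 2 / 3), (1 / 3, 1.0), (1.0, 1.0)]
auc = area_under_curve(roc)

# The diagonal from (0, 0) to (1, 1) corresponds to random guessing.
baseline = area_under_curve([(0.0, 0.0), (1.0, 1.0)])
print(auc, baseline)
```

The diagonal gives an area of 0.5, so a useful classifier is expected to land between that value and 1.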
And another thing is what John is
playing with. So this threshold for
the logistic
regression is the threshold used by the
model to, um,
to predict if the category is zero or
one. So if the probability—the
probability score is above the threshold,
then the category will be predicted as
one, while if the probability is
below the threshold, in this case, for
example, 0.5, the category is predicted
as zero. So that's why it's very
important to choose the threshold,
because the performance really can vary,
um,
with this threshold value.
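
The thresholding rule Carlotta describes can be sketched in a few lines. The probabilities are invented, the function name is hypothetical, and treating a probability exactly equal to the threshold as class zero is an assumption of this sketch.

```python
def predict_labels(probabilities, threshold=0.5):
    """Map each probability to 1 if it is above the threshold, else 0."""
    return [1 if p > threshold else 0 for p in probabilities]


# Invented probability scores, for illustration only.
probabilities = [0.15, 0.45, 0.55, 0.92]

default_labels = predict_labels(probabilities)        # threshold 0.5
lenient_labels = predict_labels(probabilities, 0.4)   # more positives
strict_labels = predict_labels(probabilities, 0.9)    # fewer positives
print(default_labels, lenient_labels, strict_labels)
```

The same scores produce different predictions at each threshold, which is why the confusion matrix changes as the slider moves.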
[JOHN]: Thank you so much, Carlotta, and
as I mentioned now, we are going to
create our inference pipeline. So we are
going to select the latest one, which I
already have it opened here. This is the
one that we were reviewing together. This
is where we have stopped, and we're going
to create an inference pipeline. We are
going to choose a real-time inference
pipeline, okay?
Where can I find this? Here, as it
says, "Real-time inference pipeline".
So it's gonna add some things to my
workspace. It's going to add the
web service input, it's gonna
have the web service output,
because we will be creating
it as a web service to access
it from the internet.
What are we going to do? We're going
to remove this diabetes data, okay?
And we are going to get a component
called "Web
input" and...let me check,
it's "Enter Data Manually".
We have...we already have the web input
present.
So we are going to get the Enter Data
Manually component,
and we're going to collect it...to connect
it as it was connected before, like that.
And also, I am not going to directly take
the web service...sorry, the Score Model to
the web service output like that.
I'm going to delete this
and I'm going to execute a python script
before
I display my result.
So,
this will be connected like...
So...
the other way around.
And from here, I am going to connect this
with that, and there is some data that
we will be getting from the note, or from
the explanation here, and this is the
data that will be entered into our
web service manually. Okay? This is instead of
the data that we have been getting from
our data set that we created. So I'm just
going to double click on it and choose
CSV, and I will choose "it has headers",
and I will take or copy this content and
put it there, okay?
So let's do it.
I think I have to click on edit code, now
I can click on "Save", and I can close it.
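
To show what "CSV" with "it has headers" means for manually entered data, here is a small sketch that parses such text into rows. Only PatientID and Pregnancies are named in the session; the other column names and all values are assumptions made up for illustration, and the label column is left out on purpose because that is what the service will predict.

```python
import csv
import io

# Hypothetical sample text: the first line is the header row.
manual_input_csv = """PatientID,Pregnancies,PlasmaGlucose,Age
1001,2,110,35
1002,0,95,28
"""


def parse_manual_input(text):
    """Read CSV text whose first line holds the column names."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)


rows = parse_manual_input(manual_input_csv)
print(rows[0])
```

Because the header option is on, the first line is used for column names rather than being treated as a data row.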
Another thing is the Python script
that we will be executing.
Um, yeah. We
are going to remove this, also.
We don't need the evaluate model
anymore, so we are going to remove it.
The python script
that I will be executing,
I can find it here.
Um, yeah.
This is the Python script that we will
execute. And it tells you that this
code selects only the patient's ID,
the scored label, and the scored
probability and returns them to
the web service output. So we don't want
to return all the columns, as we have
seen previously,
when it returned everything,
so
we want to return only certain stuff, the
stuff that we will use inside our
endpoint. So I'm just going to select
everything and delete it, and
paste the code that I have gotten from
the, uh,
the Microsoft Learn docs.
Now I can click on "Save", and I can close
this.
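
The script pasted here is not shown in the recording. As a plain-Python sketch of the idea only, the snippet below keeps three columns from each scored row and gives them friendlier names. In the designer, the Execute Python Script component instead works on a pandas DataFrame through an entry point named `azureml_main`; the input column names "Scored Labels" and "Scored Probabilities" and the output names used below are assumptions, not confirmed by the session.

```python
# Assumed mapping from scored-output column names to returned names.
OUTPUT_COLUMNS = {
    "PatientID": "PatientID",
    "Scored Labels": "DiabetesPrediction",
    "Scored Probabilities": "Probability",
}


def select_output_columns(rows):
    """Keep only the ID, predicted label, and probability for each row."""
    return [
        {new_name: row[old_name] for old_name, new_name in OUTPUT_COLUMNS.items()}
        for row in rows
    ]


# Invented scored row: input features plus the model's outputs.
scored = [{
    "PatientID": 1001, "Pregnancies": 2, "Age": 35,
    "Scored Labels": 1, "Scored Probabilities": 0.83,
}]
result = select_output_columns(scored)
print(result)
```

Dropping the feature columns keeps the response small and returns only what a client of the endpoint needs.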
Let me check something,
I don't think it saved.
It's saved, but the display is
wrong, okay.
And now I think everything is good to go.
I'm just gonna double-check everything.
So, uh, yeah. We are gonna change the name
of this
pipeline, and we are gonna call it
"Predict
diabetes", okay?
Now let's close it, and
I think that we are good to go. So,
um,
Okay, I think everything is good for us.
I just want to make sure of something.
Is the data...
it's correct, the data is...yeah,
it's correct.
Okay, now I can run the pipeline. Let's
submit.
Select an "existing" pipeline, and we're
going to choose
the "ms-learn-diabetes-training",
which is the pipeline
that we have been working on
from the beginning of this module.
I don't think that this is going to take
much time. So we have submitted the job
and it's running.
Until the job ends, we are going to set
everything
for deploying a service.
In order to deploy a service,
um,
I have to have the job ready, so
until it's ready, you can't deploy it. So
let's go to the job—the job details from
here, okay?
And until it finishes,
Carlotta, do you think that we can have
the questions, and then we can get back
to the job and deploying it?
[CARLOTTA]: Yeah, yeah, yeah.
So yeah, guys, if you
have any questions
on what you just saw here
or in the introduction, feel free. This is
a good moment, we can...we can discuss
now, while we wait for this job to
finish.
[JOHN]: Uh, and....
can...
we have the knowledge check one? Or, like,
what do you think?
[CARLOTTA]: Yeah, we can also go
to the knowledge check.
Um...
Yeah, okay. So let me share my screen.
Yeah, so if you don't have any questions
for us, we can maybe propose some
questions to you so that you can,
um,
check your knowledge so far, and you
can maybe answer these questions
via chat.
So we have...do you see my screen, can
you see my screen?
[JOHN]: Yes.
[CARLOTTA]: So, John, I think I will
read this
question aloud and ask it to you, okay? So
are you ready to answer?
[JOHN]: Yes, I am.
[CARLOTTA]: So...
you're using Azure Machine Learning
designer to create a training pipeline
for a binary classification model, so
what we were doing in our demo,
right? And you have added a data set
containing features and labels, a Two-
Class Decision Forest module. So we used
a logistic regression model our...
um, in our example.
Here, we're using a Two-
Class Decision Forest model.
And, of course, a Train Model module. You
plan now to use Score Model and Evaluate
Model modules to test the trained model
with the subset of the data set that
wasn't used for training.
But what are we missing? So what's
another module you should add? We have
three options: we have Join Data, we have
Split Data, or we have Select Columns
in Dataset.
So
while John thinks about the answer,
go ahead and,
um,
answer yourself. So give us your
guess.
Put it in the chat, or just come off mute
and answer.
"A", "B".
[JOHN]: Yeah, what do you think
is the correct
answer for this one? I need something to
uh...I have to score my model, and I
have to evaluate it, so I need
something to enable me to do these two
things.
[CARLOTTA]: I think it's something
you showed us in your pipeline,
right John?
[JOHN]: Of course I did.
[CARLOTTA]: Uh, we have no guesses
in the chat?
[JOHN]: Can someone...
Someone want to guess?
[CARLOTTA]: We have a "B".
[JOHN]: Uh, maybe.
So, in order to do this,
I mentioned
the module that is going to help me
to divide my data into two things:
70 percent for
the training and 30 percent for the
evaluation. So what did I use? I used
Split Data, because this is what is going
to split my data randomly into training
data and validation data. So the correct
answer is "B", and good job. Thank you
for participating.
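
A random 70/30 split like the one John describes can be sketched as follows. The function name, the fixed seed, and the use of plain integers as stand-in rows are assumptions for illustration; the Split Data module has its own settings for these choices.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=123):
    """Shuffle the rows, then cut them into training and validation parts."""
    shuffled = list(rows)
    # A seeded generator makes the random split repeatable.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in data: 100 row identifiers.
train, validation = split_rows(range(100))
print(len(train), len(validation))
```

Every row ends up in exactly one of the two parts, so the model is later scored on data it did not see during training.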
Next question, please.
[CARLOTTA]: Yes, "B" is the correct
answer, so thanks, John,
for explaining to us the correct
one.
And we want to go with question two?
[JOHN]: Yeah, so,
I'm going to ask you now,
Carlotta. You use Azure Machine Learning
designer to create a training pipeline
for your classification model.
What must you do before you deploy this
model as a service?
You have to do
something before
you deploy it.
What do you think is the correct answer?
Is it "A", "B", or "C"?
Share your thoughts with—
with us in the chat and
and I'm also going to give you some
minutes to think of it before I
tell you about it.
[CARLOTTA]: Yeah so let me go
through the possible
answers, right? So we have A: "Create an
inference pipeline from the training
pipeline";
B: we have "Add an Evaluate Model
module to the training pipeline"; and then
C, we have "Clone the training
pipeline with a different name".
So what do you think is the correct
answer? "A", "B", or "C"?
Also this time, I think it's something
we mentioned both in the decks and in
the demo right?
[JOHN]: Yes it is,
it's something that I have done
like two, like five minutes ago.
It's real-time, real-time.
[CARLOTTA]: Um,
yeah, so, think about...you need to deploy
the model as a service. So if I'm
going to deploy a model,
I cannot evaluate the model
after deploying it, right, because I
cannot go into production if I'm not
sure, I'm not satisfied with my model, and
I'm not sure that my model is performing
well.
So that's why I would go with,
um,
I would...exclude "B" from my
answer.
While
thinking about "C", uh, I don't see you—I
didn't see you, John, cloning the
training Pipeline with a different name,
so I don't think this is the
right answer.
While I've seen you creating an
inference pipeline from the
training pipeline, and you just converted
it using a one-click button, right?
[JOHN]: Yeah, that's correct.
So this is the right answer.
Good job. So I created a real-time
inference pipeline, and it's done.
It finished...it finished, the job is
finished. So we can now deploy.
And...
Yeah [LAUGHS].
Exactly, like, on time.
Like, it finished two seconds...
three, four seconds ago [LAUGHS].
So, uh,
until, um...
This is my job review, so
this is the job details that I
have already submitted, it's just opening,
and once it opens...
um...
I don't know why it's so heavy
today, it's not like that usually.
[CARLOTTA]: Yeah, it's probably because
you are also
sharing your screen on Teams,
so that uses the bandwidth of your
connection.
[JOHN]: Let me do something here
because...yeah finally.
I can switch to my mobile internet if it
does it again. So I will click on "Deploy",
it's that simple. I'll just click on
"Deploy" and...
I am going to deploy a new real-time
endpoint.
So what am I going to name it?
Description and the compute type.
Everything is already mentioned
for me here,
so I'm just gonna copy and paste it,
because we...we are running
out of time.
So it's an Azure Container Instance,
not Azure Kubernetes Service,
which is a containerization service also.
Both are for containerization, but this
gives you something, and this gives you
something else.
For the advanced options,
it doesn't say for us to do anything, so
we are just gonna click on "Deploy",
and now we can test our endpoint from
the endpoints that we can find here, so
it's in progress. If I go here
under the assets, I can find something
called "Endpoints", and I can find the
real-time ones and the batch endpoints.
And we have created a real-time endpoint,
so we are going to find it under this
title. So if I click on it, I should
be able to test it once it's ready.
It's still loading, but this is the
input, and this is the output that we
will get back, so if I click on "Test"...
and from here,
I will input some data to the
endpoint,
which are: the patient information; the
columns that we have already seen in our
data set; the patient ID; the pregnancies.
And of course, of course I'm not gonna
enter the label that I'm trying to
predict, so I'm not going to tell it whether
the patient is diabetic or not. This
endpoint is there to tell me this.
The endpoint, or the URL,
is going to give me
back this information, whether someone
has diabetes, or he doesn't. So if I input
this data, I'm just going to copy it,
and go to my endpoint, and click on
"Test", I'm gonna get the result back,
which is the three columns that we have
defined inside our Python script: the
patient ID, the diabetic prediction, and
the probability—the certainty of whether
someone is diabetic or not based on the...
uh...based on the prediction.
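
Outside the "Test" tab, a client would typically send the same data as a JSON POST request. The sketch below only builds such a request and does not send it. The URL, the key placeholder, the "Inputs"/"WebServiceInput0" payload shape, and the bearer-token header are all assumptions for illustration; the real values and format should be taken from the deployed endpoint's own details.

```python
import json
import urllib.request


def build_payload(patient):
    """Wrap one patient record in the request shape assumed by this sketch."""
    return {"Inputs": {"WebServiceInput0": [patient]}, "GlobalParameters": {}}


def build_request(url, api_key, payload):
    """Prepare (but do not send) a POST request carrying the JSON payload."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


# Invented record; the label to be predicted is deliberately absent.
patient = {"PatientID": 1001, "Pregnancies": 2}
request = build_request("https://example.invalid/score", "<your-key>",
                        build_payload(patient))

# Sending it would look like this (left commented out on purpose):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
print(request.get_method(), request.full_url)
```

The response is expected to carry the three columns selected by the Python script rather than the full set of input features.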
So that's it.
And, uh, I think that this is a really
simple step to do, you can do it on your
own, you can test it.
And I think that I have finished, so
thank you.
[CARLOTTA]: Uh, yes,
we are running out of time
I just wanted to thank you, John, for
this demo, for going through all these
steps to
um, create, train a classification model,
and also deploy it as a predictive
service. And I encourage you all to go
back to the learn module
and, um, go deeper into all these topics
at your own pace, and also maybe,
uh, do this demo on your own, on your
subscription, on your Azure for Students
subscription. Um...
And I would also like to remind you that
this is part of a series of study
sessions, of Cloud Skills Challenge study
sessions,
so you will have more in the...
in the following days, and this is for
you to prepare, let's say, to help you
in taking the Cloud Skills Challenge,
which collects
very interesting Learn modules that you
can use to skill up on various topics,
and some of them are focused on AI and
ML. So if you are interested in these
topics, you can select these Learn
modules.
So let me also copy
the link, the short link to the
challenge in the chat. Remember that
you have time until the 13th of
September to take the challenge. And also
remember that in October, on the 7th of
October, you have the—you can join the
student—the Student Developer Summit,
which is, uh, which will be a virtual or,
in...for some cases, a hybrid
event, so stay tuned, because you will
have some surprises in the following
days. And if you want to learn more about
this event, you can check the Microsoft
Imagine Cup Twitter page and stay tuned.
So thank you everyone for joining
this session today, and thank you very
much, John, for co-hosting this
session with me. It was a pleasure.
[JOHN]: Thank you so much,
Carlotta, for having me
with you today, and thank you for
giving me this opportunity to
be with you here.
[CARLOTTA]: Great, thank you.
[JOHN]: Yeah, I hope that we
work again in the future.
[CARLOTTA]: Sure, I hope so as well.
Um, so, thank you everyone.
And have a nice rest of your day.
Bye-bye. Speak to you soon.
[JOHN]: Bye.