Hi. In this video we are going to continue to talk about inference, but now we're going to be talking about how you can conduct hypothesis tests in R.
The general hypothesis testing procedure is always the same. We state hypotheses about the parameter. We collect some data. We construct a test statistic. We then apply a decision rule, which we can do either with a critical value (a rejection region) or with a p-value. And then we draw conclusions in context.
The first research question we're going to look at today continues with the iris flowers. This time we're working with a single sample and a single mean: we hypothesize that the average petal length of iris flowers is four centimeters. So our null hypothesis is that the average petal length is equal to four centimeters, and our alternative is that the average petal length is not equal to four centimeters.
The data we are going to use is the iris petal length data: the Petal.Length variable from the iris dataset. Just to remind us, the dataset has 150 observations of different irises.
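A minimal sketch of pulling out the variable and checking its size, assuming the built-in iris data frame; the name petal_length is my own choice, not necessarily the one used in the video.

```r
# iris ships with base R (datasets package), so nothing needs installing
data(iris)
petal_length <- iris$Petal.Length  # numeric vector, lengths in cm
str(petal_length)                  # type plus the first few values
length(petal_length)               # number of observations
```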
To construct our test statistic we first need an xbar value, which we find by taking the mean of our sample, that is, the mean of the iris petal lengths. That comes out to 3.758. We also need the value we are hypothesizing, which is four centimeters. I'm going to call that mu, because that's the parameter of interest, and set it equal to four.
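Those two pieces might look like this in R; a sketch assuming the built-in iris data, with xbar and mu as illustrative names.

```r
# Sample mean of the petal lengths (xbar)
xbar <- mean(iris$Petal.Length)
xbar   # 3.758

# Hypothesized value from the null hypothesis
mu <- 4
```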
We also need the sample standard deviation, s, which you get by taking the standard deviation of the variable with sd(). That value is 1.765.
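A one-line sketch of this step, assuming the built-in iris data; s is an illustrative name.

```r
# Sample standard deviation; sd() uses the n - 1 denominator
s <- sd(iris$Petal.Length)
round(s, 3)   # 1.765
```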
And we need the number of observations, n. For that we use the length function, which counts how many observations are in the variable: 150.
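A sketch of counting the observations, assuming the built-in iris data; nrow(iris) gives the same count here because every row has a petal length.

```r
# Number of observations: length() counts the elements of the vector
n <- length(iris$Petal.Length)
n   # 150
```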
Once we have all of those individual pieces, we can build the test statistic. Since we are doing a hypothesis test for a mean, we construct a t statistic, which follows a t-distribution when the null hypothesis is true. I'm going to call it t test stat. To create it, we put xbar minus mu in the numerator and divide by s over the square root of n, with the denominator in parentheses as well. Thankfully we already have all of these pieces: xbar, s, and n all come from the data, and mu is the value we specified in our null hypothesis.
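Written as a formula, with the sample values plugged in (rounded to three decimals, so the last digit differs slightly from the unrounded result):

```latex
t = \frac{\bar{x} - \mu}{s / \sqrt{n}}
  = \frac{3.758 - 4}{1.765 / \sqrt{150}}
  \approx -1.679,
\qquad \text{df} = n - 1 = 149
```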
Running this computes our test statistic for us, which is -1.67897.
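Putting the pieces together, a self-contained sketch of the by-hand statistic, assuming the built-in iris data; t_test_stat is my rendering of the "t test stat" name mentioned above.

```r
x    <- iris$Petal.Length
xbar <- mean(x)    # sample mean
mu   <- 4          # value under H0
s    <- sd(x)      # sample standard deviation
n    <- length(x)  # sample size

# One-sample t statistic: (xbar - mu) / (s / sqrt(n))
t_test_stat <- (xbar - mu) / (s / sqrt(n))
t_test_stat   # about -1.679
```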
Our next step is to apply a decision rule, and there are two different ways we can do that. We will use a significance level, or alpha, of 0.05, so I'm going to go ahead and set that.
The first kind of decision rule is a rejection region. We find the critical value that gives us a tail probability of alpha, or, since we're doing a two-sided hypothesis test, alpha divided by two. I'll show you. For our rejection region, we use qt() to find the critical value of the t-distribution where the probability in the tail is equal to alpha over two, because we're doing a two-sided hypothesis test.
The t-distribution needs degrees of freedom, which for this test is n minus one. And since our test statistic is negative, meaning that it's on the left side of the mean on the curve, we set lower.tail equals TRUE, because we want the lower-tail probability. If the test statistic were positive, say 1.67, we would set lower.tail equals FALSE, because we'd want the upper tail. We want the extremes: anything from where our test statistic is and more extreme.
So what this will tell us is... oh, alpha not found. I forgot to run the line that sets alpha. There we go.
Our critical value is -1.976. This tells us that if our test statistic is less than or equal to -1.976, or greater than or equal to positive 1.976, then we reject our null hypothesis.
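A sketch of the rejection-region rule, assuming the built-in iris data; crit and reject are illustrative names, and the two-sided check (t at or below crit, or at or above -crit) is written out explicitly.

```r
alpha <- 0.05
x <- iris$Petal.Length
n <- length(x)
t_test_stat <- (mean(x) - 4) / (sd(x) / sqrt(n))

# Left-tail critical value: P(T <= crit) = alpha / 2 with n - 1 df
crit <- qt(alpha / 2, df = n - 1, lower.tail = TRUE)
crit   # about -1.976

# Two-sided rejection region: t <= crit or t >= -crit
reject <- t_test_stat <= crit || t_test_stat >= -crit
reject   # FALSE: fail to reject H0
```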
In this case our test statistic is not in the extreme: -1.679 is actually greater than -1.976. So we fail to reject our null hypothesis, meaning that we do not have enough evidence to conclude that the average petal length is different from four centimeters.
The other way you can apply a decision rule is with a p-value. The probability we calculate covers only one tail, so since we are doing a two-sided hypothesis test, we multiply it by two. Into pt(), the t-distribution probability function, we put the test statistic we got and the degrees of freedom again, and again we use lower.tail equals TRUE, because our test statistic is negative and we want the lower-tail, more extreme values. Then we multiply by two because we are doing a two-sided hypothesis test. This is the value we compare to our alpha of 0.05.
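A sketch of the p-value calculation, assuming the built-in iris data. The second, sign-agnostic form using abs() is my addition rather than something shown in the video; it gives the same two-sided p-value whether the statistic is negative or positive.

```r
x <- iris$Petal.Length
n <- length(x)
t_test_stat <- (mean(x) - 4) / (sd(x) / sqrt(n))

# One-tail probability below the (negative) statistic, doubled for two sides
p_value <- 2 * pt(t_test_stat, df = n - 1, lower.tail = TRUE)
p_value

# Sign-agnostic form: works whether the statistic is negative or positive
p_value_abs <- 2 * pt(abs(t_test_stat), df = n - 1, lower.tail = FALSE)
p_value > 0.05   # TRUE: fail to reject H0 at alpha = 0.05
```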
If our p-value is less than alpha, that is, less than 0.05, we reject the null hypothesis. In this case our p-value is greater than 0.05, so we fail to reject our null hypothesis again. You should get the same conclusion with either method: both should lead to the same decision to reject or fail to reject, never to different conclusions.
So that's how you can compute a hypothesis test by hand. But as usual in R, there is an easier way to do it. There is a function, t.test, which may be familiar from when we did confidence intervals for means. It actually does confidence intervals plus hypothesis testing.
We still have the same null and alternative hypotheses from up here. We just call t.test and give it the data we're doing the t-test on, which is the petal length of the iris flowers. We need to specify the mu value from our null hypothesis: we're hypothesizing that the true average petal length is four, so we set mu equal to four. We also need to specify that our alternative hypothesis is two-sided. Then we go ahead and run that.
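The call described above might look like this, assuming the built-in iris data; result is an illustrative name. Note that "two.sided" is also t.test's default alternative, so spelling it out is optional but makes the hypothesis explicit.

```r
# One-sample, two-sided t-test of H0: mu = 4
result <- t.test(iris$Petal.Length, mu = 4, alternative = "two.sided")
result   # prints the "One Sample t-test" report
```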
Notice it says it is a One Sample t-test, which is perfect: we have one sample and a t-test. It gives us t, which is our test statistic and should match what we got up here, and it does. The degrees of freedom are easy: 150 minus one, or 149. Then here's the p-value, the same exact p-value we got by hand. You can also see xbar, reported as the mean of x, and it gives you the 95% confidence interval too.
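Each number in the printed report is also stored in the returned object, which makes it easy to check the by-hand work programmatically. A sketch, assuming the built-in iris data; the element names (statistic, parameter, p.value, estimate, conf.int) are the standard components of R's "htest" objects, and the other names are illustrative.

```r
x <- iris$Petal.Length
result <- t.test(x, mu = 4, alternative = "two.sided")

# t.test() returns an "htest" list; each printed number is a named element
result$statistic   # t
result$parameter   # df = 149
result$p.value     # two-sided p-value
result$estimate    # "mean of x", i.e. xbar = 3.758
result$conf.int    # 95% CI; it contains 4, consistent with failing to reject

# Cross-check against the by-hand calculation
t_by_hand <- (mean(x) - 4) / (sd(x) / sqrt(length(x)))
p_by_hand <- 2 * pt(-abs(t_by_hand), df = length(x) - 1)
all.equal(unname(result$statistic), t_by_hand)   # TRUE
all.equal(result$p.value, p_by_hand)             # TRUE
```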
So this is a quick and easy way to compute a t-test for a mean. The first part showed you how to do it all by hand, and this shows you how to do it in one simple step, with the p-value computed for you.
If you wanted to change your null hypothesis, say you were testing whether the mean is equal to two instead, you could totally do that, and you can see that the p-value is way, way smaller.
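A sketch comparing the two null values side by side, assuming the built-in iris data; res_mu4 and res_mu2 are illustrative names. With a sample mean of 3.758, a hypothesized mean of 2 sits many standard errors away, which is why its p-value collapses.

```r
x <- iris$Petal.Length
res_mu4 <- t.test(x, mu = 4)   # alternative defaults to "two.sided"
res_mu2 <- t.test(x, mu = 2)
c(mu4 = res_mu4$p.value, mu2 = res_mu2$p.value)
```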
Or if you wanted to change your alternative, so that instead of just not equal to four it's less than or greater than four, you can set the alternative to "less" or "greater". That changes the output of your hypothesis test depending on whether you're doing a one-sided or two-sided test.
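A sketch running all three alternatives on the same data, assuming the built-in iris data; the object names are illustrative.

```r
x <- iris$Petal.Length
two     <- t.test(x, mu = 4, alternative = "two.sided")  # Ha: mu != 4
less    <- t.test(x, mu = 4, alternative = "less")       # Ha: mu < 4
greater <- t.test(x, mu = 4, alternative = "greater")    # Ha: mu > 4
c(two.sided = two$p.value, less = less$p.value, greater = greater$p.value)
```

Because the observed t is negative here, the "less" p-value is exactly half of the two-sided one and the "greater" p-value is one minus the "less" value. With these data the "less" p-value lands just under 0.05 even though the two-sided one does not, so the choice of alternative really does change the decision. The reported confidence interval also becomes one-sided for "less" and "greater".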