0:00:01.667,0:00:04.142
Hi. In this video we are going to continue

0:00:04.142,0:00:05.759
to talk about inference.

0:00:05.759,0:00:09.446
But now we're going to be talking about[br]how you can conduct hypothesis

0:00:09.446,0:00:12.427
tests in R.

0:00:12.427,0:00:15.660
So the general hypothesis[br]testing procedure

0:00:15.761,0:00:19.651
is we always state hypotheses[br]about your parameter.

0:00:20.308,0:00:23.288
We collect some data.

0:00:24.147,0:00:27.128
We construct a test statistic.

0:00:29.249,0:00:33.594
We then apply a decision[br]rule so we can either

0:00:34.604,0:00:37.585
do that through a critical value

0:00:37.736,0:00:41.222
or with p-values or like a[br]critical region, excuse me.

0:00:41.222,0:00:43.849
Or with p-values.

0:00:43.849,0:00:45.668
And then we will draw

0:00:45.668,0:00:48.648
conclusions in context.

0:00:49.254,0:00:52.689
So the first research question[br]we're going to talk about

0:00:52.689,0:00:56.983
today is we're going to continue[br]using the idea of iris flowers.

0:00:57.438,0:00:59.358
And say we're interested in one.

0:00:59.358,0:01:03.601
And try to hypothesize[br]that we think that the average

0:01:03.955,0:01:07.188
petal length for iris flowers

0:01:08.350,0:01:11.330
is four centimeters. So,

0:01:12.543,0:01:15.523
our null hypothesis would be

0:01:16.938,0:01:19.514
that average,

0:01:19.514,0:01:22.798
petal length is equal to four centimeters.

0:01:23.101,0:01:27.092
And our alternative will be average

0:01:27.294,0:01:31.487
petal length is not equal[br]to four centimeters.

0:01:32.245,0:01:33.962
Okay.

0:01:33.962,0:01:35.630
The data we are going to use

0:01:35.630,0:01:38.913
is the iris petal length data.

0:01:39.115,0:01:41.136
So it's from the iris dataset.

0:01:41.136,0:01:43.763
And this is the Petal.Length variable.

0:01:43.763,0:01:47.400
Just to kind of remind us, it is just 150

0:01:47.400,0:01:50.381
observations of different irises.

0:01:51.644,0:01:54.624
To construct our test statistic

0:01:55.129,0:01:58.716
we will first need an xbar value,

0:02:00.030,0:02:02.202
which we can find by taking

0:02:02.202,0:02:05.182
the mean of our sample.

0:02:06.344,0:02:09.375
So the mean of the iris petal length.

0:02:11.901,0:02:14.781
Which will be 3.758.

0:02:14.781,0:02:19.327
We also are going to need[br]the hypothesized value

0:02:19.327,0:02:24.076
that we are wanting to hypothesize,[br]which is four centimeters.

0:02:24.076,0:02:28.673
So I'm going to just call that mu[br]because that's the parameter of interest.

0:02:28.673,0:02:30.579
We're going to say it's equal to four.

0:02:31.906,0:02:33.119
We also need to know

0:02:33.119,0:02:36.756
the sample standard deviation, s.

0:02:37.110,0:02:40.090
And so you can get that by[br]doing this standard deviation

0:02:40.343,0:02:43.323
of the variable.

0:02:43.677,0:02:46.657
That value is 1.765.

0:02:47.112,0:02:49.789
And then we also need to know[br]the number of observations.

0:02:49.789,0:02:53.023
So, n. So we will use[br]the length function.

0:02:53.427,0:02:57.115
And that'll count how many[br]observations are in your data set

0:02:57.115,0:03:00.146
which is 150.
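
For reference, the four pieces described so far could be typed into R roughly like this. The exact code on screen is not part of the transcript, so this is only a sketch; the names `xbar`, `mu`, `s`, and `n` follow the narration.

```r
# iris ships with base R (datasets package), so nothing needs to be loaded.
xbar <- mean(iris$Petal.Length)    # sample mean, about 3.758
mu   <- 4                          # hypothesized mean from the null hypothesis
s    <- sd(iris$Petal.Length)      # sample standard deviation, about 1.765
n    <- length(iris$Petal.Length)  # number of observations, 150

# Show the data-derived pieces together
c(xbar = xbar, s = s, n = n)
```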

0:03:00.449,0:03:02.419
Now once we have all of those

0:03:02.419,0:03:06.107
individual pieces we can[br]build the test statistic.

0:03:06.359,0:03:10.805
Since we are doing a hypothesis test[br]for a mean, we will be constructing

0:03:10.805,0:03:15.907
what is known as a t-statistic, a[br]test statistic for a t-distribution.

0:03:16.816,0:03:19.797
So I'm going to call it t-test stat.

0:03:21.060,0:03:22.575
And how we create

0:03:22.575,0:03:26.617
that is we do xbar minus[br]mu in the numerator

0:03:27.476,0:03:31.820
divided by I'm just gonna[br]put this in parentheses

0:03:31.820,0:03:36.821
as well. S divided by[br]the square root of n.

0:03:37.529,0:03:41.014
So thankfully we have all of these pieces[br]already xbar, mu,

0:03:41.014,0:03:44.803
s, and n. S, n, and xbar[br]all come from the data.

0:03:45.409,0:03:48.845
Mu is the value we specified[br]in our null hypothesis.

0:03:49.350,0:03:52.532
And this will compute[br]our test statistic for us,

0:03:53.997,0:03:54.907
which is

0:03:54.907,0:04:00.918
-1.67897. So.
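
The formula just described is t = (xbar - mu) / (s / sqrt(n)). A minimal sketch of that calculation, assuming the variable is named `t_test_stat` (the narration only says "t-test stat", so the exact name is a guess):

```r
x    <- iris$Petal.Length
xbar <- mean(x)    # sample mean
mu   <- 4          # value from the null hypothesis
s    <- sd(x)      # sample standard deviation
n    <- length(x)  # sample size

# Numerator: distance between the sample mean and the hypothesized mean.
# Denominator: standard error of the mean, in its own parentheses.
t_test_stat <- (xbar - mu) / (s / sqrt(n))
t_test_stat  # about -1.67897
```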

0:04:05.516,0:04:07.435
Here we go.

0:04:07.435,0:04:09.203
So our next step

0:04:09.203,0:04:12.184
is to apply a decision rule.

0:04:16.175,0:04:19.155
So we have two different [br]ways we can do that. We'll-

0:04:19.762,0:04:21.530
We will use a significance level

0:04:21.530,0:04:25.015
or an alpha of 0.05.

0:04:25.369,0:04:28.400
So I'm just going to go[br]ahead and set that.

0:04:31.633,0:04:32.694
And then if we want to

0:04:32.694,0:04:35.675
calculate a rejection region,

0:04:35.686,0:04:38.706
because there's two different[br]kinds of decision rules we can do.

0:04:38.706,0:04:40.120
Rejection region.

0:04:40.120,0:04:43.101
We can find which critical value

0:04:43.202,0:04:47.496
will give us a tail probability of 0.0 uh-

0:04:48.456,0:04:50.880
Or since we're doing a[br]two sided hypothesis test,

0:04:50.880,0:04:53.911
we'll do our alpha divided by two.

0:04:54.065,0:04:55.225
I'll kind of show you.

0:04:55.225,0:04:58.256
So our rejection region

0:04:58.559,0:05:01.893
is we're going to try, we're [br]going to find the critical value

0:05:01.893,0:05:07.198
that, fits the t-distribution,[br]where the probability in the tail

0:05:08.612,0:05:11.037
is equal to alpha over two.

0:05:11.037,0:05:14.725
Because we're doing a two-sided[br]hypothesis test.

0:05:15.533,0:05:19.878
Our degrees of freedom is needed[br]for the t-test, which is n minus one.

0:05:20.837,0:05:27.152
And since our test[br]statistic is a negative value,

0:05:27.405,0:05:30.537
meaning that it's on the left side of the,

0:05:31.295,0:05:34.225
of the mean on the curve,

0:05:34.225,0:05:37.710
we will go ahead and say[br]lower.tail equals true.

0:05:38.165,0:05:43.116
Because we want the lower tail,[br]or the smaller tail-end probability.

0:05:43.469,0:05:48.319
If this were a positive number, 1.67,[br]we would then do lower.tail

0:05:48.370,0:05:51.906
equals false because we [br]want the upper tail.

0:05:53.118,0:05:56.048
We want kind of the extremes.

0:05:56.048,0:05:59.938
So anything from where our test[br]statistic is and more extreme.

0:06:00.999,0:06:03.121
So what this will tell us

0:06:03.121,0:06:06.152
is our, oh, alpha not found.

0:06:06.152,0:06:08.627
I forgot to run that line. There we go.

0:06:10.650,0:06:11.406
Okay.

0:06:11.406,0:06:17.367
So our rejection value is -1.976.

0:06:18.074,0:06:21.964
So what this is telling us[br]is that if our test statistic

0:06:21.964,0:06:26.611
is equal to -1.976 or less,

0:06:27.470,0:06:32.876
or if it's greater than positive 1.976,

0:06:33.381,0:06:36.412
then we will reject our null hypothesis.

0:06:39.443,0:06:43.282
And in this case, since our test

0:06:43.282,0:06:48.284
statistic is not in the extreme, it's[br]actually greater than this value,

0:06:48.637,0:06:51.618
we will fail to reject[br]our null hypothesis.

0:06:51.618,0:06:54.598
So this is telling us that,

0:06:55.558,0:06:56.973
we will fail to reject

0:06:56.973,0:07:00.812
our null, meaning that we do not[br]have enough evidence to conclude

0:07:00.963,0:07:04.803
that the average petal length[br]is not equal to four centimeters.
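
The rejection-region rule above can be sketched with `qt()`, the quantile function for the t-distribution. The names `crit` and `reject` are made up for this illustration and are not from the video.

```r
x <- iris$Petal.Length
n <- length(x)
t_test_stat <- (mean(x) - 4) / (sd(x) / sqrt(n))

alpha <- 0.05  # significance level

# Lower critical value: the point with alpha / 2 probability in the left tail.
# The upper critical value is the same number with a positive sign.
crit <- qt(alpha / 2, df = n - 1, lower.tail = TRUE)
crit  # about -1.976

# Two-sided rule: reject if the statistic is at least as extreme as the
# critical value in either direction.
reject <- abs(t_test_stat) >= abs(crit)
reject  # FALSE here, so we fail to reject the null hypothesis
```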

0:07:06.015,0:07:09.754
The other way you can apply a[br]decision rule is with a p-value.

0:07:10.915,0:07:13.340
And since we are doing a

0:07:13.340,0:07:15.462
two sided hypothesis test,

0:07:15.462,0:07:19.402
we will, can do two times[br]whatever probability

0:07:19.402,0:07:22.434
we get because we're going[br]to be calculating it for one tail.

0:07:22.434,0:07:25.439
But since we're doing two sided[br]we'll just need to multiply it

0:07:26.020,0:07:26.941
by two.

0:07:26.941,0:07:29.809
And so what we're going to put[br]in here is we're going to put

0:07:29.809,0:07:33.749
in our test statistic that we get.

0:07:35.316,0:07:38.296
The degrees of freedom again

0:07:38.296,0:07:40.468
and again we're going to do lower.tail

0:07:40.468,0:07:44.308
equals true because our[br]original test statistic is negative.

0:07:44.308,0:07:48.097
So we want a lower tail[br]like the extreme value.

0:07:49.460,0:07:53.335
And then we're going to multiply by two[br]again because we are doing a two sided

0:07:53.704,0:07:56.735
p-va- two sided hypothesis test.

0:07:57.139,0:07:59.918
And then this is the value[br]that we compare to

0:07:59.918,0:08:02.898
our alpha, which is 0.05.

0:08:02.999,0:08:06.485
So if our p-value is less than the alpha

0:08:06.738,0:08:10.829
less than 0.05, we would[br]reject the null hypothesis.

0:08:11.032,0:08:14.416
In this case our p-value [br]is greater than 0.05.

0:08:14.820,0:08:18.357
So we would fail to reject our null[br]hypothesis again as well.

0:08:19.316,0:08:22.348
You should get the same conclusion.

0:08:23.004,0:08:25.581
With either method, you should be

0:08:25.581,0:08:29.117
coming to the same[br]reject or fail to reject.

0:08:29.117,0:08:32.198
You should not be getting[br]different conclusions.
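
A sketch of the p-value approach, using `pt()` for the tail probability. The name `p_value` is an assumption; the code typed in the video is not in the transcript.

```r
x <- iris$Petal.Length
n <- length(x)
t_test_stat <- (mean(x) - 4) / (sd(x) / sqrt(n))
alpha <- 0.05

# One-tail probability at or below the (negative) statistic, doubled because
# the test is two-sided.
p_value <- 2 * pt(t_test_stat, df = n - 1, lower.tail = TRUE)
p_value

# Decision: reject only if the p-value is below alpha.
p_value < alpha  # FALSE here, matching the rejection-region conclusion
```

If you would rather not pick the tail by hand, `2 * pt(-abs(t_test_stat), df = n - 1)` gives the same two-sided p-value whatever the sign of the statistic.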

0:08:35.081,0:08:36.491
So that's how you can kind of

0:08:36.491,0:08:39.321
compute a hypothesis test by hand.

0:08:39.726,0:08:43.767
But as always, usually in R[br]there is an easier way to do it.

0:08:44.424,0:08:47.253
So there is a function t.test

0:08:47.253,0:08:50.233
which may be familiar from when we did

0:08:50.435,0:08:52.406
confidence intervals for means.

0:08:52.406,0:08:54.123
And this is actually you can

0:08:54.123,0:08:57.104
do confidence intervals plus[br]hypothesis testing in here.

0:08:57.710,0:09:01.044
So we still are going to[br]have the same null

0:09:01.044,0:09:04.075
and alternative[br]hypotheses from up here.

0:09:04.530,0:09:07.763
And so what we're going to do[br]is we're going to just say t.test,

0:09:09.177,0:09:12.208
give it the data that we[br]are doing the t-test on,

0:09:13.370,0:09:16.351
which is the petal length of iris flowers.

0:09:17.058,0:09:20.089
We need to specify what our

0:09:20.190,0:09:22.767
null hypothesis mu value is.

0:09:22.767,0:09:27.414
We're saying that we are[br]hypothesizing that the true, average

0:09:27.414,0:09:29.334
petal length is four.

0:09:29.334,0:09:32.315
So we will say mu is equal to four.

0:09:32.567,0:09:35.548
And then we also need to specify that our,

0:09:35.548,0:09:39.235
our, that our alternative hypothesis is a

0:09:39.539,0:09:42.519
two sided hypothesis test.

0:09:43.378,0:09:45.747
Okay.

0:09:45.747,0:09:47.672
And if we go ahead and run that.

0:09:47.672,0:09:51.764
And notice it shows it is a one[br]sample t-test which is perfect.

0:09:51.814,0:09:53.936
We have one sample and a t-test.

0:09:53.936,0:09:57.725
It gives us a t which[br]is our test statistic

0:09:58.129,0:10:00.857
which should match what we got up here.

0:10:00.857,0:10:02.878
And it does.

0:10:02.878,0:10:05.202
The degrees of freedom is pretty easy.

0:10:05.202,0:10:08.182
150 minus one. And then here's a p-value

0:10:08.283,0:10:11.112
Same exact p-value we got[br]here by doing it by hand.

0:10:12.963,0:10:13.739
And then

0:10:13.739,0:10:16.720
you can kind of see[br]they have xbar right here.

0:10:17.477,0:10:20.963
And then it also gives you[br]that 95% confidence interval.

0:10:22.226,0:10:24.398
So this is a quick and easy way

0:10:24.398,0:10:27.379
that you can compute a t-test for a mean.
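
The one-step version described here likely looks something like the call below. `t.test()` is in base R's stats package; storing the output in `result` is just a choice made for this sketch so the individual pieces can be pulled out.

```r
# One-sample, two-sided t-test of H0: mu = 4 on iris petal length
result <- t.test(iris$Petal.Length, mu = 4, alternative = "two.sided")
result  # prints the statistic, df, p-value, confidence interval, and mean

# The same pieces that were computed by hand:
result$statistic  # t, the test statistic
result$parameter  # degrees of freedom, n - 1
result$p.value    # two-sided p-value
result$estimate   # sample mean (xbar)
result$conf.int   # 95% confidence interval for the mean
```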

0:10:28.541,0:10:31.521
This is kind of showing[br]you how to do it all by hand.

0:10:31.796,0:10:35.217
And then this will show you kind of[br]how to just do it in one simple step

0:10:35.217,0:10:37.988
by computing a p-value for you.

0:10:39.099,0:10:41.978
If you wanted to change what your,

0:10:41.978,0:10:43.544
your null hypothesis was.

0:10:43.544,0:10:48.041
So say, like you were testing, is[br]the mean equal to two instead?

0:10:48.748,0:10:50.213
You could totally do that.

0:10:50.213,0:10:54.557
And then you can see that[br]this p-value is way, way smaller.

0:10:55.669,0:10:59.407
Or if you wanted to [br]change your alternative.

0:10:59.559,0:11:03.651
So it's not that it's just not [br]equal to four and it's, you know,

0:11:03.651,0:11:07.389
maybe less or greater than. So

0:11:08.209,0:11:09.511
you could do it like this.

0:11:09.511,0:11:12.491
You can do less or

0:11:12.895,0:11:15.876
greater and that'll tell you,

0:11:17.694,0:11:19.260
which, that'll

0:11:19.260,0:11:22.999
change the output of your hypothesis test,

0:11:22.999,0:11:26.181
kind of depending on if you're[br]doing a one sided or two sided test.
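
A short sketch of those variations: changing `mu` changes the null value, and `alternative` accepts `"two.sided"`, `"less"`, or `"greater"`. The object names below are made up for this example.

```r
x <- iris$Petal.Length

# Different null value: H0: mu = 2 (two-sided is the default alternative)
p_mu2 <- t.test(x, mu = 2)$p.value
p_mu2  # far smaller than the p-value for mu = 4

# One-sided alternatives for H0: mu = 4
less    <- t.test(x, mu = 4, alternative = "less")     # Ha: mean < 4
greater <- t.test(x, mu = 4, alternative = "greater")  # Ha: mean > 4

# The test statistic is the same in every case; only the tail used for the
# p-value changes. Because the statistic is negative here, the "less"
# p-value is half the two-sided one, and the two one-sided p-values sum to 1.
less$p.value
greater$p.value
```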