- [Instructor] In a previous video, we began to think about how we can use a regression line, and in particular the slope of a regression line based on sample data, in order to make inference about the slope of the true population regression line. In this video, we're going to think about the conditions for inference when we're dealing with regression lines. These are, in some ways, similar to the conditions for inference that we thought about when we were doing hypothesis testing and confidence intervals for means and for proportions, but there are also going to be a few new conditions.

To help us remember these conditions, you might want to think about the LINER acronym, L-I-N-E-R. And if it isn't obvious to you, this almost is "linear": LINER, if it had an A, would be "linear." That's valuable because, remember, we're thinking about linear regression. So the L right over here actually does stand for linear, and the condition is that the actual relationship in the population between your x and y variables really is a linear relationship.
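In practice, the linear condition is often checked informally with a residual plot: fit the least-squares line and look for a curved pattern in the residuals. Here is a minimal sketch of that idea in Python; the data and numbers are hypothetical, invented for illustration and not from the video.

```python
import numpy as np

# Hypothetical sample data: truly linear relationship plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 1.5 * x + rng.normal(0, 1, size=x.size)

# Fit the least-squares regression line y-hat = a + b*x.
# np.polyfit returns coefficients highest degree first: [slope, intercept].
b, a = np.polyfit(x, y, deg=1)
residuals = y - (a + b * x)

# If the true relationship is linear, the residuals scatter around zero
# with no systematic curved pattern as x increases.
print(round(b, 2), round(float(residuals.mean()), 6))
```

A curved, U-shaped residual pattern would suggest the underlying relationship is nonlinear and the condition is not met.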
Now, in a lot of cases, you might just have to assume that this is the case when you see it on an exam, like an AP exam, for example. They might say, hey, assume this condition is met; oftentimes, it'll say assume all of these conditions are met. They just want you to know about these conditions. But this is something to think about: if the underlying relationship is nonlinear, then some of your inferences might not be as robust.

Now, the next one is one we have seen before when we were talking about general conditions for inference, and this is the independence condition. There are a couple of ways to think about it. Either individual observations are independent of each other, so you could be sampling with replacement, or you could be thinking about your 10% rule, which we used for the independence condition for proportions and for means, where we would need to feel confident that the size of our sample is no more than 10% of the size of the population.

Now, the next one is the normal condition, which we have talked about when we were doing inference for proportions and for means.
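The 10% rule is just an arithmetic comparison of sample size to population size. As a tiny sketch, with made-up numbers:

```python
def meets_ten_percent_rule(sample_size: int, population_size: int) -> bool:
    """Independence condition: sample is at most 10% of the population."""
    return sample_size <= 0.10 * population_size

# Hypothetical numbers: sampling 40 homes from a town of 500 homes.
print(meets_ten_percent_rule(40, 500))   # True  (40 <= 50)
print(meets_ten_percent_rule(80, 500))   # False (80 > 50)
```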
It means something a little bit more sophisticated, though, when we're dealing with a regression. For the normal condition, once again, many times people just say assume it's been met. But let me actually draw a regression line, with a little perspective, adding a third dimension. Let's say that's the x-axis, and this is the y-axis, and the true population regression line looks like this. The normal condition tells us that, for any given x in the true population, the distribution of y's that you would expect is normal. So let me draw a normal distribution for the y's, given that x; that would be that normal distribution there. And for this x right over here, you would expect a normal distribution as well, just like this. So if we're given x, the distribution of y's should be normal. Once again, many times you'll just be told to assume that has been met because, at least in an introductory statistics class, it might be a little bit hard to figure this out on your own.

Now, the next condition is related to that, and this is the idea of having equal variance.
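The picture being described corresponds to the model y = alpha + beta * x + epsilon, where epsilon is normal with mean 0. A quick simulation shows what "the y's at a fixed x are normal" means; all the parameter values here are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical population regression line y = 2.0 + 1.5*x,
# with normal errors of standard deviation sigma = 1.0.
alpha, beta, sigma = 2.0, 1.5, 1.0
rng = np.random.default_rng(42)

# Fix a single x and simulate many y's at that x.
x_fixed = 4.0
ys = alpha + beta * x_fixed + rng.normal(0, sigma, size=100_000)

# The simulated y's are centered on the line's value at x_fixed
# (alpha + beta*x_fixed = 8.0) with spread sigma, and normal in shape.
print(round(float(ys.mean()), 2), round(float(ys.std()), 2))
```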
And that's just saying that each of these normal distributions should have the same spread for a given x. So you could say equal variance, or you could even think about them having equal standard deviation. For example, if, for this x, all of a sudden you had a much lower variance, then you would no longer meet your conditions for inference.

Last, but not least, and this is one we've seen many times, is the random condition: that the data comes from a well-designed random sample or some type of randomized experiment. This condition we have seen in every type of inference that we have looked at so far.

So I'll leave you there. It's good to know; it will show up on some exams. But many times, when it comes to problem solving in an introductory statistics class, they will tell you, hey, just assume the conditions for inference have been met, or ask what the conditions for inference are. They're not going to actually make you prove, for example, the normal or the equal variance condition. That might be a bit much for an introductory statistics class.
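An informal check of the equal-variance condition is to compare the spread of the residuals at the low end and high end of x; roughly equal spreads are consistent with the condition. A minimal sketch, again with hypothetical data generated to satisfy the condition:

```python
import numpy as np

# Hypothetical data with constant error spread (sigma = 1.0 at every x).
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 400)
y = 2.0 + 1.5 * x + rng.normal(0, 1.0, size=x.size)

# Fit the least-squares line and compute residuals.
b, a = np.polyfit(x, y, deg=1)
resid = y - (a + b * x)

# Compare residual spread for small x versus large x.
low = float(resid[x < 5].std())    # spread of residuals at low x
high = float(resid[x >= 5].std())  # spread of residuals at high x
print(round(low, 2), round(high, 2))
```

If the residual spread fanned out (much larger at one end of x than the other), the equal-variance condition would be violated.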