- [Instructor] In a previous video, we began to think about how we can use a regression line, and in particular the slope of a regression line based on sample data, in order to make inference about the slope of the true population regression line. In this video, we're going to think about the conditions for inference when we're dealing with regression lines. These are, in some ways, similar to the conditions for inference that we thought about when we were doing hypothesis testing and confidence intervals for means and for proportions, but there are also going to be a few new conditions.

To help us remember these conditions, you might want to think about the LINER acronym, L-I-N-E-R. And if it isn't obvious to you, this almost is "linear": LINER, if it had an A, would be "linear." That's valuable because, remember, we're thinking about linear regression. So the L right over here actually does stand for linear, and the condition is that the actual relationship in the population between your x and y variables really is a linear relationship.
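In practice, the linear condition is often checked informally with a residual plot: fit the least-squares line and look for a curved pattern in the residuals. Here is a minimal sketch of that idea in Python; the data and numbers are hypothetical, invented for illustration and not from the video.

```python
import numpy as np

# Hypothetical sample data: truly linear relationship plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 1.5 * x + rng.normal(0, 1, size=x.size)

# Fit the least-squares regression line y-hat = a + b*x.
# np.polyfit returns coefficients highest degree first: [slope, intercept].
b, a = np.polyfit(x, y, deg=1)
residuals = y - (a + b * x)

# If the true relationship is linear, the residuals scatter around zero
# with no systematic curved pattern as x increases.
print(round(b, 2), round(float(residuals.mean()), 6))
```

A curved, U-shaped residual pattern would suggest the underlying relationship is nonlinear and the condition is not met.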
Now, in a lot of cases, you might just have to assume that this is the case when you see it on an exam, like an AP exam, for example. They might say, hey, assume this condition is met; oftentimes, it'll say assume all of these conditions are met. They just want you to know about these conditions. But this is something to think about: if the underlying relationship is nonlinear, then some of your inferences might not be as robust.

Now, the next one is one we have seen before when we were talking about general conditions for inference, and this is the independence condition. There are a couple of ways to think about it. Either individual observations are independent of each other, so you could be sampling with replacement, or you could be thinking about your 10% rule, which we used for the independence condition for proportions and for means, where we would need to feel confident that the size of our sample is no more than 10% of the size of the population.

Now, the next one is the normal condition, which we have talked about when we were doing inference for proportions and for means.
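The 10% rule is just an arithmetic comparison of sample size to population size. As a tiny sketch, with made-up numbers:

```python
def meets_ten_percent_rule(sample_size: int, population_size: int) -> bool:
    """Independence condition: sample is at most 10% of the population."""
    return sample_size <= 0.10 * population_size

# Hypothetical numbers: sampling 40 homes from a town of 500 homes.
print(meets_ten_percent_rule(40, 500))   # True  (40 <= 50)
print(meets_ten_percent_rule(80, 500))   # False (80 > 50)
```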
It means something a little bit more sophisticated, though, when we're dealing with a regression. For the normal condition, once again, many times people just say assume it's been met. But let me actually draw a regression line, with a little perspective, adding a third dimension. Let's say that's the x-axis, and this is the y-axis, and the true population regression line looks like this. The normal condition tells us that, for any given x in the true population, the distribution of y's that you would expect is normal. So let me draw a normal distribution for the y's, given that x; that would be that normal distribution there. And for this x right over here, you would expect a normal distribution as well, just like this. So if we're given x, the distribution of y's should be normal. Once again, many times you'll just be told to assume that has been met because, at least in an introductory statistics class, it might be a little bit hard to figure this out on your own.

Now, the next condition is related to that, and this is the idea of having equal variance.
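The picture being described corresponds to the model y = alpha + beta * x + epsilon, where epsilon is normal with mean 0. A quick simulation shows what "the y's at a fixed x are normal" means; all the parameter values here are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical population regression line y = 2.0 + 1.5*x,
# with normal errors of standard deviation sigma = 1.0.
alpha, beta, sigma = 2.0, 1.5, 1.0
rng = np.random.default_rng(42)

# Fix a single x and simulate many y's at that x.
x_fixed = 4.0
ys = alpha + beta * x_fixed + rng.normal(0, sigma, size=100_000)

# The simulated y's are centered on the line's value at x_fixed
# (alpha + beta*x_fixed = 8.0) with spread sigma, and normal in shape.
print(round(float(ys.mean()), 2), round(float(ys.std()), 2))
```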
And that's just saying that each of these normal distributions should have the same spread for a given x. So you could say equal variance, or you could even think about them having equal standard deviation. For example, if, for this x, all of a sudden you had a much lower variance, then you would no longer meet your conditions for inference.

Last, but not least, and this is one we've seen many times, is the random condition: that the data comes from a well-designed random sample or some type of randomized experiment. This condition we have seen in every type of inference that we have looked at so far.

So I'll leave you there. It's good to know; it will show up on some exams. But many times, when it comes to problem solving in an introductory statistics class, they will tell you, hey, just assume the conditions for inference have been met, or ask what the conditions for inference are. They're not going to actually make you prove, for example, the normal or the equal variance condition. That might be a bit much for an introductory statistics class.
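An informal check of the equal-variance condition is to compare the spread of the residuals at the low end and high end of x; roughly equal spreads are consistent with the condition. A minimal sketch, again with hypothetical data generated to satisfy the condition:

```python
import numpy as np

# Hypothetical data with constant error spread (sigma = 1.0 at every x).
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 400)
y = 2.0 + 1.5 * x + rng.normal(0, 1.0, size=x.size)

# Fit the least-squares line and compute residuals.
b, a = np.polyfit(x, y, deg=1)
resid = y - (a + b * x)

# Compare residual spread for small x versus large x.
low = float(resid[x < 5].std())    # spread of residuals at low x
high = float(resid[x >= 5].std())  # spread of residuals at high x
print(round(low, 2), round(high, 2))
```

If the residual spread fanned out (much larger at one end of x than the other), the equal-variance condition would be violated.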