WEBVTT 00:00:00.180 --> 00:00:01.940 - [Instructor] In a previous video, we began to think about 00:00:01.940 --> 00:00:04.720 how we can use a regression line and, in particular, 00:00:04.720 --> 00:00:08.090 the slope of a regression line based on sample data, 00:00:08.090 --> 00:00:10.910 how we can use that in order to make inference 00:00:10.910 --> 00:00:15.700 about the slope of the true population regression line. 00:00:15.700 --> 00:00:17.960 In this video, what we're going to think about, 00:00:17.960 --> 00:00:20.260 what are the conditions for inference 00:00:20.260 --> 00:00:22.610 when we're dealing with regression lines? 00:00:22.610 --> 00:00:24.900 And these are going to be, in some ways, 00:00:24.900 --> 00:00:27.280 similar to the conditions for inference 00:00:27.280 --> 00:00:30.320 that we thought about when we were doing hypothesis testing 00:00:30.320 --> 00:00:33.920 and confidence intervals for means and for proportions, 00:00:33.920 --> 00:00:36.890 but there's also going to be a few new conditions. 00:00:36.890 --> 00:00:39.860 So to help us remember these conditions, 00:00:39.860 --> 00:00:44.860 you might want to think about the LINER acronym, L-I-N-E-R. 00:00:46.950 --> 00:00:50.500 And if it isn't obvious to you, this almost is linear. 00:00:50.500 --> 00:00:53.040 Liner, if it had an A, it would be linear. 00:00:53.040 --> 00:00:54.670 And this is valuable because, remember, 00:00:54.670 --> 00:00:57.140 we're thinking about linear regression. 00:00:57.140 --> 00:01:01.240 So the L right over here actually does stand for linear. 
00:01:01.240 --> 00:01:05.000 And here, the condition is, is that the actual relationship 00:01:05.000 --> 00:01:08.620 in the population between your x and y variables 00:01:08.620 --> 00:01:11.290 actually is a linear relationship, 00:01:11.290 --> 00:01:12.710 so actual 00:01:13.690 --> 00:01:14.750 linear 00:01:15.670 --> 00:01:16.853 relationship, 00:01:18.360 --> 00:01:19.310 relationship 00:01:20.230 --> 00:01:21.690 between, 00:01:21.690 --> 00:01:23.950 between x 00:01:23.950 --> 00:01:25.910 and y. 00:01:25.910 --> 00:01:28.920 Now, in a lot of cases, you might just have to assume 00:01:28.920 --> 00:01:31.270 that this is going to be the case when you see it on 00:01:31.270 --> 00:01:33.950 an exam, like an AP exam, for example. 00:01:33.950 --> 00:01:36.400 They might say, hey, assume this condition is met. 00:01:36.400 --> 00:01:37.720 Oftentimes, it'll say assume all 00:01:37.720 --> 00:01:38.600 of these conditions are met. 00:01:38.600 --> 00:01:41.100 They just want you to maybe know about these conditions. 00:01:41.100 --> 00:01:42.810 But this is something to think about. 00:01:42.810 --> 00:01:45.660 If the underlying relationship is nonlinear, 00:01:45.660 --> 00:01:47.250 well, then maybe some of your 00:01:47.250 --> 00:01:50.150 inferences might not be as robust. 00:01:50.150 --> 00:01:53.290 Now, the next one is one we have seen before 00:01:53.290 --> 00:01:55.560 when we're talking about general conditions for inference, 00:01:55.560 --> 00:01:57.530 and this is the independence, 00:01:57.530 --> 00:01:59.960 independence condition. 00:01:59.960 --> 00:02:01.980 And there's a couple of ways to think about it. 00:02:01.980 --> 00:02:04.070 Either individual observations 00:02:04.070 --> 00:02:05.830 are independent of each other. 00:02:05.830 --> 00:02:09.180 So you could be sampling with replacement. 
00:02:09.180 --> 00:02:11.910 Or you could be thinking about your 10% rule, 00:02:11.910 --> 00:02:13.430 that we have done when we thought about 00:02:13.430 --> 00:02:18.200 the independence condition for proportions and for means, 00:02:18.200 --> 00:02:20.010 where we would need to feel confident 00:02:20.010 --> 00:02:23.710 that the size of our sample is no more than 10% 00:02:23.710 --> 00:02:26.070 of the size of the population. 00:02:26.070 --> 00:02:28.140 Now, the next one is the normal condition, 00:02:28.140 --> 00:02:30.230 which we have talked about when we were doing inference 00:02:30.230 --> 00:02:32.610 for proportions and for means. 00:02:32.610 --> 00:02:35.170 Although, it means something a little bit more sophisticated 00:02:35.170 --> 00:02:37.580 when we're dealing with a regression. 00:02:37.580 --> 00:02:39.590 The normal condition, and, once again, 00:02:39.590 --> 00:02:42.160 many times people just say assume it's been met. 00:02:42.160 --> 00:02:43.820 But let me actually draw a regression line, 00:02:43.820 --> 00:02:44.880 but do it with a little perspective, 00:02:44.880 --> 00:02:46.670 and I'm gonna add a third dimension. 00:02:46.670 --> 00:02:48.410 Let's say that's the x-axis, 00:02:48.410 --> 00:02:50.500 and let's say this is the y-axis. 00:02:50.500 --> 00:02:54.810 And the true population regression line looks like this. 00:02:54.810 --> 00:02:57.270 And so the normal condition tells us 00:02:57.270 --> 00:03:00.033 that, for any given x in the true population, 00:03:00.870 --> 00:03:05.770 the distribution of y's that you would expect is normal, 00:03:05.770 --> 00:03:06.603 is normal. 00:03:06.603 --> 00:03:08.810 So let me see if I can draw a normal distribution 00:03:08.810 --> 00:03:10.910 for the y's, 00:03:10.910 --> 00:03:11.870 given that x. 00:03:11.870 --> 00:03:13.990 So that would be that normal distribution there. 
00:03:13.990 --> 00:03:16.860 And then let's say, for this x right over here, 00:03:16.860 --> 00:03:21.300 you would expect a normal distribution as well, 00:03:21.300 --> 00:03:23.460 so just like, 00:03:23.460 --> 00:03:24.530 just like this. 00:03:24.530 --> 00:03:25.380 So if we're given x, 00:03:25.380 --> 00:03:27.760 the distribution of y's should be normal. 00:03:27.760 --> 00:03:29.750 Once again, many times you'll just be 00:03:29.750 --> 00:03:32.470 told to assume that that has been met because it might, 00:03:32.470 --> 00:03:34.390 at least in an introductory statistics class, 00:03:34.390 --> 00:03:36.970 be a little bit hard to figure this out on your own. 00:03:36.970 --> 00:03:38.810 Now, the next condition is related to that, 00:03:38.810 --> 00:03:42.790 and this is the idea of having equal variance, 00:03:42.790 --> 00:03:45.090 equal variance. 00:03:45.090 --> 00:03:46.390 And that's just saying that each 00:03:46.390 --> 00:03:48.670 of these normal distributions should have 00:03:48.670 --> 00:03:51.250 the same spread for a given x. 00:03:51.250 --> 00:03:52.870 And so you could say equal variance, 00:03:52.870 --> 00:03:54.520 or you could even think about them having 00:03:54.520 --> 00:03:56.360 the same standard deviation. 00:03:56.360 --> 00:03:59.880 So, for example, if, for a given x, let's say for this x, 00:03:59.880 --> 00:04:02.580 all of a sudden, you had a much lower variance, 00:04:02.580 --> 00:04:03.620 made it look like this, 00:04:03.620 --> 00:04:06.890 then you would no longer meet your conditions for inference. 00:04:06.890 --> 00:04:10.430 Last, but not least, and this is one we've seen many times, 00:04:10.430 --> 00:04:12.300 this is the random condition. 00:04:12.300 --> 00:04:14.600 And this is that the data comes from 00:04:14.600 --> 00:04:17.170 a well-designed random sample or 00:04:17.170 --> 00:04:19.200 some type of randomized experiment.
00:04:19.200 --> 00:04:23.040 And this condition we have seen in every type of condition 00:04:23.040 --> 00:04:25.760 for inference that we have looked at so far. 00:04:25.760 --> 00:04:27.140 So I'll leave you there. 00:04:27.140 --> 00:04:28.270 It's good to know. 00:04:28.270 --> 00:04:30.470 It will show up on some exams. 00:04:30.470 --> 00:04:32.960 But many times, when it comes to problem solving, 00:04:32.960 --> 00:04:36.130 in an introductory statistics class, they will tell you, 00:04:36.130 --> 00:04:38.720 hey, just assume the conditions for inference have been met. 00:04:38.720 --> 00:04:40.910 Or what are the conditions for inference? 00:04:40.910 --> 00:04:42.970 But they're not going to actually make you prove, 00:04:42.970 --> 00:04:46.010 for example, the normal or the equal variance condition. 00:04:46.010 --> 00:04:47.040 That might be a bit much 00:04:47.040 --> 00:04:49.763 for an introductory statistics class.
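If you'd like to see what a couple of these conditions look like on actual numbers, here is a minimal Python sketch. It is not something from the video, and everything in it is an assumption made up for illustration: the simulated data, the helper names (`fit_line`, `residuals`, `spread_ratio`, `ten_percent_ok`), and the idea of comparing residual spread in the lower and upper halves of x as a rough, informal stand-in for the equal variance condition. It does not prove that any condition holds. It just shows what a violation might look like.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(xs, ys):
    """Observed y minus the y predicted by the fitted sample regression line."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def spread_ratio(xs, ys):
    """Residual std dev for the larger-x half divided by that for the smaller-x half.

    A value near 1 is consistent with equal variance; a value far from 1
    hints that the spread of y changes with x. This is an informal check only.
    """
    pairs = sorted(zip(xs, residuals(xs, ys)))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]
    return statistics.stdev(high) / statistics.stdev(low)


def ten_percent_ok(sample_size, population_size):
    """10% rule: the sample is no more than 10% of the population."""
    return 10 * sample_size <= population_size


# Hypothetical data: a true line y = 2 + 3x with normal noise around it.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(400)]
equal_spread = [2 + 3 * x + rng.gauss(0, 1) for x in xs]            # same spread at every x
fanning_out = [2 + 3 * x + rng.gauss(0, 0.2 + 0.4 * x) for x in xs]  # spread grows with x

print("spread ratio, equal variance data:", round(spread_ratio(xs, equal_spread), 2))
print("spread ratio, fanning-out data:   ", round(spread_ratio(xs, fanning_out), 2))
print("10% rule, n=50 from N=1000:", ten_percent_ok(50, 1000))
```

In the first data set the noise has the same standard deviation at every x, so the ratio should land near 1. In the second, the noise grows with x, so the ratio should come out well above 1, which is the kind of pattern that would make you doubt the equal variance condition. In practice, people usually look at a residual plot for this rather than a single number.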