[Instructor] In a previous video, we began to think about how we can use a regression line, and in particular the slope of a regression line based on sample data, to make inferences about the slope of the true population regression line. In this video, we're going to think about the conditions for inference when we're dealing with regression lines. In some ways, these are similar to the conditions for inference we used for hypothesis tests and confidence intervals for means and proportions, but there are also a few new conditions.

To help remember these conditions, you might want to think of the acronym LINER, L-I-N-E-R. In case it isn't obvious, this is almost "linear": if "liner" had an A, it would be "linear." That's a useful hook because, remember, we're thinking about linear regression. And the L does, in fact, stand for linear.
Here the condition is that the actual relationship between x and y in the population really is linear. In a lot of cases, you might just have to assume this is true when you see it on an exam, like the AP exam. The question might say, hey, assume this condition is met. Oftentimes it will say to assume all of these conditions are met; the exam just wants you to know what they are. But it's still something to think about: if the underlying relationship is nonlinear, then your inferences about the slope might not be as robust.

The next condition is one we've seen before in the general conditions for inference: the independence condition. There are a couple of ways to think about it. Either the individual observations are independent of each other, which you get, for example, by sampling with replacement.
Or you could rely on the 10% rule, which we used for the independence condition for proportions and for means: when sampling without replacement, we need to feel confident that the sample is no more than 10% of the size of the population.

The next one is the normal condition. We talked about it when doing inference for proportions and for means, although it means something a little more sophisticated when we're dealing with a regression. Once again, many times you'll just be told to assume it's been met. But let me actually draw a regression line with a little perspective by adding a third dimension. Let's say that's the x-axis, this is the y-axis, and the true population regression line looks like this. The normal condition tells us that, for any given x in the true population, the distribution of y's you would expect is normal. So let me see if I can draw a normal distribution for the y's, given that x. That would be this normal distribution here.
Then, for this x right over here, you would also expect a normal distribution, just like this one. So given x, the distribution of y's should be normal. Once again, you'll often just be told to assume this has been met, because, at least in an introductory statistics class, it can be a little hard to check on your own.

The next condition is related to that: equal variance. It says that each of these normal distributions should have the same spread, no matter which x you pick. So you could say equal variance, or equivalently equal standard deviation. For example, if for this x the distribution suddenly had a much lower variance, so that it looked like this, then you would no longer meet the conditions for inference.

Last, but not least, and this is one we've seen many times, is the random condition.
This says that the data come from a well-designed random sample or some type of randomized experiment. We have seen this condition in every set of conditions for inference we have looked at so far.

So I'll leave you there. These conditions are good to know, and they will show up on some exams. But many times, when it comes to problem solving in an introductory statistics class, you'll be told to just assume the conditions for inference have been met, or asked to state what they are. You usually won't have to actually show, for example, that the normal or equal variance condition holds. That might be a bit much for an introductory statistics class.
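To make the LINER conditions concrete, here is a minimal Python sketch built on assumptions of our own, not on anything in the video. It uses a hypothetical population in which y = 2 + 0.5x plus normal noise with standard deviation 1 at every x, so the linear, normal, and equal variance conditions hold by construction. It then draws a simple random sample of 500 without replacement (random) and applies the 10% rule (independence). It finishes with an informal equal-variance look at the residual spread at each x. The function names `simulate_population`, `ten_percent_ok`, `least_squares`, and `residual_sd_by_x`, along with every number, are hypothetical illustrations. In practice, linearity and equal variance are usually judged by eye from a residual plot rather than from a single number.

```python
import random
import statistics


def simulate_population(beta0, beta1, sigma, xs, reps, seed=0):
    """Hypothetical population built to satisfy L, N and E:
    at every x, y = beta0 + beta1 * x + noise with noise ~ Normal(0, sigma),
    so the y's at a given x are normal with the same spread for every x."""
    rng = random.Random(seed)
    return [(x, beta0 + beta1 * x + rng.gauss(0, sigma))
            for x in xs for _ in range(reps)]


def ten_percent_ok(n, population_size):
    """Independence check when sampling without replacement:
    the sample is at most 10% of the population."""
    return n <= 0.10 * population_size


def least_squares(points):
    """Ordinary least-squares intercept and slope for a list of (x, y) pairs."""
    x_bar = statistics.mean(x for x, _ in points)
    y_bar = statistics.mean(y for _, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residual_sd_by_x(points, intercept, slope):
    """Sample standard deviation of the residuals at each distinct x.
    Under the equal variance condition these should all be roughly equal."""
    groups = {}
    for x, y in points:
        groups.setdefault(x, []).append(y - (intercept + slope * x))
    return {x: statistics.stdev(r) for x, r in sorted(groups.items())}


# Hypothetical true line y = 2 + 0.5x, sigma = 1, x = 1..10, 1,000 y's per x.
population = simulate_population(2.0, 0.5, 1.0, range(1, 11), 1000)
# Random condition: a simple random sample of 500, drawn without replacement.
sample = random.Random(1).sample(population, 500)

print(ten_percent_ok(len(sample), len(population)))  # True, since 500 <= 1,000
intercept, slope = least_squares(sample)
print(round(slope, 3))  # should land near the true slope of 0.5
print({x: round(sd, 2) for x, sd in residual_sd_by_x(sample, intercept, slope).items()})
```

Because the population was built to satisfy the conditions, the estimated slope should sit close to 0.5, and the residual standard deviations at each x should all be near 1. If you instead made sigma grow with x, those per-x values would drift apart, which is exactly the equal variance violation described in the video.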