And welcome back. Not every journey to the world of troubleshooting ends the same way. Some things are easier to troubleshoot than others, and also, if you and I have been working on, like, one large network and we know it like the back of our hand, it's a lot easier to troubleshoot because we know what the subnets are and the interfaces involved. Whereas, on the other hand, if we go to a brand new network or we're doing consulting, it may take some warm-up time to get used to it and to figure out where everything is on that specific customer's network. And when we're doing troubleshooting (again, whether it's our own network that we know really well or it's a new network that we were just introduced to), if we have a certain process or methodology for troubleshooting, we can apply that methodology across the board. So let's have some fun with this. We'll put up an overview of the high-level steps in a troubleshooting methodology, and then, as we proceed together, we'll actually apply those steps as we troubleshoot a network together.
So the very beginning of this troubleshooting process would be to identify the problem. Case in point: let's imagine that the user who's sitting at this computer right here, PC 10, calls the service desk or the help desk, or whatever they're calling it in your organization, and they say, "Yeah, I've got a problem." And then the service desk says, "Okay, tell me more." And if the user says, "Well, I can't really tell you anything," well, we have to kind of, you know, narrow down what the problem is or at least get what the symptoms are. And that's why one of the very first steps is to identify the problem. So with the identification of the problem, the user may say, "I can't access the Internet," or they may just say, "The network is down." At which point, we would ask some additional questions. So let's imagine this user says, "I can't access anything on the Internet." That would fall into this category of identifying the problem: this user, who normally can access the Internet, can no longer access the Internet.
The second step would be to establish a theory regarding why that might be happening. And so, by leveraging a topology like this, we could ask ourselves a few questions. For example, is this computer powered on? If the computer is powered on, does it have an IP address? And did the DHCP client get the right information regarding a default gateway and the subnet and all that good stuff? And then regarding this port: is this port on the switch associated with the right VLAN, which is VLAN 10? And regarding the trunking going down from the access layer switch to the core: is trunking working, and is VLAN 10 being allowed? And then, from the default gateway's perspective regarding VLAN 10, who's acting as the default gateway? Is it Core 1 or Core 2? Or are they using a First Hop Redundancy Protocol? And if so, which one of these two devices is acting as the active device? And does that device acting as the default gateway have a route out towards the Internet? In simple terms, does it know how to forward?
And the same thing would hold true for this router and then this connectivity to our service provider. And also, because we're using RFC 1918 addresses, perhaps network address translation is failing or isn't implemented correctly. So if, by doing a few tests, we verify that PC 10 can ping its default gateway, and if this device in VLAN 10 up here at headquarters can ping devices out here at Site 2 and Site 3 and has reachability there, that can help identify what is working, and then we can establish a theory about what may be specifically causing the problem. And then, once we've narrowed it down to what we think it might be, the third step is to test, which is to basically go in and prove your theory. If we think the problem is with Router 1, or if we think the problem is with a multilayer switch, or we think the problem is with the access layer, we want to do some testing to validate that what we think may be the problem really is causing the problem.
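To make that narrowing-down idea a little more concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical list of components along the path from PC 10 to the Internet, and it assumes that each test exercises every component up to a certain point on that path. The component and test names are made up for illustration and don't come from the topology itself.

```python
# Hypothetical components along the path from PC 10 out to the Internet,
# listed in order. These names are illustrative assumptions, not device
# names taken from the topology.
COMPONENTS = [
    "pc_power",
    "ip_config",
    "access_port_vlan",
    "trunk_vlan10",
    "default_gateway",
    "internal_routing",
    "internet_route",
    "nat",
    "provider_link",
]

# Assumption: each test exercises every component up to some point on the path.
TESTS = {
    "ping_default_gateway": set(COMPONENTS[:5]),
    "ping_remote_site": set(COMPONENTS[:6]),
    "reach_internet": set(COMPONENTS),
}


def remaining_suspects(results):
    """Return components touched by a failed test but not cleared by a passed one."""
    cleared = set()
    implicated = set()
    for test, passed in results.items():
        if passed:
            cleared |= TESTS[test]
        else:
            implicated |= TESTS[test]
    suspects = implicated - cleared
    # Keep the original path order so the list reads from the PC outward.
    return [component for component in COMPONENTS if component in suspects]


# The scenario from the walkthrough: gateway and remote sites answer, Internet does not.
results = {
    "ping_default_gateway": True,
    "ping_remote_site": True,
    "reach_internet": False,
}
print(remaining_suspects(results))
```

With those results, everything from the PC through internal routing is cleared, so the theory only has to cover what is left: the route toward the Internet, address translation, and the service provider link.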
And then, once we've narrowed it down and verified it, we then want to go ahead and solve the problem. Now, solving the problem in an organization also has many steps involved with it. Let's list a few of those as far as the solution to this network connectivity problem that the user is having out to the Internet. And let's also imagine, based on our testing, that we believe it's an issue with address translation, which could be NAT or PAT, but definitely needs to happen at some point before that traffic goes out to the Internet. So if we've done some testing and we've narrowed it down to an address translation issue, then to solve that, we want to create a game plan for exactly how we are going to solve that problem. Perhaps with network address translation, the NAT device was set up to support VLAN 20 with the 10.12 subnet and other networks like this over here at Site 2 and Site 3, but perhaps not including the 10.110 subnet. So we'd want to make a plan to correct that.
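As a rough illustration of that kind of NAT gap, the sketch below uses Python's standard `ipaddress` module to list which inside subnets are not covered by the set of networks the NAT device translates. The subnet values and the function name `uncovered_subnets` are placeholders chosen for the example, not the actual addressing from this topology.

```python
import ipaddress


def uncovered_subnets(inside_subnets, nat_permitted):
    """Return the inside subnets that no NAT-permitted network contains."""
    permitted = [ipaddress.ip_network(net) for net in nat_permitted]
    missing = []
    for subnet in inside_subnets:
        network = ipaddress.ip_network(subnet)
        # A subnet is covered if it sits inside at least one permitted network.
        if not any(network.subnet_of(allowed) for allowed in permitted):
            missing.append(subnet)
    return missing


# Placeholder addressing (an assumption for illustration only).
inside = ["10.1.10.0/24", "10.1.20.0/24", "10.2.0.0/16", "10.3.0.0/16"]
translated = ["10.1.20.0/24", "10.2.0.0/16", "10.3.0.0/16"]
print(uncovered_subnets(inside, translated))
```

A check like this only compares address ranges. It says nothing about whether the translation rule itself is applied on the right interfaces, so it would support the plan rather than replace testing on the device.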
And also, in corporations, that's going to involve going through change control if we're going to make a configuration change. And then, with the authorization from the change control board, we're going to go ahead and implement the change. And then, when we've implemented it, we also want to verify that it's working. And that verification would involve a few things: number one, that we now have connectivity from this PC up to the Internet. Also, we'd want to verify that we didn't make any other changes that would negatively impact our environment. Like, we want to make sure that everything else still functions as well: VLAN 20 and the other sites, everybody can still forward out to the Internet. And then we'd also want to make sure we document the solution: what we did, how we did it. And if we changed the topology in some fashion, we'd want to include that update in our documentation. So the documentation of what was done, and also the topology if there have been updates, is super important because, let's say, 3 or 4 days go by and we have yet another problem.
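The two-part verification described here (the original problem is fixed, and nothing else broke) could be sketched like this in Python. It assumes we recorded Internet reachability per source before and after the change as simple true/false values. The source names and the function name `verify_change` are hypothetical.

```python
def verify_change(before, after, target):
    """Return (target_fixed, regressions) for a before/after reachability snapshot."""
    # Part one: the source that had the problem can now reach the Internet.
    target_fixed = after.get(target, False)
    # Part two: anything that worked before the change must still work after it.
    regressions = sorted(
        source
        for source, worked in before.items()
        if worked and not after.get(source, False)
    )
    return target_fixed, regressions


# Hypothetical snapshots taken before and after the NAT change.
before = {"vlan10": False, "vlan20": True, "site2": True, "site3": True}
after = {"vlan10": True, "vlan20": True, "site2": True, "site3": True}
print(verify_change(before, after, target="vlan10"))
```

If the second value comes back non-empty, the change fixed one thing and broke another, which is exactly the situation the verification step is meant to catch before the ticket is closed.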
And we think, "Oh, I wonder if what we changed here injected additional problems into the network." So we could go back through our paper trail and identify what happened, when it happened, what was changed. That can help speed up our troubleshooting because, while there are cabling issues and physical issues and so forth, when something breaks on the network, when something stops working, it's quite often due to the last change that was made. So if we go back and take a look at the last change or two, that can help us reduce our troubleshooting time by either confirming that what was done is not impacting our current problem or by verifying that what was done indeed is impacting our current network. And then the last step here is to go ahead and repeat this process for the next problem. So the next service call that comes in, the next issue, the next problem: again, we're going to follow this logical plan.
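One way to picture that paper trail is as a list of dated change records that we can filter when a new problem shows up. The sketch below is only an illustration: the `ChangeRecord` fields, the `recent_changes` function, the device names, and the dates are all assumptions, not a description of any particular change control system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ChangeRecord:
    when: datetime   # when the change was implemented
    device: str      # hypothetical device name
    summary: str     # what was changed


def recent_changes(log, incident_time, window_days=7):
    """Return changes made within the window before the incident, newest first."""
    cutoff = incident_time - timedelta(days=window_days)
    hits = [change for change in log if cutoff <= change.when <= incident_time]
    return sorted(hits, key=lambda change: change.when, reverse=True)


# Hypothetical change log entries.
log = [
    ChangeRecord(datetime(2024, 1, 2, 10, 0), "access1", "Moved port to VLAN 10"),
    ChangeRecord(datetime(2024, 1, 7, 14, 0), "router1", "Added subnet to NAT source list"),
    ChangeRecord(datetime(2024, 1, 9, 9, 30), "core1", "Adjusted allowed VLANs on trunk"),
]

incident = datetime(2024, 1, 10, 8, 0)
for change in recent_changes(log, incident, window_days=4):
    print(change.when, change.device, change.summary)
```

Sorting newest first matches the point made above: the last change or two is usually the first place worth looking.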
So what I think would be fun to do is let's take this network topology, which we've been playing with on and off throughout these videos, and what I'll do is I will inject a problem somewhere in this mix, and then we can go through these steps one at a time in this troubleshooting methodology. And as we do so, we'll go into more detail on each one. So, in the very next video, join me as we take a look at this first stage in the troubleshooting methodology, and that is identifying the problem, which we'll do in this network topology. So I'll see you in that video in just a moment.
Hey, thanks for watching, and subscribe right here to get the latest information from CBT Nuggets. And if you're new to or considering a career in the world of IT, head on over to CBT Nuggets and sign up for a free trial.