
Steps for Network Troubleshooting

  • 0:00 - 0:03
    And welcome back. Not every journey to
  • 0:03 - 0:04
    the world of troubleshooting ends the
  • 0:04 - 0:06
    same way. Some things are easier to
  • 0:06 - 0:08
    troubleshoot than others, and also, if you
  • 0:08 - 0:10
    and I have been working on, like, one large
  • 0:10 - 0:11
    network and we know it like the back of our
  • 0:11 - 0:13
    hand, it's a lot easier to troubleshoot
  • 0:13 - 0:15
    because we know what the subnets are and
  • 0:15 - 0:17
    the interfaces involved. Whereas, on the
  • 0:17 - 0:18
    other hand, if we go to a brand new
  • 0:18 - 0:20
    network or we're doing consulting, it may
  • 0:20 - 0:22
    take some warm-up time to get used to
  • 0:22 - 0:24
    and to figure out where everything is on
  • 0:24 - 0:26
    that specific customer's network. And when
  • 0:26 - 0:27
    we're doing troubleshooting--again,
  • 0:27 - 0:29
    whether it's our own network that we
  • 0:29 - 0:30
    know really well or it's a new network
  • 0:30 - 0:32
    that we are just introduced to--if we
  • 0:32 - 0:34
    have a certain process or methodology
  • 0:34 - 0:35
    for troubleshooting, we can apply that
  • 0:35 - 0:38
    methodology across the board. So let's
  • 0:38 - 0:39
    have some fun with this. We'll put up an
  • 0:39 - 0:40
    overview of the high-level steps
  • 0:40 - 0:42
    regarding a troubleshooting methodology,
  • 0:42 - 0:44
    and then, as we proceed together, we'll
  • 0:44 - 0:46
    actually apply those steps as we
  • 0:46 - 0:48
    troubleshoot a network together. So the
  • 0:48 - 0:49
    very beginning of this troubleshooting
  • 0:49 - 0:52
    process would be to identify the
  • 0:52 - 0:54
    problem. Case in point: let's imagine that
  • 0:54 - 0:55
    the user who's sitting at this computer
  • 0:55 - 0:58
    right here, PC 10, calls the service desk
  • 0:58 - 0:59
    or the help desk, or whatever they're
  • 0:59 - 1:00
    calling it in your organization, and they
  • 1:00 - 1:02
    say, "Yeah, I've got a problem." And then the
  • 1:02 - 1:04
    service desk says, "Okay, tell me more."
  • 1:04 - 1:06
    And if the user says, "Well, I can't really
  • 1:06 - 1:08
    tell you anything," well, we have to kind
  • 1:08 - 1:10
    of, you know, narrow down what the problem
  • 1:10 - 1:12
    is or at least get what the symptoms are.
  • 1:12 - 1:13
    And that's why one of the very first
  • 1:13 - 1:15
    steps is to identify the problem. So
  • 1:15 - 1:16
    with the identification of the problem,
  • 1:16 - 1:18
    the user may say, "I can't access the
  • 1:18 - 1:20
    Internet," or they may just say, "The
  • 1:20 - 1:21
    network is down." At which point, we would
  • 1:21 - 1:23
    ask some additional questions. So let's
  • 1:23 - 1:25
    imagine this user says, "I can't
  • 1:25 - 1:27
    access anything on the Internet." That
  • 1:27 - 1:28
    would fall into this category of
  • 1:28 - 1:29
    identifying the problem: this
  • 1:29 - 1:31
    user, who normally can access the
  • 1:31 - 1:33
    Internet, can no longer access the
  • 1:33 - 1:35
    Internet. The second step would be to
  • 1:35 - 1:37
    establish a theory regarding why that
  • 1:37 - 1:39
    might be happening. And so, by leveraging
  • 1:39 - 1:41
    a topology like this, we could ask
  • 1:41 - 1:43
    ourselves a few questions. For example, is
  • 1:43 - 1:46
    this computer powered on? If the
  • 1:46 - 1:47
    computer is powered on, does it have an
  • 1:47 - 1:49
    IP address? And did the DHCP client
  • 1:49 - 1:51
    get the right information regarding a
  • 1:51 - 1:53
    default gateway and the subnet and all
  • 1:53 - 1:55
    that good stuff? And then regarding this
  • 1:55 - 1:57
    port--is this port on the switch
  • 1:57 - 1:58
    associated with the right VLAN, which is
  • 1:58 - 2:00
    VLAN 10? And regarding the trunk going
  • 2:00 - 2:02
    from the access layer
  • 2:02 - 2:04
    switch to the core, is trunking working,
  • 2:04 - 2:06
    and is VLAN 10 being allowed? And then,
  • 2:06 - 2:08
    from the default gateway's perspective
  • 2:08 - 2:10
    regarding VLAN 10--who's acting as the
  • 2:10 - 2:12
    default gateway? Is it core 1 or core 2?
  • 2:12 - 2:13
    Or are they using a First Hop
  • 2:13 - 2:15
    Redundancy Protocol? And if so, which one
  • 2:15 - 2:17
    of these two devices is acting as the
  • 2:17 - 2:19
    active device? And does that device
  • 2:19 - 2:21
    acting as the default gateway have a
  • 2:21 - 2:23
    route out towards the Internet? In simple
  • 2:23 - 2:24
    terms, does it know how to forward? And
  • 2:24 - 2:25
    the same thing would hold true for this
  • 2:25 - 2:28
    router and then this connectivity to our
  • 2:28 - 2:30
    service provider. And also, because we're
  • 2:30 - 2:31
    using RFC 1918
  • 2:31 - 2:33
    addresses, perhaps network address
  • 2:33 - 2:35
    translation is failing or isn't
  • 2:35 - 2:37
    implemented correctly. So if, by doing a few
  • 2:37 - 2:40
    tests, we verify that this user at PC 10
  • 2:40 - 2:42
    can ping its default gateway, and if
  • 2:42 - 2:43
    this device in VLAN 10 up here at
  • 2:43 - 2:45
    headquarters can ping devices out here
  • 2:45 - 2:46
    at Site 2 and Site 3 and has
  • 2:46 - 2:48
    reachability there, that can help
  • 2:48 - 2:50
    identify what is working, and then we
  • 2:50 - 2:51
    can establish a theory about what may
  • 2:51 - 2:53
    be specifically causing the problem. And
  • 2:53 - 2:55
    then, once we've narrowed it down to what
  • 2:55 - 2:57
    we think it might be, the third
  • 2:57 - 2:59
    step is to test the theory, which is to basically go
  • 2:59 - 3:01
    in and prove your theory. If we think the
  • 3:01 - 3:03
    problem is with router one, or if we
  • 3:03 - 3:04
    think the problem is with a
  • 3:04 - 3:06
    multilayer switch, or we think the
  • 3:06 - 3:07
    problem is with the access layer, we want
  • 3:07 - 3:09
    to do some testing to validate that what
  • 3:09 - 3:11
    we think may be the problem really is
  • 3:11 - 3:13
    causing the problem. And then, once we've
  • 3:13 - 3:14
    narrowed it down and verified it, we then
  • 3:14 - 3:17
    want to go ahead and solve the problem.
  • 3:17 - 3:19
    Now, solving the problem in an
  • 3:19 - 3:22
    organization also has many steps
  • 3:22 - 3:24
    involved with it. Let's list a few of
  • 3:24 - 3:26
    those as far as the solution to this
  • 3:26 - 3:27
    network connectivity problem that the
  • 3:27 - 3:29
    user is having out to the Internet. And
  • 3:29 - 3:32
    let's also imagine, based on our testing,
  • 3:32 - 3:33
    that we believe it's an issue with
  • 3:33 - 3:35
    address translation, which could be NAT
  • 3:35 - 3:38
    or PAT, but definitely needs to happen at
  • 3:38 - 3:39
    some point before that traffic goes out
  • 3:39 - 3:41
    to the Internet. So if we've done some
  • 3:41 - 3:42
    testing and we've narrowed it down that
  • 3:42 - 3:44
    it is an address translation issue,
  • 3:44 - 3:46
    regarding solving that, we want to
  • 3:46 - 3:49
    create a game plan on exactly how we are
  • 3:49 - 3:51
    going to solve that problem. Perhaps with
  • 3:51 - 3:52
    network address translation, the NAT device
  • 3:52 - 3:55
    was set up to support VLAN 20 with the
  • 3:55 - 3:58
    10.12 subnet and other networks like
  • 3:58 - 3:59
    this over here at Site 2 and Site 3,
  • 3:59 - 4:01
    but perhaps not the
  • 4:01 - 4:04
    10.110 subnet. So we'd want to make a plan
  • 4:04 - 4:06
    to correct that. And also, in corporations,
  • 4:06 - 4:07
    that's going to involve going through
  • 4:07 - 4:09
    change control if we're going to make a
  • 4:09 - 4:10
    configuration change. And then, with the
  • 4:10 - 4:12
    authorization from the change control
  • 4:12 - 4:14
    board, we're going to go ahead and
  • 4:14 - 4:15
    implement the change. And then, when we've
  • 4:15 - 4:17
    implemented it, we also want to verify
  • 4:17 - 4:19
    that it's working. And that verification
  • 4:19 - 4:20
    would involve a few things: number one,
  • 4:20 - 4:22
    that we now have connectivity from this
  • 4:22 - 4:24
    PC up to the Internet. Also, we'd want to
  • 4:24 - 4:26
    verify that our change didn't
  • 4:26 - 4:28
    negatively impact anything else in our
  • 4:28 - 4:30
    environment. We want to make sure that
  • 4:30 - 4:31
    everything else still functions as well:
  • 4:31 - 4:33
    VLAN 20 and the other sites can all
  • 4:33 - 4:35
    still forward out to the Internet. And
  • 4:35 - 4:36
    then we'd also want to make sure we
  • 4:36 - 4:39
    document the solution--what we did, how we
  • 4:39 - 4:41
    did it. And if we changed the topology in
  • 4:41 - 4:43
    some fashion, we'd want to include that
  • 4:43 - 4:45
    update in our documentation. So the
  • 4:45 - 4:47
    documentation of what was done, and also of
  • 4:47 - 4:49
    the topology if there have been updates,
  • 4:49 - 4:51
    is super important because, let's say,
  • 4:51 - 4:53
    3 or 4 days go by and we have yet
  • 4:53 - 4:55
    another problem. And we think, "Oh, I wonder
  • 4:55 - 4:57
    if what we changed here injected
  • 4:57 - 4:59
    additional problems into the network." So
  • 4:59 - 5:00
    we could go back through our paper trail
  • 5:00 - 5:02
    and identify what happened, when it
  • 5:02 - 5:04
    happened, what was changed. That can
  • 5:04 - 5:05
    help speed up our troubleshooting
  • 5:05 - 5:08
    because, although there are sometimes
  • 5:08 - 5:09
    cabling issues and physical issues and
  • 5:09 - 5:11
    so forth, when
  • 5:11 - 5:13
    something breaks on the network, when
  • 5:13 - 5:15
    something stops working, it's quite often
  • 5:15 - 5:18
    due to the last change that was made.
  • 5:18 - 5:19
    So if we go back and take a look at the
  • 5:19 - 5:21
    last change or two, that can help us
  • 5:21 - 5:22
    reduce our troubleshooting time by
  • 5:22 - 5:24
    either confirming that what was done is
  • 5:24 - 5:26
    not causing our current problem or by
  • 5:26 - 5:29
    verifying that what was done is indeed
  • 5:29 - 5:31
    causing it. And then
  • 5:31 - 5:33
    the last step here is to go ahead and
  • 5:33 - 5:36
    repeat this process for the next problem.
  • 5:36 - 5:38
    So the next service call that
  • 5:38 - 5:40
    comes in, the next issue, the next problem--
  • 5:40 - 5:42
    again, we're going to follow this logical
  • 5:42 - 5:44
    plan. So what I think would be fun to do
  • 5:44 - 5:46
    is let's take this network topology,
  • 5:46 - 5:47
    which we've been playing on and off with
  • 5:47 - 5:49
    throughout these videos, and what I'll do
  • 5:49 - 5:51
    is I will inject a problem somewhere in
  • 5:51 - 5:52
    this mix, and then we can go through
  • 5:52 - 5:54
    these steps one at a time in this
  • 5:54 - 5:56
    troubleshooting methodology. And as we do
  • 5:56 - 5:58
    so, we'll go into more details on each
  • 5:58 - 6:00
    one. So, in the very next video, join me as
  • 6:00 - 6:02
    we take a look at this first stage in
  • 6:02 - 6:03
    the troubleshooting methodology, and that
  • 6:03 - 6:05
    is identifying the problem, which we'll
  • 6:05 - 6:07
    do in this network topology. So I'll see
  • 6:07 - 6:10
    you in that video in just a moment.
  • 6:10 - 6:12
    Hey, thanks for watching, and subscribe right
  • 6:12 - 6:13
    here to get the latest information from
  • 6:13 - 6:16
    CBT Nuggets. And if you're new to or
  • 6:16 - 6:17
    considering a career in the world of IT,
  • 6:17 - 6:21
    head on over to CBT Nuggets and sign up for a free trial.
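The steps the video lists (identify the problem, establish a theory, test it, plan the fix and get change-control approval, implement, verify, document, then repeat for the next problem) can be sketched as a small Python function. This is an illustrative model, not a tool from the video: the `Theory` type, the `troubleshoot` function, and the pass/fail lambdas are hypothetical, and the fallback when no theory is confirmed is an assumption the video does not state.

```python
from typing import Callable, Optional

# Hypothetical structure: a theory is a short description plus a test that
# returns True when the evidence confirms it.
Theory = tuple[str, Callable[[], bool]]


def troubleshoot(symptom: str, theories: list[Theory]) -> tuple[Optional[str], list[str]]:
    """Walk one problem through the video's steps; return (confirmed cause, log)."""
    log = [f"Identify the problem: {symptom}"]
    for description, test in theories:
        log.append(f"Establish a theory: {description}")
        if not test():
            log.append("Test the theory: ruled out")
            continue  # move on to the next candidate theory
        log.append("Test the theory: confirmed")
        log.append("Plan the fix and get change-control approval")
        log.append("Implement the change")
        log.append("Verify the fix and that nothing else broke")
        log.append("Document what changed, including topology updates")
        return description, log
    # Assumption: with no confirmed theory, go back and gather more information.
    log.append("No theory confirmed: gather more information")
    return None, log


# Hypothetical test results for the PC 10 scenario.
cause, steps = troubleshoot(
    "PC 10 cannot reach the Internet",
    [
        ("PC 10 has no IP address", lambda: False),
        ("NAT does not cover the VLAN 10 subnet", lambda: True),
    ],
)
print(cause)  # NAT does not cover the VLAN 10 subnet
for line in steps:
    print(line)
```

Repeating the process for the next service call is simply another call to `troubleshoot` with the next symptom, and the returned log doubles as the paper trail the video recommends keeping.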
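The theory-building questions between 1:41 and 2:37 follow the path from PC 10 out toward the service provider. A minimal sketch of that ordering, assuming each question can be answered pass/fail: the first check not known to pass is where the next theory should focus. The check names paraphrase the video, the `first_suspect` helper is hypothetical, and treating a successful gateway ping as clearing all the earlier checks is a simplification.

```python
from typing import Optional

# Questions from the "establish a theory" step, ordered from the PC outward
# toward the Internet. The wording paraphrases the video.
CHECKS = [
    "PC 10 is powered on",
    "PC 10 has an IP address",
    "DHCP supplied the right subnet and default gateway",
    "Switch port is in VLAN 10",
    "Trunk to the core is up and allows VLAN 10",
    "Default gateway (FHRP active device) responds",
    "Default gateway has a route toward the Internet",
    "Edge router reaches the service provider",
    "NAT translates the RFC 1918 source address",
]


def first_suspect(results: dict[str, bool]) -> Optional[str]:
    """Return the first check, in order, that is not known to pass.

    A check that failed (False) or has not been tested yet (missing) is
    where the next theory should focus; everything before it is known to work.
    """
    for check in CHECKS:
        if not results.get(check, False):
            return check
    return None


# Hypothetical results: a successful ping from PC 10 to its default gateway
# is treated as clearing every check up to and including the gateway.
after_gateway_ping = {check: True for check in CHECKS[:6]}
print(first_suspect(after_gateway_ping))  # Default gateway has a route toward the Internet
```

Reaching Site 2 and Site 3 would show that the gateway routes internally, but on its own it says nothing about the Internet route or NAT, which is why those checks remain suspects.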
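The theory at 2:29 to 2:37 and the NAT hypothesis at 3:51 to 4:04 rest on two facts: RFC 1918 addresses must be translated before traffic reaches the Internet, and a NAT device only translates the inside networks it was configured for. A minimal sketch using Python's standard `ipaddress` module models both. The subnets are hypothetical stand-ins (192.168.x.0/24), since the transcript gives only partial prefixes, and the `untranslated_subnets` helper with its list-of-inside-networks model is a simplification of whatever configuration the real NAT device uses.

```python
import ipaddress

# The three private address ranges defined by RFC 1918.
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """True if the address is in an RFC 1918 range and so needs translation."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in RFC1918)


def untranslated_subnets(nat_inside: list[str], subnets: list[str]) -> list[str]:
    """Return the subnets that no configured NAT inside network covers (hypothetical model)."""
    inside = [ipaddress.ip_network(n) for n in nat_inside]
    missing = []
    for s in subnets:
        net = ipaddress.ip_network(s)
        if not any(net.subnet_of(i) for i in inside):
            missing.append(s)
    return missing


# Hypothetical addressing: VLAN 10, VLAN 20, Site 2, and Site 3.
vlan10, vlan20 = "192.168.10.0/24", "192.168.20.0/24"
sites = ["192.168.2.0/24", "192.168.3.0/24"]
all_subnets = [vlan10, vlan20, *sites]

nat_before = [vlan20, *sites]  # the suspected misconfiguration: VLAN 10 left out
print(is_rfc1918("192.168.10.25"))  # True
print(untranslated_subnets(nat_before, all_subnets))  # ['192.168.10.0/24']

# After the fix, re-check every subnet, not just VLAN 10, as the verify step asks.
nat_after = nat_before + [vlan10]
print(untranslated_subnets(nat_after, all_subnets))  # []
```

Checking every subnet after the change, rather than only the one that was broken, mirrors the video's point that verification should confirm VLAN 20 and the other sites still reach the Internet.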
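The documentation step (4:35 to 5:31) pays off when a new problem appears, because the last change or two is often the cause. A minimal sketch of looking up those changes, assuming each change is recorded with a timestamp, device, and summary: the `Change` record, the `recent_changes` helper, and all log entries below are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Change:
    """One documented change; the fields are a hypothetical minimal record."""
    when: datetime
    device: str
    summary: str


def recent_changes(log: list[Change], problem_reported: datetime, count: int = 2) -> list[Change]:
    """Return the last `count` changes made before the problem was reported, newest first."""
    earlier = [c for c in log if c.when <= problem_reported]
    earlier.sort(key=lambda c: c.when, reverse=True)
    return earlier[:count]


# Hypothetical change log for the topology in the video.
change_log = [
    Change(datetime(2024, 3, 1, 9, 0), "router 1", "Updated NAT inside networks"),
    Change(datetime(2024, 3, 4, 14, 30), "access switch", "Moved a port to VLAN 20"),
    Change(datetime(2024, 3, 6, 8, 0), "core 1", "Changed FHRP priority"),
]
reported = datetime(2024, 3, 5, 10, 0)

for change in recent_changes(change_log, reported):
    print(change.when, change.device, change.summary)
```

Changes made after the problem was reported are excluded, since they cannot have caused it; each returned entry is then either confirmed or ruled out as the cause.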
Video Language:
English
Duration:
06:21
