Types of Disaster Recovery and Business Continuity Testing: A Comprehensive Overview

  • 0:03 - 0:05
    we've spent a long time or a couple of
  • 0:05 - 0:08
    videos now discussing our DR testing and
  • 0:08 - 0:10
    processes and our business continuity
  • 0:10 - 0:13
    testing and processes let's jump in and
  • 0:13 - 0:14
    take a look at some of the testing
  • 0:14 - 0:17
    methodologies across our both through
  • 0:17 - 0:20
    respectively to our disaster recovery
  • 0:20 - 0:22
    and business continuity and what's
  • 0:22 - 0:24
    available to us so in order for us to
  • 0:24 - 0:26
    test and plan let's take a look at what
  • 0:26 - 0:28
    some of those are I'm going to separate
  • 0:28 - 0:30
    this out into a couple of areas here
  • 0:30 - 0:32
    and then we'll just sort of work through
  • 0:32 - 0:33
    this because there's a couple that I
  • 0:33 - 0:35
    want to sort of outline now the first
  • 0:35 - 0:38
    one is walkthroughs now we can outline a
  • 0:38 - 0:40
    couple of walkthroughs as we just
  • 0:40 - 0:42
    finished writing that out
  • 0:42 - 0:45
    walkthroughs is basically running our
  • 0:45 - 0:47
    tabletop exercises or scenarios right so
  • 0:47 - 0:50
    these are all sort of theory based and
  • 0:50 - 0:53
    they're sort of tabletop scenarios and
  • 0:53 - 0:54
    you may have a bunch of people that sort
  • 0:54 - 0:56
    of come around
  • 0:56 - 0:59
    that's horrible isn't it tabletop
  • 0:59 - 1:02
    scenarios I should write that for sure and
  • 1:02 - 1:03
    these are a bunch of people that we can
  • 1:03 - 1:05
    maybe get together in a boardroom so
  • 1:05 - 1:07
    we've got a table here and we can all
  • 1:07 - 1:09
    just come around here you might have a
  • 1:09 - 1:11
    few people and basically what we can do
  • 1:11 - 1:13
    is we basically give scenarios on how
  • 1:13 - 1:14
    we're going to handle and these are
  • 1:14 - 1:17
    obviously you know your
  • 1:17 - 1:19
    okay I can't draw so I'm just going to
  • 1:19 - 1:21
    remove that all together and just write
  • 1:21 - 1:24
    Board Room we're in a boardroom
  • 1:24 - 1:27
    okay that's good boardroom great uh so
  • 1:27 - 1:29
    basically when we're in the boardroom we
  • 1:29 - 1:31
    bring everyone around us and do and
  • 1:31 - 1:33
    perform either from the DR team and we
  • 1:33 - 1:35
    sit around a tabletop and then you know
  • 1:35 - 1:37
    the leader of that you know who's
  • 1:37 - 1:39
    driving that who's got the initiation or
  • 1:39 - 1:41
    the actual delivery focus of that extra
  • 1:41 - 1:43
    scenario and we basically walk through
  • 1:43 - 1:45
    that scenario so they'll say okay well
  • 1:45 - 1:46
    we're gonna
  • 1:46 - 1:49
    walk through X
  • 1:49 - 1:51
    well walk me through this scenario on
  • 1:51 - 1:53
    how you're going to handle uh this
  • 1:53 - 1:55
    scenario and then you've got you know
  • 1:55 - 1:57
    your DR team which are your IT people
  • 1:57 - 1:58
    that are responsible for that you may have
  • 1:58 - 2:00
    your network team
  • 2:00 - 2:01
    you know your network engineering team
  • 2:01 - 2:05
    you have your systems guys girls uh you
  • 2:05 - 2:08
    may have your you know your your change
  • 2:08 - 2:10
    management team there you may have you
  • 2:10 - 2:12
    know your engineering maybe you've got
  • 2:12 - 2:15
    your dev teams or dev
  • 2:15 - 2:17
    engineering team you know and so on
  • 2:17 - 2:19
    right so you've got your IT responsible
  • 2:19 - 2:20
    IT people that are going to be
  • 2:20 - 2:21
    responsible for that process of
  • 2:21 - 2:23
    restoration should we go down that and
  • 2:23 - 2:26
    we'll run through you know uh you know
  • 2:26 - 2:28
    site goes offline and you know um you
  • 2:28 - 2:30
    know we've got three sites
  • 2:30 - 2:33
    give an example we've got three sites
  • 2:33 - 2:35
    now I'm just sort of spitballing here so
  • 2:35 - 2:36
    bear with me if I don't get anything
  • 2:36 - 2:38
    right but I'm just going to walk through
  • 2:38 - 2:42
    this so we've got three sites alrighty
  • 2:42 - 2:47
    site X goes down so this is location
  • 2:48 - 2:50
    I don't know Brisbane now Brisbane
  • 2:50 - 2:53
    branch has gone offline well what do we
  • 2:53 - 2:54
    do
  • 2:54 - 2:56
    okay this is Sydney
  • 2:56 - 2:58
    this is Melbourne and obviously all
  • 2:58 - 2:59
    these are all connected and such and
  • 2:59 - 3:01
    then we've got our backbones back here
  • 3:01 - 3:03
    obviously they're connecting things as
  • 3:03 - 3:04
    well so obviously these are all our
  • 3:04 - 3:07
    backbone infrastructure well uh Brisbane
  • 3:07 - 3:09
    location goes offline for whatever it is
  • 3:09 - 3:12
    and someone you know walks into the data
  • 3:12 - 3:13
    center into the comms room and they've
  • 3:13 - 3:16
    tripped over the cable and now our data
  • 3:16 - 3:18
    center is offline well what do we do and
  • 3:18 - 3:19
    then we walk through that scenario on
  • 3:19 - 3:21
    how someone's going to recover from that
  • 3:21 - 3:23
    situation could be minor it could be
  • 3:23 - 3:24
    significant depending on the scenario so
  • 3:24 - 3:26
    basically the walkthrough is us walking
  • 3:26 - 3:29
    through it's the least amount of risk
  • 3:29 - 3:31
    with our DR and obviously our business
  • 3:31 - 3:33
    continuity testing because we're talking
  • 3:33 - 3:35
    but we're not actually doing anything so
  • 3:35 - 3:37
    again we're going to involve this the
  • 3:37 - 3:39
    the you know the relevant people in the
  • 3:39 - 3:40
    parties and then we're going to walk
  • 3:40 - 3:41
    through that scenario based on those
  • 3:41 - 3:45
    SMEs and then obviously their knowledge
  • 3:45 - 3:46
    of how they're going to recover and
  • 3:46 - 3:48
    bring the systems back up to normal in
  • 3:48 - 3:51
    obviously a time-sensitive approach so
  • 3:51 - 3:53
    that's walkthroughs
  • 3:53 - 3:54
    uh to the other side we've got
  • 3:54 - 3:57
    simulation so we could run actual
  • 3:57 - 3:59
    simulations now simulation could be a
  • 3:59 - 4:01
    physical walkthrough or basically what
  • 4:01 - 4:04
    we call something like a mock event
  • 4:04 - 4:06
    and we give it a scenario we give a very
  • 4:06 - 4:09
    specific scenario and we walk through
  • 4:09 - 4:11
    what we're actually going to do so we
  • 4:11 - 4:13
    simulate what we're going to do if we're
  • 4:13 - 4:15
    using backups or the restoration of our
  • 4:15 - 4:17
    backups process well we would log into
  • 4:17 - 4:19
    the backup app server so if we're using
  • 4:19 - 4:21
    a specific vendor we're saying okay well
  • 4:21 - 4:22
    we're going to log into this server
  • 4:22 - 4:25
    we're going to click our restoration you
  • 4:25 - 4:26
    know and then our process is you know
  • 4:26 - 4:29
    restore hard drive X from you know
  • 4:29 - 4:31
    server Y and then that's going to take
  • 4:31 - 4:34
    maybe eight hours to do a full recovery
  • 4:34 - 4:36
    and then I'm going to take that hard
  • 4:36 - 4:37
    drive and then that's going to be our
  • 4:37 - 4:39
    state from how we're going to recover or
  • 4:39 - 4:40
    whatever that process looks like so
  • 4:40 - 4:43
    you'll simulate to the point of not
  • 4:43 - 4:45
    actually clicking
  • 4:45 - 4:48
    or doing anything it's to the point of
  • 4:48 - 4:50
    action right so you're gonna yes I'm
  • 4:50 - 4:52
    gonna log into the server I'm going to
  • 4:52 - 4:54
    look around here's our hypervisors
  • 4:54 - 4:56
    here's our infrastructure and here's how
  • 4:56 - 4:57
    we're going to restore that process from
  • 4:57 - 4:58
    there we're going to log into this
  • 4:58 - 5:00
    vendor's portal page we're going to get a
  • 5:00 - 5:03
    copy of our off-site backups whatever
  • 5:03 - 5:04
    that process looks like right so you run
  • 5:04 - 5:07
    through that mock simulation
  • 5:07 - 5:09
    um you touch equipment you trial it out
  • 5:09 - 5:11
    but to the point of doing it but not
  • 5:11 - 5:13
    actively executing it so you're not
  • 5:13 - 5:15
    going to go away and actually execute
  • 5:15 - 5:17
    your recovery you're just going to
  • 5:17 - 5:19
    basically simulate it up to the point of
  • 5:19 - 5:21
    doing it uh from here on then we've
  • 5:21 - 5:25
    got something to do with a parallel
  • 5:26 - 5:29
    test and parallel testing is something
  • 5:29 - 5:31
    like uh basically if we have two
  • 5:31 - 5:32
    environments and you might have
  • 5:32 - 5:35
    something like a prod
  • 5:35 - 5:38
    and test environment
  • 5:38 - 5:40
    that is prod and this is test and then with
  • 5:40 - 5:42
    parallel test we would recover our
  • 5:42 - 5:44
    production environment in that test
  • 5:44 - 5:46
    environment so we would go through all
  • 5:46 - 5:48
    the restore process but not take
  • 5:48 - 5:50
    production offline so I'm going to say
  • 5:50 - 5:53
    not offline
  • 5:54 - 5:56
    this basically just been doing you know
  • 5:56 - 5:57
    we're just going to go away we're going
  • 5:57 - 5:58
    to test and ensure the backups are
  • 5:58 - 6:01
    working correctly if there are any faults
  • 6:01 - 6:02
    or lessons to learn or issues that we
  • 6:02 - 6:04
    need to define then we know what they
  • 6:04 - 6:05
    are we're aware of those and everyone
  • 6:05 - 6:07
    knows what to do so we're not taking
  • 6:07 - 6:09
    production offline production remains
  • 6:09 - 6:10
    online we're just going to take our
  • 6:10 - 6:12
    obviously
  • 6:12 - 6:15
    take our recover our production
  • 6:15 - 6:16
    environments we're going to take our
  • 6:16 - 6:17
    prod environment and then we're going
  • 6:17 - 6:18
    to replicate that into our test
  • 6:18 - 6:20
    environment so we've got a test bed and
  • 6:20 - 6:22
    we're going to see how that process kind
  • 6:22 - 6:23
    of looks but we're not going to tinker
  • 6:23 - 6:25
    with or touch our production and
  • 6:25 - 6:27
    production will remain online and
  • 6:27 - 6:29
    testing now the other part of that is
  • 6:29 - 6:30
    our cut over and the cut over is quite
  • 6:30 - 6:33
    similar in that nature
  • 6:33 - 6:35
    um again that sort of prod test
  • 6:35 - 6:37
    scenario so I'm going to use that so
  • 6:37 - 6:40
    let's just go prod
  • 6:40 - 6:42
    and then test
  • 6:42 - 6:44
    and similar to that where we would go
  • 6:44 - 6:45
    well we're going to restore our prod
  • 6:45 - 6:48
    service and then take prod offline and
  • 6:48 - 6:50
    bring the restored service online so
  • 6:50 - 6:52
    it's a full test there's interruption
  • 6:52 - 6:54
    involved
  • 6:54 - 6:55
    um you know obviously interrupting
  • 6:55 - 6:57
    production as well so we're going to
  • 6:57 - 6:59
    obviously do the switch over and
  • 6:59 - 7:01
    obviously interruption of some sort
  • 7:01 - 7:03
    right now even if it's a minor
  • 7:03 - 7:05
    interruption of you know a second or two
  • 7:05 - 7:07
    that is still that's still an
  • 7:07 - 7:09
    interruption right so there will be some
  • 7:09 - 7:11
    sort of interruption but the cutover
  • 7:11 - 7:13
    test is the full kit and caboodle right
  • 7:13 - 7:15
    it's the full test and that's the
  • 7:15 - 7:17
    highest risk because if something does
  • 7:17 - 7:20
    go wrong during that cut over
  • 7:20 - 7:21
    um obviously then it's going to be an
  • 7:21 - 7:23
    outage so you have to be very mindful of
  • 7:23 - 7:25
    if you're going to do a cut over in any
  • 7:25 - 7:27
    state of testing that you've either done
  • 7:27 - 7:29
    a parallel test or you've done some sort
  • 7:29 - 7:30
    of mock simulation you've sort of
  • 7:30 - 7:32
    rehearsed it you understood it not just
  • 7:32 - 7:34
    go and do a cut over straight away now
  • 7:34 - 7:35
    if you're a smaller environment and you
  • 7:35 - 7:37
    don't have really much to impact
  • 7:37 - 7:39
    I'm still cautioning against it because
  • 7:39 - 7:42
    a lot of things can go wrong we want to
  • 7:42 - 7:44
    avoid any disruption or keep that as
  • 7:44 - 7:49
    minimal as possible again I
  • 7:49 - 7:51
    probably wouldn't advise that we turn
  • 7:51 - 7:52
    off the infrastructure or turn off the
  • 7:52 - 7:54
    servers per se I'd probably keep them
  • 7:54 - 7:55
    online or maybe disconnect them from
  • 7:55 - 7:57
    their network ports that way the servers
  • 7:57 - 7:58
    still remain online if anything does go
  • 7:58 - 8:01
    wrong we can obviously plug them in and
  • 8:01 - 8:02
    obviously you know get things back up
  • 8:02 - 8:04
    and running depending on you know the
  • 8:04 - 8:05
    complexity and depending on how things
  • 8:05 - 8:07
    are situated and what's dependent on
  • 8:07 - 8:09
    what so we want to make sure that we're
  • 8:09 - 8:12
    reducing risk and keeping our downtime
  • 8:12 - 8:15
    as minimal as possible so again the cut
  • 8:15 - 8:16
    over is running through that actual
  • 8:16 - 8:18
    simulation and actually doing everything
  • 8:18 - 8:20
    and then restoring it into your
  • 8:20 - 8:22
    prod environment so you will go away
  • 8:22 - 8:23
    you'll restore your test and then you'll
  • 8:23 - 8:25
    restore it back into prod again you
  • 8:25 - 8:27
    would go through the full cutover so you
  • 8:27 - 8:29
    will turn off the appliances if you do
  • 8:29 - 8:30
    want to otherwise you can just
  • 8:30 - 8:31
    disconnect them from the network
  • 8:31 - 8:33
    connection you know depending on how you
  • 8:33 - 8:34
    actually want to run the cutover but
  • 8:34 - 8:35
    the cut over essentially is running that
  • 8:35 - 8:38
    full test from there
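The four test types described so far can be sketched as a small Python model. This is only an illustration, not something shown in the video: the names `ExerciseType`, `interrupts_production` and `ready_for_cutover` are hypothetical, and the numeric order is an assumed risk ranking (the video only states that the walkthrough carries the least risk and the cutover the most).

```python
from enum import IntEnum


class ExerciseType(IntEnum):
    """Test types named in the video; the numeric order is an assumed risk ranking."""
    WALKTHROUGH = 1  # tabletop discussion, nothing is touched
    SIMULATION = 2   # mock event, rehearsed up to the point of action
    PARALLEL = 3     # restore into test, production stays online
    CUTOVER = 4      # production is interrupted and switched over


def interrupts_production(exercise: ExerciseType) -> bool:
    """Only the cutover test interrupts production."""
    return exercise is ExerciseType.CUTOVER


def ready_for_cutover(completed: set) -> bool:
    """A cutover should follow at least one simulation or parallel test."""
    rehearsals = {ExerciseType.SIMULATION, ExerciseType.PARALLEL}
    return bool(completed & rehearsals)


if __name__ == "__main__":
    done = {ExerciseType.WALKTHROUGH}
    print(ready_for_cutover(done))   # a walkthrough alone is not a rehearsal
    done.add(ExerciseType.PARALLEL)
    print(ready_for_cutover(done))   # a parallel test has now been run
```

The check in `ready_for_cutover` mirrors the caution above: rehearse with a simulation or a parallel test before attempting a cutover.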
  • 8:38 - 8:40
    once we've done everything then we want
  • 8:40 - 8:42
    to go over and document and this is the
  • 8:42 - 8:45
    most vital part as well as equally
  • 8:45 - 8:47
    important as the rest of them because we
  • 8:47 - 8:49
    are going to want to document and keep
  • 8:49 - 8:53
    things updated right so RPO
  • 8:53 - 8:55
    RTO so
  • 8:55 - 8:59
    our recovery point um and our recovery time objectives so
  • 8:59 - 9:01
    what is our recovery point what's our
  • 9:01 - 9:04
    time objective what do they look like so
  • 9:04 - 9:06
    did we meet those objectives so I'm
  • 9:06 - 9:08
    going to say meet
  • 9:08 - 9:10
    objectives because obviously
  • 9:10 - 9:11
    everything's going to have some sort of
  • 9:11 - 9:13
    metrics associated with it so did we
  • 9:13 - 9:15
    meet this did this occur in the right
  • 9:15 - 9:17
    manner of the right time do we need to
  • 9:17 - 9:18
    work on it did something go wrong is
  • 9:18 - 9:21
    there room for improvement so room for
  • 9:21 - 9:23
    improvement
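As a rough sketch of the "did we meet those objectives" check, the Python below compares what a test actually measured against RPO and RTO targets. The function name `meets_objectives`, the timestamps and the four-hour and six-hour targets are all assumptions made up for illustration; only the eight-hour restore figure echoes the example given earlier in the video.

```python
from datetime import datetime, timedelta


def meets_objectives(last_backup, outage_start, service_restored, rpo, rto):
    """Compare measured data loss and recovery time with the RPO and RTO targets."""
    data_loss_window = outage_start - last_backup   # work lost since the last good backup
    recovery_time = service_restored - outage_start  # how long the service was down
    return {
        "data_loss_window": data_loss_window,
        "recovery_time": recovery_time,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": recovery_time <= rto,
    }


if __name__ == "__main__":
    # Hypothetical test run: backup at midnight, outage at 03:00,
    # service back at 11:00 after an eight-hour full restore.
    result = meets_objectives(
        last_backup=datetime(2024, 1, 1, 0, 0),
        outage_start=datetime(2024, 1, 1, 3, 0),
        service_restored=datetime(2024, 1, 1, 11, 0),
        rpo=timedelta(hours=4),  # assumed target
        rto=timedelta(hours=6),  # assumed target
    )
    print(result["rpo_met"], result["rto_met"])
```

In this made-up run the data-loss window is within the assumed RPO but the restore overruns the assumed RTO, which is the kind of finding that would go on the room-for-improvement list.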
  • 9:23 - 9:24
    right that's an item room for
  • 9:24 - 9:26
    improvement because you had something
  • 9:26 - 9:27
    that's going to need improvement right
  • 9:27 - 9:29
    did we do something wrong were we not
  • 9:29 - 9:30
    aware of something did someone need
  • 9:30 - 9:32
    some training to do something else you
  • 9:32 - 9:34
    know it's a multitude of
  • 9:34 - 9:36
    different issues that
  • 9:36 - 9:39
    um you know we we can improve on so
  • 9:39 - 9:42
    that's one and then the third point here
  • 9:42 - 9:43
    that I want to sort of mention is
  • 9:43 - 9:46
    lessons learned so lessons learned is what
  • 9:46 - 9:47
    are our key takeaways did we identify
  • 9:47 - 9:49
    something that needs updating because
  • 9:49 - 9:52
    something was missed did we maybe
  • 9:52 - 9:55
    um change a backup solution and did we
  • 9:55 - 9:57
    not know how to you know do we now need
  • 9:57 - 9:59
    to account for those plans and document
  • 9:59 - 10:01
    them plus you know lots of other things
  • 10:01 - 10:02
    right so we don't know what the solution
  • 10:02 - 10:04
    is you know if if we've maybe gone
  • 10:04 - 10:06
    through that solution around and we've
  • 10:06 - 10:08
    maybe implemented a change solution you
  • 10:08 - 10:09
    know
  • 10:09 - 10:11
    do we now need to account for
  • 10:11 - 10:13
    that right so if we've got that solution
  • 10:13 - 10:14
    there maybe we haven't accounted for it so
  • 10:14 - 10:16
    that could be something that in lines
  • 10:16 - 10:18
    about documentation or maybe a role of
  • 10:18 - 10:19
    responsibility with who is now
  • 10:19 - 10:20
    responsible for that maybe that was
  • 10:20 - 10:21
    missed
  • 10:21 - 10:23
    um there's obviously a lot of things
  • 10:23 - 10:24
    that come out of the Lessons Learned
  • 10:24 - 10:25
    basically what you're saying with
  • 10:25 - 10:27
    Lessons Learned is what have we defined
  • 10:27 - 10:28
    and what did we learn during that
  • 10:28 - 10:30
    exercise and then this could be through
  • 10:30 - 10:31
    a procurement this could be
  • 10:31 - 10:33
    technological this could be leadership
  • 10:33 - 10:35
    this could be documentation this could
  • 10:35 - 10:38
    be report this could be you know a bunch
  • 10:38 - 10:40
    of different areas that could improve
  • 10:40 - 10:42
    across through that continuous cycle of
  • 10:42 - 10:44
    improvement around our disaster recovery
  • 10:44 - 10:47
    and business continuity so you know
  • 10:47 - 10:49
    that's the three sort of areas around
  • 10:49 - 10:52
    testing our disaster recovery and
  • 10:52 - 10:54
    business continuity so going through
  • 10:54 - 10:56
    your walkthroughs and there's obviously
  • 10:56 - 10:57
    depending on the appetite of the
  • 10:57 - 10:59
    organization there's no right solution
  • 10:59 - 11:01
    here for for anyone it's just what works
  • 11:01 - 11:02
    and each customer or each people
  • 11:02 - 11:04
    business are at different phases right
  • 11:04 - 11:07
    you've got maybe six customers that are
  • 11:07 - 11:08
    doing you know cut over testing because
  • 11:08 - 11:09
    they're highly mature they've done
  • 11:09 - 11:11
    simulations they've done parallel
  • 11:11 - 11:12
    testing
  • 11:12 - 11:13
    and yeah they've just stepped up to the cutover
  • 11:13 - 11:14
    phase where they're doing actual
  • 11:14 - 11:16
    simulation of events but you've got
  • 11:16 - 11:18
    customers that are starting things out
  • 11:18 - 11:19
    and you know quite sensitive to these
  • 11:19 - 11:21
    things so you're going to run some
  • 11:21 - 11:23
    tabletops walk through scenarios and you
  • 11:23 - 11:25
    sort of gradually ease yourself into it
  • 11:25 - 11:26
    so
  • 11:26 - 11:28
    each of these have their own sort of
  • 11:28 - 11:30
    very specific areas there is no right
  • 11:30 - 11:32
    solution for you know there is no silver
  • 11:32 - 11:34
    bullet essentially so I hope you've
  • 11:34 - 11:36
    enjoyed this overview introduction into
  • 11:36 - 11:39
    testing of our disaster recovery and
  • 11:39 - 11:40
    business continuity I hope you've
  • 11:40 - 11:42
    enjoyed this video see you all in the
  • 11:42 - 11:44
    next video and thank you all for viewing
  • 11:44 - 11:47
    bye for now
Video Language:
English
Duration:
11:48
