Types of Disaster Recovery and Business Continuity Testing: A Comprehensive Overview

0:03 - 0:05

We've spent a long time or a couple of
0:05 - 0:08

videos now discussing our DR testing and
0:08 - 0:10

processes and our business continuity
0:10 - 0:13

testing and processes. Let's jump in and
0:13 - 0:14

take a look at some of the testing
0:14 - 0:17

methodologies across both, with
0:17 - 0:20

respect to our disaster recovery
0:20 - 0:22

and business continuity. And what's
0:22 - 0:24

available to us, so in order for us to
0:24 - 0:26

test and plan, let's take a look at what
0:26 - 0:28

some of those are. I'm going to separate
0:28 - 0:30

this out into a couple of areas here,
0:30 - 0:32

and then we'll just sort of work through
0:32 - 0:33

this because there's a couple that I
0:33 - 0:35

want to, sort of, outline. Now the first
0:35 - 0:38

one is walkthroughs. Now we can outline a
0:38 - 0:40

couple of walkthroughs as we, just
0:40 - 0:42

finish writing that out.
0:42 - 0:45

Walkthroughs are basically running our
0:45 - 0:47

tabletop exercises or scenarios, right? So
0:47 - 0:50

these are all, sort of, theory based and
0:50 - 0:53

they're, sort of, tabletop scenarios, and
0:53 - 0:54

you may have a bunch of people that, sort
0:54 - 0:56

of, come around.
0:56 - 0:59

That's horrible, isn't it? Tabletop
0:59 - 1:02

scenarios, I'll just write 'scen' for short. And
1:02 - 1:03

these are a bunch of people that we can
1:03 - 1:05

maybe get together in a boardroom so
1:05 - 1:07

we've got a table here, and we can all
1:07 - 1:09

just come around here, you might have a
1:09 - 1:11

few people. And basically what we can do
1:11 - 1:13

is we basically give scenarios on how
1:13 - 1:14

we're going to handle, and these are
1:14 - 1:17

obviously, you know, your-
1:17 - 1:19

Okay, I can't draw so I'm just going to
1:19 - 1:21

remove that all together and just write
1:21 - 1:24

boardroom. We're in a boardroom.
1:24 - 1:27

Okay, that's good. Boardroom, great! So
1:27 - 1:29

basically when we're in the boardroom, we
1:29 - 1:31

bring everyone around us and do, and
1:31 - 1:33

perform either from the DR team, and we
1:33 - 1:35

sit around a tabletop and then, you know,
1:35 - 1:37

the leader of that, you know, who's
1:37 - 1:39

driving that, who's got the initiation, or
1:39 - 1:41

the actual delivery focus of that, picks a
1:41 - 1:43

scenario and we basically walk through
1:43 - 1:45

that scenario. So they'll say okay, well
1:45 - 1:46

we're gonna
1:46 - 1:49

walk through X.
1:49 - 1:51

Walk me through this scenario on
1:51 - 1:53

how you're going to handle this
1:53 - 1:55

scenario. And then you've got, you know,
1:55 - 1:57

your DR team which are your IT people
1:57 - 1:58

that are responsible for that. You may have
1:58 - 2:00

your network team,
2:00 - 2:01

you know, your network engineering team.
2:01 - 2:05

You have your systems guys, girls. You
2:05 - 2:08

may have your, you know, your change
2:08 - 2:10

management team there, you may have, you
2:10 - 2:12

know, your engineering. Maybe you've got
2:12 - 2:15

your dev teams or dev
2:15 - 2:17

engineering team, you know, and so on,
2:17 - 2:19

right? So you've got your IT responsible
2:19 - 2:20

IT people that are going to be
2:20 - 2:21

responsible for that process of
2:21 - 2:23

restoration should we go down that. And
2:23 - 2:26

we'll run through, you know,
2:26 - 2:28

site goes offline and, you know,
2:28 - 2:30

we've got three sites.
2:30 - 2:33

Give an example that we've got three sites.
2:33 - 2:35

Now I'm just sort of spitballing here so
2:35 - 2:36

bear with me if I don't get anything
2:36 - 2:38

right, but I'm just going to walk through
2:38 - 2:42

this. So we've got three sites, alrighty.
2:42 - 2:47

Site X goes down, so this is location,
2:48 - 2:50

I don't know, Brisbane. Now Brisbane
2:50 - 2:53

branch has gone offline, well, what do we
2:53 - 2:54

do?
2:54 - 2:56

Okay this is Sydney.
2:56 - 2:58

This is Melbourne. And obviously all
2:58 - 2:59

these are all connected and such. And
2:59 - 3:01

then we've got our backbones back here,
3:01 - 3:03

obviously they're connecting things as
3:03 - 3:04

well. So obviously these are all our
3:04 - 3:07

backbone infrastructure. Well Brisbane
3:07 - 3:09

location goes offline for whatever reason.
3:09 - 3:12

Someone, you know, walks into the data
3:12 - 3:13

center, into the comms room and they've
3:13 - 3:16

tripped over the cable and now our data
3:16 - 3:18

center is offline. Well what do we do? And
3:18 - 3:19

then we walk through that scenario on
3:19 - 3:21

how someone's going to recover from that
3:21 - 3:23

situation. Could be minor, it could be
3:23 - 3:24

significant depending on the scenario. So
3:24 - 3:26

basically the walkthrough is us walking
3:26 - 3:29

through. It's the least amount of risk
3:29 - 3:31

with our DR and obviously business
3:31 - 3:33

continuity testing because we're talking,
3:33 - 3:35

but we're not actually doing anything. So
3:35 - 3:37

again, we're going to involve the,
3:37 - 3:39

you know, the relevant people in the
3:39 - 3:40

parties, and then we're going to walk
3:40 - 3:41

through that scenario based on those
3:41 - 3:45

SMEs and obviously, their knowledge
3:45 - 3:46

of how they're going to recover and
3:46 - 3:48

bring the systems back up to normal in
3:48 - 3:51

obviously, a time-sensitive approach. So
3:51 - 3:53

that's walkthroughs.
3:53 - 3:54

To the other side we've got
3:54 - 3:57

simulation. So we could run actual
3:57 - 3:59

simulations. Now simulation could be a
3:59 - 4:01

physical walkthrough or basically what
4:01 - 4:04

we call something like a mock event,
4:04 - 4:06

and we give it a scenario, we give a very
4:06 - 4:09

specific scenario, and we walk through
4:09 - 4:11

what we're actually going to do. So we
4:11 - 4:13

simulate what we're going to do. If we're
4:13 - 4:15

using backups or the restoration of our
4:15 - 4:17

backups process, well we would log into
4:17 - 4:19

the backup app server. So if we're using
4:19 - 4:21

a specific vendor, we'll say okay, well
4:21 - 4:22

we're going to log into this server,
4:22 - 4:25

we're going to click our restoration, you
4:25 - 4:26

know, and then our process is, you know,
4:26 - 4:29

restore hard drive X from, you know,
4:29 - 4:31

server Y, and then that's going to take
4:31 - 4:34

maybe eight hours to do a full recovery,
4:34 - 4:36

and then I'm going to take that hard
4:36 - 4:37

drive, and then that's going to be our
4:37 - 4:39

[inaudible] from how we're going to recover or
4:39 - 4:40

whatever that process looks like. So
4:40 - 4:43

you'll simulate to the point of not
4:43 - 4:45

actually clicking
4:45 - 4:48

or doing anything, it's to the point of
4:48 - 4:50

action, right? So you're gonna, yes, I'm
4:50 - 4:52

gonna log into the server, I'm going to
4:52 - 4:54

look around, here's our hypervisors,
4:54 - 4:56

here's our infrastructure, and here's how
4:56 - 4:57

we're going to restore that process from
4:57 - 4:58

there. We're going to log into this
4:58 - 5:00

vendor's portal page, we're going to get a
5:00 - 5:03

copy of our off-site backups, whatever
5:03 - 5:04

that process looks like, right? So you run
5:04 - 5:07

through that mock simulation.
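
The "up to the point of action" idea can be sketched as a runbook with a dry-run switch. This is only an illustration: the `Step` class, the `run_runbook` function and the two example steps are hypothetical names, not part of any backup vendor's tooling.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    description: str
    action: Callable[[], None]  # the real work; never called in a dry run


def run_runbook(steps: List[Step], dry_run: bool = True) -> List[str]:
    """Walk every step in order; execute the action only when dry_run is False."""
    log = []
    for number, step in enumerate(steps, start=1):
        if dry_run:
            # Simulation: name the step, but stop short of doing it.
            log.append(f"[simulated] {number}. {step.description}")
        else:
            step.action()
            log.append(f"[executed] {number}. {step.description}")
    return log


executed = []  # records which actions really ran (hypothetical stand-in for real work)
runbook = [
    Step("Log into the backup server", lambda: executed.append("login")),
    Step("Select restore of drive X from server Y", lambda: executed.append("restore")),
]
simulation_log = run_runbook(runbook, dry_run=True)
```

With `dry_run=True` the team reads through every step against the real consoles, and nothing in `executed` changes; flipping the flag is what would turn the rehearsal into an actual recovery.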
5:07 - 5:09

You touch the equipment, you trial it out,
5:09 - 5:11

but to the point of doing it but not
5:11 - 5:13

actively executing it. So you're not
5:13 - 5:15

going to go away and actually execute
5:15 - 5:17

your recovery, you're just going to
5:17 - 5:19

basically, simulate it up to the point of
5:19 - 5:21

doing it. From here on, then we've
5:21 - 5:25

got something to do with a parallel
5:26 - 5:29

test. And parallel testing is something
5:29 - 5:31

like, basically if we have two
5:31 - 5:32

environments, and you might have
5:32 - 5:35

something like a prod
5:35 - 5:38

and test environment
5:38 - 5:40

that is a part of this test. And then with the
5:40 - 5:42

parallel test we would recover our
5:42 - 5:44

production environment in that test
5:44 - 5:46

environment. So we would go through all
5:46 - 5:48

the restore process, but not take
5:48 - 5:50

production offline, so I'm going to say
5:50 - 5:53

not offline.
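
One way to check a parallel restore without touching production is to compare checksums of the restored files against a read-only reference copy. The sketch below assumes both copies are reachable as directories; the function names and the directory layout are hypothetical, and a real environment might compare database dumps or VM images instead.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def checksums(root: Path) -> Dict[str, str]:
    """Map each file's path relative to root to its SHA-256 digest."""
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[str(path.relative_to(root))] = digest
    return result


def compare_restore(reference: Path, restored: Path) -> List[str]:
    """Return relative paths that are missing or differ in the restored copy."""
    expected = checksums(reference)  # read-only pass over the reference copy
    actual = checksums(restored)     # the copy restored into the test environment
    return sorted(name for name, digest in expected.items()
                  if actual.get(name) != digest)
```

An empty list means every reference file came back intact in the test environment; anything listed is a fault to record before a cutover is considered.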
5:54 - 5:56

This would basically just be doing, you know,
5:56 - 5:57

we're just going to go away, we're going
5:57 - 5:58

to test and ensure the backups are
5:58 - 6:01

working correctly. If there are any faults
6:01 - 6:02

or lessons to learn or issues that we
6:02 - 6:04

need to define, then we know what they
6:04 - 6:05

are, we're aware of those, and everyone
6:05 - 6:07

knows what to do. So we're not taking
6:07 - 6:09

production offline, production remains
6:09 - 6:10

online. We're just going to take our
6:10 - 6:12

obviously,
6:12 - 6:15

take our, recover our production
6:15 - 6:16

environments, we're going to take our
6:16 - 6:17

prod environment, and then we're going
6:17 - 6:18

to replicate that into our test
6:18 - 6:20

environment. So we've got a test bed and
6:20 - 6:22

we're going to see how that process kind
6:22 - 6:23

of looks, but we're not going to tinker
6:23 - 6:25

with or touch our production, and
6:25 - 6:27

production will remain online and
6:27 - 6:29

testing. Now the other part of that is
6:29 - 6:30

our cutover and the cutover is quite
6:30 - 6:33

similar in that nature.
6:33 - 6:35

Again that, sort of, prod test
6:35 - 6:37

scenario, so I'm going to use that. So
6:37 - 6:40

let's just go prod
6:40 - 6:42

and then test.
6:42 - 6:44

And similar to that where we would go,
6:44 - 6:45

well, we're going to restore our prod
6:45 - 6:48

servers and then take prod offline and
6:48 - 6:50

bring the restored servers online. So
6:50 - 6:52

it's a full test, there's interruption
6:52 - 6:54

involved,
6:54 - 6:55

you know, obviously interrupting
6:55 - 6:57

production as well. So we're going to
6:57 - 6:59

obviously do the switch over and
6:59 - 7:01

obviously interruption of some sort,
7:01 - 7:03

right? Now even if it's a minor
7:03 - 7:05

interruption of, you know, a second or two,
7:05 - 7:07

that's still an
7:07 - 7:09

interruption, right? So there will be some
7:09 - 7:11

sort of interruption, but the cutover
7:11 - 7:13

test is the full kit and caboodle, right?
7:13 - 7:15

It's the full test, [inaudible], it's the
7:15 - 7:17

highest risk because if something does
7:17 - 7:20

go wrong during that cutover,
7:20 - 7:21

obviously then it's going to be an
7:21 - 7:23

outage. So you have to be very mindful of
7:23 - 7:25

if you're going to do a cutover in any
7:25 - 7:27

state of testing, that you've either done
7:27 - 7:29

the parallel test or you've done some sort
7:29 - 7:30

of mock simulation, you've sort of
7:30 - 7:32

rehearsed it, you understood it, not just
7:32 - 7:34

go and do a cutover straight away. Now
7:34 - 7:35

if you're a smaller environment and you
7:35 - 7:37

don't have really much to impact,
7:37 - 7:39

I'm still cautioning against it because
7:39 - 7:42

a lot of things can go wrong. We want to
7:42 - 7:44

avoid any disruption or keep that as
7:44 - 7:49

minimal as possible. Again, I
7:49 - 7:51

probably wouldn't advise that we turn
7:51 - 7:52

off the infrastructure or turn off the
7:52 - 7:54

servers per se, I'd probably keep them
7:54 - 7:55

online or maybe disconnect them from
7:55 - 7:57

their network ports, that way the servers
7:57 - 7:58

still remain online if anything does go
7:58 - 8:01

wrong, we can obviously plug them in and
8:01 - 8:02

obviously, you know, get things back up
8:02 - 8:04

and running depending on, you know, the
8:04 - 8:05

complexity and depending on how things
8:05 - 8:07

are situated and what's dependent on
8:07 - 8:09

what. So we want to make sure that we're
8:09 - 8:12

reducing risk and keeping our downtime
8:12 - 8:15

as minimal as possible. So again, the
8:15 - 8:16

cutover is running through that actual
8:16 - 8:18

simulation and actually doing everything,
8:18 - 8:20

and then restoring it into your
8:20 - 8:22

prod environment. So you will go away,
8:22 - 8:23

you'll restore your test, and then you'll
8:23 - 8:25

restore it back into prod. Again, you
8:25 - 8:27

would go through the full cutover, so you
8:27 - 8:29

will turn off the appliances if you do
8:29 - 8:30

want to, otherwise you can just
8:30 - 8:31

disconnect them from the network
8:31 - 8:33

connection, you know, depending on how you
8:33 - 8:34

actually want to run the cutover, but
8:34 - 8:35

the cutover essentially is running that
8:35 - 8:38

full test. From there,
8:38 - 8:40

once we've done everything, then we want
8:40 - 8:42

to go over and document. And this is the
8:42 - 8:45

most vital part as well as equally
8:45 - 8:47

important as the rest of them because we
8:47 - 8:49

are going to want to document and keep
8:49 - 8:53

things updated, right? So RPO,
8:53 - 8:55

RTO. So
8:55 - 8:59

our recovery point objectives, so
8:59 - 9:01

what is our recovery point? What's our recovery
9:01 - 9:04

time objective? What do they look like? So
9:04 - 9:06

did we meet those objectives? So I'm
9:06 - 9:08

going to say meet
9:08 - 9:10

objectives because obviously
9:10 - 9:11

everything's going to have some sort of
9:11 - 9:13

metrics associated with it. So did we
9:13 - 9:15

meet this? Did this occur in the right
9:15 - 9:17

manner, at the right time? Do we need to
9:17 - 9:18

work on it? Did something go wrong? Is
9:18 - 9:21

there room for improvement? So room for
9:21 - 9:23

improvement,
9:23 - 9:24

right, that's an 'I'. Room for
9:24 - 9:26

improvement because chances are, there's
9:26 - 9:27

something that's going to need improvement, right?
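
The "did we meet objectives" question can be turned into a small check. This is a minimal sketch: RPO is taken as the recovery point objective (how much data loss is acceptable) and RTO as the recovery time objective (how long the outage may last); the `DrTestResult` fields and the example figures are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrTestResult:
    outage_start: datetime      # when the real or simulated outage began
    last_good_backup: datetime  # timestamp of the backup that was restored
    service_restored: datetime  # when the service was usable again


def meets_objectives(result: DrTestResult, rpo: timedelta, rto: timedelta) -> dict:
    """Compare the measured data-loss window and downtime against RPO and RTO."""
    data_loss_window = result.outage_start - result.last_good_backup
    downtime = result.service_restored - result.outage_start
    return {
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
        "data_loss_window": data_loss_window,
        "downtime": downtime,
    }


# Hypothetical test: backup taken at 05:00, outage at 09:00, service back at 17:00.
example = DrTestResult(
    outage_start=datetime(2024, 1, 1, 9, 0),
    last_good_backup=datetime(2024, 1, 1, 5, 0),
    service_restored=datetime(2024, 1, 1, 17, 0),
)
report = meets_objectives(example, rpo=timedelta(hours=6), rto=timedelta(hours=4))
```

Here the four-hour data-loss window fits inside the assumed six-hour RPO, but the eight-hour restore misses the assumed four-hour RTO, which is exactly the kind of gap that feeds the room-for-improvement list.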
9:27 - 9:29

Did we do something wrong? Were we not
9:29 - 9:30

aware of something? Did somebody need
9:30 - 9:32

some training to do something else? You
9:32 - 9:34

know, it's a multitude of
9:34 - 9:36

different issues that,
9:36 - 9:39

you know, we can improve on. So
9:39 - 9:42

that's one, and then the third point here
9:42 - 9:43

that I want to sort of mention is
9:43 - 9:46

lessons learned. So lessons learned is what
9:46 - 9:47

are our key takeaways? Did we identify
9:47 - 9:49

something that needs updating because
9:49 - 9:52

something was missed? Did we maybe
9:52 - 9:55

change a backup solution and did we
9:55 - 9:57

not know how to, you know, do we now need
9:57 - 9:59

to account for those plans and document
9:59 - 10:01

them? Plus, you know, lots of other things,
10:01 - 10:02

right? So we don't know what the solution
10:02 - 10:04

is, you know, if we've maybe gone
10:04 - 10:06

through that solution [inaudible] and we've
10:06 - 10:08

maybe implemented a changed solution, you
10:08 - 10:09

know,
10:09 - 10:11

do we now need to account for
10:11 - 10:13

that, right? So if we've got that solution
10:13 - 10:14

there, maybe we haven't accounted for it. So
10:14 - 10:16

that could be something that aligns
10:16 - 10:18

with our documentation or maybe a role or
10:18 - 10:19

responsibility with who is now
10:19 - 10:20

responsible for that? Maybe that was
10:20 - 10:21

missed.
10:21 - 10:23

There's obviously a lot of things
10:23 - 10:24

that come out of the lessons learned.
10:24 - 10:25

Basically what you're saying is
10:25 - 10:27

lessons learned is what have we defined
10:27 - 10:28

and what did we learn during that
10:28 - 10:30

exercise? And then this could be through
10:30 - 10:31

a procurement, this could be
10:31 - 10:33

technological, this could be leadership,
10:33 - 10:35

this could be documentation, this could
10:35 - 10:38

be a report, this could be, you know, a bunch
10:38 - 10:40

of different areas that could improve
10:40 - 10:42

across that continuous cycle of
10:42 - 10:44

improvement around our disaster recovery
10:44 - 10:47

and business continuity. So, you know,
10:47 - 10:49

that's the three, sort of, areas around
10:49 - 10:52

testing our disaster recovery and
10:52 - 10:54

business continuity. So going through
10:54 - 10:56

your walkthroughs and there's obviously,
10:56 - 10:57

depending on the appetite of the
10:57 - 10:59

organization, there's no right solution
10:59 - 11:01

here for anyone. It's just what works
11:01 - 11:02

and each customer or each people,
11:02 - 11:04

business are at different phases, right?
11:04 - 11:07

You've got maybe six customers that are
11:07 - 11:08

doing, you know, cutover testing because
11:08 - 11:09

they're highly mature, they've done
11:09 - 11:11

simulations, they've done parallel
11:11 - 11:12

testing,
11:12 - 11:13

and now they just set up their cutover
11:13 - 11:14

phase where they're doing actual
11:14 - 11:16

simulation of events. But you got
11:16 - 11:18

customers that are starting things out
11:18 - 11:19

and, you know, are quite sensitive to these
11:19 - 11:21

things, so you're going to run some
11:21 - 11:23

tabletops, walkthrough scenarios, and you,
11:23 - 11:25

sort of, gradually ease yourself into it.
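
The idea of easing in, and of not jumping straight to a cutover, can be sketched as an ordered list of test types with a simple readiness gate. The `DrTest` enum and both functions are hypothetical; the ordering follows the order the tests are presented in this video, with the walkthrough as the lowest risk and the cutover as the highest.

```python
from enum import IntEnum
from typing import Set


class DrTest(IntEnum):
    """Test types in the order presented, from lowest to highest risk."""
    WALKTHROUGH = 1
    SIMULATION = 2
    PARALLEL = 3
    CUTOVER = 4


def ready_for_cutover(completed: Set[DrTest]) -> bool:
    """Allow a cutover only after a simulation or a parallel test has been rehearsed."""
    return DrTest.SIMULATION in completed or DrTest.PARALLEL in completed


def next_step(completed: Set[DrTest]) -> DrTest:
    """Suggest the lowest-risk test type not yet completed (cutover once all are done)."""
    for test in DrTest:
        if test not in completed:
            return test
    return DrTest.CUTOVER
```

An organization that has only run tabletops would get `DrTest.SIMULATION` as its next step and would fail the cutover gate, which matches the caution above about rehearsing before interrupting production.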
11:25 - 11:26

So
11:26 - 11:28

each of these have their own, sort of,
11:28 - 11:30

very specific areas. There is no right
11:30 - 11:32

solution for, you know, there is no silver
11:32 - 11:34

bullet essentially. So I hope you've
11:34 - 11:36

enjoyed this overview/introduction into
11:36 - 11:39

testing of our disaster recovery and
11:39 - 11:40

business continuity. I hope you've
11:40 - 11:42

enjoyed this video. See you all in the
11:42 - 11:44

next video and thank you all for viewing.
11:44 - 11:47

Bye for now.

Title:: Types of Disaster Recovery and Business Continuity Testing: A Comprehensive Overview
Video Language:: English
Duration:: 11:48

	OEVIDEOS edited English subtitles for Types of Disaster Recovery and Business Continuity Testing: A Comprehensive Overview