We've spent a couple of videos now discussing our DR testing and processes and our business continuity testing and processes. Let's jump in and take a look at some of the testing methodologies available to us for both disaster recovery and business continuity. I'm going to separate this out into a couple of areas, and then we'll work through them, because there are a few that I want to outline. The first one is walkthroughs, and we can outline a couple of those once I've finished writing that out.
A walkthrough is basically running tabletop exercises or scenarios. These are all theory based: you get a bunch of people together, maybe in a boardroom around a table, and you give them scenarios and ask how we're going to handle them. (That's horrible, isn't it? It's meant to say "tabletop scenarios". And I can't draw, so I'm going to remove that altogether and just write "boardroom". We're in a boardroom. Great.)
When we're in the boardroom, we bring everyone together, for example the DR team, and we sit around the table. Whoever is leading the exercise, the person driving the delivery of that scenario, walks the group through it. They'll say, "Okay, we're going to walk through scenario X. Walk me through how you're going to handle it." Around the table you've got your DR team, the IT people responsible for recovery. You may have your network engineering team, your systems people, your change management team, and maybe your dev or engineering teams, and so on. These are the IT people who are going to be responsible for the restoration process should we go down.
Let's run through an example: a site goes offline. I'm just spitballing here, so bear with me if I don't get everything right. Say we've got three sites: Brisbane, Sydney and Melbourne. They're all connected, and behind them we've got our backbone infrastructure connecting things as well. Site X, let's say the Brisbane branch, goes offline for whatever reason. Maybe someone walks into the data center, into the comms room, trips over a cable, and now our data center is offline. What do we do? We then walk through how someone is going to recover from that situation. It could be minor or it could be significant, depending on the scenario.

So the walkthrough is us talking it through. It carries the least risk of all our DR and business continuity testing, because we're talking but not actually doing anything. We involve the relevant people and parties, and we walk through the scenario based on those SMEs (subject matter experts) and their knowledge of how they're going to recover and bring the systems back up to normal in a time-sensitive manner. So that's walkthroughs.
On the other side we've got simulation, where we run actual simulations. A simulation could be a physical walkthrough, or what we call a mock event. We give it a very specific scenario and we walk through what we're actually going to do. If we're simulating the restoration of our backups, we would log into the backup application server. If we're using a specific vendor, we'd say: we're going to log into this server, we're going to click restore, and our process is to restore hard drive X from server Y. That's going to take maybe eight hours for a full recovery, and then that restored drive becomes the state we recover from, or whatever that process looks like.

You simulate up to the point of action, without actually clicking or doing anything. So: yes, I'm going to log into the server, I'm going to look around, here are our hypervisors, here's our infrastructure, and here's how we're going to restore from there. We're going to log into this vendor's portal page and get a copy of our off-site backups, whatever that process looks like. You run through that mock simulation, you touch the equipment, and you trial it out up to the point of doing it, but you don't actively execute the recovery.
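If your recovery steps are scripted, the "up to the point of action" idea from the simulation can be sketched as a dry-run mode. This is a minimal illustration, not a real tool: `RunbookStep`, `run_runbook` and the step descriptions are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RunbookStep:
    """One step of a recovery runbook (hypothetical structure)."""
    description: str
    action: Callable[[], None]  # the real recovery action


def run_runbook(steps: List[RunbookStep], simulate: bool = True) -> List[str]:
    """Walk every step in order; in simulate mode nothing is executed."""
    log = []
    for number, step in enumerate(steps, start=1):
        if simulate:
            # Talk through the step, but stop short of doing it.
            log.append(f"[SIMULATED] step {number}: {step.description}")
        else:
            step.action()
            log.append(f"[EXECUTED] step {number}: {step.description}")
    return log


# Stand-in actions that only record that they ran.
executed = []
steps = [
    RunbookStep("Log into the backup application server",
                lambda: executed.append("login")),
    RunbookStep("Select restore of hard drive X from server Y",
                lambda: executed.append("restore")),
    RunbookStep("Fetch a copy of the off-site backups from the vendor portal",
                lambda: executed.append("fetch")),
]

simulation_log = run_runbook(steps, simulate=True)
for line in simulation_log:
    print(line)
```

With `simulate=True` the `executed` list stays empty, which is the point of the mock event: every step is rehearsed and nothing is actually run.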
Then we've got the parallel test. With parallel testing we have two environments, say a prod environment and a test environment. In a parallel test we recover our production environment into that test environment. We go through the whole restore process, but we do not take production offline. We're testing to ensure the backups are working correctly. If there are any faults, lessons to learn, or issues that we need to define, then we know what they are, we're aware of them, and everyone knows what to do. We take our production environment and replicate it into our test environment, so we've got a test bed, and we see how that process looks. But we're not going to tinker with or touch production. Production remains online throughout the testing.
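One way to check that a parallel restore actually worked is to compare what landed in the test environment against the source. The sketch below compares file checksums between two directories. That technique is an assumption on my part, since the video doesn't say how to verify the restore, and the directory and function names are illustrative only.

```python
import hashlib
import tempfile
from pathlib import Path


def checksum_tree(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_restore(source: Path, restored: Path) -> dict:
    """Report files that are missing from, or differ in, the restored copy."""
    src, dst = checksum_tree(source), checksum_tree(restored)
    return {
        "missing": sorted(set(src) - set(dst)),
        "mismatched": sorted(
            name for name in src.keys() & dst.keys() if src[name] != dst[name]
        ),
    }


# Demo with throwaway directories standing in for prod and the test restore.
with tempfile.TemporaryDirectory() as tmp:
    prod = Path(tmp) / "prod"
    test_env = Path(tmp) / "test"
    prod.mkdir()
    test_env.mkdir()
    (prod / "a.txt").write_text("alpha")
    (prod / "b.txt").write_text("beta")
    (test_env / "a.txt").write_text("alpha")
    (test_env / "b.txt").write_text("corrupted")  # simulate a bad restore
    report = compare_restore(prod, test_env)

print(report)  # {'missing': [], 'mismatched': ['b.txt']}
```

Production is only read here, never modified, which matches the rule that a parallel test leaves prod online and untouched.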
The other part of that is our cutover, which is quite similar in nature. Again, take that prod and test scenario. Here we restore our prod service, then take prod offline and bring the restored service online. It's a full test, and there's interruption involved: we're interrupting production to do the switchover. Even if it's a minor interruption of a second or two, that's still an interruption. The cutover test is the full kit and caboodle. It's the full test, and it carries the highest risk, because if something goes wrong during the cutover, it's going to be an outage. So if you're going to do a cutover test, be very mindful that you've first done a parallel test or some sort of mock simulation. You've rehearsed it and you understand it. Don't just go and do a cutover straight away.
Even if you're a smaller environment and don't have much to impact, I'd still caution against it, because a lot of things can go wrong. We want to avoid any disruption, or keep it as minimal as possible. I probably wouldn't advise turning off the infrastructure or the servers per se. I'd keep them online and maybe disconnect them from their network ports. That way the servers remain online, and if anything does go wrong we can plug them back in and get things back up and running, depending on the complexity, how things are situated, and what depends on what. We want to make sure we're reducing risk and keeping our downtime as minimal as possible.

So again, the cutover is running through that simulation for real: actually doing everything and restoring it into your prod environment. You restore into test, and then you restore it back into prod. You go through the full cutover, so you turn off the appliances if you want to, or otherwise just disconnect them from the network, depending on how you want to run the cutover. Essentially, the cutover is running that full test.
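The advice to rehearse before a cutover can be written down as a simple gate. This is only a sketch: the names are hypothetical, and placing simulation below parallel in the risk ordering is my assumption. The video only states that walkthroughs carry the least risk and cutover the most.

```python
from enum import IntEnum


class DrTestType(IntEnum):
    """DR/BC test types, ordered from lowest to highest risk."""
    WALKTHROUGH = 1  # talk only, nothing is touched
    SIMULATION = 2   # rehearse up to the point of action
    PARALLEL = 3     # restore into test, prod stays online
    CUTOVER = 4      # prod is interrupted and switched over


def ready_for_cutover(completed: set) -> bool:
    """A cutover should follow a rehearsal: a parallel test or a simulation."""
    return bool(completed & {DrTestType.SIMULATION, DrTestType.PARALLEL})


print(ready_for_cutover({DrTestType.WALKTHROUGH}))
print(ready_for_cutover({DrTestType.WALKTHROUGH, DrTestType.PARALLEL}))
```

The first call reports that a walkthrough alone is not enough rehearsal. The second passes because a parallel test has been completed.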
Once we've done everything, we want to go over it and document. This is a vital part, equally as important as the rest, because we want to document and keep things updated. Take RPO and RTO, our recovery point objective and our recovery time objective. What do they look like, and did we meet those objectives? So I'm going to write "meet objectives", because everything is going to have some sort of metric associated with it. Did we meet it? Did it occur in the right manner, in the right time? Do we need to work on it? Did something go wrong? Is there room for improvement?
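The "did we meet our objectives" question can be answered from three timestamps recorded during a test. Here is a minimal sketch. The 4-hour RPO, the 8-hour RTO and all the timestamps are made-up illustrative values, and the function and field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Objectives:
    rpo: timedelta  # recovery point objective: maximum tolerable data loss
    rto: timedelta  # recovery time objective: maximum tolerable downtime


def evaluate_test(objectives: Objectives, last_good_backup: datetime,
                  outage_start: datetime, service_restored: datetime) -> dict:
    """Compare the measured data loss and downtime against the objectives."""
    data_loss = outage_start - last_good_backup  # work lost since last backup
    downtime = service_restored - outage_start   # time taken to recover
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= objectives.rpo,
        "rto_met": downtime <= objectives.rto,
    }


# Illustrative numbers only.
result = evaluate_test(
    Objectives(rpo=timedelta(hours=4), rto=timedelta(hours=8)),
    last_good_backup=datetime(2024, 1, 10, 2, 0),
    outage_start=datetime(2024, 1, 10, 5, 0),
    service_restored=datetime(2024, 1, 10, 14, 30),
)
print(result["rpo_met"], result["rto_met"])  # True False
```

In this made-up run the data loss is 3 hours, inside the RPO, but recovery took 9.5 hours and missed the RTO. That miss is the kind of finding that goes into the test documentation.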
Room for improvement is an item in its own right, because there's going to be something that needs improving. Did we do something wrong? Were we not aware of something? Did someone need training to do something? There's a multitude of different issues that we can improve on. So that's one.
The third point I want to mention is lessons learned. Lessons learned is: what are our key takeaways? Did we identify something that needs updating because something was missed? Did we maybe change a backup solution and not know how to use it, and do we now need to account for that in the plans and document it? If we've implemented a changed solution, we now need to account for it. That could be something that falls under documentation, or maybe a role or responsibility: who is now responsible for that? Maybe that was missed. A lot of things come out of lessons learned. Basically, lessons learned is what we identified and what we learned during the exercise. That could be in procurement, technology, leadership, documentation, reporting, or a bunch of other areas that could improve through that continuous cycle of improvement around our disaster recovery and business continuity.
So those are the areas around testing our disaster recovery and business continuity. Which ones you use depends on the appetite of the organization. There's no single right solution here for anyone. It's just what works, and each customer or business is at a different phase. You've got some customers that are doing cutover testing because they're highly mature: they've done simulations, they've done parallel testing, and they've stepped up to the cutover phase where they're doing actual simulation of events. But you've also got customers that are just starting out and are quite sensitive to these things, so they're going to run some tabletop walkthrough scenarios and gradually ease into it. Each of these has its own specific place. There is no silver bullet.

I hope you've enjoyed this introductory overview of testing our disaster recovery and business continuity. See you all in the next video, and thank you all for viewing. Bye for now.