Education Technology #14

PROMOTION: ULCC

W: www.edtechnology.co.uk | T: @Educ_Technology

Disaster Recovery 101

By Richard Booth, Infrastructure Service Manager, ULCC

Over the past few months we have had various queries regarding disaster recovery (DR) services and how they operate. In particular, there seems to be a growing number of Learning Technologists asking how they should protect their VLE resources in case of a disaster. One area where there appears to be some confusion is the roles played by data backups and service replication within disaster recovery. The distinction between the two is briefly explored below and should help people understand how each relates to the DR process.

Planning and KPIs

What is a disaster? Typically it is defined as a sudden, unplanned event that results in an organisation failing to provide critical business functions in line with contracted service levels. The most likely scenarios in the UK include power failures, network connectivity issues, infrastructure component failure, security breaches, loss of workplace access, fire and flooding.

In the context of IT service provision, a Disaster Recovery Plan (DRP) is essential to manage your business-critical systems should there be a seriously disruptive event. The question the DRP answers is: “If we lost our IT services, how would we recover them?” The DRP documents a comprehensive and consistent set of IT procedures to be followed before, during and after a disaster. The document should cover all aspects of the DR life cycle, including likely events and scenarios, invoking DR, governance, budgeting, recovery strategies, implementation, testing, incident records and process documentation.

Your DRP should also be owned by a manager who has sufficient responsibility to make organisational or business-level decisions. The IT director is usually the obvious choice for this role, as they should sit on the senior management team (SMT), hold the DR budget and have a good view of the various operational functions.

Recovery Time and Recovery Point Objective

If a disaster were to strike, your DRP should enable you to recover in the shortest possible time (Recovery Time Objective, or RTO) and with the least data loss (Recovery Point Objective, or RPO). As these KPIs are set by the organisation, the DRP should support these defined objectives. A DRP that doesn’t support the RPO and RTO is likely to lead to reputational and financial repercussions. However, a successful DRP will be your emergency manual if the worst happens, enabling you to restore your organisation’s IT functions in good time.

Backups and DR

The relationship between your backups and your DRP may not be as straightforward as first thought. Typically, backups are performed as daily dumps and also fall into broader weekly, monthly and yearly schedules. Their main aim is usually compliance and the granular recovery of data during normal workplace operation; for example, the recovery of single files from various point-in-time increments, or recovery from a single system failure. A DR strategy based on offsite backups doesn’t usually turn out to be the most cost-effective solution when you look at the whole picture. There can be a whole raft of planning, external hosting and testing considerations that get overlooked and add to budgetary pressures. Disaster recovery requires that the right people, software and supporting platforms are all present and available before your data can be restored in a meaningful way.

Using backups as your only DR solution is likely to substantially impact both your RPO and RTO aspirations. Your RPO will depend on when your last data dump was taken, which could be 24 hours or more ago, depending on your scheduling. Your RTO will also be entirely reliant on how quickly you can restore your data onto a re-provisioned platform, which could take several days.
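
To make the effect on RPO and RTO concrete, here is a minimal Python sketch of a backup-only strategy. Everything in it is an assumption for illustration: the class and function names are hypothetical, and the 24-hour dump interval, 72-hour re-provisioning time, 12-hour restore time and the 4-hour and 8-hour objectives are example values, not ULCC figures.

```python
from dataclasses import dataclass


@dataclass
class BackupOnlyStrategy:
    """Hypothetical model of a DR approach that relies only on offsite backups."""
    backup_interval_hours: float  # time between data dumps (assumed value)
    reprovision_hours: float      # time to rebuild a platform to restore onto (assumed)
    restore_hours: float          # time to load the last dump onto that platform (assumed)

    def worst_case_rpo_hours(self) -> float:
        # Worst case: the disaster strikes just before the next dump,
        # so everything written since the previous dump is lost.
        return self.backup_interval_hours

    def estimated_rto_hours(self) -> float:
        # Data cannot be restored until a platform exists to receive it,
        # so the two durations add up.
        return self.reprovision_hours + self.restore_hours


def meets_objectives(strategy, rpo_target_hours, rto_target_hours):
    """Return (rpo_ok, rto_ok) against the organisation's stated objectives."""
    return (
        strategy.worst_case_rpo_hours() <= rpo_target_hours,
        strategy.estimated_rto_hours() <= rto_target_hours,
    )


# Example values only: daily dumps, three days to re-provision, half a day to restore.
daily_dumps = BackupOnlyStrategy(
    backup_interval_hours=24, reprovision_hours=72, restore_hours=12
)
rpo_ok, rto_ok = meets_objectives(daily_dumps, rpo_target_hours=4, rto_target_hours=8)
print(f"Worst-case RPO: {daily_dumps.worst_case_rpo_hours()} h (meets target: {rpo_ok})")
print(f"Estimated RTO: {daily_dumps.estimated_rto_hours()} h (meets target: {rto_ok})")
```

With these example numbers, neither objective is met. The point of the sketch is only that the dump schedule bounds the RPO and the rebuild-plus-restore time bounds the RTO; real figures would come from your own DRP testing.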

Benefits of Replication

Data replication technologies have significantly improved the protection of services hosted on virtualised platforms. The main focus of replication is business continuity: ensuring that mission-critical systems remain highly available, even when a disaster happens. At ULCC, we have been using VMware’s Site Recovery Manager (SRM) to replicate some customer services from our datacentre in central London to our remote facility in Maidstone, Kent. This is an additional bolt-on service, which works seamlessly in the background and allows full, non-disruptive and auditable testing.

As an aside, one trend we are starting to see is that customers are becoming more interested in lowering their RPO at the expense of their RTO. Understandably, data loss seems to be becoming the main influencing KPI.

The original blog post first appeared on the ULCC Infrastructure Services Blog on 17/10/2014. To read the complete article visit: http://bit.ly/1HyWxON
