Now more than ever, businesses need to understand the risks associated with their daily processes and make changes to safeguard their future and the financial security of their employees and shareholders. Most companies with any kind of IT infrastructure know that they need a Disaster Recovery (DR) plan. Business Continuity (BC) picks up where DR leaves off.
The scope of DR is usually limited to certain situations in IT, whereas an adequate BC plan starts with a full Risk Assessment (RA) of all critical business applications and procedures for every business function in the organization. The Business Impact Analysis (BIA) is crucial for setting the correct priorities when creating your living Business Continuity Plan (BCP) for the Microsoft Data Platform.
1. Auditing
As mentioned previously, the BIA is an essential part of any organization’s BCP, and setting its scope correctly is critical. The BIA should contain several key components, including:
- Exploratory phase to reveal any vulnerabilities in business processes
- Financial audit of the cost to the business should any of these risks occur
- Strategy phase to remove these risks or reduce the chance of them occurring
- Reporting phase to pull all this information together for Senior Management to review and allocate the necessary budget
However, you should not forget that staff are a resource just like your IT infrastructure. Areas such as Health and Safety are often overlooked. If there were a natural disaster, how many employees with medical training do you have? Are they all allowed to book annual leave at the same time? Would there be sufficient medical supplies? Loss of human life is always tragic. From a business perspective, it also carries the additional cost of lost knowledge about the business. This is the reason some companies ban key employees from traveling together.
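The financial audit step in the BIA can be approximated by weighting each risk's cost by how often it is expected to occur, then ranking the results. The sketch below is one common way to do that and is not taken from the white paper; the `Risk` class, the `rank_risks` helper, and every figure are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str               # hypothetical risk label
    cost_per_event: float   # estimated cost to the business if the risk occurs
    events_per_year: float  # estimated frequency (0.5 = once every two years)

    @property
    def expected_annual_loss(self) -> float:
        # Cost weighted by how often the event is expected to happen.
        return self.cost_per_event * self.events_per_year


def rank_risks(risks):
    """Return risks ordered from highest to lowest expected annual loss."""
    return sorted(risks, key=lambda r: r.expected_annual_loss, reverse=True)


# Illustrative figures only, not taken from any real assessment.
risks = [
    Risk("Office Internet outage", 5_000, 2),
    Risk("Primary database server failure", 40_000, 0.5),
    Risk("Loss of a key staff member", 60_000, 0.25),
]

for risk in rank_risks(risks):
    print(f"{risk.name}: {risk.expected_annual_loss:,.0f} per year")
```

A ranking like this gives Senior Management a single comparable number per risk, including people-related risks, when allocating budget.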
2. Setting SLAs
Several service-level agreements (SLAs) need to be factored into BCPs. Below are three of the most prevalent:
- Recovery Point Objective (RPO)—The amount of data that a business is willing to lose from a downtime event.
- Recovery Time Objective (RTO)—The target time for the resumption of a critical activity after an event.
- Maximum Tolerable Period of Disruption (MTPD)—The point at which an organization’s viability will be threatened if critical activities cannot be resumed.
It is critical to factor these SLA definitions into your BCP. Please note that this is not a one-size-fits-all approach: some applications and processes can tolerate more data loss than others. Once the RA has taken place and your BIA has been completed, you can start to assign the correct SLAs to these processes.
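As a minimal sketch of assigning SLAs per process, the three objectives can be recorded side by side and checked for consistency. The process names and durations are hypothetical, and expressing RPO as a time window (rather than a data volume) is an assumption made here for simplicity.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ProcessSla:
    process: str
    rpo: timedelta   # tolerable data loss, expressed as a time window (assumption)
    rto: timedelta   # target time to resume the activity
    mtpd: timedelta  # point at which the organization's viability is threatened

    def is_consistent(self) -> bool:
        # Recovery must be targeted before the disruption becomes intolerable.
        return self.rto <= self.mtpd

    def backup_interval_meets_rpo(self, interval: timedelta) -> bool:
        # Backups taken every `interval` can lose up to `interval` of data.
        return interval <= self.rpo


# Hypothetical values; real ones come out of the RA and BIA.
slas = [
    ProcessSla("Order processing", timedelta(minutes=15),
               timedelta(hours=1), timedelta(hours=4)),
    ProcessSla("Monthly reporting", timedelta(hours=24),
               timedelta(days=3), timedelta(days=2)),
]

for sla in slas:
    status = "ok" if sla.is_consistent() else "RTO exceeds MTPD"
    print(f"{sla.process}: {status}")
```

Keeping the objectives per process, rather than one set for the whole organization, is what lets a critical system carry a tight RPO while a reporting job carries a loose one.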
3. Monitoring Your Business Processes, Not Just Your Infrastructure
“Lights On” monitoring is the most basic form of monitoring that an organization can deploy. It is usually a set of simple scripts or a freeware application that sends some form of notification when a server has gone offline. To be able to provide a high level of service to your internal and external customers, you will require much more information. More importantly, you should be ready and able to act on that information to provide an alternative solution to keep that business process running.
Monitoring is your first line of defense when it comes to DR, high availability (HA), and BC planning. More advanced monitoring solutions offer the ability to trigger events when certain scenarios occur. The best monitoring solutions allow you to create these scenarios yourself rather than choosing from a limited list of predetermined events. This empowers you to implement and automate your own BC processes within such a solution.
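To illustrate what a user-defined scenario might look like, the sketch below pairs a condition on business-process metrics with an action that runs when the condition is met. It does not represent any particular product's API; `Condition`, `evaluate`, the metric names, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class Condition:
    name: str
    predicate: Callable[[Metrics], bool]  # when should this scenario fire?
    action: Callable[[Metrics], str]      # what to do; returns a short summary


def evaluate(conditions: List[Condition], metrics: Metrics) -> List[str]:
    """Run the action of every condition whose predicate matches."""
    return [c.action(metrics) for c in conditions if c.predicate(metrics)]


# Hypothetical business-process scenarios, not just "is the server up?".
conditions = [
    Condition(
        name="Order backlog",
        predicate=lambda m: m["unshipped_orders"] > 200,
        action=lambda m: f"backlog of {m['unshipped_orders']:.0f} orders: reroute to secondary queue",
    ),
    Condition(
        name="Invoicing stalled",
        predicate=lambda m: m["minutes_since_last_invoice"] > 60,
        action=lambda m: "invoicing stalled: page finance on-call",
    ),
]

sample = {"unshipped_orders": 250, "minutes_since_last_invoice": 12}
for summary in evaluate(conditions, sample):
    print(summary)
```

Because the predicates are ordinary functions, a scenario can combine infrastructure and business metrics in whatever way the BIA calls for.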
4. Communicating Issues
Notifying the correct people at the correct time with the correct information to implement next steps is critical to the success of any BCP. The best monitoring solutions will allow you to send notifications using several different platforms, including Slack, Skype, email, and pager.
One often overlooked aspect of BC is noise. Extreme noise and complete silence present different problems, and both need to be addressed. A lack of notifications can, in and of itself, be considered a good thing. However, have you considered that an Internet outage or your email server going down could prevent notifications from reaching you? A best practice to guard against this is to send deliberate test notifications (intentional false positives) at set times to confirm that notifications are still being received.
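One way to act on scheduled test notifications is to treat them as a heartbeat and raise the alarm when one fails to arrive. The following is a minimal sketch under that assumption; the function name, the interval, and the grace period are hypothetical.

```python
from datetime import datetime, timedelta


def heartbeat_overdue(
    last_received: datetime,
    now: datetime,
    interval: timedelta,
    grace: timedelta = timedelta(minutes=5),
) -> bool:
    """Return True if the expected test notification has not arrived in time.

    `interval` is how often the test notification is sent; `grace` allows
    for normal delivery delay before silence is treated as a fault.
    """
    return now - last_received > interval + grace


# Hypothetical schedule: a test notification is expected every hour.
last_seen = datetime(2024, 1, 1, 9, 0)
print(heartbeat_overdue(last_seen, datetime(2024, 1, 1, 10, 3), timedelta(hours=1)))
print(heartbeat_overdue(last_seen, datetime(2024, 1, 1, 10, 30), timedelta(hours=1)))
```

For this to be useful, the check has to run somewhere that does not depend on the same Internet link or email server it is meant to watch.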
Extreme noise can lead to lapses in concentration. It is human nature to look for patterns in things; once noise becomes a pattern, there is a risk that action will not be taken when a truly critical event is hidden in the background noise. When you are creating your conditions, make sure that you tune your notifications carefully to filter out unwanted noise by setting the correct scenarios for each condition.
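One simple tuning technique is to suppress repeats of the same alert within a cooldown window while always letting critical events through. The sketch below assumes that approach; `AlertThrottle`, the severity labels, and the 30-minute window are hypothetical and are not features of any specific tool.

```python
from datetime import datetime, timedelta
from typing import Dict


class AlertThrottle:
    """Suppress repeated non-critical alerts inside a cooldown window."""

    def __init__(self, cooldown: timedelta) -> None:
        self.cooldown = cooldown
        self._last_sent: Dict[str, datetime] = {}

    def should_notify(self, key: str, severity: str, now: datetime) -> bool:
        # Critical events are never hidden behind the cooldown.
        if severity != "critical":
            last = self._last_sent.get(key)
            if last is not None and now - last < self.cooldown:
                return False
        self._last_sent[key] = now
        return True


throttle = AlertThrottle(cooldown=timedelta(minutes=30))
start = datetime(2024, 1, 1, 9, 0)

print(throttle.should_notify("disk-space:sql01", "warning", start))
print(throttle.should_notify("disk-space:sql01", "warning", start + timedelta(minutes=10)))
print(throttle.should_notify("disk-space:sql01", "critical", start + timedelta(minutes=11)))
```

The alert key decides what counts as "the same" alert, so choosing it per condition is where most of the tuning effort goes.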
5. Automating Procedures
In the previous section, I mentioned how notifying the correct people was crucial to the continuity of your business. What if you didn’t have to notify them? What if there was a way to automatically address several issues that could stop the continuation of your business? The best monitoring tools have features that let you run scripts automatically to fix certain problems, usually those associated with your IT infrastructure. The primary benefits of this capability are a reduced RTO and a lower risk of mistakes being made under pressure.
Reducing the amount of repetitive work staff must do also reduces the costs associated with their wages. Team members can then add more value by working on more pressing or strategic engagements that affect your organization’s bottom line.
Once risks have been identified in the BIA, you then need to develop a way of mitigating each one. If the strategy phase reveals a way to automate that mitigation, the best monitoring solutions will be able to help you.
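A minimal sketch of automated mitigation is a runbook that maps each known issue to a fix and falls back to notifying a person when no fix exists or the fix fails. The issue names, the `handle_issue` function, and the fixes below are hypothetical placeholders, not real remediation scripts.

```python
from typing import Callable, Dict, List


def handle_issue(
    issue: str,
    runbook: Dict[str, Callable[[], None]],
    notify: Callable[[str], None],
) -> str:
    """Try the automated fix for `issue`; escalate to a person if that is not possible."""
    fix = runbook.get(issue)
    if fix is None:
        notify(f"No automated fix for '{issue}'")
        return "escalated"
    try:
        fix()
    except Exception as exc:  # any failure in the fix is escalated, not hidden
        notify(f"Automated fix for '{issue}' failed: {exc}")
        return "escalated"
    return "fixed"


def restart_agent() -> None:
    # Placeholder: a real fix would call out to a script or service here.
    pass


def failing_fix() -> None:
    raise RuntimeError("insufficient permissions")


messages: List[str] = []
runbook = {"agent-stopped": restart_agent, "log-full": failing_fix}

print(handle_issue("agent-stopped", runbook, messages.append))
print(handle_issue("log-full", runbook, messages.append))
print(handle_issue("unknown-issue", runbook, messages.append))
```

Keeping the escalation path in the same function means automation shortens the RTO for known issues without silencing the ones it cannot handle.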
For a more comprehensive guide on planning your BC processes, download the full white paper, “Business Continuity Planning with the Microsoft Data Platform.”
Richard (@SQLRich) is a Principal Solutions Engineer at SentryOne, specializing in our SQL Server portfolio offering in EMEA. He has worked with SQL Server since version 7.0 in various developer and DBA roles and holds a number of Microsoft certifications. Richard is a keen member of the SQL Server community; previously he ran a PASS Chapter in the UK and served on the organizing committee for SQLRelay.