Shifting Paradigms
Cost management in the cloud has become one of the primary challenges facing organizations as they move from on-premises data centers to cloud-based workloads. In the cloud, organizations pay for what they provision, not what they use. I have personally worked with organizations that were able to save more than 34% by implementing proper governance, capacity planning, and workload monitoring.
Unfortunately, the paradigms used to fill data centers with servers and blades do not apply well to the public cloud. In the past, IT leaders planned for future growth when making hardware purchases by buying capacity above what was immediately required. That approach is unnecessary in the public cloud and results in paying for resources you do not need. In many organizations, data workloads are the largest and most resource-intensive, and they require the most planning to move.
Controlling costs in the cloud starts with proper governance related to capacity management, workload analysis, and configuration management. The SentryOne Monitoring Platform provides the workload performance details your operational teams need to control costs by ensuring the proper sizing of your cloud resources.
Rightsizing Your VMs
The first step to properly sizing a cloud implementation is to perform a detailed analysis of historical workloads on the servers and instances you plan to migrate. You pay for what you provision in the cloud, not what you use. While there may be a natural impulse to over-provision the initial cloud deployment and downsize as necessary, my experience is that the downsizing never happens. Try telling your DBA team that you are going to reduce the available CPU and memory on the production SQL Server instances and see how they react!
It is not uncommon for organizations taking a lift and shift approach to the cloud to overspend by 35% or more on resources that are not being used. Putting in the effort to properly size your cloud deployment will have an immediate and substantial ROI.
Taking a data-driven approach to rightsizing requires you to analyze historical performance data for each VM being provisioned in the cloud. When considering a lift and shift strategy for moving workloads to cloud VMs, the following elements will directly impact the cost of your implementation:
- Compute Resources (CPU/RAM)–Unlike on-premises installations, which are built to accommodate future growth, cloud deployments should be rightsized to more fully utilize resources. Systems that typically average less than 40% CPU utilization should be sized to run at a higher average utilization while still accommodating utilization spikes (see the sizing sketch after the table below). The ability to easily scale up CPU and memory means there is no reason to pay for unnecessary headroom.
- Storage–Disk I/O is very often the first bottleneck encountered in data-intensive workloads. It is critical to baseline your workload and provision the correct cloud storage to accommodate the throughput requirements. In addition to performance tiers (e.g., Basic, Premium), there are often I/O limits by volume. In many cases, disks can be aggregated into larger volumes with higher I/O limits.
- Outbound Network Traffic–Each of the major cloud providers charges for network egress. It is not uncommon for outbound network costs to rival those of compute and storage.
- High Availability/Disaster Recovery (HA/DR)–As a general rule, the lower your Recovery Time Objective (RTO) and Recovery Point Objective (RPO), the more expensive the solution will be. It is important to have a clear understanding of the RTO/RPO requirements for each of your business applications and underlying databases. Implementing a tiered model such as the one shown in the table below will yield substantial savings.
| Tier | HA Method | DR Method | RTO | RPO |
| --- | --- | --- | --- | --- |
| T1 | Always On | Always On | Minutes | Minutes |
| T2 | Clustering | Log Shipping | Minutes to Hours | Hours |
| T3 | Backup/Restore | Backup/Restore | Hours to Days | Hours |
Sample HA/DR tiers
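To make the compute and storage sizing exercise concrete, here is a minimal sketch of how exported performance history could be reduced to the percentile figures that drive a VM size decision. This is only an illustration under assumptions: the CSV file name, column names, headroom factors, and the small candidate size table are all hypothetical placeholders, not values from SentryOne or any cloud provider.

```python
import pandas as pd

# Hypothetical export of historical performance counters (one row per sample).
# Column names are assumptions; adjust to match your monitoring tool's export.
df = pd.read_csv("perf_history.csv", parse_dates=["sample_time"])

# Size to sustained demand plus headroom for spikes: use the 95th percentile
# rather than the absolute peak or the long-run average.
cpu_p95 = df["cpu_percent"].quantile(0.95)
mem_p95_gb = df["memory_used_mb"].quantile(0.95) / 1024
disk_p95_mbps = df["disk_throughput_mbps"].quantile(0.95)
iops_p95 = df["disk_iops"].quantile(0.95)

# Hypothetical candidate sizes: (name, vCPU, RAM GB, max disk MB/s, max IOPS).
# Replace with the real limits published by your cloud provider.
candidate_sizes = [
    ("Small", 4, 32, 200, 6400),
    ("Medium", 8, 64, 400, 12800),
    ("Large", 16, 128, 800, 25600),
]

# Current on-premises server specs (assumed for the example) and headroom factors.
current_vcpu = 16
required_vcpu = current_vcpu * (cpu_p95 / 100) * 1.3   # 30% CPU headroom
required_ram_gb = mem_p95_gb * 1.2                      # 20% memory headroom

for name, vcpu, ram, mbps, iops in candidate_sizes:
    if (vcpu >= required_vcpu and ram >= required_ram_gb
            and mbps >= disk_p95_mbps and iops >= iops_p95):
        print(f"Smallest candidate that fits the P95 workload: {name}")
        break
else:
    print("No candidate satisfies the P95 requirements; revisit the size list.")
```

The point of the sketch is the shape of the decision, not the numbers: sustained demand plus a modest spike allowance, checked against both compute and per-volume I/O limits, picks the smallest size that fits rather than a mirror image of the on-premises hardware.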
Workload Discovery
Properly planning and sizing your cloud implementation requires a detailed understanding of your database workloads. SentryOne offers several tools to help you size your cloud architecture to save money.
Performance History and Baselines
SentryOne SQL Sentry collects and stores historical performance data for the entire SQL Server system, including critical compute and storage metrics, as well as those related to specific queries and workloads. You can use the built-in baselining capabilities to define the resource requirements that drive the sizing of your cloud implementation. Custom baselines let you capture workload metrics specific to your organization, which can then be used to define useful monitoring thresholds and to support capacity planning.
SQL Server performance baseline
When analyzing performance baselines, it is important to capture time ranges that include peak workloads, such as quarterly reporting or data-load operations.
System CPU baseline
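As a purely conceptual illustration of why the capture window matters (this is not how SQL Sentry computes its baselines, just a sketch with assumed file, column names, and dates), comparing a baseline built from a quiet month against one that includes quarter-end shows how much the peak period shifts the numbers you should size against.

```python
import pandas as pd

# Assumed export of CPU samples; file and column names are hypothetical.
cpu = pd.read_csv("cpu_history.csv", parse_dates=["sample_time"]).set_index("sample_time")

def baseline(frame, label):
    """Report the average and 95th percentile CPU for a window of samples."""
    avg = frame["cpu_percent"].mean()
    p95 = frame["cpu_percent"].quantile(0.95)
    print(f"{label}: average {avg:.1f}%, P95 {p95:.1f}%")
    return p95

# A quiet mid-quarter window versus a window that includes quarter-end reporting.
quiet_p95 = baseline(cpu.loc["2023-05-01":"2023-05-31"], "Quiet month")
peak_p95 = baseline(cpu.loc["2023-06-15":"2023-07-15"], "Quarter-end window")

# Sizing from the quiet window alone understates demand by this much.
print(f"Peak window requires {peak_p95 - quiet_p95:.1f} additional CPU percentage points")
```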
Documentation
Creating and maintaining documentation of your data estate is a time-intensive undertaking. However, migrating to the cloud without a clear understanding of your data inventory and dependencies increases the likelihood that something will break after migration.
Using SentryOne DOC xPress to create a centralized repository of each of your database, ETL, and reporting servers will speed this process up and allow you to identify potential risks that could result in downtime. DOC xPress stores all the critical configuration information for each server and the databases housed on the server. Additionally, the Data Lineage feature provides detailed dependency mapping that can be used to identify the potential impact of changes before they are made.
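Conceptually, the impact analysis that a lineage and dependency map enables boils down to a graph traversal. The sketch below is not the DOC xPress API; it is a minimal stand-in with a hand-built, hypothetical dependency graph that walks downstream from an object you plan to change or migrate and lists everything that could be affected.

```python
from collections import deque

# Hypothetical dependency map: each object points to the objects that consume it.
dependents = {
    "SalesDB.dbo.Orders": ["ETL.LoadOrders", "Reports.DailySales"],
    "ETL.LoadOrders": ["Warehouse.dbo.FactOrders"],
    "Warehouse.dbo.FactOrders": ["Reports.QuarterlyRevenue"],
    "Reports.DailySales": [],
    "Reports.QuarterlyRevenue": [],
}

def downstream(obj):
    """Breadth-first walk returning every object affected by a change to obj."""
    seen, queue = set(), deque([obj])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

# Everything that could break if SalesDB.dbo.Orders changes during migration.
print(downstream("SalesDB.dbo.Orders"))
```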
Additional Reading
This post provides an overview of the importance of using baselines and capacity planning to properly size your cloud implementation. See the resource list below for in-depth articles and white papers on this topic.
SentryOne Baseline Documentation
Setting SQL Server Performance Baselines and Alerts
Denis McDowell is a Microsoft Data Professional with over 20 years of experience with SQL Server, scripting, virtualization, IT operations, and cloud implementations across multiple industries. He currently works as the Solutions Engineering Manager for SentryOne and owns a consulting practice in the Charlotte, NC area. Denis has been a regular presenter at local, regional, and national conferences and events, including PASS Summit, Microsoft Hosting Summit, and numerous SQL Saturday and SQL User Groups around the U.S.