Our customer Getty Images was looking for a way to increase user satisfaction by improving the performance of their enormous database system. They wanted to leverage operational best practices to increase resiliency/uptime, performance, and scalability at the lowest possible cost. Their overall goal was to move all of their applications out of their datacenter and over to Amazon Web Services through a lift-and-shift process. Watch the following video to discover how we approached this project.
This blog post documents the best practices we developed as part of this project for moving SQL Server to AWS, specifically to the Amazon EC2 and RDS environments.
How we migrated a major multimedia conglomerate to the cloud
The first consideration we needed to help Getty Images move SQL Server to AWS was to determine which workloads should move to SQL Server running on EC2 instances and which workloads should move to SQL Server on RDS.
The key lies in understanding these points:
- What is your current database and infrastructure technology and what is the appropriate match in AWS?
- What resources do you have to support your database technology in AWS and do those resources more closely favor a path to RDS or EC2?
- How do you approach HA/DR and will it be more easily mapped to RDS or EC2?
- What are your data egress and ingress requirements and how will they affect your charges?
DB Best took a data-centric approach to determine which workloads are most appropriate for the cloud. We use our in-house DBMSys platform, which uses an agent-less data collector to build a profile of each workload environment: OS, CPU, RAM, and storage. We then collect various performance counters over time to measure CPU, disk latency, network latency, and other factors that help compute the desired configuration on AWS. Check out our Data Management Optimization for AWS page for more details on how this works.
We also work with our customers to understand the business needs/restrictions of each workload to determine a list of potential candidates that can be moved to the cloud based on understanding RTO and RPO requirements as part of our business impact analysis.
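As a rough illustration of the sizing step (DBMSys itself is proprietary, so the field names, headroom factor, and power-of-two rounding below are hypothetical simplifications, not the actual algorithm), a collector-driven recommendation might look like this:

```python
from dataclasses import dataclass

@dataclass
class WorkloadProfile:
    """Simplified snapshot of what an agent-less collector might gather."""
    name: str
    peak_cpu_pct: float      # peak CPU utilization over the sampling window
    cpu_cores: int           # cores on the current on-prem host
    ram_gib: int
    avg_disk_latency_ms: float

def recommend_vcpus(profile: WorkloadProfile, headroom: float = 0.25) -> int:
    """Size target vCPUs from observed peak usage plus headroom,
    rounded up to the next power of two (a common instance-size step)."""
    needed = profile.cpu_cores * (profile.peak_cpu_pct / 100.0) * (1 + headroom)
    vcpus = 1
    while vcpus < needed:
        vcpus *= 2
    return vcpus

# Example: a host peaking at 70% of 16 cores needs ~14 vCPUs, rounded to 16
web_db = WorkloadProfile("web_db", peak_cpu_pct=70, cpu_cores=16,
                         ram_gib=128, avg_disk_latency_ms=4.2)
print(recommend_vcpus(web_db))  # -> 16
```

The same counters (disk and network latency in particular) would feed storage- and network-tier choices in a fuller version of this calculation.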
Choosing between Amazon EC2 or RDS for moving SQL Server to AWS
AWS provides a white paper, Deploying Microsoft SQL Server on Amazon Web Services, written in 2016 and featuring SQL Server 2014. Since then, Microsoft has shipped two new versions of SQL Server, and AWS has made twelve major improvements to RDS for SQL Server, including support for SQL Server 2017.
As stated before, when it comes to migrating SQL Server to AWS, the first major decision you have to make is whether you’re moving to EC2 or RDS. In general, our first choice of which to use starts with RDS. We then go through a process of elimination based on high availability, performance, and database features.
Of course, there is already a plethora of material about this available online. While reading it, make sure you always check for the most up-to-date information, as the RDS platform continues to advance, and the limitations and unsupported features it had a few months ago may have been eliminated. If RDS won’t work, or it’s too much effort to create a workaround, we’ll shift gears and go with EC2.
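The process of elimination can be sketched as a simple feature check. The feature names and the unsupported list below are illustrative placeholders, not the authoritative AWS list, which you should always re-check against current documentation:

```python
# Features RDS for SQL Server did not support at the time of writing.
# Illustrative only -- this list keeps shrinking, so re-check AWS docs.
RDS_UNSUPPORTED = {"unsafe_clr", "filestream", "multi_region_replicas",
                   "always_on_fci"}

def choose_platform(features_used: set) -> str:
    """Process of elimination: start with RDS, fall back to EC2 only
    when the workload depends on a feature RDS can't provide."""
    blockers = features_used & RDS_UNSUPPORTED
    return "EC2" if blockers else "RDS"

print(choose_platform({"tde", "replication"}))         # -> RDS
print(choose_platform({"filestream", "replication"}))  # -> EC2
```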
So, how did we decide which great AWS platform would best fit Getty Images? Well, first we asked our own experts how we might differentiate between the two.
Why RDS from a DBA’s Perspective
- No need to use any of the limiting features – As mentioned before, this is an ever-shrinking list, but it currently includes UNSAFE assemblies, file access, multi-region replicas, and a few other niche features that are needed for very specific cases.
- Want to take advantage of the “quiet-at-the-management-front” aspect of RDS, which currently includes:
  - Automatic backups
  - Automatic patching
  - Seamless replication
  - Super-easy point-in-time restore
Making sure ALL databases are backed up successfully on a regular basis can alone keep a DBA busy for hours at a time, whether the culprit is disk space running out or some random developer adding a database in the Simple recovery model.
RDS covers all of this and literally allows DBAs to sleep better at night.
Why EC2 from a DBA’s Perspective
Well, that’s a simple answer – EC2 is the choice when the SQL Server workload can’t do without a specific feature that RDS doesn’t support, or when it requires more resources than the largest RDS instance currently offers.
After our experts weighed in on the differences between the two services, we consulted our customer’s data infrastructure, looking at the decision, as best we could, from their perspective.
Considerations for RDS for our customer
Based on the criteria we gathered for high availability, disaster recovery, performance, scale, and SQL Server features used within the application, our default is to always consider RDS as the deployment target because of the great platform-as-a-service features it provides. Regarding our customer’s data environment, we looked at the following considerations:
- Are there any features the applications use that are not supported by AWS RDS for SQL Server? We use the reporting capabilities of the AWS Schema Conversion Tool (SCT) to identify incompatible features. AWS does a great job of keeping RDS up to date, so the list of unsupported SQL Server features keeps shrinking.
- Can we reduce the administrative burden with features like automatic backups, automatic patching, seamless replication, and super-easy point-in-time restore?
Considerations for SQL Server on EC2 for our customer
- Use of unsupported features. SCT provides a report of the unsupported features and offers suggestions on how to work around the issues. If the workaround has little impact on the application code, then we will consider it based on the cost to deliver the change. On the other hand, if the workaround is significant, going with EC2 makes sense.
- Hardware resources are not sufficient to support performance and high availability. RDS instance sizes are quickly catching up with the largest EC2 instance types. As of July 2018, RDS for SQL Server 2016 and 2017 Enterprise Edition supports instance types up to db.r4.16xlarge, with 64 vCPUs, 195 ECUs, and 488 GiB of RAM. However, if a larger machine is required, or a higher level of availability calls for a feature like Always On Failover Cluster Instances and/or Availability Groups, then EC2 is the answer.
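That resource check can be sketched with the July 2018 db.r4.16xlarge ceiling quoted above (the helper function here is ours for illustration, not an AWS API, and the limits move over time):

```python
# Largest RDS for SQL Server instance available as of July 2018
# (db.r4.16xlarge). These ceilings rise over time -- check current docs.
RDS_MAX_VCPUS = 64
RDS_MAX_RAM_GIB = 488

def exceeds_rds_limits(required_vcpus: int, required_ram_gib: int) -> bool:
    """True when the workload needs more than the biggest RDS instance
    offers, which pushes the deployment to SQL Server on EC2."""
    return required_vcpus > RDS_MAX_VCPUS or required_ram_gib > RDS_MAX_RAM_GIB

print(exceeds_rds_limits(48, 384))  # -> False, RDS is still an option
print(exceeds_rds_limits(96, 768))  # -> True, EC2 it is
```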
Creating a highly available solution for Getty Images
In looking at the current deployments of SQL Server at our customer’s data center, we realized that they could benefit from creating service-tier configurations based on their high availability needs. The example that we are sharing is the basis of their “Stainless Steel Standard” for a high availability environment across multiple data centers.
Because the solution requires SQL Server Always On Availability Groups, with one secondary replica in a different availability zone than the primary replica and another secondary replica in a different data center, running SQL Server on EC2 was our only option. This is because RDS uses Database Mirroring, which only supports one secondary replica.
Why consider the service tier approach?
The use of service tiers or levels makes it easier to plan and deploy standard configurations that are defined based on the business impact analysis and RTO/RPO requirements. Without digging into the different types of metal that organizations use to name their configurations, we like to think about the tiers as follows:
- Mission Critical Applications. If your business will suffer significant losses in terms of immediate revenue, or long-term revenue due to a poor customer experience, your solution fits here. A classic example would be Amazon.com.
- Business Critical Applications. If your business won’t suffer significant revenue or productivity losses when the system is down for a few hours, your solution fits here. In many cases, moving SQL Server to AWS RDS works well here with regular backups and database mirroring. A good example here would be a law office with a content management system. People can always work locally, but after a few hours, you will see a productivity problem when having to collaborate on documents.
- Departmental Applications. If you can afford to be down for a few days and would only require a simple backup and restore of the database, RDS works well here too.
- Dev/Test Configurations. In this case, the configuration should match the backend. For example, if you are spinning up a smaller version of a mission-critical deployment for your development team that uses SQL Server on EC2, then your dev/test deployment must match. Otherwise, you run the risk of using incompatible features of SQL Server.
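The tiering above can be sketched as a simple mapping from tolerated downtime (RTO) to tier. The hour thresholds here are illustrative, not Getty Images’ actual cutoffs, which come out of the business impact analysis:

```python
def service_tier(rto_hours: float) -> str:
    """Map an application's tolerated downtime (RTO) to a service tier.
    Thresholds are illustrative placeholders."""
    if rto_hours < 1:
        return "Mission Critical"    # significant immediate revenue loss
    if rto_hours <= 8:
        return "Business Critical"   # productivity suffers within hours
    return "Departmental"            # days of downtime are tolerable

print(service_tier(0.25))  # -> Mission Critical
print(service_tier(4))     # -> Business Critical
print(service_tier(72))    # -> Departmental
```

Dev/test environments would then inherit whichever tier their production backend uses, per the matching rule above.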
Let’s talk about architecture
For our customer’s “stainless steel” configuration we developed the following criteria for when it should be used.
- Applications that do NOT require DR should use a lower level architectural standard.
- Applications that require the DR site to provide immediate High Availability should use a higher level architectural standard.
- Should be used when both HA & DR are required.
- Should only be used when the application(s) it supports require warm disaster recovery without immediate high availability in the disaster recovery region.
In this example we show a mission-critical service tier that we created for our customer:
|Component|Used|Count|Notes|
|---|---|---|---|
|EC2 Instances|Yes|4|2 full-size SQL Server, 1 down-sized SQL Server, 1 file share|
|SQL Availability Group Nodes|Yes|3||
|Windows Cluster File Share Witness|Yes|1||
|AWS Regions|Yes|2|West & East|
|AWS Availability Zones – In Region|Yes|3||
|AWS Availability Zones – Cross Region|Yes|1||
This next table describes the RTO, both in-region and across regions, for the environment we built, for SQL Server 2016 and SQL Server 2014:
|SQL Version|Failure Scenario|In Region (Minutes)|Across Region (DR) (Minutes)|
|---|---|---|---|
|2016|RTO Instance Failure|3|75|
|2016|RTO Database Failure|3|75|
|2014|RTO Instance Failure|3|75|
|2014|RTO Database Failure|30*|75|
Tools used for deployment
Getty Images had already invested in GitLab to support their development team’s DevOps practice within their datacenter. In addition, they had chosen Terraform for automating infrastructure deployment within their VMware environment. The good news was that both GitLab and Terraform work with AWS as well. At some point in the future, when all our customer’s servers are on AWS, they could shift to AWS’s development tools.
With the configuration for the Stainless Steel defined, we then worked with our customer to create a parameterized deployment template for standing up production deployments. The concept can easily be applied for business critical and department deployments using RDS, but that’s another blog post.
In conclusion, we helped a major multimedia conglomerate successfully migrate their SQL Server-based data infrastructure to AWS with minimal downtime. We consulted our own experts in the field and used our internally developed tools to gain a clear and comprehensive picture not only of their current data landscape, but of how best to move that landscape to AWS. Our experience in the industry ensured our customer chose the best options available to them at as little cost as possible. We continue to work with Getty Images today, monitoring their systems and training their team to manage their new and much-improved data infrastructure.
Contact us if you’d like to see how we can help you save on IT costs by migrating your on-prem systems to the cloud.