Zero-Downtime PostgreSQL Migrations: 4 Tips For Success With AWS DMS
Major PostgreSQL version upgrades in AWS RDS are a pain. Sure, it’s easy enough to point-and-click in the console to select a new version number, but this is guaranteed to take your database offline for some time.
When we started planning our upgrade to Postgres 13, we ran into an additional hurdle: the T2 instance type supporting our database wasn’t compatible with recent versions of Postgres. We needed to migrate to new hardware before we could proceed with the upgrade.
We stood up a test environment and timed the upgrade process. The results weren’t encouraging; doing it this way would cost us nearly 25 minutes of downtime. We considered overnight maintenance, but film and entertainment is a 24/7 industry. It’s not unusual for our customers to enter timecards in the small hours of night. Not to mention that overnight work is a recipe for burned out engineers and clumsy mistakes.
We were determined to find another way.
DMS For PostgreSQL
We evaluated a number of options for a zero-downtime migration and ultimately settled on Amazon’s Database Migration Service (DMS). The key factor in this decision was the ease with which we could set up an automated, repeatable migration process.
This isn’t to say DMS is a perfect, turn-key solution. We ran into a number of obstacles along the way and finding solutions to them wasn’t always easy. In this article, I’d like to share four tips I wish I’d known when we started down this path.
Tip #1: Yes, Zero Downtime
During the evaluation phase, there was one question that was surprisingly difficult to answer: Will DMS actually allow us to migrate without any downtime? The official AWS docs are coy about this particular point, using terms like “near-zero” and “minimal”. Third-party sources were similarly unhelpful, some suggesting a fully online migration was possible and others insisting a short period of downtime was necessary.
For those who are struggling to answer this same question, let me say it in plain terms: yes, it is entirely possible to do a zero-downtime PostgreSQL migration with DMS. It worked for us and it can work for you. With properly set sequences (see Tip #4), DMS can facilitate simultaneous writes to both your source and target databases during the Change Data Capture (CDC) phase.
I’ll caveat this by saying that your mileage may vary; your use case may not be the same as ours. Always test your migration plans before executing them in production.
Tip #2: Check Your Column Types
Without a bit of prep work, DMS may mangle or entirely fail to migrate some of your columns. This is due to the way DMS works to support migrations between different engines: data is converted to a DMS-internal format before being reconverted to the appropriate format for the destination database. In some cases, the DMS pre-migration validation step can detect these incompatibilities. However, in our tests, some columns that passed validation still ended up with truncated or rounded data.
Type compatibility is listed in the DMS documentation and should be reviewed carefully. In our case, the offenders were NUMERIC, DECIMAL, CHAR, and VARCHAR. We were able to do a simple schema migration to change these incompatible columns to equivalent types.
Tip #3: Test Against Production Data
When we started testing our migration path, we started with a dry-run in our staging environment. This went perfectly, but we decided to try against a production snapshot as well, just to be safe. I’m glad we did, because we realized that the test data for one of our columns wasn’t representative of reality. In production, data in this column exceeded a DMS limit that caused it to be truncated! We were able to migrate the column to a fully supported data type and avert production data loss.
Another benefit of testing against production data is to estimate migration time. Our migration tests in staging took a few minutes, but when we tried against the production snapshot it took nearly ten hours. Thanks to this test, we knew to kick off data transfer the day before our scheduled migration.
Tip #4: Some Database Prep Required
DMS will migrate your data, but it won’t copy over some crucial resources. Without a little bit of prep work, you’ll run into some major problems. We prepped our database with the following three steps:
Indexes: We needed to recreate the secondary indexes on our target database. If you’re using a framework like Rails with ActiveRecord, an easy approach is to simply apply your schema to the target database before beginning your migration. Be sure to switch the
TRUNCATE_BEFORE_LOADso that DMS doesn’t drop your newly-created tables!
Sequences: We needed to manually copy sequences between databases. This requires a bit of planning, since sequences on the source database will continue to increment until application connections are fully cut over. We took the approach of doubling sequences across the board to ensure there wouldn’t be any conflicts.
Permissions: We needed to recreate all the users, groups, and privileges on our target database. I recommend listing all these resources on your source database to be sure you’ve got them all! We missed recreating a read-only user, which caused problems with one of our BI reports after our migration.
Early in the planning process for our Postgres migration, I was chatting with an engineer at Google about the project. She shared with me her maxim about databases that, “the further we move away from flat files, the further we stray from God.”
I can’t say I disagree with her. Data migrations will always be fraught processes that need to be approached with careful planning. However, I hope that with the insights I’ve shared, you can leverage AWS DMS to make your next migration just a little easier.