Adventures with AWS DMS

The fundamentally broken AWS Database Migration Service

Adventures with AWS DMS

Look, I get it. Databases are rarely sexy. No one really likes talking about indexes, or data de-normalization, or stuff like that. But like it or not databases have a fundamental and important role to play in just about every modern application, whether those databases are large multi-region, multi-replica data warehouses that span hundreds of terabytes, or small embedded or in-memory databases; each database has a role to play in its ecosystem. But outside of the in-memory or embedded databases like Badger or SQLite one of the most challenging problems is actually managing those databases over the lifetime of the data stored in them. Its easy to simply stand up a new PostgreSQL DB and drop some data into it, but actively maintaining that data and migrating it to a new database when you need to upgrade database versions or migrate database engines is a very real and very hard thing still to this day. Managing state is hard and databases are nothing if not a collection of various states. So what are you to do? Well, AWS would like you to think that its Database Migration Service (DMS) is the answer.

Its not…not really.

There’s two kinds of database…well…data. There’s what I colloquially call schema — the bits and bobs that make up the configuration of the database, like sequences, defaults, columns, tables, etc. — and there’s the actual data, the values that populate your tables. When migrating a database with DMS you would expect that both of these things — schema and data — would be migrated. And you would be wrong.

Like most things AWS we’re not exactly sure how they work “under the hood” but we can glean certain things from logs and behavior. For example: we can infer that DMS handles LOB’s (large objects, which in database terms is pretty much anything that is type JSON, blob, text, or the like) rather strangely from the error that arises if you have a NOT NULL constraint on a LOB column. Because this error is raised its safe to assume that DMS creates rows with the LOB values omitted as NULL then after the fact performs an update and inserts the LOB data in its row. Why is it this way? You’d have to ask them. But it does mean that you can’t have NOT NULL constraints on JSON columns in PostgreSQL and use DMS. But back to the main point.

DMS doesn’t care about default values, sequences, indexes, views, or any of the other “ancillary stuff” that makes a database work.

When migrating data from your source database, DMS only migrates enough schema to be able to migrate the data in your database. That’s it. DMS doesn’t care about default values, sequences, indexes, views, or any of the other “ancillary stuff” that makes a database work. All it cares about is the data. And sure, you can make an argument that managing the data is the hardest part, but this seems like a gross oversight to me. If you’re moving a database you want to move all of the database, not just the parts that DMS cares about.

So what are you to do? Well, the best way I’ve come up with is a combination of DMS and a manual dump and restore. The process is something like this:

  • Set up DMS and start your replication
  • Take a dump of the source DB’s schema using pg_dump and selecting “schema only”
  • Make any necessary changes to your output .sql dump, like changing table owners to a new username
  • Write a script that transforms CREATE TABLE statements to ALTER TABLE statements
  • Restore your altered schema into the new database

(Alternatively, and what might be the best way, is to take a manual snapshot of your RDS instance and just upgrade your DB in-place.)

“Generally Available” doesn’t mean “production ready”

For those of you reading this that are veteran AWS operators you’ll recognize this sort of smell. Its the smell of something that was “done” but not “complete”, and I’m sure as many of you know by now — a lesson that I’ve learned the hard way with Aurora PostgreSQL — “generally available” doesn’t mean “production ready” in the AWS world. But this, more so than some of the issues with Aurora PostgreSQL’s write performance or any other “GA but not prod ready” type AWS issues, seems like a gross oversight and a complete miss of the entire purpose of a service like DMS. I am hopeful in the coming months that AWS can address this and push out a new version of DMS that makes a complete 1:1 database migration possible, but for now I’m afraid we’re stuck doing things the old-fashioned way.

Have you had a different experience with DMS? How do you feel about AWS releasing products that aren’t ready for production use? Let me know in the comments!