MySQL to CockroachDB via the Oregon Trail
Nobody likes migrations. Database or cross-country. Unless you're in significant pain or there's great fortune at the end of the journey, most sensible humans want to stay where they are and not be pioneers. Do you recall The Oregon Trail? It was one of the best video games of the '80s. As a banker from Boston, you could set out on the trail to Willamette, Oregon in hopes of a better life. Although, a Boston banker doesn't sound half bad. But maybe the gold in Oregon was much more attractive than trying to make it as a fur trader in Independence, Missouri. I'm sure the hardships of the mid-1800s went well beyond the pain of database systems, but the motivation is similar: "I want something better, get me out of here."

The next question becomes, "How do I get out of here? What's the happy path?" Should we take the Barlow Road or the Columbia River Gorge? Should I take food or bullets? Oxen or wagon parts? The same goes for database migrations: what tools do I need, and what's the right path to take? This blog will hopefully give you the happy path for migrating MySQL to CockroachDB so that you don't break an axle, hunt in scarcity, or die of dysentery.

Full disclaimer: I work at Cockroach Labs, played The Oregon Trail too much in the '80s, and have done about a half-dozen MySQL migrations to CockroachDB with customers. Along the way I've learned a few things that I think the masses would like to know. OK... grab your family, here we go.

First off, don't rely on the Cockroach Labs documentation for doing a MySQL migration. Forget this link for now. Our docs are generally terrific, but they assume that the Oregon Trail is a paved highway without traffic, and that your wagon is a Tesla that never loses power. Additionally, our MySQL import tooling still needs polish for handling compatibility with MySQL data types, options, objects, etc. Let's avoid the wagon fire and move on.

Second, many have tried migrating from MySQL to Postgres and then to CockroachDB. It's a creative idea, and using pgloader works really well for moving MySQL to Postgres. But unfortunately, the migration path from Postgres to CockroachDB is not a smooth passage: there's a fair amount of DDL, DML, and data changes that will need to occur. This is essentially setting out on the trail with a ton of oxen but no spare parts. Again, get the kids in the wagon and don't look around here.

So what do we do? Well... a good old-fashioned manual migration with sed and bash scripts has proven to be the winning wagon. Maybe one day we'll have a Tesla to take us from MySQL to CockroachDB, but right now... it's a wagon, some oxen, some bullets, and some clothes. It's slow, it's uncomfortable, it's old, but it will get you to Oregon.
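
To make "sed and bash" concrete, here's a minimal sketch of the flavor of rewrite involved. The table and the handful of sed rules are illustrative only (the repository's schemaFormat.sh presumably covers many more cases), but the idea is the same: strip MySQL-isms out of the dump until the DDL reads as Postgres-flavored SQL that CockroachDB will accept.

```shell
# A sample line of mysqldump schema output (illustrative table, not from a real dump)
schema='CREATE TABLE `orders` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `placed_at` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;'

# Backticks become double quotes, int(N) loses its display width,
# AUTO_INCREMENT becomes CockroachDB's unique_rowid(), datetime becomes
# TIMESTAMP, and the MySQL ENGINE/CHARSET trailer is dropped.
converted=$(printf '%s\n' "$schema" \
  | sed -e 's/`/"/g' \
        -e 's/int([0-9]*)/INT/g' \
        -e 's/AUTO_INCREMENT/DEFAULT unique_rowid()/g' \
        -e 's/datetime/TIMESTAMP/g' \
        -e 's/ ENGINE=[^;]*;/;/')

printf '%s\n' "$converted"
```

Note that `unique_rowid()` is CockroachDB's stand-in for AUTO_INCREMENT; it generates 64-bit values, which is fine here since CockroachDB's `INT` is 64-bit by default.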

Fortunately, my colleagues and I have already prepared the steps and scripts so that you don't have to. There's a MySQL 2 CockroachDB repository that describes all of the paths above in detail. Again, the best one is the manual migration. The steps are fairly straightforward.

The high-level steps for doing a MySQL migration are essentially these:

  • Load your mysqldump file into a local MySQL database
  • Export constraints out of MySQL
  • Export a schema-only dump file out of MySQL
  • Reformat the schema (schemaFormat.sh)
  • Import the schema into CockroachDB
  • Export TSV files out of MySQL
  • Import the data into CockroachDB using IMPORT
  • Apply constraints in CockroachDB
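
Strung together, the steps above look roughly like this. Everything in the sketch is illustrative: the database name, file paths, and the `export_constraints.sql` helper are hypothetical, and IMPORT syntax differs across CockroachDB versions, so check the IMPORT docs for yours. The `plan` helper only records and prints each command (a dry run) rather than executing it, so you can eyeball the wagon route before hitching the oxen.

```shell
DB=mydb                          # hypothetical MySQL database name
EXPORT_DIR=/tmp/${DB}-export     # where mysqldump --tab will drop per-table files

PLAN=""
plan() { PLAN+="$*"$'\n'; printf '+ %s\n' "$*"; }   # dry run: record, don't execute

# 1. Load your mysqldump file into a local MySQL database
plan "mysql $DB < backup.sql"

# 2. Export constraints out of MySQL (foreign keys live in information_schema;
#    export_constraints.sql is a hypothetical query file)
plan "mysql -N $DB < export_constraints.sql > constraints.sql"

# 3. Export a schema-only dump file out of MySQL
plan "mysqldump --no-data $DB > schema.sql"

# 4. Reformat the schema with the repo's script
plan "./schemaFormat.sh schema.sql > schema.cockroach.sql"

# 5. Import the schema into CockroachDB
plan "cockroach sql --insecure < schema.cockroach.sql"

# 6. Export tab-separated data out of MySQL (--tab writes one data file per
#    table; it requires the FILE privilege and a writable secure_file_priv dir)
plan "mysqldump --tab=$EXPORT_DIR $DB"

# 7. Import the data using IMPORT (table, columns, and nodelocal path are
#    illustrative; \N is mysqldump's NULL marker)
plan "cockroach sql --insecure -e \"IMPORT INTO orders (id, placed_at) DELIMITED DATA ('nodelocal://1/orders.txt') WITH nullif = '\\N';\""

# 8. Apply the exported constraints in CockroachDB
plan "cockroach sql --insecure < constraints.sql"
```

Swapping `plan` for direct execution (and repeating step 7 per table) turns the outline into a real migration run.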

I would encourage you to take the journey and give us feedback on ways we can improve the wagon ride. If you have any comments, feel free to add them to the blog post here, or create issues in our repository for us to fix.
