Track
Replatforming from a legacy CMS to a Content Operation System

Refactoring content for migration

Lesson
2

General migration principles

A developer guide to content migration covering idempotent scripts with incremental complexity and considered error handling.

Let’s start with some high-level principles for engineering successful content migrations:

  • Idempotent migration scripts
  • Incremental complexity
  • Graceful error handling
  • Create/update your schema concurrently

When you write a program or script that can be re-run multiple times with the same result, it is called "idempotent." In our experience, content migration scripts usually have to be run multiple times throughout a re-platforming project.

Idempotency matters because it keeps your script from creating duplicate copies of the same records, pages, or documents each time it runs. A key element of achieving idempotency is to give your content records deterministic (that is, stable) IDs. This makes it possible to either skip or intentionally overwrite existing documents when you re-run your script(s).
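As a rough illustration of stable IDs, here is a minimal sketch. Everything in it is hypothetical: the `stableId` helper, the `legacyPosts` data, and the in-memory `store` that stands in for the content store. The two write functions are named after the `createOrReplace` and `createIfNotExists` operations in Sanity's JavaScript client, but they are local stand-ins, not the client itself. The character set kept by `stableId` is a conservative assumption, so check the ID rules of your target before relying on it.

```typescript
// Hypothetical document shape; only _id and _type are assumed here.
type Doc = { _id: string; _type: string; [field: string]: unknown }

// Derive a stable ID from the document type and the legacy system's own identifier.
function stableId(type: string, legacyId: string | number): string {
  return `${type}-${legacyId}`
    .replace(/[^a-zA-Z0-9_-]+/g, '-') // replace anything outside a conservative character set
    .replace(/-+/g, '-') // collapse repeated separators
    .replace(/^-|-$/g, '') // trim a leading or trailing separator
}

// In-memory stand-in for the content store, keyed by _id.
const store = new Map<string, Doc>()

// Overwrite on every run: re-running updates the same document instead of adding a copy.
function createOrReplace(doc: Doc): void {
  store.set(doc._id, doc)
}

// Skip if present: re-running leaves an existing document untouched.
function createIfNotExists(doc: Doc): void {
  if (!store.has(doc._id)) store.set(doc._id, doc)
}

// Hypothetical legacy records with two different identifier styles.
const legacyPosts = [
  { id: 42, title: 'Hello' },
  { id: 'news/2019/hello world', title: 'Older post' },
]

function migrate(): void {
  for (const post of legacyPosts) {
    createOrReplace({ _id: stableId('post', post.id), _type: 'post', title: post.title })
  }
}

migrate()
migrate() // second run produces the same IDs, so the store still holds two documents

// The ID already exists, so this write is skipped and the title stays 'Hello'.
createIfNotExists({ _id: stableId('post', 42), _type: 'post', title: 'Changed elsewhere' })
```

Note that sanitizing can map two different legacy identifiers to the same ID (for example `a/b` and `a b`). If your source has such cases, hashing the raw identifier is one alternative.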

Jump to Scripting content migrations for details on available methods to write content to Sanity at scale.
Figure out what you can use as a source for generating stable record IDs in your legacy system

Most re-platforming projects and content migrations have a lot of "unknown unknowns." So expect a degree of trial and error and learning by doing.

Another benefit to deterministic IDs and idempotent scripts is that you are free from the pressure of getting everything right in one script execution. Your first migration might only stage the documents' _id and _type fields. The next can add the slug. The next can add the title. And so on.
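The staged approach above could be sketched as follows. The `LegacyPage` shape, the `fieldStages` list, and the `buildDocument` helper are hypothetical names for illustration. The idea is that each migration pass enables one more stage, and because the `_id` is stable, every pass writes to the same document.

```typescript
// Hypothetical shape of a record exported from the legacy system.
type LegacyPage = { id: number; path: string; headline: string }
type Doc = { _id: string; _type: string; [field: string]: unknown }

// Each stage contributes a few more fields to the document.
type Stage = (page: LegacyPage) => Record<string, unknown>

const fieldStages: Stage[] = [
  // Stage 1: add the slug, stripping leading and trailing slashes from the legacy path.
  (page) => ({ slug: { _type: 'slug', current: page.path.replace(/^\/+|\/+$/g, '') } }),
  // Stage 2: add the title.
  (page) => ({ title: page.headline.trim() }),
]

// Build a document with only the first `enabledStages` stages applied.
// With 0 stages, the document holds just _id and _type.
function buildDocument(page: LegacyPage, enabledStages: number): Doc {
  const doc: Doc = { _id: `page-${page.id}`, _type: 'page' }
  for (const stage of fieldStages.slice(0, enabledStages)) {
    Object.assign(doc, stage(page))
  }
  return doc
}

const examplePage: LegacyPage = { id: 7, path: '/about/', headline: '  About us ' }
const firstPass = buildDocument(examplePage, 0) // only _id and _type
const laterPass = buildDocument(examplePage, 2) // same _id, now with slug and title
```

Because both passes produce the same `_id`, writing `laterPass` with a replace-style operation updates the document staged by `firstPass` rather than creating a second one.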

Figure out what content in your project would be the simplest to start with

Incrementally building out documents, coupled with real-time feedback from the Sanity Studio updating with incoming changes, makes for a satisfying—perhaps even addictive—feedback loop.

Your script will fail! But that's to be expected. You probably aren't dealing with a perfect, consistent, and predictable corpus of structured content (yet 😉). So let’s prepare for things to fail from the outset and handle those errors gracefully to save you time, effort, and frustration.

Your legacy content source is likely unreliable, and content rot will have set in over time. Your existing API might return a record for an image’s metadata while the image binary is missing from the filesystem. Your source likely does not have referential integrity between documents (if it has ways to express such relationships at all!), so there may be broken or missing cross-references to taxonomies or authors.

Every retrieval of a record or an asset must handle a response where the content is corrupted, malformed, or references missing content. Your migration script must handle these cases by ignoring or recreating that content.
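One way to handle these cases is to have each transform return an explicit result instead of assuming the record is sound. The sketch below is illustrative only: the `LegacyArticle` shape, the `knownAuthorIds` and `filesOnDisk` lookups, and the `legacyImagePath` field are all hypothetical stand-ins for whatever your legacy system exposes. It skips records it cannot use and drops broken cross-references with a warning.

```typescript
// Hypothetical legacy record: any field other than id may be missing.
type LegacyArticle = { id: number; title?: string; authorId?: number; imagePath?: string }

type ArticleDoc = {
  _id: string
  _type: 'article'
  title: string
  author?: { _type: 'reference'; _ref: string }
  legacyImagePath?: string
}

type Result =
  | { ok: true; doc: ArticleDoc; warnings: string[] }
  | { ok: false; reason: string }

// Hypothetical lookups standing in for the legacy database and filesystem.
const knownAuthorIds = new Set<number>([1, 2])
const filesOnDisk = new Set<string>(['uploads/cat.jpg'])

function transformArticle(article: LegacyArticle): Result {
  const title = article.title?.trim()
  if (!title) {
    // Unusable record: report it rather than importing an empty document.
    return { ok: false, reason: `article ${article.id}: missing title` }
  }

  const doc: ArticleDoc = { _id: `article-${article.id}`, _type: 'article', title }
  const warnings: string[] = []

  if (article.authorId !== undefined) {
    if (knownAuthorIds.has(article.authorId)) {
      doc.author = { _type: 'reference', _ref: `author-${article.authorId}` }
    } else {
      // Broken cross-reference: keep the article, drop the reference.
      warnings.push(`article ${article.id}: author ${article.authorId} not found, reference dropped`)
    }
  }

  if (article.imagePath !== undefined) {
    if (filesOnDisk.has(article.imagePath)) {
      doc.legacyImagePath = article.imagePath
    } else {
      // Metadata exists but the binary is gone: keep the article, drop the image.
      warnings.push(`article ${article.id}: image ${article.imagePath} missing, image dropped`)
    }
  }

  return { ok: true, doc, warnings }
}
```

Returning warnings alongside the document keeps a record of what was ignored, so the gaps can be reviewed or repaired in a later pass.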

Consider the control flow of your migration script. Simply logging errors to the console might not be enough, because the script keeps running after the message is printed. You may wish to throw on errors so that the script does not proceed with importing invalid data.
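A throw-on-error control flow might look like the sketch below, which prepares every document first and only writes when nothing failed. The names `prepareAll`, `toDoc`, `run`, and the `written` array (a stand-in for the real write step) are hypothetical.

```typescript
type Outcome<T> = { ok: true; value: T } | { ok: false; reason: string }

// Transform every record, collect all failures, and throw once if there are any.
function prepareAll<S, T>(records: S[], transform: (record: S) => Outcome<T>): T[] {
  const values: T[] = []
  const reasons: string[] = []
  for (const record of records) {
    const outcome = transform(record)
    if (outcome.ok) values.push(outcome.value)
    else reasons.push(outcome.reason)
  }
  if (reasons.length > 0) {
    throw new Error(`Aborting: ${reasons.length} invalid record(s)\n${reasons.join('\n')}`)
  }
  return values
}

// Hypothetical record and document shapes for the example.
type LegacyRecord = { id: number; title?: string }
type PreparedDoc = { _id: string; _type: string; title: string }

function toDoc(record: LegacyRecord): Outcome<PreparedDoc> {
  if (!record.title) return { ok: false, reason: `record ${record.id}: missing title` }
  return { ok: true, value: { _id: `post-${record.id}`, _type: 'post', title: record.title } }
}

// Stand-in for the write step; a real script would send these to the content store.
const written: PreparedDoc[] = []

function run(records: LegacyRecord[]): void {
  const docs = prepareAll(records, toDoc) // throws before anything is written
  written.push(...docs)
}
```

Collecting all failures before throwing means one run reports every bad record, rather than stopping at the first and hiding the rest.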

Since you know your legacy content system, which failures can you already anticipate during migration?

You might already have completed the Hello, Structured Content course and have an idea for the content model. Maybe you have even started to configure it. It is useful to know that the Sanity Studio schema is decoupled from the Content Lake. This has the following implications for your migration projects:

  • Since the Content Lake is “schemaless,” you can create new documents with any structure.
  • You can configure the Sanity Studio schema after the fact to match the structure of your documents in the Content Lake.
  • You don't get schema validation in the Content Lake, so you must handle validation yourself, both in your migration scripts and with the CLI tooling for bulk-validating documents against the Studio schema.
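Since nothing rejects a malformed document at write time, a migration script can run its own checks before writing. The sketch below is a deliberately small example: the `requiredFields` table is a hypothetical, hand-maintained mirror of what the Studio schema requires, not something read from the schema itself.

```typescript
type Doc = { _id: string; _type: string; [field: string]: unknown }

// Hypothetical required fields per document type, kept in sync with the Studio schema by hand.
const requiredFields: Record<string, string[]> = {
  post: ['title', 'slug'],
  author: ['name'],
}

// Return a list of problems; an empty list means the document passed these checks.
function validateDoc(doc: Doc): string[] {
  const required = requiredFields[doc._type]
  if (!required) {
    return [`${doc._id}: unknown type "${doc._type}"`]
  }
  const problems: string[] = []
  for (const field of required) {
    const value = doc[field]
    if (value === undefined || value === null || value === '') {
      problems.push(`${doc._id}: missing required field "${field}"`)
    }
  }
  return problems
}

const problemsFound = validateDoc({ _id: 'post-42', _type: 'post', title: 'Hello' })
// problemsFound holds one entry, because `slug` is absent
```

A check like this only catches what you encode in it, so it complements rather than replaces validating the imported documents against the actual Studio schema.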

It’s best to enter a migration project with a reasonably planned-out content model based on the ideas of structured content, but leave room for changes based on what you learn when you’re getting hands-on with the migration process.

Plan and configure your foundational content model in your Sanity Studio
