Track
Replatforming from a legacy CMS to a Content Operation System

Refactoring content for migration

Lesson
4

Deterministic and consistent IDs

Reusing existing values from your content source helps prevent duplicate data and lets you optimistically set strong references.


In the Content Lake, document IDs (stored as the attribute _id) can be any string value that is unique within the dataset. When you create a document without specifying one, Sanity Studio generates an ID using the Universally Unique ID (UUID) specification, while the Content Lake uses the NATS Unique ID algorithm.

However, when migrating content into the Content Lake, it is preferable to reuse a unique value from the source content. This helps with your script’s idempotency and allows you to construct references during migration without the need to query the dataset in advance.

Imagine your existing data source has documents like this:

[
  {
    "type": "post",
    "id": 234,
    "authors": [123]
  },
  {
    "type": "user",
    "id": 123
  }
]

You could convert this into Sanity documents with deterministic IDs and optimistic references like this:

[
  {
    "_type": "post",
    "_id": "post-234",
    "authors": [{"_ref": "author-123", "_type": "reference"}]
  },
  {
    "_type": "author",
    "_id": "author-123"
  }
]
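The conversion above can be sketched as a small transform function. This is a minimal illustration, not a complete migration script: the names TYPE_MAP, toDocumentId, and toSanityDocument are hypothetical, and the source records are assumed to have exactly the shape shown earlier.

```javascript
// Hypothetical source records, shaped like the example above.
const sourceRecords = [
  { type: 'post', id: 234, authors: [123] },
  { type: 'user', id: 123 },
];

// Hypothetical mapping from source types to Sanity document types.
const TYPE_MAP = { post: 'post', user: 'author' };

// Build a deterministic ID from the target type and the source record ID.
function toDocumentId(sanityType, sourceId) {
  return `${sanityType}-${sourceId}`;
}

function toSanityDocument(record) {
  const _type = TYPE_MAP[record.type];
  if (!_type) throw new Error(`Unmapped source type: ${record.type}`);

  const doc = { _type, _id: toDocumentId(_type, record.id) };

  if (Array.isArray(record.authors)) {
    // Optimistic references: the target ID is computed, not looked up.
    doc.authors = record.authors.map((authorId) => ({
      _type: 'reference',
      _ref: toDocumentId('author', authorId),
    }));
  }
  return doc;
}

const documents = sourceRecords.map(toSanityDocument);
console.log(JSON.stringify(documents, null, 2));
```

Because the reference target is derived from the source value alone, the transform never needs to query the dataset, and running it twice produces the same IDs.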

To successfully write a new document that contains a strong reference, the referenced document must already exist in the dataset or be created within the same transaction.

An added benefit of predetermining these IDs is that if the incoming data were separated into users and posts, you could create all of the “author” documents in one pass and all of the “post” documents in the next, and the references should be written successfully.
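One way to picture the two passes is to group already-transformed documents by type and emit one batch of mutations per type, with referenced types first. This is a sketch under assumptions: PASS_ORDER and toMutationPasses are hypothetical names, and actually submitting each batch (for example with a client library) is left out.

```javascript
// Hypothetical, already-transformed documents in arbitrary order.
const transformed = [
  {
    _type: 'post',
    _id: 'post-234',
    authors: [{ _type: 'reference', _ref: 'author-123' }],
  },
  { _type: 'author', _id: 'author-123' },
];

// Referenced types come first so strong references resolve on write.
const PASS_ORDER = ['author', 'post'];

function toMutationPasses(docs, order) {
  return order.map((type) =>
    docs
      .filter((doc) => doc._type === type)
      // createOrReplace keeps re-runs idempotent: same _id, same document.
      .map((doc) => ({ createOrReplace: doc }))
  );
}

const passes = toMutationPasses(transformed, PASS_ORDER);
passes.forEach((mutations, index) => {
  console.log(`Pass ${index + 1}: ${mutations.length} mutation(s)`);
  // Submitting each pass to the dataset is intentionally omitted here.
});
```

Using createOrReplace together with deterministic IDs means a failed or repeated run overwrites the same documents instead of creating duplicates.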

As mentioned, these IDs can be any string value. A common pattern that is easy to reason about is to start with the document's content type and then append whatever unique identifier you can get or generate from the source record. Here are some patterns and examples that you can use as inspiration:

  • contentType-recordID 👉 post-234
  • uniqueSlug 👉 hello-world
  • contentType-slug-publishDate 👉 post-hello-world-2010-10-11
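The patterns above could be produced by one small helper. This is a sketch only: slugify and buildId are hypothetical names, the record fields are invented for illustration, and restricting IDs to lowercase letters, digits, and hyphens is a conservative choice made here rather than a rule taken from this lesson.

```javascript
// Lowercase the value, replace runs of other characters with a hyphen,
// and trim leading or trailing hyphens.
function slugify(value) {
  return String(value)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

// Join the non-empty parts into a single ID string.
function buildId(...parts) {
  const id = parts
    .filter((part) => part !== undefined && part !== null && part !== '')
    .map(slugify)
    .filter(Boolean)
    .join('-');
  if (!id) throw new Error('Cannot build an ID from empty parts');
  return id;
}

// Hypothetical legacy record.
const record = {
  type: 'post',
  id: 234,
  slug: 'Hello World',
  publishDate: '2010-10-11',
};

console.log(buildId(record.type, record.id)); // post-234
console.log(buildId(record.slug)); // hello-world
console.log(buildId(record.type, record.slug, record.publishDate)); // post-hello-world-2010-10-11
```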

You might want your document IDs to follow the UUID pattern but be deterministically generated from a string. Several packages on npm can do this for you:

import getUuid from 'uuid-by-string';

// `record` is one item from your source content
const uuidId = getUuid(record.uniqueSlug);
// e.g. d3486ae9-136e-5856-bc42-212385ea7970

The minor drawback of this approach is that you must pass the same source values through the same function whenever you create references in other documents.
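A way to keep that drawback manageable is to route every _id and every _ref through a single helper. The sketch below is an illustration under assumptions: deterministicUuid is a stand-in built on Node's crypto module rather than uuid-by-string, so its output will not match that package, and toDocumentId is a hypothetical name.

```javascript
const { createHash } = require('node:crypto');

// Stand-in for a package like uuid-by-string: derives a UUID-shaped
// string from the SHA-1 hash of the input. Same input, same output.
function deterministicUuid(input) {
  const bytes = createHash('sha1')
    .update(String(input))
    .digest()
    .subarray(0, 16);
  bytes[6] = (bytes[6] & 0x0f) | 0x50; // version nibble set to 5
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // RFC 4122 variant bits
  const hex = bytes.toString('hex');
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join('-');
}

// One function for both _id and _ref, so the two cannot drift apart.
function toDocumentId(sanityType, sourceId) {
  return deterministicUuid(`${sanityType}-${sourceId}`);
}

const author = { _type: 'author', _id: toDocumentId('author', 123) };
const post = {
  _type: 'post',
  _id: toDocumentId('post', 234),
  authors: [{ _type: 'reference', _ref: toDocumentId('author', 123) }],
};

console.log(post.authors[0]._ref === author._id); // true
```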

Review the different content types in your legacy CMS and think about their ID scheme. Are there unique values you can use?
