Deterministic and consistent IDs - Refactoring content for migration

Reusing existing values from your content source helps prevent duplicate data and lets you optimistically set strong references.

In the Content Lake, a document ID (stored in the _id attribute) can be any string value that is unique within the dataset. When you do not supply one, Sanity Studio generates an ID using the Universally Unique ID (UUID) specification, while Content Lake uses the NATS Unique ID algorithm to create document IDs automatically.

However, when migrating content into the Content Lake, it is preferable to reuse a unique value from the source content. This helps with your script’s idempotency and allows you to construct references during migration without the need to query the dataset in advance.

Imagine your existing data source has documents like this:

[
  {
    "type": "post",
    "id": 234,
    "authors": [123]
  },
  {
    "type": "user",
    "id": 123
  }
]

You could convert this into Sanity documents with deterministic IDs and optimistic references like this:

[
  {
    "_type": "post",
    "_id": "post-234",
    "authors": [{"_ref": "author-123", "_type": "reference"}]
  },
  {
    "_type": "author",
    "_id": "author-123"
  }
]
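The conversion above can be sketched as a small mapping function. This is a minimal illustration, not a prescribed implementation: the names TYPE_MAP, toSanityId, and toSanityDocument are hypothetical, and the sketch assumes every source record has a type and a numeric or string id like the sample data.

```javascript
// Hypothetical mapping from source types to Sanity document types.
const TYPE_MAP = { post: 'post', user: 'author' };

// Build a deterministic _id from the target type and the source record's id.
function toSanityId(sourceType, sourceId) {
  const type = TYPE_MAP[sourceType];
  if (!type) throw new Error(`Unknown source type: ${sourceType}`);
  return `${type}-${sourceId}`;
}

// Convert one source record into a Sanity document.
function toSanityDocument(record) {
  const doc = {
    _id: toSanityId(record.type, record.id),
    _type: TYPE_MAP[record.type],
  };
  // References are built from the same ID function, so no dataset query is needed.
  if (Array.isArray(record.authors)) {
    doc.authors = record.authors.map((authorId) => ({
      _type: 'reference',
      _ref: toSanityId('user', authorId),
    }));
  }
  return doc;
}

const source = [
  { type: 'post', id: 234, authors: [123] },
  { type: 'user', id: 123 },
];
const documents = source.map(toSanityDocument);
```

Because the reference and the author document both get their ID from toSanityId, the two always agree, no matter which record is processed first.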

To successfully write a new document that contains a strong reference, that referenced document must exist in the dataset or within the same mutation.

Learn more about how references work in the Content Lake

An added benefit of predetermining these IDs is that if the incoming data were separated into users and posts, you could create all of the “author” documents in one pass and all of the “post” documents in the next, and the references would resolve successfully.
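One way to sketch that two-pass ordering is to group documents into batches, with referenced types first. The example below only builds request bodies shaped like the Content Lake mutations payload ({mutations: [{createOrReplace: ...}]}); sending them is left out. The WRITE_ORDER constant and the toMutationBatches name are assumptions made for illustration.

```javascript
// Hypothetical write order: types that are referenced come before types that reference them.
const WRITE_ORDER = ['author', 'post'];

// Group documents by type, in write order, and wrap each one in a createOrReplace mutation.
function toMutationBatches(documents, order = WRITE_ORDER) {
  return order
    .map((type) => documents.filter((doc) => doc._type === type))
    .filter((batch) => batch.length > 0)
    .map((batch) => ({
      mutations: batch.map((doc) => ({ createOrReplace: doc })),
    }));
}

const batches = toMutationBatches([
  {
    _id: 'post-234',
    _type: 'post',
    authors: [{ _type: 'reference', _ref: 'author-123' }],
  },
  { _id: 'author-123', _type: 'author' },
]);
// batches[0] contains the author document, batches[1] contains the post.
```

Since every document carries a fixed _id, running the same batches again with createOrReplace overwrites the existing documents instead of creating duplicates, which is what keeps the script idempotent.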

As mentioned, these IDs can be any string value. A pattern we often see, and one that is easy to reason about, is to start with the document's content type and then append whatever unique identifier you can get or generate from the source record. Here are some patterns and examples that you can use as inspiration:

contentType-recordID 👉 post-234
uniqueSlug 👉 hello-world
contentType-slug-publishDate 👉 post-hello-world-2010-10-11
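
The patterns above could be produced by a small helper along these lines. The buildDocumentId name is hypothetical, and the character and length limits in the sketch (letters, digits, "_" and "-", at most 128 characters) are assumptions you should check against the IDs documentation before relying on them.

```javascript
// Join the given parts into a single ID, normalizing each part along the way.
// Assumption: only a-z, 0-9, "_" and "-" are kept, and IDs are capped at 128 characters.
function buildDocumentId(...parts) {
  const id = parts
    .map((part) =>
      String(part)
        .trim()
        .toLowerCase()
        .replace(/[^a-z0-9_-]+/g, '-') // replace runs of other characters with a dash
        .replace(/^-+|-+$/g, '') // drop leading and trailing dashes
    )
    .filter(Boolean)
    .join('-');

  if (id.length === 0 || id.length > 128) {
    throw new Error(`Cannot build a valid ID from: ${JSON.stringify(parts)}`);
  }
  return id;
}

const byRecordId = buildDocumentId('post', 234);
const bySlug = buildDocumentId('Hello World');
const bySlugAndDate = buildDocumentId('post', 'hello-world', '2010-10-11');
```

Lowercasing and replacing characters can make two different source values collide, so this kind of normalization only suits identifiers that remain unique afterwards.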

Learn more about how IDs in Content Lake work

You might want your document IDs to follow the UUID pattern but be deterministically generated from a string. Different packages on npm can do this for you:

import getUuid from 'uuid-by-string';

const uuidId = getUuid(record.uniqueSlug);
// d3486ae9-136e-5856-bc42-212385ea7970

The minor drawback of this approach is that you must pass the same source value through the same function whenever you create a reference to that document from another document.
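
To keep document IDs and references in sync, one option is to route both through a single pair of helpers. The sketch below is illustrative: createIdHelpers is a hypothetical name, and fakeHash is a stand-in so the example runs without dependencies. In a real script you would pass in your deterministic ID function instead, such as getUuid from the snippet above.

```javascript
// `toId` is any deterministic string-to-ID function.
function createIdHelpers(toId) {
  return {
    // ID for the document itself.
    documentId: (uniqueValue) => toId(String(uniqueValue)),
    // Reference to a document, derived from the same unique value.
    reference: (uniqueValue) => ({
      _type: 'reference',
      _ref: toId(String(uniqueValue)),
    }),
  };
}

// Stand-in for a real hashing function, used only to make this example self-contained.
const fakeHash = (value) => `id-${value.length}-${value}`;

const ids = createIdHelpers(fakeHash);
const author = { _id: ids.documentId('author-123'), _type: 'author' };
const post = {
  _id: ids.documentId('post-234'),
  _type: 'post',
  authors: [ids.reference('author-123')],
};
```

Because both helpers call the same function, a reference built from a given source value always points at the ID generated for that value.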

Review the different content types in your legacy CMS and think about their ID scheme. Are there unique values you can use?
