Track
Replatforming from a legacy CMS to a Content Operation System

Refactoring content for migration

Lesson
7

Scripting content migrations

Sanity's API-first design allows you to write content – even in huge volumes – however you prefer. The CLI's migration tooling offers several conveniences that make it a great fit for most migrations.


There are several ways to run content migration scripts with the Sanity CLI:

  • Recommended: Creating and running a migration script with sanity migration
  • Executing custom scripts with sanity exec --with-user-token
  • Generating an NDJSON file and importing it with sanity dataset import

The Sanity CLI contains helpful migration tooling. The primary use case is schema and content migrations of documents in a dataset, like changing a field name or turning a string into an array of strings. However, migration scripts can also retrieve content from an external data source and write new documents.

Take the Handling schema changes confidently course for a more thorough introduction.
See the documentation about Migration to learn what it can do.
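
For example, a simple field rename can be expressed as a document-level migration. Below is a minimal sketch, assuming hypothetical oldFieldName and newFieldName fields on a post type:

import {at, defineMigration, setIfMissing, unset} from 'sanity/migrate'

export default defineMigration({
  title: 'Rename oldFieldName to newFieldName',
  documentTypes: ['post'],
  migrate: {
    document(doc) {
      return [
        // Copy the old value across (unless the new field is already set),
        // then remove the old field
        at('newFieldName', setIfMissing(doc.oldFieldName)),
        at('oldFieldName', unset()),
      ]
    },
  },
})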

The benefits of using the migration tooling include:

  • Less scaffolding and great abstractions for creating and changing documents in the Content Lake
  • Automatic batching of mutations into transactions to avoid hitting rate limits
  • Dry runs by default, with visual feedback
  • Validation of migrated documents against a Sanity Studio schema

You can create a new migration script by running the following from the command line:

npx sanity@latest migration create

The following is a highly simplified example of a migration script that retrieves content from an API and continues to paginate until no more results are returned.

This example comes from the Migrating content from WordPress to Sanity course.
/migrations/moving-from-wp/posts.ts
import {SanityDocumentLike} from 'sanity'
import {createOrReplace, defineMigration} from 'sanity/migrate'

import {wpDataTypeFetch} from '../migrationUtils'

export default defineMigration({
  title: 'Import WP JSON data',
  async *migrate() {
    const wpType = 'posts'
    let page = 1
    let hasMore = true

    while (hasMore) {
      try {
        const wpData = await wpDataTypeFetch(wpType, page)

        if (Array.isArray(wpData) && wpData.length) {
          for (const wpDoc of wpData) {
            const doc: SanityDocumentLike = {
              _id: `post-${wpDoc.id}`,
              _type: 'post',
              // Add other required fields here based on wpDoc structure
            }
            yield createOrReplace(doc)
          }
          page++
        } else {
          hasMore = false
        }
      } catch (error) {
        console.error(`Error fetching data for page ${page}:`, error)
        hasMore = false // Stop the loop in case of an error
      }
    }
  },
})

Running this script with the Sanity CLI migration tooling will build a series of mutations based on content returned from wpDataTypeFetch and, when necessary, automatically batch them into transactions.

This script is the basic building block for writing new documents from an external source with migration scripts – it's up to you to query your data source and add validation, extra attributes, error handling, and more.
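
For reference, here is a minimal sketch of what the wpDataTypeFetch helper imported above could look like. The WordPress course ships its own version; the base URL here is a placeholder:

// migrationUtils.ts
// A sketch against the WordPress REST API; replace the base URL with your own site
const BASE_URL = 'https://example.com/wp-json/wp/v2'

export async function wpDataTypeFetch(type: string, page: number) {
  const url = `${BASE_URL}/${type}?page=${page}&per_page=100`
  const res = await fetch(url)

  // The WP REST API responds with an error once you page past the last result set
  if (!res.ok) return []

  return res.json()
}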

Adapt this script to make a simple migration from your CMS

By default, each individual document mutation is "staged" into the transaction sequentially, one at a time, because of the generator/iterator pattern the migration tooling uses. So, if you add an asynchronous function call – such as a fetch for an image upload or an external document – it can slow down the migration.

This can be overcome by yielding an array of mutations instead of a single mutation. This is preferable when including image uploads in your migration script. Keep in mind that uploading an image creates an asset metadata document in the Content Lake, so be careful to avoid rate limits by throttling the number of concurrent operations while creating your array of mutations.
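
For example, the earlier script could be restructured to yield one array of mutations per page of results. A minimal sketch, reusing the same wpDataTypeFetch helper (image uploads and throttling are left out for brevity):

import {createOrReplace, defineMigration} from 'sanity/migrate'

import {wpDataTypeFetch} from '../migrationUtils'

export default defineMigration({
  title: 'Import WP JSON data in page-sized batches',
  async *migrate() {
    let page = 1

    while (true) {
      const wpData = await wpDataTypeFetch('posts', page)
      if (!Array.isArray(wpData) || !wpData.length) break

      // Yielding an array stages the whole page of mutations together
      yield wpData.map((wpDoc) =>
        createOrReplace({
          _id: `post-${wpDoc.id}`,
          _type: 'post',
        }),
      )
      page++
    }
  },
})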

An example of this is shown in the Migrating content from WordPress to Sanity course.

Custom scripts that use your Sanity Studio’s CLI configuration – sanity.cli.ts – and your terminal’s authenticated session (npx sanity@latest login) can be run with the following:

npx sanity@latest exec ./path-to-your-script --with-user-token

This may be beneficial for more complex data structures or use cases where you want more low-level control. In that instance, you would also need to be careful to avoid rate limits by throttling the number of mutations concurrently sent to the Content Lake.

The script below is the same basic example as above but uses the CLI client to create a single transaction.

Note that this single transaction could become too large and, when committed, be rejected by the Content Lake.

Also, running it from the command line will immediately write this content to the dataset, with the only visual feedback being the included console logs.

import {SanityDocumentLike} from 'sanity'
import {getCliClient} from 'sanity/cli'

import {wpDataTypeFetch} from '../migrationUtils'

// Uses the project and dataset from sanity.cli.ts and your logged-in session
const client = getCliClient()

async function importData() {
  const transaction = client.transaction()
  const wpType = 'posts'
  let page = 1
  let hasMore = true

  while (hasMore) {
    try {
      const wpData = await wpDataTypeFetch(wpType, page)

      if (Array.isArray(wpData) && wpData.length) {
        for (const wpDoc of wpData) {
          const doc: SanityDocumentLike = {
            _id: `post-${wpDoc.id}`,
            _type: 'post',
            // Add other required fields here based on wpDoc structure
          }
          transaction.createOrReplace(doc)
        }
        page++
      } else {
        hasMore = false
      }
    } catch (error) {
      console.error(`Error fetching data for page ${page}:`, error)
      hasMore = false // Stop the loop in case of an error
    }
  }

  try {
    await transaction.commit()
    console.log('Data imported successfully')
  } catch (error) {
    console.error('Error committing transaction:', error)
  }
}

importData()

The script above might also be modified to use a configured Sanity Client with a token if you plan to have a cloud-hosted migration script that external tools can access.

For continuous migrations or regular content imports from an external source—like a podcast episode feed, product stock levels, or property availability—this may be preferable to performing migrations locally.
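
A minimal sketch of such a configured client, using @sanity/client; the project ID, dataset, and token variable name are placeholders for your own values:

import {createClient} from '@sanity/client'

const client = createClient({
  projectId: 'your-project-id', // placeholder
  dataset: 'production',
  apiVersion: '2024-06-01',
  token: process.env.SANITY_WRITE_TOKEN, // a token with write access
  useCdn: false, // write operations should always hit the live API
})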

The Sanity CLI includes tooling to export an entire dataset—content and images—to a single file. The text content is stored in NDJSON format, that is, newline-delimited JSON:

production.ndjson
{ "_type": "post", "_id": "post-435", "title": "A Model for Reality" }
{ "_type": "post", "_id": "post-436", "title": "Halo [Breathe]" }
{ "_type": "post", "_id": "post-437", "title": "Smiling Through The Pain" }

In some instances, you may prefer to write a content migration script that creates a file in this format, and then use the CLI to import it all in one go.

This might be necessary if your legacy CMS doesn't have (good) export APIs, or if you need to lift content out of a database directly. If you have a lot of content, it can also be more efficient, since you can generate the file with a faster programming language.
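
A minimal sketch of writing such a file in TypeScript, assuming a hypothetical fetchAllPosts function that stands in for your own extraction logic:

import {createWriteStream} from 'node:fs'

// Hypothetical extraction helper – replace with your own source query
async function fetchAllPosts(): Promise<{id: number; title: string}[]> {
  return [{id: 435, title: 'A Model for Reality'}]
}

async function writeNdjson() {
  const out = createWriteStream('production.ndjson')

  for (const post of await fetchAllPosts()) {
    // One complete JSON document per line
    out.write(JSON.stringify({_type: 'post', _id: `post-${post.id}`, title: post.title}) + '\n')
  }

  out.end()
}

writeNdjson()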

This approach requires more manual steps and does not offer a dry run, but it has the benefit of being able to upload images as part of the import process.

Images can be referenced by URL or file path in the NDJSON file, and they will be fetched and uploaded as part of the import process.
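
For example, a line referencing a remote image would use the _sanityAsset convention described in the import documentation (the mainImage field name and URL are placeholders):

{ "_type": "post", "_id": "post-438", "title": "Example", "mainImage": { "_type": "image", "_sanityAsset": "image@https://example.com/images/cover.jpg" } }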

See sanity dataset import in the documentation for more details about importing text and images from NDJSON.
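
For example, to import the file above into a dataset named production:

npx sanity@latest dataset import production.ndjson production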
