Track: Replatforming from a legacy CMS to a Content Operation System

Course: Migrating content from WordPress to Sanity

Lesson 6

Creating complete documents

Transform WordPress posts into more complete Sanity Studio documents with categories, authors, dates and more, using dedicated functions for each post type.


Now that the migration script can create documents of several different types, it's time to create more complete documents. In this lesson, you'll focus on creating a dedicated function for processing post documents. You can then repeat these steps for all other document types.

Instances where it makes sense to transform content into a different content type – like from a page to structured content – will be covered in the next lesson.

So far, the migration script has used SanityDocumentLike as the TypeScript type for staged documents. This definition is too loose to be useful. Because you already have Sanity Studio schema types for the documents being created, Sanity TypeGen can generate stricter, more helpful types for them.

Run the following command in your Sanity Studio project folder to extract your schema definitions:
npx sanity@latest schema extract

You should now have a schema.json file at the root of your Studio project.

Run the following command to generate types from your schema:
npx sanity@latest typegen generate

You should now have a sanity.types.ts file at the root of your Studio project.

For more about Sanity TypeGen, see "Typed content with Sanity TypeGen".
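
To illustrate why stricter types help, here is a hand-written approximation of the kind of type TypeGen might produce for a post schema. The field list is an assumption made for illustration only; your own sanity.types.ts reflects your actual schema and will differ.

```typescript
// Hypothetical, simplified stand-in for the `Post` type in sanity.types.ts.
// The real generated type depends on your schema.
type Post = {
  _id: string
  _type: 'post'
  _createdAt: string
  _updatedAt: string
  _rev: string
  title?: string
  slug?: {_type: 'slug'; current?: string}
}

// Omit the system fields that Content Lake sets when the document is written.
type StagedPost = Omit<Post, '_createdAt' | '_updatedAt' | '_rev'>

const staged: StagedPost = {_id: 'post-1', _type: 'post'}
staged.title = 'Hello world'

// With a strict type, mistakes like these fail at compile time:
// staged.titel = 'Hello world' // Property 'titel' does not exist
// staged.title = 42            // Type 'number' is not assignable to type 'string'
```

With SanityDocumentLike, both commented-out lines would compile, and the error would only surface later in the Studio.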

In the previous lesson, you updated your script to write a title to pages and posts, and a name to tags and categories. All of these document types will have many more attributes, so to simplify things, you will now create dedicated functions for each post type.

This lesson will focus on just preparing post type documents.

Create a new helper function to transform a WordPress post into a Sanity Studio post:
migrations/import-wp/lib/transformToPost.ts
import {decode} from 'html-entities'
import type {WP_REST_API_Post} from 'wp-types'

import type {Post} from '../../../sanity.types'

// Remove these keys because they'll be created by Content Lake
type StagedPost = Omit<Post, '_createdAt' | '_updatedAt' | '_rev'>

export async function transformToPost(wpDoc: WP_REST_API_Post): Promise<StagedPost> {
  const doc: StagedPost = {
    _id: `post-${wpDoc.id}`,
    _type: 'post',
  }

  doc.title = decode(wpDoc.title.rendered).trim()

  return doc
}

This helper function does the same work that was previously written inline in the migration script, so you'll need to update that script to use it.

Update the migration file:
migrations/import-wp/index.ts
import type {SanityDocumentLike} from 'sanity'
import {createOrReplace, defineMigration} from 'sanity/migrate'
import type {WP_REST_API_Post, WP_REST_API_Term, WP_REST_API_User} from 'wp-types'

import {getDataTypes} from './lib/getDataTypes'
import {transformToPost} from './lib/transformToPost'
import {wpDataTypeFetch} from './lib/wpDataTypeFetch'

export default defineMigration({
  title: 'Import WP JSON data',

  async *migrate() {
    const {wpType} = getDataTypes(process.argv)
    let page = 1
    let hasMore = true

    while (hasMore) {
      try {
        let wpData = await wpDataTypeFetch(wpType, page)

        if (Array.isArray(wpData) && wpData.length) {
          const docs: SanityDocumentLike[] = []

          for (let wpDoc of wpData) {
            if (wpType === 'posts') {
              wpDoc = wpDoc as WP_REST_API_Post
              const doc = await transformToPost(wpDoc)
              docs.push(doc)
            } else if (wpType === 'pages') {
              wpDoc = wpDoc as WP_REST_API_Post
              // add your *page* transformation function
            } else if (wpType === 'categories') {
              wpDoc = wpDoc as WP_REST_API_Term
              // add your *category* transformation function
            } else if (wpType === 'tags') {
              wpDoc = wpDoc as WP_REST_API_Term
              // add your *tag* transformation function
            } else if (wpType === 'users') {
              wpDoc = wpDoc as WP_REST_API_User
              // add your *author* transformation function
            }
          }

          yield docs.map((doc) => createOrReplace(doc))
          page++
        } else {
          hasMore = false
        }
      } catch (error) {
        console.error(`Error fetching data for page ${page}:`, error)
        // Stop the loop in case of an error
        hasMore = false
      }
    }
  },
})

The migration script still performs the same actions as before; however, it no longer creates pages, categories, or tags. You must create your own "transform" functions for each type individually.
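
As a starting point for those other transform functions, here is a minimal sketch for categories. It assumes the category schema only has the name field written in the previous lesson. The WpTerm and StagedCategory types are hypothetical stand-ins that keep the example self-contained; in your own script you would use WP_REST_API_Term from wp-types and the Category type from sanity.types.ts.

```typescript
// Hypothetical minimal shapes, standing in for `WP_REST_API_Term` (wp-types)
// and the generated `Category` type (sanity.types.ts).
type WpTerm = {id: number; name: string}
type StagedCategory = {_id: string; _type: 'category'; name?: string}

function transformToCategory(wpDoc: WpTerm): StagedCategory {
  const doc: StagedCategory = {
    // Deterministic ID, so posts can later reference `category-<id>`
    _id: `category-${wpDoc.id}`,
    _type: 'category',
  }

  // The post helper also runs decode() from html-entities on rendered text;
  // that step is left out here to keep the sketch dependency-free.
  doc.name = wpDoc.name.trim()

  return doc
}
```

In the migration script, you would call this in the `categories` branch and push the result onto `docs`, just as the `posts` branch does with transformToPost.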

With your script ready to uniquely handle each post type, you can add more attributes to each post type. The following is a step-by-step walkthrough of these attributes; you'll find a completed example at the bottom of this lesson.

Sanity Studio's slug field type stores its value inside an object, so you must convert it appropriately.

Add the slug transformation to your transform function
if (wpDoc.slug) {
  doc.slug = {_type: 'slug', current: wpDoc.slug}
}

Categories and tags appear in the WordPress REST API response as arrays of numbers.

"categories": [2864, 502],

These numbers match the IDs in the WordPress database. Because the imported documents use deterministic IDs, you can convert this array of numbers to an array of references. Here is an example for categories:

Add the category reference transformation to your transform function
Repeat this logic for tags.
if (Array.isArray(wpDoc.categories) && wpDoc.categories.length) {
  doc.categories = wpDoc.categories.map((catId) => ({
    _type: 'reference',
    _ref: `category-${catId}`,
  }))
}

These category (and author) documents need to exist in the dataset before you can write a document that references them. Ensure you've already run the imports for these document types.

Note that you can send array items without a _key attribute and Content Lake can automatically generate one for you, but because TypeScript complains, one is included in the final code at the end of this lesson.
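
Because the same logic applies to tags, you may prefer to share it in one helper. The sketch below is one possible approach, not the lesson's own code: the helper name toReferences is hypothetical, the `tag` prefix assumes your tag documents were imported with IDs like `tag-17`, and the _key is derived from the ID instead of the uuid() call used in the final code, which keeps keys stable across repeated runs.

```typescript
type KeyedReference = {_key: string; _type: 'reference'; _ref: string}

// Hypothetical helper: turn WordPress term IDs into keyed Sanity references.
// `prefix` must match the `_id` prefix used when those documents were imported.
function toReferences(ids: number[] | undefined, prefix: string): KeyedReference[] {
  if (!Array.isArray(ids)) return []

  return ids
    // Drop repeated IDs so every derived _key is unique within the array
    .filter((id, index) => ids.indexOf(id) === index)
    .map((id): KeyedReference => ({
      _key: `${prefix}-${id}`,
      _type: 'reference',
      _ref: `${prefix}-${id}`,
    }))
}

const categories = toReferences([2864, 502], 'category')
const tags = toReferences([17], 'tag')
```

In transformToPost you could then assign the result to doc.categories or doc.tags whenever the returned array is not empty.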

A WordPress post typically has only one author, so you would create a single reference to an "author" document created during the import process. You can use the migration tooling to turn this into an array of authors later if you need to support multiple authors in your front end.

Add author reference to your transform function
if (wpDoc.author) {
  doc.author = {
    _type: 'reference',
    _ref: `author-${wpDoc.author}`,
  }
}

As detailed in the Setting created and modified dates lesson, while it is possible to set the _createdAt and _updatedAt attributes in a mutation, it is not recommended if these dates have editorial meaning in your content.

Therefore, it's best to store them in dedicated datetime fields. The Sanity Studio post schema includes them as the fields date and modified.

Add the date-field transformations to your transform function.
if (wpDoc.date) {
  doc.date = wpDoc.date
}
if (wpDoc.modified) {
  doc.modified = wpDoc.modified
}
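
One detail to be aware of: the WordPress REST API returns date and modified in the site's local timezone without an offset, and also provides date_gmt and modified_gmt. If you want your datetime fields to hold unambiguous UTC values, one option is to convert the GMT variants instead. The helper below is a sketch under that assumption; its name is hypothetical and it is not part of the lesson's final script.

```typescript
// Hypothetical helper: WordPress `*_gmt` values look like '2017-05-23T18:27:59'
// (GMT, but written without an offset). Appending 'Z' makes them parse as UTC.
function gmtToIso(gmt: string | undefined): string | undefined {
  if (!gmt) return undefined

  const parsed = new Date(`${gmt}Z`)

  // Return undefined for unparseable values rather than throwing in toISOString()
  return isNaN(parsed.getTime()) ? undefined : parsed.toISOString()
}

const published = gmtToIso('2017-05-23T18:27:59')
```

You could then assign the results of gmtToIso(wpDoc.date_gmt) and gmtToIso(wpDoc.modified_gmt) to doc.date and doc.modified whenever they are defined.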

The status and sticky fields have explicit meaning in your WordPress installation, but they imply logic that must be recreated in your front end. With Sanity, you are more likely to rely on a document's draft or published state than on a status string. But you are welcome to import both fields as part of this migration.

Add (or omit) the status and sticky transformations to your transform function.
if (wpDoc.status) {
  doc.status = wpDoc.status as StagedPost['status']
}
doc.sticky = wpDoc.sticky === true
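
If you would rather lean on Sanity's draft and published states than on a status string, one possible approach is to stage unpublished WordPress posts under a draft ID. This is a sketch, not part of the lesson's script: the helper name is hypothetical, and treating every status other than `publish` as a draft is an assumption you should check against your own content.

```typescript
// Hypothetical helper: choose the staged document ID from the WordPress status.
// Sanity treats a document whose _id starts with 'drafts.' as a draft.
function stagedPostId(wpId: number, status: string | undefined): string {
  const publishedId = `post-${wpId}`

  // Anything that is not explicitly published (including a missing status)
  // is staged as a draft in this sketch.
  return status === 'publish' ? publishedId : `drafts.${publishedId}`
}

const publishedPostId = stagedPostId(12, 'publish')
const draftPostId = stagedPostId(12, 'draft')
```

References from other documents should still point at the published ID (`post-12` here), not the draft ID.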

The following custom field example is not included in the final script; it is an example for you to implement if it relates to your content model.

Plugins such as Advanced Custom Fields or Yoast SEO may add extra data to your content, such as taxonomy references or string fields. Here is an example field that is not part of WordPress core but could be present in your data:

"read_time": 22,

First, create a matching field name in your Sanity Studio schema:

defineField({name: 'readTime', type: 'number'})

Remember to re-run schema extract and typegen generate after each schema change!

And add it to your transform function:

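// Custom fields are not described by wp-types, so check the value's type before assigning
if (typeof wpDoc.read_time === 'number') {
  doc.readTime = wpDoc.read_time
}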
Add any desired custom field transformations to your migration script.

Review the transformation function now with all of the extra fields described above. This will add every field except the featured media, content, and excerpt fields, which are covered in later lessons.

Review and update your transformToPost file to stage the remaining attributes.
migrations/import-wp/lib/transformToPost.ts
import {uuid} from '@sanity/uuid'
import {decode} from 'html-entities'
import type {WP_REST_API_Post} from 'wp-types'

import type {Post} from '../../../sanity.types'

// Remove these keys because they'll be created by Content Lake
type StagedPost = Omit<Post, '_createdAt' | '_updatedAt' | '_rev'>

export async function transformToPost(wpDoc: WP_REST_API_Post): Promise<StagedPost> {
  const doc: StagedPost = {
    _id: `post-${wpDoc.id}`,
    _type: 'post',
  }

  doc.title = decode(wpDoc.title.rendered).trim()

  if (wpDoc.slug) {
    doc.slug = {_type: 'slug', current: wpDoc.slug}
  }

  if (Array.isArray(wpDoc.categories) && wpDoc.categories.length) {
    doc.categories = wpDoc.categories.map((catId) => ({
      _key: uuid(),
      _type: 'reference',
      _ref: `category-${catId}`,
    }))
  }

  if (wpDoc.author) {
    doc.author = {
      _type: 'reference',
      _ref: `author-${wpDoc.author}`,
    }
  }

  if (wpDoc.date) {
    doc.date = wpDoc.date
  }

  if (wpDoc.modified) {
    doc.modified = wpDoc.modified
  }

  if (wpDoc.status) {
    doc.status = wpDoc.status as StagedPost['status']
  }

  doc.sticky = wpDoc.sticky === true

  return doc
}
Run the migration script to create more complete post documents.
npx sanity@latest migration run import-wp --no-dry-run --type=posts

Now open your Sanity Studio, if it wasn't open already, and you should see your post documents have categories, authors, dates and more filled with their correct values. You're getting there!

Your Studio won't display images yet; that's next!

Up until this point, migration has only been concerned with text content. It's time to start uploading assets and unpacking the (manageable) complexity that can bring.
