Track
Replatforming from a legacy CMS to a Content Operation System

Migrating content from WordPress to Sanity

Lesson
9

Converting WordPress blocks to Portable Text

Convert raw WordPress content into Portable Text, create custom schema types in Sanity Studio, and make authenticated requests to WordPress.


There might be situations where you want to preserve some of the presentational data from WordPress in your content. For some types of content, typically marketing landing pages with unique content that doesn't need to be reused, it's more pragmatic to migrate one-to-one.

The wrapper function from the Converting HTML to Portable Text lesson will not be wasted; you'll reuse it here. What you need now is the raw, unprocessed HTML, which contains the WordPress block editor's markup that you can use to create objects in Portable Text.

Consider a "columns" block, for example. This is a core block in WordPress. The previous lesson would have extracted text and images from its HTML and transformed them into block content without any column positioning detail (it's better to have this logic in your front end code). Trying to preserve that using class names alone from the post-processed HTML would be too difficult.

Our Portable Text configuration in the Sanity Studio has no native concept of columns. You'll need to fix this first.

Register two new schema types to the Sanity Studio:

Create a schema type for an array of columns
./schemaTypes/columnsType.ts
import {defineField, defineType} from 'sanity'

export const columnsType = defineType({
  name: 'columns',
  type: 'object',
  fields: [
    defineField({
      name: 'columns',
      type: 'array',
      of: [{type: 'column'}],
    }),
  ],
})
Create a schema type for an individual column:
./schemaTypes/columnType.ts
import {defineField, defineType} from 'sanity'

export const columnType = defineType({
  name: 'column',
  type: 'object',
  fields: [
    defineField({
      name: 'content',
      type: 'portableText',
    }),
  ],
})
Update your Portable Text schema type to include columns:
./schemaTypes/portableTextType.ts
import {defineField} from 'sanity'

export const portableTextType = defineField({
  name: 'portableText',
  type: 'array',
  of: [{type: 'block'}, {type: 'image'}, {type: 'externalImage'}, {type: 'columns'}],
})
Update your workspace schema types to include columns:
./schemaTypes/index.ts
import {authorType} from './authorType'
import {categoryType} from './categoryType'
import {columnsType} from './columnsType'
import {columnType} from './columnType'
import {externalImageType} from './externalImageType'
import {pageType} from './pageType'
import {portableTextType} from './portableTextType'
import {postType} from './postType'
import {tagType} from './tagType'

export const schemaTypes = [
  authorType,
  categoryType,
  columnsType,
  columnType,
  externalImageType,
  pageType,
  portableTextType,
  postType,
  tagType,
]

Your posts and pages' Portable Text fields should now have the option to add "columns."
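
To make the target shape concrete, here is a minimal sketch of the data a "columns" object might hold once it sits inside a Portable Text array. The type names (`ExampleSpan`, `ExampleColumns`, and so on) and the `isColumns` helper are assumptions made for illustration; real Portable Text blocks carry more fields, such as `_key`, `style`, and `markDefs`.

```typescript
// Hypothetical minimal types for illustration only
type ExampleSpan = {_type: 'span'; text: string}
type ExampleTextBlock = {_type: 'block'; children: ExampleSpan[]}
type ExampleColumn = {_type: 'column'; content: ExampleTextBlock[]}
type ExampleColumns = {_type: 'columns'; columns: ExampleColumn[]}

// A two-column value matching the schema types registered above
const exampleColumns: ExampleColumns = {
  _type: 'columns',
  columns: [
    {_type: 'column', content: [{_type: 'block', children: [{_type: 'span', text: 'Left'}]}]},
    {_type: 'column', content: [{_type: 'block', children: [{_type: 'span', text: 'Right'}]}]},
  ],
}

// Type guard to tell a columns object apart from other members of the array
function isColumns(value: {_type: string}): value is ExampleColumns {
  return value._type === 'columns'
}
```

A front end could use a guard like this to decide when to render a column layout instead of a regular block.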

If you examine the output of your WordPress REST API you won't find a content.raw in the response. This is because it is only available when a request contains the "context" of edit.

If you try adding this parameter to your request by appending ?context=edit to the URL, you'll receive a 401 response, because that context is not publicly available.

To resolve this, you'll need to add "basic authentication" to the request, which can be done with an "application password."

Log in to your WordPress dashboard and go to wp-admin -> Users -> Edit User. Find your user account and scroll to the bottom of the page.

Create a new application password in WordPress with any name, but be sure to copy the password.

The wpDataTypeFetch function created in the First steps lesson can now be updated to make an authenticated request.

Update your WordPress fetch function to add authentication and a context search parameter – with your WordPress username and application password:
./migrations/import-wp/lib/wpDataTypeFetch.ts
import {BASE_URL, PER_PAGE} from '../constants'
import type {WordPressDataType, WordPressDataTypeResponses} from '../types'

// Basic auth setup in wp-admin -> Users -> Edit User
// This is the WordPress USER name, not the password name
const username = 'replace-with-your-username'
const password = 'replace-with-your-password'

export async function wpDataTypeFetch<T extends WordPressDataType>(
  type: T,
  page: number,
  edit: boolean = false,
): Promise<WordPressDataTypeResponses[T]> {
  const wpApiUrl = new URL(`${BASE_URL}/${type}`)
  wpApiUrl.searchParams.set('page', page.toString())
  wpApiUrl.searchParams.set('per_page', PER_PAGE.toString())

  const headers = new Headers()

  if (edit) {
    // 'edit' context returns raw, unprocessed content and other non-public fields
    wpApiUrl.searchParams.set('context', 'edit')
    headers.set(
      'Authorization',
      'Basic ' + Buffer.from(username + ':' + password).toString('base64'),
    )
  }

  // Resolves to null when the response is not OK (for example, a 401)
  return fetch(wpApiUrl, {headers}).then((res) => (res.ok ? res.json() : null))
}

Important: the script above stores a password in plain text. If you plan to commit this script to version control – or host it somewhere – consider storing the username and password as environment variables in a .env file and reading them from there.
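
One way to do that is sketched below. The variable names `WP_USERNAME` and `WP_APP_PASSWORD` and the `basicAuthHeader` helper are assumptions for illustration, and how the values end up in `process.env` (for example, through a .env loader) depends on your setup.

```typescript
// WP_USERNAME and WP_APP_PASSWORD are hypothetical names chosen for this sketch
type Env = Record<string, string | undefined>

// Build the Basic auth header value from environment variables,
// failing loudly if either credential is missing
function basicAuthHeader(env: Env): string {
  const username = env.WP_USERNAME
  const password = env.WP_APP_PASSWORD
  if (!username || !password) {
    throw new Error('Missing WP_USERNAME or WP_APP_PASSWORD')
  }
  return 'Basic ' + Buffer.from(`${username}:${password}`).toString('base64')
}
```

Inside the fetch function, the hard-coded credentials could then be replaced with `headers.set('Authorization', basicAuthHeader(process.env))`.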

Now, your migration script can retrieve raw content. It's time to see what that looks like. Earlier in this course, you installed @wordpress/block-serialization-default-parser, a library that takes raw content – with all the block editor's comments and unprocessed "shortcodes" – and converts it into an array of objects.

This serialized data is much simpler to work with and convert into Portable Text. Now, you can target each individual block by its name and create whatever block content shape you like.

Deep inside "inner blocks," the content is still stored as HTML, so you will still need to use the same htmlToBlockContent function from the last lesson to convert that HTML into block content – but now targeting and processing content layouts like columns is much simpler.
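
As a rough picture of that parsed data, the sketch below declares a local `ParsedBlock` type that mirrors the fields the parser returns and builds a hand-written two-column sample. The sample values are invented for illustration (the wrapper markup that a real columns block keeps in `innerHTML` is left out), and `collectBlockNames` is a hypothetical helper, not part of the library.

```typescript
// Local mirror of the parser's block shape, so the sketch runs without the package
type ParsedBlock = {
  blockName: string | null
  attrs: Record<string, unknown> | null
  innerBlocks: ParsedBlock[]
  innerHTML: string
  innerContent: (string | null)[]
}

// Helpers to build hand-written sample blocks
const sampleParagraph = (html: string): ParsedBlock => ({
  blockName: 'core/paragraph',
  attrs: {},
  innerBlocks: [],
  innerHTML: html,
  innerContent: [html],
})

const sampleColumn = (children: ParsedBlock[]): ParsedBlock => ({
  blockName: 'core/column',
  attrs: {},
  innerBlocks: children,
  innerHTML: '', // wrapper markup omitted in this sample
  innerContent: [],
})

const sampleColumns: ParsedBlock = {
  blockName: 'core/columns',
  attrs: {},
  innerBlocks: [
    sampleColumn([sampleParagraph('<p>Left</p>')]),
    sampleColumn([sampleParagraph('<p>Right</p>')]),
  ],
  innerHTML: '', // wrapper markup omitted in this sample
  innerContent: [],
}

// Walk the tree depth first and collect every block name
function collectBlockNames(block: ParsedBlock): string[] {
  const names: string[] = block.blockName ? [block.blockName] : []
  for (const inner of block.innerBlocks) {
    names.push(...collectBlockNames(inner))
  }
  return names
}
```

Running a helper like this over your own parsed content is a quick way to list which block types a migration needs to handle.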

Create a helper function to process raw content from WordPress, converting paragraphs and columns into Portable Text.
./migrations/import-wp/lib/serializedHtmlToBlockContent.ts
import type {htmlToBlocks} from '@sanity/block-tools'
import {parse} from '@wordpress/block-serialization-default-parser'
import type {SanityClient, TypedObject} from 'sanity'

import {htmlToBlockContent} from './htmlToBlockContent'

export async function serializedHtmlToBlockContent(
  html: string,
  client: SanityClient,
  imageCache: Record<number, string>,
) {
  // Parse content.raw HTML into WordPress blocks
  const parsed = parse(html)
  const blocks: ReturnType<typeof htmlToBlocks> = []

  for (const wpBlock of parsed) {
    if (wpBlock.blockName === 'core/paragraph') {
      // Convert inner HTML to Portable Text blocks
      const block = await htmlToBlockContent(wpBlock.innerHTML, client, imageCache)
      blocks.push(...block)
    } else if (wpBlock.blockName === 'core/columns') {
      // Build one "columns" object with a "column" for each inner block
      const columnsBlock = {_type: 'columns', columns: [] as TypedObject[]}
      for (const column of wpBlock.innerBlocks) {
        const columnContent = []
        // Convert each block inside the column from HTML to Portable Text
        for (const innerBlock of column.innerBlocks) {
          const content = await htmlToBlockContent(innerBlock.innerHTML, client, imageCache)
          columnContent.push(...content)
        }
        columnsBlock.columns.push({
          _type: 'column',
          content: columnContent,
        })
      }
      blocks.push(columnsBlock)
    } else if (!wpBlock.blockName) {
      // Do nothing: blocks without a name are skipped
    } else {
      console.log(`Unhandled block type: ${wpBlock.blockName}`)
    }
  }

  return blocks
}
Update your request to WordPress to use authentication:
./migrations/import-wp/index.ts
let wpData = await wpDataTypeFetch(wpType, page, true)
Update your doc.content field to use the serialized raw content for your documents:
./migrations/import-wp/lib/transformPost.ts
doc.content = wpDoc.content.raw
  ? await serializedHtmlToBlockContent(wpDoc.content.raw, client, existingImages)
  : undefined

Run your posts and pages migrations again.

npx sanity@latest migration run import-wp --no-dry-run --type=posts

If your existing content used the core columns block, you should now see Sanity documents that use the newly created Portable Text columns object.

As you'll notice, the block preview shows JSON-like data. This isn't very helpful for most content teams (unless they're into raw data). The last step is to update the block preview to show the columns in a friendlier way:

./schemaTypes/columnsType.ts
import {defineField, defineType} from 'sanity'

export const columnsType = defineType({
  name: 'columns',
  type: 'object',
  fields: [
    defineField({
      name: 'columns',
      type: 'array',
      of: [{type: 'column'}],
    }),
  ],
  preview: {
    select: {
      columns: 'columns',
    },
    prepare({columns}) {
      // Guard against an empty block, where the selected value is undefined
      const columnsCount = columns?.length ?? 0
      return {
        title: `${columnsCount} column${columnsCount === 1 ? '' : 's'}`,
      }
    },
  },
})
Update the columnsType schema type with the new preview configuration.

Your Studio should now show the number of columns as the block's preview title.

./schemaTypes/columnType.ts
import {defineField, defineType} from 'sanity'

export const columnType = defineType({
  name: 'column',
  type: 'object',
  fields: [
    defineField({
      name: 'content',
      type: 'portableText',
    }),
  ],
  preview: {
    select: {
      title: 'content',
    },
  },
})
Update the columnType schema type with the new preview configuration.

If you click into the columns block type, you should see the first bit of content in the individual columns.
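
If you want more control over that title, one option is to derive it yourself in a `prepare` function. The sketch below is an assumption rather than part of the lesson's code: `firstBlockText`, its local types, the 60-character limit, and the "Empty column" fallback are all hypothetical choices.

```typescript
// Minimal local types; real Portable Text blocks carry more fields
type PreviewSpan = {_type: string; text?: string}
type PreviewBlock = {_type: string; children?: PreviewSpan[]}

// Join the span text of the first text block and truncate it for use as a title
function firstBlockText(content: PreviewBlock[] = [], maxLength = 60): string {
  const first = content.find((item) => item._type === 'block')
  if (!first || !first.children) return 'Empty column'
  const text = first.children.map((span) => span.text ?? '').join('')
  return text.length > maxLength ? text.slice(0, maxLength) + '…' : text
}
```

In the column schema type, this could be wired up by selecting `content: 'content'` and returning `{title: firstBlockText(content)}` from `prepare`.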

You can further enhance this preview with custom preview components, using React components to show an even richer preview within the Portable Text editor.

This lesson has only scratched the surface of converting raw WordPress content into Portable Text, but you now have a plan to convert all other WordPress blocks:

  1. Create a custom schema type in Sanity Studio's Portable Text editor for each new block
  2. Intercept that block during serialization and convert to Portable Text
  3. Make sure that the block previews are helpful for your content team
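
As the number of handled blocks grows, the second step could be organized as a lookup of handlers keyed by block name instead of a long if/else chain. The sketch below is one possible arrangement under stated assumptions: the types are local stand-ins, the handlers are synchronous for brevity (the lesson's conversion is async), and the `divider` and `rawHtml` types are hypothetical placeholders that would need their own schema types.

```typescript
// Local stand-ins for the parser's block shape and for Portable Text objects
type WpBlock = {blockName: string | null; innerHTML: string; innerBlocks: WpBlock[]}
type PtObject = {_type: string; [key: string]: unknown}
type BlockHandler = (block: WpBlock) => PtObject[]

// Hypothetical handlers, keyed by WordPress block name
const handlers: Record<string, BlockHandler> = {
  'core/separator': () => [{_type: 'divider'}],
  'core/columns': (block) => [
    {
      _type: 'columns',
      columns: block.innerBlocks.map((column) => ({
        _type: 'column',
        // A real handler would convert each inner block's HTML to Portable Text here
        content: column.innerBlocks.map((inner) => ({_type: 'rawHtml', html: inner.innerHTML})),
      })),
    },
  ],
}

// Convert known blocks and report the names of any that have no handler yet
function convertBlocks(parsed: WpBlock[]): {blocks: PtObject[]; unhandled: string[]} {
  const blocks: PtObject[] = []
  const unhandled: string[] = []
  for (const block of parsed) {
    if (!block.blockName) continue // skip nameless blocks
    const handler = handlers[block.blockName]
    if (handler) blocks.push(...handler(block))
    else unhandled.push(block.blockName)
  }
  return {blocks, unhandled}
}
```

Returning the unhandled names, rather than only logging them, gives you a checklist of block types that still need a schema type and a handler.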
