Migrate text field to block array in nested Content objects

9 replies

Last updated: Sep 2, 2021

Hey, I’ve got a bunch of documents (Pages) with arrays of blocks, which all have a common object (Content). The Content object has a text object which needs to be migrated to a block object, as one or more plain text paragraphs.
There aren’t that many that I couldn’t do it by hand, but I thought I should learn how. I’ve read this article:
https://www.sanity.io/docs/migrating-data , but I’m a bit stuck on how to work with blocks. It seems like it’d be really onerous. Does anyone have any pointers?

AI Update

Good news! You're on the right track with learning migrations, and working with blocks is actually easier than you might think. The key is using the @portabletext/block-tools package, which has utilities specifically for converting text into Portable Text blocks.

Basic Strategy

Since you're converting plain text strings to Portable Text paragraphs, you can use the htmlToBlocks function from @portabletext/block-tools. Even though your source is plain text (not HTML), you can wrap it in paragraph tags to convert it easily.

First, install the package:

npm install -D @portabletext/block-tools jsdom

Migration Script Example

Here's a pattern you can adapt for your Pages → blocks → Content → text to block migration:

import {htmlToBlocks} from '@portabletext/block-tools'
import {JSDOM} from 'jsdom'
import {getCliClient} from 'sanity/cli'

const client = getCliClient()

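// Define your block content type schema.
// Caveat: the block-tools README passes a type compiled from a Studio schema,
// so treat this plain object as a placeholder and check it against your version.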
const blockContentType = {
  type: 'block',
  styles: [{title: 'Normal', value: 'normal'}],
  lists: [],
  marks: {decorators: [], annotations: []}
}

const migratePages = async () => {
  // Fetch all Pages that need migration
  const pages = await client.fetch(`*[_type == "page"]`)
  
  for (const page of pages) {
    const updatedBlocks = page.blocks?.map(block => {
      if (block.content?.text && typeof block.content.text === 'string') {
        // Convert plain text to HTML paragraphs
        const textParagraphs = block.content.text
          .split('\n\n')
          .map(p => `<p>${p}</p>`)
          .join('')
        
        // Convert to Portable Text blocks
        const portableTextBlocks = htmlToBlocks(
          textParagraphs,
          blockContentType,
          {parseHtml: html => new JSDOM(html).window.document}
        )
        
        return {
          ...block,
          content: {
            ...block.content,
            block: portableTextBlocks
          }
        }
      }
      return block
    })
    
    // Update the document
    await client.patch(page._id).set({blocks: updatedBlocks}).commit()
    console.log(`Migrated ${page._id}`)
  }
}

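// Surface failures instead of leaving an unhandled promise rejection
migratePages().catch(err => {
  console.error(err)
  process.exit(1)
})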

Simpler Alternative for Plain Text

If your text content is really just plain paragraphs without any formatting, you can actually construct the Portable Text blocks manually without @portabletext/block-tools:

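// Short random key; keys only need to be unique within their array
const randomKey = () => Math.random().toString(36).slice(2, 11)

const textToBlocks = (text) => {
  // One block per paragraph, each holding a single unformatted span
  return text.split('\n\n').map(paragraph => ({
    _type: 'block',
    _key: randomKey(),
    style: 'normal',
    markDefs: [],
    children: [{
      _type: 'span',
      _key: randomKey(),
      text: paragraph,
      marks: []
    }]
  }))
}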

This creates simple Portable Text blocks without needing the block-tools package. Each paragraph becomes a separate block.
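
Real text fields often contain Windows line endings, runs of blank lines, or trailing whitespace, which a plain split on two newlines turns into empty blocks. A minimal sketch of a more defensive splitter follows; the names textToBlocksSafe and randomKey are made up for illustration, and the 12-character hex key simply mirrors what Studio-generated keys tend to look like.

```javascript
// Hypothetical helper: 12-character hex key, unique enough within one array.
const randomKey = () =>
  Array.from({length: 12}, () => Math.floor(Math.random() * 16).toString(16)).join('')

// Split on one or more blank lines, tolerate Windows line endings,
// and skip paragraphs that are empty after trimming.
const textToBlocksSafe = (text, makeKey = randomKey) =>
  String(text ?? '')
    .replace(/\r\n/g, '\n')
    .split(/\n{2,}/)
    .map(paragraph => paragraph.trim())
    .filter(paragraph => paragraph.length > 0)
    .map(paragraph => ({
      _type: 'block',
      _key: makeKey(),
      style: 'normal',
      markDefs: [],
      children: [{_type: 'span', _key: makeKey(), text: paragraph, marks: []}],
    }))

const sample = 'First paragraph.\r\n\r\n\r\nSecond paragraph.\n\n   \n\n'
console.log(textToBlocksSafe(sample).map(block => block.children[0].text))
```

Passing the key generator as a parameter keeps the function easy to test with predictable keys.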

Testing First

Before running on all documents, test with a single document:

const testPage = await client.fetch(`*[_type == "page"][0]`)
// Test your transformation logic
// Then update just that one document
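
One way to test the transformation logic is a dry run that never touches the dataset. The sketch below assumes the same field names as the example above (blocks, content.text, content.block), which may not match your schema; previewPage, toBlocks and the sample document are all invented for illustration.

```javascript
// Minimal converter so the sketch is self-contained (counter-based keys are
// hypothetical and only suitable for a preview).
let counter = 0
const nextKey = () => `key${(counter++).toString(16).padStart(9, '0')}`

const toBlocks = text =>
  text.split('\n\n').map(paragraph => ({
    _type: 'block',
    _key: nextKey(),
    style: 'normal',
    markDefs: [],
    children: [{_type: 'span', _key: nextKey(), text: paragraph, marks: []}],
  }))

// Pure transformation: returns the new array and how many items changed,
// without modifying the input or writing anything.
const previewPage = page => {
  let changed = 0
  const blocks = (page.blocks ?? []).map(item => {
    if (typeof item.content?.text !== 'string') return item
    changed += 1
    return {...item, content: {...item.content, block: toBlocks(item.content.text)}}
  })
  return {blocks, changed}
}

// Stand-in for a fetched document; in practice this would come from client.fetch.
const samplePage = {
  _id: 'page-1',
  blocks: [
    {_key: 'a1', content: {title: 'Intro', text: 'One.\n\nTwo.'}},
    {_key: 'b2', content: {title: 'No text here'}},
  ],
}

const preview = previewPage(samplePage)
console.log(`${preview.changed} of ${samplePage.blocks.length} items would change`)
console.log(JSON.stringify(preview.blocks[0].content.block, null, 2))
```

Once the printed output looks right, the same function could feed the patch call for that single document.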

The data migration guide you've already read covers the transaction patterns well. For more complex HTML-to-blocks scenarios, check out the Sanity Learn course on migrating to block content, which has detailed examples of using @portabletext/block-tools with custom deserialization rules for handling images and other complex HTML structures.

Since you mentioned there aren't that many documents, you could also do a hybrid approach: export the text, convert it with a script, then paste it back into Studio. But writing the migration script is definitely a valuable skill to learn!

9 replies

Hey (Removed Name)! Correct me if I'm wrong, but the text already exists inside those documents in your dataset, right? If so, you'll want to use mutations to create the block content from said text. I'll let you know here if I can find a specific example.

Yes, that’s exactly right. So each page is built out of a big array with items like:

{
  "_key": "d14fa57f4452",
  "_type": "contentWithList",
  "backgroundColour": {
    "title": "Dark Grey",
    "value": "#333f4c"
  },
  "content": {
    "_type": "titleTextCta",
    "content": "Do a bunch of stuff!/nAnd do a bunch more",
    "link": {
      "_type": "linkChoices",
      "link": "mailto:client@example.com",
      "linkStyle": "link",
      "linkTitle": "Make an enquiry"
    },
    "title": "We deliver outcomes",
    "titleType": "H2"
  },
  "listColour": "green",
  "listItems": [
    "content.",
    "some more content.",
    "More random stuff."
  ]
}

They are of lots of different types, but (almost) all have a content object like the above item. So in this instance, I’d want to split the content on /n and create a new block per bit of content. But in lots of instances it’s just one block.
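
Because items like the one above live inside an array on the page, one way to reach the nested text is a patch path keyed by each item's _key. This is only a rough sketch: the array field name blocks is a placeholder (the thread never names it), the separator is assumed to be a real newline character, and buildSetForPage, paragraphToBlock and randomKey are made-up helper names.

```javascript
// Placeholder for whatever the page's array field is actually called.
const ARRAY_FIELD = 'blocks'

// Hypothetical 12-character hex key generator.
const randomKey = () =>
  Array.from({length: 12}, () => Math.floor(Math.random() * 16).toString(16)).join('')

const paragraphToBlock = paragraph => ({
  _type: 'block',
  _key: randomKey(),
  style: 'normal',
  markDefs: [],
  children: [{_type: 'span', _key: randomKey(), text: paragraph, marks: []}],
})

// Build one `set` object per page, keyed by paths such as
// blocks[_key=="d14fa57f4452"].content.content
const buildSetForPage = page => {
  const set = {}
  for (const item of page[ARRAY_FIELD] ?? []) {
    const text = item.content?.content
    if (typeof text !== 'string') continue // no text, or already migrated
    const path = `${ARRAY_FIELD}[_key=="${item._key}"].content.content`
    set[path] = text.split('\n').map(paragraphToBlock)
  }
  return set
}

// Cut-down version of the item shown above, wrapped in a sample page.
const page = {
  _id: 'page-1',
  blocks: [
    {
      _key: 'd14fa57f4452',
      _type: 'contentWithList',
      content: {
        _type: 'titleTextCta',
        title: 'We deliver outcomes',
        content: 'Do a bunch of stuff!\nAnd do a bunch more',
      },
    },
    {_key: 'e25fb68a5563', _type: 'somethingElse'},
  ],
}

const set = buildSetForPage(page)
console.log(Object.keys(set))
console.log(set[Object.keys(set)[0]].map(block => block.children[0].text))
```

The resulting object has the shape a client patch's set() accepts, so it could be applied per page and would leave sibling fields such as title and link alone; that step is left out so the sketch runs without a dataset.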

By the way, this is super unimportant, I’m sure you’ve got lots of more important things to get to. I have few enough records that I’m going to do this by hand. I was just curious.

Is the text you want to migrate always content.content?

Going from text (a relatively simple schema type, as it’s just a string value) to portable text (a potentially complex array of objects) takes a bit of reworking to get what you’re after, as the objects need keys, there are marks involved, etc. Luckily, the text schema type is distinguished by only one thing—the new line—making it relatively easy to parse.
If you haven’t already made these changes by hand, maybe you can give this a try. I’d recommend trying it on a non-production dataset first, as it modifies live data. I’m assuming that you have documents with an object named content, and in that object is a field named content of type text.

1. You’ll need to install nanoid in your studio folder: yarn add nanoid or npm install nanoid, depending on your package manager of choice.
2. You’ll want to change your schema type from text to block content. At this point you’ll be getting an “Invalid property value” error in the studio (if you happen to check it), but that’s okay. Don’t click Reset value.
3. Save the script following this list in your studio folder (put it wherever you’d like, just be sure to modify the path when you run it). I put it in a scripts folder. If my assumptions were right about your naming conventions, you should only have to change the TYPE variable near the start, but if you want to consider all documents you can always change the filter in fetchDocuments(). You mentioned earlier that you want to break on a single new line, so that’s how I wrote this up. Often convention calls for a new paragraph after two new lines; if that’s the case, change const paragraphs = doc.content.split('\n') to const paragraphs = doc.content.split('\n\n').
4. Run the script with sanity exec scripts/textToBlock.js --with-user-token

// scripts/textToBlock.js

/* eslint-disable no-console */
import { customAlphabet } from 'nanoid'
import sanityClient from 'part:@sanity/base/client'
const client = sanityClient.withConfig({ apiVersion: '2021-09-01' })

const nanoid = customAlphabet('0123456789abcdef', 12)

const TYPE = 'contentWithList' // document _type to consider

const fetchDocuments = () => client.fetch(`*[_type == "${TYPE}"][0..50] {_id, _rev, 'content': content.content}`)

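const buildPatches = docs =>
  docs
    // Skip documents whose content.content is missing or already an array of blocks.
    // The query always returns the same first documents, so larger datasets need
    // a filter that excludes migrated ones.
    .filter(doc => typeof doc.content === 'string')
    .map(doc => {
    const paragraphs = doc.content.split('\n')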
    const output = paragraphs.map((paragraph) => ({
      _key: nanoid(),
      _type: 'block',
      markDefs: [],
      style: 'normal',
      children: [
        {
          _key: nanoid(),
          _type: 'span',
          marks: [],
          'text': paragraph,
        }
      ]
    }))

    return {
      id: doc._id,
      patch: {
        set: {
          // Target the nested field by path so sibling fields in content (title, link, ...) are kept
          'content.content': output,
        },
        ifRevisionID: doc._rev,
      }
    }
  })

const createTransaction = patches =>
  patches.reduce((tx, patch) => tx.patch(patch.id, patch.patch), client.transaction())

const commitTransaction = tx => tx.commit()

const migrateNextBatch = async () => {
  const documents = await fetchDocuments()
  const patches = buildPatches(documents)
  if (patches.length === 0) {
    console.log('No more documents to migrate!')
    return null
  }
  console.log(
    `Migrating batch:\n %s`,
    patches.map(patch => `${patch.id} => ${JSON.stringify(patch.patch)}`).join('\n')
  )
  const transaction = createTransaction(patches)
  await commitTransaction(transaction)
  return migrateNextBatch()
}

migrateNextBatch().catch(err => {
  console.error(err)
  process.exit(1)
})

Hopefully this works (on a non-production dataset 😉).


Oh fab! Thank you (Removed Name)! So we’re okay to manually create the shape of the block content and just pass in our own IDs? That’s a lot simpler than I was expecting.
It’d be great to have a page in the docs talking about that, or including this kind of snippet. I’ve left a bit of feedback there along those lines and linking back to here.

Yes, you’ve nailed it. Part of the beauty of Portable Text is that it makes your content so malleable. There are a few requirements for your data to be well-formed, but nothing prevents you from building block content from a bunch of strings, as we’ve done here. I used nanoid to make the keys and followed the convention (I think) of 12-character hexadecimal, but I think you have quite a bit of freedom as long as they’re unique within the array.
I saw what I now know is your feedback. 🙂 Thank you for that. I agree that the more examples of this kind of thing, the better, and will work on a guide or update to the docs. Thanks (Removed Name)!
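
For anyone building blocks by hand, a quick sanity check before committing can catch the obvious mistakes. The sketch below is not an official validator: checkBlocks is a made-up name, and the specific checks (keys present and unique within the array, children being an array, span text being a string) are assumptions about what "well-formed" needs to mean for plain paragraphs.

```javascript
// Hypothetical checker; returns a list of problems (empty means none found).
const checkBlocks = blocks => {
  if (!Array.isArray(blocks)) return ['value is not an array']
  const problems = []
  const seenKeys = new Set()

  blocks.forEach((block, i) => {
    if (block === null || typeof block !== 'object') {
      problems.push(`item ${i}: not an object`)
      return
    }
    // Keys must exist and be unique within the array.
    if (typeof block._key !== 'string' || block._key === '') {
      problems.push(`item ${i}: missing _key`)
    } else if (seenKeys.has(block._key)) {
      problems.push(`item ${i}: duplicate _key "${block._key}"`)
    } else {
      seenKeys.add(block._key)
    }
    if (block._type !== 'block') return // custom array members are not inspected
    if (!Array.isArray(block.children)) {
      problems.push(`item ${i}: children is not an array`)
      return
    }
    block.children.forEach((child, j) => {
      if (child?._type === 'span' && typeof child.text !== 'string') {
        problems.push(`item ${i}, child ${j}: span text is not a string`)
      }
    })
  })
  return problems
}

const good = [
  {
    _type: 'block',
    _key: 'a1b2c3d4e5f6',
    style: 'normal',
    markDefs: [],
    children: [{_type: 'span', _key: 'f6e5d4c3b2a1', text: 'Hello', marks: []}],
  },
]
// Second item reuses the first key and has a broken children value.
const bad = [good[0], {...good[0], children: 'oops'}]

console.log(checkBlocks(good))
console.log(checkBlocks(bad))
```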
