Migrating text to block content in Sanity.io using a script.
9 replies
Last updated: Sep 2, 2021
J
Hey, I’ve got a bunch of documents (Pages) with arrays of blocks, which all have a common object (Content). The Content object has a text object which needs to be migrated to a block object, as one or more plain text paragraphs.
There aren’t that many that I couldn’t do it by hand, but I thought I should learn how. I’ve read this article: https://www.sanity.io/docs/migrating-data , but I’m a bit stuck on how to work with blocks. It seems like it’d be really onerous. Does anyone have any pointers?
Sep 1, 2021, 8:40 PM
Hey (Removed Name)! Correct me if I'm wrong, but the text already exists inside those documents in your dataset, right? If so, you'll want to use mutations to create the block content from said text. I'll let you know here if I can find a specific example.
Sep 1, 2021, 8:51 PM
J
Yes, that’s exactly right. So each page is built out of a big array with items like:
{
  "_key": "d14fa57f4452",
  "_type": "contentWithList",
  "backgroundColour": { "title": "Dark Grey", "value": "#333f4c" },
  "content": {
    "_type": "titleTextCta",
    "content": "Do a bunch of stuff!\nAnd do a bunch more",
    "link": {
      "_type": "linkChoices",
      "link": "mailto:client@example.com",
      "linkStyle": "link",
      "linkTitle": "Make an enquiry"
    },
    "title": "We deliver outcomes",
    "titleType": "H2"
  },
  "listColour": "green",
  "listItems": [ "content.", "some more content.", "More random stuff." ]
}
The items are of lots of different types, but (almost) all have a content object like the one above. So in this instance, I’d want to split the content on \n and create a new block per bit of content. But in lots of instances it’s just one block.
Sep 1, 2021, 9:06 PM
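For an item shaped like the one above, a first step could be to find which array items still hold a plain string and work out where each one lives in the document. The sketch below is only an illustration: the array field name `blocks`, the second sample item, and the helper name `findTextTargets` are assumptions, not names from the thread. The path strings use the keyed-item form `[_key=="..."]` that Sanity patches accept.

```javascript
// Hypothetical page document. `blocks` is an assumed name for the page's array field.
const page = {
  _id: 'page-1',
  blocks: [
    {
      _key: 'd14fa57f4452',
      _type: 'contentWithList',
      content: { _type: 'titleTextCta', content: 'Do a bunch of stuff!\nAnd do a bunch more' },
    },
    // Hypothetical item with no content object, to show that it is skipped
    { _key: 'a1b2c3d4e5f6', _type: 'spacer' },
  ],
}

// Collect a patch path and the split paragraphs for every item
// whose content.content is still a plain string.
function findTextTargets(doc, arrayField = 'blocks') {
  return (doc[arrayField] || [])
    .filter((item) => item.content && typeof item.content.content === 'string')
    .map((item) => ({
      path: `${arrayField}[_key=="${item._key}"].content.content`,
      paragraphs: item.content.content.split('\n'),
    }))
}

const targets = findTextTargets(page)
console.log(targets)
```

Each `path` could then serve as a key in a patch's `set` object once the paragraphs have been turned into blocks.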
J
By the way, this is super unimportant, I’m sure you’ve got lots of more important things to get to. I have few enough records that I’m going to do this by hand. I was just curious.
Sep 1, 2021, 9:07 PM
Going from text (a relatively simple schema type, as it’s just a string value) to portable text (a potentially complex array of objects) takes a bit of reworking to get what you’re after, as the objects need keys, there are marks involved, etc. Luckily, a text value has only one structural feature, the new line, which makes it relatively easy to parse.
If you haven’t already done these changes by hand, maybe you can give this a try. I’d recommend trying on a non-production dataset first as it modifies live data. I’m assuming that you have documents with an object named content, and in that object is a field named content of type text.
1. You’ll need to install nanoid in your studio folder: yarn add nanoid or npm install nanoid, depending on your package manager of choice.
2. You’ll want to change your schema type from text to block content. At this point you’ll be getting an “Invalid property value” error in the studio (if you happen to check it), but that’s okay. Don’t click Reset value.
3. Save the script following this list in your studio folder (put it wherever you’d like, just be sure to modify the path when you run it). I put it in a scripts folder. If my assumptions were right about your naming conventions, you should only have to change the TYPE variable near the start, but if you want to consider all documents you can always change the filter in fetchDocuments(). You mentioned earlier that you want to break on a single new line, so that’s how I wrote this up. If you’d rather set a new paragraph on two new lines, change const paragraphs = doc.content.split('\n') to const paragraphs = doc.content.split('\n\n').
4. Run the script with sanity exec scripts/textToBlock.js --with-user-token
Hopefully this works (on a non-production dataset 😉).

// scripts/textToBlock.js
/* eslint-disable no-console */
import { customAlphabet } from 'nanoid'
import sanityClient from 'part:@sanity/base/client'

const client = sanityClient.withConfig({ apiVersion: '2021-09-01' })
// 12-character hexadecimal keys for blocks and spans
const nanoid = customAlphabet('0123456789abcdef', 12)
const TYPE = 'contentWithList' // document _type to consider

// Note: this filter still matches documents after they are migrated, so only
// the first 51 matches are ever seen. Narrow the filter if you have more.
const fetchDocuments = () =>
  client.fetch(`*[_type == "${TYPE}"][0..50] {_id, _rev, 'content': content.content}`)

const buildPatches = docs =>
  docs
    // Skip anything that is not still a plain string (for example, documents
    // migrated by an earlier batch), so split() is never called on an array
    .filter(doc => typeof doc.content === 'string')
    .map(doc => {
      const paragraphs = doc.content.split('\n')
      // One block per paragraph, each holding a single unstyled span
      const output = paragraphs.map((paragraph) => ({
        _key: nanoid(),
        _type: 'block',
        markDefs: [],
        style: 'normal',
        children: [
          {
            _key: nanoid(),
            _type: 'span',
            marks: [],
            text: paragraph,
          }
        ]
      }))
      return {
        id: doc._id,
        patch: {
          // Set only the nested field so the rest of the content object is kept
          set: {
            'content.content': output,
          },
          // Fail the patch if the document changed since it was fetched
          ifRevisionID: doc._rev,
        }
      }
    })

const createTransaction = patches =>
  patches.reduce((tx, patch) => tx.patch(patch.id, patch.patch), client.transaction())

const commitTransaction = tx => tx.commit()

const migrateNextBatch = async () => {
  const documents = await fetchDocuments()
  const patches = buildPatches(documents)
  if (patches.length === 0) {
    console.log('No more documents to migrate!')
    return null
  }
  console.log(
    `Migrating batch:\n %s`,
    patches.map(patch => `${patch.id} => ${JSON.stringify(patch.patch)}`).join('\n')
  )
  const transaction = createTransaction(patches)
  await commitTransaction(transaction)
  return migrateNextBatch()
}

migrateNextBatch().catch(err => {
  console.error(err)
  process.exit(1)
})
Sep 2, 2021, 5:58 AM
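The conversion at the heart of the script can be tried on its own, without touching a dataset. The following is a minimal sketch under a few assumptions: the function name `textToBlocks` and its `separator` parameter are made up for illustration, and keys come from Node's built-in crypto module instead of nanoid so the snippet runs with no extra install. The block shape mirrors the one used in the script.

```javascript
const { randomBytes } = require('crypto')

// 12-character hexadecimal key, the same format the script produces with nanoid
const makeKey = () => randomBytes(6).toString('hex')

// Turn a plain string into an array of Portable Text blocks.
// Use '\n' for one block per line, or '\n\n' for blank-line-separated paragraphs.
function textToBlocks(text, separator = '\n') {
  return text.split(separator).map((paragraph) => ({
    _key: makeKey(),
    _type: 'block',
    markDefs: [],
    style: 'normal',
    children: [{ _key: makeKey(), _type: 'span', marks: [], text: paragraph }],
  }))
}

const blocks = textToBlocks('Do a bunch of stuff!\nAnd do a bunch more')
console.log(JSON.stringify(blocks, null, 2))
```

Printing the result before running a real migration makes it easy to compare the output against a block created by hand in the studio.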
J
Oh fab! Thank you (Removed Name)! So we’re okay to manually create the shape of the block content and just pass in our own IDs? That’s a lot simpler than I was expecting.
It’d be great to have a page in the docs talking about that, or including this kind of snippet. I’ve left a bit of feedback there along those lines and linking back to here.
Sep 2, 2021, 10:06 AM
Yes, you’ve nailed it. Part of the beauty of Portable Text is that it makes your content so malleable. There are a few requirements for your data to be well-formed, but nothing prevents you from building block content from a bunch of strings, as we’ve done here. I used nanoid to make the keys and followed the convention (I think) of 12-character hexadecimal, but I think you have quite a bit of freedom as long as they’re unique within the array.
I saw what I now know is your feedback. 🙂 Thank you for that. I agree that the more examples of this kind of thing, the better, and will work on a guide or update to the docs. Thanks (Removed Name)!
Sep 2, 2021, 8:39 PM
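Since the keys only need to be unique within their array, a quick check before committing hand-built blocks can catch mistakes early. This is a rough sketch: the thread does not list the exact well-formedness requirements, so the checks below (a non-empty `_key`, no repeated keys, and a `children` array on items of type `block`) are assumptions, and `checkBlocks` is a hypothetical helper name.

```javascript
// Return a list of human-readable problems found in an array of blocks.
function checkBlocks(blocks) {
  const problems = []
  const seen = new Set()
  blocks.forEach((block, i) => {
    if (typeof block._key !== 'string' || block._key === '') {
      problems.push(`block ${i}: missing _key`)
    } else if (seen.has(block._key)) {
      problems.push(`block ${i}: duplicate _key ${block._key}`)
    } else {
      seen.add(block._key)
    }
    if (block._type === 'block' && !Array.isArray(block.children)) {
      problems.push(`block ${i}: missing children`)
    }
  })
  return problems
}

const good = [
  { _key: 'a1', _type: 'block', children: [] },
  { _key: 'b2', _type: 'block', children: [] },
]
const bad = [
  { _key: 'a1', _type: 'block', children: [] },
  { _key: 'a1', _type: 'block', children: [] },
]
console.log(checkBlocks(good))
console.log(checkBlocks(bad))
```

An empty result means none of the assumed checks failed. It does not guarantee the studio will accept the data.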