Importing markdown content into Sanity
How to migrate markdown content from a remark-powered Gatsby blog to Sanity.io’s content backend.
Knut Melvær
Head of Developer Community and Education
This is part 1 of the “Remaking overreacted.io with Gatsby, GraphQL and Sanity” series.
- Part 2: Setting up Sanity as a blog backend with a GraphQL API
- Part 3: Setting up Gatsby with your Sanity project’s GraphQL API
We’re big fans of Dan Abramov over at Sanity HQ. Now that he has launched his new Gatsby-powered blog, overreacted.io, we thought it might be a fun exercise to recreate it using Sanity as the backend and Portable Text to deal with the rich text. In this tutorial, you’ll learn how to import content from markdown, set up Gatsby, set up Sanity with a GraphQL API, add an RSS feed, and how to get it all on the internet using Netlify/Zeit.
Although you can use markdown in Sanity, transforming it into Portable Text makes it easier to take your content out in whatever format you’ll need it. Additionally, it makes the text queryable, something we’ll get back to later.
Overreacted uses the remark plugin for Gatsby to transform markdown files into Gatsby’s GraphQL API. Hence, the folder structure in /src/pages/ looks like this:
```
❯ tree
.
├── 404.js
├── how-does-react-tell-a-class-from-a-function.md
├── index.js
├── why-do-react-elements-have-typeof-property.md
└── why-do-we-write-super-props.md
```
Let’s make a script that takes all the markdown files and transforms them into a format that we can import into Sanity’s content backend.
Glob all the markdown files
Although there are just a few blog posts at the moment, it’s nice to script something that migrates all of them for us.
We begin by writing a function that takes a path to a directory and returns a list of all the files’ absolute paths. Although we could have done this with node’s native fs library, Sindre Sorhus’ globby package is easier to deal with (and returns a promise, so we can use async/await):
```js
const globby = require('globby')

async function globMDFiles (path) {
  const options = {
    cwd: path
  }
  const files = await globby(`**/*.md`, options)
  // return an array of absolute paths
  return files.map(file => `${path}/${file}`)
}

module.exports = globMDFiles
```
Get the file contents
Once we have the file paths, we can use fs.readFile to extract the markdown content. The promise/async/await isn’t strictly necessary here, but it’s nice to have in place in case we need to do a bit more in this function, and it means the function itself returns a promise:
```js
const { readFile } = require('fs')

async function extractMDfromFile (filePath) {
  const mdContent = await new Promise((resolve, reject) =>
    readFile(filePath, 'utf-8', (err, data) => {
      if (err) {
        // reject the promise instead of throwing in the callback
        return reject(err)
      }
      return resolve(data)
    })
  )
  return mdContent
}

module.exports = extractMDfromFile
```
Convert Markdown to HTML
To convert Markdown to Portable Text we have to go via HTML. There are plenty of handy libraries for converting all sorts of markdown specifications to HTML. In this tutorial, we’ll use the remark library from the unified collective, since it’s what powers the Gatsby plugin. It’s also the easiest way to extract the YAML frontmatter found in these markdown files. Unified returns a VFile with the content and various metadata.
```js
const unified = require('unified')
const frontmatter = require('remark-frontmatter')
const extract = require('remark-extract-frontmatter')
const markdown = require('remark-parse')
const html = require('remark-html')
const yaml = require('yaml').parse

async function convertMDtoVFile (markdownContent) {
  const HTML = await unified()
    .use(markdown)
    .use(frontmatter)
    .use(extract, { name: 'frontmatter', yaml: yaml })
    .use(html)
    .process(markdownContent)
  return HTML
}

module.exports = convertMDtoVFile
```
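The resolved VFile exposes the compiled HTML on contents and, thanks to remark-extract-frontmatter, the parsed frontmatter on data.frontmatter. Roughly this shape (a sketch with illustrative, abbreviated values, not exact output):

```js
// Approximate shape of the VFile for one of the posts — real VFiles
// carry more metadata than shown here
const exampleVFile = {
  contents: '<p>…</p>\n<pre><code>…</code></pre>',
  data: {
    frontmatter: {
      title: 'Why Do We Write super(props)?',
      date: '2018-11-30',
      spoiler: '…'
    }
  }
}
```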
Convert HTML to Portable Text
Now for the exciting part! Portable Text is a specification for how to structure rich text in JSON. Compared to HTML, it’s easier to serialize Portable Text into whatever markup or text format you need. Right now, though, we have to go in the opposite direction and deserialize the markdown-generated HTML.
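To make the target format concrete, here is a rough sketch of a Portable Text array with a regular text block and the custom code block we’ll produce below (real output also includes generated _key values):

```js
// A sketch of Portable Text: rich text as an array of typed blocks
const portableText = [
  {
    _type: 'block',
    style: 'normal',
    markDefs: [],
    children: [{ _type: 'span', marks: [], text: 'A paragraph of text.' }]
  },
  {
    // the custom type we'll deserialize <pre><code> into below
    _type: 'code',
    text: 'console.log("hello")'
  }
]
```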
We’re doing the deserialization with the block-tools package that comes with Sanity. It also lets us add custom rules for how to deal with specific HTML elements and patterns. Furthermore, this package allows us to transform parts of the HTML into custom content types.
What’s cool about this deserialization is that it makes it possible to use specialized input components and previews in the Sanity editor. It also makes the content queryable: when we deserialize all the code blocks in the HTML (<pre><code> some code </code></pre>) into a code type, we can easily extract all these examples with GROQ afterward, or render them in custom ways in the frontend.
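As a taste of that queryability, once the documents are imported, a GROQ query along these lines (a sketch, using the post and code types we set up in this tutorial) would pull the raw text of every code example out of every post:

```js
const sanityClient = require('@sanity/client')

const client = sanityClient({
  projectId: '<yourProjectID>', // placeholders, as elsewhere in this post
  dataset: '<dataset>',
  useCdn: false
})

// Filter each post's body down to its code blocks and project their text
client
  .fetch(`*[_type == "post"]{ title, "codeExamples": body[_type == "code"].text }`)
  .then(posts => console.log(posts))
```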
I have commented the code so that it’s easier to follow.
```js
const blockTools = require('@sanity/block-tools').default
const jsdom = require('jsdom')
const { JSDOM } = jsdom

/**
 * block-tools needs a schema definition to know what
 * types are available
 */
const defaultSchema = require('./defaultSchema')
const blockContentType = defaultSchema
  .get('blogPost')
  .fields.find(field => field.name === 'body').type

function convertHTMLtoPortableText (HTMLDoc) {
  const rules = [
    {
      // Special case for code blocks (wrapped in pre and code tags)
      deserialize (el, next, block) {
        if (el.tagName.toLowerCase() !== 'pre') {
          return undefined
        }
        const code = el.children[0]
        const childNodes =
          code && code.tagName.toLowerCase() === 'code'
            ? code.childNodes
            : el.childNodes
        let text = ''
        childNodes.forEach(node => {
          text += node.textContent
        })
        /**
         * use `block()` to add it to the
         * root array, instead of as
         * children of a block
         */
        return block({
          _type: 'code',
          text: text
        })
      }
    }
  ]
  /**
   * Since we're in a node context, we need
   * to give block-tools JSDOM in order to
   * parse the HTML DOM elements
   */
  return blockTools.htmlToBlocks(HTMLDoc, blockContentType, {
    rules,
    parseHtml: html => new JSDOM(html).window.document
  })
}

module.exports = convertHTMLtoPortableText
```
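The defaultSchema module required above isn’t shown in this post. A minimal sketch of what it could look like, compiled with the @sanity/schema package (the inline code type here is an assumption, added to match the rule above):

```js
// A minimal sketch of ./defaultSchema — just enough for block-tools
// to know that a blogPost body is an array of blocks and code objects
const Schema = require('@sanity/schema').default

module.exports = Schema.compile({
  name: 'default',
  types: [
    {
      // hypothetical inline definition of the custom code type
      type: 'object',
      name: 'code',
      fields: [{ name: 'text', title: 'Text', type: 'text' }]
    },
    {
      type: 'object',
      name: 'blogPost',
      fields: [
        { name: 'title', title: 'Title', type: 'string' },
        {
          name: 'body',
          title: 'Body',
          type: 'array',
          of: [{ type: 'block' }, { type: 'code' }]
        }
      ]
    }
  ]
})
```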
Prepare the import document
Dan’s content model is fairly light; it has a title, a date, a “spoiler,” and the text content, which as of now is just text and code examples. The keen-eyed will observe that there are two date fields: _createdAt is the internal creation date, while publishedAt is used to schedule when the post should appear on the blog.
```js
const convertHTMLtoPortableText = require('./convertHTMLtoPortableText')

function convertToSanityDocument ({ data = {}, contents }) {
  const { title, date, spoiler } = data.frontmatter || {}
  const portableText = convertHTMLtoPortableText(contents)
  const doc = {
    _type: 'post',
    _createdAt: new Date(date).toUTCString(),
    publishedAt: new Date(date).toUTCString(),
    title,
    spoiler,
    body: portableText
  }
  return doc
}

module.exports = convertToSanityDocument
```
Generate files
Now that we have the prepared Sanity documents, we have two ways of actually getting the content into the Sanity backend. We can make a transaction of mutations like this:
```js
const sanityClient = require('@sanity/client')

const client = sanityClient({
  projectId: '<yourProjectID>',
  dataset: '<dataset>',
  // mutations require a token with write access
  token: '<token>',
  useCdn: false
})

async function commitDocuments (sanityDocuments) {
  // build one transaction with a createOrReplace mutation per document
  const res = await sanityDocuments
    .reduce((trans, doc) => trans.createOrReplace(doc), client.transaction())
    .commit()
    .catch(err => {
      console.error(err)
      throw err
    })
  return res
}
```
Another way is to generate an ndjson file (newline-delimited JSON) that can be imported with the Sanity CLI command sanity dataset import blog.ndjson. This is a bit easier, especially when you also have to deal with image and file assets.
```js
/* eslint-disable no-console */
const fs = require('fs')

function writeToFile ({ filename, sanityDocuments, outputPath }) {
  const path = `${outputPath}/${filename.split('.ndjson')[0]}`
  // one JSON document per line
  const preparedDocument = sanityDocuments.reduce(
    (acc, doc) => `${acc}${JSON.stringify(doc)}\n`,
    ''
  )
  return fs.writeFile(`${path}.ndjson`, preparedDocument, err => {
    if (err) {
      throw new Error(err)
    }
    console.log(`Wrote ${sanityDocuments.length} documents to ${path}.ndjson`)
  })
}

module.exports = writeToFile
```
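Each line in the resulting file is one complete, self-contained JSON document, roughly like this (fields truncated for readability):

```
{"_type":"post","_createdAt":"Fri, 30 Nov 2018 00:00:00 GMT","publishedAt":"…","title":"Why Do We Write super(props)?","spoiler":"…","body":[…]}
{"_type":"post","_createdAt":"…","publishedAt":"…","title":"How Does React Tell a Class from a Function?","spoiler":"…","body":[…]}
```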
Wrapping it up
Now that we have all the parts, we can wrap them up in a neat function. Here we’re using async/await to prevent the infamous promise pyramid of death, but we could also have used streams or observables.
```js
const globMDFiles = require('./globMDFiles')
const extractMDfromFile = require('./extractMDfromFile')
const convertMDtoVFile = require('./convertMDtoVFile')
const convertToSanityDocument = require('./convertToSanityDocument')

async function migrateFiles (inputPath, filename, outputPath) {
  const files = await globMDFiles(inputPath)
  const mdDocuments = await Promise.all(files.map(extractMDfromFile))
  const VFiles = await Promise.all(mdDocuments.map(convertMDtoVFile))
  const sanityDocuments = await Promise.all(VFiles.map(convertToSanityDocument))
  return sanityDocuments
}

module.exports = migrateFiles
```
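Note that migrateFiles only returns the documents; a hypothetical entry point that wires it to the writeToFile function from earlier could look like this:

```js
const migrateFiles = require('./migrateFiles')
const writeToFile = require('./writeToFile')

// hypothetical wiring: convert everything in src/pages and emit blog.ndjson
migrateFiles('./src/pages', 'blog.ndjson', './output')
  .then(sanityDocuments =>
    writeToFile({ filename: 'blog.ndjson', sanityDocuments, outputPath: './output' })
  )
  .catch(err => console.error(err))
```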
For convenience, I wrapped the tool up in a CLI using Inquirer, but you can also import it and use it in your project. It should be noted, though, that the deserialization part isn’t bulletproof: I haven’t accounted for images or other things that might appear in a markdown file.
Next steps
You can check out this code and how to install the CLI on GitHub. In the next part, we’ll set up Sanity with the schemas we need to edit the blog, and deploy the GraphQL API we need to easily get the content into Gatsby.
Check out part 2 (not ready yet)