
Importing markdown content into Sanity

How to migrate remark-markdown from a Gatsby blog to Sanity.io’s content backend.

Published

  • Knut Melvær

    Head of Developer Community and Education

This is part 1 of the “Remaking overreacted.io with Gatsby, GraphQL and Sanity” series.

  • Part 2: Setting up Sanity as a blog backend with a GraphQL API
  • Part 3: Setting up Gatsby with your Sanity project’s GraphQL API

We’re big fans of Dan Abramov over at Sanity HQ. Now that he has launched his new Gatsby-powered blog, overreacted.io, we thought it would be a fun exercise to recreate it, using Sanity as the backend and Portable Text as the way to deal with the rich text. In this tutorial, you’ll learn how to import content from markdown, set up Gatsby, set up Sanity with a GraphQL API, add an RSS feed, and get it all on the internet using Netlify or Zeit.

Although you can use markdown in Sanity, transforming it into Portable Text makes it easier to take your content out in whatever format you need. Additionally, it makes the text queryable, something we’ll get back to later.

Overreacted uses the remark plugin for Gatsby to transform markdown into nodes in Gatsby’s GraphQL API. Hence, the folder structure in /src/pages/ looks like this:

❯ tree
.
├── 404.js
├── how-does-react-tell-a-class-from-a-function.md
├── index.js
├── why-do-react-elements-have-typeof-property.md
└── why-do-we-write-super-props.md

Let’s make a script that takes all the markdown files and transforms them into a format that we can import into Sanity’s content backend.

Glob all the markdown files

Although there are just three blog posts at the moment, it’s nice to script something that migrates all of them for us.

(Comic: xkcd #1205, “Is It Worth the Time?” — https://xkcd.com/1205/)

We begin by writing a function that takes a path to a directory and returns a list of all the files’ absolute paths. Although we could have done this with Node’s native fs library, Sindre Sorhus’ globby package is easier to deal with (and returns a promise, so that we can use async/await):

const globby = require('globby')

async function globMDFiles (path) {
  const options = {
    cwd: path
  }
  const files = await globby(`**/*.md`, options)
  // return an array of absolute paths
  return files.map(file => `${path}/${file}`)
}

module.exports = globMDFiles
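
Assuming the blog’s repo is checked out locally, usage would look something like this (the path here is just an example):

const globMDFiles = require('./globMDFiles')

globMDFiles('/Users/you/overreacted.io/src/pages').then(files => {
  console.log(files)
  /*
    [
      '/Users/you/overreacted.io/src/pages/how-does-react-tell-a-class-from-a-function.md',
      '/Users/you/overreacted.io/src/pages/why-do-react-elements-have-typeof-property.md',
      '/Users/you/overreacted.io/src/pages/why-do-we-write-super-props.md'
    ]
  */
})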

Get the file contents

Once we have the file paths, we can use fs.readFile to extract the markdown content. The promise/async/await isn’t strictly necessary here, but it’s nice to have in place in case we need to do a bit more in this function later, and it means the function itself returns a promise:

const { readFile } = require('fs')

async function extractMDfromFile (filePath) {
  // wrap the callback-based readFile in a promise so we can await it
  const mdContent = await new Promise((resolve, reject) =>
    readFile(filePath, 'utf-8', (err, data) => {
      if (err) {
        return reject(err)
      }
      return resolve(data)
    })
  )
  return mdContent
}

module.exports = extractMDfromFile
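
That said, on newer versions of Node.js the same thing can be done with the built-in fs.promises API; a minimal sketch:

const { promises: fs } = require('fs')

// same behavior as above, without the manual promise wrapper
function extractMDfromFile (filePath) {
  return fs.readFile(filePath, 'utf-8')
}

module.exports = extractMDfromFile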

Convert Markdown to HTML

To convert Markdown to Portable Text, we have to go via HTML. There are plenty of handy libraries for converting all sorts of markdown flavors to HTML. In this tutorial, we’ll use the remark library from the unified collective, since it’s what powers the Gatsby source plugin. It’s also the easiest way to extract the YAML frontmatter found in these markdown files. Unified returns a VFile with the content and various metadata.

const unified = require('unified')
const frontmatter = require('remark-frontmatter')
const extract = require('remark-extract-frontmatter')
const markdown = require('remark-parse')
const html = require('remark-html')
const yaml = require('yaml').parse

async function convertMDtoVFile (markdownContent) {
  const HTML = await unified()
    .use(markdown) // parse the markdown syntax
    .use(frontmatter) // identify the YAML frontmatter block
    .use(extract, { name: 'frontmatter', yaml: yaml }) // parse it onto vfile.data.frontmatter
    .use(html) // compile the rest to HTML
    .process(markdownContent)
  return HTML
}

module.exports = convertMDtoVFile
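
To make the next steps easier to follow, here’s roughly the shape of the VFile that comes back for one of Dan’s posts (shortened, and the values are illustrative):

// roughly what convertMDtoVFile resolves to
{
  data: {
    frontmatter: {
      title: 'Why Do We Write super(props)?',
      date: '2018-11-30',
      spoiler: 'There’s a twist at the end.'
    }
  },
  contents: '<p>…</p>\n<pre><code>…</code></pre>\n'
}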

Convert HTML to Portable Text

Now for the exciting part! Portable Text is a specification for how to structure rich text in JSON. Unlike HTML, it’s easy to serialize Portable Text into any markup or text format. Here, though, we have to go in the opposite direction and deserialize the HTML we generated from the markdown.

We’re doing the deserialization with the block-tools package that comes with Sanity. It also lets us add custom rules for how to deal with specific HTML elements and patterns. Furthermore, this package allows us to transform parts of the HTML into custom content types.

What’s cool about this deserialization is that it makes it possible to use specialized input components and previews in the Sanity editor. It also makes the content queryable: when we deserialize all the code blocks in the HTML (<pre><code> some code </code></pre>) into a code type, we can easily extract all these examples with GROQ afterward, or render them in custom ways on the frontend.
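
For instance, once the content is imported, something along these lines would pull out every code example across all posts (assuming a configured @sanity/client instance like the one we create further down):

// all code examples across every post, via GROQ’s array filtering
client.fetch(`*[_type == "post"]{
  title,
  "codeExamples": body[_type == "code"].text
}`)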

I have commented the code so that it’s easier to follow.

const blockTools = require('@sanity/block-tools').default
const jsdom = require('jsdom')
const { JSDOM } = jsdom

/**
 * block-tools needs a schema definition to know
 * which types are available
 */
const defaultSchema = require('./defaultSchema')
const blockContentType = defaultSchema
  .get('blogPost')
  .fields.find(field => field.name === 'body').type

function convertHTMLtoPortableText (HTMLDoc) {
  const rules = [
    {
      // Special case for code blocks (wrapped in pre and code tag)
      deserialize (el, next, block) {
        if (el.tagName.toLowerCase() !== 'pre') {
          return undefined
        }
        const code = el.children[0]
        const childNodes =
          code && code.tagName.toLowerCase() === 'code'
            ? code.childNodes
            : el.childNodes
        let text = ''
        childNodes.forEach(node => {
          text += node.textContent
        })
        /**
         * use `block()` to add it to the
         * root array, instead of as
         * children of a block
         */
        return block({
          _type: 'code',
          text: text
        })
      }
    }
  ]
  /**
   * Since we're in a node context, we need
   * to give block-tools JSDOM in order to
   * parse the HTML DOM elements
   */
  return blockTools.htmlToBlocks(HTMLDoc, blockContentType, {
    rules,
    parseHtml: html => new JSDOM(html).window.document
  })
}

module.exports = convertHTMLtoPortableText
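
The defaultSchema module required above isn’t shown here; a minimal sketch of what it could look like, built with @sanity/schema (the blogPost and body names have to match the lookup at the top of the file):

// defaultSchema.js — a minimal compiled schema for block-tools
// to resolve types against
const Schema = require('@sanity/schema').default

module.exports = Schema.compile({
  name: 'default',
  types: [
    {
      type: 'object',
      name: 'blogPost',
      fields: [
        { title: 'Title', name: 'title', type: 'string' },
        { title: 'Body', name: 'body', type: 'array', of: [{ type: 'block' }] }
      ]
    }
  ]
})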

Prepare the import document

Dan’s content model is fairly light: it has a title, a date, a “spoiler,” and the text content, which as of now is just text and code examples. The keen-eyed observer will notice that there are two date fields: _createdAt is the internal creation date, while publishedAt is used to schedule when the post should appear on the blog.

const convertHTMLtoPortableText = require('./convertHTMLtoPortableText')

function convertToSanityDocument({data = {}, contents}) {
  const { title, date, spoiler } = data.frontmatter || {}
  const portableText = convertHTMLtoPortableText(contents)

  const doc = {
    _type: 'post',
    _createdAt: new Date(date).toISOString(), // Sanity expects ISO 8601 dates
    publishedAt: new Date(date).toISOString(),
    title,
    spoiler,
    body: portableText
  }
  return doc
}

module.exports = convertToSanityDocument
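
For one of the posts, the resulting import document ends up looking roughly like this (shortened; _key properties and block details omitted):

{
  _type: 'post',
  _createdAt: '2018-11-30T00:00:00.000Z',
  publishedAt: '2018-11-30T00:00:00.000Z',
  title: 'Why Do We Write super(props)?',
  spoiler: 'There’s a twist at the end.',
  body: [
    { _type: 'block', style: 'normal', children: [{ _type: 'span', text: '…' }] },
    { _type: 'code', text: '…' }
  ]
}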

Generate files

Now that we have the prepared Sanity documents, we have two ways of actually getting the content into the Sanity backend. We can make a transaction of mutations like this:

const sanityClient = require('@sanity/client')
const client = sanityClient({
  projectId: '<yourProjectID>',
  dataset: '<dataset>',
  token: '<tokenWithWriteAccess>' // mutations require an authenticated client
})

async function commitDocuments (sanityDocuments) {
  // add every document to one transaction and commit it as a single mutation
  return sanityDocuments
    .reduce((trans, doc) => trans.createOrReplace(doc), client.transaction())
    .commit()
    .catch(err => console.error('Import failed:', err))
}

Another way is to generate an ndjson file (newline-delimited JSON) that can be imported with the Sanity CLI command sanity dataset import blog.ndjson. This is a bit easier, especially when you also have to deal with image and file assets.

/* eslint-disable no-console */
const fs = require('fs')

function writeToFile ({ filename, sanityDocuments, outputPath }) {
  const path = `${outputPath}/${filename.split('.ndjson')[0]}`

  // serialize each document as one line of JSON
  const preparedDocument = sanityDocuments.reduce(
    (acc, doc) => `${acc}${JSON.stringify(doc)}\n`,
    ''
  )

  return fs.writeFile(`${path}.ndjson`, preparedDocument, err => {
    if (err) {
      throw new Error(err)
    }
    console.log(
      `Wrote ${sanityDocuments.length} documents to ${filename}.ndjson`
    )
  })
}

module.exports = writeToFile

Wrapping it up

Now that we have all the parts, we can wrap them up in a neat function. Here we’re using async/await to prevent the infamous promise pyramid of death, but we could also have used streams or observables.

const globMDFiles = require('./globMDFiles')
const extractMDfromFile = require('./extractMDfromFile')
const convertMDtoVFile = require('./convertMDtoVFile')
const convertToSanityDocument = require('./convertToSanityDocument')
const writeToFile = require('./writeToFile')

async function migrateFiles (inputPath, filename, outputPath) {
  const files = await globMDFiles(inputPath)
  const mdDocuments = await Promise.all(files.map(extractMDfromFile))
  const VFiles = await Promise.all(mdDocuments.map(convertMDtoVFile))
  // convertToSanityDocument is synchronous, so a plain map will do
  const sanityDocuments = VFiles.map(convertToSanityDocument)
  writeToFile({ filename, sanityDocuments, outputPath })
  return sanityDocuments
}

module.exports = migrateFiles

For convenience, I wrapped the tool up in a CLI using Inquirer, but you can also import it and use it in your project. It should be noted, though, that the deserialization part isn’t bulletproof; I haven’t accounted for images or other things that might appear in a markdown file.
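
As a rough illustration of the CLI wrapper (the prompts here are hypothetical, not necessarily those of the published tool):

const inquirer = require('inquirer')
const migrateFiles = require('./migrateFiles')

inquirer
  .prompt([
    { name: 'inputPath', message: 'Path to the markdown files?' },
    { name: 'filename', message: 'Output filename?', default: 'blog.ndjson' },
    { name: 'outputPath', message: 'Output folder?', default: process.cwd() }
  ])
  .then(({ inputPath, filename, outputPath }) =>
    migrateFiles(inputPath, filename, outputPath)
  )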

(Video: the migration and the import tool in action.)

Next steps

You can check out this code and how to install the CLI on GitHub. In the next part we will set up Sanity with the schemas we need to edit the blog, and deploy the GraphQL API we need to easily get the content into Gatsby.

Check out part 2: Setting up Sanity as a blog backend with a GraphQL API.