Give it in plain text: Making your content AI-Ready

A technical deep-dive into implementing AI-friendly content. Learn from how we made our developer documentation accessible to both humans and AI agents.

Published

  • Knut Melvær, Head of Developer Community and Education

I find myself pair-programming with LLMs more often, especially when I want to quickly bring an idea to life or add a minor feature to a codebase. AI-powered coding really shines when it comes to exploring ideas and quickly getting something running.

The other day, I watched an AI bootstrap a new Astro + Sanity blog in about a minute (yes, I timed it - professional curiosity and all). It was impressive, but like many quick solutions, it wasn't quite what we'd recommend in our developer education materials. It missed our official Astro integration, skipped proper TypeGen setup, and the content model was, well, let's say it needed some of our hard-earned structured content wisdom.

This got me thinking about a bigger question: how do we ensure AI tools can access and understand our educational content the same way developers do? The answer turned out to be deceptively simple, but getting there? That's the story I want to share.

When "most likely" isn't what you want

Here's the thing about LLMs: they're like that friend who's read every programming blog post ever written but hasn't actually worked on your specific project. They'll give you the most likely patterns based on what they've seen across GitHub repositories and blog posts. And while that's often good enough, it's not always what we'd call "the Sanity way."

To be completely honest, we haven't been able to share everything we've learned from helping customers and figuring out these patterns ourselves over the years. This can leave developers in a bit of a pickle if they rely too heavily on LLM-generated code without guidance.

This isn't just our problem - it's a challenge for anyone building developer tools in our AI-enhanced world:

  • The models learned from what's out there, including that Stack Overflow answer from 2019 that everyone keeps copying
  • The generated code works, but might not follow the best practices we've discovered since then
  • And unless you explicitly tell them, these models won't know about that cool new feature you just shipped last week

This is why AI-powered code editors like Cursor have features for quickly adding a documentation site to their context.

Enter the "Agent Experience"

More broadly, this is also why chat interfaces on top of LLMs, like ChatGPT, increasingly go out on the web to bring more context into their prompts and return better, more accurate information.

And this is where the concept of "Agent Experience" comes in handy. It was coined by Mathias Biilmann, founder/CEO of Netlify, in the blog post “Introducing AX: Why Agent Experience Matters”:

Is it simple for an Agent to get access to operating a platform on behalf of a user? Are there clean, well described APIs that agents can operate? Are there machine-ready documentation and context for LLMs and agents to properly use the available platform and SDKs? Addressing the distinct needs of agents through better AX, will improve their usefulness for the benefit of the human user.

I had been wrestling with this exact challenge a week before Matt's post. I wanted a straightforward way to feed all our learning platform content into Claude (and between you and me, I'm not entirely sure how good ChatGPT's web search is at getting all the content either).

A recent Vercel analysis of AI crawlers showed they're still finding their feet - they don't render JavaScript, are picky about content types, and tend to stumble around your site like a tourist without a map. We needed something better.

So, how do I go about this? The answer might not surprise you.

llms.txt: Like devs, agents love plain text too

As I'm writing this, there's an ongoing conversation about how best to accommodate agents visiting your site, provided that you want to make your content accessible to them. A consensus seems to be forming around serving them content as plain text and Markdown.

In our opinion, Markdown is not a great format for storing content (you can read my 6000 words about why here, or just read this short summary), but it turns out to be a great format for interfacing with LLMs (which have been trained on a lot of Markdown syntax).

The jury is still out on conventions for making this plain text accessible, but one pattern seems to be catching on: /llms.txt, proposed by the folks at Answer.ai, though there are also discussions about using the /.well-known/llms.txt IANA proposal. The documentation platform Mintlify has launched /llms.txt as a feature, as have Anthropic, Svelte, and Vercel's AI SDK for their documentation.

They generally seem to use this pattern for exposing content as plain text:

  • /llms.txt is an abbreviated index of all the content with links
  • /llms-small.txt is the abbreviated content for smaller context windows
  • /llms-full.txt is the complete content (sometimes optimized to fit within the context window limits)
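To make the pattern concrete, here is a sketch of what an /llms.txt index following the Answer.ai proposal could look like; the course titles, descriptions, and URLs below are illustrative, not the actual contents of our file:

```markdown
# Sanity Learn

> Free courses on structured content, Sanity Studio, and querying with GROQ.

## Courses

- [Content modeling 101](https://www.sanity.io/learn/course/content-modeling-101): Learn to model structured content
- [Day one with Sanity Studio](https://www.sanity.io/learn/course/day-one-with-sanity-studio): Set up and customize the Studio
```

The proposal's shape is a top-level H1 title, a short blockquote summary, and H2 sections containing link lists with one-line descriptions.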

Let's look at how we brought this pattern to Sanity Learn.

Building a plain text route for Sanity Learn

Our learning platform is built with Remix and Sanity (naturally), with lesson content stored in Portable Text fields. We have custom blocks and marks for the different learning affordances (tasks, code blocks, callouts, etc.). Portable Text stores this block content as JSON, which makes it queryable and provides neat integration with front-end frameworks, because it lets you serialize content directly as props to your components.

[Screenshot: the Sanity Studio lesson editor displaying a tutorial on installing a new React Router 7 (Remix) application. The editor interface shows formatting options, a code block with terminal commands, and a dropdown menu for adding references or images.]
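For context, here is a simplified, hand-written sketch of the JSON that Portable Text stores: an array of typed blocks. The field values are illustrative, and real documents carry more metadata (`_key`s on every node, for instance):

```javascript
// A simplified Portable Text array: one regular text block containing a
// link mark, followed by a custom code block. Every node carries a _type,
// which is what queries and serializers branch on.
const lessonContent = [
  {
    _type: 'block',
    style: 'normal',
    markDefs: [{_key: 'a1', _type: 'link', href: 'https://www.sanity.io'}],
    children: [
      {_type: 'span', text: 'Install the CLI from ', marks: []},
      {_type: 'span', text: 'sanity.io', marks: ['a1']},
    ],
  },
  {_type: 'code', language: 'sh', code: 'npm create sanity@latest'},
]
```

Because every block and mark is typed, serialization is a matter of mapping each `_type` to a rendering function.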

Here is an example of what Portable Text serialization to React looks like:

const myPortableTextComponents = {
  types: {
    image: ({value}) => <img src={value.imageUrl} />,
    callToAction: ({value, isInline}) =>
      isInline ? (
        <a href={value.url}>{value.text}</a>
      ) : (
        <div className="callToAction">{value.text}</div>
      ),
  },

  marks: {
    link: ({children, value}) => {
      const rel = !value.href.startsWith('/') ? 'noreferrer noopener' : undefined
      return (
        <a href={value.href} rel={rel}>
          {children}
        </a>
      )
    },
  },
}

const YourComponent = (props) => {
  return <PortableText value={props.value} components={myPortableTextComponents} />
}

Now, you might be thinking, "That's great for React, but what about our AI friends?"

GROQ lets you quickly query Portable Text arrays in your Sanity dataset as plain text. So, if we wanted to transform lesson content in our dataset to plain text, we could run (try it for yourself):

pt::text(*[_type == "lesson"].content)

The pt::text() function works in a pinch, but it's constrained: it only parses the text blocks, so it would miss any custom blocks (like code, images, etc.).
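A rough JavaScript sketch of the behavior described above makes the limitation visible; this is my own approximation of what pt::text does, not its actual implementation:

```javascript
// Approximates pt::text: joins the text spans of regular text blocks,
// separating blocks with double newlines. Custom blocks (code, images)
// have no children of spans and are skipped entirely.
function portableTextToPlainText(blocks) {
  return blocks
    .filter((block) => block._type === 'block' && Array.isArray(block.children))
    .map((block) => block.children.map((child) => child.text || '').join(''))
    .join('\n\n')
}
```

Running it over content containing a code block silently drops that block, which is exactly why we needed a richer serialization for our lessons.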

We had to take a more elaborate route for our use case to ensure that we also included all the code blocks and other types of content. A simplified implementation example follows to give you a sense of the steps involved.

Adding an llms.txt route to Remix

Let's break this down into manageable pieces.

Adding an llms.txt route to Remix is fairly simple using the file-based router: create llms[.]txt.ts in the routes folder, then scaffold it to return plain text:

// routes/llms[.]txt.ts
export const loader = async () => {
  return new Response('hello world', {
    headers: {
      'Content-Type': 'text/plain; charset=utf-8',
      'Content-Language': 'en-US',
    },
  })
}

Nothing fancy yet - just your basic "hello world" to ensure everything's wired up correctly. Now comes the interesting part: querying our content. Here's where GROQ shines:

import groq from 'groq'
import {client} from '~/sanity/client'

export const loader = async () => {
  const query = groq`*[_type == "course"] {
    title,
    description,
    "slug": slug.current,
    lessons[]->{
      title,
      description,
      "slug": slug.current,
      content
    }
  }`

  const courses = await client.fetch(query)

  return new Response('hello world', {
    headers: {
      'Content-Type': 'text/plain; charset=utf-8',
      'Content-Language': 'en-US',
    },
  })
}

Now we're getting somewhere! This query pulls in all our courses with their lessons. But raw data isn't very AI-friendly, so let's transform it into a nice markdown structure:

import groq from 'groq'
import {client} from '~/sanity/client'

export const loader = async () => {
  const query = groq`*[_type == "course"] {
    title,
    description,
    "slug": slug.current,
    lessons[]->{
      title,
      description,
      "slug": slug.current,
      content
    }
  }`

  const courses = await client.fetch(query)
  let markdown = ''

  for (const course of courses) {
    markdown += `# [${course.title}](/learn/${course.slug})\n\n${course.description || ''}\n\n`

    for (const lesson of course.lessons || []) {
      markdown += `## [${lesson.title}](/learn/${course.slug}/${lesson.slug})\n\n${lesson.description || ''}\n\n`
    }
  }

  return new Response(markdown, {
    headers: {'Content-Type': 'text/plain; charset=utf-8'}
  })
}
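To show the shape of the response body, here is the same markdown-building loop run over a tiny in-memory dataset; the course and lesson below are illustrative, not real Sanity Learn content:

```javascript
// The markdown-building loop from the route, applied to sample data.
const courses = [
  {
    title: 'Content modeling 101', // illustrative title
    description: 'Learn to model content.',
    slug: 'content-modeling-101',
    lessons: [
      {title: 'What is a schema?', description: 'Schemas explained.', slug: 'what-is-a-schema'},
    ],
  },
]

let markdown = ''
for (const course of courses) {
  markdown += `# [${course.title}](/learn/${course.slug})\n\n${course.description || ''}\n\n`
  for (const lesson of course.lessons || []) {
    markdown += `## [${lesson.title}](/learn/${course.slug}/${lesson.slug})\n\n${lesson.description || ''}\n\n`
  }
}
```

Each course becomes a linked H1 and each lesson a linked H2, so an agent gets both the structure of the catalog and the URLs to follow.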

Taking it further: the full Monty

The above works great for a basic index, but we wanted to go deeper with /llms-full.txt. This is where things get spicy. We'll need to handle custom blocks, code samples, document references, images – the works.

Copy the llms[.]txt.ts file to llms-full[.]txt.ts and install the @sanity/block-content-to-markdown library. Now we can start parsing the Portable Text content and tackling custom components.

Below is an example of serializing a custom code block and shifting the headings one level down since we're including data about the parent courses. We're also dealing with links and internal references.

// /routes/llms-full[.]txt.ts
import blocksToMarkdown from '@sanity/block-content-to-markdown'
import groq from 'groq'
import {client} from '~/sanity/client'
import {urlFor} from '~/sanity/helpers'

const BASE_URL = 'https://www.sanity.io/learn'

function normalizeUrl(href) {
  if (!href) return ''
  if (href.startsWith('http')) return href
  return `${BASE_URL}${href.startsWith('/') ? '' : '/'}${href}`
}

export const serializers = {
  types: {
    block: (props) => {
      const text = Array.isArray(props.children) ? props.children.join('') : props.children

      switch (props.node.style) {
        case 'h1':
          return `## ${text}\n\n`
        case 'h2':
          return `### ${text}\n\n`
        case 'h3':
          return `#### ${text}\n\n`
        case 'h4':
          return `##### ${text}\n\n`
        case 'bullet':
          return `- ${text}\n`
        case 'number':
          return `1. ${text}\n`
        case 'lead':
          return `> ${text}\n\n`
        default:
          return `${text}\n\n`
      }
    },
    code: (props) => {
      const language = props.node.language || 'text'
      const filePath = props.node.filename || ''
      const header = filePath ? `${language}:${filePath}` : language
      const code = props.node.code

      return '```' + header + '\n' + code + '\n```'
    },
    image: (props) => {
      const href = urlFor(props.node).url()
      const alt = props.node.alt || props.node.asset?.altText || 'Missing alt text'
      return `![${alt}](${href})\n`
    },
  },
  marks: {
    link: (props) => {
      const url = normalizeUrl(props.mark?.href || '')
      return `[${props.children}](${url})`
    },
    internalLink: (props) => {
      const href = props.mark?.slug?.current
      return href ? `[${props.children}](${normalizeUrl(href)})` : props.children
    },
  },
}

export const loader = async () => {
  const query = groq`*[_type == "course"] {
    title,
    description,
    "slug": slug.current,
    lessons[]->{
      title,
      description,
      "slug": slug.current,
      content[]{
        ...,
        markDefs[]{
          ...,
          _type == "internalLink" => @->{
            "slug": slug.current,
          }
        }
      }
    }
  }`

  const courses = await client.fetch(query)
  let markdown = ''

  for (const course of courses) {
    markdown += `# [${course.title}](${normalizeUrl(course.slug)})\n\n${course.description || ''}\n\n`

    for (const lesson of course.lessons || []) {
      markdown += `### [${lesson.title}](${normalizeUrl(course.slug)}/${lesson.slug})\n\n${lesson.description || ''}\n\n`
      if (lesson.content) {
        markdown += `${blocksToMarkdown(lesson.content, {serializers})}\n\n`
      }
    }
  }

  return new Response(markdown, {
    headers: {'Content-Type': 'text/plain; charset=utf-8'}
  })
}

The final touch

We're planning to add more specialized routes in the future – maybe /llms-small.txt for agents with smaller context windows, or course-specific routes for more focused interactions. We will also bring this over to the official Sanity documentation.

Since we built this on top of Sanity's structured content, adding new formats is just a matter of creating new serializers. No content duplication required!
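For instance, supporting a hypothetical `callout` custom block (the block name and its fields are assumptions for illustration) would just mean writing one more function that returns Markdown:

```javascript
// Hypothetical serializer for a custom "callout" block, rendering it as a
// Markdown blockquote with a bolded tone label. The block shape (tone,
// text) is assumed for illustration.
const calloutSerializer = (props) => {
  const tone = props.node.tone || 'note'
  const text = props.node.text || ''
  return `> **${tone.toUpperCase()}:** ${text}\n\n`
}
```

It would then be registered under `types` next to `code` and `image` in the serializers object passed to blocksToMarkdown.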

Want to see it in action? Head over to sanity.io/learn/llms.txt (and /llms-full.txt) and bring the content into your favorite AI assistant. Hopefully, it will output better code. You can probably also ask it to adapt framework-specific course content to other frameworks with better luck.

Making developer education AI-ready (and human-friendly, too!)

As we've seen, making our Learn platform more "agent-friendly" wasn't just about dumping content into a text file and pushing it to git – it required some consideration of content structure and proper markdown serialization. By implementing these /llms.txt routes in Remix, we're not only making our content more accessible to AI agents but also future-proofing our platform for the evolving landscape of developer education.

The best part? Since we're building on top of Sanity's structured content approach, adding new serialization formats or adapting to emerging AI content standards becomes a matter of extending our existing patterns rather than rebuilding from scratch. And since it's generated on demand from the same source, there's no copy-pasting of content, either!

As the relationship between developer tools and AI continues to evolve, we're curious to see how this enhanced agent experience will help developers learn and build with Sanity more effectively. Let us know if you find it helpful!

After all, great educational content should work whether you're a human or an AI—and now ours does for both.