
Give it in plain text: Making your content AI-Ready

Optimizing for humans AND machines: How we made Sanity Learn bilingual with /llms.txt. Beyond data models, structured content now powers agent experiences.

  • Knut Melvær

    Head of Developer Community and Education

I find myself pair-programming with LLMs more often, especially when I want to quickly bring an idea to life or add a minor feature to a code base. AI-powered coding really shines when it comes to exploring ideas and quickly getting something runnable off the ground.

The other day, I watched an AI bootstrap a new Astro + Sanity blog in about a minute (yes, I timed it - professional curiosity and all). It was impressive, but like many quick solutions, it wasn't quite what we'd recommend in our developer education materials. It missed our official Astro integration, skipped proper TypeGen setup, and the content model was, well, let's say it needed some of our hard-earned structured content wisdom.

This got me thinking about a bigger question: how do we ensure AI tools can access and understand our educational content the same way developers do? The answer turned out to be deceptively simple, but getting there? That's the story I want to share.

When "the most likely" isn't what you want

Here's the thing about LLMs: they're like that friend who's read every programming blog post ever written but hasn't actually worked on your specific project. They'll give you the most likely patterns based on what they've seen across GitHub repositories and blog posts. And while that's often good enough, it's not always what we'd call "the Sanity way."

To be completely honest, we haven't been able to share everything we've learned from helping customers and figuring out these patterns ourselves over the years. This can leave developers in a bit of a pickle if they rely too heavily on LLM-generated code without guidance.

This isn't just our problem - it's a challenge for anyone building developer tools in our AI-enhanced world:

  • The models learned from what's out there, including that Stack Overflow answer from 2019 that everyone keeps copying
  • The generated code works, but might not follow the best practices we've discovered since then
  • And unless you explicitly tell them, these models won't know about that cool new feature you just shipped last week

This is why AI-powered code editors like Cursor have features to quickly add a documentation site to their context.

Enter the "Agent Experience"

More broadly, this is also why LLM-powered products like ChatGPT increasingly go out on the web to bring more context into their prompts and produce better, more accurate output.

This is where the "Agent Experience" concept comes in handy. It was coined by Mathias Biilmann, Founder/CEO of Netlify, in the blog post “Introducing AX: Why Agent Experience Matters”:

Is it simple for an Agent to get access to operating a platform on behalf of a user? Are there clean, well described APIs that agents can operate? Are there machine-ready documentation and context for LLMs and agents to properly use the available platform and SDKs? Addressing the distinct needs of agents through better AX, will improve their usefulness for the benefit of the human user.

I had been wrestling with this exact challenge a week before Matt's post. I wanted a straightforward way to feed all our learning platform content into Claude (and between you and me, I'm not entirely sure how good ChatGPT's web search is at getting all the content either).

A recent Vercel analysis of AI crawlers showed they're still finding their feet - they don't render JavaScript, are picky about content types, and tend to stumble around your site like a tourist without a map. We needed something better.

So, how do I go about this? The answer might not surprise you.

llms.txt: Like devs, agents love plain text too

As I'm writing this, there is an ongoing conversation about how best to accommodate agents visiting your site, provided you want to make your content accessible to them. There seems to be a growing consensus around serving them content as plain text and Markdown.

In our opinion, Markdown is not a great format for storing content (you can read my 6000 words about why here, or just read this short summary), but it turns out to be a great format for interfacing with LLMs (which have been trained on a lot of Markdown syntax).

The jury is still out on the conventions for making this plain text accessible, but one pattern seems to be catching on: /llms.txt, proposed by the folks at Answer.ai, though there are also discussions about using the /.well-known/llms.txt IANA proposal. The documentation platform Mintlify has launched /llms.txt as a feature, as have Anthropic, Svelte, and Vercel's AI SDK for their documentation.

They generally seem to use this pattern for exposing content as plain text:

  • /llms.txt is an abbreviated index of all the content with links
  • /llms-small.txt is the abbreviated content for smaller context windows
  • /llms-full.txt is the complete content (sometimes optimized to fit within the context window limits)
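To make the pattern tangible, here is a sketch of what an /llms.txt index in this spirit might look like - the course titles and paths are illustrative, not the actual file:

```markdown
# Sanity Learn

> Free courses on structured content, Sanity Studio, and querying with GROQ.

## Courses

- [Day one with Sanity Studio](https://www.sanity.io/learn/course/day-one-with-sanity-studio): set up a Studio and define your first schema
- [Content modeling basics](https://www.sanity.io/learn/course/content-modeling-basics): structure content for reuse across channels
```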

Beyond Plain Text: The Structured Content Advantage

While converting content to plain text formats like /llms.txt provides a solid starting point for AI consumption, this approach has inherent limitations that echo the challenges developers face when working with unstructured content.

Plain text dumps have an undeniable simplicity - they're straightforward to implement and universal in compatibility. However, context is precious real estate. Using it inefficiently means slower queries and potentially less relevant responses. When developers paste thousands of tokens into a model's context window, that information needs to deliver significant value to justify its inclusion.

The plain text approach also fails to capture the rich relationships and metadata that make content truly valuable. LLMs can produce functional code that works, but may miss nuanced best practices like using named exports instead of default exports, implementing proper TypeScript definitions, or applying organization-specific patterns that have evolved through hard-won experience.

This is where Sanity's structured content approach offers significant advantages:

  1. Relationship-aware content: Unlike plain text, structured content understands that a blog post can have multiple authors, that images have alt text, and that references connect related pieces of content together.
  2. Queryable knowledge: With a structured approach, models could potentially query exactly what they need rather than processing the entire documentation corpus for every request.
  3. Contextual best practices: Structured content can encode not just what something is, but how it should be used according to established patterns and practices.
  4. Evolving knowledge representation: As your product and best practices evolve, structured content provides a framework for updating knowledge representations without rebuilding from scratch.
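To make the second point concrete: instead of ingesting the whole corpus, an agent with query access could ask for exactly the slice it needs. A hypothetical GROQ query (the lesson slug is made up) that pulls only the code blocks from a single lesson might look like:

```groq
// Only the code blocks from one lesson - nothing else enters the context window
*[_type == "lesson" && slug.current == "your-lesson-slug"][0]
  .content[_type == "code"]{language, code}
```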

The future likely isn't about choosing between plain text or structured content for AI consumption, but rather about creating intelligent interfaces that leverage structured content's richness while maintaining the accessibility of plain text formats. Just as we've developed sophisticated serializers that transform Portable Text into React components, we need similar approaches that can present our structured content to AI in ways that preserve its semantic meaning.

We'll likely see conventions for agent-specific context emerging soon - not just turning content into flat text, but creating rich, contextual metadata layers that help AI systems navigate and utilize content more effectively.

The /llms.txt approach is a practical starting point, but just as web development has evolved beyond static HTML to component-based architectures, AI content consumption will likely follow a similar trajectory toward more structured, semantic representations.

Now, enough with the think piece, let's look at how we brought this pattern to Sanity Learn.

Building a plain text route for Sanity Learn

Our learning platform is built with React Router 7 (formerly known as Remix) and Sanity (naturally), with lesson content stored in Portable Text fields. We have custom blocks and marks for the different learning affordances (tasks, code blocks, callouts, etc.). Portable Text stores this block content as JSON, which makes it queryable and provides neat integration in front-end frameworks because it lets you serialize content directly as props to your components.
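To make that concrete, here is roughly what a single paragraph looks like in the dataset - simplified, but the field names follow the open Portable Text specification:

```javascript
// A simplified Portable Text block; a `content` field is an array of objects like this
const block = {
  _type: 'block',
  style: 'normal',
  markDefs: [], // link annotations referenced by `marks` below
  children: [
    {_type: 'span', text: 'Install the ', marks: []},
    {_type: 'span', text: 'official Astro integration', marks: ['em']},
  ],
}

// Because it is plain JSON, concatenating the spans gives you the raw text
const text = block.children.map((span) => span.text).join('')
console.log(text) // "Install the official Astro integration"
```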

Screenshot of the Sanity Studio lesson editor displaying a tutorial on installing a new React Router 7 (Remix) application. The editor interface shows formatting options, a code block with terminal commands, and a dropdown menu for adding references or images.

Here is an example of what Portable Text serialization to React looks like:

import {PortableText} from '@portabletext/react'

const myPortableTextComponents = {
  types: {
    image: ({value}) => <img src={value.imageUrl} />,
    callToAction: ({value, isInline}) =>
      isInline ? (
        <a href={value.url}>{value.text}</a>
      ) : (
        <div className="callToAction">{value.text}</div>
      ),
  },

  marks: {
    link: ({children, value}) => {
      const rel = !value.href.startsWith('/') ? 'noreferrer noopener' : undefined
      return (
        <a href={value.href} rel={rel}>
          {children}
        </a>
      )
    },
  },
}

const YourComponent = (props) => {
  return <PortableText value={props.value} components={myPortableTextComponents} />
}

Now, you might be thinking, "That's great for React, but what about our AI friends?"

GROQ lets you quickly query Portable Text arrays in your Sanity dataset as plain text. So, if we wanted to transform lesson content in our dataset to plain text, we could run (try it for yourself):

pt::text(*[_type == "lesson"].content)

The pt::text() function works in a pinch, but it's constrained: it only parses the text blocks, so it misses any custom blocks (like code, images, etc.).
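To see why, here is a rough JavaScript approximation of the behavior - a sketch for illustration, not Sanity's actual implementation:

```javascript
// Keep only regular text blocks and join their spans; everything else is dropped
function portableTextToPlainText(blocks) {
  return blocks
    .filter((block) => block._type === 'block') // custom blocks (code, image, …) are skipped
    .map((block) => block.children.map((span) => span.text).join(''))
    .join('\n\n')
}

const lesson = [
  {_type: 'block', children: [{_type: 'span', text: 'Run the install command:'}]},
  {_type: 'code', language: 'sh', code: 'npm create sanity@latest'}, // lost in the output
]

console.log(portableTextToPlainText(lesson)) // "Run the install command:"
```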

We had to take a more elaborate route for our use case to ensure that we also included all the code blocks and other types of content. A simplified implementation example follows to give you a sense of the steps involved.

Adding an llms.txt route to React Router 7

Let's break this down into manageable pieces.

Adding an llms.txt route to React Router 7 is fairly simple using the file-based router: create llms[.]txt.ts in the routes folder, then scaffold it to return plain text:

// routes/llms[.]txt.ts
export const loader = async () => {
  return new Response("Hello AI, welcome to Sanity Learn!", {
    headers: {
      'Content-Type': 'text/plain; charset=utf-8',
    },
  })
}

Nothing fancy yet - just your basic "hello world" to ensure everything's wired up correctly. Now comes the interesting part: querying our content. Here's where GROQ shines:

import groq from 'groq'
import {client} from '~/sanity/client'

export const loader = async () => {
  const query = groq`*[_type == "course"] {
    title,
    description,
    "slug": slug.current,
    lessons[]->{
      title,
      description,
      "slug": slug.current,
      content
    }
  }`

  // We'll put `courses` to use in the next step
  const courses = await client.fetch(query)

  return new Response('Hello AI, welcome to Sanity Learn!', {
    headers: {
      'Content-Type': 'text/plain; charset=utf-8',
    },
  })
}

Now we're getting somewhere! This query pulls in all our courses with their lessons. But raw data isn't very AI-friendly, so let's transform it into a nice markdown structure:

import groq from 'groq'
import {client} from '~/sanity/client'

export const loader = async () => {
  const query = groq`*[_type == "course"] {
    title,
    description,
    "slug": slug.current,
    lessons[]->{
      title,
      description,
      "slug": slug.current,
      content
    }
  }`

  const courses = await client.fetch(query)
  let markdown = ''

  for (const course of courses) {
    markdown += `# [${course.title}](/learn/${course.slug})\n\n${course.description || ''}\n\n`

    for (const lesson of course.lessons || []) {
      markdown += `## [${lesson.title}](/learn/${course.slug}/${lesson.slug})\n\n${lesson.description || ''}\n\n`
    }
  }

  return new Response(markdown, {
    headers: {'Content-Type': 'text/plain; charset=utf-8'}
  })
}

Taking it further: The full monty

The above works great for a basic index, but we wanted to go deeper with /llms-full.txt. This is where things get spicy. We'll need to handle custom blocks, code samples, document references, images – the works.

Copy the llms[.]txt.ts file to llms-full[.]txt.ts and install the @sanity/block-content-to-markdown library. Now, we can start parsing the Portable Text content and tackle custom components.

Below is an example of serializing a custom code block and shifting the headings one level down since we're including data about the parent courses. We're also dealing with links and internal references.

// /routes/llms-full[.]txt.ts
import blocksToMarkdown from '@sanity/block-content-to-markdown'
import groq from 'groq'
import {client} from '~/sanity/client'
import {urlFor} from '~/sanity/helpers'

const BASE_URL = 'https://www.sanity.io/learn'

function normalizeUrl(href) {
  if (!href) return ''
  if (href.startsWith('http')) return href
  return `${BASE_URL}${href.startsWith('/') ? '' : '/'}${href}`
}

export const serializers = {
  types: {
    block: (props) => {
      const text = Array.isArray(props.children) ? props.children.join('') : props.children

      switch (props.node.style) {
        case 'h1':
          return `## ${text}\n\n`
        case 'h2':
          return `### ${text}\n\n`
        case 'h3':
          return `#### ${text}\n\n`
        case 'h4':
          return `##### ${text}\n\n`
        case 'bullet':
          return `- ${text}\n`
        case 'number':
          return `1. ${text}\n`
        case 'lead':
          return `> ${text}\n\n`
        default:
          return `${text}\n\n`
      }
    },
    code: (props) => {
      const language = props.node.language || 'text'
      const filePath = props.node.filename || ''
      const header = filePath ? `${language}:${filePath}` : language
      const code = props.node.code

      return '```' + header + '\n' + code + '\n```'
    },
    image: (props) => {
      const href = urlFor(props.node).url()
      const alt = props.node.alt || props.node.asset?.altText || 'Missing alt text'
      return `![${alt}](${href})\n`
    },
  },
  marks: {
    link: (props) => {
      const url = normalizeUrl(props.mark?.href || '')
      return `[${props.children}](${url})`
    },
    internalLink: (props) => {
      const href = props.mark?.slug?.current
      return href ? `[${props.children}](${normalizeUrl(href)})` : props.children
    },
  },
}

export const loader = async () => {
  const query = groq`*[_type == "course"] {
    title,
    description,
    "slug": slug.current,
    lessons[]->{
      title,
      description,
      "slug": slug.current,
      content[]{
        ...,
        markDefs[]{
          ...,
          _type == "internalLink" => @->{
            "slug": slug.current,
          }
        }
      }
    }
  }`

  const courses = await client.fetch(query)
  let markdown = ''

  for (const course of courses) {
    markdown += `# [${course.title}](${normalizeUrl(course.slug)})\n\n${course.description || ''}\n\n`

    for (const lesson of course.lessons || []) {
      markdown += `### [${lesson.title}](${normalizeUrl(course.slug)}/${lesson.slug})\n\n${lesson.description || ''}\n\n`
      if (lesson.content) {
        markdown += `${blocksToMarkdown(lesson.content, {serializers})}\n\n`
      }
    }
  }

  return new Response(markdown, {
    headers: {'Content-Type': 'text/plain; charset=utf-8'}
  })
}

The final touch

We're planning to add more specialized routes in the future – maybe /llms-small.txt for agents with smaller context windows, or course-specific routes for more focused interactions. We will also bring this over to the official Sanity documentation.

Since we built this on top of Sanity's structured content, adding new formats is just a matter of creating new serializers. No content duplication required!
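For instance, if we later added a (hypothetical) callout block type to the schema, supporting it in the plain text output would be one more serializer in the same shape as the ones above:

```javascript
// A serializer for a hypothetical `callout` block type, rendered as a Markdown blockquote
const calloutSerializer = (props) => {
  const tone = props.node.tone || 'note' // e.g. 'tip', 'warning'; defaults to 'note'
  const body = props.node.text || ''
  return `> **${tone.toUpperCase()}:** ${body}\n\n`
}

// It would slot into the existing serializers object as `types.callout`
console.log(calloutSerializer({node: {tone: 'tip', text: 'Prefer named exports.'}}))
```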

Want to see it in action? Head over to sanity.io/learn/llms.txt (and /llms-full.txt) and bring the content into your favorite AI assistant. Hopefully, it will output better code. You can probably also ask it to adapt framework-specific course content to other frameworks with better luck.

From plain text to semantic knowledge: The future of Agent Experience

While our current implementation of /llms.txt delivers immediate value, I believe we're just scratching the surface of what's possible when structured content meets AI consumption. The true evolution will come when LLMs can not only read our content but understand its structure, relationships, and intent. Imagine if instead of dumping thousands of tokens of plain text into a context window, an AI agent could:

  • Query our content API directly with specific questions.
  • Understand the relationships between concepts, components, and patterns.
  • Access versioned documentation that matches the exact version of the tool a developer is using.
  • Receive real-time updates when best practices change.

This future isn't as far off as it might seem. The same structured content principles that power Sanity could be extended to create knowledge graphs and semantic APIs specifically designed for AI consumption. Just as REST APIs evolved to GraphQL and GROQ for more flexible querying, we may see the emergence of "AI-native" content interfaces that bridge the gap between human-readable documentation and machine-actionable knowledge.

For now, our /llms.txt implementation serves as an important stepping stone - making our content more accessible while we work toward more sophisticated approaches to Agent Experience. The beauty of building on Sanity's structured foundation is that we can evolve our approach without having to rebuild from scratch.

Making developer education AI-ready (and human-friendly, too!)

As we've seen, making our Learn platform more "agent-friendly" wasn't just about dumping content into a text file and pushing it to git – it required some consideration of content structure and proper markdown serialization. By implementing these /llms.txt routes in React Router 7, we're not only making our content more accessible to AI agents but also future-proofing our platform for the evolving landscape of developer education.

The best part? Since we're building on top of Sanity's structured content approach, adding new serialization formats or adapting to emerging AI content standards becomes a matter of extending our existing patterns rather than rebuilding from scratch. And since it's generated on demand from the same source, there's no copy-pasting of content, either!

As the relationship between developer tools and AI continues to evolve, we're curious to see how this enhanced agent experience will help developers learn and build with Sanity more effectively. Let us know if you find it helpful!

After all, great educational content should work whether you're a human or an AI—and now ours does for both.