Track: Replatforming from a legacy CMS to a Content Operation System
Course: Refactoring content for migration
Lesson 8: Uploading assets efficiently

Effortlessly manage and transform high-resolution images with Sanity's asset pipeline, avoid unnecessary uploads, and optimize content migration with metadata and an in-memory cache.


Sanity comes with a capable asset pipeline that allows the content team to upload one high-resolution image and developers to transform it on-demand to whatever size and format they need. Gone are the days when content teams had to upload or manage different-sized duplicates of the same image!
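For example, on the front end you can request any size or format from the single uploaded original. Here is a minimal sketch using the official @sanity/image-url builder (the project ID and dataset are placeholders, and the asset ID is borrowed from the reference example later in this lesson):

import imageUrlBuilder from '@sanity/image-url'

const builder = imageUrlBuilder({projectId: '<project-id>', dataset: 'production'})

// Ask for an 800px-wide WebP derivative of one uploaded original
const url = builder
  .image('image-b7e1c5136d3b935ebed18298bead5fa1cda2785e-946x473-jpg')
  .width(800)
  .format('webp')
  .url()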

Sanity will also extract metadata from an image, which can be used to tailor the presentation or query for image assets using GROQ. So, moving your images into Sanity has many upsides.
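For example, using the configured client from the snippet below, you could find every image wider than 2000 pixels (a sketch; dimensions are extracted automatically, while other metadata such as the color palette is opt-in):

const wideImages = await client.fetch(`
  *[_type == "sanity.imageAsset" && metadata.dimensions.width > 2000]{
    _id,
    url,
    "width": metadata.dimensions.width
  }
`)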

Migrating an asset into Sanity is made convenient with the client.assets.upload() method in the JavaScript client. If all you have is a URL to the image, this is the minimum amount of code required to upload it in a Node script:

import {Readable} from 'node:stream'
import {createClient, type UploadClientConfig} from '@sanity/client'

// Replace with your own project ID, dataset, and write token
const client = createClient({projectId: '<project-id>', dataset: 'production', token: process.env.SANITY_WRITE_TOKEN, apiVersion: '2024-01-01', useCdn: false})

async function uploadImage(url: string, metadata: UploadClientConfig) {
  const {body} = await fetch(url)
  return client.assets.upload('image', Readable.fromWeb(body), metadata)
}

Once again, remember that the image might not exist, and the URL could be broken. Improve the code above by assuming nothing about the response to your fetch!
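One hardened version might look like this sketch (uploadImageSafely is a hypothetical name; it fails loudly on broken URLs and empty responses):

async function uploadImageSafely(url: string, metadata: UploadClientConfig) {
  const response = await fetch(url)

  // The URL could be broken, or the response could have no body at all
  if (!response.ok || !response.body) {
    throw new Error(`Could not fetch image from ${url}: ${response.status}`)
  }

  return client.assets.upload('image', Readable.fromWeb(response.body), metadata)
}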

Helpfully, images uploaded to the Content Lake are given deterministic IDs based on the image itself. Uploading the same image binary multiple times will always result in the same ID and will not create duplicate documents.

However, uploading the same image every time you run your migration script is not ideal. It’s slow and unnecessary. This can be countered by taking metadata from your data source and saving it on the asset documents created in your dataset for every uploaded file. There’s a dedicated source key on asset documents that we can use.

For example, your existing image may have a record like this:

{
  "type": "image",
  "id": 647,
  "url": "http://www.example.com/image.jpg"
}

You could now call the function above using this metadata to write the “source” of the image when uploading.

uploadImage(doc.url, {
  source: {
    name: "Legacy CMS",
    id: doc.id,
    url: doc.url
  }
})

Now, every image that is uploaded contains queryable metadata with values that match your existing data source.

So, instead of constantly uploading every image, you could query the dataset at the beginning of your migration script to create an “in-memory cache” of all existing images.

type ExistingImage = {_id: string; sourceId: number}

// A multi-line GROQ string must be a template literal (backticks)
const query = `*[
  _type == "sanity.imageAsset"
  && defined(source.id)
]{
  _id,
  "sourceId": source.id
}`

const existingImages = await client.fetch<ExistingImage[]>(query)

Now, during your migration script, it’s easier to check if the image already exists in the dataset by looking for its source. If found, reference its _id in the dataset. If not, upload it!
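For example, you could turn that list into a Map keyed by the legacy source ID (a sketch; resolveImageId is a hypothetical helper building on uploadImage above):

// Look up Sanity asset _ids by the ID from the legacy data source
const imageCache = new Map(existingImages.map((image) => [image.sourceId, image._id]))

async function resolveImageId(doc: {id: number; url: string}) {
  const cached = imageCache.get(doc.id)
  if (cached) return cached

  // Not in the dataset yet: upload now and remember it for this run
  const asset = await uploadImage(doc.url, {
    source: {name: "Legacy CMS", id: doc.id, url: doc.url}
  })
  imageCache.set(doc.id, asset._id)
  return asset._id
}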

Once an image is uploaded, you only need to have its _id field to set the reference:

// 1. Query your existing data source:
{
  "type": "post",
  "id": 4986,
  "featuredMedia": 104
}

// 2. Upload the image and get its Sanity-generated _id from the response
// 3. createOrReplace your new Sanity document:
{
  "_type": "post",
  "_id": "post-4986",
  "featuredMedia": {
    "_type": "image",
    "asset": {
      "_ref": "image-b7e1c5136d3b935ebed18298bead5fa1cda2785e-946x473-jpg",
      "_type": "reference"
    }
  }
}
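Step 3 might look like this sketch, reusing the hypothetical imageCache from earlier (legacyPost stands in for the record from step 1):

const assetId = imageCache.get(legacyPost.featuredMedia)

// Note: assetId may be undefined if the image was never uploaded;
// handle that case in a real migration script
await client.createOrReplace({
  _id: `post-${legacyPost.id}`,
  _type: 'post',
  featuredMedia: {
    _type: 'image',
    asset: {_type: 'reference', _ref: assetId},
  },
})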

When uploading many assets, it's important to limit the number of concurrent uploads so you don't hit rate limits. A popular way to do this is with a library like p-limit.

It will allow you to prepare any number of asset uploads in advance, but then control how many are performed concurrently.
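Here is a minimal sketch, assuming an array of legacy image records called legacyImages:

import pLimit from 'p-limit'

// Allow at most five uploads to run at any one time
const limit = pLimit(5)

const uploads = legacyImages.map((doc) =>
  limit(() => uploadImage(doc.url, {source: {name: "Legacy CMS", id: doc.id, url: doc.url}})),
)

const assets = await Promise.all(uploads)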

See Technical limits for more details about rate limits during mutations.
The lesson Uploading assets performantly demonstrates how to do everything in this lesson – including throttled concurrent uploads – within the context of a WordPress migration script.

There are benefits to hosting images within the Content Lake, but there may be instances where you have huge volumes of images with a high turnover already stored on a third-party CDN with stable URLs. Examples include real estate websites where listing data and images typically come from external tools.

In this instance, it may be best to write the URL to an image as a string on a document and still serve the image from its original CDN without actually uploading the images to the Content Lake.
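For example, a listing document might simply store the external URL in a plain string field (the document type and field name here are hypothetical):

{
  "_type": "listing",
  "_id": "listing-1204",
  "imageUrl": "https://cdn.example.com/listings/1204/main.jpg"
}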
