Discussion about special characters in Portable Text to HTML conversion
15 replies
Last updated: Oct 5, 2022
S
Does the Portable Text to HTML converter accept special characters? We need to be able to insert characters like Ă. (We currently use the old blocks to HTML package, but will be upgrading soon.)
Oct 4, 2022, 7:10 PM
K
It works fine with emojis, Japanese characters, Cyrillic characters, etc. So it should be fine, there is no reason these characters won’t work. They’re not more special than any other. :)
Oct 4, 2022, 8:37 PM
S
I am able to get some special characters to work with our current blocks to html, but not all. Here is a screenshot from a coworker of what she is trying to insert vs what she gets
Oct 4, 2022, 8:44 PM
S
And here is what I type in vs what I get
Oct 4, 2022, 8:47 PM
Y
This looks like an encoding issue, not a font issue. For example, á (a with acute) is represented in UTF-8 encoding by the two bytes 0xC3 0xA1. Those same two bytes in Windows-1252 encoding represent à (A with tilde) followed by ¡ (inverted exclamation mark).
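A minimal Python sketch of that mismatch, for anyone who wants to reproduce it locally. It only assumes Python's built-in "cp1252" codec as the stand-in for Windows-1252; the variable names are illustrative.

```python
# Encode "á" as UTF-8, then misread the same bytes as Windows-1252.
original = "á"
utf8_bytes = original.encode("utf-8")
assert utf8_bytes == b"\xc3\xa1"  # the two bytes mentioned above

# Decoding with the wrong codec produces the garbled pair of characters.
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # Ã¡

# Reversing the mistake recovers the text, which is a strong hint that
# the data itself is intact and only the declared encoding is wrong.
repaired = garbled.encode("cp1252").decode("utf-8")
assert repaired == original
```

If the garbled text in a page round-trips back to the intended characters this way, the stored content is most likely fine and the fix belongs on the serving side.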
Oct 4, 2022, 9:36 PM
S
Is this something I can fix, or is it a Sanity thing, or something different altogether?
Oct 5, 2022, 4:27 PM
S
ooh, maybe if I can change it to UTF-16?
Oct 5, 2022, 5:28 PM
Y
UTF-8 is the most common and standard encoding in the web world, so I would use that if I had the choice. Sanity's API serves the results in that encoding as far as I can see (their Content-Type header says application/json;charset=utf-8 here), so you'd have to convert it if you need something else.
Oct 5, 2022, 5:35 PM
Y
Most likely you can fix it by configuring the web site (wherever the right-hand-side parts of the screenshots come from) to serve the content as UTF-8. Exactly how that is done will depend on how the site is hosted, but the HTTP header should say Content-Type: text/html;charset=utf-8.
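As a rough illustration of what "serve the content as UTF-8" means, here is a sketch using Python's standard http.server. The page markup, the PreviewHandler class, and the build_response and serve helpers are all hypothetical; a real site would set the same header through its own host or framework configuration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical preview page; real markup would come from the HTML serializer.
PAGE = "<!doctype html><html><head><title>Preview</title></head><body><p>á Ă</p></body></html>"


def build_response(page: str):
    """Return (headers, body) with the body encoded as UTF-8 and labelled as such."""
    body = page.encode("utf-8")
    headers = [
        ("Content-Type", "text/html;charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    return headers, body


class PreviewHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        headers, body = build_response(PAGE)
        self.send_response(200)
        for name, value in headers:
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve the preview on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), PreviewHandler).serve_forever()
```

Calling serve() and opening http://127.0.0.1:8000/ should show the characters as typed, because the bytes sent and the encoding declared in the header agree.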
Oct 5, 2022, 5:38 PM
Y
Failing that, putting <meta charset="utf-8"> in the actual HTML's head element is an option.
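To check whether a generated page already carries that declaration, something like the following sketch could help. It uses Python's standard html.parser; the CharsetFinder class and declared_charset function are hypothetical names, and the check only looks for a meta element with a charset attribute, not the older http-equiv form.

```python
from html.parser import HTMLParser
from typing import Optional


class CharsetFinder(HTMLParser):
    """Records the value of the first <meta charset="..."> it encounters."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        if tag == "meta" and self.charset is None:
            for name, value in attrs:
                if name == "charset" and value:
                    self.charset = value.lower()


def declared_charset(html: str) -> Optional[str]:
    """Return the declared charset in lowercase, or None if there is none."""
    finder = CharsetFinder()
    finder.feed(html)
    return finder.charset


print(declared_charset('<html><head><meta charset="utf-8"></head></html>'))
print(declared_charset("<html><head><title>No declaration</title></head></html>"))
```

A result of None for a page that shows garbled characters suggests the browser is guessing the encoding, which is the situation the meta element is meant to prevent.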
Oct 5, 2022, 5:39 PM
S
oh ok, I misunderstood and thought some of the characters I wanted weren't in UTF-8. Just looked at the character list and see I was incorrect. I will look at what you suggested
Oct 5, 2022, 5:53 PM
Y
Ah, no worries. UTF-8 can encode everything in Unicode, just like UTF-16 🙂
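A quick way to convince yourself, sketched in Python: each sample below survives a round trip through both encodings. The sample list and the round_trips helper are just illustrative.

```python
def round_trips(text: str, encoding: str) -> bool:
    """True if encoding and then decoding returns the original text."""
    return text.encode(encoding).decode(encoding) == text


# Latin with diacritics, Japanese, and an emoji outside the Basic Multilingual Plane.
samples = ["Ă", "á", "日本語", "🙂"]

for text in samples:
    assert round_trips(text, "utf-8")
    assert round_trips(text, "utf-16")
    print(
        text,
        len(text.encode("utf-8")), "bytes in UTF-8,",
        len(text.encode("utf-16-le")), "bytes in UTF-16",
    )
```

The byte counts differ between the two, but neither encoding loses any characters, so switching to UTF-16 would not fix a display problem like this one.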
Oct 5, 2022, 5:55 PM
S
Added <meta charset="UTF-8" /> in the preview where it was missing and it now displays as expected. So happy it was an easy fix. Thank you for the guidance!!!
Oct 5, 2022, 8:30 PM