kota's memex

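JavaScript strings are sequences of UTF-16 code units, and you can index them like arrays. This causes all sorts of weird issues, because for lots of characters you can almost treat the elements as Unicode code points. However, they are not: any code point above U+FFFF (most emoji, for example) is stored as two code units, called a surrogate pair.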

let horseShoe = "🐎👟";
console.log(horseShoe.length);
// 4 (two code units per emoji)
console.log(horseShoe[0]);
// "\ud83d" (a lone high surrogate, i.e. half of 🐎)
console.log(horseShoe.charCodeAt(0));
// 55357 (code of that half character)
console.log(horseShoe.codePointAt(0));
// 128014 (the full code point of 🐎, U+1F40E)
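
To see where the two halves come from, here is a minimal sketch of the surrogate pair arithmetic. toSurrogatePair is a hypothetical helper written for this note, not a built-in, and it assumes the input is a code point above U+FFFF.

```javascript
// Hypothetical helper (not a built-in): split a code point above 0xFFFF
// into its UTF-16 high and low surrogate code units.
function toSurrogatePair(codePoint) {
	const offset = codePoint - 0x10000;
	const high = 0xd800 + (offset >> 10); // top 10 bits of the offset
	const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits of the offset
	return [high, low];
}

const [high, low] = toSurrogatePair("🐎".codePointAt(0));
console.log(high, low);
// 55357 56334
console.log(String.fromCharCode(high, low) === "🐎");
// true
```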

get a codepoint

.codePointAt()

Don't use charCodeAt(). It returns a single 16-bit code unit, so for code points above U+FFFF it only gives you half of a surrogate pair.
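
One thing to watch out for: the index passed to codePointAt() still counts UTF-16 code units, not code points. A small sketch (the variable name horse is just for illustration):

```javascript
const horse = "🐎";

// Index 0 is the start of the surrogate pair, so we get the whole code point.
console.log(horse.codePointAt(0));
// 128014

// Index 1 lands in the middle of the pair, so we only get the low surrogate.
console.log(horse.codePointAt(1));
// 56334

// String.fromCodePoint() goes the other way.
console.log(String.fromCodePoint(128014));
// 🐎
console.log(horse.codePointAt(0).toString(16));
// 1f40e
```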

looping

Looping over a string with for...of iterates by Unicode code point, not by UTF-16 code unit. Indexing in a classic for loop (str[i]) still gives you code units.

let roseDragon = "🌹🐲";
for (let c of roseDragon) {
	console.log(c);
}
// 🌹
// 🐲
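
The spread operator uses the same string iterator, so it can be used to count or reverse by code point. A rough sketch; countCodePoints and reverseByCodePoint are hypothetical helper names, not built-ins. Note that emoji built from several code points (flags, skin tone variants, and so on) will still be split apart by this.

```javascript
// Hypothetical helpers: both rely on the string iterator, which yields
// one code point at a time instead of one UTF-16 code unit.
function countCodePoints(str) {
	return [...str].length;
}

function reverseByCodePoint(str) {
	return [...str].reverse().join("");
}

const roseDragon = "🌹🐲";
console.log(roseDragon.length);
// 4
console.log(countCodePoints(roseDragon));
// 2
console.log(reverseByCodePoint(roseDragon));
// 🐲🌹

// split("") works on code units, so reversing this way breaks the surrogate pairs.
console.log(roseDragon.split("").reverse().join("") === "🐲🌹");
// false
```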