The number of code pages defined by #IBM for its mainframes and #IBMPC is absolutely staggering.
Take a look: https://en.wikipedia.org/wiki/Code_page#IBM_code_pages
Thank goodness for #Unicode!
So which is better: "U+1F4D3 NOTEBOOK" or "U+1F4D4 NOTEBOOK WITH DECORATIVE COVER"? But both are necessary for international language communication.
The (Mostly) Complete Unicode Spiral
https://shkspr.mobi/blog/2022/07/the-mostly-complete-unicode-spiral/
I present to you, dear reader, a spiral containing every[0] Unicode 14 character in the GNU Unifont. Starting at the centre with the control characters, spiralling clockwise through the remnants of ASCII, and out across the entirety of the Basic Multilingual Plane. Then beyond into the esoteric mysteries of the Higher Planes[1].
Zoom in for the massiveness. It's a 10,000x10,000px image. Because the Unifont displays individual characters in a 16x16px square, it is quite legible even when printed out on a domestic laser printer at 600dpi:
Terence Eden is on Mastodon (@edent), replying to @edent:
Still a Work In Progress.
Created a proper spiral of every Unicode character in the Unifont.
At 600dpi, it is *just about* legible.
The full thing would need to be printed on A2 sized paper! pic.x.com/jsrofjt4xd
14:09 - Fri 08 July 2022
I also made it as a square spiral - which fits into a smaller space.
Again, printed out at 600dpi it is readable. Just!
Terence Eden is on Mastodon (@edent), replying to @edent:
Printed out on A4 @ 600dpi.
Amazingly, it is just about legible!
Still a bit more work to do, but quite pleased with the results so far. pic.x.com/d5kwfwhq27
07:18 - Tue 05 July 2022
Printed onto A0 - 841mm square - it's a bit better. The ASCII set is readable:
But characters in CJK weren't particularly legible:
If I wanted the 16px symbols to each be 5mm wide, I'd need to print this on paper over 3 metres wide!
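The paper-size figure follows from simple arithmetic. Here is a minimal sketch, assuming the 10,000px image width and 16px glyph cells quoted above; the function names are made up for illustration.

```python
IMAGE_PX = 10_000  # image width quoted in the post
GLYPH_PX = 16      # each Unifont glyph sits in a 16x16px cell
MM_PER_INCH = 25.4


def paper_width_mm(glyph_mm, image_px=IMAGE_PX, glyph_px=GLYPH_PX):
    """Paper width needed if every glyph cell is printed `glyph_mm` wide."""
    glyphs_across = image_px / glyph_px
    return glyphs_across * glyph_mm


def printed_glyph_mm(dpi, glyph_px=GLYPH_PX):
    """Width of one glyph cell when one image pixel maps to one printer dot."""
    return glyph_px / dpi * MM_PER_INCH


print(paper_width_mm(5))                 # 3125.0 mm, a little over 3 metres
print(round(printed_glyph_mm(600), 2))   # 0.68 mm per glyph at 600dpi
```

So a 600dpi print squeezes each character into well under a millimetre, which is why it is only just legible.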
WHY??!?
Because visualising one-dimensional data structures in two-dimensional space is fun! That's why.
I was inspired by seeing two lovely pieces of artwork recently.
The first was 2015's Unicode in a spiral by Reddit user cormullion.
It's gorgeous, but doesn't include all characters. Oh, and you also have to rotate your head to read each character.
There's a larger version which covers a lot more of the Basic Multilingual Plane. It's an 18MB PDF. And, because of the resolution of the font, it needs to be printed out on a 1 metre square at a minimum.
The second interesting thing I found was a 2016 Hilbert Curve of Unicode:
smly (@smly):
UNICODE in the picture frame. The placement of characters along the Hilbert curve is beautiful. Original: github.com/hakatashi/unic… pic.x.com/f69hwyzlvc
11:18 - Thu 26 October 2017
The Hilbert Curve poster is beautiful. But it only goes up to Unicode 10 - and we're on Unicode 14 by now. Despite the æsthetically pleasing nature of fractal curves, I find them quite un-intuitive.
Neither shows off the gaps in Unicode. That is, where there is space to fit more symbols.
So I wanted to do something which satisfied these criteria:
HOW?!?!
I've written before about the wonders of the Unifont. It contains all of the Unicode 14 glyphs - each squeezed down into a 16x16px box. Even emoji!
Well. Mostly…
Limitations
Although I wanted every character, there are some practical problems. Firstly:
Unifont only stores one glyph per printable Unicode code point. This means that complex scripts with special forms for letter combinations including consonant combinations and floating vowel marks such as with Indic scripts (Devanagari, Bengali, Tamil, etc.) or letters that change shape depending upon their position in a word (Indic and Arabic scripts) will not render well in Unifont.
So there are some scripts which will look a bit ugly. And some characters which won't be well represented.
The second issue is one of size. Some of the newer characters are simply too big:
Scripts such as Cuneiform, Egyptian Hieroglyphs, and Bamum Supplement will not be drawn on a 16-by-16 pixel grid. There are plans to draw these scripts on a 32-by-32 pixel grid in the future.
That means it misses out on characters like 𒀰, 𒁏 and, of course, 𒀱. Which, to be fair, would be hard to squeeze in!
The third problem is that Unicode is updating all the time. Although the Unifont is at Version 14 - Python's Unicode Database is stuck at V13. Luckily, there is a library called UnicodeData2 which includes V14.
But, given those limitations, I thought it was possible to craft something nice.
Python Code
I split the problem into several parts.
Plotting equidistant points along a spiral
As ever, I turned to StackOverflow and found a neat little solution:
import math

def spiral_points(arc=1, separation=1):
    # Adapted from https://stackoverflow.com/a/27528612/1127699
    """generate points on an Archimedes' spiral
    with `arc` giving the length of arc between two points
    and `separation` giving the distance between consecutive
    turnings
    - approximate arc length with circle arc at given distance
    - use a spiral equation r = b * phi
    """
    def polar_to_cartesian(r, phi):
        return ( round( r * math.cos(phi) ),
                 round( r * math.sin(phi) ) )

    # yield a point at origin
    yield (0, 0)

    # initialize the next point in the required distance
    r = arc
    b = separation / (2 * math.pi)
    # find the first phi to satisfy distance of `arc` to the second point
    phi = float(r) / b
    while True:
        yield polar_to_cartesian(r, phi)
        # advance the variables
        # calculate phi that will give desired arc length at current radius (approximating with circle)
        phi += float(arc) / r
        r = b * phi
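To see how close to "equidistant" those points really are, here is a small sketch. It restates the generator without the rounding step, so that the gaps can be measured exactly. The name `spiral_points_exact` and the choice of 16 for both `arc` and `separation` (one Unifont cell) are assumptions for illustration, not taken from the published source code.

```python
import math
from itertools import islice


def spiral_points_exact(arc=1.0, separation=1.0):
    """Same Archimedean spiral (r = b * phi) as above, but unrounded."""
    yield (0.0, 0.0)
    r = arc
    b = separation / (2 * math.pi)
    phi = r / b
    while True:
        yield (r * math.cos(phi), r * math.sin(phi))
        # step along a circular arc of length `arc` at the current radius
        phi += arc / r
        r = b * phi


# Take the first 1001 points and measure the straight-line gap between neighbours.
points = list(islice(spiral_points_exact(arc=16, separation=16), 1001))
gaps = [math.dist(p, q) for p, q in zip(points, points[1:])]

print("first gaps:", [round(g, 2) for g in gaps[:3]])
print("later gaps:", round(min(gaps[500:]), 3), "to", round(max(gaps[500:]), 3))
```

The circular-arc approximation is roughest near the centre. Further out, neighbouring points settle very close to `arc` apart, which is what lets each glyph have its own cell.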
Drawing a squaril
I wanted a grid which looked like this:
9 A B
8 1 2
7 0 3
6 5 4
I found a blog post and source code for a spiral array. It's pretty simple - although I'm sure there's lots of ways to do this:
n = 12
nested_list = [[0 for i in range(n)] for j in range(n)]
low = 0
high = n - 1
x = 1
levels = int((n + 1) / 2)
for level in range(levels):
    for i in range(low, high + 1):
        nested_list[level][i] = x
        x += 1
    for i in range(low + 1, high + 1):
        nested_list[i][high] = x
        x += 1
    for i in range(high - 1, low - 1, -1):
        nested_list[high][i] = x
        x += 1
    for i in range(high - 1, low, -1):
        nested_list[i][low] = x
        x += 1
    low += 1
    high -= 1
for i in range(n):
    for j in range(n):
        # print the row elements with
        # a tab space after each element
        print(nested_list[i][j], end="\t")
    print()  # Print in new line after each row
However, that printed the spiral backwards:
B A 9
2 1 8
3 0 7
4 5 6
Luckily, Python makes it easy to reverse lists:
for l in nested_list : l.reverse()
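As one of those other ways, here is a sketch of a different technique: a generator that walks outwards from the centre, so the n-th value it yields is the cell for the n-th code point. This is not the approach used in the post. The name `square_spiral_offsets` is hypothetical, and y grows downwards as it does in image coordinates.

```python
from itertools import islice


def square_spiral_offsets():
    """Yield (x, y) cell offsets from the centre: up 1, right 1, down 2, left 2, up 3, ..."""
    x, y = 0, 0
    yield x, y
    dx, dy = 0, -1  # first step goes up (y grows downwards)
    run_length = 1
    while True:
        for _ in range(2):  # each run length is used for two sides
            for _ in range(run_length):
                x, y = x + dx, y + dy
                yield x, y
            dx, dy = -dy, dx  # turn clockwise: up -> right -> down -> left
        run_length += 1


# The first twelve cells match the 0..B grid shown earlier.
print(list(islice(square_spiral_offsets(), 12)))
```

Multiplying each offset by the 16px cell size and adding the canvas centre would give pixel co-ordinates, with no need to build the grid first or reverse it afterwards.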
Drawing the characters
Turning a number into a Unicode character is as simple as:
unicode_character = chr(character_int)
But how do we know if the font contains that character? I stole some code from StackOverflow which uses the FontTools library:
from fontTools.ttLib import TTFont
font = TTFont(fontpath)   # specify the path to the font in question

def char_in_font(unicode_char, font):
    for cmap in font['cmap'].tables:
        if cmap.isUnicode():
            if ord(unicode_char) in cmap.cmap:
                return True
    return False
But, of course, it is a bit more complicated than that. The Unifont contains some placeholder glyphs - the little black squares with hex digits in them that you see here:
I didn't want to draw them. But they exist in the font. So how do I skip them?
Using the Python Unicode Database it's possible to look up the name of a Unicode code-point. e.g. chr(65) is LATIN CAPITAL LETTER A. So if there is no name in the database, skip that character.
But, of course, it is a bit more complicated than that! The Unicode database only goes up to Unicode 13. And, for some reason, the control characters don't have names. So the code becomes a tangled mess of if...else statements. Ah well!
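A rough sketch of that name check, using the standard library's `unicodedata` module. The helper names `has_name` and `should_draw` are hypothetical, and letting control characters through by their `Cc` category is only a guess at one branch of the tangled if...else, not the published code. The UnicodeData2 library mentioned earlier is intended as a drop-in replacement, so the same calls should work with it.

```python
import unicodedata


def has_name(code_point):
    """True if the Unicode database knows a name for this code point."""
    # name() raises ValueError for nameless code points unless a default is given
    return unicodedata.name(chr(code_point), "") != ""


def should_draw(code_point):
    """Hypothetical filter: named characters, plus the nameless control characters."""
    if has_name(code_point):
        return True
    return unicodedata.category(chr(code_point)) == "Cc"


print(unicodedata.name(chr(65)))  # LATIN CAPITAL LETTER A
print(has_name(0))                # False: control characters have no name
print(should_draw(0))             # True: but they are still wanted in the spiral
print(should_draw(0x0378))        # False: U+0378 is unassigned
```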
Drawing the characters should have been easy. I was using Pillow to draw text. But, despite the pixely nature of the font itself, Pillow was performing anti-aliasing - creating unwanted grey subpixels.
I thought the fix was simple:
jonodrew@mastodon.social (@jonodrew), replying to @xandypty:
@xandypty @edent
draw = ImageDraw.Draw(image)
draw.fontmode = '1'
...
08:33 - Mon 04 July 2022
Sadly, that does introduce some other artefacts - so I've raised a bug with Pillow.
In the end, I kept the anti-aliasing, but then converted the grey pixels to black. And then converted the entire image to monochrome:
threshold = 191
image = image.point(lambda p: p > threshold and 255)
image = image.convert('1')
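The `p > threshold and 255` expression is a little cryptic. For an 8-bit greyscale image, Pillow's `Image.point` calls the function once for each possible pixel value and uses the results as a lookup table. The sketch below builds that table in plain Python, without Pillow, to show what each grey level turns into. `to_black_or_white` is a made-up name for illustration.

```python
THRESHOLD = 191  # same cut-off as in the snippet above


def to_black_or_white(p, threshold=THRESHOLD):
    """Map one 8-bit grey value to 0 (black) or 255 (white)."""
    return 255 if p > threshold else 0


# One entry per possible 8-bit pixel value.
lookup = [to_black_or_white(p) for p in range(256)]

# `p > threshold and 255` gives False (which counts as 0) or 255,
# so it produces the same table.
assert lookup == [int(p > THRESHOLD and 255) for p in range(256)]

print(lookup[128], lookup[191], lookup[192], lookup[255])  # 0 0 255 255
```

Anything at or below the threshold becomes black, so the mid-grey anti-aliasing pixels are absorbed into the glyph rather than being left as smudges.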
Putting It All Together
Once I'd got the co-ordinates for either the spiral or squaril, I drew the character on the canvas:
draw.text( (x , y), unicode_character, font=font, fill=font_colour)
Except it didn't work!
Sadly, Pillow can't draw non-printable glyphs - even when the font contains something drawable. This is because it can't pass the correct options to the harfbuzz library.
So, I went oldskool! I converted every glyph in the font to a PNG and saved them to disk.
from fontforge import *

font = open("unifont_upper-14.0.04.ttf")

for i in range( len(font) ) :
    try:
        font[i].export( "pngs/" + str(i) + ".png", pixelsize=16, bitdepth=1)
    except Exception as e:
        print ( str(i) )
        print ( e )
Look, if it's hacky but it works; it isn't hacky! Right?
From there, it's a case of opening the .png and pasting it onto the canvas:
character_png = Image.open('pngs/' + str(character_int) + ".png")
image.paste( character_png, (round(x) , round(y)) )
It was too big!
And now we hit the final problem. The image was over 20,000 pixels wide. Why? The Variation Selectors! The last of which is at position U+E01EF. Which means the spiral looks like this:
Here they are in close up:
So I decided to remove that block!
Source Code
All the code is on GitLab. Because GitHub is so 2019…
Licensing?
The GNU Unifont has a dual licence. GPL2 and OFL. The image is a "document" for the purposes of the OFL and the GPL font exemption. But, I guess you could reverse engineer a font-file from it. So, if you use the image to generate a font, please consider that it inherits the original licence. If you just want to print it out, or use it as art, then the image itself is CC BY-SA.
This is based on my lay-person's understanding of the various copyleft licence compatibility issues. Corrections and clarifications welcome!
What's next?
I would like to print this out on paper. At 200dpi, it would be about 1.5m squared. Which I guess is possible, but might be expensive.
At 600dpi, the square will just about fit on A3 paper. But the quality is atrocious. Even at A0 it wasn't great. Realistically, it needs to be at least 3.3 metres along each side! No idea where I can find a printer which will do that. Or where in my house I'd have space for it!
Of course, it will need updating whenever there is a new release of either Unicode or Unifont.
If you have any suggestions or feedback - please drop them in the comment box!
[0] Well, look, it is complicated. Unicode is Hard™.
[1] Not to be confused with the Demonic Planes of Forbidden Unicode.
It’s 2025 and:
- There is still no vertical text site mode for #Wikimedia in any language using vertical text.
- #Wikipedia still forces “simplified” Chinese on browsers.
- There is still no true IDS or CangJie composition matrix for characters in #Unicode.
- SignWriting still has no proper #Unicode inclusion, no IDS analogue, no inventory of signs, and is still mostly written by mouse drag and drop in a mishmash of SVG and HTML.
- There is no proper SignWriting IME, such as a Rime schema.
To say this state of affairs is cultural propaganda by mass technic inertia would be an understatement. Infotech is functional colonialism. That's really all there is to say.
Filed under #崇洋媚外
Updated my unilookup utility. It now accepts Unicode strings on stdin as well as a command line parameter. Can be installed directly from PyPI. https://github.com/fastjack/unilookup
#unicode #python
On the occasion of World Emoji Day, the Unicode Consortium announces the arrival of new emoji in Unicode 17. Among the new arrivals: Trombone, Bigfoot, Orca.
Apple will develop these emoji, available from next spring. #WorldEmojiDay #Emoji #Unicode
With @MoritzBrouhaha, discover the history of the Unicode computing standard, used by everyone around the globe in our daily communications.
https://www.paris-web.fr/2025/conference/a-la-decouverte-du-monde-au-travers-de-lunicode
🜰^ᯣ⥿ᯣ^🜰
there is an #Unicode proposal to make the cat paws bigger
https://www.unicode.org/L2/L2025/25125r-alchemical-glyphs.pdf
The recycling symbol in a git branch name, what a time to be alive
Also, nice of #github to warn about possibly hidden characters, but not sure it applies in this case
Unicode characters for Creative Commons symbols
I've just discovered that there are symbols since Unicode 13.0 for CC licences
CC: 🅭
BY: 🅯
NC: 🄏
ND: ⊜
SA: 🄎
PD: 🅮
CC0: 🄍
It's 2025 and everywhere people know and use #unicode and #utf - except at this stubbornly refusing cinema called #filmpalast in #karlsruhe
Got a bug report for @novelwriter from someone who uses Cuneiform text in their work. These are 4 byte Unicode symbols, and turned out to be very tricky to handle.
The app is built with Python, which will switch a string to UCS-4 when it contains such characters, so the characters always have a single index in the string.
However, the Qt library uses UTF-16. That means 4-byte characters use two slots, creating a mismatch in indices between the two representations.
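The index mismatch can be reproduced in a few lines. This is only an illustrative sketch, not novelWriter's actual fix: `utf16_offset` is a hypothetical helper, and U+12000 is used as an example code point from the Cuneiform block.

```python
def utf16_offset(text, index):
    """Convert a Python string index into a UTF-16 code-unit offset."""
    # Code points above U+FFFF need a surrogate pair, i.e. two UTF-16 units.
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text[:index])


sample = "a\U00012000b"  # 'a', one Cuneiform sign, 'b'

print(len(sample))                            # 3 code points in Python
print(len(sample.encode("utf-16-le")) // 2)   # 4 UTF-16 code units
print(utf16_offset(sample, 2))                # 'b' is at index 2 in Python, offset 3 in UTF-16
```

Any cursor position or highlight range passed between the two representations needs a translation along these lines once text leaves the Basic Multilingual Plane.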
[Translation] A guide to effective localization in Unreal Engine
Localization is one of the key but often underestimated aspects of game development. As the global audience grows, players expect to see games in their native language, and localization becomes a necessity rather than a luxury. But localization is more than just translating text. It involves solving technical problems, accounting for cultural nuances, and optimizing the workflow to deliver a smooth and comfortable gameplay experience across several languages. In this article I cover the difficulties of localization in Unreal Engine, drawing on my experience working on Wizard of Legend 2. We will look at gathering and managing text, as well as problems with formatting, gendered language, and font handling. I will also cover the key aspects that can cause delays and how to minimize them.
Fascinating: Two feeds for @hinterzarten_news couldn't be properly pasted from the website anymore, because they changed the dates from having U+200A ("Hair Space") between the dots and the numbers to U+200B ("Zero Width Space"). Shout-out to the creator of https://www.mauvecloud.net/charsets/CharCodeFinder.html, a really helpful tool for finding out exactly which #character you have in front of you.
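For a quick local check of the same kind, Python's standard `unicodedata` module can name each character in a string. A minimal sketch; the sample date string is made up and `describe_characters` is a hypothetical helper.

```python
import unicodedata


def describe_characters(text):
    """Return (code point, name) pairs for every character in `text`."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# Hypothetical date containing one hair space and one zero width space.
sample = "12.\u200a07.\u200b2025"

for code, name in describe_characters(sample):
    print(code, name)
```

The two separators are invisible on screen, but they show up here as HAIR SPACE and ZERO WIDTH SPACE.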
The main question about Cyrillic email
Email at an address like info@пример.бел is technically possible, and we at HB.BY support it. But there is almost no demand. In this article we look at who dreamed of Cyrillic email and what puts people off it, to find out where it will all lead.
Interesting to see letters like ,
, and
proposed for inclusion in Unicode!