29 Aug 2012

A survey of font rendering techniques

I spent several hours reading about font rendering and all its nuances, and it's interesting stuff, so here it is, in a summarized form. None of this is original, and my knowledge of fonts is limited. I'm a programmer, not a designer or a typographer.

A font describes the shape of a letter as a mathematical curve, which has infinite resolution. To render that onto a fixed (and often low) resolution display, you need to map points of the curve to pixels on screen. This is called font rasterization. But points can fall in between two physical pixels (or, more accurately, four, considering both dimensions). A naive rasterization would pick one of the surrounding physical pixels and turn it on. But you can do better by turning on all pixels on both sides of the point, to varying degrees, which is called anti-aliasing. That is, grayscale instead of black and white. Wikipedia explains this better, and in detail.
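To make this concrete, here's a minimal sketch (in Python, with function names invented for illustration, not taken from any real rasterizer) of drawing a one-pixel-wide vertical stem onto a single row of pixels, first naively and then with coverage-based anti-aliasing:

    # A toy sketch, not a real rasterizer. It draws a filled vertical stem
    # covering the x-interval [left, right) onto one row of pixels: first by
    # naive rounding (each pixel fully on or off), then by coverage-based
    # anti-aliasing (each pixel gets the fraction of it that the stem overlaps).

    def naive(left, right, width):
        row = [0.0] * width
        for x in range(round(left), round(right)):
            if 0 <= x < width:
                row[x] = 1.0
        return row

    def antialiased(left, right, width):
        row = [0.0] * width
        for x in range(width):
            # Overlap of the stem [left, right) with the pixel [x, x + 1).
            overlap = min(right, x + 1) - max(left, x)
            row[x] = max(0.0, min(1.0, overlap))
        return row

    # A stem 1 pixel wide whose edges fall between pixel boundaries.
    print(naive(3.4, 4.4, 8))        # pixel 3 fully on, everything else off
    print(antialiased(3.4, 4.4, 8))  # pixel 3 at about 0.6, pixel 4 at about 0.4

The anti-aliased row spreads the stem's edges over two pixels instead of snapping them to one, which is exactly the grayscale-instead-of-black-and-white idea.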

Fonts are actually anti-aliased along both the X and Y dimensions, meaning that a point of the curve you're rasterizing affects the four pixels around it rather than just the two to its left and right, which would be the case if you were anti-aliasing only along the X dimension. Anti-aliasing along two dimensions gives better results than anti-aliasing along one, for the same reason that anti-aliasing along one dimension gives better results than not anti-aliasing at all.

Subpixel anti-aliasing

You can go further and treat the red, green and blue sub-pixels of a pixel as separate pixels for the purpose of rendering, meaning that you suddenly have thrice the resolution, naturally producing better results. This is called subpixel rendering.

Subpixel rendering works only along the X dimension, since the red, green and blue subpixels are usually placed horizontally next to each other [1]. This means that you still need non-subpixel anti-aliasing vertically.
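Continuing the sketch above (still purely illustrative, and assuming a left-to-right RGB layout), the only change for subpixel rendering is that coverage is sampled once per subpixel instead of once per pixel:

    # Same coverage idea as before, but sampled per subpixel. Each pixel is
    # 1 unit wide, so each of its R, G and B subpixels is 1/3 unit wide.

    def subpixel_rendered(left, right, width):
        pixels = []
        for x in range(width):
            channels = []
            for s in range(3):  # R, G, B subpixels, left to right
                sub_left = x + s / 3
                sub_right = sub_left + 1 / 3
                overlap = min(right, sub_right) - max(left, sub_left)
                coverage = max(0.0, min(1 / 3, overlap)) * 3  # normalise to 0..1
                channels.append(round(coverage, 2))
            pixels.append(tuple(channels))
        return pixels

    # The same stem as before. Each (R, G, B) triple is the fraction of that
    # subpixel the stem covers: three samples per pixel instead of one.
    print(subpixel_rendered(3.4, 4.4, 8))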

As I said, subpixel rendering requires knowledge of the subpixel layout. OS X, Windows and Linux support both RGB and BGR layouts.

If you are wondering what other kinds of subpixel layouts are possible, one example is RGBW, with a white subpixel in addition to red, green and blue [2].

Or RGBG, also called PenTile, which has two green subpixels for every red or blue one, with each green subpixel also being smaller than a red or blue one [3]. Stranger combinations like RG-B-GR are possible, as are displays that arrange the subpixels in a diamond-like structure rather than linearly.

Another example is RGBY, which has a yellow subpixel in addition to red, green and blue [4].

What all these cases have in common is that they use a subpixel layout that the font renderer isn't expecting. In these cases, subpixel rendering won't help and will actually hurt, because you are now effectively turning on arbitrary pixels [5].

Android and iOS don't use subpixel rendering because those devices can be rotated, which leaves the subpixels stacked vertically rather than arranged horizontally, breaking subpixel anti-aliasing. So Android and iOS just use whole-pixel anti-aliasing.

Hinting

This brings us to hinting. Hinting is essentially another way to bridge the gap between an infinite-resolution mathematical curve and a fixed-resolution display (anti-aliasing being the first). The difference is that hinting changes the glyph to better fit the screen. What do I mean by that? Isn't anti-aliasing also changing the glyph? No. Anti-aliasing changes the pixels, but only in an effort to better display the design created by the font designer. Hinting changes the design itself, slightly, to better fit the screen, by nudging points so that they fall on whole pixels. In other words, it changes the position or size of things rendered on screen to line up with the pixel grid.

Another way of understanding hinting is to look at why anti-aliasing isn't sufficient. Anti-aliasing helps, but turning on multiple pixels to shades of gray is still not ideal. What would be ideal? Having a pixel at the exact location we need, so that we can turn on just that one pixel to black instead of several pixels to shades of gray, which loses sharpness. Hinting achieves this by changing the glyphs to better fit the pixel layout, which is why it's also called grid fitting.

For example, consider the letter m, which consists of three (vertical) strokes. Assume that the leftmost one happens to fall on a pixel boundary. Things are good so far. Now, it's possible, even likely, that at the font size chosen, and depending on the font, the distance between two adjacent strokes is 2.3 pixels. This means that without hinting, the middle stroke will need to be anti-aliased, which we saw is not ideal. Instead, hinting shifts it a little so that the distance is either 2 or 3 pixels [6].
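As a toy illustration (nothing like a real hinter, which executes instructions embedded in the font), grid fitting can be thought of as rounding each stem's position and width to whole pixels:

    # Stems are (position, width) pairs in fractional pixels. "Hinting" here
    # just rounds each stem to the nearest whole pixel so it can be drawn as
    # solid black columns instead of anti-aliased grey ones. A real hinter
    # would also keep the distances between the stems consistent.

    def grid_fit(stems):
        fitted = []
        for pos, width in stems:
            snapped_pos = round(pos)
            snapped_width = max(1, round(width))  # never let a stem vanish
            fitted.append((snapped_pos, snapped_width))
        return fitted

    # The "m" example: leftmost stem on a pixel boundary, adjacent stems 2.3
    # pixels apart, each stem 1.1 pixels wide.
    stems = [(10.0, 1.1), (12.3, 1.1), (14.6, 1.1)]
    print(grid_fit(stems))  # [(10, 1), (12, 1), (15, 1)]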

As with anti-aliasing, you can fit either to the pixel grid or to the subpixel grid.

Obviously, hinting should be done with caution, since otherwise the hinted font will look substantially different from the unhinted one. Even then, hinting is controversial, with some font renderers, like Apple's, choosing to do little or no hinting, prioritizing rendering the font as it was designed even if it ends up a little blurry, while Windows does more hinting for better legibility.

The other problem with hinting is that it's device-dependent. When we talk about lining things up with the pixel grid, that grid varies from device to device, because devices have different resolutions. This means that the same text renders differently on different devices: screen vs print, two different laptops, or even the same laptop when plugged into an external display. Line breaks may fall in different places, resulting in a document having more or fewer lines, or even an additional page, which breaks users' expectations.

Grid fitting also breaks apps. If you're a programmer building a UI, and you measure the width of your text in a certain font, that width can change when the user plugs in an external monitor, breaking your UI because you suddenly have too much or too little space, resulting in odd gaps between text or clipped text.
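To see why, here's a back-of-the-envelope sketch (all numbers invented) in which a grid-fitting renderer rounds each glyph's advance width to whole device pixels, so the total width of a line comes out differently at different resolutions:

    # Advance widths are in font units (1000 per em here). Rounding each
    # advance to whole device pixels means the accumulated rounding error,
    # and therefore where lines break, depends on the device's DPI.

    def line_width_px(advances, point_size, dpi, units_per_em=1000):
        scale = point_size * dpi / (72 * units_per_em)  # font units -> pixels
        return sum(round(a * scale) for a in advances)

    advances = [520, 480, 510, 260, 530] * 10  # 50 glyphs on one 12 pt line
    for dpi in (96, 144):
        exact = sum(advances) * 12 * dpi / (72 * 1000)
        print(dpi, line_width_px(advances, 12, dpi), exact)
    # At 96 DPI the rounded line is 360 px against an exact 368; at 144 DPI
    # it's 550 px against 552. The two devices disagree about the line's
    # proportions, which is enough to move a line break.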

For these reasons, Microsoft toned down the grid fitting and moved to resolution-independent rendering with the GDI+ API (introduced in Windows XP), deprecating the older GDI API. This means that the metrics of text remain the same across devices, while the actual pixels can still vary to take advantage of the pixel grid. Later, Microsoft further tuned the rendering in the DirectWrite API (introduced in Windows 7).

This leaves one aspect: the formats of the files in which fonts are distributed. There are two of them, TrueType and OpenType. TrueType fonts can embed hinting information, either as general rules about which control points in a glyph can move and how, or down to the pixel level if needed. OpenType fonts are largely automatically hinted. Manual hinting is a lot of work, and automatic hinting does reasonably well, but the best commercial fonts are still manually hinted.

Further, OpenType fonts define their outlines as cubic curves rather than the quadratic ones TrueType uses. Font designers work in terms of cubic curves, so OpenType fits better into the font design process. A quadratic curve can always be represented exactly as a cubic curve, but not vice versa; however, a designer can use multiple quadratic curves to approximate a cubic one as closely as needed, so at the end of the day, there's no practical difference in what an OpenType or a TrueType font can represent.
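If you want to convince yourself of the quadratic-to-cubic part, here's a small check using the standard Bézier degree-elevation formulas (the function names are mine):

    # Degree elevation: the quadratic Bezier with control points P0, P1, P2 is
    # exactly the same curve as the cubic with control points
    # P0, P0 + 2(P1 - P0)/3, P2 + 2(P1 - P2)/3, P2.

    def quadratic_point(p0, p1, p2, t):
        u = 1 - t
        return tuple(u*u*a + 2*u*t*b + t*t*c for a, b, c in zip(p0, p1, p2))

    def cubic_point(p0, p1, p2, p3, t):
        u = 1 - t
        return tuple(u**3*a + 3*u*u*t*b + 3*u*t*t*c + t**3*d
                     for a, b, c, d in zip(p0, p1, p2, p3))

    def elevate(p0, p1, p2):
        c1 = tuple(a + 2 * (b - a) / 3 for a, b in zip(p0, p1))
        c2 = tuple(c + 2 * (b - c) / 3 for b, c in zip(p1, p2))
        return p0, c1, c2, p2

    q = ((0, 0), (50, 100), (100, 0))   # an arbitrary quadratic curve
    c = elevate(*q)
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        qp, cp = quadratic_point(*q, t), cubic_point(*c, t)
        assert all(abs(a - b) < 1e-9 for a, b in zip(qp, cp))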

So, the bottom line is that the difference between TrueType and OpenType is largely relevant while designing fonts, not while using them. Web fonts (WOFF) are only a wrapper around a TrueType or an OpenType font and not a new format.


[1] They can't be placed next to each other both horizontally and vertically, since that would mean that a pixel has nine subpixels, which wouldn't make sense. So monitor vendors often place them next to each other horizontally.

[2] The rationale behind RGBW is that the eye is more sensitive to luminance than to chrominance. Or, in English, more sensitive to changes in brightness than to changes in color. A red subpixel is created by putting a red color filter in the path of a white light (the LCD backlight). A white subpixel doesn't have a color filter, so it's brighter. Or, alternatively, generates the same brightness while consuming less power. The point is that you are trading off color accuracy for brightness, which the eye is more sensitive to.

[3] The eye is more sensitive to green than to other colors, so by having more green subpixels than red or blue ones, you're conveying more information to the eye, resulting in better perceived resolution.

[4] The rationale is that the more primary colors a display has, the better its color accuracy, or, more precisely, the greater the fraction of the visible gamut it can reproduce. This is because color reproduction doesn't work like describing a point in 3D space, which three co-ordinates capture fully, making any more superfluous. The gamut of human vision is not a triangle but a convex shape (see the CIE chromaticity diagram), and a triangle contained in a convex shape cannot capture all the points in it. A quadrilateral does better, and a pentagon better still. In general, the more primary colors, the greater the fraction of the gamut that can be represented.

[5] It's theoretically possible to write a font renderer that handles some or all of these cases, but in practice it hasn't been done, so subpixel rendering works only if you have RGB or BGR subpixels of the same size, arranged side by side horizontally.

[6] Even if the distance between the stems of the "m" is an integral number of pixels, it's possible that the width of each stroke is not an integral number of pixels, which again can be fixed by hinting.
