A few notes and thoughts about compressing images to 140 characters for use on Twitter. I first read about this "competition" [http://www.flickr.com/photos/quasimondo/3518306770/in/set-72057594062596732/ here].

== Bit availability ==

Twitter allows 140 characters per message, and UTF-8 is accepted. RFC 3629 restricts UTF-8 to the formal Unicode definition, so the only legal code points range from U+0000 to U+10FFFF. Two further exclusions apply:

 * The 2¹¹ high and low surrogates, used for UTF-16 encoding, which restrict the Unicode range to U+0000..U+D7FF and U+E000..U+10FFFF.
 * The 66 non-characters.

The final size of this set is:

{{{
#!latex
$(2^{20} + 2^{16}) - 2^{11} - 66 = 1111998$
}}}

The number of bits that can be encoded using 140 such characters is computed as follows:

{{{
#!latex
$n_{bits} = \mathrm{floor}\left(\dfrac{140 \log(1111998)}{\log(2)}\right) = 2811$
}}}

In theory, 2811 bits is therefore the maximum we can stuff into a Twitter message. However, many of these code points are undefined, not yet allocated, or control characters. As of Unicode 5.1 there are 100507 graphic characters, reducing the number of encodable bits to:

{{{
#!latex
$n_{bits} = \mathrm{floor}\left(\dfrac{140 \log(100507)}{\log(2)}\right) = 2326$
}}}

We'll go on with this value of 2326 encodable bits.

== Bit allocation ==

A compressed image usually contains the following information:

 * The image geometry information (width and height)
 * Optional colour information (palette)
 * Elementary picture elements (encoded as pixels, triangles, vectors...)

Given the amount of compression we are doing, there is little point in compressing images larger than 512×512. This reduces image geometry information to 18 bits (9 each for width and height), leaving us with 2308 bits to encode the image information.

Whether to use a palette or to encode colour information into the picture elements is not yet decided. We'll cover both options.
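
As a quick check of the bit budget derived in the two sections above, the arithmetic can be reproduced in a few lines of Python. This is only a sketch: the function and constant names are illustrative, and the count of 100507 graphic characters is taken from the text rather than recomputed from the Unicode database.

```python
# Sizes quoted in the text above.
CODE_POINTS = 2**20 + 2**16      # U+0000..U+10FFFF
SURROGATES = 2**11               # high and low surrogates
NONCHARACTERS = 66
GRAPHIC_CHARS_5_1 = 100507       # Unicode 5.1 figure, as stated in the text


def encodable_bits(alphabet_size, message_length=140):
    """Whole bits that message_length symbols from the alphabet can carry.

    Uses exact integer arithmetic: floor(log2(alphabet_size ** length)).
    """
    return (alphabet_size ** message_length).bit_length() - 1


legal_code_points = CODE_POINTS - SURROGATES - NONCHARACTERS
theoretical_bits = encodable_bits(legal_code_points)
practical_bits = encodable_bits(GRAPHIC_CHARS_5_1)

# 9 bits each for width and height covers images up to 512x512.
geometry_bits = 2 * 9
payload_bits = practical_bits - geometry_bits

print(legal_code_points, theoretical_bits, practical_bits, payload_bits)
```

Working with the integer power instead of floating-point logarithms avoids any rounding doubt near a whole number of bits.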

== Strategy 1: colour information in picture elements ==

Each picture element will hold data for:

 * coordinates
 * colour information
 * additional control information

Coordinates could be absolute (requiring 16 or 14 bits, maybe 12) or relative. I would favour a coordinate system relative to predefined image cells because there is a good chance that each cell will hold a point. Assuming at least 8 horizontal and 8 vertical subdivisions, 6 bits (3 per axis) can be saved this way, bringing the coordinate allocation down to 10, 8 or 6 bits. We'll pick 8 to be safe for now: 16 X values and 16 Y values.

Using 7 bits for colour allows for the following options:

 * full bit range usage: 4 red values, 8 green values, 4 blue values (2 + 3 + 2 bits)
 * almost full bit range usage: 5 red values, 5 green values, 5 blue values (125 of the 128 available codes)

Finally, a weight value could be added, using a final bit.

The proposed allocation is then 16 bits per picture element (8 for coordinates, 7 for colour, 1 for weight). The 2308 available bits therefore hold 144 points, which can be stored in the following configurations:

 * 12×12
 * 10×14 (wasting 4 point slots)
 * 9×16
 * 8×18
 * 7×20 (wasting 4 point slots)
 * 6×24

== Strategy 2: colour information in a separate palette ==

''To do.''

== Image reconstruction ==

Image reconstruction is an interpolation problem on a Delaunay triangulation. We use natural neighbour coordinates to interpolate between nodes and obtain a first-order smooth image.
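
To make the natural neighbour idea more concrete, here is a minimal sketch of a brute-force raster approximation of it. It deliberately swaps the exact Delaunay/Voronoi construction mentioned above for a simpler discrete scheme: each pixel hands the value of its nearest node to every pixel that lies at least as close to it as that node does, and each pixel then averages what it received. The function name, the `(x, y, value)` node format and the single grey channel are assumptions made for illustration, not a description of the actual implementation.

```python
import math


def natural_neighbour_raster(nodes, width, height):
    """Discrete approximation of natural neighbour interpolation.

    nodes is a list of (x, y, value) tuples with integer pixel coordinates
    (a hypothetical format chosen for this sketch). Returns rows of floats.
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for py in range(height):
        for px in range(width):
            # Squared distance to, and value of, the node nearest to pixel p.
            d2, value = min(((px - x) ** 2 + (py - y) ** 2, v)
                            for x, y, v in nodes)
            r = math.isqrt(d2)
            # Every pixel q within that distance of p would capture p if q
            # were inserted as a new node, so p contributes its value to q.
            for qy in range(max(0, py - r), min(height, py + r + 1)):
                for qx in range(max(0, px - r), min(width, px + r + 1)):
                    if (qx - px) ** 2 + (qy - py) ** 2 <= d2:
                        total[qy][qx] += value
                        count[qy][qx] += 1
    # count is never zero: each pixel always contributes to itself.
    return [[t / c for t, c in zip(trow, crow)]
            for trow, crow in zip(total, count)]


# Two nodes on an 8x1 strip: black at the left end, white at the right end.
strip = natural_neighbour_raster([(0, 0, 0.0), (7, 0, 1.0)], 8, 1)
```

The cost grows quickly with image size, so this is only suitable for experimenting on small rasters; colour images would need one pass per channel or tuple-valued accumulators.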