img2twit – Caca Labs

Version 5 (modified by Sam Hocevar, 17 years ago)

A few notes and thoughts about compressing images to 140 characters for use on Twitter.

I first read about this "competition" here.

Bit availability

Twitter allows 140 characters per message, and the message may use UTF-8.

RFC 3629 restricts UTF-8 to the formal Unicode definition, so the only legal UTF-8 characters range from U+0000 to U+10FFFF. The following code points must also be excluded:

The 2¹¹ high and low surrogates (U+D800..U+DFFF), used for UTF-16 encoding, which restricts the Unicode range to U+0000..U+D7FF and U+E000..U+10FFFF.
The 66 non-characters (U+FDD0..U+FDEF and the last two code points of each of the 17 planes).

The final size of this set is:

$(2^{20} + 2^{16}) - 2^{11} - 66 = 1111998$

The number of bits that can be encoded using 140 such characters is computed as follows:

$n_{bits} = \mathrm{floor}\left(\dfrac{140 \log(1111998)}{\log(2)}\right) = 2811$
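The two figures above can be checked with a few lines of Python. This is only a sketch: the constant and function names are made up for illustration, and the capacity is computed with exact integer arithmetic (the bit length of the alphabet size raised to the message length) rather than with floating-point logarithms, which gives the same floor value without rounding concerns.

```python
# Illustrative names; none of them come from the original notes.
CODE_SPACE = 0x110000    # U+0000..U+10FFFF, i.e. 2**20 + 2**16 code points
SURROGATES = 2 ** 11     # U+D800..U+DFFF
NONCHARACTERS = 66
TWEET_LENGTH = 140


def usable_code_points():
    """Size of the legal character set after the two exclusions."""
    return CODE_SPACE - SURROGATES - NONCHARACTERS


def capacity_bits(alphabet_size, length=TWEET_LENGTH):
    """Largest n such that 2**n <= alphabet_size**length.

    Equivalent to floor(length * log2(alphabet_size)), computed exactly.
    """
    return (alphabet_size ** length).bit_length() - 1


if __name__ == "__main__":
    print(usable_code_points())                  # 1111998
    print(capacity_bits(usable_code_points()))   # 2811
```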

In theory, 2811 bits is therefore the maximum we can stuff into a Twitter message. However, a lot of these code points are undefined, not yet allocated, or control characters. As of Unicode 5.1 there are 100507 graphic characters, reducing the number of encodable bits to:

$n_{bits} = \mathrm{floor}\left(\dfrac{140 \log(100507)}{\log(2)}\right) = 2326$

We'll go on with this value of 2326 encodable bits.
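One way to actually reach that bit count is to treat the whole payload as a single large integer and write it as a 140-digit number whose base is the alphabet size. The sketch below assumes this approach; the notes do not say how the bits are mapped to characters. The `encode` and `decode` names are hypothetical, and the alphabet is whatever list of usable characters the caller supplies.

```python
def encode(payload, alphabet, length=140):
    """Write a non-negative integer as `length` characters of `alphabet`.

    The least significant digit comes first. Raises ValueError if the
    payload needs more than `length` digits.
    """
    base = len(alphabet)
    if not 0 <= payload < base ** length:
        raise ValueError("payload does not fit in the message")
    chars = []
    for _ in range(length):
        payload, digit = divmod(payload, base)
        chars.append(alphabet[digit])
    return "".join(chars)


def decode(message, alphabet):
    """Inverse of encode(): recover the integer from the characters."""
    base = len(alphabet)
    index = {ch: i for i, ch in enumerate(alphabet)}
    value = 0
    for ch in reversed(message):
        value = value * base + index[ch]
    return value


if __name__ == "__main__":
    # Stand-in alphabet (assumption): the 20902 CJK ideographs
    # U+4E00..U+9FA5, not the full set of 100507 graphic characters.
    alphabet = "".join(chr(c) for c in range(0x4E00, 0x9FA6))
    bits = (1 << 1999) | 12345
    message = encode(bits, alphabet)
    assert len(message) == 140 and decode(message, alphabet) == bits
```

With an alphabet of 100507 characters, any payload below 2^2326 passes the range check, which matches the figure computed above.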

Bit allocation

A compressed image usually contains the following information:

The image geometry information (width and height)
Optional colour information (palette)
Elementary picture elements (encoded as pixels, triangles, vectors...)

Given the amount of compression we are doing, there is little point in compressing images larger than 512×512. This reduces image geometry information to 18 bits (9 bits each for width and height), leaving us with 2308 bits to encode the image information.
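As a sketch of what an 18-bit geometry header could look like, the snippet below stores each dimension in 9 bits. Storing `width - 1` and `height - 1`, so that 512 itself fits, is an assumption made here and not something the notes specify. The function names are hypothetical.

```python
DIMENSION_BITS = 9                      # 2**9 = 512 possible values
DIMENSION_MASK = (1 << DIMENSION_BITS) - 1


def pack_geometry(width, height):
    """Pack a width and height in 1..512 into an 18-bit integer."""
    if not (1 <= width <= 512 and 1 <= height <= 512):
        raise ValueError("dimensions must be in 1..512")
    # Assumption: dimensions are stored minus one so that 512 fits in 9 bits.
    return ((width - 1) << DIMENSION_BITS) | (height - 1)


def unpack_geometry(header):
    """Recover (width, height) from an 18-bit header."""
    return (header >> DIMENSION_BITS) + 1, (header & DIMENSION_MASK) + 1


if __name__ == "__main__":
    header = pack_geometry(320, 200)
    assert header < 1 << 18
    assert unpack_geometry(header) == (320, 200)
    print(2326 - 2 * DIMENSION_BITS)    # bits left for the image: 2308
```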

Whether to use a palette or to encode colour information into the picture elements is not yet decided. We'll cover both options.

Strategy 1: colour information in picture elements

Each picture element will hold data for:

coordinates
colour information
additional control information

Coordinates could be absolute (therefore requiring 16 or 14 bits, maybe 12) or relative. I would favour a coordinate system relative to predefined image cells, because there is a good chance that each cell will hold a point. Assuming at least 8 horizontal and 8 vertical subdivisions (64 cells), 6 bits can be gained this way, bringing the coordinate allocation down to 10, 8 or 6 bits. We'll pick 8 to be safe for now: 16 X values and 16 Y values.
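The cell-relative scheme can be sketched as follows, assuming that points are stored in cell order (so the cell index is implicit) and that each point is quantised to a 16×16 grid inside its cell. The helper names and the choice to decode to the centre of each quantisation bin are illustrative assumptions.

```python
STEPS = 16   # 16 X values and 16 Y values per cell: 4 + 4 = 8 bits


def _quantise(offset, cell_size):
    """Map an offset in [0, cell_size) to an integer step in 0..15."""
    step = int(offset * STEPS / cell_size)
    return min(STEPS - 1, max(0, step))   # clamp points on the cell border


def encode_in_cell(x, y, cell_x, cell_y, cell_w, cell_h):
    """8-bit code for point (x, y) inside the cell whose top-left corner
    is (cell_x, cell_y) and whose size is cell_w by cell_h."""
    qx = _quantise(x - cell_x, cell_w)
    qy = _quantise(y - cell_y, cell_h)
    return (qx << 4) | qy


def decode_in_cell(code, cell_x, cell_y, cell_w, cell_h):
    """Approximate position: the centre of the quantisation bin."""
    qx, qy = code >> 4, code & 0xF
    return (cell_x + (qx + 0.5) * cell_w / STEPS,
            cell_y + (qy + 0.5) * cell_h / STEPS)


if __name__ == "__main__":
    # A 512x512 image split into 8x8 cells gives 64x64-pixel cells.
    code = encode_in_cell(70, 10, 64, 0, 64, 64)
    print(code, decode_in_cell(code, 64, 0, 64, 64))
```

With 64-pixel cells, each quantisation step covers 4 pixels, so the decoded position is within 2 pixels of the original on each axis.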

Using 7 bits for the colour of each element (128 possible codes) allows for the following options:

full bit range usage: 4 red values, 8 green values, 4 blue values
almost full bit range usage: 5 red values, 5 green values, 5 blue values
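Both options can be sketched as packing functions from 8-bit RGB channels to a 7-bit code. The bit layout (red in the high bits, blue in the low bits) and the uniform quantisation are assumptions chosen for illustration; the notes only give the number of levels per channel.

```python
def pack_484(r, g, b):
    """4 red, 8 green and 4 blue levels: 4 * 8 * 4 = 128 codes, all used."""
    # Keep the top 2, 3 and 2 bits of the 8-bit channels respectively.
    return ((r >> 6) << 5) | ((g >> 5) << 2) | (b >> 6)


def pack_555(r, g, b):
    """5 levels per channel: 5 * 5 * 5 = 125 codes, 3 of the 128 unused."""
    # Map each 0..255 channel to a level in 0..4, then combine in base 5.
    r5, g5, b5 = (c * 5 // 256 for c in (r, g, b))
    return (r5 * 5 + g5) * 5 + b5


if __name__ == "__main__":
    for rgb in [(0, 0, 0), (255, 128, 0), (255, 255, 255)]:
        print(rgb, pack_484(*rgb), pack_555(*rgb))
```

The first layout wastes no codes but treats the channels unevenly; the second gives every channel the same resolution at the cost of three unused codes.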

Finally, a weight value could be added, using a final bit.

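The proposed allocation is then 16 bits per picture element (8 for coordinates, 7 for colour, 1 for weight). With 2308 bits available, this allows 144 points to be stored in the following configurations: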

12×12
10×14 (losing 4 points)
9×16
8×18
7×20 (losing 4 points)
6×24
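The point budget and the grid configurations can be recomputed with a short script. This is a sketch under the figures given above (2326 total bits, 18 geometry bits, 8 + 7 + 1 bits per point); the function name and the `min_side` cut-off are hypothetical.

```python
from math import isqrt

TOTAL_BITS = 2326
GEOMETRY_BITS = 18
POINT_BITS = 8 + 7 + 1      # coordinates + colour + weight

MAX_POINTS = (TOTAL_BITS - GEOMETRY_BITS) // POINT_BITS


def grid_options(points, min_side=6):
    """List (short side, long side, points lost) for each grid whose
    short side is at least min_side and that holds at most `points`."""
    options = []
    for short in range(min_side, isqrt(points) + 1):
        long_side = points // short
        options.append((short, long_side, points - short * long_side))
    return options


if __name__ == "__main__":
    print(MAX_POINTS)
    for short, long_side, lost in grid_options(MAX_POINTS):
        print(f"{short}x{long_side} (losing {lost} points)")
```

Besides the configurations listed above, this enumeration also yields 11×13, which loses a single point.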

Not storing a palette

To do.
