A few notes and thoughts about compressing images to 140 characters for use on Twitter.
The first I read about this "competition" was here.
Bit availability
Twitter allows for 140 characters in a message. UTF-8 is allowed.
UTF-8 is restricted to the formal Unicode definition by RFC 3629. It means that the only legal UTF-8 characters range from U+0000 to U+10FFFF. The following restrictions must also be added:
- The 2¹¹ high and low surrogates, used for UTF-16 encoding, are excluded, which restricts the Unicode range to U+0000..U+D7FF and U+E000..U+10FFFF.
- The 66 non-characters.
The final size of this set is:
$$1114112 - 2^{11} - 66 = 1111998$$
The number of bits that can be encoded using 140 such characters is computed as follows:
$$\lfloor 140 \times \log_2 1111998 \rfloor = \lfloor 2811.86 \rfloor = 2811$$
In theory, 2811 bits is therefore the maximum we can stuff into a Twitter message. However, a lot of these characters are undefined, not yet allocated or are control characters. As of Unicode 5.1 there are 100507 graphic characters, reducing the number of expressed bits to:
$$\lfloor 140 \times \log_2 100507 \rfloor = \lfloor 2326.37 \rfloor = 2326$$
We'll go on with this value of 2326 encodable bits.
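The two bit counts quoted above can be checked with a few lines of Python. This is only a verification sketch: the constant and function names are mine, and the figure of 100507 graphic characters is taken from the text rather than recomputed from the Unicode database.

```python
# Constants taken from the text above; the names are illustrative only.
UNICODE_CODE_POINTS = 0x110000   # U+0000..U+10FFFF
SURROGATES = 2 ** 11             # U+D800..U+DFFF
NONCHARACTERS = 66
GRAPHIC_CHARS_5_1 = 100507       # Unicode 5.1 figure quoted in the text
TWEET_LENGTH = 140


def capacity_bits(alphabet_size, length=TWEET_LENGTH):
    """Whole bits representable by `length` symbols of the given alphabet.

    Uses exact integer arithmetic: the largest b with
    2**b <= alphabet_size**length.
    """
    return (alphabet_size ** length).bit_length() - 1


legal_chars = UNICODE_CODE_POINTS - SURROGATES - NONCHARACTERS

print(legal_chars)                       # size of the legal character set
print(capacity_bits(legal_chars))        # theoretical maximum
print(capacity_bits(GRAPHIC_CHARS_5_1))  # graphic characters only
```

Working with the integer power instead of a floating-point logarithm avoids any rounding doubt when the product lands close to a whole number of bits.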
Bit allocation
A compressed image usually contains the following information:
- The image geometry information (width and height)
- Optional colour information (palette)
- Elementary picture elements (encoded as pixels, triangles, vectors...)
Given the amount of compression we are doing, there is little point in compressing images larger than 512×512. This reduces image geometry information to 18 bits (9 bits each for width and height), leaving us with 2308 bits to encode the image information.
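One possible layout for the 18-bit geometry header is sketched below. The text does not say how the two dimensions are stored, so storing `width - 1` and `height - 1` in 9 bits each (so that 512 still fits) is an assumption, as are the function names.

```python
TOTAL_BITS = 2326      # encodable bits, from the section above
GEOMETRY_BITS = 18     # 9 bits per dimension
IMAGE_BITS = TOTAL_BITS - GEOMETRY_BITS


def pack_geometry(width, height):
    """Pack width and height (1..512 each) into one 18-bit integer."""
    if not (1 <= width <= 512 and 1 <= height <= 512):
        raise ValueError("dimensions must be in 1..512")
    # Assumption: store each dimension minus one, so 512 fits in 9 bits.
    return ((width - 1) << 9) | (height - 1)


def unpack_geometry(value):
    """Inverse of pack_geometry."""
    return (value >> 9) + 1, (value & 0x1FF) + 1


print(IMAGE_BITS)
print(unpack_geometry(pack_geometry(512, 384)))
```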
Whether to use a palette or to encode colour information into the picture elements has not been decided yet. We'll cover both options.
Strategy 1: colour information in picture elements
Each picture element will hold data for:
- coordinates
- colour information
- additional control information
Coordinates could be absolute (therefore requiring 16 or 14 bits, maybe 12) or relative. I would favour a coordinate system relative to predefined image cells because there is a good chance that each cell will hold a point. Assuming at least 8 horizontal and vertical subdivisions, 6 bits can be gained this way. The final coordinate bit allocation is now 10, 8 or 6. We'll pick 8 to be safe for now: 16 X values and 16 Y values.
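The cell-relative scheme can be sketched as follows. Everything here is an assumption made for illustration: one point per cell, the cell identified by the point's position in the stream (so it costs no bits), and the 8-bit offset split into 4 bits for X and 4 bits for Y inside the cell. The function names are hypothetical.

```python
LEVELS = 16  # 4 bits per axis inside a cell


def encode_point(x, y, width, height, cols, rows):
    """Return (cell_index, 8-bit offset) for pixel position (x, y)."""
    cell_w = width / cols
    cell_h = height / rows
    col = min(int(x / cell_w), cols - 1)
    row = min(int(y / cell_h), rows - 1)
    # Position inside the cell, quantised to 16 levels per axis.
    qx = min(int((x - col * cell_w) / cell_w * LEVELS), LEVELS - 1)
    qy = min(int((y - row * cell_h) / cell_h * LEVELS), LEVELS - 1)
    return row * cols + col, (qx << 4) | qy


def decode_point(cell_index, offset, width, height, cols, rows):
    """Return the approximate (x, y) for a cell index and 8-bit offset."""
    cell_w = width / cols
    cell_h = height / rows
    row, col = divmod(cell_index, cols)
    qx, qy = offset >> 4, offset & 0xF
    # Reconstruct at the centre of the quantisation bucket.
    x = (col + (qx + 0.5) / LEVELS) * cell_w
    y = (row + (qy + 0.5) / LEVELS) * cell_h
    return x, y


cell, offset = encode_point(100, 300, 512, 512, 12, 12)
print(cell, offset, decode_point(cell, offset, 512, 512, 12, 12))
```

With a 512×512 image split into 12×12 cells, each cell is about 42.7 pixels wide, so a 4-bit offset places a point to within roughly 1.3 pixels on each axis.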
Using 7 bits per colour allows for the following options:
- full bit range usage: 4 red values, 8 green values, 4 blue values
- almost full bit range usage: 5 red values, 5 green values, 5 blue values
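Both colour options fit in 7 bits, as the sketch below shows. The 2-3-2 bit split and the base-5 index are straightforward readings of the two options, but the function names, the bit order and the rounding used when expanding back to 8-bit channels are my own choices.

```python
def pack_rgb_232(r, g, b):
    """8-bit channels -> 7 bits: 4 red, 8 green and 4 blue values."""
    return ((r >> 6) << 5) | ((g >> 5) << 2) | (b >> 6)


def pack_rgb_5_levels(r, g, b):
    """8-bit channels -> index in 0..124, using 5 values per channel."""
    def level(v):
        return v * 5 // 256  # 0..4

    return (level(r) * 5 + level(g)) * 5 + level(b)


def unpack_rgb_5_levels(index):
    """Index in 0..124 -> approximate 8-bit (r, g, b)."""
    rg, b = divmod(index, 5)
    r, g = divmod(rg, 5)
    # Spread the 5 levels evenly over 0..255.
    return tuple(lv * 255 // 4 for lv in (r, g, b))


print(pack_rgb_232(255, 128, 0))
print(pack_rgb_5_levels(255, 128, 0))
print(unpack_rgb_5_levels(pack_rgb_5_levels(255, 128, 0)))
```

The 5×5×5 option uses only 125 of the 128 available codes, which is why it is "almost" full range, but it treats the three channels evenly.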
Finally, a weight value could be added, using a final bit.
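The proposed allocation is then 16 bits per picture element (8 for coordinates, 7 for colour, 1 for weight), allowing 144 points to be stored in the 2308 available bits, in the following configurations: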
- 12×12
- 10×14 (losing 4 points)
- 9×16
- 8×18
- 7×20 (losing 4 points)
- 6×24
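The point budget and the grid configurations above can be re-derived with a short enumeration. This is a sketch under the allocation described in this section (8 + 7 + 1 bits per point, 2308 bits available); the function name and the `max_waste` and `min_side` parameters are hypothetical.

```python
import math

IMAGE_BITS = 2308
BITS_PER_POINT = 8 + 7 + 1   # coordinates + colour + weight

max_points = IMAGE_BITS // BITS_PER_POINT


def grid_layouts(points, max_waste=4, min_side=6):
    """List (cols, rows, unused_points) grids that fit in `points`."""
    layouts = []
    for cols in range(min_side, math.isqrt(points) + 1):
        rows = points // cols
        waste = points - cols * rows
        if waste <= max_waste:
            layouts.append((cols, rows, waste))
    return layouts


print(max_points)
for cols, rows, waste in grid_layouts(max_points):
    print(f"{cols}x{rows} (losing {waste} points)")
```

Besides the six configurations listed above, this enumeration also turns up 11×13, which loses a single point.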
Not storing a palette
To do.