A few notes and thoughts about compressing images to 140 characters for use on Twitter.
The first I read about this "competition" was here.
Bit allocation discussion
Twitter allows 140 characters per message, and UTF-8 is allowed, so each of those characters can be any valid Unicode character, not just ASCII.
RFC 3629 restricts UTF-8 to the formal Unicode definition, which means that the only legal UTF-8 characters range from U+0000 to U+10FFFF. The following restrictions must also be applied:
- The high and low surrogates, used for UTF-16 encoding, restricting the Unicode range to U+0000..U+D7FF and U+E000..U+10FFFF.
- The 66 non-characters.
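The two exclusions above can be checked mechanically. Below is a minimal Python sketch, assuming exactly those two exclusions: the surrogates U+D800..U+DFFF, and the 66 non-characters (U+FDD0..U+FDEF plus the last two code points, U+xxFFFE and U+xxFFFF, of each of the 17 planes). The helper names are illustrative only.

```python
# Count the code points that remain after applying the RFC 3629 range
# (U+0000..U+10FFFF) and removing surrogates and non-characters.


def is_surrogate(cp: int) -> bool:
    # High and low surrogates occupy U+D800..U+DFFF (2048 code points).
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    # 32 non-characters in U+FDD0..U+FDEF, plus U+xxFFFE and U+xxFFFF
    # in each of the 17 planes (34 more), for 66 in total.
    # (cp & 0xFFFE) == 0xFFFE holds exactly when the low 16 bits are FFFE or FFFF.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def usable_code_points() -> int:
    return sum(
        1
        for cp in range(0x110000)
        if not is_surrogate(cp) and not is_noncharacter(cp)
    )


if __name__ == "__main__":
    print(usable_code_points())  # prints 1111998
```

The loop is deliberately brute force: it walks all 1,114,112 code points once, so the count does not depend on getting the closed-form subtraction right.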
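The final size of this set is:

$$\underbrace{1\,114\,112}_{\text{U+0000..U+10FFFF}} - \underbrace{2\,048}_{\text{surrogates}} - \underbrace{66}_{\text{non-characters}} = 1\,111\,998$$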
The number of bits that can be encoded using 140 such characters is computed as follows:

$$\left\lfloor 140 \log_2 1\,111\,998 \right\rfloor = \lfloor 2811.86\ldots \rfloor = 2811$$
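The same computation can be done in Python with exact integer arithmetic instead of a floating-point logarithm: the largest whole number of bits b with 2^b ≤ N^140 is one less than the bit length of N^140. This is a sketch; the constant and function names are illustrative.

```python
# Usable code points: U+0000..U+10FFFF minus 2048 surrogates and 66 non-characters.
ALPHABET_SIZE = 0x110000 - 2048 - 66  # 1111998
MESSAGE_LENGTH = 140                  # characters per tweet


def capacity_bits(alphabet_size: int, length: int) -> int:
    """Return floor(length * log2(alphabet_size)), computed exactly.

    alphabet_size ** length has bit length k exactly when
    2**(k - 1) <= alphabet_size ** length < 2**k, so the whole number
    of bits that fit is k - 1.
    """
    return (alphabet_size ** length).bit_length() - 1


if __name__ == "__main__":
    print(capacity_bits(ALPHABET_SIZE, MESSAGE_LENGTH))  # prints 2811
```

Python integers are arbitrary precision, so raising a seven-digit number to the 140th power is cheap and avoids any rounding question near an integer boundary.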
In theory, 2811 bits is therefore the maximum we can stuff into a Twitter message. However, many of these code points are undefined (not yet allocated) or are control characters. As of Unicode 5.1 there are 100507 graphic characters, reducing the number of expressible bits to:

$$\left\lfloor 140 \log_2 100\,507 \right\rfloor = \lfloor 2326.37\ldots \rfloor = 2326$$
We'll go on with this value of 2326 encodable bits.
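To make "2326 encodable bits" concrete, here is a hedged Python sketch that treats a 2326-bit payload as one large integer and rewrites it as 140 base-100507 digits. Each digit would then index a table of the 100507 graphic characters; that table is not given here, so the sketch stops at digit indices. `pack` and `unpack` are hypothetical helper names, and this plain base conversion is shown for illustration only, not as the method any particular entry uses.

```python
import random

GRAPHIC_CHARS = 100507  # graphic characters as of Unicode 5.1 (from the text)
MESSAGE_LENGTH = 140    # characters per tweet

# Exact capacity: largest b with 2**b <= GRAPHIC_CHARS**MESSAGE_LENGTH.
CAPACITY_BITS = (GRAPHIC_CHARS ** MESSAGE_LENGTH).bit_length() - 1


def pack(value: int, base: int = GRAPHIC_CHARS, length: int = MESSAGE_LENGTH) -> list[int]:
    """Write `value` as exactly `length` base-`base` digits, most significant first.

    Hypothetical helper: each digit is meant to index a table of usable
    characters, which this sketch does not build.
    """
    if not 0 <= value < base ** length:
        raise ValueError("value does not fit in the message")
    digits = []
    for _ in range(length):
        value, digit = divmod(value, base)
        digits.append(digit)
    digits.reverse()
    return digits


def unpack(digits: list[int], base: int = GRAPHIC_CHARS) -> int:
    """Inverse of pack(): read the digits back as one integer."""
    value = 0
    for digit in digits:
        value = value * base + digit
    return value


if __name__ == "__main__":
    print(CAPACITY_BITS)  # prints 2326
    payload = random.getrandbits(CAPACITY_BITS)  # stand-in for compressed image bits
    digits = pack(payload)
    assert len(digits) == MESSAGE_LENGTH and unpack(digits) == payload
```

Any payload of at most 2326 bits fits, because 2^2326 ≤ 100507^140; the fractional 0.37 bit left over is simply unused by this scheme.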