Changes between Initial Version and Version 1 of img2twit


Timestamp:
05/22/2009 12:43:33 AM (16 years ago)
Author:
Sam Hocevar
Comment:

bit allocation discussion

A few notes and thoughts about compressing images to 140 characters for use on Twitter.

I first read about this "competition" [http://www.flickr.com/photos/quasimondo/3518306770/in/set-72057594062596732/ here].

== Bit allocation discussion ==

Twitter allows 140 characters per message, and UTF-8 is accepted.

RFC 3629 restricts UTF-8 to the formal Unicode definition, so the only legal UTF-8 code points range from U+0000 to U+10FFFF. Two further exclusions apply:
 * The high and low surrogates (U+D800..U+DFFF), reserved for UTF-16 encoding, which restricts the Unicode range to U+0000..U+D7FF and U+E000..U+10FFFF.
 * The 66 non-characters.

The final size of this set is:

{{{
#!latex
$(2^{20} + 2^{16}) - 2^{11} - 66 = 1111998$
}}}
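
The count can be cross-checked by brute force. The following is a minimal sketch, not part of the original notes: the function names are hypothetical, and the non-character ranges (U+FDD0..U+FDEF plus the last two code points of each of the 17 planes) are taken from the Unicode standard rather than from this page.

```python
MAX_CODE_POINT = 0x10FFFF  # upper bound of the Unicode code space (RFC 3629)


def is_surrogate(cp):
    """High and low surrogates, reserved for UTF-16: U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp):
    """The 66 non-characters: U+FDD0..U+FDEF (32 code points) plus
    U+nFFFE and U+nFFFF in each of the 17 planes (34 code points)."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def count_usable_code_points():
    """Count code points that are neither surrogates nor non-characters."""
    return sum(
        1
        for cp in range(MAX_CODE_POINT + 1)
        if not is_surrogate(cp) and not is_noncharacter(cp)
    )


print(count_usable_code_points())  # 1111998
```

The loop visits all 1114112 code points, so it runs in well under a second and agrees with the closed-form expression.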

The number of bits that can be encoded using 140 such characters is computed as follows:

{{{
#!latex
$n_{bits} = \left\lfloor \dfrac{140 \log(1111998)}{\log(2)} \right\rfloor = 2811$
}}}

In theory, 2811 bits is therefore the maximum we can stuff into a Twitter message. However, many of these code points are unassigned or are control characters. As of Unicode 5.1 there are 100507 graphic characters, which reduces the number of encodable bits to:

{{{
#!latex
$n_{bits} = \left\lfloor \dfrac{140 \log(100507)}{\log(2)} \right\rfloor = 2326$
}}}
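
Both bit counts can be reproduced with exact integer arithmetic. This is a small sketch under stated assumptions: `encodable_bits` is a hypothetical helper name, and it relies on the identity floor(log2(N^k)) = bit_length(N^k) - 1, which avoids any floating-point rounding concerns.

```python
def encodable_bits(alphabet_size, message_length=140):
    """Largest n such that 2**n <= alphabet_size ** message_length.

    Equivalent to floor(message_length * log2(alphabet_size)), computed
    with Python's arbitrary-precision integers instead of floats.
    """
    combinations = alphabet_size ** message_length
    return combinations.bit_length() - 1


print(encodable_bits(1111998))  # 2811, all legal code points
print(encodable_bits(100507))   # 2326, Unicode 5.1 graphic characters only
```

Restricting the alphabet to graphic characters costs 485 bits out of the theoretical 2811.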

We'll go on with this value of 2326 encodable bits.
