Version 10 (modified by Sam Hocevar, 16 years ago)

new results

A few notes and thoughts about compressing images to 140 characters for use on Twitter.

I first read about this "competition" here.

Discussion

Bit availability

Twitter allows 140 characters per message, and UTF-8 text is accepted.

UTF-8 is restricted to the formal Unicode definition by RFC 3629. This means that the only legal UTF-8 characters are those in the range U+0000 to U+10FFFF. The following exclusions also apply:

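  • The 2¹¹ high and low surrogates (U+D800..U+DFFF), reserved for UTF-16 encoding. Excluding them restricts the Unicode range to U+0000..U+D7FF and U+E000..U+10FFFF.
  • The 66 non-characters (U+FDD0..U+FDEF, plus the last two code points U+xxFFFE and U+xxFFFF of each of the 17 planes).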

The final size of this set is:

$(2^{20} + 2^{16}) - 2^{11} - 66 = 1111998$
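As a sanity check, the set size can be recomputed by brute force over every code point. This is a small Python sketch; the helper names are made up for illustration. It treats as non-characters the 32 code points U+FDD0..U+FDEF plus the last two code points of each of the 17 planes.

```python
def is_surrogate(cp: int) -> bool:
    """High and low surrogates, U+D800..U+DFFF (2**11 code points)."""
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    """The 66 Unicode non-characters."""
    # U+FDD0..U+FDEF: 32 code points.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # U+xxFFFE and U+xxFFFF in each of the 17 planes: 34 code points.
    return (cp & 0xFFFE) == 0xFFFE


def alphabet_size() -> int:
    """Count code points in U+0000..U+10FFFF that are neither excluded kind."""
    return sum(
        1
        for cp in range(0x110000)
        if not is_surrogate(cp) and not is_noncharacter(cp)
    )


print(alphabet_size())  # 1111998
```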

The number of bits that can be encoded using 140 such characters is computed as follows:

$n_{bits} = \mathrm{floor}\left(\dfrac{140 \log(1111998)}{\log(2)}\right) = 2811$

In theory, then, 2811 bits is the maximum we can stuff into a Twitter message. However, many of these code points are undefined, not yet allocated, or control characters. As of Unicode 5.1 there are 100507 graphic characters, which reduces the number of encodable bits to:

$n_{bits} = \mathrm{floor}\left(\dfrac{140 \log(100507)}{\log(2)}\right) = 2326$
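Both values can be checked without floating-point logarithms. 140 characters drawn from an alphabet of n symbols give n¹⁴⁰ distinct messages, and the largest k with 2ᵏ ≤ n¹⁴⁰ is one less than the bit length of that integer. A minimal Python sketch (`max_bits` is a hypothetical helper name):

```python
def max_bits(alphabet: int, length: int = 140) -> int:
    """floor(length * log2(alphabet)), computed exactly with integers.

    alphabet ** length distinct messages exist, and for a positive integer M,
    floor(log2(M)) == M.bit_length() - 1, so no rounding error can creep in.
    """
    return (alphabet ** length).bit_length() - 1


print(max_bits(1111998))  # 2811
print(max_bits(100507))   # 2326
```

Python's arbitrary-precision integers make the exact route cheap here: 1111998¹⁴⁰ is only about 2800 bits long.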

We'll go on with this value of 2326 encodable bits.

Bit allocation

A compressed image usually contains the following information:

  • The image geometry information (width and height)
  • Optional colour information (palette)
  • Elementary picture elements (encoded as pixels, triangles, vectors...)

Given the amount of compression we are doing, there is little point in compressing images larger than 512×512. Each dimension then fits in 9 bits (2⁹ = 512), so the image geometry takes 18 bits, leaving 2308 bits to encode the image itself.
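One possible 18-bit geometry header, as a Python sketch. The layout (width first, storing size minus one so that 1..512 fits in 0..511) is an assumption for illustration, not img2twit's actual format.

```python
# Hypothetical header layout: 9 bits for (width - 1), then 9 bits for (height - 1).
DIM_BITS = 9                              # 2**9 = 512 possible sizes per dimension
DIM_MASK = (1 << DIM_BITS) - 1
TOTAL_BITS = 2326                         # encodable bits chosen above
PAYLOAD_BITS = TOTAL_BITS - 2 * DIM_BITS  # 2308 bits left for the image


def encode_geometry(width: int, height: int) -> int:
    """Pack a size up to 512x512 into an 18-bit integer."""
    if not (1 <= width <= 512 and 1 <= height <= 512):
        raise ValueError("image must be at most 512x512")
    # Storing size - 1 maps 1..512 onto 0..511, which fits in 9 bits.
    return ((width - 1) << DIM_BITS) | (height - 1)


def decode_geometry(header: int) -> tuple[int, int]:
    """Inverse of encode_geometry."""
    return ((header >> DIM_BITS) & DIM_MASK) + 1, (header & DIM_MASK) + 1


print(decode_geometry(encode_geometry(512, 300)))  # (512, 300)
```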

Whether to use a palette or to encode colour information in the picture elements is not decided yet. We'll cover both options.

Strategy 1: colour information in picture elements

Each picture element will hold data for:

  • coordinates
  • colour information
  • additional control information

Coordinates could be absolute (requiring 16 or 14 bits, maybe 12) or relative. I would favour coordinates relative to predefined image cells, because there is a good chance that each cell will hold a point. Assuming at least 8 horizontal and 8 vertical subdivisions, this saves 6 bits (3 per axis). The coordinate bit allocation then drops to 10, 8 or 6. We'll pick 8 to be safe for now: 16 X values and 16 Y values.
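A Python sketch of the 8-bit cell-relative coordinate, assuming one point per cell stored in cell order, so the cell index itself costs no bits. The function names and the choice to rebuild points at the centre of their quantisation bin are assumptions for illustration.

```python
SUB = 16  # 16 X values and 16 Y values per cell: 4 + 4 = 8 bits


def encode_offset(x: float, y: float, col: int, row: int,
                  cell_w: float, cell_h: float) -> int:
    """Quantise a point's position inside cell (col, row) to an 8-bit code."""
    # Fractional position inside the cell, nominally in [0, 1).
    fx = (x - col * cell_w) / cell_w
    fy = (y - row * cell_h) / cell_h
    # Clamp so rounding noise at cell borders cannot leave the 4-bit range.
    qx = min(SUB - 1, max(0, int(fx * SUB)))
    qy = min(SUB - 1, max(0, int(fy * SUB)))
    return (qx << 4) | qy


def decode_offset(code: int, col: int, row: int,
                  cell_w: float, cell_h: float) -> tuple[float, float]:
    """Rebuild an absolute position at the centre of the quantisation bin."""
    qx, qy = code >> 4, code & 0xF
    return (col + (qx + 0.5) / SUB) * cell_w, (row + (qy + 0.5) / SUB) * cell_h


# A 512x512 image split into a 12x12 grid of cells.
cell = 512 / 12
code = encode_offset(100.0, 203.0, 2, 4, cell, cell)
print(code, decode_offset(code, 2, 4, cell, cell))
```

With this scheme the positional error is at most half a bin, i.e. 1/32 of the cell size per axis.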

Using 7 bits for the colour allows the following options:

  • full bit range usage: 4 red values, 8 green values, 4 blue values (4×8×4 = 128 colours)
  • almost full bit range usage: 5 red values, 5 green values, 5 blue values (5×5×5 = 125 of the 128 codes)
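Both 7-bit colour options can be handled by the same mixed-radix quantiser. This Python sketch rounds each 8-bit channel to the nearest of n evenly spaced levels and packs red as the most significant digit; the level spacing and digit order are assumptions, not taken from img2twit.

```python
LEVELS_484 = (4, 8, 4)  # 4 * 8 * 4 = 128 codes: the full 7-bit range
LEVELS_555 = (5, 5, 5)  # 5 * 5 * 5 = 125 codes: 3 of the 128 left unused


def quantise(rgb: tuple[int, int, int], levels: tuple[int, int, int]) -> int:
    """Map 8-bit (r, g, b) to a single code, red most significant."""
    code = 0
    for value, n in zip(rgb, levels):
        q = (value * (n - 1) + 127) // 255  # nearest of n evenly spaced levels
        code = code * n + q
    return code


def dequantise(code: int, levels: tuple[int, int, int]) -> tuple[int, ...]:
    """Inverse mapping back to 8-bit (r, g, b); blue is extracted first."""
    channels = []
    for n in reversed(levels):
        code, q = divmod(code, n)
        channels.append(q * 255 // (n - 1))
    return tuple(reversed(channels))


print(quantise((255, 255, 255), LEVELS_484))  # 127, the largest 7-bit code
print(quantise((255, 255, 255), LEVELS_555))  # 124
```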

Finally, a one-bit weight value could be added.
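Coordinates (8 bits), colour (7 bits) and weight (1 bit) fill exactly one 16-bit word per point. A Python sketch with a hypothetical field order (coordinate in the high byte, weight in the lowest bit):

```python
COORD_BITS, COLOUR_BITS, WEIGHT_BITS = 8, 7, 1
POINT_BITS = COORD_BITS + COLOUR_BITS + WEIGHT_BITS  # 16


def pack_point(coord: int, colour: int, weight: int) -> int:
    """Hypothetical layout: coordinate in the top 8 bits, then colour, then weight."""
    if not (0 <= coord < 1 << COORD_BITS and 0 <= colour < 1 << COLOUR_BITS
            and weight in (0, 1)):
        raise ValueError("field out of range")
    return (coord << (COLOUR_BITS + WEIGHT_BITS)) | (colour << WEIGHT_BITS) | weight


def unpack_point(word: int) -> tuple[int, int, int]:
    """Split a 16-bit record back into (coord, colour, weight)."""
    weight = word & ((1 << WEIGHT_BITS) - 1)
    colour = (word >> WEIGHT_BITS) & ((1 << COLOUR_BITS) - 1)
    coord = word >> (COLOUR_BITS + WEIGHT_BITS)
    return coord, colour, weight
```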

The proposed allocation is then 16 bits per point (8 for coordinates, 7 for colour, 1 for the weight). Since 2308 / 16 = 144.25, this allows 144 points to be stored in the following configurations:

  • 12×12
  • 10×14 (wasting 4 point slots)
  • 9×16
  • 8×18
  • 7×20 (wasting 4 point slots)
  • 6×24
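The point budget and the slots wasted by each grid layout follow from simple integer arithmetic. A Python sketch, assuming one point per grid cell (`wasted_slots` is a hypothetical helper):

```python
PAYLOAD_BITS = 2308   # 2326 encodable bits minus 18 bits of geometry
POINT_BITS = 16       # 8 coordinate + 7 colour + 1 weight

capacity, spare_bits = divmod(PAYLOAD_BITS, POINT_BITS)  # 144 points, 4 spare bits

# Grid layouts from the list above, as (columns, rows).
LAYOUTS = [(12, 12), (10, 14), (9, 16), (8, 18), (7, 20), (6, 24)]


def wasted_slots(cols: int, rows: int) -> int:
    """Point slots left unused when each of the cols x rows cells holds one point."""
    if cols * rows > capacity:
        raise ValueError("layout needs more points than fit in the message")
    return capacity - cols * rows


for cols, rows in LAYOUTS:
    print(f"{cols}x{rows}: {wasted_slots(cols, rows)} slot(s) wasted")
```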

Strategy 2: colour information in a separate palette

To do.

Image reconstruction

Image reconstruction is an interpolation problem on a Delaunay triangulation. We use natural neighbour coordinates to interpolate between nodes, which yields a first-order smooth image.
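The page does not say how the natural neighbour coordinates are computed. On a pixel grid, one simple way to approximate the same Sibson interpolant, without building the Delaunay triangulation or Voronoi diagram explicitly, is the discrete Sibson interpolation scheme. Each pixel q finds its nearest node and the distance r to it, then adds that node's value to every pixel within distance r of q; each pixel's result is the average of what it received. This is a swapped-in technique written as a slow, single-channel Python sketch (`discrete_sibson` is a hypothetical name), not img2twit's own reconstruction code.

```python
import math


def discrete_sibson(width: int, height: int,
                    nodes: list[tuple[int, int, float]]) -> list[list[float]]:
    """Approximate Sibson (natural neighbour) interpolation on a pixel grid.

    nodes holds (x, y, value) triples at integer pixel positions; the result
    is indexed as image[y][x]. One channel only: run once per colour component.
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for qy in range(height):
        for qx in range(width):
            # Nearest node to pixel q, and the squared distance r2 to it.
            nx, ny, value = min(
                nodes, key=lambda n: (n[0] - qx) ** 2 + (n[1] - qy) ** 2)
            r2 = (nx - qx) ** 2 + (ny - qy) ** 2
            r = math.isqrt(r2)
            # A node inserted at any pixel p within distance r of q (ties
            # included) would claim q from its current Voronoi cell, so q
            # scatters its node's value to every such p. q itself is always
            # included, so no count stays at zero.
            for py in range(max(0, qy - r), min(height, qy + r + 1)):
                for px in range(max(0, qx - r), min(width, qx + r + 1)):
                    if (px - qx) ** 2 + (py - qy) ** 2 <= r2:
                        total[py][px] += value
                        count[py][px] += 1
    return [[total[y][x] / count[y][x] for x in range(width)]
            for y in range(height)]


# Four corner nodes on an 8x8 image.
image = discrete_sibson(8, 8, [(0, 0, 10.0), (7, 0, 20.0),
                               (0, 7, 30.0), (7, 7, 40.0)])
```

Like continuous Sibson interpolation, it reproduces node values at the nodes (barring exact distance ties) and stays within the range of the node values. The cost grows with the square of the node spacing, so a real decoder would want a faster implementation.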

Preliminary results

Here are the results of img2twit using 140 characters restricted to U+4E00..U+9FA5 (CJK Unified Ideographs). The 一一一一 characters at the end of some lines indicate wasted bits that the algorithm cannot yet use efficiently.

輎污涧噊訞巚戴邨姎士踤倭餜洈塉留宒督虞韀澓觀腆趝禄南栥註谎蝲啎狍麃砘焼謩熁迣菝峰嶺綇檂挀黭朿泊攻确碌埬萚鉄毉瘣璚鯱幩冠恈欘肸熈璶礴瘸蚉绋駡碮挺馮譵膀峞黦墮蠅嚓铙觞睡盧孱鳜载襳迠廅榣缹興戰髅垨衦蒺昺醦颥八圪進桊絞螐嗢盉隬岠慷鎃尜淽阨塶沛顭僚計鯬賥占牓吙硘鷸騢挓磪捤鵘坰硖肕萘饼皹侯滗

姲椟筃偡荛璻琛隞夅镬磤湋亹璠熗凋煐攪泲僶壴鋇廷砪臗旝鳑禛渂澣贱涜慘齋芐梡楉迤椻姫閴飙苟稞痡揦麲笉申檫窬偩掛炘忧阙膇样箠愍畄帠掭歜黵歫徯堠傌蕨鱏鷥軠慰糐掭辢猆孏錦戹濸巉魽嚲腫就恩沽厲測沎婲舁铧蠃犱闆醛焊茴鋈叶狝痺矹铓疿镭緄熐魆郗忤櫯嫟韥烥彸漻藉醺夝趼惻炘訶焝汒蝧潚诪躗丌一一一一一一

(Image: Cornell_box.png)

郚辡帅轻垟比芼瘳僺輪磀蜮箁捆婊滭九輤涇玪曃擾褗眗鯮鴚瘟糱靌軏膒跬泵庠譀奘骰偂穖蠵詟沛駮胅卛唁澥込圊澟褡怍两鋅蠤振殰芝耹漹樆蜬龆绎薄核琣捷椁桡痻擏翵峺侞骯溞淎搼曇壧屳跖忄篵雸皰堫谳物渷厸则陃妔醀垻槥彾澪烃瓾鐬轧錈宓皪瀅榦睿宖陗邱鄎巊晨誫鲛蒹瓬棟层鶾牪耣騝墙麇拫途薱鱤朤慦豹怂傍殐欜一

坍嗝昫噰碒荓奊镂胆鶊灦翵歓八罡睂釬篾媹蹛赜言齊瓰谕豭璞捘帔穼柞娀癥卢栀儻湐崽洧俅蔿尘犁倴苠鹯焋朓嗸哉姟箊燐蘫矻豣底氲浧喋婏氾藽憺鄐嵔武枣歚雛胠蓸豑逿娳繜婔涙賁醙淸鍮蕙悢龡賚厴惌鎛膟缷概狙峴鋱鮬刬杂蒟驇骧袹璹導练閦饥頖筎梋炻鼑鎄薕粮軝帨雡豄瀴诨霉窐搉裷橌漣濯超瞉秘驔包颾蘱礜鼦挈蓒諹

豪弅淶鑆斳愔耐俬秱戂孤访红艶劃嬌躑擣昗呎腆猎扭僬题猛嬰頽恓劉檭橀韮闼帣赑峈鮦宝鰢斣麙蓑騰騺鹸希關鈯穨唖秴斮圫究傲駛朘铃邂巢沿譓船櫃晒峩泪蝻鵲皲販口谹鎺侒戣耔凉蠛抏槱戛蝂荄勞攞咉闏涪彃沏全偫吒溸乎洸螕慹鳩弭蚕弣寽砰薨埻铥恣噿悏镏雈壭蒬礡靑徠鼛慗泏郄渺婥俦攨賌羢髙壶耔僪爯姉蔮蠬伣豖弫
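The messages above use only the 20902 characters U+4E00..U+9FA5, so by the same formula as before each one can carry floor(140 log₂ 20902) = 2009 bits rather than 2326. A Python sketch of one way to map a bit payload to such text: treat the payload as one big integer written in base 20902, least significant digit first. That digit order is an assumption; with it, unused high-order digits are zero and show up as trailing 一 (U+4E00, digit 0), which is one way to read the padding seen above.

```python
BASE = 0x4E00                # 一, the first CJK Unified Ideograph
RADIX = 0x9FA5 - 0x4E00 + 1  # 20902 usable characters
LENGTH = 140


def capacity_bits(radix: int = RADIX, length: int = LENGTH) -> int:
    """floor(length * log2(radix)), computed exactly."""
    return (radix ** length).bit_length() - 1


def encode(value: int) -> str:
    """Write value in base RADIX, least significant digit first."""
    if not 0 <= value < RADIX ** LENGTH:
        raise ValueError("value does not fit in 140 characters")
    chars = []
    for _ in range(LENGTH):
        value, digit = divmod(value, RADIX)
        chars.append(chr(BASE + digit))
    return "".join(chars)


def decode(text: str) -> int:
    """Inverse of encode: read digits back from the most significant end."""
    value = 0
    for ch in reversed(text):
        value = value * RADIX + (ord(ch) - BASE)
    return value
```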
