It has become common practice to present hex-encoded hashes to users. While this has the advantage of brevity, the hashes are easily forgotten and intimidating to some. This article describes an alternative encoding that takes advantage of people's propensity to remember pronounceable gibberish.
Hex encoding has become a popular way to encode binary data because each character compactly represents 4 bits (one of 16 values). For technical users this is great: we are well aware of the motivations behind hex, and are comfortable dealing with binary data.
Unfortunately, it has also become fairly common practice to use hex-encoded data in user-facing situations, which is less than ideal from a usability perspective. The layperson sees such data as a stream of random letters and does not care about the link between those letters and the underlying data. Many URL-shortener web services, for example, even expect users to memorise this mumbo-jumbo.
Hashes are one of the main culprits, as hex has become their de facto encoding. While this article talks about hashes, the same techniques can be used with any binary data; hashes are merely the most prominent example.
In an attempt to draw attention to this usability issue, this article proposes one approach to tackling the disconnect. By creating a meaningless but pronounceable phrase to represent binary data, it hopes to demonstrate that binary data can be presented in an approachable way. In short, the aim is to show that presenting hashes as
Barafa-Muduga is nicer than presenting them as
I am not affiliated with any academic institution, and therefore cannot afford the extortionate costs required to do proper background research. A quick Google search turns up several random pronounceable password generators, but nothing that encodes data in a pronounceable way. It may be that I am replicating someone else's work; I haven't heard of it, but if you know of similar work then please let me know.
The simplest way of implementing such a scheme would be a fixed-length block encoding: treat the data as a number in base N, where N is the number of syllables in the alphabet, and emit one syllable per digit. This imposes the constraint that all syllables must be of equal length so that the decoder knows where each one ends.
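A minimal Python sketch of this fixed-length approach follows. The syllable alphabet is an assumption made for illustration (14 consonants combined with 5 vowels, giving N = 70 two-letter syllables), as are the names encode and decode. Because 70 is not a power of two, the sketch treats the whole input as one big integer and converts it to base 70; leading zero bytes would otherwise vanish in that conversion, so they are kept the way Base58 keeps them, as one zero-valued syllable each.

```python
# Hypothetical alphabet: every syllable is exactly two letters (consonant + vowel).
CONSONANTS = "bdfgklmnprstvz"   # 14 consonants (assumed)
VOWELS = "aeiou"                # 5 vowels
SYLLABLES = [c + v for c in CONSONANTS for v in VOWELS]   # N = 70
INDEX = {s: i for i, s in enumerate(SYLLABLES)}
N = len(SYLLABLES)


def encode(data: bytes) -> str:
    """Encode bytes as base-N digits, one two-letter syllable per digit."""
    n = int.from_bytes(data, "big")
    digits = []
    while n:
        n, r = divmod(n, N)
        digits.append(SYLLABLES[r])
    # Each leading zero byte becomes one zero-valued syllable (as in Base58);
    # otherwise b"\x00\x01" and b"\x01" would encode identically.
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return SYLLABLES[0] * zeros + "".join(reversed(digits))


def decode(text: str) -> bytes:
    """Invert encode(); fixed-width syllables let us split every 2 letters."""
    chunks = [text[i:i + 2] for i in range(0, len(text), 2)]
    zeros = 0
    while zeros < len(chunks) and chunks[zeros] == SYLLABLES[0]:
        zeros += 1
    n = 0
    for chunk in chunks[zeros:]:
        n = n * N + INDEX[chunk]
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return b"\x00" * zeros + body
```

For example, encode(bytes.fromhex("00ff")) gives "babora", and decode("babora") returns the original two bytes. Since every syllable is two letters, the decoder never has to search for syllable boundaries.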
Another solution is an n-ary Huffman encoding of the data, which allows syllables of varying length. If the encoding tree is created deterministically from a list of syllables then, provided both the encoder and decoder know the list, a dictionary is not required.
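What makes variable-length syllables decodable is that no syllable is a prefix of another, so a letter tree built from the list splits the stream unambiguously. The Python sketch below covers only that tree and the decoding walk; it does not choose code lengths from symbol frequencies, which is the Huffman part of the scheme. The functions build_tree and split_syllables, and the "$" end marker, are hypothetical.

```python
def build_tree(syllables):
    """Build a letter tree (trie) from an ordered syllable list.

    The same list always yields the same tree, so encoder and decoder only
    need to share the list. A syllable's position in the list is its value.
    """
    root = {}
    for value, syllable in enumerate(syllables):
        node = root
        for letter in syllable:
            if "$" in node:          # an earlier, shorter syllable ends here
                raise ValueError(f"a shorter syllable is a prefix of {syllable!r}")
            node = node.setdefault(letter, {})
        if node:                     # something already hangs below this point
            raise ValueError(f"{syllable!r} is a prefix of another syllable")
        node["$"] = value            # "$" marks the end of a syllable
    return root


def split_syllables(text, tree):
    """Walk the tree letter by letter, emitting a value at each syllable end."""
    values, node = [], tree
    for letter in text:
        if letter not in node:
            raise ValueError(f"unexpected letter {letter!r}")
        node = node[letter]
        if "$" in node:
            values.append(node["$"])
            node = tree              # start matching the next syllable
    if node is not tree:
        raise ValueError("text ends part-way through a syllable")
    return values
```

With the hypothetical list ["ba", "dro", "ki", "mu", "sha", "te"], split_syllables("badromu", build_tree(...)) returns [0, 1, 3], while build_tree(["ba", "bar"]) raises an error because "ba" is a prefix of "bar".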
The solution I used as a proof of concept is worse than both of these. Based on the Huffman tree scheme, it simply provides 16 syllables, so that each hex digit can be mapped to a syllable; a tree created from this list is then used to decode the resulting words. Although it is not optimal in terms of output length (each syllable carries only 4 bits, and far more data could be packed into each one by using more syllables), this method is what I arrived at after hacking at several other approaches.
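The idea behind the proof of concept can be sketched in Python under some assumptions: the 16 syllables below are hypothetical (not the ones the demo used), and decoding matches syllables directly against the text instead of walking an explicit tree, which is equivalent for a prefix-free list this small.

```python
# Hypothetical set of 16 syllables, one per hex digit 0-f. Lengths vary,
# but no syllable is a prefix of another, so decoding is unambiguous.
SYLLABLES = ["ba", "de", "fi", "go", "ku", "la", "me", "ni",
             "po", "ru", "sa", "te", "vi", "zo", "kra", "sho"]
HEX_DIGITS = "0123456789abcdef"


def hex_to_syllables(hex_string: str) -> str:
    """Replace every hex digit with its syllable."""
    return "".join(SYLLABLES[HEX_DIGITS.index(d)] for d in hex_string.lower())


def syllables_to_hex(text: str) -> str:
    """Consume the text one syllable at a time and map each back to a hex digit."""
    digits, pos = [], 0
    while pos < len(text):
        # At most one syllable can match here because the set is prefix-free.
        for value, syllable in enumerate(SYLLABLES):
            if text.startswith(syllable, pos):
                digits.append(HEX_DIGITS[value])
                pos += len(syllable)
                break
        else:
            raise ValueError(f"no syllable matches at position {pos}")
    return "".join(digits)
```

For instance, hex_to_syllables("3e0f") gives "gokrabasho", and syllables_to_hex turns it back into "3e0f".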
As the resulting syllable stream can be fairly long, I found it more readable to insert hyphens at random points in it. These are simply stripped when the data is converted back. This does mean that the resulting words change every time, so a deterministic method may be preferable if the same data is likely to be encoded more than once.
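One way to make the hyphenation repeatable is to draw the random group sizes from a generator seeded with the text being hyphenated. The sketch below assumes the syllables are already available as a list of strings; hyphenate, strip_hyphens and the group sizes of two to four syllables are all hypothetical choices. Python's random.Random accepts a string seed, so the same input gives the same hyphenation, although the exact sequence produced by randint is not guaranteed to stay the same across Python versions.

```python
import random


def hyphenate(syllables, min_group=2, max_group=4, seed=None):
    """Join syllables into hyphen-separated words of min_group..max_group
    syllables (the last word may be shorter).

    With seed=None the grouping differs on every call, like the random
    hyphenation in the article. Passing a seed, for example the unhyphenated
    text itself, makes the output repeatable for the same input.
    """
    rng = random.Random(seed)
    words, i = [], 0
    while i < len(syllables):
        size = rng.randint(min_group, max_group)
        words.append("".join(syllables[i:i + size]))
        i += size
    return "-".join(words)


def strip_hyphens(text):
    """Undo hyphenate(): the hyphens carry no data."""
    return text.replace("-", "")
```

hyphenate(syllables) behaves like the random method, while hyphenate(syllables, seed="".join(syllables)) always splits a given stream the same way.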
From Hex and Back Again
As a simple example of the technique, the form below converts a sequence of hex into a pronounceable form and back again.
To examine the code, view the source of the page.
In practice, a block encoding would probably be preferable: it would be trivial to construct an alphabet of two-letter syllables, and the simpler code probably outweighs the benefits of the more complicated n-ary Huffman encoding.
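To illustrate how simple the block approach can be, here is a Python sketch using a hypothetical alphabet of 64 two-letter syllables (16 consonants × 4 vowels). Because 64 is a power of two, each syllable carries exactly 6 bits and every 3 bytes become 4 syllables, in the same way Base64 slices bits; a trailing partial group is padded with zero bits, and the decoder recovers the byte count from the number of syllables. The alphabet and function names are assumptions, not part of the original demo.

```python
import hashlib

# Hypothetical alphabet: 16 consonants x 4 vowels = 64 two-letter syllables,
# so every syllable carries exactly 6 bits and 3 bytes map to 4 syllables.
CONSONANTS = "bdfghjklmnprstvz"
VOWELS = "aeiu"
SYLLABLES = [c + v for c in CONSONANTS for v in VOWELS]
INDEX = {s: i for i, s in enumerate(SYLLABLES)}


def encode(data: bytes) -> str:
    count = (len(data) * 8 + 5) // 6           # syllables needed: ceil(bits / 6)
    pad = count * 6 - len(data) * 8            # 0, 2 or 4 zero bits at the end
    n = int.from_bytes(data, "big") << pad
    return "".join(SYLLABLES[(n >> (6 * (count - 1 - i))) & 0x3F]
                   for i in range(count))


def decode(text: str) -> bytes:
    count = len(text) // 2
    n = 0
    for i in range(0, len(text), 2):           # fixed width: split every 2 letters
        n = (n << 6) | INDEX[text[i:i + 2]]
    length = count * 6 // 8                    # whole bytes held by the syllables
    return (n >> (count * 6 - length * 8)).to_bytes(length, "big")


digest = hashlib.sha256(b"hello").digest()
word = encode(digest)
assert decode(word) == digest
print(len(word))  # 86 letters (43 syllables) for a 256-bit hash
```

A 256-bit hash becomes 43 syllables (86 letters), against 64 characters of hex, so the gain is pronounceability rather than length.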
This article was little more than a thought experiment in usability, but there is little doubt that a better method for displaying user-facing binary data exists.
All code is released into the public domain without any warranty. If you want to show your appreciation then you can buy me a beer or offer me a job.
Thanks to Philip for proofreading this for me.