Identity Again: Visual Hashing

Feedback on my ideas around identity on Gemini has emphasized the need for decentralization and privacy.

Decentralization is great, but:

> The wise engineer does not build a distributed system when a single server would do; distributed programming is hard.

> The wisest engineer of all uses no servers.

So, I’d like to look at cutting the idea back to the very minimum: visual hashing; no new servers at all.

The Feature

The idea is very simple: servers like Bubble would give users the option to display a visual hash of the sha1 fingerprint of their client certificate.

So I could go to my Bubble account page and enable “show visual hash”.

Other users can then go to my account page and check my visual hash against my visual hash on other sites—and on my capsule—if they want to check who I am.

And that’s the whole feature.

The Visual Hash

If it’s going to be the whole feature, a few desirable properties come to mind that I did not envisage last time.

Not Reversible

If the visual hash is not reversible, then publishing it does not actually give away the user’s client certificate sha1. This is easy to do, by hashing the sha1, and seems desirable.

Easy To Check

If the whole feature is about people manually checking, it must be easy to check. Ideally, it’s easy to remember, too. This is why I first thought about using words—they’re very easy to remember compared to complex patterns.

Originally I was thinking about displaying all the bits in the sha1, but there’s no particular need for that—we should choose how much entropy the visual hash should contain based on how hard it would be to reverse engineer a specific visual hash.

Provably Complex

We should have strong evidence that there is a particular level of difficulty in duplicating the visual hash—that is, creating a client certificate from scratch that has the same visual hash.

Easy To Implement

If each server computes the visual hash, then it has to be easy to implement: implementations will be wanted in lots of languages, and it’s crucial that they always produce identical output.

New Proposal

Here is what I’ve come up with so far.

First we take the sha256 of the sha1. The algorithm is widely available, and it’s the hash used by bitcoin. That gives us numbers for how fast it’s possible to compute sha256s if a truly ridiculous amount of effort and electricity is dedicated to the problem.

Then we use three word lists: a list of 256 adjectives (8 bits), a list of 1024 nouns (10 bits) and a list of 512 verbs (9 bits). Further, we use two types of symbol: a joining symbol from the set `;,` (1 bit), and an ending symbol from the set `?.` (1 bit).

We then take bits from the sha256 output, taking each time the number of bits needed for a word or symbol list, and looking up in that list. We use the lists in this order:

Adjective noun verb joining-symbol adjective noun verb joining-symbol adjective noun verb ending-symbol.

For a total of 8 + 10 + 9 + 1 + 8 + 10 + 9 + 1 + 8 + 10 + 9 + 1 = 84 bits.

Finally, so that it looks nice: we capitalize the first letter.
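The bit-slicing step can be sketched as follows. This is only an illustration, not the reference implementation: the function name `field_indexes` is made up, the indexes it prints are zero-based, and bits are assumed to be read most significant first from the start of the digest, which is how the bash implementation further down reads them. The digest used is a stand-in, not a real identity.

```bash
#!/bin/bash

# Field widths in list order: adjective, noun, verb, symbol, three times.
widths=(8 10 9 1 8 10 9 1 8 10 9 1)

# Hypothetical helper: print the zero-based list index for each field,
# reading bits from the start of a hex-encoded sha256 digest.
field_indexes() {
  local hex=$1 bits="" i nibble
  # 84 bits fit in the first 21 hex digits; expand each digit to 4 bits.
  for ((i = 0; i < 21; i++)); do
    nibble=$((16#${hex:i:1}))
    bits+=$((nibble >> 3 & 1))$((nibble >> 2 & 1))$((nibble >> 1 & 1))$((nibble & 1))
  done
  local offset=0 width out=()
  for width in "${widths[@]}"; do
    out+=("$((2#${bits:offset:width}))")
    offset=$((offset + width))
  done
  echo "${out[*]}"
}

# Sanity check on the bit budget.
total=0
for width in "${widths[@]}"; do total=$((total + width)); done
echo "total bits: $total"

# Stand-in digest for illustration: the sha256 of the string "abc".
field_indexes ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

Each printed number would then be looked up in the matching list (add one if the lookup counts lines from 1, as `head -n` does).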

Here is an example identity:

Dank bear meets; sunk bake hurls; okay dish damps?

And four more identities:

Long list ties, wise soup flows; edgy page demos.
Wry node zips; soft gold dings; tame vote chips.
Able pat fails, same duty falls, sunk fork calms.
Rash carp apes, fake pint flits; rash fog flits?

I find these pleasing—they’re quite poetic—and I think they’ll be easy to compare and remember. What do you think?

Requirements Revisited

Not Reversible

The use of sha256 makes the visual hash not reversible; the original client sha1 is prohibitively expensive to discover.
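A minimal sketch of that hashing step, under two assumptions that the post does not pin down: the fingerprint arrives as a hex string, and it is the raw bytes (not the hex text) that get hashed, as the bash implementation below does with its binary "input" file. The function name `public_digest` is made up, and the sample fingerprint is just the sha1 of the empty string.

```bash
#!/bin/bash

# Hypothetical helper: sha256 of the raw bytes of a hex-encoded value.
# Assumption: the raw bytes are hashed, not their hex spelling.
public_digest() {
  local hex=$1 escaped
  # Accept only whole bytes written as hex digits.
  if [[ ! $hex =~ ^([0-9a-fA-F]{2})+$ ]]; then
    echo "not a hex string: $hex" >&2
    return 1
  fi
  # Turn "da39..." into "\xda\x39..." so printf emits the raw bytes.
  escaped=$(sed -e 's#..#\\x&#g' <<<"$hex")
  printf "$escaped" | sha256sum --binary | cut -d' ' -f1
}

# Stand-in fingerprint for illustration: the sha1 of the empty string.
public_digest da39a3ee5e6b4b0d3255bfef95601890afd80709
```

Only the sha256 output, and then only 84 bits of it, would ever be shown, so the fingerprint itself stays unpublished.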

Easy To Check

This one’s a matter of opinion, and possibly user testing. What do you think?

Provably Complex

The cost of computing one’s visual hash has a lower bound of one sha256 hash operation.

The visual hash space is 84 bits, which is 1.9 x 10 ^ 25 different sentences.

The current worldwide bitcoin hash rate is around 150 trillion hashes per second, 1.5 x 10 ^ 14.

That means it would take all the hashers in the world 1.3 x 10 ^ 11 seconds to cover the visual hash space: about 4000 years. This seems like about the right level of complexity; if the whole world is trying to crack your visual hash, the time it will take is still measured in thousands of years.
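The estimate can be re-run with a few lines of shell, taking the hash-rate figure above as given. This is a sketch only: the function name `years_to_cover` is made up, a year is taken as 365.25 days, and awk is used because 2^84 does not fit in bash's 64-bit integers.

```bash
#!/bin/bash

# Hypothetical helper: years needed to try every value in a space of
# 2^bits once, at a given number of hashes per second.
years_to_cover() {
  awk -v bits="$1" -v rate="$2" 'BEGIN {
    space = 2 ^ bits                        # distinct visual hashes
    seconds = space / rate                  # time to try each one once
    years = seconds / (365.25 * 24 * 3600)  # seconds in a year
    printf "%.1e sentences, %.1e seconds, %.0f years\n", space, seconds, years
  }'
}

years_to_cover 84 1.5e14
```

Changing either argument shows how sensitive the result is: each extra bit doubles the time, and each tenfold increase in hash rate divides it by ten.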

Easy To Implement

I don’t think anyone else is running a Gemini server in Dart, so my first implementation is not much use to people.

Here is a re-implementation in bash:

#!/bin/bash --

# Prints visual hash of binary data in file called "input".
#
# Data files needed:
#
# 256 adjectives in adjectives.txt
# 1024 nouns in nouns.txt
# 512 verbs in verbs.txt
# 2 joining symbols in joiner.txt
# 2 ending symbols in ender.txt.

# Load from a binary file called "input". Take the sha256 and convert the hex
# output to uppercase (bc only accepts uppercase hex digits).
data=$(sha256sum --binary input | sed -e 's# .*##' | tr a-z A-Z)

# Convert the hex sha256 to binary, pad to 256 bits.
export BC_LINE_LENGTH=0
data=$(echo "ibase=16;obase=2;$data" | bc)
while [[ $(echo -n "$data" | wc -c) != 256 ]]; do
  data="0$data"
done

# Loop taking bits and appending to the output.
string=$(for round in adjectives,8 nouns,10 verbs,9 joiner,1 \
    adjectives,8 nouns,10 verbs,9 joiner,1 \
    adjectives,8 nouns,10 verbs,9 ender,1; do
  file=$(echo $round | cut -d, -f1)
  bits=$(echo $round | cut -d, -f2)

  # Take the next "bits" bits from "data" and convert to base 10. Add one.
  # (obase is read in the new input base, so 1010 here means decimal 10.)
  index=$(echo "ibase=2;obase=1010;$(echo $data | head -c$bits) + 1" | bc)

  # Look up that word or symbol, output it.
  head -n$index $file.txt | tail -n1 | tr '\n' ' ' | sed -e 's# ##'

  # If a space should be output here, output it.
  if [[ $file != verbs && $file != ender ]]; then echo -n " "; fi

  # Trim "data" by the number of bits that were taken.
  data=$(echo $data | tail -c+$bits | tail -c+2)
done
echo)

# Output the visual hash with first character upper cased.
echo "${string^}"

I think this shows that it’s easy enough to implement.

Of course there should be some test cases to go with this—I’ll add some when the word lists are finalized, see note below.

If you’d like to contribute an implementation for your favourite language(s), please go ahead! I’ll add them to this post, and/or link to a more definitive place for them.

Requirements Met?

This all seems pretty good so far.

Feedback and suggestions welcome!

I’ve removed all data storage from the ID server and updated it with the new algorithm so you can try it now, if you like:

gemini://id.gemlog.org

Once the algorithm is finished and published so anyone can run it, there will be no need for the server and I’ll take it down.

Word Lists

The word lists are some work to create: the words need to be well known, different and interesting enough to make memorable hashes, and yet have to exclude any words offensive or dubious enough that people might be unhappy with the output.

Assuming there is enough interest to actually finalize the algorithm, here’s the plan: I’ll work on them a bit more until I’m happy with them, then publish draft versions and accept community input—of words to remove and of words to add—before publishing final versions.
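While the lists are being edited, a small check could catch the mistakes that would make implementations disagree: a wrong number of entries, a blank line, a space inside an entry, or a duplicate. The sketch below is an assumption about what such a check might look like, not part of the proposal; `check_list` is a made-up name, and the file names and sizes are the ones the bash implementation above expects.

```bash
#!/bin/bash

# Hypothetical check: a list file must have exactly the expected number
# of lines, with no blank lines, no spaces, and no duplicate entries.
check_list() {
  local file=$1 expected=$2 count dupes
  count=$(wc -l < "$file")
  if (( count != expected )); then
    echo "$file: expected $expected lines, found $((count))"
    return 1
  fi
  if grep -q -e '^$' -e ' ' "$file"; then
    echo "$file: contains a blank line or a space"
    return 1
  fi
  dupes=$(sort "$file" | uniq -d)
  if [[ -n $dupes ]]; then
    echo "$file: duplicated entries: ${dupes//$'\n'/ }"
    return 1
  fi
  echo "$file: ok"
}

# Check whichever of the five list files are present in this directory.
for spec in adjectives,256 nouns,1024 verbs,512 joiner,2 ender,2; do
  file=${spec%,*}.txt
  if [[ -e $file ]]; then check_list "$file" "${spec#*,}"; fi
done
```

Each list size has to stay an exact power of two, since every entry must be reachable from a fixed number of bits.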

More Languages

Words seem to work very well for visual hashes, except that their advantages only apply if you know the language. It would be possible to maintain word lists and sentence templates for more languages; but then you wouldn’t be able to compare between visual hashes rendered in different languages.

Possibly the answer is to show the visual hash in the main language of the site it’s on, then also show the raw sha256 bytes for comparisons across languages. In Gemtext, the sha256 bytes could go in the “alt text” of the preformatted text block.
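As a rough sketch of that idea: in Gemtext, any text following the three backticks that open a preformatted block is treated as the block's alt text. The helper below is hypothetical (`gemtext_hash_block` is a made-up name), and writing the digest as hex is an assumption, since the format of the "raw bytes" is not decided here.

```bash
#!/bin/bash

# Hypothetical helper: print a visual hash as a Gemtext preformatted
# block, with the hex sha256 as the block's alt text.
gemtext_hash_block() {
  local sentence=$1 digest=$2
  local fence='```'
  printf '%s%s\n%s\n%s\n' "$fence" "$digest" "$sentence" "$fence"
}

# Placeholder digest, NOT the real digest behind this example sentence.
placeholder=$(printf 'placeholder' | sha256sum | cut -d' ' -f1)
gemtext_hash_block "Dank bear meets; sunk bake hurls; okay dish damps?" "$placeholder"
```

Clients that show alt text would then let two users compare the digest directly even when the sentences are in different languages.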

Feedback 📮

So far today, 2024-05-12, feedback has been received 2 times. Of these, 0 were likely from bots, and 2 might have been from real people. Thank you, maybe-real people!

   ———
 /     \   i a
| C   a \ D   n |
   irc   \     /
           ———

-- Page fetched on Sun May 12 17:45:08 2024