Hashtag Index Improvements

2020-01-09 | #crawlers #metadata #hashtags | @Acidus


I think using hashtags in gemtext is pretty cool.

I love @JBanana's #️⃣♊ Hashtags


@JBanana is off to a great start with their Hashtag index of all of Gemini space.

gemini://freeshell.de/hashtags/


However, the index is pretty noisy right now, which makes it hard to discover hashtags that seem intentional or that appear on more than one or two pages. Here are some low-hanging-fruit improvements I think can be made.


Filtering out accidental hashtags:

Some of these hashtags are obviously CSS hex colors (#262133, for example). Anything that matches #[0-9a-fA-F]{3} or #[0-9a-fA-F]{6} should probably be ignored.

Ignore preformatted text blocks. A little trickier, but preformatted text is often used in gemtext for things like ASCII art and computer source code, so it's more likely to contain accidental hashtags.

Only index gemtext? I run crawlers on Gemini space, and there is a lot of plaintext, source code, ASCII art, and random file types with weird extensions and odd MIME types. This is awesome, but again leads to accidental hashtags. Perhaps only text/gemini documents should be indexed for hashtags.
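
Here is a minimal Python sketch of how the three filters above could fit together. It is not how @JBanana's index works. The function names and the hashtag pattern are my own assumptions for illustration, and it assumes preformatted blocks are toggled by lines starting with three backticks.

```python
import re

# Assumed hashtag shape: '#' followed by word characters, not glued to a
# preceding word character (so "page#section" is skipped).
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
# Tag bodies that look like CSS hex colors: exactly 3 or 6 hex digits.
HEX_COLOR_RE = re.compile(r"[0-9a-fA-F]{3}|[0-9a-fA-F]{6}")


def is_gemtext(meta: str) -> bool:
    """True if the meta of a success response declares text/gemini."""
    mime = meta.split(";", 1)[0].strip().lower()
    return mime == "text/gemini"


def extract_hashtags(gemtext: str) -> list[str]:
    """Return lowercased hashtags, skipping preformatted blocks and hex colors."""
    tags = []
    in_preformatted = False
    for line in gemtext.splitlines():
        if line.startswith("```"):
            # A toggle line opens or closes a preformatted block.
            in_preformatted = not in_preformatted
            continue
        if in_preformatted:
            continue
        for match in HASHTAG_RE.finditer(line):
            body = match.group(1)
            if HEX_COLOR_RE.fullmatch(body):
                continue  # probably a color such as #262133 or #fff
            tags.append(body.lower())
    return tags
```

One trade-off: the hex filter also throws away real words made only of the letters a through f, such as #bad or #decade, so an allowlist for those might be worth adding.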


Interface improvements


The hashtag index itself can be improved to more easily surface compelling content.

Include the number of occurrences next to a hashtag, so users can easily find popular topics.

Have a "sorted by occurrence" view in addition to the alphabetical view.

Consider hiding hashtags with only one occurrence.
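
The three interface ideas above could be sketched roughly like this in Python. I am assuming the crawler produces a mapping from page URL to the hashtags found on that page, and that "occurrences" means the number of distinct pages using a hashtag. The function name and the min_pages parameter are hypothetical.

```python
from collections import Counter


def build_index_views(page_tags: dict[str, list[str]], min_pages: int = 2):
    """Return (alphabetical, by_occurrence) lists of (hashtag, page_count)."""
    counts = Counter()
    for tags in page_tags.values():
        # Count each page once per hashtag, even if the tag repeats on it.
        counts.update(set(tags))
    # Hide hashtags that appear on fewer than min_pages pages.
    visible = {tag: n for tag, n in counts.items() if n >= min_pages}
    alphabetical = sorted(visible.items())
    # Most-used first; ties fall back to alphabetical order.
    by_occurrence = sorted(visible.items(), key=lambda item: (-item[1], item[0]))
    return alphabetical, by_occurrence
```

Setting min_pages to 1 brings the single-occurrence hashtags back, so hiding them can stay a display option rather than a change to the data.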


Really crazy ideas:

Use a thesaurus to group hashtags of similar meaning together (assuming they are single words).

Look into stemming words so you can combine similar hashtags into the same hashtag (e.g. #arguing and #argument).

Search? Finding pages with #docker AND #macos could be helpful.
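
To make the stemming and search ideas a bit more concrete, here is a rough Python sketch. The stemmer is a deliberately crude suffix stripper I made up for illustration, not a real stemming algorithm (a real index would probably want something like a Porter stemmer). The index is a plain inverted index from stemmed hashtag to the set of page URLs, and all names here are hypothetical.

```python
# Tiny, hypothetical suffix list; just enough to merge #arguing and #argument.
SUFFIXES = ("ment", "ing")


def crude_stem(tag: str) -> str:
    """Lowercase a hashtag and strip one known suffix if enough stem remains."""
    tag = tag.lower()
    for suffix in SUFFIXES:
        if tag.endswith(suffix) and len(tag) - len(suffix) >= 3:
            return tag[: -len(suffix)]
    return tag


def build_inverted_index(page_tags: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each stemmed hashtag to the set of page URLs that use it."""
    index: dict[str, set[str]] = {}
    for url, tags in page_tags.items():
        for tag in tags:
            index.setdefault(crude_stem(tag), set()).add(url)
    return index


def search_all(index: dict[str, set[str]], *tags: str) -> set[str]:
    """AND search: pages that carry every one of the given hashtags."""
    if not tags:
        return set()
    result_sets = [index.get(crude_stem(tag), set()) for tag in tags]
    return set.intersection(*result_sets)
```

With this, search_all(index, "docker", "macos") returns only pages tagged with both, and #arguing and #argument end up under the same key.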
