2020-01-09 | #crawlers #metadata #hashtags | @Acidus
I think using hashtags in gemtext is pretty cool.
@JBanana is off to a great start with their Hashtag index of all of Gemini space.
However, the index is pretty noisy right now, which makes it hard to discover hashtags that seem intentional, or hashtags that appear on more than 1 or 2 pages. Here are some low-hanging-fruit improvements I think can be made.
Some of these hashtags are obviously CSS hex colors, for example #262133. Anything that matches #[0-9a-fA-F]{3} or #[0-9a-fA-F]{6} should probably be ignored.
Ignore preformatted text blocks. This is a little trickier, but preformatted text is often used in gemtext for things like ASCII art and computer source code, so it's more likely to contain accidental hashtags.
Only index gemtext? I run crawlers on Gemini space and there is a lot of plaintext, source code, ASCII art, and random file types with weird extensions and odd MIME types. This is awesome, but again leads to accidental hashtags. Perhaps only text/gemini documents should be indexed for hashtags.
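The three filters above could be combined in a single extraction pass. What follows is only a rough sketch in Python, not how the existing index works: the names `is_gemtext` and `extract_hashtags` and the hashtag pattern itself are assumptions made for illustration. It relies on the gemtext rule that a line starting with ``` toggles preformatted mode.

```python
import re

# Hypothetical patterns and names; none of this comes from the actual index.
HASHTAG = re.compile(r"(?<!\w)#(\w+)")
HEX_COLOR = re.compile(r"[0-9a-fA-F]{3}|[0-9a-fA-F]{6}")


def is_gemtext(meta: str) -> bool:
    """True if the MIME type in a success response's meta is text/gemini."""
    mime = meta.split(";", 1)[0].strip().lower()
    return mime == "text/gemini"


def extract_hashtags(meta: str, body: str) -> list[str]:
    """Return lowercased hashtags found outside preformatted blocks."""
    if not is_gemtext(meta):
        return []  # skip plaintext, source code, and other file types
    tags = []
    preformatted = False
    for line in body.splitlines():
        if line.startswith("```"):
            preformatted = not preformatted  # toggle line, never indexed
            continue
        if preformatted:
            continue
        for tag in HASHTAG.findall(line):
            if HEX_COLOR.fullmatch(tag):
                continue  # looks like a CSS hex color such as #262133
            tags.append(tag.lower())
    return tags
```

One caveat with the 3-digit rule: it also drops real words made only of the letters a to f, such as #bad or #ace, so it may be worth applying only the 6-digit pattern or keeping an allow list.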
The hashtag index itself can be improved to more easily surface compelling content.
Include the number of occurrences next to a hashtag, so users can easily find popular topics.
Have a "sorted by occurrence" view in addition to the alphabetical view.
Consider hiding hashtags with only one occurrence.
Use a thesaurus to group hashtags of similar meaning together (assuming they are single words).
Look into stemming words so you can combine similar hashtags into the same hashtag (e.g. #arguing and #argument).
Search? Finding pages with #docker AND #macos could be helpful.
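Several of these ideas (occurrence counts, a sorted view, hiding one-off hashtags, and AND search) fall out of a simple inverted index from hashtag to pages. Here is a minimal sketch in Python. The function names, the example URLs, and the choice to count pages rather than raw mentions are all assumptions, not a description of the existing index.

```python
from collections import defaultdict


def build_index(pages: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each hashtag to the set of page URLs that use it."""
    index = defaultdict(set)
    for url, tags in pages.items():
        for tag in tags:
            index[tag].add(url)
    return dict(index)


def by_occurrence(index: dict[str, set[str]], min_pages: int = 2):
    """List (hashtag, page count), most used first, hiding rare hashtags."""
    rows = [(tag, len(urls)) for tag, urls in index.items()
            if len(urls) >= min_pages]
    # Sort by count descending, then alphabetically to break ties.
    return sorted(rows, key=lambda row: (-row[1], row[0]))


def search_all(index: dict[str, set[str]], *tags: str) -> set[str]:
    """Pages that carry every one of the given hashtags (an AND query)."""
    sets = [index.get(tag, set()) for tag in tags]
    return set.intersection(*sets) if sets else set()


# Made-up example data: page URL -> hashtags found on that page.
pages = {
    "gemini://example.org/a.gmi": ["docker", "macos"],
    "gemini://example.org/b.gmi": ["docker"],
    "gemini://example.org/c.gmi": ["gemini", "docker", "macos"],
}
index = build_index(pages)
print(by_occurrence(index))  # [('docker', 3), ('macos', 2)]
print(sorted(search_all(index, "docker", "macos")))
```

Setting `min_pages=1` gives the full list with counts. Thesaurus grouping and stemming could be layered on top by normalizing each hashtag before it is added to the index.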