Tags FAQ

On other services, hashtags are popular. I wondered if something similar could work on Gemini too. So I wrote a crawler to find tags of various types and link to them.

This is an FAQ about that.

Here are the tags used in more than one capsule

Here are all the tags

Can I use a hashtag?

Yes. If you write anything in a Gemini capsule that includes a #hashtag then the crawler should (eventually) find it and link to your page from the index. One quirk: gemtext headings begin with a hash, so the crawler ignores any hashes at the start of a line.
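
To illustrate the start-of-line rule, here is a minimal sketch in Python. It is not the crawler's actual code: the pattern, the function name find_hashtags and the decision to lower-case the results are all assumptions made for this example.

```python
import re

# Assumed pattern: a '#' preceded by whitespace or an opening bracket,
# followed by word characters. Because a preceding character is required,
# a hash at the very start of a line (a gemtext heading) can never match.
HASHTAG = re.compile(r"(?<=[\s(\[])#(\w+)")


def find_hashtags(gemtext: str) -> list[str]:
    """Return lower-cased hashtags, skipping hashes at the start of a line."""
    tags = []
    for line in gemtext.splitlines():
        tags.extend(match.group(1).lower() for match in HASHTAG.finditer(line))
    return tags
```

With this sketch, a line that consists only of "#example" yields nothing, while "I like #example" yields one tag.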

The crawler only follows gemini://... links (not http, gopher, etc.). It only indexes gemtext (content whose type is text/gemini) and ignores everything else, such as PDFs, images and text/plain.
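
The two filters described above could look roughly like this. It is a sketch under stated assumptions, not the real crawler: the function names are hypothetical, and relative links are resolved by temporarily borrowing the http scheme because Python's urljoin does not resolve relative references for gemini:// URLs.

```python
from urllib.parse import urljoin, urlsplit


def is_gemtext(meta: str) -> bool:
    """True if the response meta names text/gemini, ignoring parameters."""
    return meta.split(";", 1)[0].strip().lower() == "text/gemini"


def resolve(base_url: str, target: str) -> str:
    """Resolve a link target against a gemini:// base URL."""
    if urlsplit(target).scheme:
        return target  # already absolute, whatever the scheme
    # Workaround: urljoin leaves relative gemini references untouched,
    # so join as http and swap the scheme back afterwards.
    joined = urljoin("http" + base_url[len("gemini"):], target)
    return "gemini" + joined[len("http"):]


def gemini_links(base_url: str, gemtext: str) -> list[str]:
    """Collect gemini:// targets from '=>' link lines; drop other schemes."""
    links = []
    for line in gemtext.splitlines():
        if not line.startswith("=>"):
            continue
        parts = line[2:].split(maxsplit=1)
        if not parts:
            continue
        target = resolve(base_url, parts[0])
        if urlsplit(target).scheme == "gemini":
            links.append(target)
    return links
```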

The crawler prioritises links found on the following aggregators.

Antenna

bot en deriva (Spanish)

CAPCOM

Cosmos

flounder feed

Geddit - no longer working :-(

gmisub

SDF

SDF (seems they have two aggregators?)

Smol Pub feed

Crawling is (intentionally) slow. If you use a hashtag and it appears on an aggregator, it may take a few hours to be indexed. Content that isn't on an aggregator (or linked from a post there) may not be noticed.

I already have a tag system of my own. I'll stick with that.

Your tags may still be included. I noticed a few other tagging systems. The most common is that capsules have something like...

=> gemini://example.com/tags/foo

...and that's a page of links to posts about "foo". Those tags are included here too. Other kinds of tag that are also recognised:

🏷 foo, bar
Tags: foo, bar
=> somelink Tags: foo bar
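
As a rough illustration of how those four shapes might be recognised, here is a Python sketch. The patterns and the name explicit_tags are assumptions made for this example, as is treating both commas and spaces as separators; the real crawler may well differ.

```python
import re

# Assumed shape of a tag link: a '=>' line whose target contains /tags/<name>.
TAG_LINK = re.compile(r"^=>\s*\S*/tags/([^/\s]+)/?(?:\s|$)")
# Assumed shape of a labelled list: a leading label emoji (with an optional
# variation selector) or the word "Tags:" anywhere in the line.
LABELLED = re.compile(r"(?:^🏷\ufe0f?|\bTags:)\s*(.+)$")


def explicit_tags(line: str) -> list[str]:
    """Return lower-cased tags from one gemtext line, or an empty list."""
    match = TAG_LINK.match(line)
    if match:
        return [match.group(1).lower()]
    match = LABELLED.search(line)
    if match:
        return [t.lower() for t in re.split(r"[,\s]+", match.group(1).strip()) if t]
    return []
```

One consequence of splitting on spaces is that a multi-word tag such as "free software" would become two tags in this sketch.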

On many web sites, hashtags are links. How does that work here?

Links are nice because readers can see what other people have written about the same tag. I can find your tags, but I can't insert links into other people's content, so if you want links you have to add them yourself. Here's an example.

#ilovehashtags

And here's how to do that.

=> gemini://freeshell.de/hashtags/_ilovehashtags #ILoveHashtags

Notice that in the URL the hash is replaced with an underscore because hash has a different meaning in a URL.

Tags aren't case sensitive, at least with the English alphabet. There are hashtags in Cyrillic, Arabic, Chinese and Japanese, and I'm not qualified to say if case folding works there.
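
If you generate your pages with a script, the link line can be built mechanically. This is a small Python sketch: the base URL is copied from the example above, while the function name hashtag_link is made up and lower-casing the URL part simply follows that example. Tags outside the English alphabet may need extra URL encoding, which this sketch does not attempt.

```python
# Base URL copied from the example link; everything else is illustrative.
INDEX_BASE = "gemini://freeshell.de/hashtags/"


def hashtag_link(tag: str) -> str:
    """Build a gemtext link line for a hashtag such as '#ILoveHashtags'."""
    name = tag.lstrip("#")
    # The hash becomes an underscore in the URL; the label keeps the hash.
    return f"=> {INDEX_BASE}_{name.lower()} #{name}"
```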

BTW, someone does love hashtags. I didn't make that up.

Another way for tags to become links would be for a client to turn tags into inline links. Some people would hate that, so if you implement this in your client, you should probably make it optional.

The tags in your index are rubbish!

You can fix this by writing content with better tags! :-)

Some of them were probably not meant as tags (code comments and ASCII art, for example, can contain a # and so look like hashtags), but the crawler can't tell which tags are "real". Some pruning does occur, but it's a manual task that only happens occasionally.

What if I'm not interested in tags?

Oddly, I'm not that interested either. If you don't ever use them, no problem. My crawler will read your content once every couple of years, find no tags, and that's the end of that.

What if I don't want my content crawled for tags?

To stop your whole capsule being crawled, add something like this to your robots.txt:

User-agent: hashtags
Disallow: /

The crawler should not fetch anything that's disallowed for these user agents:

"*"

"indexer"

"hashtags"
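
To show what honouring those three user agents could mean in practice, here is a simplified Python sketch. It only understands User-agent and Disallow lines with plain path prefixes, and it treats a path as off limits if any group naming one of the three agents disallows it. The names are hypothetical and this is not the crawler's own parser.

```python
CRAWLER_AGENTS = {"*", "indexer", "hashtags"}


def disallowed_prefixes(robots_txt: str) -> list[str]:
    """Collect Disallow prefixes from every group naming one of our agents."""
    prefixes = []
    agents: set[str] = set()
    in_rules = False  # True once the current group has seen a Disallow line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = set(), False
            agents.add(value.lower())
        elif field == "disallow":
            in_rules = True
            if value and agents & CRAWLER_AGENTS:
                prefixes.append(value)
    return prefixes


def may_fetch(path: str, robots_txt: str) -> bool:
    """True if no collected prefix matches the start of the path."""
    return not any(path.startswith(p) for p in disallowed_prefixes(robots_txt))
```

With the two-line robots.txt shown earlier, may_fetch returns False for every path on the capsule.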

More about robots.txt on Gemini

If you find that your content is in the tag index when you'd rather it weren't, I'm happy to remove it.

Contact details here

I love this! Please will you keep it going forever?

I make no promises. Lots of things could make this go away, not least that this is a public access server and the nice bloke in charge may tell me to stop wasting bandwidth and disk space.

I hate this! Please will you stop?

I'm afraid not, but see "What if I'm not interested in tags?" above.

How's the crawl going?

The crawler is currently reindexing the whole of Geminispace, and all stats have been reset. Once it has been everywhere it can find, it will go back to crawling only new content found on aggregators (and anything new linked from there). Content that has been crawled before is ignored for well over a year, but will eventually be crawled again when there's a full re-index.

Other crawlers report more URLs in Geminispace than this crawler has seen, presumably because this one ignores a ton of stuff that isn't gemtext. You can see the latest numbers at the

crawl stats page

Lupa stats for comparison

back to the capsule root

-- Page fetched on Tue Apr 30 14:34:29 2024