-- Leo's gemini proxy

Loopy links 🔄

One thing I found by looking at where my #hashtag crawler went was that some people have a bottomless pit of links.


The first one I saw was someone exposing a repository of the site content. That included a link to the content itself. Not a link to the actual site, but to the copy of it in the repo. That had a repo link, where you could find a site link, and so on. I spotted this when it got to several levels of site/repo/site/repo/site/repo and told the crawler to give up. I'm mildly curious how deep that could go. I suppose it's limited by the maximum length of a gemini request (assuming that either the server or the client respected that limit).
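
A rough back-of-the-envelope sketch of that depth limit (my own, not the crawler's code): if each site/repo hop appends a fixed path segment, the chain bottoms out when the URL would exceed Gemini's 1024-byte request limit. The capsule name and segment here are made-up examples.

```python
# Hypothetical capsule and per-level path segment (assumptions, not real URLs)
BASE = "gemini://example.org"
SEGMENT = "/repo/content"

# Grow the URL one site/repo level at a time until it would no longer
# fit in a single Gemini request (max 1024 bytes for the URL).
url = BASE
depth = 0
while len(url + SEGMENT) <= 1024:
    url += SEGMENT
    depth += 1

print(depth)  # how many levels fit before the request is too long
```

With these made-up lengths that's on the order of 75-odd levels, so the pit is deep but not actually bottomless, provided something enforces the limit.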


There's another one where someone has a public bookmarking system which is paginated. The first page has a link like /bookmarks?2 that goes to the next page, and so on. The interesting part is that this link is there even if there are no more bookmarks. I tried /bookmarks?9999999999999999999 and it broke. One fewer 9 was fine. I told my crawler to give up on those too.
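
The kind of guard I mean looks roughly like this (a hypothetical helper, not the crawler's actual code): refuse to follow numeric ?N pagination past some arbitrary cap, which also rules out absurdly large page numbers.

```python
from urllib.parse import urlsplit

# Arbitrary cutoff for pagination depth; an assumption, tune to taste.
MAX_PAGE = 50

def should_follow(url: str) -> bool:
    """Skip links whose query string is a page number beyond the cap."""
    query = urlsplit(url).query
    if query.isdigit():
        return int(query) <= MAX_PAGE
    return True  # non-numeric or no query: follow as normal

print(should_follow("gemini://example.org/bookmarks?2"))      # True
print(should_follow("gemini://example.org/bookmarks?9999999999999999999"))  # False
```

A cap is cruder than checking whether a page actually contains bookmarks, but it keeps a polite crawler out of the endless ?N, ?N+1, ?N+2 chain.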


Having read Sean Connor's experiences with the crawlers that won't give up on an infinite redirect loop, I think that some crawlers are probably probing the limits of those two capsules.


#hashtags

#crawler


back to gemlog

-- Page fetched on Sat May 4 03:59:42 2024