Meta Information About Gemini Crawlers


Here is some information about the various Geminispace crawlers, for people who look into their log files every now and then and want to know what’s going on. Ideally, each crawler owner would publish this information prominently in one place on their crawler’s site, so that I could delete this page.
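
For the log-file use case, a minimal sketch of labeling a remote address with a crawler name could look like the following. The addresses are copied from a few of the entries further down and may be outdated; the names `KNOWN_CRAWLERS` and `crawler_for` are made up for this example.

```python
import ipaddress
from typing import Optional

# A few addresses copied from the entries on this page; they may be outdated.
KNOWN_CRAWLERS = {
    "82.165.79.210": "Geminispace.info",
    "2a02:247a:207:8e00:1::1": "Geminispace.info",
    "64.149.155.184": "Kennedy",
    "23.88.52.182": "TLGS",
    "2a01:4f9:c012:c6f7::1": "Gemplex",
    "193.70.85.11": "Lupa",
}

# Parse the keys once so that differently written IPv6 addresses still match.
_BY_ADDRESS = {ipaddress.ip_address(k): v for k, v in KNOWN_CRAWLERS.items()}


def crawler_for(remote_addr: str) -> Optional[str]:
    """Return the crawler name for a remote address, or None if unknown."""
    try:
        return _BY_ADDRESS.get(ipaddress.ip_address(remote_addr))
    except ValueError:
        # Not a valid IPv4 or IPv6 address, e.g. a malformed log field.
        return None
```

How the remote address is extracted from a log line depends on the server software, so that part is left out.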


The contact info is taken from the linked sites exactly as it is written there. Contact me if you want your personal info changed or removed, or if you run a crawler and want to provide missing information.


While the number of indexed pages and capsules is not strictly necessary, as a user I would appreciate this information to make an informed choice about which search engine to use. It is omitted here because the numbers may change daily.


Note: `pubnix-aware` means that `example.com/~foo` and `example.com/~bar` are treated as different sites, each with its own `robots.txt`. That makes `example.com/~foo` equivalent to `foo.example.com`. It also allows telling individual pages and people’s Gemini capsules apart, which gives better statistics.
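
As a rough illustration of what `pubnix-aware` could mean in code, here is a sketch that derives the "site" a URL belongs to. It assumes that a user directory is always the first path segment and starts with `~`; the function name `site_key` is hypothetical, and real crawlers may decide this differently.

```python
from urllib.parse import urlsplit


def site_key(url: str) -> str:
    """Return the unit a pubnix-aware crawler might treat as one site."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    segments = [s for s in parts.path.split("/") if s]
    # Assumption: a user capsule lives under a first path segment like "~foo".
    if segments and segments[0].startswith("~") and len(segments[0]) > 1:
        return f"{host}/{segments[0]}"
    return host
```

With this, `gemini://example.com/~foo/index.gmi` and `gemini://example.com/~bar/` end up in different sites, while everything on `foo.example.com` stays in one.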


Last update: 2024-03-22


Indexers


Geminispace.info

WILL SHUT DOWN ON 2024-06-01

IPv4: 82.165.79.210

IPv6: 2a02:247a:207:8e00:1::1

robots.txt user agents: gus, indexer, *

index update date: yes (but might not have crawled the whole Geminispace), gemini://geminispace.info/statistics

pubnix-aware: yes

content types: text/gemini, text/plain, text/markdown

content size limit: 10 MB text

crawl frequency: every three days, beginning on the first of the month

crawl delay: ?

rate limited: ?

redirects: yes

removal after unavailability: 1 month

filtered URIs: gemini://geminispace.info/documentation/filters

API: no

contact: spacecaptain@geminispace.info or ~rwa/geminispace.info@lists.sr.ht

repository: https://sr.ht/~rwa/geminispace.info
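
To get an idea of when to expect this crawler in the logs, the crawl frequency above can be turned into a list of dates. This is only one reading of "every three days beginning on the first of the month" (the 1st, 4th, 7th, and so on, restarting each month); the function name `crawl_days` is made up.

```python
import calendar
from datetime import date


def crawl_days(year: int, month: int, step: int = 3) -> list:
    """List the assumed crawl dates of one month: day 1, 1 + step, ..."""
    days_in_month = calendar.monthrange(year, month)[1]
    return [date(year, month, d) for d in range(1, days_in_month + 1, step)]
```

Under this reading, a month with 31 days ends with a crawl on the 31st, followed by another one the next day.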


Kennedy

IPv4: 64.149.155.184

IPv6: 2600:1700:1731:d0f:35a7:42d4:c71f:a02b

robots.txt user agents: indexer, *

index update date: gemini://kennedy.gemi.dev/stats

pubnix-aware: ?

content types: text/gemini, text/plain, images

content size limit: 10 MB

crawl frequency: monthly

crawl delay: 1.5 s

rate limited: yes

redirects: yes

removal after unavailability: immediately with the next crawl

filtered URIs: yes

API: no

contact: acidus at gemi dot dev

repository: https://github.com/acidus99/Kennedy
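
A crawl delay like the 1.5 s above is typically applied per host. The following is a minimal sketch of such bookkeeping, not Kennedy’s actual implementation; the class `HostDelay` and its methods are invented for this example, and the current time is passed in so the logic stays easy to test.

```python
class HostDelay:
    """Tracks when each host may be contacted next."""

    def __init__(self, delay_seconds: float = 1.5):
        self.delay_seconds = delay_seconds
        self._next_allowed = {}  # host -> earliest time of the next request

    def wait_time(self, host: str, now: float) -> float:
        """Return how long to sleep before contacting host at time now."""
        return max(0.0, self._next_allowed.get(host, now) - now)

    def record_request(self, host: str, now: float) -> None:
        """Note that a request to host was sent at time now."""
        self._next_allowed[host] = now + self.delay_seconds
```

In a real crawler, `now` would come from something like `time.monotonic()`, and the result of `wait_time` would be passed to `time.sleep()`.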


TLGS

IPv4: 23.88.52.182

IPv6: 2a01:4f8:c0c:8d14::1

robots.txt user agents: tlgs, indexer, *

index update date: gemini://tlgs.one/statistics

pubnix-aware: ?

content types: text/gemini, text/plain, text/markdown, text/x-rst, plaintext

content size limit: 2.5 MB text

crawl frequency: ?

crawl delay: ?

rate limited: ?

redirects: yes, max. 5

removal after unavailability: 1 month

filtered URIs: yes

API: limited, gemini://tlgs.one/doc/api

contact: pilot \at tlgs.one

repository: https://github.com/marty1885/tlgs
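
A redirect limit such as "max. 5" can be sketched as follows. This is not TLGS code: `fetch` is a caller-supplied function returning `(status, meta, body)`, all names are hypothetical, and for brevity the sketch assumes that redirect targets are absolute URLs, although Gemini also allows relative ones.

```python
MAX_REDIRECTS = 5


class TooManyRedirects(Exception):
    """Raised when a chain of redirects exceeds the limit."""


def fetch_following_redirects(url, fetch, max_redirects=MAX_REDIRECTS):
    """Fetch url, following at most max_redirects redirects (status 3x)."""
    for _ in range(max_redirects + 1):
        status, meta, body = fetch(url)
        if 30 <= status <= 39:
            # Assumption: meta holds an absolute URL.
            url = meta
            continue
        return url, status, meta, body
    raise TooManyRedirects(url)
```

The loop allows one initial request plus five redirected ones; a sixth redirect raises the exception instead of being followed.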


Gemplex

IPv4: n/a

IPv6: 2a01:4f9:c012:c6f7::1

robots.txt user agents: elektito/gemplex, crawler, indexer, researcher

index update date: no

pubnix-aware: ?

content types: ?

content size limit: ?

crawl frequency: ?

crawl delay: 1 s

rate limited: yes

redirects: yes, max. 5

removal after unavailability: no

filtered URIs: yes

API: no

contact: mostafa@sepent.com

repository: https://git.sr.ht/~elektito/gemplex
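
To show how a list of `robots.txt` user agents like the one above might be applied, here is a small sketch. It assumes that the first listed agent that appears in the file decides, and that `Disallow` values are plain path prefixes; actual crawlers may combine groups differently. The functions `parse_robots` and `is_allowed` are made up for this example.

```python
def parse_robots(text):
    """Return {user_agent: [disallowed path prefixes]} from robots.txt text."""
    rules, agents, in_rules = {}, [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a new group starts after rule lines
                agents, in_rules = [], False
            agents.append(value.lower())
            rules.setdefault(value.lower(), [])
        elif field == "disallow":
            in_rules = True
            if value:  # an empty Disallow permits everything
                for agent in agents:
                    rules[agent].append(value)
    return rules


def is_allowed(rules, crawler_agents, path):
    """Apply the first entry of crawler_agents that the robots.txt mentions."""
    for agent in crawler_agents:
        if agent in rules:
            return not any(path.startswith(p) for p in rules[agent])
    return True
```

Since the list above does not contain `*`, a `User-agent: *` group would have no effect on this crawler under these assumptions, whereas a crawler listing `indexer, *` would fall back to it.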


Freeshell Hashtags

IPv4: 116.202.128.144

IPv6: 2a01:4f8:231:482b::2

robots.txt user agents: hashtags, indexer, *

index update date: gemini://freeshell.de/hashtags/stats.gmi

pubnix-aware: no

content types: text/gemini

content size limit: none

crawl frequency: once

crawl delay: 5 s, but from a randomized queue of all indexed pages

rate limited: after a “44 SLOW DOWN” response, no more pages will be requested

redirects: not handled correctly

removal after unavailability: no

filtered URIs: yes

API: no

contact: jbanana at dreamwidth dot org

repository: not open source
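
Detecting a “44 SLOW DOWN” response only requires looking at the response header, which consists of a status code, a space, and a meta string. The sketch below is not this crawler’s code; the function names are hypothetical, and it relies on status 44 carrying the number of seconds to wait as its meta value.

```python
from typing import Optional, Tuple


def parse_header(header: str) -> Tuple[int, str]:
    """Split a Gemini response header into (status, meta)."""
    status, _, meta = header.rstrip("\r\n").partition(" ")
    return int(status), meta


def slow_down_seconds(header: str) -> Optional[int]:
    """Return the requested wait in seconds for status 44, else None."""
    status, meta = parse_header(header)
    if status != 44:
        return None
    # Fall back to 0 if the server sent something other than a number.
    return int(meta) if meta.isdigit() else 0
```

A crawler behaving as described above would stop requesting pages from the host as soon as `slow_down_seconds` returns something other than `None`.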


AuraGem

IPv4: 67.60.44.127

IPv6: n/a

robots.txt user agents: indexer, *

index update date: gemini://auragem.letz.dev/search/stats

pubnix-aware: ?

content types: ?

content size limit: ?

crawl frequency: ?

crawl delay: 2 s

rate limited: yes

redirects: ?

removal after unavailability: ?

filtered URIs: ?

API: no

contact: krixano@mailbox.org

repository: https://gitlab.com/clseibold/auragem_crawler


Archivers


DeLorean Time Machine

(like Kennedy above)

robots.txt user agents: indexer, archiver, *

content types: anything

crawl frequency: weekly


Researchers


Lupa

IPv4: 193.70.85.11

IPv6: 2001:41d0:302:2200::180

robots.txt user agents: ?

index update date: gemini://gemini.bortzmeyer.org/software/lupa/stats.gmi

pubnix-aware: no

content types: ?

content size limit: ?

crawl frequency: ?

crawl delay: ?

rate limited: ?

redirects: yes, but not for the current crawl

removal after unavailability: 46 days

filtered URIs: ?

API: no

contact: stephane+gemini@bortzmeyer.org

repository: https://framagit.org/bortzmeyer/lupa
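
The "removal after unavailability" field boils down to a date comparison. A minimal sketch, assuming that the period counts from the last successful fetch and that removal happens only once more than 46 days have passed (both are guesses, and `should_remove` is a made-up name):

```python
from datetime import date, timedelta


def should_remove(last_success: date, today: date, grace_days: int = 46) -> bool:
    """True if the last successful fetch is more than grace_days ago."""
    return today - last_success > timedelta(days=grace_days)
```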


Webproxies


(To do)


Aggregators


Antenna

IPv4: 109.228.177.104

IPv6: n/a

robots.txt user agents: ?

index update date: manual through user submissions

feed support: Gemini page subscriptions, Atom, RSS, twtxt

content size limit: ?

crawl frequency: on manual feed submission; the feed is crawled within 5 minutes at most

crawl delay: ?

redirects: ?

removal after unavailability: n/a

filtered URIs: user-configurable

API: no

contact: bjorn.warmedal@gmail.com, ew0k@tilde.team

repository: https://notabug.org/tinyrabbit/gemini-antenna

note: feed has to be submitted every time it should be crawled
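
"Gemini page subscriptions" means that an ordinary gemtext page serves as the feed: link lines whose label starts with a date in `YYYY-MM-DD` format count as entries. Here is a rough sketch of extracting them; it is not Antenna’s code, the names are invented, and the handling of the separator between date and title is a simplification.

```python
import re

# A link line whose label starts with an ISO date, e.g.
# "=> /posts/a.gmi 2024-03-22 - First post"
ENTRY = re.compile(r"^=>\s*(\S+)\s+(\d{4}-\d{2}-\d{2})(?:\s+(.*))?$")


def feed_entries(gemtext: str):
    """Return (url, date, title) tuples for all dated link lines."""
    entries = []
    for line in gemtext.splitlines():
        match = ENTRY.match(line)
        if match:
            url, day, title = match.groups()
            # Drop a leading separator such as "-" or ":" from the title.
            entries.append((url, day, (title or "").lstrip("-: ").strip()))
    return entries
```

Link lines without a date, such as a link to an about page, are simply skipped.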


CAPCOM

IPv4: 168.235.111.58

IPv6: 2604:180:f3::185

robots.txt user agents: ?

index update date: no, but there are post dates

feed support: Gemini page subscriptions, Atom

content size limit: ?

crawl frequency: every 8 hours, for 100 randomly selected feeds; the selection changes monthly

crawl delay: ?

redirects: ?

removal after unavailability: ?

API: no

contact: none officially published

repository: https://git.sr.ht/~solderpunk/capcom

note: feed has to be submitted once
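
One way to get a random selection of 100 feeds that stays the same for a whole month is to seed the random number generator with the year and month. I do not know whether CAPCOM does it this way; the sketch below only illustrates the idea, and `monthly_selection` is a hypothetical name.

```python
import random
from datetime import date


def monthly_selection(feeds, today: date, size: int = 100):
    """Pick up to size feeds; the pick is stable within a calendar month."""
    # Seeding with "YYYY-MM" makes every run in the same month agree.
    rng = random.Random(f"{today.year}-{today.month:02d}")
    pool = sorted(feeds)  # fixed order, so the sample is reproducible
    return rng.sample(pool, min(size, len(pool)))
```

A feed that is not part of the current selection would then not be fetched until a later month picks it.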


gmisub aggregate

IPv4: 95.111.237.17

IPv6: 2a02:c207:2038:2970::1

robots.txt user agents: ?

index update date: no, but there are post dates

feed support: Gemini page subscriptions

content size limit: ?

crawl frequency: twice a day

crawl delay: ?

redirects: ?

removal after unavailability: ?

API: no

contact: callum@calcuode.com

repository: https://git.sr.ht/~callum/gmisub

note: feed has to be submitted once, no Ed25519 certificate support


Nytpu’s Comitium Subscriptions

IPv4: 71.19.146.159

IPv6: 2605:2700:0:3:a800:ff:fee7:3f74

robots.txt user agents: ?

index update date: yes

feed support: Gemini page subscriptions, Atom, RSS, and JSON

content size limit: ?

crawl frequency: ?

crawl delay: ?

redirects: ?

removal after unavailability: ?

API: no

contact: alex [at] nytpu.com

repository: https://git.nytpu.com/comitium/


Spacewalk

IPv4: ?

IPv6: ?

robots.txt user agents: ?

index update date: no, but there are post dates

feed support: changes to a given index page

content size limit: ?

crawl frequency: ?

crawl delay: ?

redirects: ?

removal after unavailability: ?

API: no

contact: ‘sloum’ at the host ‘rawtext.club’

repository: https://tildegit.org/sloum/spacewalk


Juntaletras

IPv4: 45.157.179.215

IPv6: n/a

robots.txt user agents: ?

index update date: no, but there are post dates

feed support: ?

content size limit: ?

crawl frequency: every two hours

crawl delay: ?

redirects: ?

removal after unavailability: ?

API: no

contact: ?

repository: ?


Cosmos

Meta-aggregator of BBS, CAPCOM, Antenna, Nytpu’s Comitium Subscriptions, gmisub aggregate, bot en deriva, Spacewalk, The Midnight Pub, Smol Pub, Flounder, Station, and Geddit. Some sources (especially the Spacewalk ones) spawn other gemlog-type sources, which are then remembered and fetched separately.

IPv4: 85.156.143.233

IPv6: n/a

robots.txt user agents: ?

index update date: no, but there are post dates

content size limit: ?

crawl frequency: 60-90 minutes (randomized to spread the load over time)

crawl delay: ?

redirects: ?

removal after unavailability: ?

filtered URIs: yes

API: no

contact: jaakko.keranen@iki.fi

repository: not open source
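
The randomized crawl frequency of 60-90 minutes can be pictured as drawing a new waiting time after each crawl. Since Cosmos is not open source, this is purely an illustration with made-up names, assuming a uniform distribution:

```python
import random


def next_crawl_delay(rng: random.Random, low_minutes: float = 60,
                     high_minutes: float = 90) -> float:
    """Return the seconds to wait until the next crawl of a source."""
    return rng.uniform(low_minutes, high_minutes) * 60
```

Because each source draws its own delay, sources that start out in sync drift apart over time, which is what spreads the load.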


Indexers Exposing Content to the Web


GNV Smallweb Index

IPv4: dynamic

IPv6: ?

robots.txt user agents: ?

index update date: no

pubnix-aware: ?

content types: ?

content size limit: ?

crawl frequency: ?

crawl delay: ?

rate limited: ?

redirects: ?

removal after unavailability: ?

filtered URIs: ?

API: no

contact: @louis@emacs.ch (Fediverse)

repository: https://codeberg.org/louis77/gnv

-- Page fetched on Sat May 11 21:08:53 2024