Here is some information about the different Geminispace crawlers, for people who check their log files every now and then and want to know what’s going on. Ideally, each crawler’s owner would publish this information prominently in one place on the crawler’s own site, so that I could delete this page.
The contact information is reproduced exactly as it is written on the linked sites. Contact me if you want your personal information changed or removed, or if you run a crawler and want to supply missing information.
While the number of indexed pages and capsules is not strictly necessary, as a user I would appreciate this information to make an informed choice about which search engine to use. It is omitted here because the numbers may change daily.
Note: `pubnix-aware` means that `example.com/~foo` and `example.com/~bar` are treated as different sites, each with its own `robots.txt`. This makes `example.com/~foo` equivalent to `foo.example.com`. It also lets a crawler tell a host’s own pages and individual people’s Gemini capsules apart, which gives better statistics.
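To make the `pubnix-aware` distinction concrete, here is a minimal sketch of how a crawler could derive the site a URL belongs to and where it would look for that site’s `robots.txt`. The function names `site_key` and `robots_url` are made up for this illustration, and the rule "first path segment starting with `~`" is an assumption; real crawlers may decide differently.

```python
from urllib.parse import urlsplit


def site_key(url: str, pubnix_aware: bool = True) -> str:
    """Return the prefix under which a URL is grouped as one site.

    Hypothetical helper: a pubnix-aware crawler is assumed to treat
    "host/~user" as its own site, any other crawler only "host".
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Non-empty path segments, e.g. "/~foo/log/" -> ["~foo", "log"]
    segments = [s for s in parts.path.split("/") if s]
    first = segments[0] if segments else ""
    if pubnix_aware and first.startswith("~") and len(first) > 1:
        return f"{host}/{first}"
    return host


def robots_url(url: str, pubnix_aware: bool = True) -> str:
    """Return the robots.txt location that would apply to the URL."""
    return f"gemini://{site_key(url, pubnix_aware)}/robots.txt"


if __name__ == "__main__":
    for u in ("gemini://example.com/~foo/log/post.gmi",
              "gemini://example.com/~bar/",
              "gemini://example.com/about.gmi"):
        print(u, "->", robots_url(u))
```

With `pubnix_aware=False`, all three URLs above fall under `example.com` and share a single `robots.txt`.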
Last update: 2024-03-22
WILL SHUT DOWN ON 2024-06-01
IPv4: 82.165.79.210
IPv6: 2a02:247a:207:8e00:1::1
robots.txt user agents: gus, indexer, *
index update date: yes (but might not have crawled the whole Geminispace), gemini://geminispace.info/statistics
pubnix-aware: yes
content types: text/gemini, text/plain, text/markdown
content size limit: 10 MB text
crawl frequency: Every three days beginning on the first of the month
crawl delay: ?
rate limited: ?
redirects: yes
removal after unavailability: 1 month
filtered URIs: gemini://geminispace.info/documentation/filters
API: no
contact: spacecaptain@geminispace.info or ~rwa/geminispace.info@lists.sr.ht
repository: https://sr.ht/~rwa/geminispace.info
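The `robots.txt user agents` entries list the names a crawler looks for in a capsule’s `robots.txt`. As a rough sketch, a capsule owner could check what a given rule set means for such a list with Python’s standard `urllib.robotparser`. The sample `robots.txt`, the function name `may_crawl`, and the policy "a disallow for any of the listed names counts" are assumptions made for this illustration; an actual crawler may resolve its names differently.

```python
from urllib.robotparser import RobotFileParser

# Sample rules (made up for this sketch): block everything that
# identifies as "indexer" from /private/, allow everyone else everywhere.
ROBOTS_TXT = """\
User-agent: indexer
Disallow: /private/

User-agent: *
Disallow:
"""


def may_crawl(robots_txt: str, agent_names: list[str], url: str) -> bool:
    """Return True only if every listed agent name is allowed to fetch url.

    Assumption: the crawler is conservative and treats a disallow for
    any one of its names as binding.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return all(parser.can_fetch(name, url) for name in agent_names)


if __name__ == "__main__":
    names = ["gus", "indexer", "*"]
    for u in ("gemini://example.com/private/notes.gmi",
              "gemini://example.com/index.gmi"):
        print(u, "->", may_crawl(ROBOTS_TXT, names, u))
```

Under these assumptions, a rule for `indexer` is enough to keep such a crawler out of a path, without naming the crawler itself.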
IPv4: 64.149.155.184
IPv6: 2600:1700:1731:d0f:35a7:42d4:c71f:a02b
robots.txt user agents: indexer, *
index update date: gemini://kennedy.gemi.dev/stats
pubnix-aware: ?
content types: text/gemini, text/plain, images
content size limit: 10 MB
crawl frequency: monthly
crawl delay: 1.5 s
rate limited: yes
redirects: yes
removal after unavailability: immediately with the next crawl
filtered URIs: yes
API: no
contact: acidus at gemi dot dev
repository: https://github.com/acidus99/Kennedy
IPv4: 23.88.52.182
IPv6: 2a01:4f8:c0c:8d14::1
robots.txt user agents: tlgs, indexer, *
index update date: gemini://tlgs.one/statistics
pubnix-aware: ?
content types: text/gemini, text/plain, text/markdown, text/x-rst, plaintext
content size limit: 2.5 MB text
crawl frequency: ?
crawl delay: ?
rate limited: ?
redirects: yes, max. 5
removal after unavailability: 1 month
filtered URIs: yes
API: limited, gemini://tlgs.one/doc/api
contact: pilot \at tlgs.one
repository: https://github.com/marty1885/tlgs
IPv4: n/a
IPv6: 2a01:4f9:c012:c6f7::1
robots.txt user agents: elektito/gemplex, crawler, indexer, researcher
index update date: no
pubnix-aware: ?
content types: ?
content size limit: ?
crawl frequency: ?
crawl delay: 1 s
rate limited: yes
redirects: 5
removal after unavailability: no
filtered URIs: yes
API: no
contact: mostafa@sepent.com
repository: https://git.sr.ht/~elektito/gemplex
IPv4: 116.202.128.144
IPv6: 2a01:4f8:231:482b::2
robots.txt user agents: hashtags, indexer, *
index update date: gemini://freeshell.de/hashtags/stats.gmi
pubnix-aware: no
content types: text/gemini
content size limit: none
crawl frequency: once
crawl delay: 5 s, but from a randomized queue of all indexed pages
rate limited: on “44 SLOW DOWN” no more pages will be requested
redirects: not handled correctly
removal after unavailability: no
filtered URIs: yes
API: no
contact: jbanana at dreamwidth dot org
repository: not open source
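The `crawl delay` and `rate limited` entries above describe a crawler that waits between requests and stops when a server answers with status 44 (SLOW DOWN), whose meta field carries the number of seconds to wait. The following is only a sketch of that behaviour, not the crawler’s actual code, which is not published; the names `parse_header` and `next_action` and the 5 s default are illustrative.

```python
def parse_header(line: str) -> tuple[int, str]:
    """Split a Gemini response header "<status> <meta>" into its parts."""
    status, _, meta = line.rstrip("\r\n").partition(" ")
    return int(status), meta


def next_action(header_line: str, default_delay: float = 5.0) -> tuple[str, float]:
    """Decide what to do after a response (hypothetical policy).

    On status 44 the crawl stops; the returned number is the wait the
    server asked for. Otherwise the crawl continues after the default delay.
    """
    status, meta = parse_header(header_line)
    if status == 44:
        wait = float(meta) if meta.isdigit() else default_delay
        return ("stop", wait)
    return ("continue", default_delay)


if __name__ == "__main__":
    print(next_action("20 text/gemini;lang=en-US\r\n"))
    print(next_action("44 60\r\n"))
```

A crawler that keeps going after a 44 would instead sleep for the returned number of seconds before its next request to that host.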
IPv4: 67.60.44.127
IPv6: n/a
robots.txt user agents: indexer, *
index update date: gemini://auragem.letz.dev/search/stats
pubnix-aware: ?
content types: ?
content size limit: ?
crawl frequency: ?
crawl delay: 2 s
rate limited: yes
redirects: ?
removal after unavailability: ?
filtered URIs: ?
API: no
contact: krixano@mailbox.org
repository: https://gitlab.com/clseibold/auragem_crawler
(like Kennedy above)
robots.txt user agents: indexer, archiver, *
content types: anything
crawl frequency: weekly
IPv4: 193.70.85.11
IPv6: 2001:41d0:302:2200::180
robots.txt user agents: ?
index update date: gemini://gemini.bortzmeyer.org/software/lupa/stats.gmi
pubnix-aware: no
content types: ?
content size limit: ?
crawl frequency: ?
crawl delay: ?
rate limited: ?
redirects: yes, but not for the current crawl
removal after unavailability: 46 days
filtered URIs: ?
API: no
contact: stephane+gemini@bortzmeyer.org
repository: https://framagit.org/bortzmeyer/lupa
(To do)
IPv4: 109.228.177.104
IPv6: n/a
robots.txt user agents: ?
index update date: manual through user submissions
feed support: Gemini page subscriptions, Atom, RSS, twtxt
content size limit: ?
crawl frequency: on manual feed submission; the feed is crawled within 5 minutes
crawl delay: ?
redirects: ?
removal after unavailability: n/a
filtered URIs: user-configurable
API: no
contact: bjorn.warmedal@gmail.com, ew0k@tilde.team
repository: https://notabug.org/tinyrabbit/gemini-antenna
note: feed has to be submitted every time it should be crawled
IPv4: 168.235.111.58
IPv6: 2604:180:f3::185
robots.txt user agents: ?
index update date: no, but there are post dates
feed support: Gemini page subscriptions, Atom
content size limit: ?
crawl frequency: every 8 hours, for 100 randomly selected feeds; the selection changes monthly
crawl delay: ?
redirects: ?
removal after unavailability: ?
API: no
contact: none officially published
repository: https://git.sr.ht/~solderpunk/capcom
note: feed has to be submitted once
IPv4: 95.111.237.17
IPv6: 2a02:c207:2038:2970::1
robots.txt user agents: ?
index update date: no, but there are post dates
feed support: Gemini page subscriptions
content size limit: ?
crawl frequency: twice a day
crawl delay: ?
redirects: ?
removal after unavailability: ?
API: no
contact: callum@calcuode.com
repository: https://git.sr.ht/~callum/gmisub
note: feed has to be submitted once, no Ed25519 certificate support
IPv4: 71.19.146.159
IPv6: 2605:2700:0:3:a800:ff:fee7:3f74
robots.txt user agents: ?
index update date: yes
feed support: Gemini page subscriptions, Atom, RSS, and JSON
content size limit: ?
crawl frequency: ?
crawl delay: ?
redirects: ?
removal after unavailability: ?
API: no
contact: alex [at] nytpu.com
repository: https://git.nytpu.com/comitium/
IPv4: ?
IPv6: ?
robots.txt user agents: ?
index update date: no, but there are post dates
feed support: changes to a given index page
content size limit: ?
crawl frequency: ?
crawl delay: ?
redirects: ?
removal after unavailability: ?
API: no
contact: ‘sloum’ at the host ‘rawtext.club’
repository: https://tildegit.org/sloum/spacewalk
IPv4: 45.157.179.215
IPv6: n/a
robots.txt user agents: ?
index update date: no, but there are post dates
feed support: ?
content size limit: ?
crawl frequency: every two hours
crawl delay: ?
redirects: ?
removal after unavailability: ?
API: no
contact: ?
repository: ?
Meta-aggregator of BBS, CAPCOM, Antenna, nytpu’s comitium subscriptions, gmisub aggregate, bot en deriva, Spacewalk, The Midnight Pub, Smol Pub, Flounder, Station, Geddit. Some sources (especially the Spacewalk ones) spawn other gemlog-type sources that are then remembered and fetched separately.
IPv4: 85.156.143.233
IPv6: n/a
robots.txt user agents: ?
index update date: no, but there are post dates
content size limit: ?
crawl frequency: 60-90 minutes (randomized to spread the load over time)
crawl delay: ?
redirects: ?
removal after unavailability: ?
filtered URIs: yes
API: no
contact: jaakko.keranen@iki.fi
repository: not open source
IPv4: dynamic
IPv6: ?
robots.txt user agents: ?
index update date: no
pubnix-aware: ?
content types: ?
content size limit: ?
crawl frequency: ?
crawl delay: ?
rate limited: ?
redirects: ?
removal after unavailability: ?
filtered URIs: ?
API: no
contact: @louis@emacs.ch (Fediverse)
repository: https://codeberg.org/louis77/gnv