[ANN] Gemini historical snapshot

📧 Messages: 2

🗣️ Authors: 2

📅 First Message: 2020-11-18 03:30

📅 Last Message: 2020-11-18 04:29

1. Michael Lazar (lazar.michael22 (a) gmail.com)

📅 Sent: 2020-11-18 03:30

📧 Message 1 of 2


Greetings,

I'm happy to report that I have finished my effort to create a historical
snapshot of geminispace and upload it to the Internet Archive. I ended up
running three separate crawls, spaced over a few months. In total there
were 115,223 unique gemini URLs captured. Here are some general
statistics and download links:

Crawl           | September  | October    | November
---             | ---        | ---        | ---
Date            | 2020-09-24 | 2020-10-31 | 2020-11-07
Size            | 9.3 GB     | 12.9 GB    | 13.5 GB
Domains seen    | 283        | 276        | 314
Total Responses | 51,995     | 71,632     | 65,347
2x Responses    | 43,425     | 61,771     | 56,680
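
As a reading aid for the table above, the share of 2x (success) responses in each crawl can be worked out from the two response rows. This is only a sketch of the arithmetic: the figures are copied from the table, and the dictionary layout and the helper name `success_share` are made up for illustration.

```python
# Response counts copied from the table above: (total responses, 2x responses).
crawls = {
    "September": (51_995, 43_425),
    "October": (71_632, 61_771),
    "November": (65_347, 56_680),
}


def success_share(total: int, successes: int) -> float:
    """Return the percentage of responses with a 2x (success) status."""
    return 100.0 * successes / total


# Hypothetical summary structure: crawl name -> percentage of 2x responses.
shares = {name: success_share(total, ok) for name, (total, ok) in crawls.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1f}% of responses were 2x")
```

By this calculation roughly 83.5%, 86.2%, and 86.7% of the responses in the three crawls were successes.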

https://archive.org/details/mozz-gemini-crawl-2020-1
https://archive.org/details/mozz-gemini-crawl-2020-2
https://archive.org/details/mozz-gemini-crawl-2020-3

More information on the crawls can be found here:

gemini://mozz.us/archive/

The crawling software and related tools can be found here:

https://github.com/michael-lazar/mozz-archiver

I am also temporarily hosting a mirror of this snapshot on my gemini server.
It works using proxy URLs (which I thought was a neat idea). You can send any
request for a gemini URL to mozz.us:1966, and the server will attempt to
retrieve that URL from the snapshot.

Example using gemget:

$ gemget --proxy mozz.us:1966 -o - gemini://gemini.circumlunar.space/capcom/
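
For readers without gemget, the same proxy request can be sketched by hand in Python. A Gemini request is the absolute URL followed by CRLF, so proxying amounts to opening the TLS connection to mozz.us:1966 instead of the host named in the URL. The function names below are illustrative and not taken from mozz-archiver or gemget. Certificate verification is switched off on the assumption that the server uses a self-signed certificate, as is common in geminispace. Since the mirror was described as temporary, it may no longer answer.

```python
import socket
import ssl

MAX_URL_BYTES = 1024  # request URL limit from the Gemini specification


def build_request(url: str) -> bytes:
    """A Gemini request is the absolute URL followed by CRLF."""
    encoded = url.encode("utf-8")
    if len(encoded) > MAX_URL_BYTES:
        raise ValueError("URL exceeds 1024 bytes")
    return encoded + b"\r\n"


def parse_header(line: bytes) -> tuple[int, str]:
    """Split a response header such as b'20 text/gemini' into (status, meta)."""
    text = line.decode("utf-8").rstrip("\r\n")
    status, _, meta = text.partition(" ")
    return int(status), meta


def fetch_via_proxy(url: str, proxy_host: str = "mozz.us",
                    proxy_port: int = 1966,
                    timeout: float = 10.0) -> tuple[int, str, bytes]:
    """Send `url` to the proxy rather than to the host named in the URL."""
    # Assumption: self-signed certificate, so verification is disabled here.
    # A real client would pin the certificate (trust on first use) instead.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=proxy_host) as tls:
            tls.sendall(build_request(url))
            data = b""
            while True:
                chunk = tls.recv(4096)
                if not chunk:  # the server closes the connection when done
                    break
                data += chunk

    header, _, body = data.partition(b"\r\n")
    status, meta = parse_header(header)
    return status, meta, body
```

Calling fetch_via_proxy("gemini://gemini.circumlunar.space/capcom/") would then correspond to the gemget command above, returning the status code, the meta line, and the body as stored in the snapshot.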

Best,
Michael

2. colecmac (a) protonmail.com (colecmac (a) protonmail.com)

📅 Sent: 2020-11-18 04:29

📧 Message 2 of 2


This is amazing! As an amateur archivist, I really appreciate this.
I predict geminispace will only grow, and this will be more and more
valuable as time goes on. Will be seeding all three :)

The more I look into this, the more pleased I am. Thanks for doing it
so well.

* Doing multiple crawls
* Uploading to the Internet Archive
* Making the archiving software reproducible and open source
* Writing a detailed description on IA
* Using WARC (first gemini usage afaik)

The proxy server is also a really cool idea; what a cool usage of proxying.
And thanks for the gemget shoutout!

Cheers,
makeworld


-- Page fetched on Fri Jun 7 21:27:15 2024