Gemini Archiving and WARC


Charles E. Lehner cel at celehner.com

Wed Sep 2 00:43:34 BST 2020


- - - - - - - - - - - - - - - - - - -

Hi Gemini List,


Has anyone thought about, or implemented, archiving of Gemini content/traffic?


WARC (Web ARChive)¹ is a standard format used for web archiving. It uses text headers for metadata like in HTTP and email. It looks to me like WARC could be adapted for Gemini. The WARC spec supports multiple URI schemes, although it doesn't specify any other than http/https, ftp, and dns². Bespoke formats could also be used, of course, or just downloading files wget-style, but using a standard format could allow for interop with "the WARC ecosystem"³.
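To make the idea concrete, here is a minimal sketch in Python of how a single Gemini response might be wrapped in a WARC 1.1 "response" record. It follows the general WARC record layout (a version line, named header fields, a blank line, the content block, then two CRLFs). The media type `application/x-gemini; msgtype=response` is purely an assumption made for illustration, since the WARC spec defines no Gemini counterpart to `application/http;msgtype=response`. The function name and the `gemini://example.org/` URI are likewise hypothetical.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional

# Assumption: no media type is defined for Gemini exchanges in WARC,
# so this value is a hypothetical placeholder, not part of any spec.
GEMINI_RESPONSE_TYPE = "application/x-gemini; msgtype=response"


def build_warc_record(target_uri: str, payload: bytes,
                      when: Optional[datetime] = None) -> bytes:
    """Wrap a raw Gemini response (header line plus body) in a WARC record."""
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    headers = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", when.strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", GEMINI_RESPONSE_TYPE),
        ("Content-Length", str(len(payload))),
    ]
    head = "WARC/1.1\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers)
    # A blank line ends the header fields; two CRLFs terminate the record.
    return head.encode("utf-8") + b"\r\n" + payload + b"\r\n\r\n"


# Example: the bytes a Gemini server might send for a small text/gemini page.
raw = b"20 text/gemini\r\n# Hello\n"
record = build_warc_record("gemini://example.org/", raw)
print(record.decode("utf-8"))
```

Storing the server's response exactly as received, status line included, mirrors how WARC keeps the full HTTP response rather than only the body, so the status code and MIME type survive in the archive.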


Archive Team⁴ has also worked on archiving non-HTTP protocols like FTP⁵ and Gopher⁶.


I think there is an opportunity for people to maintain high-quality archives of Gemini content, like what the Internet Archive⁷ and archive.today⁸ do for the HTTP(S) Web. Now is a good time to start, while many of the original Gemini hosts⁹ are still online.


Regards,
Charles E. Lehner


¹ https://en.wikipedia.org/wiki/Web_ARChive
² https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/#ftp-scheme
³ https://www.archiveteam.org/index.php?title=The_WARC_Ecosystem
⁴ https://www.archiveteam.org/ https://en.wikipedia.org/wiki/Archive_Team
⁵ https://www.archiveteam.org/index.php?title=FTP
⁶ https://www.archiveteam.org/index.php?title=Gopher
⁷ https://en.wikipedia.org/wiki/Internet_Archive https://archive.org/
⁸ https://archive.today https://en.wikipedia.org/wiki/Archive.today
⁹ gemini://gemini.circumlunar.space/servers/

