Pagat Archive

siiky

2022/11/09

2022/11/09

en


Sent an email asking for permission to make a mirror/archive. This time it was for Pagat, a site with tons and tons of card games. And like last time, permission was given provided that I don't make any archives/mirrors public. Fair enough!


Pagat

Content-based Mirrors


My Raspberry Pi has been busy downloading the whole thing:


wget -o download.log -w 30 --random-wait --mirror -k -K -p -i links.txt
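
For anyone who doesn't have wget's short flags memorized, here is the same invocation spelled with long option names. This is only a dry-run sketch: the `cmd` variable is something I made up for illustration, and the script prints the command instead of running it.

```shell
#!/bin/sh
# Long-option spelling of the wget command above.
#   --output-file=download.log  (-o) write wget's log to download.log
#   --wait=30                   (-w) wait 30 seconds between requests
#   --random-wait                    vary each wait between 0.5x and 1.5x of that
#   --mirror                         recursive download with timestamping, no depth limit
#   --convert-links             (-k) rewrite links so the copy browses locally
#   --backup-converted          (-K) keep a .orig copy of every converted file
#   --page-requisites           (-p) also fetch images, stylesheets and the like
#   --input-file=links.txt      (-i) read the starting URLs from links.txt
# `cmd` is a hypothetical helper variable, not part of the original command.
cmd='wget --output-file=download.log --wait=30 --random-wait --mirror --convert-links --backup-converted --page-requisites --input-file=links.txt'

# Dry run: print the command only. Replace this line with a bare $cmd to run it.
echo "$cmd"
```

The 30 second wait combined with --random-wait is what keeps the download gentle on the server, at the cost of the whole thing taking a long time.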

The links.txt file was generated from the sitemap.xml with this CHICKEN script:


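(import srfi-1 ssax)
;; Read the sitemap XML from standard input and parse it into SXML.
(let* ((sitemap (ssax:xml->sxml (current-input-port) '()))
       ;; The third element of the SXML tree is the urlset node; its cdr is the list of url entries.
       (entries (cdaddr sitemap))
       ;; For each url entry, look up its namespaced loc child and take its text.
       (urls (map (o car (cute alist-ref 'http://www.sitemaps.org/schemas/sitemap/0.9:loc <>) cdr) entries)))
  ;; Print one URL per line.
  (for-each print urls))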

https://www.pagat.com/sitemap.xml
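
On a machine without CHICKEN, roughly the same list can probably be produced with grep and sed. This is a different technique from the script above (plain text matching, no XML parsing), so treat it as a rough sketch: it assumes every `<loc>` element sits on a single line, that the sitemap uses no namespace prefix on its tags, and that the URLs contain no XML entities such as `&amp;`. The sample sitemap, the example.org URLs, and the file names are made up for the demonstration, and `grep -o` is a common extension rather than POSIX.

```shell
#!/bin/sh
# Hypothetical sample sitemap; the example.org URLs are placeholders.
cat > sample-sitemap.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/</loc></url>
  <url><loc>https://example.org/a.html</loc><lastmod>2022-11-01</lastmod></url>
</urlset>
EOF

# Pull out every <loc>...</loc> element, then strip the surrounding tags.
grep -o '<loc>[^<]*</loc>' sample-sitemap.xml \
  | sed -e 's|<loc>||' -e 's|</loc>||' > sample-links.txt

# Show the result: one URL per line, the format wget -i expects.
cat sample-links.txt
```

The output file is named sample-links.txt here so that trying the sketch doesn't overwrite a real links.txt.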


Some details so far:


$ find www.pagat.com/ -type f | wc -l
2941
$ find www.pagat.com/ -type f -iname '*.html' | wc -l
1812
$ du -bchs www.pagat.com/
66M	www.pagat.com/
66M	total
