
Small Cosmos fix: paths in entry URLs are now normalized so that they contain no `.` or `..` segments.


This should remove some duplicate entries. Keep an eye out for weirdly malformed/broken URLs, in case I introduced any new bugs with this...
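The cleanup described above corresponds to the "remove dot segments" step of RFC 3986 (section 5.2.4). Here is a minimal Python sketch. It assumes absolute paths, as found in `gemini://` URLs with a host. The function names and the `example.org` host are hypothetical illustrations, not Cosmos code:

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments(path: str) -> str:
    """Drop "." and ".." segments from an absolute URL path (RFC 3986 style)."""
    segments = path.split("/")
    out: list[str] = []  # out == [""] represents the root of an absolute path
    for i, seg in enumerate(segments):
        if seg not in (".", ".."):
            out.append(seg)
            continue
        if seg == ".." and (len(out) > 1 or (out and out[0] != "")):
            out.pop()  # ".." removes the previous segment, but never the root
        if i == len(segments) - 1:
            out.append("")  # "/a/b/.." -> "/a/": keep the trailing slash
    return "/".join(out)


def normalize_url(url: str) -> str:
    """Return `url` with dot segments removed from its path; query and fragment are untouched."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=remove_dot_segments(parts.path)))


print(normalize_url("gemini://example.org/a/./b/../c.gmi"))
# gemini://example.org/a/c.gmi
```

After this step, two spellings of the same path map to one string, so duplicate entries can be detected by plain string comparison. This sketch leaves percent-encoded dots (`%2E`) as they are.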


Posted in: s/Cosmos

🕹️ skyjake [mod, sysop]

2023-08-30


2 Comments


🚀 stack · 2023-08-30 at 13:47:

A quick thought: instead of worrying about duplicate paths, check for _duplicate content_ by hashing it.


Since you already have to read each text (to scan for referenced links), a fast FNV-1a hash (one XOR and one multiply per byte) can stand in for its identity and eliminate duplicates. Bernstein's djb2 is another option, with a shift and two adds per byte.


Love Cosmos, btw; thank you!
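The two hashes stack mentions can be sketched in Python as follows. The sketch uses the standard 64-bit FNV-1a parameters and a 32-bit djb2, and assumes both are applied to the raw response bytes rather than to decoded characters. `dedupe_by_content` and the example URLs are hypothetical illustrations, not Cosmos code. Any fixed-size hash can collide, so the sketch confirms a match by comparing stored bodies before treating a page as a duplicate:

```python
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def fnv1a_64(data: bytes) -> int:
    """FNV-1a: XOR in each byte, then multiply by the FNV prime (mod 2**64)."""
    h = FNV64_OFFSET
    for b in data:
        h ^= b
        h = (h * FNV64_PRIME) & MASK64
    return h


def djb2_32(data: bytes) -> int:
    """Bernstein's djb2: h * 33 + c, written as a shift and two adds (mod 2**32)."""
    h = 5381
    for b in data:
        h = ((h << 5) + h + b) & MASK32
    return h


def dedupe_by_content(pages: list[tuple[str, bytes]]) -> list[str]:
    """Keep the first URL seen for each distinct page body (e.g. drop mirrors)."""
    kept: list[str] = []
    buckets: dict[int, list[bytes]] = {}  # hash -> bodies already kept
    for url, body in pages:
        bucket = buckets.setdefault(fnv1a_64(body), [])
        if body in bucket:
            continue  # identical content already kept under another URL
        bucket.append(body)  # new content (or a hash collision with different bytes)
        kept.append(url)
    return kept


pages = [
    ("gemini://example.org/post.gmi", b"# Hello\n"),
    ("gemini://mirror.example.net/post.gmi", b"# Hello\n"),
    ("gemini://example.org/other.gmi", b"# Other\n"),
]
print(dedupe_by_content(pages))
# ['gemini://example.org/post.gmi', 'gemini://example.org/other.gmi']
```

Keeping bodies for the collision check costs memory. A crawler that stores only the 64-bit hash avoids that cost but accepts a small chance of merging two different pages.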


🕹️ skyjake [OP/mod...] · 2023-08-30 at 13:55:

Thanks for the suggestion. Content hashing has crossed my mind before, and it would indeed automatically eliminate all duplicates, including mirrored domains where the URLs are actually different. Something to try out in the future...
