Midnight Pub


Just the Text, Huh?


~starbreaker


Re: "Just Show Me the Text" by m15o


> If anyone knows of a proxy I could give a web URL to and receive a simple .txt version back of the article, please let me know! Otherwise, I might be tempted to create one. Maybe a gopher service?


I don't know about a proxy, but I wonder how far @m15o could get with the following command:


$ lynx -dump -nolist "${URL}" > "${FILENAME}.txt"

If a site depends on JS to render its content, this won't work, since lynx doesn't run JavaScript. But if the text is in the HTML and just buried under entirely too much JS, this might be enough to extract it. You'll still want to massage the output using sed, though.


That's what I did when retrieving and cleaning the Limyaael Rants.
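
For anyone wondering what that sed massaging might look like, here's a minimal sketch. It assumes the dump mostly suffers from trailing whitespace and runs of blank lines. The function name clean_dump is made up for illustration, and these are not the actual rules used on the Limyaael Rants.

```shell
#!/bin/sh
# clean_dump: hypothetical helper that tidies `lynx -dump` output.
# Reads text on stdin and writes the cleaned text to stdout.
clean_dump() {
    # 1st expression: strip trailing whitespace (including stray CRs).
    # 2nd expression: keep each run of text plus one blank line after it,
    #                 dropping leading blanks and repeated blank lines.
    sed -e 's/[[:space:]]*$//' -e '/./,/^$/!d'
}

# Usage (URL and FILENAME are assumed to be set by the caller):
#   lynx -dump -nolist "${URL}" | clean_dump > "${FILENAME}.txt"
```

Piping through a function keeps the lynx call untouched, so more sed expressions can be added later as a particular site demands.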



Replies


~every wrote:


Lynx works OK and mine defaults to utf-8. I use a sed filter I built to convert extended ASCII stuff to be US-ASCII compliant. Here is my filter so far:


https://every.sdf.org/.webshare/TXT.txt
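
A filter along these lines could start out like the sketch below. To be clear, this is not the contents of the file linked above, just an illustration; the name to_ascii and the handful of characters covered are assumptions, and a real filter would likely handle many more.

```shell
#!/bin/sh
# to_ascii: hypothetical filter that maps a few common UTF-8 punctuation
# characters to US-ASCII stand-ins. Reads stdin, writes stdout.
to_ascii() {
    # One literal substitution per character, so nothing depends on
    # multibyte bracket expressions or the current locale.
    sed -e "s/‘/'/g" -e "s/’/'/g" \
        -e 's/“/"/g' -e 's/”/"/g' \
        -e 's/…/.../g' \
        -e 's/–/-/g'
}

# Usage (URL is assumed to be set by the caller):
#   lynx -dump -nolist "${URL}" | to_ascii
```

Anything not listed passes through unchanged, so it's easy to spot what still needs a rule by grepping the output for non-ASCII bytes.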


~m15o wrote (thread):


Thanks starbreaker! That's actually a very elegant way to do it. Always impressed to see the wonders of piping commands. Someone else mentioned:


textify.it


Which I still haven't tested.

-- Page fetched on Sat Apr 20 03:59:52 2024