Why not just use a subset of HTTP and HTML?


Based on stuff I've seen posted about Gemini around the web (sadly a lot of it negative! 😞), this question is perhaps the biggest gap in our FAQ. Many people are confused as to why it's worth creating a new protocol to address perceived problems with optional, non-essential features of the web. Just because websites *can* track users and run CPU-hogging JavaScript and pull in useless multi-megabyte header images or even larger autoplaying videos doesn't mean they *have* to. Why not just build non-evil websites using the existing technology, instead of building something new from the ground up?


Of course this is possible. "The Gemini experience" is roughly equivalent to a subset of HTTP where the only method is GET, the only request header is "Host" and the only response header is "Content-type", serving a subset of HTML where the only tags are <p>, <pre>, <a>, <h1> through <h3>, <ul>, <li> and <blockquote> - and the https://gemini.circumlunar.space website offers pretty much this experience. It runs on Shizaru, a webserver I wrote which is designed to make it easy for pubnix admins and the like to give their users webspace which can only be used to serve this kind of website. I *know* it can be done.
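

For concreteness, here's a rough sketch of what that stripped-down exchange looks like on the wire. This is just an illustration in Python, using example.com as a stand-in host: a GET request whose only meaningful header is Host, and a response where Content-type is the only header a client in this subset would care about.

```python
import socket

# Illustrative sketch only: the stripped-down HTTP exchange described
# above, using example.com as a stand-in host. GET plus a Host header is
# the whole request vocabulary; "Connection: close" is here purely so we
# can read to end-of-stream instead of parsing Content-Length.
HOST = "example.com"

request = (
    "GET / HTTP/1.1\r\n"
    "Host: " + HOST + "\r\n"
    "Connection: close\r\n"
    "\r\n"
).encode("ascii")

with socket.create_connection((HOST, 80)) as sock:
    sock.sendall(request)
    response = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        response += chunk

# Everything before the blank line is headers; in the subset described
# above, Content-type is the only one a client would actually need.
headers, _, body = response.partition(b"\r\n\r\n")
print(headers.decode("ascii", errors="replace"))
```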


Shizaru, a "serve no evil" webserver
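

To be clear, the following is not Shizaru's actual code, just a minimal sketch (using Python's standard html.parser) of the kind of tag-whitelist check the paragraph above implies; a real implementation would also have to veto dangerous attributes and police what goes out in the response headers.

```python
from html.parser import HTMLParser

# A toy tag-whitelist check, NOT Shizaru's actual code, just an
# illustration of the idea. A real check would also have to veto
# dangerous attributes (onclick, style, ...) and audit response headers.
ALLOWED_TAGS = {"p", "pre", "a", "h1", "h2", "h3", "ul", "li", "blockquote"}

class SubsetChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag names already lowercased.
        if tag not in ALLOWED_TAGS:
            self.violations.append(tag)

def check(document):
    checker = SubsetChecker()
    checker.feed(document)
    return checker.violations

print(check("<h1>Hello</h1><script>alert('hi')</script>"))  # -> ['script']
```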


But it's not, IMHO, a real solution to the problem I want to solve. I very much want the Gemini FAQ to have an answer to this question, but I'm not yet sure how to formulate a clear and concise answer. So, in my customary fashion, here's a long and rambling answer that tries to cover my thoughts on this. I will try to distill a shorter FAQ answer out of it soon. If you think there are different, or better, arguments to be made here, please let me (and the rest of Geminispace!) know.


To my mind, the problem with deciding upon a strictly limited subset of HTTP and HTML and slapping a label on it (let's say "SafeWeb") and calling it a day is that it would do almost nothing to create a clearly demarcated space where people can go to consume *only* that kind of content in *only* that kind of way, which is what I think we really want. There's simply no way to know in advance whether fetching any given https:// URL will yield SafeWeb content or UnsafeWeb content. Even a website which claims to be part of the SafeWeb, and which superficially *looks* like it must only be using SafeWeb-approved technologies, could actually be serving you tracking ETags or doing other things which are invisible to, but certainly not harmless to, the user. Mainstream browsers like Firefox and Chrome simply do not offer convenient, fine-grained control over what websites can do to you (a constantly shifting ecosystem of a dozen third-party plugins which are prone to compatibility problems is the best you can hope for), so if you ever visit an even slightly UnsafeWeb page, you will suffer the consequences. Do you really want to manually inspect the headers and source code of every alleged SafeWebsite you visit to make sure everybody is playing by the rules?
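

If "tracking ETags" sounds abstract, here's a rough sketch of the trick using only Python's standard library (port and token format are invented for illustration, and a real tracker would be far less obvious). The server hands each new visitor a unique ETag and asks the browser to revalidate on every visit; the browser then dutifully echoes the identifier back in If-None-Match, acting as a cookie in all but name.

```python
import secrets
from http.server import BaseHTTPRequestHandler, HTTPServer

# Rough sketch of ETag-based tracking (port and token format invented for
# illustration). The server tags each new visitor with a unique ETag and
# asks the browser to revalidate on every visit; the browser then echoes
# the identifier back in If-None-Match, acting as a cookie in all but name.

class TrackingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        token = self.headers.get("If-None-Match")
        if token:
            print("Recognised returning visitor:", token)
            self.send_response(304)  # "Not Modified": no body needed
            self.end_headers()
        else:
            token = secrets.token_hex(8)
            print("Tagging new visitor:", token)
            self.send_response(200)
            self.send_header("ETag", '"%s"' % token)
            # Make the browser cache the page but revalidate it on every
            # visit, so the tag keeps coming back to us.
            self.send_header("Cache-Control", "private, max-age=0, must-revalidate")
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(b"<p>Nothing suspicious here.</p>")

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), TrackingHandler).serve_forever()
```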


Now, there's nothing stopping people from writing their own web browsers which refuse to implement any particular piece of HTTP or HTML which they disagree with, yielding a guaranteed SafeWeb experience (although such an undertaking would be an order of magnitude more work than writing a fully featured Gemini client). Supposing you had such a browser, what would you do with it? The overwhelming majority of websites would not render correctly on it. How do you find the sites authored by like-minded SafeWeb advocates which are designed to work nicely in such a browser? You certainly can't ask Google to only show you SafeWeb results! A hypothetical SafeWeb community could set up a wiki where people can share links to SafeWeb sites, but even if nobody submitted UnsafeWeb links to this list maliciously (which they surely would), sites which *were* SafeWebsites at the time of addition could become UnsafeWebsites in the future. SafeWeb status is inherently unstable by virtue of being a subset of something greater - people will start off building SafeWebsites but then later decide that "SafeWeb plus just these one or two extra tags that I really want and promise to use responsibly!" is "SafeEnoughWeb". The verification and maintenance burden of providing a list of truly SafeWebsites would be immense.


Even assuming you had a list with hundreds of guaranteed SafeWeb sites to explore, each one you visit will itself contain links, and some of those links will be to other parts of the SafeWeb while other links will take you back into the UnsafeWeb. Expecting authors of SafeWebsites to manually annotate every link in their pages to indicate this distinction replicates the wiki maintenance burden described above for every participating SafeWeb author, which is even less feasible. Maybe the community could write a web crawler which programmatically validated whether or not each webpage it found was SafeWeb-compliant, and publish a big list of approved pages in a machine-readable format, which the new web browser the same community wrote from scratch could automatically consult, so that links which take you out of the SafeWeb could be displayed in a different colour from links which keep you inside it???
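

Just to spell out how much machinery that throwaway "maybe the community could..." actually demands, here is a rough sketch of such a crawler (seed URL and output filename are invented for illustration). Note that it only audits markup; it makes no attempt to police HTTP-level behaviour like the tracking ETags mentioned above, which is exactly the kind of gap that makes the whole approach so leaky.

```python
import json
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

ALLOWED = {"p", "pre", "a", "h1", "h2", "h3", "ul", "li", "blockquote"}

class Audit(HTMLParser):
    """Record whitelist violations and outgoing links in one pass."""
    def __init__(self):
        super().__init__()
        self.ok = True
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            self.ok = False
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def crawl(seeds, limit=50):
    approved, queue, seen = [], list(seeds), set()
    while queue and len(seen) < limit:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                page = resp.read().decode("utf-8", errors="replace")
        except Exception:
            continue
        audit = Audit()
        audit.feed(page)
        if audit.ok:
            approved.append(url)
            queue.extend(urljoin(url, link) for link in audit.links)
    return approved

# Publish the machine-readable approved list for a bespoke browser to consult.
with open("safeweb-approved.json", "w") as f:
    json.dump(crawl(["https://example.com/"]), f, indent=2)
```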


All of this is an *insane* quantity of tedious and error-prone work in order to do a bad job of replicating what simple-by-design protocols like Gopher or Gemini offer at a drastically reduced cost of entry: a clearly defined online space, distinct from the web, where you know for sure and in advance that everybody is playing by the same rules. When you explore Geminispace, you can follow a link to a domain you've never heard of before and there is no question of worry or trust about whether or not that transaction will violate your privacy, or gratuitously waste your bandwidth, or cause your laptop fan to spin up, or start playing music even though you're in a quiet room. You simply know that it won't happen. When all somebody can do to you is send you some text and some links you can choose to follow or not, you can let your guard down. You can cruise around Geminispace freely and fearlessly, reading anything that takes your fancy. This is *tremendously* psychologically liberating compared to surfing the web in a mainstream browser, hoping that a huge pile of plugins will protect you like they're supposed to, and wondering which ones it might be safe to temporarily disable when a page you'd really like to read isn't rendering properly because of them. It's like riding a bike through a park instead of driving a main battle tank through a minefield while trying to stick to a very narrow and poorly marked safe corridor. Not only is the bike ride a more pleasant experience, but the fact that you can easily build your own bike from scratch over a weekend, just the way you like it, even if you've never done that kind of thing before, is *tremendously* empowering.
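

The "build your own bike over a weekend" point is not much of an exaggeration. Here's a rough sketch of the entire request side of a Gemini client in Python: open a TLS connection to port 1965, send one line, read one header line and a body. A real client would add trust-on-first-use certificate pinning, redirect handling and gemtext rendering, but there is nothing else hiding underneath.

```python
import socket
import ssl

# Minimal sketch of the Gemini request/response cycle: open a TLS
# connection to port 1965, send the URL followed by CRLF, then read a
# one-line "<status> <meta>" header and the body. Certificate checking is
# disabled here because Gemini servers typically use self-signed certs; a
# real client would do trust-on-first-use pinning instead.
def gemini_fetch(host, path="/"):
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    with socket.create_connection((host, 1965)) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(f"gemini://{host}{path}\r\n".encode("utf-8"))
            data = b""
            while True:
                chunk = tls.recv(4096)
                if not chunk:
                    break
                data += chunk

    header, _, body = data.partition(b"\r\n")
    return header.decode("utf-8"), body.decode("utf-8", errors="replace")

status, text = gemini_fetch("zaibatsu.circumlunar.space")
print(status)      # e.g. "20 text/gemini"
print(text[:500])  # the start of the page itself
```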


Trying to carve out, and hang out in, a small, unstable, hard-to-find micro-corner of the full-blown web using existing tools is simply never going to offer anything remotely like this experience.
