IRI in Gemini


Currently (January 2021), the specification seems silent about IRIs (Internationalized Resource Identifiers, RFC 3987). It just says "<URL> is a UTF-8 encoded absolute URL", which is absurd (a URI must be in US-ASCII). Handling IRIs would require more than that, as well as practical advice for software authors.


Issue #1 in the specification work

Gemini current specification

RFC 3986 on URI syntax

RFC 3987 on IRI syntax

RFC 5890 on IDN (domain names in Unicode)


What should programs do?


It is not clear what servers and clients should do (send an IRI, or accept an IRI but convert it to a URI, or something else). Tests with some clients seem to indicate that it does not always work.


Testing server, with an IDN in the name (e with accent).

Testing server, an IRI with an IDN and a non-ASCII character in the path


The server at the other end is (January 2021) a Gemserv. The domain name was configured in Punycode ('hostname = "xn--gmeaux-bva.bortzmeyer.org"' in config.toml).


The Gemserv server


Currently (January 2021):


Amfora claims the domain name does not exist (it does exist), "Failed to connect to the server: dial tcp: lookup gémeaux.bortzmeyer.org: no such host."

AV-98 does not protest and sends the IRI, but the server I use does not understand it with the above setup.

Bombadillo says "Found "é", expected EOF"

Lagrange now works (before that, it said "Failed to communicate with the host. Here is the error message: Failed to look up hostname")


Proposals


Accept IRI as first-class citizens


This is more natural for a new protocol, free of HTTP legacy. Limit Punycode to the minimum (the current state of the domain name tree requires Punycode for DNS lookups).


parse the IRI and extract the domain name

convert it to Punycode

do the DNS lookup

connect to the IP address and send the IRI as request


Many software libraries already do so automatically.
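
The four steps above can be sketched in Python. This is only an illustration, not a reference client: the function names (host_to_ascii, build_request, fetch) are made up for this example, the built-in "idna" codec implements the older IDNA 2003 rules, and certificate checking is switched off where a real Gemini client would typically apply its own policy such as trust on first use.

```python
import socket
import ssl
from urllib.parse import urlsplit

GEMINI_PORT = 1965


def host_to_ascii(iri: str) -> str:
    """Steps 1 and 2: extract the domain name and convert it to Punycode."""
    host = urlsplit(iri).hostname
    if host is None:
        raise ValueError("no host in IRI")
    # Python's built-in codec (IDNA 2003); an assumption of this sketch.
    return host.encode("idna").decode("ascii")


def build_request(iri: str) -> bytes:
    """Step 4, first half: the request is the IRI itself, in UTF-8, plus CRLF."""
    return (iri + "\r\n").encode("utf-8")


def fetch(iri: str) -> bytes:
    """Steps 3 and 4: resolve the Punycode name, connect, send the IRI as-is."""
    ascii_host = host_to_ascii(iri)
    port = urlsplit(iri).port or GEMINI_PORT
    context = ssl.create_default_context()
    # Simplification for the sketch only: no certificate validation at all.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    # create_connection performs the DNS lookup on the ASCII name.
    with socket.create_connection((ascii_host, port)) as sock:
        with context.wrap_socket(sock, server_hostname=ascii_host) as tls:
            tls.sendall(build_request(iri))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)
```

Only the DNS lookup and the TLS server name see the Punycode form; the request line keeps the Unicode host and path. Calling fetch("gemini://gémeaux.bortzmeyer.org/") would exercise the whole path against the test server.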


Remaining issues:


certificates (Let's Encrypt will put Punycode in the certificate)

Unicode normalization. What if the client sends NFC and the server is configured with a name in NFD? RFC 5198 says NFC.


RFC 5198 on a canonical Internet form of Unicode
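
To make the normalization issue concrete, here is a small Python sketch (the variable names are only illustrative). It shows that the same visible name has two different code point sequences, so a naive string comparison between what the client sends and what the server is configured with can fail unless both sides normalize first.

```python
import unicodedata

name = "gémeaux.bortzmeyer.org"

# NFC: "é" is a single precomposed code point (U+00E9).
nfc = unicodedata.normalize("NFC", name)
# NFD: "é" is "e" followed by a combining acute accent (U+0301).
nfd = unicodedata.normalize("NFD", name)

# The two forms look identical on screen but do not compare equal.
same_without_normalizing = (nfc == nfd)

# Normalizing both sides to NFC, as RFC 5198 recommends, makes them match.
same_after_normalizing = (unicodedata.normalize("NFC", nfd) == nfc)
```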


Use Punycode and percent-encoding for everything


Another proposal is to convert all IDNs to Punycode before putting them on the wire, whether in DNS traffic or in Gemini traffic. In that case, the server is configured with a Punycode name. The same goes for the path in the URI: use percent-encoding (café → caf%C3%A9). This is how the test server above is configured and it works with Lagrange and Agunua.
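
A client following this proposal would convert the IRI to a plain US-ASCII URI before sending anything. A minimal Python sketch, where iri_to_uri is a hypothetical helper name and the built-in "idna" codec (IDNA 2003) stands in for whatever IDN library a real client would use:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Convert an IRI to an ASCII-only URI (Punycode host, percent-encoded rest)."""
    parts = urlsplit(iri)
    if parts.hostname is None:
        raise ValueError("no host in IRI")
    # Host: Unicode labels become "xn--" labels.
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Path, query and fragment: non-ASCII characters are UTF-8 encoded,
    # then percent-encoded. "%" is kept so that sequences which are
    # already encoded are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, host, path, query, fragment))
```

With this helper, "gemini://gémeaux.bortzmeyer.org/café" becomes "gemini://xn--gmeaux-bva.bortzmeyer.org/caf%C3%A9", which matches the way the test server is configured.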


Lagrange

Agunua


Do nothing


This page would become illegal, with its IRI. In this proposal, gemtext (text/gemini files) would have to use US-ASCII URI only.


Links


Solderpunk's summary of the three proposals

His #1 solution is my "Do nothing", his #2 is "Use Punycode and percent-encoding for everything" and his #3 is "Accept IRI as first-class citizens".


RFC 3492 on Punycode


RFC 8399, IDN in certificates


The Gemini specification


[Web] The issue in the Go-gemini library


-- Page fetched on Sun May 5 11:53:53 2024