Currently (January 2021), the specification seems silent about IRIs (Internationalized Resource Identifiers, RFC 3987). It only says "<URL> is a UTF-8 encoded absolute URL", which is absurd: URIs must be in US-ASCII (RFC 3986). Handling IRIs properly would require more than that sentence, including practical advice for software authors.
It is not clear what servers and clients should do (send an IRI, accept an IRI but convert it to a URI, or something else). A test with some clients seems to indicate that it does not always work.
The test server, gémeaux.bortzmeyer.org, runs Gemserv (as of January 2021). Its domain name was configured in Punycode ('hostname = "xn--gmeaux-bva.bortzmeyer.org"' in config.toml).
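To see where the configured value comes from, here is a minimal Python sketch that converts each non-ASCII label with the standard library's raw `punycode` codec (RFC 3492) and prepends the `xn--` prefix. It is a simplification, assuming the labels are already lowercase and need none of the IDNA mapping or validation that a real converter performs; `to_a_label` and `to_ascii_domain` are hypothetical names.

```python
def to_a_label(label: str) -> str:
    """Convert one domain label to its ASCII (A-label) form.

    Simplified sketch: no IDNA mapping, normalization or validity checks.
    """
    if label.isascii():
        return label  # plain ASCII labels are left unchanged
    # Raw Punycode encoding plus the "xn--" prefix used by IDNA
    return "xn--" + label.encode("punycode").decode("ascii")


def to_ascii_domain(domain: str) -> str:
    """Apply to_a_label to every dot-separated label of a domain name."""
    return ".".join(to_a_label(label) for label in domain.split("."))


print(to_ascii_domain("gémeaux.bortzmeyer.org"))  # xn--gmeaux-bva.bortzmeyer.org
```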
Currently (January 2021), the results are:
* Amfora claims the domain name does not exist (it does exist): "Failed to connect to the server: dial tcp: lookup gémeaux.bortzmeyer.org: no such host."
* AV-98 does not protest and sends the IRI, but the server I use does not understand it with the above setup.
* Bombadillo says "Found "é", expected EOF".
* Lagrange now works (before that, it said "Failed to communicate with the host. Here is the error message: Failed to look up hostname").
One proposal is to accept IRIs as first-class citizens. This is more natural for a new protocol, free of HTTP legacy. Punycode is limited to the minimum (the current state of the domain name tree requires Punycode for DNS lookups). A client would:
* parse the IRI and extract the domain name,
* convert it to Punycode,
* do the DNS lookup,
* connect to the IP address and send the IRI as the request.
Many software libraries already do this automatically.
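The client steps above can be sketched in Python. This is a minimal, hypothetical sketch (`prepare_request` and `fetch` are made-up names). It parses the IRI with `urllib.parse.urlsplit` and converts the host with Python's built-in `idna` codec, which implements the older IDNA 2003 rules: an assumption that is fine for this name but not for every name. It then lets `socket.create_connection` do the DNS lookup on the ASCII name, and sends the IRI itself, UTF-8 encoded and followed by CRLF, as the Gemini request. Certificate verification is disabled because Gemini clients usually rely on TOFU (trust on first use) rather than on certificate authorities. A real client would record and check the certificate separately.

```python
import socket
import ssl
from urllib.parse import urlsplit

GEMINI_PORT = 1965  # default Gemini port


def prepare_request(iri: str) -> tuple[str, int, bytes]:
    """Parse the IRI, convert its host to Punycode, build the request line."""
    parts = urlsplit(iri)  # parse the IRI
    if parts.scheme != "gemini" or not parts.hostname:
        raise ValueError(f"not a gemini IRI: {iri!r}")
    # Unicode host name -> Punycode (built-in IDNA 2003 codec)
    ascii_host = parts.hostname.encode("idna").decode("ascii")
    port = parts.port or GEMINI_PORT
    # The request carries the IRI itself, UTF-8 encoded, ended by CRLF
    request = iri.encode("utf-8") + b"\r\n"
    return ascii_host, port, request


def fetch(iri: str) -> bytes:
    """Resolve the ASCII name, connect over TLS and send the IRI."""
    host, port, request = prepare_request(iri)
    context = ssl.create_default_context()
    context.check_hostname = False       # must be disabled before CERT_NONE
    context.verify_mode = ssl.CERT_NONE  # TOFU checks would be done elsewhere
    # create_connection performs the DNS lookup on the Punycode name
    with socket.create_connection((host, port)) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(request)         # send the IRI as the request
            chunks = []
            while chunk := tls.recv(4096):
                chunks.append(chunk)
    return b"".join(chunks)
```

Calling `fetch("gemini://gémeaux.bortzmeyer.org/")` would return the raw response (header line and body), provided the server understands a UTF-8 IRI in the request, which is precisely what this proposal requires of servers.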
Remaining issues:
* Certificates: Let's Encrypt puts the Punycode form of the name in the certificate.
* Unicode normalization: what if the client sends NFC and the server is configured with a name in NFD? RFC 5198 says NFC.
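Both remaining issues can be illustrated with a short Python sketch; the function names are hypothetical. For certificates, the client can compare the Punycode form of the host, rather than the Unicode form, with the name found in the certificate. For normalization, a server could normalize both the requested name and its configured name to NFC, as RFC 5198 recommends, before comparing them. The sketch assumes exact-name matching and ignores wildcard certificates.

```python
import unicodedata


def host_matches_certificate(iri_host: str, cert_name: str) -> bool:
    """Compare a Unicode host with a certificate name written in Punycode."""
    # NFC first, then Punycode (IDNA 2003 codec), then case-insensitive compare
    nfc_host = unicodedata.normalize("NFC", iri_host)
    ascii_host = nfc_host.encode("idna").decode("ascii")
    return ascii_host.lower() == cert_name.lower()


def same_name(requested: str, configured: str) -> bool:
    """Compare two Unicode names after normalizing both to NFC (RFC 5198)."""
    return (unicodedata.normalize("NFC", requested)
            == unicodedata.normalize("NFC", configured))


nfd = "ge\u0301meaux.bortzmeyer.org"  # "é" as "e" + combining acute accent
nfc = "g\u00e9meaux.bortzmeyer.org"   # "é" as one precomposed character
print(nfd == nfc)                      # False: different code points
print(same_name(nfd, nfc))             # True
print(host_matches_certificate(nfd, "xn--gmeaux-bva.bortzmeyer.org"))  # True
```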
Another proposal is to convert all IDNs to Punycode before putting them on the wire, whether in DNS traffic or in Gemini traffic. In that case, the server is configured with the Punycode name. The same goes for the path in the URI, which uses percent-encoding (café → caf%C3%A9). This is how the test server above is configured, and it works with Lagrange and Agunua.
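A minimal Python sketch of this conversion turns an IRI into a US-ASCII URI. The host goes through the built-in `idna` codec (IDNA 2003 rules, an assumption), and the path, query and fragment are percent-encoded as UTF-8 with `urllib.parse.quote`. `iri_to_uri` is a hypothetical name, and the set of characters left unescaped (the `safe` arguments) is an assumption. Existing `%` escapes are kept, so an already converted URI passes through unchanged.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Convert an IRI to a US-ASCII URI (Punycode host, percent-encoding)."""
    parts = urlsplit(iri)
    host = parts.hostname.encode("idna").decode("ascii")  # IDN -> Punycode
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # quote() turns non-ASCII characters into UTF-8 percent escapes;
    # "/" in the path and existing "%" escapes are left as they are
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(iri_to_uri("gemini://gémeaux.bortzmeyer.org/café"))
# gemini://xn--gmeaux-bva.bortzmeyer.org/caf%C3%A9
```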
This page, with its IRI, would become illegal: in this proposal, gemtext (text/gemini files) would have to use US-ASCII URIs only.
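Under this proposal, a hypothetical checker for gemtext files could flag link lines whose URL is not pure US-ASCII. This minimal Python sketch assumes the gemtext link syntax ("=>", optional whitespace, the URL, then an optional label) and inspects only link lines; `non_ascii_links` is a made-up name.

```python
def non_ascii_links(gemtext: str) -> list[str]:
    """Return the URLs of link lines that are not pure US-ASCII."""
    offending = []
    for line in gemtext.splitlines():
        if not line.startswith("=>"):
            continue  # only link lines carry URLs in gemtext
        fields = line[2:].split(maxsplit=1)  # URL, then the optional label
        if fields and not fields[0].isascii():
            offending.append(fields[0])
    return offending


page = (
    "=> gemini://gémeaux.bortzmeyer.org/ IRI: not allowed\n"
    "=> gemini://xn--gmeaux-bva.bortzmeyer.org/ URI: allowed\n"
)
print(non_ascii_links(page))  # ['gemini://gémeaux.bortzmeyer.org/']
```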
His #1 solution is my "Do nothing", his #2 is "Use Punycode and percent-encoding for everything" and his #3 is "Accept IRI as first-class citizens".