
[tech] robots.txt format


From: rwagner at rw-net.de

Date: Wed, 27 Jan 2021 10:40:39 +0100 (CET)


Hi,


simple question:

is the following robots.txt format valid, in the sense that the "Disallow" applies to all User-agents listed before it?

---

User-agent: researcher

User-agent: indexer

User-agent: archiver

Disallow: about

---


or do I need to be more chatty?

---

User-agent: researcher

Disallow: about

User-agent: indexer

Disallow: about

User-agent: archiver

Disallow: about

---


kind regards

René


--------

From: Stephane Bortzmeyer

Date: Wed, 27 Jan 2021 11:27:41 +0100


On Wed, Jan 27, 2021 at 10:40:39AM +0100,

René Wagner <rwagner at rw-net.de> wrote

a message of 23 lines which said:


> simple question:


Complicated answers:


> is the following robots.txt format valid, in the sense that the

> "Disallow" applies to all User-agents listed before it?


1) There is no standard for robots.txt.


2) There is not yet an "official" adaptation to Gemini, just

proposals.


--------

From: Sean Conner

Date: Wed, 27 Jan 2021 05:56:04 -0500


It was thus said that the Great René Wagner once stated:

> Hi,

>

> simple question:

> is the following robots.txt format valid, in the sense that the "Disallow" applies to all User-agents listed before it?

> ---

> User-agent: researcher

> User-agent: indexer

> User-agent: archiver

> Disallow: about

> ---


That will work, but you need to add a leading '/' to the Disallow line:


Disallow: /about


That will match any request starting with '/about', like '/about',

'/aboutthis', '/about/that', etc.
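
A minimal sketch of that prefix rule, assuming a crawler simply compares the request path against each Disallow value with a string-prefix test. The helper name is_disallowed is hypothetical and real crawlers may behave differently:

```python
def is_disallowed(path, disallow_prefixes):
    """Return True if the request path starts with any non-empty Disallow value."""
    # An empty Disallow value conventionally means "nothing is disallowed",
    # so empty prefixes are skipped.
    return any(prefix and path.startswith(prefix) for prefix in disallow_prefixes)


if __name__ == "__main__":
    # With the leading '/', everything under the prefix is blocked.
    for path in ["/about", "/aboutthis", "/about/that", "/contact"]:
        print(path, is_disallowed(path, ["/about"]))

    # Without the leading '/', the rule never matches, because request
    # paths always begin with '/'.
    print("/about", is_disallowed("/about", ["about"]))
```

Under this assumption, "Disallow: about" blocks nothing, which is why the leading '/' matters.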


> or do I need to be more chatty?

> ---

> User-agent: researcher

> Disallow: about

> User-agent: indexer

> Disallow: about

> User-agent: archiver

> Disallow: about

> ---


That will work too (same thing about the Disallow: line though). You can

read more about it at <http://www.robotstxt.org/>.
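
One way to compare the two forms is to run both through Python's standard urllib.robotparser module. This is only a sketch: that parser follows the web conventions described at robotstxt.org, not any Gemini-specific proposal, and the host example.org and the helper name allowed are placeholders:

```python
from urllib.robotparser import RobotFileParser

# Compact form: several User-agent lines share one Disallow line.
GROUPED = """\
User-agent: researcher
User-agent: indexer
User-agent: archiver
Disallow: /about
"""

# Verbose form: one record per User-agent.
REPEATED = """\
User-agent: researcher
Disallow: /about
User-agent: indexer
Disallow: /about
User-agent: archiver
Disallow: /about
"""


def allowed(robots_txt, agent, url):
    """Parse robots.txt text and ask whether the agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "gemini://example.org/about/that"  # placeholder capsule URL
    for agent in ("researcher", "indexer", "archiver"):
        print(agent, allowed(GROUPED, agent, url), allowed(REPEATED, agent, url))
```

With this particular parser, both forms give the same answer for each of the three agents. Note that it treats a blank line as the end of a record, so the User-agent lines of a group should not be separated by blank lines.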


-spc



--------

From: Stephane Bortzmeyer

Date: Wed, 27 Jan 2021 12:12:54 +0100


On Wed, Jan 27, 2021 at 05:56:04AM -0500,

Sean Conner <sean at conman.org> wrote

a message of 33 lines which said:


> That will work too (same thing about the Disallow: line though). You can

> read more about it at <http://www.robotstxt.org/>.


But do note that many Gemini capsules do not follow this specification

but one of the others (typically more complicated).


--------

From: rwagner at rw-net.de

Date: Wed, 27 Jan 2021 15:38:48 +0100 (CET)


Thanks for the replies.


I've opted for the first version for now.

Of course, no one knows exactly how the crawlers out there are implemented, or whether they obey robots.txt at all.


At least I can serve a valid robots.txt now.


cheers

René


--------
