Finding PDFs, MP3s, and more in Geminispace with Kennedy

2023-07-06 | #kennedy #search | @Acidus


My Gemini search engine Kennedy now lets you filter a search query to a specific type of file like PDFs, MP3s, or ZIPs. This makes it really easy to find specific types of files about a subject.


PDFs about Apple Macintosh

MP3 Music

Dice games for Palm handhelds


To use this new feature, just add a "filetype:[whatever]" modifier to your query, as shown in the examples above.
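Here is a minimal sketch of how a query with this modifier could be split into search terms and a file type. This is not Kennedy's actual code. The function name `parse_query` and the pattern are made up for illustration, and the sketch assumes the type name is a short alphanumeric string like "pdf".

```python
import re

# Hypothetical pattern: "filetype:" followed by a short alphanumeric type name.
FILETYPE_RE = re.compile(r"\bfiletype:([A-Za-z0-9]+)", re.IGNORECASE)


def parse_query(query):
    """Split a raw query into (search_terms, filetype or None)."""
    match = FILETYPE_RE.search(query)
    filetype = match.group(1).lower() if match else None
    # Remove the modifier, then collapse any leftover whitespace.
    terms = " ".join(FILETYPE_RE.sub(" ", query).split())
    return terms, filetype


print(parse_query("filetype:pdf apple macintosh"))  # ('apple macintosh', 'pdf')
print(parse_query("dice games"))                    # ('dice games', None)
```

Lowercasing the type name means "filetype:PDF" and "filetype:pdf" behave the same way.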


While other Gemini search engines have had a limited ability to do this, I improved upon it in several ways:


1. You don't need to know the MIME type of a file type. You can just type "mp3" or "pdf" instead of "audio/mpeg" or "application/pdf".

2. You can specify search terms in addition to file type. Previously, TLGS was the only search engine that let you do this.

3. Kennedy indexes the text of hyperlinks that point to a file, which allows Kennedy to find more files, with more accurate results, than other search engines.


The "More Info / Archived Copy" link on each result is especially helpful when searching for files. This gives you metadata about the file, including links to the pages which link to the file. Visiting those is a great way to get additional context about a file:


Page Info: /~anthonyg/docs/unixpowertools.pdf


How "filetype:" works


As mentioned in a previous post on indexing plain text files, MIME types are notoriously unreliable. Not only can a file have the wrong MIME type, you could have multiple MIME types that all mean the same "type" of file. So searching by MIME type won't find all the files of that type. On top of that, all of this assumes the user even knows the MIME type for a type of file to begin with.


MIME Lies: Indexing Plain Text files in Kennedy


So clearly, MIME types were out.


Instead, I do what Google and other web search engines do, and determine what type of file something actually is. If you want PDFs, Kennedy should be able to find PDFs for you, regardless of whatever gross or incorrect MIME types various capsules used to serve them. Right now I'm using a pretty primitive way to determine file type, but I can always improve my file type detection code to recognize more file types, without you needing to use a different search syntax.
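To make the idea concrete, here is a rough sketch of what a primitive detector could look like. The approach is an assumption, not a description of Kennedy's code: it trusts the file extension in the URL first, and only then falls back to a table that maps several MIME types onto one short type name. Both tables and the name `detect_filetype` are illustrative.

```python
import posixpath
from urllib.parse import urlparse

# Illustrative tables only; a real index would need many more entries.
EXTENSION_TYPES = {"pdf": "pdf", "mp3": "mp3", "zip": "zip", "mp4": "mp4"}
MIME_TYPES = {
    "application/pdf": "pdf",
    "application/x-pdf": "pdf",
    "audio/mpeg": "mp3",
    "audio/mp3": "mp3",
    "application/zip": "zip",
    "application/x-zip-compressed": "zip",
    "video/mp4": "mp4",
}


def detect_filetype(url, mime=None):
    """Return a short type name such as "pdf", or None if unknown."""
    path = urlparse(url).path
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    if ext in EXTENSION_TYPES:
        return EXTENSION_TYPES[ext]
    if mime:
        # Drop parameters such as "; charset=utf-8" before the lookup.
        return MIME_TYPES.get(mime.split(";")[0].strip().lower())
    return None
```

With this layout, a PDF served as "application/octet-stream" is still found by its extension, and new file types only need new table entries, not new search syntax.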


That gets me a better way to filter to certain file types. How about the actual searching? Indexing files, especially non-text files like ZIP files, is a great example of how you can use the hyperlink nature of Gemini to your advantage. I don't need to index the contents of a PDF to know it's the Apple Macintosh business plan. The text used in hyperlinks that point to the file provides a lot of context about the contents of the file.


Preliminary Macintosh Business Plan (1981)
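The link-text idea can be sketched in a few lines. The code below is a simplified illustration, not Kennedy's indexer: it reads gemtext link lines (which start with "=>"), and collects the words of each label under the URL it points to. The function names are hypothetical, and the sketch assumes links are already absolute URLs, so it does no relative URL resolution.

```python
from collections import defaultdict

FENCE = "`" * 3  # gemtext preformatted toggle marker


def extract_links(gemtext):
    """Yield (url, label) for each "=>" link line in a gemtext page."""
    preformatted = False
    for line in gemtext.splitlines():
        if line.startswith(FENCE):
            preformatted = not preformatted  # lines inside are literal text
            continue
        if preformatted or not line.startswith("=>"):
            continue
        parts = line[2:].strip().split(None, 1)
        if not parts:
            continue  # "=>" with no URL
        label = parts[1].strip() if len(parts) > 1 else ""
        yield parts[0], label


def index_link_text(pages):
    """Map each link target to the set of words used in labels pointing at it."""
    index = defaultdict(set)
    for gemtext in pages:
        for url, label in extract_links(gemtext):
            index[url].update(label.lower().split())
    return index
```

Because every page that links to a file contributes its own label, a binary file picks up searchable words without anyone ever opening it.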


I used the exact same strategy when I rolled out image search for Kennedy last year. To learn more about how link text and path structure can be used in search indexes, read that post:


Finding images on Gemini with Kennedy Image Search


Give me Everything!


You don't have to specify a search query. If you want all the mp4 files in Geminispace, you can just do a search with a `filetype` modifier, and no search terms:


All MP4 videos in Geminispace.
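One simple way to get this behavior is to treat the search terms as an optional extra filter on top of the file type. The sketch below assumes a made-up record layout (`url`, `filetype`, `link_text`) and uses plain substring matching, which is much cruder than a real search index.

```python
def search(documents, filetype, terms=""):
    """Return URLs of the given type whose link text contains every term."""
    wanted = terms.lower().split()
    return [
        doc["url"]
        for doc in documents
        if doc["filetype"] == filetype
        # all() over an empty list is True, so no terms means no extra filter.
        and all(word in doc["link_text"].lower() for word in wanted)
    ]
```

Calling `search(documents, "mp4")` returns every MP4 record, while `search(documents, "mp4", "zombie")` narrows the same list down by link text.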


Turns out someone is hosting the classic zombie movie Night of the Living Dead 🧟


Feedback wanted


I'm always making changes to Kennedy, and much of it is based on feedback I get. Give it a try and let me know what you think.


Contact me
