Port of the week: pup


Author: Solène

Date: 22 April 2021

Tags: internet



Introduction


Today I will introduce you to the utility "pup", which filters HTML documents using CSS selectors. It is a perfect companion to curl when you only want to fetch a specific piece of data from an HTML page.


On OpenBSD you can install it with `pkg_add pup` and check its documentation at /usr/local/share/doc/pup/README.md


pup official project


Examples


pup is quite easy to use once you understand the filters. Let's see a few examples to illustrate practical uses.


Fetch my blog's list of titles in JSON format


The following command returns a JSON array with one entry for every "a" tag found within an "h4" tag.


curl https://dataswamp.org/~solene/index.html | pup "h4 a json{}"

The output (only an extract here) looks like this:


[
 {
  "href": "2021-04-18-ipfs-bandwidth-mgmt.html",
  "tag": "a",
  "text": "Bandwidth management in go-IPFS"
 },
 {
  "href": "2021-04-17-ipfs-openbsd.html",
  "tag": "a",
  "text": "Introduction to IPFS"
 },
 [truncated]
 {
  "href": "2016-05-02-3.html",
  "tag": "a",
  "text": "How to add a route through a specific interface on FreeBSD 10"
 }
]
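
When only one field is needed, JSON can be more than you want. As a minimal sketch, pup's `attr{}` display function (documented in its README next to `text{}` and `json{}`) prints one attribute value per line. The helper name `list_links` is hypothetical and only here for illustration; it reads HTML on standard input.

```shell
#!/bin/sh
# Sketch only: list_links is a hypothetical helper name, not part of pup.
# It reads HTML on stdin and prints the href of each "a" tag inside an "h4" tag.
list_links() {
    pup 'h4 a attr{href}'
}
```

It could be used as `curl -s https://dataswamp.org/~solene/index.html | list_links`, which should print one relative link per line.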

Fetch OpenBSD -current specific changes


The page https://www.openbsd.org/faq/current.html contains specific instructions that are required for people using OpenBSD -current, and you may want to be notified of changes. Using pup, it's easy to write a script that compares the current headings with the ones you saved last time to see what has been appended.


curl https://www.openbsd.org/faq/current.html | pup "h3 json{}"

Output sample as JSON, perfect for further processing with a scripting language.


[
 {
  "id": "r20201107",
  "tag": "h3",
  "text": "2020/11/07 - iked.conf \u0026#34;to dynamic\u0026#34;"
 },
 {
  "id": "r20210312",
  "tag": "h3",
  "text": "2021/03/12 - IPv6 privacy addresses renamed to temporary addresses"
 },
 {
  "id": "r20210329",
  "tag": "h3",
  "text": "2021/03/29 - [packages] yubiserve replaced with yubikeyedup"
 }
]
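
The comparison idea mentioned above could be sketched as follows. This is only one possible approach, under a few assumptions that are not from the page itself: the state file path and the function names `new_lines` and `check_current` are hypothetical, and `text{}` is used instead of `json{}` so that each heading is a plain line of text (a heading containing nested tags may span several lines).

```shell
#!/bin/sh
# Sketch only: function names and the state file path are hypothetical.

# Print the lines that are in file $2 but not in file $1.
new_lines() {
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)
    # comm needs sorted input; use the same collation for sort and comm.
    LC_ALL=C sort "$1" > "$old_sorted"
    LC_ALL=C sort "$2" > "$new_sorted"
    # -13 hides lines unique to the old file and lines common to both.
    LC_ALL=C comm -13 "$old_sorted" "$new_sorted"
    rm -f "$old_sorted" "$new_sorted"
}

# Fetch the headings, show the ones not seen before, then save the new state.
check_current() {
    state=${1:-$HOME/.openbsd-current-headings}   # assumed location
    latest=$(mktemp)
    curl -s https://www.openbsd.org/faq/current.html | pup 'h3 text{}' > "$latest"
    # Keep the old state if the download or the filtering produced nothing.
    if [ ! -s "$latest" ]; then
        rm -f "$latest"
        return 1
    fi
    if [ -f "$state" ]; then
        new_lines "$state" "$latest"
    fi
    mv "$latest" "$state"
}
```

Calling `check_current` from cron would print only the headings that appeared since the previous run; the first run prints nothing and just records the current state.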

I provide an RSS feed of these -current changes


Conclusion


There are many possibilities with pup and I won't list them all. I highly recommend reading the project's README.md file, which serves as its documentation and explains the filtering syntax.
