xj — HTML to JSON


This, xj, is a Unix filter that reads XML (or permissively parses HTML) and outputs JSON. Perfect for piping directly into jq, jaq, zq, gron or json2tsv.


Usage


wget -qO- https://jqlang.github.io/jq/|xj|jq '..|select(.title?)[][]'

Installation


apt install chicken
chicken-install xj

jq

jaq

zq

gron

json2tsv


Formal Semantics


Elements are objects with one key, the element name, and the value is an array with the children of the element, or an empty array if there aren’t any. (This is to disambiguate elements from text data.)
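As a rough illustration of that shape, here is a minimal Python sketch that walks xj output and lists the element names in document order. The JSON literal is the xj output shown in the comparison section of this page; the helper name `iter_elements` is made up for this example and is not part of xj.

```python
import json

# xj output for '<p><i>this here</i> is <i>so strange</i></p>',
# copied from the comparison example on this page.
doc = json.loads('{"p": [{"i": ["this here"]}, " is ", {"i": ["so strange"]}]}')

def iter_elements(node):
    """Yield (name, children) for every element, in document order.

    Hypothetical helper: assumes the shape described above, where an
    element is an object with exactly one key (the element name) whose
    value is the list of children.
    """
    if isinstance(node, dict) and "@" not in node:
        (name, children), = node.items()  # exactly one key per element
        yield name, children
        for child in children:
            yield from iter_elements(child)
    # Strings are text data and "@" objects hold attributes: not elements.

names = [name for name, _ in iter_elements(doc)]
print(names)
```

Because children always sit in an array, a string child can only be text and an object child can only be an element (or the attribute object), so one `isinstance` check per node is enough here.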


Iff there are any attributes, an attribute object is listed first among the children, disambiguated from the other children by having a “@” key. The attributes are not in a list, so they can be accessed directly.
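A small Python sketch of how a consumer might separate the attribute object from the remaining children. The sample data is hypothetical: it is written by hand from the description above, assuming the “@” key maps straight to an object of attribute names and values, and has not been checked against real xj output. The function name `split_attributes` is likewise made up.

```python
def split_attributes(children):
    """Return (attributes, other_children) for one element's child list.

    Assumes the attribute object, when present, is the first child and
    looks like {"@": {"name": "value", ...}}.
    """
    if children and isinstance(children[0], dict) and "@" in children[0]:
        return children[0]["@"], children[1:]
    return {}, children

# Hand-written guess at the xj shape for <a href="/xj">source</a>.
link = {"a": [{"@": {"href": "/xj"}}, "source"]}

attrs, rest = split_attributes(link["a"])
print(attrs.get("href"), rest)
```

Since attribute names are unique within an element, a plain object lookup such as `attrs.get("href")` is enough; no list scan is needed.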


In XML, an element can have several children with the same name, and those children can in turn have children of their own. The same isn’t true for attributes, which is why they can have simpler semantics.


Source code


git clone https://idiomdrottning.org/xj

Comparison to xq


There is also xq, which on Debian is in the yq package where it’s called xq-python.


I didn’t know about that when I made xj, but maybe xj was worth making anyway since xq doesn’t put elements in sequences the way xj does to get around JSON’s limitations.


ellen% echo '<p><i>this here</i> is <i>so strange</i></p>'|xj
{"p":
  [{"i":
    ["this here"]},
   " is ",
   {"i":
    ["so strange"]}]}

ellen% echo '<p><i>this here</i> is <i>so strange</i></p>'|xj-python
{
  "p": {
    "i": [
      "this here",
      "so strange"
    ],
    "#text": "is"
  }
}

“Yodel, you seem to be talking a little backward!” from Mad Magazine #242
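To make the difference concrete, here is a minimal Python sketch that rebuilds the running text from the xj output above. It only works because xj keeps text and elements interleaved in one array. `text_of` is a hypothetical helper written for this page, not something xj ships.

```python
import json

# The xj output from the example above.
xj_out = json.loads('{"p": [{"i": ["this here"]}, " is ", {"i": ["so strange"]}]}')

def text_of(node):
    """Concatenate all text data under a node, in document order."""
    if isinstance(node, str):
        return node
    if isinstance(node, dict):
        if "@" in node:
            return ""  # attribute object: carries no text data
        # An element: one key whose value is the list of children.
        return "".join(text_of(child)
                       for children in node.values()
                       for child in children)
    return ""

sentence = text_of(xj_out)
print(sentence)
```

The same walk over the xq-python output shown above could not put “is” back between the two i elements, since that structure groups both i values under one key and keeps the text separately under “#text”.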
