<article>
<h2 id="introduction">Introduction</h2>
<p>This is the start of a short series about the <a
href="https://en.wikipedia.org/wiki/JSON">JSON data format</a>, and how
the command-line tool <a
href="https://en.wikipedia.org/wiki/Jq_(programming_language)"><code>jq</code></a>
can be used to process such data. The plan is to make an open series to
which others may contribute their own experiences using this tool.</p>
<p>The <code>jq</code> command is described on the <a
href="https://jqlang.github.io/jq/">GitHub page</a> as follows:</p>
<blockquote>
<p>jq is a lightweight and flexible command-line JSON processor</p>
</blockquote>
<p>…and as:</p>
<blockquote>
<p><code>jq</code> is like sed for <code>JSON</code> data - you can use
it to slice and filter and map and transform structured data with the
same ease that <code>sed</code>, <code>awk</code>, <code>grep</code> and
friends let you play with text.</p>
</blockquote>
<p>The <code>jq</code> tool is controlled by a programming language
(also referred to as <code>jq</code>), which is very powerful. This
series will mainly deal with that language.</p>
<h2 id="json-javascript-object-notation">JSON (JavaScript Object
Notation)</h2>
<p>To begin we will look at <code>JSON</code> itself. It is defined on
the <a href="https://en.wikipedia.org/wiki/JSON">Wikipedia page</a>
thus:</p>
<blockquote>
<p><code>JSON</code> is an open standard file format and data
interchange format that uses human-readable text to store and transmit
data objects consisting of attribute–value pairs and arrays (or other
serializable values). It is a common data format with diverse uses in
electronic data interchange, including that of web applications with
servers.</p>
</blockquote>
<p>The syntax of JSON is defined by <a
href="https://datatracker.ietf.org/doc/html/rfc8259">RFC 8259</a> and by
<a
href="https://www.ecma-international.org/publications-and-standards/standards/ecma-404/">ECMA-404</a>.
The grammar itself is small, but details such as string escaping and the
handling of numbers add some complexity.</p>
<p>JSON’s basic data types are (edited from the Wikipedia page):</p>
<ul>
<li><p><em>Number</em>: a signed decimal number that may contain a
fractional part and may use exponential E notation, but cannot include
non-numbers such as NaN or infinity. (<strong>NOTE</strong>: Unlike what
I said in the audio, the <code>jq</code> language does have two values
representing non-numbers, <code>nan</code> and <code>infinite</code>,
but neither is valid in JSON text.)</p></li>
<li><p><em>String</em>: a sequence of zero or more Unicode characters.
Strings are delimited with double quotation marks and support a
backslash escaping syntax.</p></li>
<li><p><em>Boolean</em>: either of the values <code>true</code> or
<code>false</code>.</p></li>
<li><p><em>Array</em>: an ordered list of zero or more elements, each of
which may be of any type. Arrays use square bracket notation with
comma-separated elements.</p></li>
<li><p><em>Object</em>: a collection of name–value pairs where the names
(also called keys) are strings. Objects are delimited with curly
brackets and use commas to separate each pair, while within each pair
the colon <code>':'</code> character separates the key or name from its
value.</p></li>
<li><p><em>null</em>: an empty value, using the word
<code>null</code>.</p></li>
</ul>
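<p>As a small check of the note on numbers above, the sketch below asks
<code>jq</code> itself about its two non-number values. It assumes
<code>jq</code> version 1.5 or later, and uses the builtins
<code>nan</code>, <code>infinite</code>, <code>isnan</code> and
<code>isinfinite</code> along with the <code>-n</code> and
<code>-c</code> options, none of which have been covered yet. The
variable name <code>nonnumbers</code> is just for illustration.</p>

```shell
# -n tells jq not to read any input; -c prints the result on one line.
# nan and infinite are jq builtins, not JSON literals.
nonnumbers=$(jq -n -c '[(nan | isnan), (infinite | isinfinite)]')
echo "$nonnumbers"
# prints: [true,true]
```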
<h3 id="examples">Examples</h3>
<p>These are the basic data types listed above (same order):</p>
<pre><code>42
"HPR"
true
["Hacker","Public","Radio"]
{ "firstname": "John", "lastname": "Doe" }
null</code></pre>
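<p>One way to confirm these types, if you already have <code>jq</code>
installed, is to feed the same six values to it and ask for the type of
each. This is only a sketch: the <code>type</code> filter and the
<code>-r</code> (raw output) option are borrowed from later in the
series, and the variable name <code>types</code> is just for
illustration.</p>

```shell
# Each argument to printf is one of the example values, written one per line.
# jq reads them as a stream of six separate JSON values and reports each type.
types=$(printf '%s\n' \
  '42' \
  '"HPR"' \
  'true' \
  '["Hacker","Public","Radio"]' \
  '{ "firstname": "John", "lastname": "Doe" }' \
  'null' | jq -r 'type')
echo "$types"
# prints, one per line: number string boolean array object null
```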
<h2 id="jq"><code>jq</code></h2>
<p>From the Wikipedia page:</p>
<blockquote>
<p><code>jq</code> was created by Stephen Dolan, and released in October
2012. It was described as being “like sed for JSON data”. Support for
regular expressions was added in jq version 1.5.</p>
</blockquote>
<h3 id="obtaining-jq">Obtaining <code>jq</code></h3>
<p>This tool is available in the repositories of most Linux
distributions. For example, on Debian and Debian-based releases you can
install it with:</p>
<pre><code>sudo apt install jq</code></pre>
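<p>After installing, a quick way to check that <code>jq</code> is on
your path is to ask for its version. The exact string printed will
depend on what your distribution provides; the variable name
<code>version</code> is just for illustration.</p>

```shell
# Print the installed version; the exact string depends on your distribution.
version=$(jq --version)
echo "$version"
```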
<p>See the <a href="https://jqlang.github.io/jq/download/">download
page</a> for the definitive information about available versions.</p>
<h3 id="manual-for-jq">Manual for <code>jq</code></h3>
<p>There is a detailed manual describing the use of the <code>jq</code>
programming language that is used to filter JSON data. It can be found
at <a href="https://jqlang.github.io/jq/manual/"
class="uri">https://jqlang.github.io/jq/manual/</a>.</p>
<h2 id="the-hpr-statistics-page">The HPR statistics page</h2>
<p>This is a collection of statistics about HPR, in the form of JSON
data. We will use this as a moderately detailed example in this
episode.</p>
<p>A link to this page may be found on the HPR <a
href="https://hub.hackerpublicradio.org/calendar.php">Calendar page</a>
close to the foot of the page under the heading <a
href="https://hub.hackerpublicradio.org/calendar.php#workflow"><code>Workflow</code></a>.
The link to the JSON statistics is <a
href="https://hub.hackerpublicradio.org/stats.json"
class="uri">https://hub.hackerpublicradio.org/stats.json</a>.</p>
<p>If you click on this you should see the JSON data formatted for you
by your browser. Different browsers represent this in different
ways.</p>
<p>You can also collect and display this data from the command line,
using <code>jq</code> of course:</p>
<pre><code>$ curl -s https://hub.hackerpublicradio.org/stats.json | jq '.' | nl -w3 -s' '
  1 {
  2   "stats_generated": 1712785509,
  3   "age": {
  4     "start": "2005-09-19T00:00:00Z",
  5     "rename": "2007-12-31T00:00:00Z",
  6     "since_start": {
  7       "total_seconds": 585697507,
  8       "years": 18,
  9       "months": 6,
 10       "days": 28
 11     },
 12     "since_rename": {
 13       "total_seconds": 513726307,
 14       "years": 16,
 15       "months": 3,
 16       "days": 15
 17     }
 18   },
 19   "shows": {
 20     "total": 4626,
 21     "twat": 300,
 22     "hpr": 4326,
 23     "duration": 7462050,
 24     "human_duration": "0 Years, 2 months, 27 days, 8 hours, 47 minutes and 30 seconds"
 25   },
 26   "hosts": 356,
 27   "slot": {
 28     "next_free": 8,
 29     "no_media": 0
 30   },
 31   "workflow": {
 32     "UPLOADED_TO_IA": "2",
 33     "RESERVE_SHOW_SUBMITTED": "27"
 34   },
 35   "queue": {
 36     "number_future_hosts": 7,
 37     "number_future_shows": 28,
 38     "unprocessed_comments": 0,
 39     "submitted_shows": 0,
 40     "shows_in_workflow": 15,
 41     "reserve": 27
 42   }
 43 }</code></pre>
<p>The <code>curl</code> utility is useful for collecting information
from links like this. I have used the <code>-s</code> (silent) option to
suppress the progress information that it would otherwise show while
downloading. The output is piped to <code>jq</code> which displays
the data in a “pretty printed” form by default, as you see. In this case
I have given <code>jq</code> a minimal filter which causes what it
receives to be printed. The filter is simply <code>'.'</code>. I have
piped the formatted JSON through the <code>nl</code> command to get line
numbers for reference.</p>
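<p>If you want to try the <code>'.'</code> filter without fetching
anything, the sketch below pipes a small hand-made JSON object through
it. The sample is an assumption modelled on part of the statistics, not
the real data, and the variable names <code>sample</code> and
<code>pretty</code> are just for illustration.</p>

```shell
# A one-line JSON object, similar in shape to part of the statistics
sample='{"hosts":356,"slot":{"next_free":8,"no_media":0}}'

# The identity filter '.' passes the input through unchanged;
# jq pretty-prints the result with two-space indentation by default.
pretty=$(printf '%s\n' "$sample" | jq '.')
echo "$pretty"
```

<p>The single input line comes back spread over seven lines, with each
level of nesting indented by a further two spaces.</p>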
<p>The JSON shown here consists of nested JSON <em>objects</em>. The
opening brace on line 1 and the closing brace on line 43 delimit the
whole thing as a single object.</p>
<p>Briefly, the object contains the following:</p>
<ul>
<li>a number called <code>stats_generated</code> (line 2)</li>
<li>an object called <code>age</code> on lines 3-18; this object
contains two strings and two objects</li>
<li>an object called <code>shows</code> on lines 19-25</li>
<li>a number called <code>hosts</code> on line 26</li>
<li>an object called <code>slot</code> on lines 27-30</li>
<li>an object called <code>workflow</code> on lines 31-34</li>
<li>an object called <code>queue</code> on lines 35-42</li>
</ul>
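<p>As a preview, a breakdown like the one above can also be produced by
<code>jq</code> itself. This is a minimal sketch which assumes
<code>jq</code> 1.5 or later and uses a cut-down, hand-made copy of the
statistics rather than the live data. The <code>map_values</code> and
<code>type</code> filters and the <code>-c</code> and <code>-S</code>
options have not been covered yet, and the variable names are just for
illustration.</p>

```shell
# A cut-down, hand-made copy of the statistics (not the full document)
stats='{"stats_generated":1712785509,"age":{"start":"2005-09-19T00:00:00Z","rename":"2007-12-31T00:00:00Z","since_start":{"years":18},"since_rename":{"years":16}},"hosts":356,"slot":{"next_free":8,"no_media":0}}'

# map_values(type) replaces every value in an object with the name of its type.
# -c prints compactly and -S sorts the keys so the output is predictable.
top_level=$(printf '%s\n' "$stats" | jq -c -S 'map_values(type)')
age=$(printf '%s\n' "$stats" | jq -c -S '.age | map_values(type)')

echo "$top_level"
# prints: {"age":"object","hosts":"number","slot":"object","stats_generated":"number"}
echo "$age"
# prints: {"rename":"string","since_rename":"object","since_start":"object","start":"string"}
```

<p>The second result matches the description of <code>age</code> as
containing two strings and two objects.</p>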
<p>We will look at ways to summarise and reformat such output in a later
episode.</p>
<h2 id="next-episode">Next episode</h2>
<p>I will look at some of the options to <code>jq</code> next time,
though most of them will be revealed as they become relevant.</p>
<p>I will also start looking at <code>jq</code> filters in that
episode.</p>
<h2 id="links">Links</h2>
<ul>
<li>JSON (JavaScript Object Notation):
<ul>
<li><a href="https://en.wikipedia.org/wiki/JSON">Wikipedia page about
JSON</a></li>
<li>Standards:
<ul>
<li><a href="https://datatracker.ietf.org/doc/html/rfc8259">RFC 8259: The
JavaScript Object Notation (JSON) Data Interchange Format</a></li>
<li><a
href="https://www.ecma-international.org/publications-and-standards/standards/ecma-404/">ECMA-404:
The JSON data interchange syntax</a></li>
</ul></li>
</ul></li>
</ul>
<ul>
<li><code>jq</code>:
<ul>
<li><a href="https://jqlang.github.io/jq/">GitHub page</a></li>
<li><a href="https://jqlang.github.io/jq/download/">Downloading
<code>jq</code></a></li>
<li><a href="https://jqlang.github.io/jq/manual/">The <code>jq</code>
manual</a></li>
<li><a
href="https://en.wikipedia.org/wiki/Jq_(programming_language)">Wikipedia
page about the jq programming language</a></li>
</ul></li>
</ul>
<ul>
<li><a
href="https://hackerpublicradio.org/correspondents/0201.html">MrX’s</a>
show on using the HPR statistics in JSON: <a
href="https://hackerpublicradio.org/eps/hpr4089/index.html">Modifying a
Python script with some help from ChatGPT</a></li>
</ul>
</article>