An introduction to the magic of jq

May 31, 2022
manav@ente.io

jq is a magical tool for dealing with JSON on the CLI. Like its famous predecessors, sed and awk, it is unfriendly to beginners (or as the neckbeards of old would say, it is just choosy in who it considers as friends).

The unfriendliness stems in large part from a historical tendency in the man pages of these tools to not provide examples. The man pages themselves are extremely well written, and a pleasure to read (yes, I'm the sort of person who likes spending Sundays reading man pages). But because they lack examples, they are opaque to beginners.

Enough ranting, let's show some of the magic of jq.

The task I wanted to do was, well, compare two lists. First a bit of background: we have a security group on Scaleway that restricts inbound connections to our API server instances. The only entity we want to be able to open connections to these is Cloudflare, which provides the load balancer that api.ente.io resolves to, and then reverse proxies incoming connections to our API server instances. So the IP ranges from which Cloudflare can connect to our instances should be whitelisted in our Scaleway security group.

So my goal is to ensure that the two lists -- the allowed IP ranges in the security group, and Cloudflare's IP ranges -- match.

This is perfect for demonstrating the power of jq. We'll see filtering, joining, sorting, and more. The hope is that with a real example (this is something I actually needed to do), the usefulness of jq is even more apparent.

First, let's start off by getting the ID of our security group. We'll use the Scaleway CLI tool (called scw).

$ scw instance security-group list name=scg-prod
ID                                    NAME      DESCRIPTION
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  scg-prod  Security group - ...

We want the output in JSON format, for which the scw command provides the handy --output json switch.

$ scw instance security-group list --output json name=scg-prod
[{"id":"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx","name":"scg-prod",...}]

We get a big blob of JSON, machine readable, but unreadable to us. Let's prettify it by piping it into jq.

$ scw instance security-group list --output json name=scg-prod | jq
[
  {
    "id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "name": "scg-prod",
    ...
  }
]

If you take nothing else away from this post, just remember that | jq is a quick way to prettify JSON.

Moving on though. We only want the "id" of the security group (we need it for the next command, which gets the security group details). To do this, we can use .id to get jq to emit only the value of the "id" field in the JSON object we give it.

The JSON we see above is an array, but we only care about the first element, so we can use first to get at that. Combining the first (to get the first element), and .id (to get the value of "id" from within that element), we get

$ scw instance security-group list --output json name=scg-prod | jq 'first.id'
"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

We just want the value without the quotes, so let us ask jq to give us raw output by specifying the "--raw-output" flag (or its shorthand, "-r").

$ scw instance security-group list --output json name=scg-prod | jq -r 'first.id'
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

Perfect.
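
If you want to try these two steps without a Scaleway account, here is a minimal sketch that feeds jq a small made-up array instead. The id value and the variable names are placeholders I picked for illustration, not real scw output.

```shell
# Hypothetical stand-in for the output of `scw ... list --output json`.
sample='[{"id":"11111111-aaaa","name":"scg-prod"}]'

# Without -r, jq prints a JSON string, quotes included.
quoted="$(printf '%s\n' "$sample" | jq 'first.id')"

# With -r, jq prints the raw text of the string.
raw="$(printf '%s\n' "$sample" | jq -r 'first.id')"

printf '%s\n%s\n' "$quoted" "$raw"
```

The first line printed keeps its double quotes, the second does not, which is what makes the -r form safe to paste into another command.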

Now let's get the details of this security group. We could copy paste the output above, but we're cooler than that. So we'll use the $( ) command substitution syntax to run an inner command whose output is then used as the argument to our top level command.

$ scw instance security-group get "$(scw instance security-group list --output json name=scg-prod | jq -r 'first.id')"
Security Group:
ID                     xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
...

Again we want JSON output, and to make it easier to inspect we'll prettify it by passing it to jq.

$ scw instance security-group get --output json "$(scw instance security-group list --output json name=scg-prod | jq -r 'first.id')" | jq
{
  "id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "name": "scg-prod",
  ...
  "Rules": [
    {
      "id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
      "protocol": "TCP",
      "direction": "outbound",
      "action": "drop",
      "ip_range": "0.0.0.0/0",
      "dest_port_from": 465,
      "dest_port_to": null,
      "position": 2,
      "editable": false,
      "zone": "fr-par-1"
    },
    ...

So what we have is a JSON object representing the security group. It has an array called "Rules", each element of which is a security group rule to apply.

We're interested only in the rules, so we can ask jq to only give us the Rules array by using

jq '.Rules'

Now, of all the rules we're only interested in the "inbound" rules (not the "outbound" ones). So we ask jq to select only those rules which have "direction" equal to "inbound" by saying select(.direction == "inbound").

What select does is let through only those objects that satisfy the given criteria; the rest are discarded. However, select works on individual objects, and Rules is an array. So we need a way to apply select to each object in the Rules array. Ta-da! map!

jq '.Rules | map(select(.direction == "inbound"))'
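
Here is a small sketch of map and select working together, run against made-up rules (the IP ranges are documentation ranges and the object is trimmed down; real scw output has more fields).

```shell
# Made-up security group with three rules, for illustration only.
group='{"Rules":[
  {"direction":"inbound","dest_port_from":443,"ip_range":"198.51.100.0/24"},
  {"direction":"outbound","dest_port_from":465,"ip_range":"0.0.0.0/0"},
  {"direction":"inbound","dest_port_from":22,"ip_range":"192.0.2.0/24"}
]}'

# map runs select once per element and collects the survivors into a new array.
# -c prints compact, single-line JSON so the result is easy to eyeball.
inbound="$(printf '%s\n' "$group" | jq -c '.Rules | map(select(.direction == "inbound"))')"
printf '%s\n' "$inbound"

# Count how many rules made it through the filter.
count="$(printf '%s\n' "$inbound" | jq 'length')"
printf '%s\n' "$count"
```

Dropping the map and running select(.direction == "inbound") directly on the array would fail, because .direction cannot be looked up on an array.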

We also want to keep only those rules that are for port 443 (the HTTPS port on which our API instances listen).

jq '.Rules | map(select(.direction == "inbound" and .dest_port_from == 443))'

Finally, this will give us all the Rules objects that satisfy the condition, but we only want the IP ranges from each rule. So let's add another step in the pipeline to extract only the "ip_range" from each of the filtered rules.

jq '.Rules | map(select(.direction == "inbound" and .dest_port_from == 443)) | map(.ip_range)'

The last two steps are both maps, and they can be combined to still get the same effect.

jq '.Rules | map(select(.direction == "inbound" and .dest_port_from == 443) | .ip_range)'

Let us also sort the ranges (this will simplify the comparison later).

jq '.Rules | map(select(.direction == "inbound" and .dest_port_from == 443) | .ip_range) | sort'
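
To convince ourselves that the two-map and the combined versions really are the same, here is a sketch that runs both against made-up rules (again documentation IP ranges, with most fields left out) and compares the results.

```shell
# Made-up rules: two match (inbound, port 443), two do not.
group='{"Rules":[
  {"direction":"inbound","dest_port_from":443,"ip_range":"198.51.100.0/24"},
  {"direction":"outbound","dest_port_from":465,"ip_range":"0.0.0.0/0"},
  {"direction":"inbound","dest_port_from":443,"ip_range":"192.0.2.0/24"},
  {"direction":"inbound","dest_port_from":22,"ip_range":"203.0.113.0/24"}
]}'

# Filter in one map, then extract ip_range in a second map.
two_maps="$(printf '%s\n' "$group" | jq -c '.Rules | map(select(.direction == "inbound" and .dest_port_from == 443)) | map(.ip_range) | sort')"

# Filter and extract inside a single map.
one_map="$(printf '%s\n' "$group" | jq -c '.Rules | map(select(.direction == "inbound" and .dest_port_from == 443) | .ip_range) | sort')"

printf '%s\n%s\n' "$two_maps" "$one_map"
```

Both print ["192.0.2.0/24","198.51.100.0/24"]. Note that sort compares the ranges as plain strings, not as numbers, which is fine here since all we need is for both lists to end up in the same order.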

In full context,

$ scw instance security-group get --output json "$(scw instance security-group list --output json name=scg-prod | jq -r 'first.id')" | jq '.Rules | map(select(.direction == "inbound" and .dest_port_from == 443) | .ip_range) | sort'
[
  "103.21.244.0/22",
  "103.22.200.0/22",
  "103.31.4.0/22",
  "104.16.0.0/13",
  "104.24.0.0/14",
  "108.162.192.0/18",
  "131.0.72.0/22",
  "141.101.64.0/18",
  "162.158.0.0/15",
  "172.64.0.0/13",
  "173.245.48.0/20",
  "188.114.96.0/20",
  "190.93.240.0/20",
  "197.234.240.0/22",
  "198.41.128.0/17",
  "2400:cb00::/32",
  "2405:8100::/32",
  "2405:b500::/32",
  "2606:4700::/32",
  "2803:f800::/32",
  "2a06:98c0::/29",
  "2c0f:f248::/32"
]

Brilliant!

So now let us get the other half of the data -- the list of Cloudflare IP addresses. These can be obtained from the Cloudflare API.

$ curl -s 'https://api.cloudflare.com/client/v4/ips'
{"result":{"ipv4_cidrs":...

Again, let's pipe it to jq to get some idea of what the JSON structure is.

$ curl -s 'https://api.cloudflare.com/client/v4/ips' | jq
{
  "result": {
    "ipv4_cidrs": [
      "173.245.48.0/20",
      ...
    ],
    "ipv6_cidrs": [
      "2400:cb00::/32",
      ...
    ],
    ...
  },
  "success": true,
  "errors": [],
  "messages": []
}

So we're interested in the "ipv4_cidrs" and "ipv6_cidrs" arrays inside the "result" object of the response JSON.

jq has an interesting comma operator that duplicates the incoming stream and sends it to the operations on both sides of the comma. So we can in effect get both "ipv4_cidrs" and "ipv6_cidrs" from the same result object.

jq '.result | .ipv4_cidrs,.ipv6_cidrs'
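
A quick sketch of the comma operator on a made-up response (the two CIDRs are documentation ranges, not Cloudflare's) shows that jq emits two separate outputs, one per side of the comma.

```shell
# Made-up, trimmed-down stand-in for the Cloudflare API response.
response='{"result":{"ipv4_cidrs":["198.51.100.0/24"],"ipv6_cidrs":["2001:db8::/32"]}}'

# The comma feeds .result to both sides, so jq prints two arrays, one per line.
outputs="$(printf '%s\n' "$response" | jq -c '.result | .ipv4_cidrs,.ipv6_cidrs')"
printf '%s\n' "$outputs"

# Count the lines to confirm there are two separate outputs.
lines="$(printf '%s\n' "$outputs" | wc -l | tr -d ' ')"
printf '%s\n' "$lines"
```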

This will, however, give us two arrays -- the "ipv4_cidrs" array, and the "ipv6_cidrs" array. What we want is a single array. We can flatten the two arrays into a single one using the "flatten" operator. However, before flattening we'll need to put both the arrays into a single container array. This is easy: just wrap them in [].

jq '.result | [.ipv4_cidrs,.ipv6_cidrs] | flatten'

Finally, let's also sort them.

jq '.result | [.ipv4_cidrs,.ipv6_cidrs] | flatten | sort'
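
Here is a sketch of the wrap-flatten-sort pipeline on a made-up response (documentation ranges again). As an aside that is not part of the original pipeline, jq's + operator also concatenates arrays, so the second command is an alternative way to get the same list.

```shell
# Made-up, trimmed-down stand-in for the Cloudflare API response.
response='{"result":{"ipv4_cidrs":["203.0.113.0/24","198.51.100.0/24"],"ipv6_cidrs":["2001:db8::/32"]}}'

# Wrap both arrays in a container array, flatten it, then sort.
flattened="$(printf '%s\n' "$response" | jq -c '.result | [.ipv4_cidrs,.ipv6_cidrs] | flatten | sort')"

# Alternative: concatenate the two arrays with +, then sort.
added="$(printf '%s\n' "$response" | jq -c '.result | .ipv4_cidrs + .ipv6_cidrs | sort')"

printf '%s\n%s\n' "$flattened" "$added"
```

Both print ["198.51.100.0/24","2001:db8::/32","203.0.113.0/24"]. The IPv6 range lands in the middle because sort compares the entries as strings.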

In full context,

$ curl -s 'https://api.cloudflare.com/client/v4/ips' | jq '.result | [.ipv4_cidrs,.ipv6_cidrs] | flatten | sort'
[
  "103.21.244.0/22",
  "103.22.200.0/22",
  "103.31.4.0/22",
  "104.16.0.0/13",
  "104.24.0.0/14",
  "108.162.192.0/18",
  "131.0.72.0/22",
  "141.101.64.0/18",
  "162.158.0.0/15",
  "172.64.0.0/13",
  "173.245.48.0/20",
  "188.114.96.0/20",
  "190.93.240.0/20",
  "197.234.240.0/22",
  "198.41.128.0/17",
  "2400:cb00::/32",
  "2405:8100::/32",
  "2405:b500::/32",
  "2606:4700::/32",
  "2803:f800::/32",
  "2a06:98c0::/29",
  "2c0f:f248::/32"
]

Yay! So all that's left now is to compare the two lists, which we can do using the diff command.

If the full command that'll come later is too confusing, you can also do it by saving each list to a temporary file, and then comparing the temporary files:

diff /tmp/output-of-scw-instances /tmp/output-of-cf-api
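
For completeness, a sketch of the temporary-file route. The two lists below are made-up stand-ins; in practice the left side of each > would be the scw and curl pipelines from above. I'm using mktemp -d rather than fixed names under /tmp, which is my own choice and not something the original commands require.

```shell
# Made-up stand-ins for the two sorted lists produced earlier.
allowed='["192.0.2.0/24","198.51.100.0/24"]'
cloudflare='["192.0.2.0/24","198.51.100.0/24"]'

# Create a private scratch directory for the two files.
tmpdir="$(mktemp -d)"

# Save each prettified list to its own file.
printf '%s\n' "$allowed" | jq . > "$tmpdir/output-of-scw-instances"
printf '%s\n' "$cloudflare" | jq . > "$tmpdir/output-of-cf-api"

# diff prints nothing and exits with status 0 when the files are identical.
diff "$tmpdir/output-of-scw-instances" "$tmpdir/output-of-cf-api"
status=$?

rm -r "$tmpdir"
printf '%s\n' "$status"
```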

But being the rockstars we are, we'll do everything in a single command by using the "<()" process substitution syntax that is supported by many shells. The shell will take the output of whatever is inside "<()", make it available under a file name (typically a pipe rather than an actual temporary file), and pass that name to the top level command. Which is sort of what we would've done by hand.

Let's try running the monster, in her full glory.

$ diff \
    <( scw instance security-group get --output json \
         "$(scw instance security-group list --output json name=scg-prod | jq -r 'first.id')" |
       jq '.Rules | map(select(.direction == "inbound" and .dest_port_from == 443) | .ip_range) | sort' ) \
    <( curl -s 'https://api.cloudflare.com/client/v4/ips' |
       jq '.result | [.ipv4_cidrs,.ipv6_cidrs] | flatten | sort' )
$

There was no output, which is good! It means there was no difference between the two lists.
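
If you'd rather not rely on eyeballing an empty output (say, in a cron job), diff's exit status carries the same information: 0 when the inputs match, non-zero otherwise. Here is a minimal sketch using made-up lists that deliberately differ; it needs a shell with process substitution, such as bash.

```shell
# Made-up lists that differ in their second entry.
allowed='["192.0.2.0/24","198.51.100.0/24"]'
cloudflare='["192.0.2.0/24","203.0.113.0/24"]'

# diff exits 0 if the inputs are identical, 1 if they differ, 2 on trouble.
if diff <(printf '%s\n' "$allowed" | jq .) <(printf '%s\n' "$cloudflare" | jq .) > /dev/null; then
  verdict="lists match"
else
  verdict="lists differ"
fi

printf '%s\n' "$verdict"
```

With these inputs the script prints "lists differ"; swap in the real scw and curl pipelines inside the two "<()" and it becomes a check you can run unattended.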

So our work here is done. Let us rest now, for another day comes.


If you like what you saw here, say hi on Twitter or come hang out with us on Discord. Till then, stay cool and jq 😎