Getting the object array index in jq - arrays

I have a JSON array that looks like this (produced by i3-msg -t get_workspaces):
[
  {
    "name": "1",
    "urgent": false
  },
  {
    "name": "2",
    "urgent": false
  },
  {
    "name": "something",
    "urgent": false
  }
]
I am trying to use jq to figure out the index of an element in the list based on a select query. jq has something called index(), but it seems to support only strings?
Using something like i3-msg -t get_workspaces | jq '.[] | select(.name=="something")' gives me the object I want, but I want its index: in this case 2 (counting from 0).
Is this possible using jq alone?

I provided a strategy for a solution to the OP, which the OP quickly accepted. Subsequently @peak and @Jeff Mercado offered better and more complete solutions, so I have turned this into a community wiki. Please improve this answer if you can.
A straightforward solution (pointed out by @peak) is to use the builtin function, index:
map(.name == "something") | index(true)
The jq documentation confusingly suggests that index operates on strings, but it operates on arrays as well. Thus index(true) returns the index of the first true in the array of booleans produced by the map. If there is no item satisfying the condition, the result is null.
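For example, with the OP's array saved in workspaces.json (a file name used here purely for illustration):
$ jq 'map(.name == "something") | index(true)' workspaces.json
2
$ jq 'map(.name == "no-such-name") | index(true)' workspaces.json
null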
jq expressions are evaluated in a "lazy" manner, but map will traverse the entire input array. We can verify this by rewriting the above code and introducing a debug statement:
[ .[] | debug | .name == "something" ] | index(true)
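Running this against the OP's three-element input shows (on stderr) that every element is visited, even though the match is the last one; the DEBUG lines look roughly like this:
["DEBUG:",{"name":"1","urgent":false}]
["DEBUG:",{"name":"2","urgent":false}]
["DEBUG:",{"name":"something","urgent":false}]
while stdout still shows 2.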
As suggested by @peak, the key to doing better is to use the break statement introduced in jq 1.5:
label $out |
foreach .[] as $item (
  -1;
  .+1;
  if $item.name == "something" then
    .,
    break $out
  else
    empty
  end
) // null
Note that the // is not a comment; it is the alternative operator. If the name is not found, the foreach will return empty, which will be converted to null by the alternative operator.
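A minimal illustration of the alternative operator (values chosen just for illustration):
$ jq -n '42 // "fallback"'
42
$ jq -n 'empty // "fallback"'
"fallback"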
Another approach is to recursively process the array:
def get_index(name):
  name as $name |
  if (. == []) then
    null
  elif (.[0].name == $name) then
    0
  else
    (.[1:] | get_index($name)) as $result |
    if ($result == null) then null else $result+1 end
  end;
get_index("something")
However, this recursive implementation will use stack space proportional to the length of the array in the worst case, as pointed out by @Jeff Mercado. In version 1.5, jq introduced Tail Call Optimization (TCO), which allows us to optimize this away using a local helper function (note that this is a minor adaptation of a solution provided by @Jeff Mercado, so as to be consistent with the above examples):
def get_index(name):
  name as $name |
  def _get_index:
    if (.i >= .len) then
      null
    elif (.array[.i].name == $name) then
      .i
    else
      .i += 1 | _get_index
    end;
  { array: ., i: 0, len: length } | _get_index;
get_index("something")
According to @peak, obtaining the length of an array in jq is a constant-time operation, and apparently indexing an array is inexpensive as well. I will try to find a citation for this.
Now let's try to actually measure. Here is an example of measuring the simple solution:
#!/bin/bash
jq -n '
def get_index(name):
  name as $name |
  map(.name == $name) | index(true)
;
def gen_input(n):
  n as $n |
  if ($n == 0) then
    []
  else
    gen_input($n-1) + [ { "name": $n, "urgent":false } ]
  end
;
2000 as $n |
gen_input($n) as $i |
[(0 | while (.<$n; [ ($i | get_index(.)), .+1 ][1]))][$n-1]
'
When I run this on my machine, I get the following:
$ time ./simple
1999
real 0m10.024s
user 0m10.023s
sys 0m0.008s
If I replace this with the "fast" version of get_index:
def get_index(name):
  name as $name |
  label $out |
  foreach .[] as $item (
    -1;
    .+1;
    if $item.name == $name then
      .,
      break $out
    else
      empty
    end
  ) // null;
Then I get:
$ time ./fast
1999
real 0m13.165s
user 0m13.173s
sys 0m0.000s
And if I replace it with the "fast" recursive version:
def get_index(name):
  name as $name |
  def _get_index:
    if (.i >= .len) then
      null
    elif (.array[.i].name == $name) then
      .i
    else
      .i += 1 | _get_index
    end;
  { array: ., i: 0, len: length } | _get_index;
I get:
$ time ./fast-recursive
1999
real 0m52.628s
user 0m52.657s
sys 0m0.005s
Ouch! But we can do better. @peak mentioned an undocumented switch, --debug-dump-disasm, which lets you see how jq compiles your code. With this you can see that modifying the state object, passing it to the inner helper, and then extracting the array, length, and index from it is expensive. Refactoring to just pass the index is a huge improvement, and a further refinement to avoid testing the index against the length makes it competitive with the iterative version:
def indexof($name):
  (. + [{name: $name}]) as $a |   # add a "sentinel"
  length as $l |                  # note: length sees the original array
  def _indexof:
    if ($a[.].name == $name) then
      if (. != $l) then . else null end
    else
      .+1 | _indexof
    end
  ;
  0 | _indexof
;
I get:
$ time ./fast-recursive2
null
real 0m13.238s
user 0m13.243s
sys 0m0.005s
So it appears that if each element is equally likely to be the target, and you want good average-case performance, you should stick with the simple implementation. (C-coded functions tend to be fast!)

The solution originally proposed by @Jim-D using foreach would only work as intended for arrays of JSON objects, and both the originally proposed solutions are very inefficient. Their behavior in the absence of an item satisfying the condition might also have been surprising.
Solution using index/1
If you just want a quick-and-easy solution, you can use the builtin function, index, as follows:
map(.name == "something") | index(true)
If there is no item satisfying the condition, then the result will be null.
Incidentally, if you wanted ALL indices for which the condition is true, then the above is easily transformed into a super-fast solution by simply changing index to indices:
map(.name == "something") | indices(true)
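For instance, extending the OP's data so that two elements match:
$ echo '[{"name":"1"},{"name":"something"},{"name":"2"},{"name":"something"}]' |
  jq -c 'map(.name == "something") | indices(true)'
[1,3]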
Efficient solution
Here is a generic and efficient function that returns the index (i.e. offset) of the first item in the input array for which (item|f) is truthy (neither null nor false), and null otherwise. (In jq, JavaScript, and many other languages, array indices are 0-based.)
# 0-based index of item in input array such that f is truthy, else null
def which(f):
  label $out
  | foreach .[] as $x (-1; .+1; if ($x|f) then ., break $out else empty end)
  // null ;
Example usage:
which(.name == "something")
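With the OP's sample input, the expected results are:
which(.name == "something")   # => 2
which(.name == "missing")     # => null (via the // null fallback)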

Converting an array to entries will give you access to both the index and value in the array of the items. You could use that to then find the value you're looking for and get its index.
def indexof(predicate):
  reduce to_entries[] as $i (null;
    if (. == null) and ($i.value | predicate) then
      $i.key
    else
      .
    end
  );
indexof(.name == "something")
This, however, does not short-circuit and will go through the entire array to find the index. You'll want to return as soon as the first index has been found. Taking a more functional approach might be more appropriate.
def indexof(predicate):
  def _indexof:
    if .i >= .len then
      null
    elif (.arr[.i] | predicate) then
      .i
    else
      .i += 1 | _indexof
    end;
  { arr: ., i: 0, len: length } | _indexof;
indexof(.name == "something")
Note that the arguments are passed to the inner function in this way to take advantage of some optimizations; namely, to benefit from TCO, the function must not accept any additional parameters.
A still faster version can be obtained by recognizing that the array and its length do not vary:
def indexof(predicate):
  . as $in
  | length as $len
  | def _indexof:
      if . >= $len then null
      elif ($in[.] | predicate) then .
      else . + 1 | _indexof
      end;
    0 | _indexof;
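Usage is the same as before: with the OP's sample input, indexof(.name == "something") again yields 2, and null if nothing matches.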

Here is another version which seems to be slightly faster than the optimized versions from @peak and @jeff-mercado:
label $out | . as $elements | range(length) |
select($elements[.].name == "something") | . , break $out
IMO it is easier to read although it still relies on the break (to get the first match only).
I ran 100 iterations on a ~1,000,000-element array (with the last element being the one to match), counting only the user and kernel times, not wall-clock time. On average this solution took 3.4s, @peak's solution took 3.5s, and @jeff-mercado's took 3.6s. This matched what I was seeing in one-off runs, although to be fair I did have a run where this solution also averaged 3.6s, so there is unlikely to be any statistically significant difference between the solutions.

Related

Convert messages to arrays using jq

My messages generator output
$ ./messages.sh
{"a":"v1"}
{"b":"v2"}
{"c":"v3"}
...
Required output
$ ./messages.sh | jq xxxxxx
[{"a":"v1"},{"b":"v2"}]
[{"c":"v3"},{"d":"v4"}]
...
Take the first item using ., and the second using input (wrapped in try to handle the case of not enough input items). Then wrap them both into array brackets, and provide the -c option for compact output. jq will work through its whole input one by one (or two by two).
./messages.sh | jq -c '[., try input]'
[{"a":"v1"},{"b":"v2"}]
[{"c":"v3"},{"d":"v4"}]
What if I want more objects in the array than 2? For example, 3, 10, 100?
You can surround the array body with limit, and use inputs instead (note the s) to fetch more than just one item:
./messages.sh | jq -c '[limit(3; ., try inputs)]'
[{"a":"v1"},{"b":"v2"},{"c":"v3"}]
[{"d":"v4"}]
Use slurp with _nwise(2) to chunk into parts of 2:
jq --slurp --compact-output '_nwise(2)' <<< $(./messages.sh)
[{"a":"v1"},{"b":"v2"}]
[{"c":"v3"},{"d":"v4"}]
The --compact-output option prints each array on a single line.
Here is a stream-oriented, generic and portable def of nwise:
# Group the items in the given stream into arrays of up to length $n
# assuming $n is a non-negative integer > 0
# Input: a stream
# Output: a stream of arrays no longer than $n
# such that [stream] == ([ nwise(stream; $n) ] | add)
# Notice that there is no assumption about an eos marker.
def nwise(stream; $n):
  foreach ((stream | [.]), null) as $x ([];
    if $x == null then .
    elif length == $n then $x
    else . + $x end;
    if $x == null and length>0 then .
    elif length==$n then .
    else empty
    end);
For the task at hand, you could use it like so:
nwise(inputs; 2)
with the -n command-line option.
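Putting it all together (a sketch; ./messages.sh is the question's generator, assumed to emit the objects shown above):
$ ./messages.sh | jq -nc '
  def nwise(stream; $n):
    foreach ((stream | [.]), null) as $x ([];
      if $x == null then .
      elif length == $n then $x
      else . + $x end;
      if $x == null and length>0 then .
      elif length==$n then .
      else empty
      end);
  nwise(inputs; 2)'
[{"a":"v1"},{"b":"v2"}]
[{"c":"v3"},{"d":"v4"}]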

How to split an array into chunks with jq?

I have a very large JSON file containing an array. Is it possible to use jq to split this array into several smaller arrays of a fixed size? Suppose my input was this: [1,2,3,4,5,6,7,8,9,10], and I wanted to split it into 3 element long chunks. The desired output from jq would be:
[1,2,3]
[4,5,6]
[7,8,9]
[10]
In reality, my input array has nearly three million elements, all UUIDs.
There is an (undocumented) builtin, _nwise, that meets the functional requirements:
$ jq -nc '[1,2,3,4,5,6,7,8,9,10] | _nwise(3)'
[1,2,3]
[4,5,6]
[7,8,9]
[10]
Also:
$ jq -nc '_nwise([1,2,3,4,5,6,7,8,9,10];3)'
[1,2,3]
[4,5,6]
[7,8,9]
[10]
Incidentally, _nwise can be used for both arrays and strings.
(I believe it's undocumented because there was some doubt about an appropriate name.)
TCO-version
Unfortunately, the builtin version is carelessly defined, and will not perform well for large arrays. Here is an optimized version (it should be about as efficient as a non-recursive version):
def nwise($n):
  def _nwise:
    if length <= $n then . else .[0:$n] , (.[$n:]|_nwise) end;
  _nwise;
For an array of size 3 million, this is quite performant:
3.91s on an old Mac, 162746368 max resident size.
Notice that this version (using tail-call optimized recursion) is actually faster than the version of nwise/2 using foreach shown elsewhere on this page.
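As a quick sanity check, applying the optimized definition above to the question's sample input:
$ jq -nc 'def nwise($n):
    def _nwise:
      if length <= $n then . else .[0:$n] , (.[$n:]|_nwise) end;
    _nwise;
  [1,2,3,4,5,6,7,8,9,10] | nwise(3)'
[1,2,3]
[4,5,6]
[7,8,9]
[10]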
The following stream-oriented definition of window/3, due to Cédric Connes (github:connesc), generalizes _nwise and illustrates a "boxing technique" that circumvents the need for an end-of-stream marker; it can therefore be used even if the stream contains the non-JSON value nan. A definition of _nwise/1 in terms of window/3 is also included.
The first argument of window/3 is interpreted as a stream. $size is the window size and $step specifies the number of values to be skipped. For example,
window(1,2,3; 2; 1)
yields:
[1,2]
[2,3]
window/3 and _nwise/1
def window(values; $size; $step):
  def checkparam(name; value):
    if (value | isnormal) and value > 0 and (value | floor) == value then .
    else error("window \(name) must be a positive integer")
    end;
  checkparam("size"; $size)
  | checkparam("step"; $step)
  # We need to detect the end of the loop in order to produce the terminal partial group (if any).
  # For that purpose, we introduce an artificial null sentinel, and wrap the input values into singleton arrays in order to distinguish them.
  | foreach ((values | [.]), null) as $item (
      {index: -1, items: [], ready: false};
      (.index + 1) as $index
      # Extract items that must be reused from the previous iteration
      | if (.ready | not) then .items
        elif $step >= $size or $item == null then []
        else .items[-($size - $step):]
        end
      # Append the current item unless it must be skipped
      | if ($index % $step) < $size then . + $item
        else .
        end
      | {$index, items: ., ready: (length == $size or ($item == null and length > 0))};
      if .ready then .items else empty end
    );

def _nwise($n): window(.[]; $n; $n);
Source:
https://gist.github.com/connesc/d6b87cbacae13d4fd58763724049da58
If the array is too large to fit comfortably in memory, then I'd adopt the strategy suggested by @CharlesDuffy -- that is, stream the array elements into a second invocation of jq using a stream-oriented version of nwise, such as:
def nwise(stream; $n):
  foreach (stream, nan) as $x ([];
    if length == $n then [$x] else . + [$x] end;
    if (.[-1] | isnan) and length>1 then .[:-1]
    elif length == $n then .
    else empty
    end);
The "driver" for the above would be:
nwise(inputs; 3)
But please remember to use the -n command-line option.
To create the stream from an arbitrary array:
$ jq -cn --stream '
fromstream( inputs | (.[0] |= .[1:])
| select(. != [[]]) )' huge.json
So the shell pipeline might look like this:
$ jq -cn --stream '
fromstream( inputs | (.[0] |= .[1:])
| select(. != [[]]) )' huge.json |
jq -n -f nwise.jq
This approach is quite performant. For grouping a stream of 3 million items into groups of 3 using nwise/2,
/usr/bin/time -lp
for the second invocation of jq gives:
user 5.63
sys 0.04
1261568 maximum resident set size
Caveat: this definition uses nan as an end-of-stream marker. Since nan is not a JSON value, this cannot be a problem for handling JSON streams.
Here's a simple one that worked for me:
def chunk(n):
  range(length/n|ceil) as $i | .[n*$i:n*$i+n];
example usage:
jq -n \
'def chunk(n): range(length/n|ceil) as $i | .[n*$i:n*$i+n];
[range(5)] | chunk(2)'
[
0,
1
]
[
2,
3
]
[
4
]
Bonus: it doesn't use recursion and doesn't rely on _nwise, so it also works with jaq.
The below is hackery, to be sure -- but memory-efficient hackery, even with an arbitrarily long list:
jq -c --stream 'select(length==2)|.[1]' <huge.json \
| jq -nc 'foreach inputs as $i (null; null; [$i,try input,try input])'
The first piece of the pipeline streams in your input JSON file, emitting one line per element, assuming the array consists of atomic values (where [] and {} are here included as atomic values). Because it runs in streaming mode it doesn't need to store the entire content in memory, despite being a single document.
The second piece of the pipeline repeatedly reads up to three items and assembles them into a list.
This should avoid needing more than three pieces of data in memory at a time.

Unique Combos from powershell array - No duplicate combos

I'm trying to figure out the best way to get unique combinations from a powershell array. For instance, my array might be
@(B,C,D,E)
I would be hoping for an output like this :
B
C
D
E
B,C
B,D
B,E
C,D
C,E
D,E
B,C,D
C,D,E
B,C,D,E
I do not want re-arranged combos. If combo C,D exists already then I do not want combo D,C. It's redundant for my purposes.
I looked into the functions here : Get all combinations of an array
But they aren't what I want. I've been working on figuring this out myself, but have spent quite a bit of time without success. I thought I'd ask the question here so that if someone else already know I'm not wasting my time.
Thanks!
This is an adaptation from a solution for a C# class I took that asked this same question. For any set find all subsets, including the empty set.
function Get-Subsets ($a){
    #uncomment following to ensure only unique inputs are parsed
    #e.g. 'B','C','D','E','E' would become 'B','C','D','E'
    #$a = $a | Select-Object -Unique
    #create an array to store output
    $l = @()
    #for any set of length n the maximum number of subsets is 2^n
    for ($i = 0; $i -lt [Math]::Pow(2,$a.Length); $i++)
    {
        #temporary array to hold output
        [string[]]$out = New-Object string[] $a.length
        #iterate through each element
        for ($j = 0; $j -lt $a.Length; $j++)
        {
            #start at the end of the array take elements, work your way towards the front
            if (($i -band (1 -shl ($a.Length - $j - 1))) -ne 0)
            {
                #store the subset in a temp array
                $out[$j] = $a[$j]
            }
        }
        #stick subset into an array
        $l += -join $out
    }
    #group the subsets by length, iterate through them and sort
    $l | Group-Object -Property Length | %{$_.Group | sort}
}
Use like so:
PS C:\> Get-Subsets @('b','c','d','e')
b
c
d
e
bc
bd
be
cd
ce
de
bcd
bce
bde
cde
bcde
Note that computational costs go up exponentially with the length of the input array.
Elements    Seconds to Complete
15          46.3488228
14          13.4836299
13           3.6316713
12           1.2542701
11           0.4472637
10           0.1942997
 9           0.0867832
My tired attempt at this. I did manage to get it to produce the expected results, but how it does it is not as elegant. It uses recursion.
Function Get-Permutations{
    Param(
        $theInput
    )
    $theInput | ForEach-Object{
        $element = $_
        $sansElement = ($theInput | Where-Object{$_ -ne $element})
        If($sansElement.Count -gt 1){
            # Build a collection of permutations using the remaining elements that were not isolated in this pass.
            # Use the single element since it is a valid permutation
            $perms = ,$element
            For($elementIndex = 0;$elementIndex -le ($sansElement.Count - 1);$elementIndex++){
                $perms += ,@(,$element + $sansElement[0..$elementIndex] | Sort-Object)
            }
            # For loop does not send to output properly so that is the purpose of collecting the results of this pass in $perms
            $perms
            # If there are more than 2 elements in $sansElement then we need to be sure they are accounted for
            If($sansElement.Count -gt 2){Get-Permutations $sansElement}
        }
    }
}
Get-Permutations B,C,D,E | %{$_ -join ","} | Sort-Object -Unique
I hope I can explain myself clearly. Each pass of the function takes an array; each individual element of that array is isolated from the rest of the array, represented by the variables $element and $sansElement.
Using those variables we build individual and progressively larger arrays composed of those elements. Consider the array 1,2,3,4:
1
1,2
1,2,3
1,2,3,4
The above is done for each "number"
2
2,1
2,1,3
2,1,3,4
and so forth. If the returned array contains more than two elements (1,2 would be the same as 2,1 in your example, so we don't care about pairs beyond one match) we take that array and run it through the same function.
The real issue is that the logic here (I know this might be hard to swallow) creates several duplicates. I suppose you could create a hashtable instead which I will explore but it does not remove the logic flaw.
Regardless of me beating myself up, as long as you don't have thousands of elements the process will still produce results.
Get-Permutations returns an array of arrays, which PowerShell would display one element per line. You asked for comma-delimited output, which is where -join comes in. Sort-Object -Unique takes those sorted strings and discards the duplicates.
Sample Output
B
B,C
B,C,D
B,C,D,E
B,C,E #< Missing from your example output.
B,D
B,D,E #< Missing from your example output.
B,E
C
C,D
C,D,E
C,E
D
E

Create object from array of keys and values

I've been banging my head against the wall for several hours on this and just can't seem to find a way to do it. I have an array of keys and an array of values; how can I generate an object? Input:
[["key1", "key2"], ["val1", "val2"]]
Output:
{"key1": "val1", "key2": "val2"}
Resolved this on GitHub:
.[0] as $keys |
.[1] as $values |
reduce range(0; $keys|length) as $i ( {}; . + { ($keys[$i]): $values[$i] })
The current version of jq has a transpose filter that can be used to pair up the keys and values. You could use it to build out the result object rather easily.
transpose | reduce .[] as $pair ({}; .[$pair[0]] = $pair[1])
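To see what transpose contributes here, with the question's input:
$ jq -c 'transpose' <<< '[["key1", "key2"], ["val1", "val2"]]'
[["key1","val1"],["key2","val2"]]
$ jq -c 'transpose | reduce .[] as $pair ({}; .[$pair[0]] = $pair[1])' <<< '[["key1", "key2"], ["val1", "val2"]]'
{"key1":"val1","key2":"val2"}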
Just to be clear:
(0) Abdullah Jibaly's solution is simple, direct, efficient and generic, and should work in all versions of jq;
(1) transpose/0 is a builtin in jq 1.5 and has been available in pre-releases since Oct 2014;
(2) using transpose/0 (or zip/0 as defined above), an even shorter but still simple, fast, and generic solution to the problem is:
transpose | map( {(.[0]): .[1]} ) | add
Example:
$ jq 'transpose | map( {(.[0]): .[1]} ) | add'
Input:
[["k1","k2","k3"], [1,2,3] ]
Output:
{
"k1": 1,
"k2": 2,
"k3": 3
}
Scratch this, it doesn't actually work for any array greater than size 2.
[map(.[0]) , map(.[1])] | map({(.[0]):.[1]}) | add
Welp, I thought this would be easy, having a little prolog experience... oh man. I ended up banging my head against a wall too. Don't think I'll ever use jq ever again.
Here is a solution which uses reduce with a state object holding an iteration index and a result object. It iterates over the keys in .[0], setting corresponding values in the result from .[1].
.[1] as $v
| reduce .[0][] as $k (
{idx:0, result:{}}; .result[$k] = $v[.idx] | .idx += 1
)
| .result
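With the question's input this yields the desired object:
$ jq -c '.[1] as $v
  | reduce .[0][] as $k (
      {idx:0, result:{}}; .result[$k] = $v[.idx] | .idx += 1
    )
  | .result' <<< '[["key1", "key2"], ["val1", "val2"]]'
{"key1":"val1","key2":"val2"}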

Array.Find and IndexOf for multiple elements that are exactly the same object

I have trouble getting the index of the current element when multiple elements are exactly the same object:
$b = "A","D","B","D","C","E","D","F"
$b | ? { $_ -contains "D" }
Alternative version:
$b = "A","D","B","D","C","E","D","F"
[Array]::FindAll($b, [Predicate[String]]{ $args[0] -contains "D" })
This will return:
D
D
D
But this code:
$b | % { $b.IndexOf("D") }
Alternative version:
[Array]::FindAll($b, [Predicate[String]]{ $args[0] -contains "D" }) | % { $b.IndexOf($_) }
Returns:
1
1
1
so it points at the index of the first matching element. How do I get the indexes of the other elements?
You can do this:
$b = "A","D","B","D","C","E","D","F"
(0..($b.Count-1)) | where {$b[$_] -eq 'D'}
1
3
6
mjolinor's answer is conceptually elegant, but slow with large arrays, presumably due to having to build a parallel array of indices first (which is also memory-inefficient).
It is conceptually similar to the following LINQ-based solution (PSv3+), which is more memory-efficient and about twice as fast, but still slow:
$arr = 'A','D','B','D','C','E','D','F'
[Linq.Enumerable]::Where(
[Linq.Enumerable]::Range(0, $arr.Length),
[Func[int, bool]] { param($i) $arr[$i] -eq 'D' }
)
While any PowerShell looping solution is ultimately slow compared to a compiled language, the following alternative, while more verbose, is still much faster with large arrays:
PS C:\> & { param($arr, $val)
$i = 0
foreach ($el in $arr) { if ($el -eq $val) { $i } ++$i }
} ('A','D','B','D','C','E','D','F') 'D'
1
3
6
Note:
Perhaps surprisingly, this solution is even faster than Matt's solution, which calls [array]::IndexOf() in a loop instead of enumerating all elements.
Use of a script block (invoked with call operator & and arguments), while not strictly necessary, is used to prevent polluting the enclosing scope with helper variable $i.
The foreach statement is faster than the Foreach-Object cmdlet (whose built-in aliases are % and, confusingly, also foreach).
Simply (implicitly) outputting $i for each match makes PowerShell collect multiple results in an array.
If only one index is found, you'll get a scalar [int] instance instead; wrap the whole command in @(...) to ensure that you always get an array.
While $i by itself outputs the value of $i, ++$i by design does NOT (though you could use (++$i) to achieve that, if needed).
Unlike Array.IndexOf(), PowerShell's -eq operator is case-insensitive by default; for case-sensitivity, use -ceq instead.
It's easy to turn the above into a (simple) function (note that the parameters are purposely untyped, for flexibility):
function get-IndicesOf($Array, $Value) {
$i = 0
foreach ($el in $Array) {
if ($el -eq $Value) { $i }
++$i
}
}
# Sample call
PS C:\> get-IndicesOf ('A','D','B','D','C','E','D','F') 'D'
1
3
6
You would still need to loop with the static methods from [array], but if you are still curious, something like this would work.
$b = "A","D","B","D","C","E","D","F"
$results = #()
$singleIndex = -1
Do{
$singleIndex = [array]::IndexOf($b,"D",$singleIndex + 1)
If($singleIndex -ge 0){$results += $singleIndex}
}While($singleIndex -ge 0)
$results
1
3
6
Loop until a match is not found. Assume a match at first by assigning $singleIndex to -1 (which is what a non-match would return). When a match is found, add the index to the results array.
