How to modify array under specific JSONB key in PostgreSQL? - arrays

We're storing various heterogeneous data in a JSONB column called ext, and under some keys we have arrays of values. I know how to replace the whole key (||). But if I want to add one or two values, I still need to extract the original values (that would be ext->'key2' in the example below), and in some cases there may be too many of them.
I realize this is a trivial problem in the relational world and that PG still needs to overwrite the whole row anyway, but at least I don't need to pull the unchanged part of the data from the DB to the application and push it back.
I can construct the final value of the array in the select, but I don't know how to merge this into the final value of ext so that it is usable in an UPDATE statement:
select ext, -- whole JSONB
ext->'key2', -- JSONB array
ARRAY(select jsonb_array_elements_text(ext->'key2')) || array['asdf'], -- array + concat
ext || '{"key2":["new", "value"]}' -- JSONB with whole "key2" key replaced (not what I want)
from (select '{"key1": "val1", "key2": ["val2-1", "val2-2"]}'::jsonb ext) t
So the question: How to write such a modification into the UPDATE statement?
Example uses jsonb_*_text function, some values are non-textual, e.g. numbers, that would need non _text function, but I know what type it is when I construct the query, no problem here.
We also need to remove the values from the arrays as well, in which case if the array is completely empty we would like to remove the key from the JSONB altogether.
Currently we achieve this with this expression in the UPDATE statement
coalesce(ext, '{}')::jsonb - <array of items to delete> || <jsonb with additions> (the <parts> are symbolic here; we use a single JDBC parameter for each value). If the final value of the array is empty, the key for that value goes into the first array; otherwise the final value appears in the JSONB after the || operator.
To be clear:
I know the path to the JSONB value I want to change - it's actually always a single key on the top level.
I know whether that key stores single value (no problem for those) or array (that's where I don't have satisfying solution yet), because we know the definitions of each key, this is stored separately.
I need to add and/or remove multiple values I provide, but I don't know what is in the array at that moment - that's the whole point, so that application doesn't need to read it.
I may also want to replace the whole array under the key, but this is trivial case and I know how to do this.
Finally, if removal results in an empty array, we'd like to get rid of the key as well.
I could probably write a function doing it all if necessary but I've not committed to that yet.
Obviously, restructuring the data out of that JSONB column is not an option. Eventually I want to make it more flexible and data with these characteristics would go to some other table, but at this moment we're not able to do it with our application.

You can use jsonb_set to modify an array which is placed under some key.
To update a value in an array, specify its zero-based index within the array as the last path element, as in the example below.
To add a new element at the start or end of the array, specify a negative or positive index that falls outside the array's bounds (for example, one greater than the array's length).
UPDATE <table>
SET ext = jsonb_set(ext, '{key2, <index>}', '5')
WHERE <condition>

Related

How to concatenate multiple ranges within a Match function

I have a list of values that I would like to match against the combination of multiple ranges.
So, for example, my ranges are A1:A100 and B1:B100.
Instead of concatenating A with B in a new column C, i.e.
CONCAT(A1,B1)...CONCAT(A100,B100)
and then matching my value against that new column - I would like to do something like this:
MATCH(value,CONCATENATE(A1:B100),0)
And copy this down a column near my list of values.
I have a feeling this can be done with some sort of array formula...
Yes as an array formula:
=MATCH(value,$A$1:$A$100 & $B$1:$B$100,0)
Being an array formula it must be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode.
Though they may seem similar in approach, they are not. CONCATENATE returns a single string, not an array, so MATCH would receive all 200 values joined into one long string. The formula above instead returns an array of 100 values, one per row with the two columns concatenated, which MATCH can search.
One further note: if performance becomes an issue, keep in mind that array formulas are inherently slower; adding the helper column and using a regular MATCH will improve responsiveness.
This should work, basically you just need to concatenate it yourself using &
=MATCH(D1,A1:A10&B1:B10,0)
D1 is the value you're trying to look for.
This is an array, so remember to hit Ctrl+Shift+Enter when you input it.

Find Maximum of Array of Index/Match from Concatenated String

Given the following table:
I would like the Actual Start to show the Preferred Start value, if the Depends column is empty (easy).
If the Depends column contains one or more comma-separated Id values, I would like to split on comma, look up the array of "Preferred Start" values based on the corresponding Id value, and then select the maximum value.
The following formula will correctly split the "Depends" cell:
=FILTERXML("<t><s>"&SUBSTITUTE(G6,",","</s><s>")&"</s></t>","//s")
Which can be verified, by using an array-valued MAX function (this returns "4"):
={MAX((FILTERXML("<t><s>"&SUBSTITUTE(G6,",","</s><s>")&"</s></t>","//s")))}
However, what I really want to do is:
={MAX(INDEX(Table1[Preferred Start],MATCH((FILTERXML("<t><s>"&SUBSTITUTE(G6,",","</s><s>")&"</s></t>","//s")),Table1[Id],0)))}
Somewhere along the way however, it loses the "arrayness", and simply returns the "Preferred Start" of the first Id number of the split (Id 3, 17 Jan 18).
Is what I'm trying to do even possible without resorting to VBA? I suspect I will run into a circular reference in actuality, since I really need to take the maximum of the "Actual Start" (adjusted for dependencies), to properly cascade a chain of dependent items.
Thanks
This is a known issue with INDEX: it's reluctant to return an array without some coercion. Generically, this should work:
=INDEX(range,N(IF(1,{array})))
so that becomes the following with your specific scenario
=MAX(INDEX(Table1[Preferred Start],N(IF(1,MATCH((FILTERXML("<t><s>"&SUBSTITUTE(G6,",","</s><s>")&"</s></t>","//s")),Table1[Id],0)))))
confirm with CTRL+SHIFT+ENTER
I assume that every row has a different ID number because the MATCH function will only find the first match for each ID
....or for a completely different approach you can use AGGREGATE function (and SEARCH instead of FILTERXML), which doesn't require "array entry" and would return the correct MAX even if IDs repeat, i.e.
=AGGREGATE(14,6,Table1[Preferred Start]/SIGN(SEARCH(","&Table1[Id]&",",","&G6&",")),1)
Reorder the match to include the max in it:
=INDEX(Table1[Preferred Start],MATCH(MAX((FILTERXML("<t><s>"&SUBSTITUTE(G6,",","</s><s>")&"</s></t>","//s"))),Table1[Id],0))
Enter as an array formula using Ctrl-Shift-Enter.

Retrieving data from table using cell array - Matlab

I have a table in Matlab, crsp, and a cell array of numbers that serve as keys. I want to retrieve information from the table using that cell array, which is stored as a variable. My code is as follows:
function y = generateWeights(permno_vector, this_datenum, crsp)
crsp(crsp.PERMNO == permno_vector,:);
crsp is defined as a table while permno_vector is the cell array. It contains a couple permnos that are used to retrieve information.
In this case, my code is not working and will not allow me to access the values in crsp. How do we access table values using a vector array?
As James Johnstone points out, the first problem with the code you've posted is that it doesn't assign anything to y, so as written your function doesn't return a value. Once you've fixed that, I assume the error you are seeing is Undefined operator '==' for input arguments of type 'cell'. It's always helpful to include this sort of detail when asking a question.
The syntax
crsp(crsp.PERMNO == x,:)
would return those rows of crsp that had PERMNO equal to x. However if you want to supply a list of possible values, and get back all the rows of your table where your target variable matches one of the values in the list, you need to use ismember:
crsp(ismember(crsp.PERMNO, cell2mat(permno_vector)),:)
if permno_vector is a cell array, or simply:
crsp(ismember(crsp.PERMNO, permno_vector),:)
if you can instead supply permno_vector as a numeric vector (assuming of course the data in crsp.PERMNO is also numeric).

Using a string key to return a value from an array

I have a named array of 14 rows by 2 columns. The first has a string key (ie: Country), and the second an attribute (ie: Owner). I want to retrieve the Owner by supplying the Country.
I only know how to use =INDEX to retrieve values from named arrays, but that expects col/row numbers.
How might I achieve my requirement?
For the sake of an answer.
Feed the INDEX function with a MATCH function to provide the requisite row number, along the lines:
=INDEX(B:B,MATCH(A2,A:A,0))
VLOOKUP will work but INDEX/MATCH is more powerful (see) so if you are already comfortable with INDEX it might be better to add MATCH to your arsenal rather than to bother with V/H LOOKUP.

The optimum* way to do a table-lookup-like function in C?

I have to do a table lookup to translate from input A to output A'. I have a function with input A which should return A'. Using a database or flat files is not possible for certain reasons. I have to hardcode the lookup in the program itself.
What would be optimal (*space-wise and time-wise separately): using a hashmap, with A as the key and A' as the value, or using switch-case statements in the function?
The table is a string to string lookup with a size of about 60 entries.
If speed is ultra ultra necessary, then I would consider perfect hashing. Otherwise I'd use an array/vector of string to string pairs, created statically in sort order and use binary search. I'd also write a small test program to check the speed and memory constraints were met.
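The sorted-array-plus-binary-search idea can be sketched with the standard library's bsearch. This is only a minimal illustration: the struct pair type, the sorted_table contents and the lookup_sorted name are made up for the example, and the table has to be kept in strcmp order by hand.

```c
#include <stdlib.h>
#include <string.h>

/* One key/value pair of the hard-coded table (hypothetical type). */
struct pair {
    const char *key;
    const char *value;
};

/* Hypothetical sample data; it must stay sorted by key in strcmp order. */
static const struct pair sorted_table[] = {
    { "apple",  "fruit"     },
    { "carrot", "vegetable" },
    { "salmon", "fish"      },
    { "walnut", "nut"       },
};

/* bsearch passes the search key first and a table element second. */
static int compare_key_to_pair(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const struct pair *)elem)->key);
}

/* Returns the mapped string, or NULL when the key is not in the table. */
const char *lookup_sorted(const char *key)
{
    const struct pair *hit = bsearch(key, sorted_table,
                                     sizeof sorted_table / sizeof sorted_table[0],
                                     sizeof sorted_table[0],
                                     compare_key_to_pair);
    return hit ? hit->value : NULL;
}
```

With about 60 entries a lookup needs at most six string comparisons, and the only memory cost is the table itself. A call such as lookup_sorted("carrot") returns "vegetable", while an unknown key returns NULL.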
I believe that both the switch and the table-look up will be equivalent (although one should do some tests on the compiler being used). A modern C compiler will implement a big switch with a look-up table. The table look-up can be created more easily with a macro or a scripting language.
For both solutions the input A must be an integer. If this is not the case, one solution will be to implement a huge if-else statement.
If you have strings you can create two arrays - one for input and one for output (this will be inefficient if they aren't of the same size). Then you need to iterate the contents of the input array to find a match. Based on the index you find, you return the corresponding output string.
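A rough sketch of the two-array approach just described might look like the following. The inputs and outputs arrays and the lookup_linear name are placeholders invented for the example; the two arrays are assumed to have the same number of entries.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sample data: inputs[i] translates to outputs[i].
   Both arrays must have the same number of entries. */
static const char *const inputs[]  = { "red",    "green",  "blue"   };
static const char *const outputs[] = { "ff0000", "00ff00", "0000ff" };

/* Scans the input array and returns the output string at the matching
   index, or NULL when the key is not present. */
const char *lookup_linear(const char *key)
{
    size_t count = sizeof inputs / sizeof inputs[0];

    for (size_t i = 0; i < count; i++) {
        if (strcmp(inputs[i], key) == 0)
            return outputs[i];
    }
    return NULL;
}
```

The scan is linear in the number of entries, which is usually acceptable for a table of about 60 strings, but it is worth measuring before settling on it.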
Make a key that is fast to calculate, and hash
If the table is fairly static and unlikely to change in the future, you could check whether combining a few selected chars (at fixed indexes) of each "key" string yields a unique value (call it K). Then insert the "value" strings into a hash table using the pre-calculated K for each "key" string.
Although a hash method is fast, there is still the possibility of collision (two inputs generating the same hash value). A fast method depends on the data type of the input.
For integral types, the fastest table lookup method is an array. Use the incoming datum as an index into the array. One of the problems with this method is that the array must account for the entire spectrum of values for the fastest speed. Otherwise execution is slowed down by translating the original index into an index for the array (kind of like a hashing method).
For string input types, a nested look up may be the fastest. One example is to break up tables by length. The first array returns pointers to the table to search based on length, e.g. char * sub_table = First_Array[5] for a string of length 5. These can be configured for specialized input data.
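One possible shape for the length-based nested lookup is sketched below. Everything here is illustrative: the struct entry type, the sample sub-tables, MAX_KEY_LEN and the lookup_by_length name are assumptions, and each sub-table is terminated by a NULL key so that no separate element counts are needed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry type; a NULL key terminates each sub-table. */
struct entry {
    const char *key;
    const char *value;
};

/* Sample sub-tables, one per key length. */
static const struct entry len3_table[] = {
    { "cat", "feline" }, { "dog", "canine" }, { NULL, NULL }
};
static const struct entry len5_table[] = {
    { "horse", "equine" }, { NULL, NULL }
};

#define MAX_KEY_LEN 5

/* First-level array indexed by key length; unused lengths stay NULL. */
static const struct entry *const by_length[MAX_KEY_LEN + 1] = {
    [3] = len3_table,
    [5] = len5_table,
};

/* Picks the sub-table for strlen(key), then scans only that sub-table. */
const char *lookup_by_length(const char *key)
{
    size_t len = strlen(key);

    if (len > MAX_KEY_LEN || by_length[len] == NULL)
        return NULL;
    for (const struct entry *e = by_length[len]; e->key != NULL; e++) {
        /* Every key in this sub-table has exactly len characters. */
        if (memcmp(e->key, key, len) == 0)
            return e->value;
    }
    return NULL;
}
```

How much this helps depends on how evenly the key lengths are distributed. If most keys share one length, the second-level scan does nearly all of the work.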
Another method is to use a B-Tree, which is a balanced multiway search tree whose nodes are "pages" holding several keys each. Behavior is similar to nested arrays.
If you let us know the input type, we can better answer your question.
