Postgres/JSON - update all array elements

Given the following json:
{
  "foo": [
    { "bar": true },
    { "bar": true }
  ]
}
How can I select the following:
{
  "foo": [
    { "bar": false },
    { "bar": false }
  ]
}
?
So far I've figured out how to manipulate a single array value:
SELECT jsonb_set(
  '{"foo": [{"bar": true}, {"bar": true}]}'::jsonb,
  '{foo,0,bar}',
  to_jsonb(false)
)
But how do I set all elements within an array?

You might want to kill two birds with one stone: update an existing key in every object in the array, or insert the key with a given value where it is missing. jsonb_set is a perfect match here, but it requires us to pass the index of each object, so we have to iterate over the array first.
The implementation is heavily inspired by klin's answer, which didn't solve my problem (I needed both updating and inserting) and didn't work when an object contained multiple keys.
So, the implementation is as follows:
-- the params are the same as in the aforementioned `jsonb_set`
CREATE OR REPLACE FUNCTION update_array_elements(target jsonb, path text[], new_value jsonb)
RETURNS jsonb LANGUAGE sql AS $$
  -- aggregate the jsonb from parts created in LATERAL
  SELECT jsonb_agg(updated_jsonb)
  -- split the target array into individual objects...
  FROM jsonb_array_elements(target) individual_object,
  -- ...and apply jsonb_set to each object; the results are aggregated in SELECT
       LATERAL jsonb_set(individual_object, path, new_value) updated_jsonb
$$;
And that's it... :)
I hope it'll help someone with the same problem I had.
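For example, a quick sanity check (the extra baz key is hypothetical, just to show that other keys in each object survive):
SELECT update_array_elements(
  '[{"bar": true, "baz": 1}, {"bar": true}]'::jsonb,
  '{bar}',
  'false'
);
-- [{"bar": false, "baz": 1}, {"bar": false}]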

There is no standard function to update json array elements by key.
A custom function is probably the simplest way to solve the problem:
create or replace function update_array_elements(arr jsonb, key text, value jsonb)
returns jsonb language sql as $$
  select jsonb_agg(jsonb_build_object(k, case when k <> key then v else value end))
  from jsonb_array_elements(arr) e(e),
       lateral jsonb_each(e) p(k, v)
$$;
select update_array_elements('[{"bar":true},{"bar":true}]'::jsonb, 'bar', 'false');
update_array_elements
----------------------------------
[{"bar": false}, {"bar": false}]
(1 row)
Your query may look like this:
with a_data(js) as (
  values('{"foo": [{"bar": true}, {"bar": true}]}'::jsonb)
)
select jsonb_set(js, '{foo}', update_array_elements(js->'foo', 'bar', 'false'))
from a_data;
jsonb_set
-------------------------------------------
{"foo": [{"bar": false}, {"bar": false}]}
(1 row)

Postgresql update jsonb keys recursively

Having the following datamodel:
create table test (
  id int primary key,
  js jsonb
);
insert into test values (1, '{"id": "total", "price": 400, "breakdown": [{"id": "product1", "price": 400}] }');
insert into test values (2, '{"id": "total", "price": 1000, "breakdown": [{"id": "product1", "price": 400}, {"id": "product2", "price": 600}]}');
I need to rename all the price keys to cost.
It is easy to do that for the static top-level field, using:
update test
set js = jsonb_set(js #- '{price}', '{cost}', js #> '{price}');
result:
1 {"id": "total", "cost": 1000, "breakdown": [{"id": "product1", "price": 400}]}
2 {"id": "total", "cost": 2000, "breakdown": [{"id": "product1", "price": 400}, {"id": "product2", "price": 600}]}
But I also need to do this inside the breakdown array.
How can I do this without knowing the number of items in the breakdown array?
In other words, how can I apply a function in place on every element from a jsonb array.
Thank you!
SOLUTION 1: clean but heavy
First you create an aggregate function similar to jsonb_set:
CREATE OR REPLACE FUNCTION jsonb_set(x jsonb, y jsonb, _path text[], _key text, _val jsonb, create_missing boolean DEFAULT True)
RETURNS jsonb LANGUAGE sql IMMUTABLE AS
$$
  SELECT jsonb_set(COALESCE(x, y), COALESCE(_path, '{}'::text[]) || _key, COALESCE(_val, 'null'::jsonb), create_missing);
$$;
DROP AGGREGATE IF EXISTS jsonb_set_agg(jsonb, text[], text, jsonb, boolean) CASCADE;
CREATE AGGREGATE jsonb_set_agg(jsonb, text[], text, jsonb, boolean)
(
  sfunc = jsonb_set,
  stype = jsonb
);
Then, you call the aggregate function while iterating over the jsonb array elements:
WITH list AS (
  SELECT id,
         jsonb_set_agg(js #- array['breakdown', ind::text, 'price'],
                       array['breakdown', ind::text],
                       'cost',
                       js #> array['breakdown', ind::text, 'price'],
                       true) AS js
  FROM test
  CROSS JOIN LATERAL generate_series(0, jsonb_array_length(js -> 'breakdown') - 1) AS ind
  GROUP BY id)
UPDATE test AS t
SET js = jsonb_set(l.js #- '{price}', '{cost}', l.js #> '{price}')
FROM list AS l
WHERE t.id = l.id;
SOLUTION 2: quick and dirty
You simply convert the jsonb to text and replace the substring 'price' with 'cost':
UPDATE test
SET js = replace(js :: text, 'price', 'cost') :: jsonb
In the general case, this will also replace the substring 'price' inside jsonb string values and inside any key that merely contains the substring 'price'. To reduce the risk, you can replace the substring '"price":' with '"cost":', but some risk still remains.
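A minimal sketch of that narrower replacement, still assuming no string value happens to contain '"price":' (jsonb's text form renders each key as "price": value):
UPDATE test
SET js = replace(js::text, '"price":', '"cost":')::jsonb;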
This query is simple and makes it easy to change the field.
You can see my query structure here: dbfiddle
update test u_t
set js = tmp.new_js
from (
  select t.id,
         (t.js || jsonb_build_object('cost', t.js ->> 'price')) - 'price'
         || jsonb_build_object('breakdown', jsonb_agg(
              (b.value || jsonb_build_object('cost', b.value ->> 'price')) - 'price')) as new_js
  from test t
  cross join jsonb_array_elements(t.js -> 'breakdown') b
  group by t.id) tmp
where u_t.id = tmp.id;
Another way to replace a jsonb key in all jsonb objects within a jsonb array:
My query disaggregates the jsonb array. For each object, if the price key exists, it removes the price key from the object and adds a new cost key with the old price value, then builds a new jsonb array from the modified objects. Finally it replaces the old jsonb array with the new one.
WITH cte AS (
  SELECT id, jsonb_agg(CASE WHEN item ? 'price'
                            THEN jsonb_set(item - 'price', '{"cost"}', item -> 'price')
                            ELSE item END) AS cost_array
  FROM test
  CROSS JOIN jsonb_array_elements(js -> 'breakdown') WITH ORDINALITY arr(item, index)
  GROUP BY id)
UPDATE test
SET js = jsonb_set(js, '{breakdown}', cte.cost_array, false)
FROM cte
WHERE cte.id = test.id;

How do I do an aggregate query against a Couchbase array index?

I have documents in my database that contain a "flags" array. Each of those has a "flag" value that contains a string. I'm trying to get the count of how many of each flag string there are across all documents. So for example, if I had two documents:
{
  "flags": [
    { "flag": "flag1", ... },
    { "flag": "flag2", ... }
  ],
  ...
},
{
  "flags": [
    { "flag": "flag1", ... },
    { "flag": "flag3", ... }
  ],
  ...
}
I would expect a result back like:
[
  { "flag": "flag1", "flag_count": 2 },
  { "flag": "flag2", "flag_count": 1 },
  { "flag": "flag3", "flag_count": 1 }
]
I've created an index that looks like this:
CREATE INDEX `indexname` ON `dbname`((all (array (`f`.`flag`) for `f` in `flags` end)),`flags`) WHERE (`type` in ["type1", "type2"])
So far, the only way I've been able to get this to work is with a query like this:
SELECT f1.flag, count(*) as flag_count
from dbname s
unnest flags as f1
where (s.type in ["type1", "type2"])
  AND any f in s.flags satisfies f.flag like '%' end
group by f1.flag
This all makes sense to me except that it requires something along the lines of that AND any f in s.flags satisfies f.flag like '%' part to run at all - if I leave that out, it tells me it can't find an index that works. Is there a way to structure this such that I could leave that out? It seems unnecessary to me, but I guess I'm missing something.
CREATE INDEX ix1 ON dbname( ALL ARRAY f.flag FOR f IN flags END)
WHERE type IN ["type1", "type2"];
SELECT f.flag, COUNT(1) AS flag_count
FROM dbname AS d
UNNEST d.flags AS f
WHERE d.type IN ["type1", "type2"] AND f.flag LIKE "%"
GROUP BY f.flag;
If the array can contain duplicate flag values and you want to count each flag only once per document:
SELECT f.flag, COUNT( DISTINCT META(d).id) AS flag_count
FROM dbname AS d
UNNEST d.flags AS f
WHERE d.type IN ["type1", "type2"] AND f.flag LIKE "%"
GROUP BY f.flag;
The predicate on f.flag (the always-true LIKE "%") is what lets the optimizer qualify the array index, since that index only contains the f.flag values. Check UNNEST and array indexing: https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/indexing-arrays.html

Query JSON Key:Value Pairs in AWS Athena

I have received a data set from a client that is loaded in AWS S3. The data contains unnamed JSON key:value pairs. This isn't my area of expertise, so I was looking for a little help.
The structure of JSON data that I've typically worked with in the past looks similar to this:
{ "name":"John", "age":30, "car":null }
The data that I have received from my client is formatted as such:
{
  "answer_id": "cc006",
  "answer": {
    "101086": 1,
    "101087": 2,
    "101089": 2,
    "101090": 7,
    "101091": 5,
    "101092": 3,
    "101125": 2
  }
}
This is survey data, where the key on the left is a numeric customer identifier, and the value on the right is their response to a survey question, i.e. customer "101125" answered the survey with a value of "2". I need to be able to query the JSON data using Athena such that my result set has one row per answer, with the answer_id, the customer key, and the response value as columns.
Cross joining the unnested children against the parent node isn't an issue. What I can't figure out is how to select all of the keys from the array "answer" without specifying that actual key name. Similarly, I want to be able to select all of the values as well.
Is it possible to create a virtual table in Athena that would allow for these results, or do I need to convert the JSON to a format this looks more similar to the following:
{
  "answer_id": "cc006",
  "answer": [
    { "key": "101086", "value": 1 },
    { "key": "101087", "value": 2 },
    { "key": "101089", "value": 2 },
    { "key": "101090", "value": 7 },
    { "key": "101091", "value": 5 },
    { "key": "101092", "value": 3 },
    { "key": "101125", "value": 2 }
  ]
}
EDIT 6/4/2020
I was able to use the code that Theon provided below along with the following table structure:
CREATE EXTERNAL TABLE answer_example (
  answer_id string,
  answer string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://mybucket/'
That allowed me to use the following query to generate the results that I needed.
WITH Data AS (
  SELECT
    answer_id,
    CAST(json_extract(answer, '$') AS MAP(VARCHAR, VARCHAR)) AS answer
  FROM answer_example
)
SELECT
  answer_id,
  key,
  element_at(answer, key) AS value
FROM Data
CROSS JOIN UNNEST (map_keys(answer)) AS answer (key)
EDIT 6/5/2020
Taking additional advice from Theon's response below, the following DDL and Query simplify this quite a bit.
DDL:
CREATE EXTERNAL TABLE answer_example (
  answer_id string,
  answer map<string,string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://mybucket/'
Query:
SELECT
  answer_id,
  key,
  element_at(answer, key) AS value
FROM answer_example
CROSS JOIN UNNEST (map_keys(answer)) AS answer (key)
You can cross join with the keys of the answer property and then pick the corresponding value. Something like this:
WITH data AS (
  SELECT
    'cc006' AS answer_id,
    MAP(
      ARRAY['101086', '101087', '101089', '101090', '101091', '101092', '101125'],
      ARRAY[1, 2, 2, 7, 5, 3, 2]
    ) AS answers
)
SELECT
  answer_id,
  key,
  element_at(answers, key) AS value
FROM data
CROSS JOIN UNNEST (map_keys(answers)) AS answer (key)
You could probably do something with transform_keys to create rows of the key value pairs, but the SQL above does the trick.
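As a side note, Presto (and therefore Athena) can also UNNEST a map directly into key/value columns, which avoids the element_at lookup; a minimal sketch against the same data CTE:
SELECT
  answer_id,
  key,
  value
FROM data
CROSS JOIN UNNEST (answers) AS answer (key, value)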

How can I count the number of top level json keys in a Snowflake database variant field?

I'm looking for the number 2 here... array_size appears to work on a variant list but does not do so well on this json. Is there a clever way to do this? I can't trust that the structure will only go this deep, and I'm hoping to use this as just another field in a query where I'm pulling a bunch of other values out of the json, so ideally a solution allows for that as well.
select dict, array_size(dict)
from (select parse_json('{
    "partition": ["partition_col"],
    "sample_weight": ["sample_weight"]
  }') as dict)
You can create a small JavaScript UDF:
create or replace function count_keys(MYVAR variant)
returns float
language javascript
as '
  return Object.entries(MYVAR).length;
';
To call it:
select count_keys(parse_json('{
    "partition": ["partition_col"],
    "sample_weight": ["sample_weight"]
  }'));
Use flatten:
with dict as (
  select parse_json('{
      "partition": ["partition_col"],
      "sample_weight": ["sample_weight"]
    }') val
)
select val, count(*)
from dict,
     lateral flatten(input => val)
group by val;
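If the built-in OBJECT_KEYS function is available to you (it returns the top-level keys of an object as an array), you can skip the UDF and keep this as just another field in a larger query; a minimal sketch:
select dict, array_size(object_keys(dict)) as key_count
from (select parse_json('{
    "partition": ["partition_col"],
    "sample_weight": ["sample_weight"]
  }') as dict);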

How to delete array element in JSONB column based on nested key value?

How can I remove an object from an array, based on the value of one of the object's keys?
The array is nested within a parent object.
Here's a sample structure:
{
"foo1": [ { "bar1": 123, "bar2": 456 }, { "bar1": 789, "bar2": 42 } ],
"foo2": [ "some other stuff" ]
}
Can I remove an array element based on the value of bar1?
I can query based on the bar1 value using: columnname #> '{ "foo1": [ { "bar1": 123 } ]}', but I've had no luck finding a way to remove { "bar1": 123, "bar2": 456 } from foo1 while keeping everything else intact.
Thanks
Running PostgreSQL 9.6
Assuming that you want to search for a specific object with an inner object of a certain value, and that this specific object can appear anywhere in the array, you need to unpack the document and each of the arrays, test the inner sub-documents for containment and delete as appropriate, then re-assemble the array and the JSON document (untested):
SELECT id, jsonb_object_agg(key, jarray)
FROM (
  SELECT foo.id, foo.key, jsonb_agg(bar.value) AS jarray
  FROM ( SELECT id, key, value
         FROM my_table, jsonb_each(jdoc) ) foo,
       jsonb_array_elements(foo.value) AS bar (value)
  WHERE NOT bar.value @> '{"bar1": 123}'::jsonb
  GROUP BY 1, 2 ) x
GROUP BY 1;
Now, this may seem a little dense, so picked apart you get:
SELECT id, key, value
FROM my_table, jsonb_each(jdoc)
This uses a lateral join on your table to take the JSON document jdoc and turn it into a set of rows foo(id, key, value) where the value contains the array. The id is the primary key of your table.
Then we get:
SELECT foo.id, foo.key, jsonb_agg(bar.value) AS jarray
FROM foo, -- abbreviated from above
     jsonb_array_elements(foo.value) AS bar (value)
WHERE NOT bar.value @> '{"bar1": 123}'::jsonb
GROUP BY 1, 2
This uses another lateral join to unpack the arrays into bar(value) rows. These objects can now be tested with the containment operator to remove the matching objects from the result set: WHERE NOT bar.value @> '{"bar1": 123}'::jsonb. In the select list the arrays are re-assembled by id and key, but now without the offending sub-documents.
Finally, in the main query the JSON documents are re-assembled:
SELECT id, jsonb_object_agg(key, jarray)
FROM x -- from above
GROUP BY 1;
The important thing to understand is that PostgreSQL JSON functions only operate on the level of the JSON document that you can explicitly indicate. Usually that is the top level of the document, unless you have an explicit path to some level in the document (like {foo1, 0, bar1}, but you don't have that). At that level of operation you can then unpack to do your processing such as removing objects.
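If the array always lives under a known top-level key, a more direct route is to rebuild just that array with a filtered aggregate and write it back with jsonb_set (available since 9.5). A minimal sketch, assuming a hypothetical table my_table with the jsonb column columnname:
UPDATE my_table
SET columnname = jsonb_set(
  columnname,
  '{foo1}',
  COALESCE(
    (SELECT jsonb_agg(elem)
     FROM jsonb_array_elements(columnname -> 'foo1') AS elem
     WHERE NOT elem @> '{"bar1": 123}'::jsonb),
    '[]'::jsonb)  -- jsonb_agg yields NULL if every element is removed
);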
