Sum of values of json array in PostgreSQL - arrays

In PostgreSQL 9.3, I have a table like this
id | array_json
---+----------------------------
1 | ["{123: 456}", "{789: 987}", "{111: 222}"]
2 | ["{4322: 54662}", "{123: 5121}", "{1: 5345}" ... ]
3 | ["{3232: 413}", "{5235: 22}", "{2: 5453}" ... ]
4 | ["{22: 44}", "{12: 4324}", "{234: 4235}" ... ]
...
I want to get the sum of all values in the array_json column. So, for example, for the first row, I want:
id | total
---+-------
1 | 1665
Where 1665 = 456 + 987 + 222 (the values of all the elements of the json array). Nothing is known in advance about the keys of the json elements (they are just random numbers).
I'm reading the documentation page about JSON functions in PostgreSQL 9.3, and I think I should use json_each, but can't find the right query. Could you please help me with it?
Many thanks in advance

You started looking in the right place (going to the docs is always the right place).
Since your values are JSON arrays, I would suggest using json_array_elements(json).
Because the array has to be exploded into several rows and then combined back by running sum over the output of json_each_text(json), it would be best to create your own function (Postgres allows that).
As for your specific case, assuming the structure you provided is correct, some string parsing plus heavy JSON wizardry can be used. Let's say your table name is "json_test_table" and the columns are "id" and "json_array"; here is a query that does it:
select id, sum(val) from
(select id,
substring(
json_each_text(
replace(
replace(
replace(
replace(
replace(json_array,':','":"')
,'{',''),
'}','')
,']','}')
,'[','{')::json)::varchar
from '\"(.*)\"')::int as val
from json_test_table) j group by id ;
If you plan to run it on a huge dataset, keep in mind that string manipulations are expensive in terms of performance.
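To see what the chain of replace() calls does to one row, here is a minimal Python sketch of the same string rewriting, applied to the first row from the question. It is only an illustration of the transformation, not something to run inside Postgres; the function name sum_values is made up for this example.

```python
import json

def sum_values(raw: str) -> int:
    # Same replacement order as the SQL query, innermost replace first.
    s = raw.replace(':', '":"')                # {123: 456}  ->  {123":" 456}
    s = s.replace('{', '').replace('}', '')    # drop the inner braces
    s = s.replace(']', '}').replace('[', '{')  # turn the array into one object
    obj = json.loads(s)                        # {"123": " 456", "789": " 987", ...}
    # int() tolerates the leading space left over from ": "
    return sum(int(v) for v in obj.values())

raw = '["{123: 456}", "{789: 987}", "{111: 222}"]'
total = sum_values(raw)
print(total)  # 1665
```

The intermediate string is a single JSON object, which is why json_each_text can be applied to it in the query.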

You can get it using this:
/*
Sorry, sqlfiddle is busy :p
CREATE TABLE my_table
(
id bigserial NOT NULL,
array_json json[]
--,CONSTRAINT my_table_pkey PRIMARY KEY (id)
);
INSERT INTO my_table(array_json)
values (array['{"123": 456}'::json, '{"789": 987}'::json, '{"111": 222}'::json]);
*/
select id, sum(json_value::integer)
from
(
select id, json_data->>json_object_keys(json_data) as json_value from
(
select id, unnest(array_json) as json_data from my_table
) A
) B
group by id

Related

How to parse a table with a JSON array field in PostgreSQL into rows?

I have a table that contains a json array. Here is a sample of the contents of the field from:
SELECT json_array FROM table LIMIT 5;
Result:
[{"key1":"value1"}, {"key1":"value2"}, ..., {"key2":"value3"}]
[]
[]
[]{"key1":"value1"}
[]
How can I retrieve all the values and count how many of each value was found?
I am using PostgreSQL 9.5.14, and I have tried the solutions here Querying a JSON array of objects in Postgres
and the ones suggested to me by another generous stackoverflow user in my last question: How can I parse JSON arrays in postgresql?
I tried:
SELECT
value -> 'key1'
FROM
table,
json_array_elements(json_array);
which sadly does not work for me due to receiving the error: cannot call json_array_elements on a scalar
This error is raised when json_array_elements is given a JSON value that is a scalar rather than an array.
Another solution I tried was:
SELECT json_array as json, (json_array->0),
coalesce(
case
when (json_array->0) IS NULL then null
else (json_array->0->>'key1')
end,
'No value') AS "Value"
FROM table;
which only returned null values for the "Value"
Referencing Querying a JSON array of objects in Postgres I attempted to use this solution as well:
WITH json_test (col) AS (
values (json_arrays)
)
SELECT
y.x->'key1' "key1"
FROM json_test jt,
LATERAL (SELECT json_array_elements(jt.col) x) y;
But I would need to be able to fit all the elements of the json_arrays into json_test
So far I have only attempted to list all the values in the all json arrays, but my ideal end-result for the query resembles this:
Value | Amount
---------------
value1 | 48
value2 | 112
value3 | 93
value4 | 0
Yet again I am grateful for any help with this, thank you in advance.
step-by-step demo:db<>fiddle
SELECT
each.value,
COUNT(*)
FROM
data,
json_array_elements(json_array) elems, -- 1
json_each_text(elems) each -- 2
GROUP BY each.value -- 3
1. Expand the array into one row for each array element
2. Split the key/value pairs into two columns
3. Group by the new value column and count
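The same three steps can be sketched in plain Python to make the data flow easier to follow. The sample rows below are made up for illustration (they only imitate the shape shown in the question), and count_values is a hypothetical helper name.

```python
import json
from collections import Counter

# Hypothetical sample rows, shaped like the json_array column in the question.
rows = [
    '[{"key1": "value1"}, {"key1": "value2"}, {"key2": "value3"}]',
    '[]',
    '[{"key1": "value1"}]',
]

def count_values(json_arrays):
    counts = Counter()
    for raw in json_arrays:
        for element in json.loads(raw):          # 1: one item per array element
            for _key, value in element.items():  # 2: split into key and value
                counts[value] += 1               # 3: group by value and count
    return counts

counts = count_values(rows)
print(counts)
```

Empty arrays simply contribute nothing, which matches how the lateral join drops rows without elements.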

Postgres select by array element range

In my table I've got a column facebook where I store facebook data (comment count, share count etc.), and it's an array. For example:
{{total_count,14},{comment_count,0},{comment_plugin_count,0},{share_count,12},{reaction_count,2}}
Now I'm trying to SELECT rows that facebook total_count is between 5 and 10. I've tried this:
SELECT * FROM pl where regexp_matches(array_to_string(facebook, ' '), '(\d+).*')::numeric[] BETWEEN 5 and 10;
But I'm getting an error:
ERROR: operator does not exist: numeric[] >= integer
Any ideas?
There is no need to convert the array to text and use regexp. You can access a particular element of the array, e.g.:
with pl(facebook) as (
values ('{{total_count,14},{comment_count,0},{comment_plugin_count,0},{share_count,12},{reaction_count,2}}'::text[])
)
select facebook[1][2] as total_count
from pl;
total_count
-------------
14
(1 row)
Your query may look like this:
select *
from pl
where facebook[1][2]::numeric between 5 and 10
Update. You could avoid the troubles described in the comments if you used the word null instead of empty strings ''''.
with pl(id, facebook) as (
values
(1, '{{total_count,14},{comment_count,0}}'::text[]),
(2, '{{total_count,null},{comment_count,null}}'::text[]),
(3, '{{total_count,7},{comment_count,10}}'::text[])
)
select *
from pl
where facebook[1][2]::numeric between 5 and 10
id | facebook
----+--------------------------------------
3 | {{total_count,7},{comment_count,10}}
(1 row)
However, it would be unfair to leave your problems without an additional comment. The case is suitable as an example for the lecture How not to use arrays in Postgres. You have at least a few better options. The most performant and natural is to simply use regular integer columns:
create table pl (
...
facebook_total_count integer,
facebook_comment_count integer,
...
);
If for some reason you need to separate this data from others in the table, create a new secondary table with a foreign key to the main table.
If for some mysterious reason you have to store the data in a single column, use the jsonb type, example:
with pl(id, facebook) as (
values
(1, '{"total_count": 14, "comment_count": 0}'::jsonb),
(2, '{"total_count": null, "comment_count": null}'::jsonb),
(3, '{"total_count": 7, "comment_count": 10}'::jsonb)
)
select *
from pl
where (facebook->>'total_count')::integer between 5 and 10
hstore can be an alternative to jsonb.
All these ways are much easier to maintain and much more efficient than your current model. Time to move to the bright side of power.

PSQL: Count number of wildcard values in JSONB array

My table has a jsonb column that stores JSON arrays of strings in this format:
["ItemA", "ItemB", "ItemC"]
I'm trying to filter the rows based on the number of certain items in the array, using a wildcard for a part of the name of the item.
From what I have read here on SO, I could use the jsonb_to_recordset function and then just query the data normally, but I can't put the pieces together.
How do I use the jsonb_to_recordset to accomplish this? It's asking for a column definition list, but how do I specify one for just a string array?
My hypothetical (but of course not valid) query would look something like this:
SELECT * FROM mytable, jsonb_to_recordset(mytable.jsonbdata) AS text[] WHERE mytable.jsonbdata LIKE 'Item%'
EDIT:
Maybe it could be done using something like this instead:
SELECT * FROM mytable WHERE jsonbdata ? 'Item%';
Use jsonb_array_elements_text():
select *
from
mytable t,
jsonb_array_elements_text(jsonbdata) arr(elem)
where elem like 'Item%';
jsonbdata | elem
-----------------------------+-------
["ItemA", "ItemB", "ItemC"] | ItemA
["ItemA", "ItemB", "ItemC"] | ItemB
["ItemA", "ItemB", "ItemC"] | ItemC
(3 rows)
Probably you'll want to select only distinct table rows:
select distinct t.*
from
mytable t,
jsonb_array_elements_text(jsonbdata) arr(elem)
where elem like 'Item%';
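Since the question was about the number of matching items per row, here is a small Python sketch of that logic: expand each array, keep the elements that start with the prefix, and count them per row. The sample rows and the name rows_with_prefix are assumptions for illustration only; in SQL the same idea would presumably be a GROUP BY with count(*) on top of the query above.

```python
import json

# Hypothetical sample rows, shaped like the jsonbdata column.
rows = [
    '["ItemA", "ItemB", "ItemC"]',
    '["Other", "ItemZ"]',
    '["Nothing"]',
]

def rows_with_prefix(json_rows, prefix):
    result = []
    for raw in json_rows:
        # Case-sensitive prefix match, like LIKE 'Item%'
        matches = [e for e in json.loads(raw) if e.startswith(prefix)]
        if matches:  # rows without any match are dropped
            result.append((raw, len(matches)))
    return result

matched = rows_with_prefix(rows, "Item")
for raw, n in matched:
    print(raw, n)
```

Each row appears once here, so no distinct step is needed afterwards.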

Find valid combinations based on matrix

In Calc I have the following matrix: the first row (1) contains employee numbers, the first column (A) contains product codes.
Everywhere there is an X, that product was sold by the corresponding employee above.
     | 0302 | 0303 | 0304 | 0402 |
1625 | X    |      | X    | X    |
1643 |      | X    | X    |      |
...
We see that product 1643 was sold by employees 0303 and 0304
What I would like to see is a list of what product was sold by which employees but formatted like this:
1625 | 0302, 0304, 0402 |
1643 | 0303, 0304 |
The reason for this is that we need this matrix ultimately imported into an SQL SERVER table. We have no access to the origins of this matrix. It contains about 50 employees and 9000+ products.
Thanx for thinking with us!
try something like this
;with data as
(
SELECT *
FROM ( VALUES (1625,'X',NULL,'X','X'),
(1643,NULL,'X','X',NULL))
cs (col1, [0302], [0303], [0304], [0402])
),cte
AS (SELECT col1,
col
FROM data
CROSS apply (VALUES ('0302',[0302]),
('0303',[0303]),
('0304',[0304]),
('0402',[0402])) cs (col, val)
WHERE val IS NOT NULL)
SELECT col1,
LEFT(cs.col, Len(cs.col) - 1) AS col
FROM cte a
CROSS APPLY (SELECT col + ','
FROM cte B
WHERE a.col1 = b.col1
FOR XML PATH('')) cs (col)
GROUP BY col1,
LEFT(cs.col, Len(cs.col) - 1)
I think there are two problems to solve:
get the employee numbers for the X marks;
concatenate them into a single, comma-separated string.
I can't offer a solution for both issues in one step, but you may handle both issues separately.
1. To replace the X marks with the respective employee numbers (the column headers), you could use an array formula to create a second table (matrix). To do so, create a new sheet, copy the first column / first row, and enter the following formula in cell B2:
=IF($B2:$E3="X";$B$1:$E$1;"")
You'll have to adapt the formula, so it covers your complete input data (If your last data cell is Z9999, it would be =IF($B2:$Z9999="X";$B$1:$Z$1;"")). My example just covers two rows and four columns.
After modifying it, confirm with CTRL+SHIFT+ENTER to apply it as array formula.
2. Now you'll have to concatenate the employee numbers. LO Calc lacks a feature to concatenate an array, but you could use a simple user-defined function. For such a string-join function, see this answer. Just create a new macro with the StarBasic code provided there and save it. Now you have a STRJOIN() function at hand that accepts an array and concatenates its values, leaving empty values out.
You could add that function using a helper column on the second sheet and apply it by dragging it down. Finally, to get rid of the cells with the single employee numbers, copy the complete second sheet and paste special into a third sheet, pasting only the values. Now you can remove all columns except the first one (product codes) and the last one (with the concatenated employee numbers).
I created a table in sql for holding the data:
CREATE TABLE [dbo].[mydata](
[prod_code] [nvarchar](8) NULL,
[0100] [nvarchar](10) NULL,
[0101] [nvarchar](10) NULL,
[and so on...]
I created the list of columns in Calc by copying and pasting them transposed. After that I used the CONCATENATE function to create the column list plus data type for the CREATE TABLE statement.
I cleaned up the worksheet and imported it into this table using SQL Server's import wizard. Cleaning meant removing unnecessary rows/columns. Since the column names were identical, the mapping was done correctly for 99% of them.
Now I had the data in SQL Server.
I adapted the code MM93 suggested a bit:
;with data as
(
SELECT *
FROM dbo.mydata -- here I simply referenced the whole table
),cte
and in the next part I used the same 'worksheet' trick to list and format all the column names and pasted them in.
),cte
AS (SELECT prod_code, -- had to replace col1 with 'prod_code'
col
FROM data
CROSS apply (VALUES ('0100',[0100]),
('0101', [0101] ),
(and so on... ),
The result of this query was inserted into a new table, and my colleagues and I are querying our hearts out :)
PS: removing the 'FOR XML' clause resulted in a table with two columns:
prodcode | employee
which contains all the unique combinations of prodcode + employee number, which is a lot faster and much more practical to query.
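For readers who want to see the two shapes side by side, here is a minimal Python sketch of the same idea: first unpivot the matrix into (product, employee) pairs, then optionally join the employees per product. It uses only the two sample rows from the question; the names unpivot and concatenate are made up for this example and are not part of either answer.

```python
# Column headers (employee numbers) and the two sample rows from the question.
employees = ["0302", "0303", "0304", "0402"]
matrix = {
    "1625": ["X", None, "X", "X"],
    "1643": [None, "X", "X", None],
}

def unpivot(employees, matrix):
    # One (product, employee) pair per X mark, like the CROSS APPLY (VALUES ...) step.
    return [
        (product, employee)
        for product, marks in matrix.items()
        for employee, mark in zip(employees, marks)
        if mark == "X"
    ]

def concatenate(pairs):
    # Comma-separated employees per product, like the FOR XML PATH('') step.
    grouped = {}
    for product, employee in pairs:
        grouped.setdefault(product, []).append(employee)
    return {product: ", ".join(emps) for product, emps in grouped.items()}

pairs = unpivot(employees, matrix)
joined = concatenate(pairs)
print(pairs)
print(joined)
```

The pairs list is the two-column form that turned out to be easier to query; the joined dictionary is the comma-separated form originally asked for.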

SQL Server : LIKE - Finding 1 and not 11 in a string

Say I have a SQL Server table with these values:
ID test
-----------------
1 '1,11,X1'
2 'Q22,11,111,51'
3 '1'
4 '5,Q22,1'
If I want to find out which rows contain the comma-separated value '1', I can do the following and it will work, but I'd like to find a better or less wordy way of doing so, if one exists. Unfortunately I cannot use RegExp; otherwise \b1\b would be awesome here.
Select test
FROM ...
WHERE
test LIKE '1,%'
OR test = '1'
OR test LIKE '%,1'
OR test LIKE '%,1,%'
Something like...
WHERE
test LIKE '%[,{NULL}]1[,{NULL}]%'
I know this line isn't correct but you get what I'm after... hopefully ;)
EDITED based on comments below
You shouldn't use comma-delimited values to store lists. You should use a junction table. But, if you have to, the following logic might help:
Select test
FROM ...
WHERE ',' + test + ',' like '%,' + '1' + ',%' ;
This assumes that what you are looking for is "1" as the entire item in the list.
Note: You can/should write the like pattern as '%,1,%'. I just put it in three pieces to separate out the pattern you are looking for.
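A quick way to convince yourself that wrapping the column in commas works is to try it on the sample rows. The sketch below does that with Python's built-in sqlite3 module, which is an assumption made purely so the example is self-contained: SQLite concatenates strings with || where SQL Server uses +, and the table name t is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, test TEXT)")
conn.executemany(
    "INSERT INTO t (id, test) VALUES (?, ?)",
    [(1, "1,11,X1"), (2, "Q22,11,111,51"), (3, "1"), (4, "5,Q22,1")],
)

# Wrap the list in commas so that every item, including the first
# and the last, is surrounded by delimiters; then one pattern is enough.
query = "SELECT id FROM t WHERE ',' || test || ',' LIKE '%,1,%' ORDER BY id"
matching_ids = [row[0] for row in conn.execute(query)]
print(matching_ids)  # [1, 3, 4]
conn.close()
```

Row 2 is not matched because ',11,' and ',111,' never contain ',1,' as a complete item.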
There are plenty of SplitString functions available if you google around (many here on StackOverflow) that take a comma delimited string like you have, and split it out into multiple rows. You can CROSS APPLY that table-value function to your query, and then just select for those rows that have '1'.
For example, using this splitstring function here (just one of many):
T-SQL split string
You can write this code to get exactly what you want (note, the declare and insert are just to set up test data so you can see it in action):
DECLARE @test TABLE (ID int, test varchar(400));
INSERT INTO @test (ID, test)
VALUES(1, '1,11,X1'),
(2, 'Q22,11,111,51'),
(3, '1'),
(4, '5,Q22,1');
SELECT *
FROM @test
CROSS APPLY splitstring(test)
WHERE [Name] = '1'
This query returns this:
1 1,11,X1 1
3 1 1
4 5,Q22,1 1
select *
from table
where ',' + test + ',' like '%,1,%'
You have to "normalize" your database. Storing multiple attributes in one row is a problem!
Add a "one-to-many" relation for your attributes.
You can do it like this:
ID, test
1, 1
1, 11
1, X1
2, Q22
2, 11
[...]
SELECT test FROM ...
WHERE ID IN (SELECT ID FROM ... WHERE test = '1')
Your primary key is now (ID, test).
You need something like:
SELECT test
FROM _tableName_
WHERE (test LIKE '1,%'
OR test LIKE '%,1'
OR test LIKE '%,1,%'
OR test LIKE '1')
This will return rows that match the following cases, in order:
1 starts a list
1 ends a list
1 is in the middle of a list
1 is its own list

Resources