PSQL: Count number of wildcard values in JSONB array

My table has a jsonb column that stores JSON arrays of strings in this format:
["ItemA", "ItemB", "ItemC"]
I'm trying to filter the rows based on the number of certain items in the array, using a wildcard for part of the item name.
From what I have read here on SO, I could use the jsonb_to_recordset function and then just query the data normally, but I can't put the pieces together.
How do I use the jsonb_to_recordset to accomplish this? It's asking for a column definition list, but how do I specify one for just a string array?
My hypothetical (but of course not valid) query would look something like this:
SELECT * FROM mytable, jsonb_to_recordset(mytable.jsonbdata) AS text[] WHERE mytable.jsonbdata LIKE 'Item%'
EDIT:
Maybe it could be done using something like this instead:
SELECT * FROM mytable WHERE jsonbdata ? 'Item%';

Use jsonb_array_elements_text(). (The ? operator only tests for exact element membership, so it cannot match a pattern like 'Item%'.)
select *
from
    mytable t,
    jsonb_array_elements_text(jsonbdata) arr(elem)
where elem like 'Item%';
jsonbdata | elem
-----------------------------+-------
["ItemA", "ItemB", "ItemC"] | ItemA
["ItemA", "ItemB", "ItemC"] | ItemB
["ItemA", "ItemB", "ItemC"] | ItemC
(3 rows)
Probably you'll want to select only distinct table rows:
select distinct t.*
from
    mytable t,
    jsonb_array_elements_text(jsonbdata) arr(elem)
where elem like 'Item%';
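Since the question was ultimately about filtering on the number of matching items, a count can be layered on top of the same idea. A minimal sketch, assuming you want rows with at least two matches (the threshold is only illustrative):
select t.*
from mytable t
where (select count(*)
       from jsonb_array_elements_text(t.jsonbdata) arr(elem)
       where elem like 'Item%') >= 2;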

Related

How to remove double quotes from an array

I created a table in Hive with ORC format and loaded data into it. I used collect_set to eliminate duplicates when inserting the data, as follows. However, I see double quotes in the array. Is there any way to remove those double quotes?
This is a sample of the data I am getting from table a and inserting into table b using:
insert into table b
select a.name as name, collect_set(a.sub) as subjects from a group by a.name;
my table b would look like this:
name | subjects
john | ["Eng", "Math", "Phy"]
Sarah | ["Math", "Chem"]
I want to get rid of the double quotes in the array so it looks like this:
name | subjects
john | [Eng, Math, Phy]
Sarah | [Math, Chem]
Is there any way to do this using HQL?
An array is an object, and to be displayed it needs to be serialized to a string.
When you select an array, it is serialized to a string: Hive displays an array as comma-separated values, double-quoted, in square brackets.
Consider this example:
select array('Eng', 'Math', 'Phy');
Returns:
["Eng","Math","Phy"]
What I'm trying to say is that most probably there are no double quotes in the initial data; the array is serialized to a string with double quotes when you select it directly, without an explicit conversion to string.
If this is the real reason for the double quotes in the select result, then the solution is to convert the array to a string explicitly:
select concat('[',concat_ws(',',array('Eng', 'Math', 'Phy')),']');
Returns:
[Eng,Math,Phy]
Is that what you expected?
If not, and you really need to remove double quotes from the column values, then regexp_replace will do it.
Example of array containing double quotes in the values:
select concat('[',concat_ws(',',array('"Eng"', '"Math"', '"Phy"')),']');
Returns:
["Eng","Math","Phy"]
In such a case you can apply regexp_replace when loading your table:
regexp_replace(string, '["]', '') -- this removes double quotes
Your insert statement will look like this:
insert into table b
select a.name as name, collect_set(regexp_replace(sub, '["]', '')) as subjects from a group by a.name;
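To sanity-check the pattern in isolation (the string literal here is just a hypothetical input):
select regexp_replace('"Eng"', '["]', ''); -- returns: Eng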

How to parse a table with a JSON array field in PostgreSQL into rows?

I have a table that contains a JSON array. Here is a sample of the contents of the field, from:
SELECT json_array FROM table LIMIT 5;
Result:
[{"key1":"value1"}, {"key1":"value2"}, ..., {"key2":"value3"}]
[]
[]
[]{"key1":"value1"}
[]
How can I retrieve all the values and count how many of each value was found?
I am using PostgreSQL 9.5.14, and I have tried the solutions here Querying a JSON array of objects in Postgres
and the ones suggested to me by another generous stackoverflow user in my last question: How can I parse JSON arrays in postgresql?
I tried:
SELECT
    value -> 'key1'
FROM
    table,
    json_array_elements(json_array);
which sadly does not work for me; I receive the error: cannot call json_array_elements on a scalar
This error happens when json_array_elements() is applied to a JSON value that is a scalar (a bare string or number) rather than an array.
Another solution I tried was:
SELECT json_array as json, (json_array->0),
    coalesce(
        case
            when (json_array->0) IS NULL then null
            else (json_array->0->>'key1')
        end,
        'No value') AS "Value"
FROM table;
which only returned null values for the "Value" column.
Referencing Querying a JSON array of objects in Postgres I attempted to use this solution as well:
WITH json_test (col) AS (
    values (json_arrays)
)
SELECT
    y.x->'key1' "key1"
FROM json_test jt,
    LATERAL (SELECT json_array_elements(jt.col) x) y;
But I would need to be able to fit all the elements of the json_arrays into json_test
So far I have only attempted to list all the values in all the JSON arrays, but my ideal end result for the query resembles this:
Value | Amount
---------------
value1 | 48
value2 | 112
value3 | 93
value4 | 0
Yet again I am grateful for any help with this, thank you in advance.
Step-by-step demo: db<>fiddle
SELECT
    each.value,
    COUNT(*)
FROM
    data,
    json_array_elements(json_array) elems, -- 1
    json_each_text(elems) each             -- 2
GROUP BY each.value                        -- 3
1. Expand the array into one row per array element.
2. Split the key/value pairs into two columns.
3. Group by the new value column and count.
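Note that GROUP BY can only count values that actually occur, so the value4 | 0 row from the desired output will not appear. If the set of expected values is known in advance, a LEFT JOIN against that list fills in the zero counts; a minimal sketch under that assumption:
SELECT
    v.value,
    COUNT(f.value) AS amount
FROM
    (VALUES ('value1'), ('value2'), ('value3'), ('value4')) v(value)
    LEFT JOIN (
        SELECT each.value
        FROM
            data,
            json_array_elements(json_array) elems,
            json_each_text(elems) each
    ) f ON f.value = v.value
GROUP BY v.value;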

Access the index of an element in a jsonb array

I would like to access the index of an element in a jsonb array, like this:
SELECT
jsonb_array_elements(data->'Steps') AS Step,
INDEX_OF_STEP
FROM my_process
I don't see any function in the manual for this.
Is this somehow possible?
Use WITH ORDINALITY. You have to call the function in the FROM clause to do this:
with my_process(data) as (
    values
        ('{"Steps": ["first", "second"]}'::jsonb)
)
select value as step, ordinality - 1 as index
from my_process
cross join jsonb_array_elements(data->'Steps') with ordinality
step | index
----------+-------
"first" | 0
"second" | 1
(2 rows)
Read in the documentation (7.2.1.4. Table Functions):
If the WITH ORDINALITY clause is specified, an additional column of type bigint will be added to the function result columns. This column numbers the rows of the function result set, starting from 1.
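Applied to the shape of the original query (assuming a real my_process table with a jsonb column data, as in the question), this becomes:
select step.value as step, step.ordinality - 1 as index
from my_process
cross join jsonb_array_elements(data->'Steps') with ordinality as step(value, ordinality);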
You could try using
jsonb_each_text(jsonb)
which should supply both the key and value.
There is an example in this question:
Extract key, value from json objects in Postgres
except you would use the jsonb version.

Find valid combinations based on matrix

I have in Calc the following matrix: the first row (1) contains employee numbers, the first column (A) contains product codes.
Wherever there is an X, that product was sold by the corresponding employee above.
| 0302 | 0303 | 0304 | 0402 |
1625 | X | | X | X |
1643 | | X | X | |
...
We see that product 1643 was sold by employees 0303 and 0304
What I would like to see is a list of what product was sold by which employees but formatted like this:
1625 | 0302, 0304, 0402 |
1643 | 0303, 0304 |
The reason for this is that we ultimately need this matrix imported into a SQL Server table. We have no access to the origins of this matrix. It contains about 50 employees and 9000+ products.
Thanks for thinking with us!
try something like this
;with data as
(
    SELECT *
    FROM (VALUES (1625, 'X', NULL, 'X', 'X'),
                 (1643, NULL, 'X', 'X', NULL))
         cs (col1, [0302], [0303], [0304], [0402])
), cte
AS (SELECT col1,
           col
    FROM data
         CROSS APPLY (VALUES ('0302', [0302]),
                             ('0303', [0303]),
                             ('0304', [0304]),
                             ('0402', [0402])) cs (col, val)
    WHERE val IS NOT NULL)
SELECT col1,
       LEFT(cs.col, LEN(cs.col) - 1) AS col
FROM cte a
     CROSS APPLY (SELECT col + ','
                  FROM cte b
                  WHERE a.col1 = b.col1
                  FOR XML PATH('')) cs (col)
GROUP BY col1,
         LEFT(cs.col, LEN(cs.col) - 1)
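As a side note: on SQL Server 2017 or later the FOR XML PATH concatenation trick can be replaced with STRING_AGG. A sketch over the same sample data:
;with data as
(
    SELECT *
    FROM (VALUES (1625, 'X', NULL, 'X', 'X'),
                 (1643, NULL, 'X', 'X', NULL))
         cs (col1, [0302], [0303], [0304], [0402])
), cte as
(
    SELECT col1, col
    FROM data
         CROSS APPLY (VALUES ('0302', [0302]),
                             ('0303', [0303]),
                             ('0304', [0304]),
                             ('0402', [0402])) cs (col, val)
    WHERE val IS NOT NULL
)
SELECT col1, STRING_AGG(col, ', ') AS employees
FROM cte
GROUP BY col1;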
I think there are two problems to solve:
1. get the product codes for the X marks;
2. concatenate them into a single, comma-separated string.
I can't offer a solution for both issues in one step, but you may handle both issues separately.
1.
To replace the X marks by the respective product codes, you could use an array function to create a second table (matrix). To do so, create a new sheet, copy the first column / first row, and enter the following formula in cell B2:
=IF($B2:$E3="X";$B$1:$E$1;"")
You'll have to adapt the formula, so it covers your complete input data (If your last data cell is Z9999, it would be =IF($B2:$Z9999="X";$B$1:$Z$1;"")). My example just covers two rows and four columns.
After modifying it, confirm with CTRL+SHIFT+ENTER to apply it as array formula.
2.
Now, you'll have to concatenate the product codes. LO Calc lacks a feature to concatenate an array, but you could use a simple user-defined function. For such a string-join function, see this answer. Just create a new macro with the StarBasic code provided there and save it. Now, you have a STRJOIN() function at hand that accepts an array and concatenates its values, leaving empty values out.
You could add that function using a helper column on the second sheet and apply it by dragging it down. Finally, to get rid of the cells with the single product IDs, copy the complete second sheet, paste special into a third sheet, pasting only the values. Now, you can remove all columns except the first one (employee IDs) and the last one (with the concatenated product ids).
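For example, with the STRJOIN() macro installed, a helper cell on the second sheet might contain a formula along these lines (the B2:Z2 range is hypothetical and depends on where your data ends):
=STRJOIN(B2:Z2)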
I created a table in SQL Server for holding the data:
CREATE TABLE [dbo].[mydata](
    [prod_code] [nvarchar](8) NULL,
    [0100] [nvarchar](10) NULL,
    [0101] [nvarchar](10) NULL,
    [and so on...]
I created the list of columns in Calc by copying and pasting them transposed. After that I used the concatenate function to create the column list + data type for the CREATE TABLE statement.
I cleaned up the worksheet and imported it into this table using SQL Server's import wizard. Cleaning meant removing unnecessary rows/columns. Since the column names were identical, mapping was done correctly for 99%.
Now I had the data in SQL Server.
I adapted the code MM93 suggested a bit:
;with data as
(
    SELECT *
    FROM dbo.mydata -- here I simply referenced the whole table
),cte
and in the next part I used the same 'worksheet' trick to list and format all the column names and pasted them in:
),cte
AS (SELECT prod_code, -- had to replace col1 with 'prod_code'
           col
    FROM data
         CROSS APPLY (VALUES ('0100', [0100]),
                             ('0101', [0101]),
                             (and so on...),
The result of this query was inserted into a new table, and my colleagues and I are querying our hearts out :)
PS: removing the FOR XML clause resulted in a table with two columns:
prodcode | employee
which contains all the unique combinations of prodcode + employee number, which is a lot faster and much more practical to query.
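For reference, that FOR XML-free variant boils down to just the unpivot step; a sketch against the dbo.mydata columns shown earlier (extend the VALUES list with the remaining employee columns):
SELECT prod_code, employee
FROM dbo.mydata
     CROSS APPLY (VALUES ('0100', [0100]),
                         ('0101', [0101])) v (employee, val)
WHERE val IS NOT NULL;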

Sum of values of json array in PostgreSQL

In PostgreSQL 9.3, I have a table like this
id | array_json
---+----------------------------
1 | ["{123: 456}", "{789: 987}", "{111: 222}"]
2 | ["{4322: 54662}", "{123: 5121}", "{1: 5345}" ... ]
3 | ["{3232: 413}", "{5235: 22}", "{2: 5453}" ... ]
4 | ["{22: 44}", "{12: 4324}", "{234: 4235}" ... ]
...
I want to get the sum of all values in array_json column. So, for example, for first row, I want:
id | total
---+-------
1 | 1665
Where 1665 = 456 + 987 + 222 (the values of all the elements of the json array). There is no previous information about the keys of the json elements (they are just random numbers).
I'm reading the documentation page about JSON functions in PostgreSQL 9.3, and I think I should use json_each, but can't find the right query. Could you please help me with it?
Many thanks in advance
You started looking in the right place (going to the docs is always the right place).
Since your values are JSON arrays, I would suggest using json_array_elements(json).
And since it's a JSON array which you have to explode into several rows and then combine back by running sum over json_each_text(json), it would be best to create your own function (Postgres allows it).
As for your specific case, assuming the structure you provided is correct, some string parsing + JSON-heavy wizardry can be used. Let's say your table name is "json_test_table" and the columns are "id" and "json_array"; here is the query that does your "thing":
select id, sum(val) from
    (select id,
            substring(
                json_each_text(
                    replace(
                        replace(
                            replace(
                                replace(
                                    replace(json_array, ':', '":"'),
                                '{', ''),
                            '}', ''),
                        ']', '}'),
                    '[', '{')::json)::varchar
            from '\"(.*)\"')::int as val
     from json_test_table) j
group by id;
If you plan to run it on a huge dataset, keep in mind that string manipulations are expensive in terms of performance.
You can get it using this:
/*
Sorry, sqlfiddle is busy :p
CREATE TABLE my_table
(
id bigserial NOT NULL,
array_json json[]
--,CONSTRAINT my_table_pkey PRIMARY KEY (id)
)
INSERT INTO my_table(array_json)
values (array['{"123": 456}'::json, '{"789": 987}'::json, '{"111": 222}'::json]);
*/
select id, sum(json_value::integer)
from
(
    select id, json_data->>json_object_keys(json_data) as json_value from
    (
        select id, unnest(array_json) as json_data from my_table
    ) A
) B
group by id
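Against the sample row inserted above (456 + 987 + 222), this should return something like:
 id | sum
----+------
  1 | 1665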
