Concatenate string values of objects in a PostgreSQL JSON array

I have a PostgreSQL table with a json column that holds objects nested in an array. Now I want to build a query that returns the id and the concatenated string values of the objects in the json column. The PostgreSQL version is 9.5.
Example data
CREATE TABLE test
(
  id integer,
  data json
);
INSERT INTO test (id, data) VALUES (1, '{
  "info": "a",
  "items": [
    { "name": "a_1" },
    { "name": "a_2" },
    { "name": "a_3" }
  ]
}');
INSERT INTO test (id, data) VALUES (2, '{
  "info": "b",
  "items": [
    { "name": "b_1" },
    { "name": "b_2" },
    { "name": "b_3" }
  ]
}');
INSERT INTO test (id, data) VALUES (3, '{
  "info": "c",
  "items": [
    { "name": "c_1" },
    { "name": "c_2" },
    { "name": "c_3" }
  ]
}');
Not quite working as intended example
So far I've been able to get the values from the table, but unfortunately the strings are not concatenated per id.
SELECT
  row.id,
  item ->> 'name'
FROM
  test AS row,
  json_array_elements(row.data #> '{items}') AS item;
Which will output:
id | names
----------
1 | a_1
1 | a_2
1 | a_3
2 | b_1
2 | b_2
2 | b_3
3 | c_1
3 | c_2
3 | c_3
Intended output example
What would a query that returns this output look like?
id | names
----------
1 | a_1, a_2, a_3
2 | b_1, b_2, b_3
3 | c_1, c_2, c_3

Your original attempt was missing a GROUP BY step.
This should work:
SELECT
  id,
  STRING_AGG(item ->> 'name', ', ') AS names
FROM
  test,
  json_array_elements(test.data -> 'items') AS item
GROUP BY 1;
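If you care about the order in which the names are concatenated, STRING_AGG also accepts an ORDER BY inside the aggregate. A small variant of the query above (ordering by the name itself is just an assumption about what you might want):
SELECT
  id,
  STRING_AGG(item ->> 'name', ', ' ORDER BY item ->> 'name') AS names
FROM
  test,
  json_array_elements(test.data -> 'items') AS item
GROUP BY 1;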

Changing the second column into an ARRAY subquery should give the required results:
SELECT
  row.id,
  ARRAY (
    SELECT item ->> 'name'
    FROM
      test AS row1,
      json_array_elements(row.data #> '{items}') AS item
    WHERE row.id = row1.id
  )
FROM
  test AS row;
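Note that this returns an array rather than the comma-separated string shown in the intended output. If you want that exact format, wrapping the subquery in array_to_string() should do it; a small variant of the query above (the self-join is dropped because the outer row is already in scope for the correlated subquery):
SELECT
  row.id,
  array_to_string(
    ARRAY (
      SELECT item ->> 'name'
      FROM json_array_elements(row.data #> '{items}') AS item
    ),
    ', '
  ) AS names
FROM
  test AS row;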

Related

How to loop through array & loop through datatable to compare values in Cypress?

In my Cypress test, I am retrieving a SQL Server DB record from a table, looping through the response, and logging each column value like so:
And('the below values are populated in DB ', (dataTable) => {
  cy.task('myDb', `SELECT * FROM Customer WHERE CustomerName = '${customerName}'`).then((response) => {
    response.forEach(record => {
      cy.log(record)
    });
  });
});
With the above code, each logged record is an array, and each item in the array is a column value.
I have a datatable in my feature file that corresponds to the table structure:
And the below values are populated in DB
| CustomerId | CustomerName | AddressLine1 | AddressLine2 | City | State | Zip |
| 1 | Kevin Mitchell | | | | NULL | NULL |
What I am looking to do is update my existing code so that it can loop through my datatable and compare the above array against the datatable values in my feature file.
Can someone please explain what changes are required for this?
I've managed to get this working, and it works functionally, so I'm going to post it here, but I'm sure there's a more efficient way to do this.
I'm happy for someone to suggest improvements to the answer below:
And('the below values are populated in DB', (dataTable) => {
  cy.task('glp', `SELECT * FROM Customer WHERE CustomerName = '${customerName}'`).then((response) => {
    dataTable.rows().forEach(function ($ele) {
      response.forEach(function (record) {
        // each record is an array of column objects; compare every column's
        // value against the corresponding cell of the datatable row
        record.forEach(function (field, index) {
          expect(field.value).to.eq($ele[index])
        });
      });
    });
  });
});

Snowflake: pivot attribute values into columns in an array of objects

EDIT: I gave bad example data. Updated some details and switched out dummy data for sanitized, actual data.
Source system: Freshdesk via Stitch
Table Structure:
create or replace TABLE TICKETS (
CC_EMAILS VARIANT,
COMPANY VARIANT,
COMPANY_ID NUMBER(38,0),
CREATED_AT TIMESTAMP_TZ(9),
CUSTOM_FIELDS VARIANT,
DUE_BY TIMESTAMP_TZ(9),
FR_DUE_BY TIMESTAMP_TZ(9),
FR_ESCALATED BOOLEAN,
FWD_EMAILS VARIANT,
ID NUMBER(38,0) NOT NULL,
IS_ESCALATED BOOLEAN,
PRIORITY FLOAT,
REPLY_CC_EMAILS VARIANT,
REQUESTER VARIANT,
REQUESTER_ID NUMBER(38,0),
RESPONDER_ID NUMBER(38,0),
SOURCE FLOAT,
SPAM BOOLEAN,
STATS VARIANT,
STATUS FLOAT,
SUBJECT VARCHAR(16777216),
TAGS VARIANT,
TICKET_CC_EMAILS VARIANT,
TYPE VARCHAR(16777216),
UPDATED_AT TIMESTAMP_TZ(9),
_SDC_BATCHED_AT TIMESTAMP_TZ(9),
_SDC_EXTRACTED_AT TIMESTAMP_TZ(9),
_SDC_RECEIVED_AT TIMESTAMP_TZ(9),
_SDC_SEQUENCE NUMBER(38,0),
_SDC_TABLE_VERSION NUMBER(38,0),
EMAIL_CONFIG_ID NUMBER(38,0),
TO_EMAILS VARIANT,
PRODUCT_ID NUMBER(38,0),
GROUP_ID NUMBER(38,0),
ASSOCIATION_TYPE NUMBER(38,0),
ASSOCIATED_TICKETS_COUNT NUMBER(38,0),
DELETED BOOLEAN,
primary key (ID)
);
Note the VARIANT field, "custom_fields". It undergoes an unfortunate transformation between the API and Snowflake. The resulting field contains an array of 3 or more objects, each one a custom field. I do not have the ability to change the data format. Examples:
# values could be null
[
{
"name": "cf_request",
"value": "none"
},
{
"name": "cf_related_with",
"value": "none"
},
{
"name": "cf_question",
"value": "none"
}
]
# or values could have a combination of null and non-null values
[
{
"name": "cf_request",
"value": "none"
},
{
"name": "cf_related_with",
"value": "none"
},
{
"name": "cf_question",
"value": "concern"
}
]
# or they could all have non-null values
[
{
"name": "cf_request",
"value": "issue with timer"
},
{
"name": "cf_related_with",
"value": "timer stopped"
},
{
"name": "cf_question",
"value": "technical problem"
}
]
I would essentially like to pivot these into fields in a select query where the name attribute's value becomes a column header, making the output similar to the following:
+----+------------------+-----------------+-------------------+-----------------------------+
| id | cf_request | cf_related_with | cf_question | all_other_fields |
+----+------------------+-----------------+-------------------+-----------------------------+
| 5 | issue with timer | timer stopped | technical problem | more data about this ticket |
| 6 | hq | laptop issues | some value | more data |
| 7 | a thing | about a thing | about something | more data |
+----+------------------+-----------------+-------------------+-----------------------------+
Is there a function that searches the values of array objects and returns objects with qualifying values? Something like:
select
  id,
  get_object_where(name = 'category', value) as category,
  get_object_where(name = 'subcategory', value) as subcategory,
  get_object_where(name = 'subsubcategory', value) as subsubcategory
from my_data_table
Unfortunately, PIVOT requires an aggregate function; I tried using min and max, but only got null values back. Something similar to this approach would be great if there is another syntax that doesn't require aggregation.
with arr as (
  select
    id,
    cs.value:name col_name,
    cs.value:value col_value
  from my_data_table,
  lateral flatten(input => custom_fields) cs
)
select
  *
from arr
pivot(col_name for col_value in ('category', 'subcategory', 'subsubcategory'))
as p (id, category, subcategory, subsubcategory);
It is possible to use the following approach, but it is flawed in that any time a new custom field is added I have to add cases to account for new positions within the array.
select
  id,
  case
    when custom_fields[0]:name = 'cf_request' then custom_fields[0]:value
    when custom_fields[1]:name = 'cf_request' then custom_fields[1]:value
    when custom_fields[2]:name = 'cf_request' then custom_fields[2]:value
    when custom_fields[3]:name = 'cf_request' then custom_fields[3]:value
    else null
  end cf_request,
  case
    when custom_fields[0]:name = 'cf_related_with' then custom_fields[0]:value
    when custom_fields[1]:name = 'cf_related_with' then custom_fields[1]:value
    when custom_fields[2]:name = 'cf_related_with' then custom_fields[2]:value
    when custom_fields[3]:name = 'cf_related_with' then custom_fields[3]:value
    else null
  end cf_related_with,
  case
    when custom_fields[0]:name = 'cf_question' then custom_fields[0]:value
    when custom_fields[1]:name = 'cf_question' then custom_fields[1]:value
    when custom_fields[2]:name = 'cf_question' then custom_fields[2]:value
    when custom_fields[3]:name = 'cf_question' then custom_fields[3]:value
    else null
  end cf_question,
  created_at
from my_db.my_schema.tickets;
I think you almost had it. You just need to add a max() or min() around your col_value (and pivot for col_name). As you stated, it needs an aggregate function, and something like max() or min() will work here, since it is aggregating on the name/value pairs that you have. If you had two subcategory values, for example, it would pick the min/max value. From your example, that doesn't appear to be an issue, so it'll always choose the value you want. I was able to replicate your scenario with this query:
WITH x AS (
  SELECT parse_json('[{"name": "category","value": "Bikes"},{"name": "subcategory","value": "Mountain Bikes"},{"name": "subsubcategory","value": "hardtail bikes"}]')::VARIANT as field_var
),
arr as (
  select
    seq,
    cs.value:name::varchar col_name,
    cs.value:value::varchar col_value
  from x,
  lateral flatten(input => x.field_var) cs
)
select
  *
from arr
pivot(max(col_value) for col_name in ('category','subcategory','subsubcategory')) as p (seq, category, subcategory, subsubcategory);
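Applying the same pattern to the tickets table from the question should give the desired columns. A sketch under the assumption that custom_fields is always an array of name/value objects as shown above (not tested against the real data):
with arr as (
  select
    t.id,
    cs.value:name::varchar col_name,
    cs.value:value::varchar col_value
  from my_db.my_schema.tickets t,
  lateral flatten(input => t.custom_fields) cs
)
select
  *
from arr
pivot(max(col_value) for col_name in ('cf_request', 'cf_related_with', 'cf_question'))
as p (id, cf_request, cf_related_with, cf_question);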

Hive - Extract arrays from JSON

I have a table which contains a JSON string with an array of values.
create external table apidetails
(
  inputdetails string
)
location 'XXXX';
select inputdetails from apidetails
{ "Name": "api-server1", "ID": "api-1", "tags": ["tag-1","tag-2"] }
I need the results as
| ID    | tags        |
|-------|-------------|
| api-1 | tag-1,tag-2 |
I tried select json_extract_scalar(inputdetails, '$tags'), but it returns an error.
Here are a few options:
Option 1: JSON
select json_extract_scalar(inputdetails ,'$.ID') as ID
,json_extract(inputdetails ,'$.tags') as tags
from apidetails
;
ID | tags
-------+-------------------
api-1 | ["tag-1","tag-2"]
Option 2: array(varchar)
select json_extract_scalar(inputdetails ,'$.ID') as ID
,cast(json_extract(inputdetails ,'$.tags') as array(varchar)) as tags
from apidetails
;
ID | tags
-------+----------------
api-1 | [tag-1, tag-2]
Option 3: delimited string
select json_extract_scalar(inputdetails ,'$.ID') as ID
,array_join(cast(json_extract(inputdetails ,'$.tags') as array(varchar)),',') as tags
from apidetails
;
ID | tags
-------+-------------
api-1 | tag-1,tag-2
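The options above use Presto/Trino/Athena-style JSON functions (json_extract_scalar, json_extract, array_join). If the query has to run in Hive itself, a rough equivalent using Hive's built-in get_json_object might look like the sketch below, which strips the brackets and quotes from the extracted array string (untested against the table above):
select
  get_json_object(inputdetails, '$.ID') as id,
  regexp_replace(get_json_object(inputdetails, '$.tags'), '[\\[\\]"]', '') as tags
from apidetails;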

Using jsonb_set() for updating specific jsonb array value

Currently I am working with PostgreSQL 9.5 and am trying to update a value inside an array in a jsonb field, but I am unable to get the index of the selected element.
My table just looks like this:
CREATE TABLE samples (
id serial,
sample jsonb
);
My JSON looks like this:
{"result": [
{"8410": "ABNDAT", "8411": "Abnahmedatum"},
{"8410": "ABNZIT", "8411": "Abnahmezeit"},
{"8410": "FERR_R", "8411": "Ferritin"}
]}
My SELECT statement to get the correct value works:
SELECT
id, value
FROM
samples s, jsonb_array_elements(s.sample#>'{result}') r
WHERE
s.id = 26 and r->>'8410' = 'FERR_R';
results in:
id | value
----------------------------------------------
26 | {"8410": "FERR_R", "8411": "Ferritin"}
Ok, this is what I wanted. Now I want to execute an update using the following UPDATE statement to add a new element "ratingtext" (if not already there):
UPDATE
samples s
SET
sample = jsonb_set(sample,
'{result,2,ratingtext}',
'"Some individual text"'::jsonb,
true)
WHERE
s.id = 26;
After executing the UPDATE statement, my data looks like this (also correct):
{"result": [
{"8410": "ABNDAT", "8411": "Abnahmedatum"},
{"8410": "ABNZIT", "8411": "Abnahmezeit"},
{"8410": "FERR_R", "8411": "Ferritin", "ratingtext": "Some individual text"}
]}
So far so good, but I manually looked up the index value of 2 to get the right element inside the JSON array. If the order changes, this won't work.
So my problem:
Is there a way to get the index of the selected JSON array element and combine the SELECT statement and the UPDATE statement into one?
Just like:
UPDATE
samples s
SET
sample = jsonb_set(sample,
'{result,' || INDEX OF ELEMENT || ',ratingtext}',
'"Some individual text"'::jsonb,
true)
WHERE
s.id = 26;
The values of samples.id and "8410" are known before preparing the statement.
Or is this not possible at the moment?
You can find the index of a searched element using jsonb_array_elements() WITH ORDINALITY (note that ordinality starts at 1, while the first index of a JSON array is 0):
select
pos - 1 as elem_index
from
samples,
jsonb_array_elements(sample->'result') with ordinality arr(elem, pos)
where
id = 26 and
elem->>'8410' = 'FERR_R';
elem_index
------------
2
(1 row)
Use the above query to update the element based on its index (note that the second argument of jsonb_set() is a text array):
update
samples
set
sample =
jsonb_set(
sample,
array['result', elem_index::text, 'ratingtext'],
'"some individual text"'::jsonb,
true)
from (
select
pos - 1 as elem_index
from
samples,
jsonb_array_elements(sample->'result') with ordinality arr(elem, pos)
where
id = 26 and
elem->>'8410' = 'FERR_R'
) sub
where
id = 26;
Result:
select id, jsonb_pretty(sample)
from samples;
id | jsonb_pretty
----+--------------------------------------------------
26 | { +
| "result": [ +
| { +
| "8410": "ABNDAT", +
| "8411": "Abnahmedatum" +
| }, +
| { +
| "8410": "ABNZIT", +
| "8411": "Abnahmezeit" +
| }, +
| { +
| "8410": "FERR_R", +
| "8411": "Ferritin", +
| "ratingtext": "Some individual text"+
| } +
| ] +
| }
(1 row)
The last argument in jsonb_set() should be true to force adding a new value if its key does not exist yet. It may be skipped, however, as its default value is true.
Though concurrency issues seem unlikely (due to the restrictive WHERE condition and a potentially small number of affected rows), you may also be interested in Atomic UPDATE .. SELECT in Postgres.

Is there a jsonb array overlap function for postgres?

I am not able to extract and compare two arrays from jsonb in Postgres to do an overlap check. Is there a working function for this?
Example in people_favorite_color table:
{
"person_id":1,
"favorite_colors":["red","orange","yellow"]
}
{
"person_id":2,
"favorite_colors":["yellow","green","blue"]
}
{
"person_id":3,
"favorite_colors":["black","white"]
}
Array overlap postgres tests:
select
p1.json_data->>'person_id',
p2.json_data->>'person_id',
p1.json_data->'favorite_colors' && p2.json_data->'favorite_colors'
from people_favorite_color p1 join people_favorite_color p2 on (1=1)
where p1.json_data->>'person_id' < p2.json_data->>'person_id'
Expected results:
p1.id;p2.id;likes_same_color
1;2;t
1;3;f
2;3;f
--edit--
Attempting to cast to text[] results in an error:
select
('{
"person_id":3,
"favorite_colors":["black","white"]
}'::jsonb->>'favorite_colors')::text[];
ERROR: malformed array literal: "["black", "white"]"
DETAIL: "[" must introduce explicitly-specified array dimensions.
Use array_agg() and jsonb_array_elements_text() to convert a jsonb array to a text array:
with the_data as (
  select id, array_agg(color) colors
  from (
    select json_data->'person_id' id, color
    from
      people_favorite_color,
      jsonb_array_elements_text(json_data->'favorite_colors') color
  ) sub
  group by 1
)
select p1.id, p2.id, p1.colors && p2.colors like_same_colors
from the_data p1
join the_data p2 on p1.id < p2.id
order by 1, 2;
id | id | like_same_colors
----+----+------------------
1 | 2 | t
1 | 3 | f
2 | 3 | f
(3 rows)
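If you only need the yes/no overlap check, the jsonb ?| operator is another option: it tests whether any string in a text array appears as a top-level element of a jsonb array, so only one side has to be unnested. A sketch along the same lines as the query above (not from the original answer):
select
  p1.json_data ->> 'person_id' as p1_id,
  p2.json_data ->> 'person_id' as p2_id,
  p1.json_data -> 'favorite_colors' ?| (
    select array_agg(color)
    from jsonb_array_elements_text(p2.json_data -> 'favorite_colors') color
  ) as likes_same_color
from people_favorite_color p1
join people_favorite_color p2
  on p1.json_data ->> 'person_id' < p2.json_data ->> 'person_id';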
