My table in Hive:
Group1  | sibling
=====================
ad3jkfk | 4
ad3jkfk | 4
ad3jkfk | 2
fkjh43f | 1
fkjh43f | 8
fkjh43f | 8
rjkhd93 | 7
rjkhd93 | 4
rjkhd93 | 7
abcd45  | 1
defg63  | 1
Expected result:
Group1  | sibling
===========================
ad3jkfk | 4,4,2
fkjh43f | 1,8,8
rjkhd93 | 7,4,7
collect_set() produces an array of distinct values, so for ad3jkfk it returns 4,2, not 4,4,2.
If you want 4,4,2, use collect_list() instead.
To keep only the groups with more than one element (abcd45 and defg63 are missing from your expected result), filter on the size() of the array:
select Group1, concat_ws(',',sibling_list) sibling --concatenate the array into a comma-delimited string, as in your example
from
(
select Group1, collect_list(cast(sibling as string)) sibling_list --get the list; concat_ws() needs array<string>, so cast if sibling is numeric
from mytable
group by Group1
)s
where size(sibling_list)>1 --filter
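As a quick sanity check of the difference between collect_set(), collect_list() and the size() filter, here is a small Python model of the query's semantics (it is not Hive code). One assumption to keep in mind: the model keeps elements in input order, whereas Hive does not promise any particular element order in collect_list() or collect_set() output.

```python
from collections import defaultdict

# Rows of the question's Hive table: (Group1, sibling)
ROWS = [
    ("ad3jkfk", 4), ("ad3jkfk", 4), ("ad3jkfk", 2),
    ("fkjh43f", 1), ("fkjh43f", 8), ("fkjh43f", 8),
    ("rjkhd93", 7), ("rjkhd93", 4), ("rjkhd93", 7),
    ("abcd45", 1), ("defg63", 1),
]


def collect_list(rows):
    """Model of GROUP BY Group1 + collect_list(sibling): duplicates are kept."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return dict(groups)


def collect_set(rows):
    """Model of GROUP BY Group1 + collect_set(sibling): duplicates are dropped."""
    return {key: list(dict.fromkeys(values)) for key, values in collect_list(rows).items()}


def siblings(rows):
    """Model of the whole query: collect_list, WHERE size(...) > 1, concat_ws(',', ...)."""
    return {
        key: ",".join(str(v) for v in values)
        for key, values in collect_list(rows).items()
        if len(values) > 1
    }


print(collect_set(ROWS)["ad3jkfk"])   # [4, 2]
print(collect_list(ROWS)["ad3jkfk"])  # [4, 4, 2]
print(siblings(ROWS))                 # abcd45 and defg63 are filtered out
```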
I have a column named "impact" which contains data in nested JSON format.
Input:
[{"internalid":"079","impactid":[{"position":"1","typeid":"NOEUD","value":"G1"},{"position":"2","typeid":"ID","value":"001"},{"position":"3","typeid":"CODE_CI","value":"14"}],"typeid":"BTS","cdrs":"X110","belong":"OF","impactclass":"R","count":"0","numberaccessimpacted":"0","impactcalculationrequest":null},{"internalid":"6381075","impactid":[{"position":"1","typeid":"NOEUD","value":"G3"},{"position":"2","typeid":"ID","value":"003"},{"position":"3","typeid":"CI","value":"58"}],"typeid":"BTS","cdrs":"X110","belong":"OF","impactclass":"R","count":"0","numberaccessimpacted":"0","impactcalculationrequest":null},{"internalid":"6381071","impactid":[{"position":"1","typeid":"NOEUD","value":"G2"},{"position":"2","typeid":"IDT","value":"002"},{"position":"3","typeid":"CI","value":"57"}],"typeid":"BTS","cdrs":"X110","belong":"OF","impactclass":"R","count":"0","numberaccessimpacted":"0","impactcalculationrequest":null}]
I use the code below:
SELECT
get_json_object(single_json_table.identifiant, '$.position') AS impact_position,
get_json_object(single_json_table.identifiant, '$.value') AS impact_value
FROM
(SELECT exp2.identifiant
FROM socle s
lateral view explode(split(regexp_replace(substr(impact, 2, length(impact)-2),
'},\\{"', '},,,,{"'), ',,,,')) exp2 as identifiant
)single_json_table
Here are the results. The first position and value of each element come back as null. Does anyone know how I can fix it?
impact_position | impact_value
(null)          | (null)
2               | 001
3               | 14
(null)          | (null)
2               | 003
3               | 58
(null)          | (null)
2               | 002
3               | 57
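Your input is JSON with nested arrays. The upper-level array is the whole input; its elements are structs like struct<internalid: string, impactid: array<struct<...>>, ...>. impactid is a nested array whose elements are structs like this: {"position":"1","typeid":"NOEUD","value":"G1"}
Your query splits on every },{" in a single pass, so it cuts the upper array and the nested impactid arrays at the same time. The first fragment of each element then still starts with {"internalid":...,"impactid":[, which is not a complete JSON object and has no top-level position or value key, so get_json_object returns NULL for it.
You need to explode both arrays: first explode the upper array (change the delimiters, split, explode), then do the same with the nested array.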
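Why position 1 disappears can be reproduced outside Hive. The Python sketch below applies the question's single-pass split to a shortened, hypothetical version of the input and uses json.JSONDecoder.raw_decode as a rough stand-in for get_json_object. The assumption that get_json_object reads the leading object and ignores trailing text is inferred from the question's output (positions 2 and 3 still parse), not taken from Hive's documentation.

```python
import json

# Shortened, hypothetical version of the question's input: two elements, fewer fields.
IMPACT = (
    '[{"internalid":"079","impactid":[{"position":"1","value":"G1"},'
    '{"position":"2","value":"001"}],"typeid":"BTS"},'
    '{"internalid":"6381075","impactid":[{"position":"1","value":"G3"},'
    '{"position":"2","value":"003"}],"typeid":"BTS"}]'
)


def get_position(fragment):
    """Rough stand-in for get_json_object(fragment, '$.position'):
    parse the leading JSON object, ignore trailing text, None if parsing fails."""
    try:
        obj, _ = json.JSONDecoder().raw_decode(fragment)
    except json.JSONDecodeError:
        return None
    return obj.get("position") if isinstance(obj, dict) else None


# The question's approach: drop the outer [ and ], then split on every '},{"'.
# That pattern also occurs inside impactid, so both array levels are cut at once
# and each element's first fragment still starts with {"internalid":...,"impactid":[
fragments = IMPACT[1:-1].replace('},{"', '},,,,{"').split(",,,,")
positions = [get_position(f) for f in fragments]
print(positions)  # [None, '2', None, '2']: position 1 is lost for every element
```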
Demo:
with socle as (
select '[{"internalid":"079","impactid":[{"position":"1","typeid":"NOEUD","value":"G1"},{"position":"2","typeid":"ID","value":"001"},{"position":"3","typeid":"CODE_CI","value":"14"}],"typeid":"BTS","cdrs":"X110","belong":"OF","impactclass":"R","count":"0","numberaccessimpacted":"0","impactcalculationrequest":null},{"internalid":"6381075","impactid":[{"position":"1","typeid":"NOEUD","value":"G3"},{"position":"2","typeid":"ID","value":"003"},{"position":"3","typeid":"CI","value":"58"}],"typeid":"BTS","cdrs":"X110","belong":"OF","impactclass":"R","count":"0","numberaccessimpacted":"0","impactcalculationrequest":null},{"internalid":"6381071","impactid":[{"position":"1","typeid":"NOEUD","value":"G2"},{"position":"2","typeid":"IDT","value":"002"},{"position":"3","typeid":"CI","value":"57"}],"typeid":"BTS","cdrs":"X110","belong":"OF","impactclass":"R","count":"0","numberaccessimpacted":"0","impactcalculationrequest":null}]'
as impact
)
select internalid,
get_json_object(e.impact, '$.position') as position,
get_json_object(e.impact, '$.value') as value
from
(
select get_json_object(impacts, '$.internalid') internalid,
--extract inner impact array, remove [], convert delimiters
regexp_replace(regexp_replace(get_json_object(impacts,'$.impactid'),'^\\[|\\]$',''),'\\},\\{','},,,,{') impact
from
(
SELECT --First explode the upper array. Since it is a string,
--prepare the delimiters so that it can be split:
--remove the first [ and the last ], and replace the delimiters between upper-level elements with 4 commas
regexp_replace(regexp_replace(s.impact,'^\\[|\\]$',''),'\\},\\{"internalid"','},,,,{"internalid"') upper_array_str
FROM socle s
)s lateral view explode (split(upper_array_str, ',,,,')) e as impacts --get upper array element
)s lateral view explode (split(impact, ',,,,') ) e as impact
Result:
internalid position value
079 1 G1
079 2 001
079 3 14
6381075 1 G3
6381075 2 003
6381075 3 58
6381071 1 G2
6381071 2 002
6381071 3 57
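The same two-level idea can be checked outside Hive with a Python model. It mirrors the demo's regular expressions, while json.loads and json.dumps stand in for get_json_object; that substitution is an assumption of the sketch, and impactid is re-serialised compactly because the inner regex expects no spaces around the commas. The input is a shortened version of the question's data.

```python
import json
import re

# Shortened version of the question's input (two elements, fewer fields).
IMPACT = (
    '[{"internalid":"079","impactid":[{"position":"1","typeid":"NOEUD","value":"G1"},'
    '{"position":"2","typeid":"ID","value":"001"},{"position":"3","typeid":"CODE_CI","value":"14"}],'
    '"typeid":"BTS","impactcalculationrequest":null},'
    '{"internalid":"6381075","impactid":[{"position":"1","typeid":"NOEUD","value":"G3"},'
    '{"position":"2","typeid":"ID","value":"003"},{"position":"3","typeid":"CI","value":"58"}],'
    '"typeid":"BTS","impactcalculationrequest":null}]'
)


def explode_impacts(impact):
    """Model of the demo's two-level split/explode; returns (internalid, position, value)."""
    rows = []
    # Level 1: strip the outer [ ] and mark only the boundaries between upper-array elements.
    upper = re.sub(r'^\[|\]$', '', impact)
    upper = re.sub(r'\},\{"internalid"', '},,,,{"internalid"', upper)
    for element in upper.split(",,,,"):
        obj = json.loads(element)  # stands in for get_json_object(impacts, '$.internalid' / '$.impactid')
        # Level 2: take impactid as compact JSON text, strip [ ], mark inner boundaries.
        inner = json.dumps(obj["impactid"], separators=(",", ":"))
        inner = re.sub(r'^\[|\]$', '', inner)
        inner = re.sub(r'\},\{', '},,,,{', inner)
        for item in inner.split(",,,,"):
            entry = json.loads(item)  # stands in for get_json_object(e.impact, '$.position' / '$.value')
            rows.append((obj["internalid"], entry["position"], entry["value"]))
    return rows


for row in explode_impacts(IMPACT):
    print(*row)
```

Like the Hive version, this delimiter trick assumes that no value contains },{ or ,,,, itself.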
PostgreSQL behaves strangely when unnesting multiple arrays in the select list:
select unnest('{1,2}'::int[]), unnest('{3,4}'::int[]);
unnest | unnest
--------+--------
1 | 3
2 | 4
versus when the arrays have different lengths:
select unnest('{1,2}'::int[]), unnest('{3,4,5}'::int[]);
unnest | unnest
--------+--------
1 | 3
2 | 4
1 | 5
2 | 3
1 | 4
2 | 5
Is there any way to force the latter behaviour without moving things to the FROM clause?
The SQL is generated by a mapping layer, and it will be much easier for me to implement the new feature I am adding if I can keep everything in the SELECT list.
From the PostgreSQL 10 release notes (https://www.postgresql.org/docs/10/static/release-10.html):
Set-returning functions are now evaluated before evaluation of scalar
expressions in the SELECT list, much as though they had been placed in
a LATERAL FROM-clause item. This allows saner semantics for cases
where multiple set-returning functions are present. If they return
different numbers of rows, the shorter results are extended to match
the longest result by adding nulls. Previously the results were cycled
until they all terminated at the same time, producing a number of rows
equal to the least common multiple of the functions' periods.
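The difference between the two behaviours described in the release notes can be modelled in a few lines of Python (a sketch of the semantics, not PostgreSQL code; it assumes non-empty arrays). srf_pre10 cycles each array over the least common multiple of the lengths, which reproduces the six rows in the question; srf_10_plus runs the arrays in lockstep and pads the shorter one with None, standing in for NULL.

```python
from functools import reduce
from itertools import zip_longest
from math import gcd


def srf_pre10(*arrays):
    """PostgreSQL < 10: cycle every array until all finish together,
    i.e. lcm(len(a), len(b), ...) rows. Assumes non-empty arrays."""
    n = reduce(lambda x, y: x * y // gcd(x, y), (len(a) for a in arrays), 1)
    return [tuple(a[i % len(a)] for a in arrays) for i in range(n)]


def srf_10_plus(*arrays):
    """PostgreSQL >= 10: evaluate in lockstep; shorter results are padded with NULL (None)."""
    return list(zip_longest(*arrays))


print(srf_pre10([1, 2], [3, 4, 5]))    # [(1, 3), (2, 4), (1, 5), (2, 3), (1, 4), (2, 5)]
print(srf_10_plus([1, 2], [3, 4, 5]))  # [(1, 3), (2, 4), (None, 5)]
```

For arrays of equal length neither model yields every combination, which is what moving unnest() into the FROM clause gives you.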
I tested with PostgreSQL 12.12 and, as the OP mentioned in a comment, two unnest() calls give the expected cross product when you move them to the FROM clause:
SELECT a, b
FROM unnest('{1,2}'::int[]) AS a,
unnest('{3,4}'::int[]) AS b;
Then you get the expected table:
a | b
--+--
1 | 3
1 | 4
2 | 3
2 | 4
Note that the arrays here have the same size, yet you still get all four combinations, unlike the two rows the select-list version returns.
In my case, the arrays come from a table. You can list the table first and then reference its columns inside the unnest() calls:
SELECT a, b
FROM my_table,
unnest(col1) AS a,
unnest(col2) AS b;
You can, of course, select other columns as required. This is an ordinary Cartesian product: for each row of my_table, every element of col1 is paired with every element of col2.
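What the FROM-clause form computes per table row can be sketched in Python (a model, not PostgreSQL code). The table, its id column and the sample arrays are hypothetical. Two consequences show up: every row contributes len(col1) * len(col2) result rows, and a row where either array is empty contributes none. Without ORDER BY, PostgreSQL does not guarantee the row order the model produces.

```python
from itertools import product

# Hypothetical rows of my_table: (id, col1, col2)
MY_TABLE = [
    (1, [1, 2], [3, 4]),
    (2, [5], [6, 7, 8]),
    (3, [], [9]),          # empty col1: contributes no rows
]


def from_clause_unnest(table):
    """Model of: SELECT id, a, b FROM my_table, unnest(col1) AS a, unnest(col2) AS b"""
    return [(row_id, a, b)
            for row_id, col1, col2 in table
            for a, b in product(col1, col2)]   # per-row Cartesian product


for row in from_clause_unnest(MY_TABLE):
    print(row)
```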
I have a JSON array with a couple of records, each of which has the same three fields: lat, lon and v.
I would like to create a select subquery from this array to join with another query. The problem is that I cannot make the example in the PostgreSQL documentation work.
select * from json_populate_recordset(null::x, '[{"a":1,"b":2},{"a":3,"b":4}]')
Should result in:
a | b
---+---
1 | 2
3 | 4
But I only get ERROR: type "x" does not exist Position: 45
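json_populate_recordset needs a composite type as its first argument, so the documentation example only works once a type x exists (for example, one created with create type x as (a int, b int)). json_to_recordset, on the other hand, takes a column definition list instead: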
select *
from json_to_recordset('[{"a":1,"b":2},{"a":3,"b":4}]') x (a int, b int)
;
a | b
---+---
1 | 2
3 | 4
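To make the column-definition-list idea concrete, here is a Python model of what json_to_recordset(...) x (a int, b int) does: one output row per array element, one column per declared name, and NULL (None) for declared columns with no matching key. It is a sketch of the semantics only; the lat/lon/v sample is hypothetical data shaped like the question's.

```python
import json


def json_to_recordset(text, columns):
    """Model of json_to_recordset(text) x (name type, ...).
    columns: list of (name, python_type) pairs; missing or null keys give None."""
    rows = []
    for obj in json.loads(text):
        rows.append(tuple(
            None if obj.get(name) is None else cast(obj[name])
            for name, cast in columns
        ))
    return rows


print(json_to_recordset('[{"a":1,"b":2},{"a":3,"b":4}]', [("a", int), ("b", int)]))
# [(1, 2), (3, 4)]

# Hypothetical data with the question's lat, lon, v fields; "v" is missing in the second record.
print(json_to_recordset('[{"lat":48.85,"lon":2.35,"v":7},{"lat":45.76,"lon":4.84}]',
                        [("lat", float), ("lon", float), ("v", int)]))
```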
I have a table
id int | data json
With data:
1 | [1,2,3,2]
2 | [2,3,4]
I want to modify the rows to delete every array element equal to 2 (an int).
Expected result:
1 | [1,3]
2 | [3,4]
As a_horse_with_no_name suggests in his comment, the proper data type in this case is int[]. However, you can convert the json array to int[], use array_remove() and convert the result back to json:
with my_table(id, data) as (
values
(1, '[1,2,3,2]'::json),
(2, '[2,3,4]')
)
select id, to_json(array_remove(translate(data::text, '[]', '{}')::int[], 2))
from my_table;
id | to_json
----+---------
1 | [1,3]
2 | [3,4]
(2 rows)
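The translate() trick works because a flat JSON array of integers and a PostgreSQL array literal differ only in their brackets: '[1,2,3,2]' becomes '{1,2,3,2}', which casts to int[]. A Python model of the chain (not PostgreSQL code; it assumes a flat array of integers, as in the question):

```python
import json


def remove_via_translate(data, value):
    """Model of to_json(array_remove(translate(data::text, '[]', '{}')::int[], value)).
    Assumes data is a flat JSON array of integers."""
    pg_literal = data.translate(str.maketrans("[]", "{}"))     # '[1,2,3,2]' -> '{1,2,3,2}'
    inner = pg_literal.strip("{}")
    elements = [int(x) for x in inner.split(",")] if inner.strip() else []  # the ::int[] cast
    kept = [x for x in elements if x != value]                 # array_remove drops every match
    return json.dumps(kept, separators=(",", ":"))             # to_json prints [1,3]


for data in ("[1,2,3,2]", "[2,3,4]", "[2]"):
    print(data, "->", remove_via_translate(data, 2))
```

An array made only of 2s comes back as an empty array, [].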
Another possibility is to unnest the arrays with json_array_elements(), eliminate the unwanted elements and aggregate the result:
select id, json_agg(elem)
from (
select id, elem
from my_table,
lateral json_array_elements(data) elem
where elem::text::int <> 2
) s
group by 1;
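A Python model of this second query (not PostgreSQL code; the third table row is hypothetical) makes one difference from the array_remove() version visible: because rows are grouped after filtering, an id whose array contains nothing but 2s has no rows left and disappears from the result, whereas the first query returns [] for it. Also, json_agg() without an ORDER BY inside the call does not guarantee element order; the model simply keeps input order.

```python
import json
from collections import defaultdict

# The question's rows plus a hypothetical third row containing only 2s.
MY_TABLE = [(1, "[1,2,3,2]"), (2, "[2,3,4]"), (3, "[2,2]")]


def remove_via_unnest(table, value):
    """Model of: SELECT id, json_agg(elem) FROM my_table, LATERAL json_array_elements(data) elem
    WHERE elem::text::int <> value GROUP BY id"""
    grouped = defaultdict(list)
    for row_id, data in table:
        for elem in json.loads(data):         # one row per array element
            if int(elem) != value:            # WHERE filter, applied before grouping
                grouped[row_id].append(elem)  # json_agg collects what survives, per id
    return dict(grouped)


print(remove_via_unnest(MY_TABLE, 2))  # {1: [1, 3], 2: [3, 4]}: id 3 is gone
```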