I have a column named "impact" which has data in nested JSON format.
input:
[{"internalid":"079","impactid":[{"position":"1","typeid":"NOEUD","value":"G1"},{"position":"2","typeid":"ID","value":"001"},{"position":"3","typeid":"CODE_CI","value":"14"}],"typeid":"BTS","cdrs":"X110","belong":"OF","impactclass":"R","count":"0","numberaccessimpacted":"0","impactcalculationrequest":null},{"internalid":"6381075","impactid":[{"position":"1","typeid":"NOEUD","value":"G3"},{"position":"2","typeid":"ID","value":"003"},{"position":"3","typeid":"CI","value":"58"}],"typeid":"BTS","cdrs":"X110","belong":"OF","impactclass":"R","count":"0","numberaccessimpacted":"0","impactcalculationrequest":null},{"internalid":"6381071","impactid":[{"position":"1","typeid":"NOEUD","value":"G2"},{"position":"2","typeid":"IDT","value":"002"},{"position":"3","typeid":"CI","value":"57"}],"typeid":"BTS","cdrs":"X110","belong":"OF","impactclass":"R","count":"0","numberaccessimpacted":"0","impactcalculationrequest":null}]
I use the code below:
SELECT
get_json_object(single_json_table.identifiant, '$.position') AS impact_position,
get_json_object(single_json_table.identifiant, '$.value') AS impact_value
FROM
(SELECT exp2.identifiant
FROM socle s
lateral view explode(split(regexp_replace(substr(impact, 2, length(impact)-2),
'},\\{"', '},,,,{"'), ',,,,')) exp2 as identifiant
)single_json_table
Here are the results. It skips the first position and value; does anyone know how I can fix it?
impact_position | impact_value
(null)          | (null)
2               | 001
3               | 14
(null)          | (null)
2               | 003
3               | 58
(null)          | (null)
2               | 002
3               | 57
Your input is JSON with nested arrays. The upper-level array is the whole input; it contains elements of type struct<internalid : string, impactid : array<struct<>>>. impactid is a nested array whose elements are structs like this: {"position":"1","typeid":"NOEUD","value":"G1"}
You need to explode both arrays. First explode the upper array: change the delimiters, split, explode. Then do the same with the nested array.
Demo:
with socle as (
select '[{"internalid":"079","impactid":[{"position":"1","typeid":"NOEUD","value":"G1"},{"position":"2","typeid":"ID","value":"001"},{"position":"3","typeid":"CODE_CI","value":"14"}],"typeid":"BTS","cdrs":"X110","belong":"OF","impactclass":"R","count":"0","numberaccessimpacted":"0","impactcalculationrequest":null},{"internalid":"6381075","impactid":[{"position":"1","typeid":"NOEUD","value":"G3"},{"position":"2","typeid":"ID","value":"003"},{"position":"3","typeid":"CI","value":"58"}],"typeid":"BTS","cdrs":"X110","belong":"OF","impactclass":"R","count":"0","numberaccessimpacted":"0","impactcalculationrequest":null},{"internalid":"6381071","impactid":[{"position":"1","typeid":"NOEUD","value":"G2"},{"position":"2","typeid":"IDT","value":"002"},{"position":"3","typeid":"CI","value":"57"}],"typeid":"BTS","cdrs":"X110","belong":"OF","impactclass":"R","count":"0","numberaccessimpacted":"0","impactcalculationrequest":null}]'
as impact
)
select internalid,
get_json_object(e.impact, '$.position') as position,
get_json_object(e.impact, '$.value') as value
from
(
select get_json_object(impacts, '$.internalid') internalid,
--extract inner impact array, remove [], convert delimiters
regexp_replace(regexp_replace(get_json_object(impacts,'$.impactid'),'^\\[|\\]$',''),'\\},\\{','},,,,{') impact
from
(
SELECT --First we need to explode upper array. Since it is a string,
--we need to prepare delimiters to be able to explode it
--remove first [ and last ], replace delimiters between inner structure with 4 commas
regexp_replace(regexp_replace(s.impact,'^\\[|\\]$',''),'\\},\\{"internalid"','},,,,{"internalid"') upper_array_str
FROM socle s
)s lateral view explode (split(upper_array_str, ',,,,')) e as impacts --get upper array element
)s lateral view explode (split(impact, ',,,,') ) e as impact
Result:
internalid position value
079 1 G1
079 2 001
079 3 14
6381075 1 G3
6381075 2 003
6381075 3 58
6381071 1 G2
6381071 2 002
6381071 3 57
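The delimiter trick is easier to see outside SQL. Below is a plain-Python sketch (an illustration, not the Hive code itself) of the same string manipulation applied to one inner impactid array:

```python
import json
import re

# The inner "impactid" array from one record of the sample input.
impact = '[{"position":"1","typeid":"NOEUD","value":"G1"},{"position":"2","typeid":"ID","value":"001"},{"position":"3","typeid":"CODE_CI","value":"14"}]'

# Remove the outer [ ] (the SQL does this with regexp_replace(..., '^\\[|\\]$', '')).
stripped = re.sub(r'^\[|\]$', '', impact)

# Turn every "},{" boundary into "},,,,{" so that splitting on ",,,,"
# yields one complete JSON object per piece.
pieces = re.sub(r'\},\{', '},,,,{', stripped).split(',,,,')

# Each piece can now be addressed individually, like get_json_object does.
rows = [(json.loads(p)['position'], json.loads(p)['value']) for p in pieces]
print(rows)
```

The point of the four-comma delimiter is that a plain comma would cut each struct apart, while ",,,," only exists at struct boundaries.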
Related
I'm trying to load a file and transpose a row into multiple rows.
The Days column contains a value like 11010011 that needs to be transposed into vertical format.
Below is the sample input.
I'm trying to get the expected output like below.
Can you please help me with this in Snowflake? I appreciate your help.
Replace '1' with '1,' and '0' with '0,', then trim the trailing comma. You can then use SPLIT_TO_TABLE to turn the string into rows:
with SOURCE_DATA as
(
select COLUMN1::int as FACTORY
,COLUMN2::int as YEAR
,COLUMN3::string as DAYS
from (values
(01,2021,'01010100100101010001'),
(99,2021,'00100111010101011010')
)
)
select FACTORY, YEAR, SEQ as SOURCE_ROW, INDEX as POSITION_IN_STRING, VALUE as WORKING_DAY
from SOURCE_DATA, table(split_to_table(trim(replace(replace(DAYS,'1','1,'),'0','0,'),','),',')) D
;
Abbreviated output:
FACTORY | YEAR | SOURCE_ROW | POSITION_IN_STRING | WORKING_DAY
1       | 2021 | 1          | 1                  | 0
1       | 2021 | 1          | 2                  | 1
1       | 2021 | 1          | 3                  | 0
1       | 2021 | 1          | 4                  | 1
1       | 2021 | 1          | 5                  | 0
The SPLIT_TO_TABLE table function gives you some metadata columns with information about the split. You can change the sample to select * to see them; maybe they're useful in some way for your requirements.
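The replace/trim/split pipeline can be sketched in plain Python (an illustration, not Snowflake SQL) to see the rows it produces:

```python
# Insert a comma after every digit, trim the trailing comma, then split
# into rows with a 1-based position, mirroring SPLIT_TO_TABLE's INDEX column.
days = '01010100100101010001'

with_commas = days.replace('1', '1,').replace('0', '0,').rstrip(',')
rows = [(index + 1, value) for index, value in enumerate(with_commas.split(','))]

print(rows[:5])
```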
Suppose I have a column of arrays like this:
column_x
[1,5,[],[2,3,22,42,3,-5]]
[1,5,[],[-3,67,32,2,2.14,5]]
[1,5,[],[32,1,3,34,6.7,90]]
I want to extract the fourth element of the array in each row, and separate these elements into different columns like this:
column1 column2 column3 column4 column5 column6
2 3 22 42 3 -5
-3 67 32 2 2.14 5
32 1 3 34 6.7 90
I tried using the getItem() function but it's not working. I'm not entirely sure if I'm using it correctly.
Since Spark 3.0.0 you can use vector_to_array
https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.functions.vector_to_array.html
Since you have an array nested inside, you might have to apply it twice.
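For context, here is a plain-Python illustration (not PySpark, and independent of vector_to_array) of the reshaping the question asks for; on a true Spark array column the equivalent lookup would be getItem(3) (0-based) or element_at(col, 4) (1-based):

```python
# Sample rows mirroring the question's column_x.
column_x = [
    [1, 5, [], [2, 3, 22, 42, 3, -5]],
    [1, 5, [], [-3, 67, 32, 2, 2.14, 5]],
    [1, 5, [], [32, 1, 3, 34, 6.7, 90]],
]

# row[3] is the fourth element of each row's outer array; its six values
# become the six output columns of that row.
table = [row[3] for row in column_x]
for r in table:
    print(r)
```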
My table in hive:
Group1  | sibling
=====================
ad3jkfk | 4
ad3jkfk | 4
ad3jkfk | 2
fkjh43f | 1
fkjh43f | 8
fkjh43f | 8
rjkhd93 | 7
rjkhd93 | 4
rjkhd93 | 7
abcd45  | 1
defg63  | 1
Expected result:
Group1  | sibling
===========================
ad3jkfk | 4,4,2
fkjh43f | 1,8,8
rjkhd93 | 7,4,7
collect_set produces an array of distinct values. For ad3jkfk it will produce 4,2, not 4,4,2.
If you want 4,4,2, then use collect_list().
To keep only arrays with more than one element, use the size() function:
select Group1, concat_ws(',',sibling_list) sibling --concatenate the array into a comma-delimited string, as in your example
from
(
select Group1, collect_list(sibling) sibling_list --get the list
from mytable
group by Group1
)s
where size(sibling_list)>1 --filter
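What the query computes can be sketched in plain Python (an illustration, not Hive):

```python
from collections import defaultdict

# Group siblings per Group1, keeping duplicates (like collect_list),
# then keep only groups with more than one element (the size() filter)
# and join each group's values with commas (concat_ws).
data = [
    ('ad3jkfk', 4), ('ad3jkfk', 4), ('ad3jkfk', 2),
    ('fkjh43f', 1), ('fkjh43f', 8), ('fkjh43f', 8),
    ('rjkhd93', 7), ('rjkhd93', 4), ('rjkhd93', 7),
    ('abcd45', 1), ('defg63', 1),
]

groups = defaultdict(list)
for group1, sibling in data:
    groups[group1].append(sibling)

result = {g: ','.join(map(str, lst)) for g, lst in groups.items() if len(lst) > 1}
print(result)
```

Using a set instead of a list in the grouping step would reproduce the collect_set behavior ('4,2' for ad3jkfk).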
How to delete the first matching row in a file using a second one?
I use Talend DI 7.2 and I need to delete some rows from one delimited file using a second file containing the rows to delete. My first file contains multiple rows matching the second one, but for each row in my second file I need to delete only the first matching row in the first file.
For example :
File A :             File B :
Code | Amount        Code | Amount
1    | 45            1    | 45
1    | 45            3    | 70
2    | 50            3    | 70
2    | 60
3    | 70
3    | 70
3    | 70
3    | 70
At the end, I need to obtain :
File A :
Code | Amount
1 | 45
2 | 50
2 | 60
3 | 70
3 | 70
For each row in file B, only the first matching row in file A is removed.
I tried with tMap and tFilterRow, but they match all rows, not only the first one.
Example edited: the same code-amount pair can appear many times in file B, and I need to remove that same number of rows from file A.
You can do this by using variables within the tMap. I created 3:
v_match - returns "match" if code and amount are in lookup file B.
v_count - increments the count if the value repeats; otherwise resets it to 0.
v_last_row - set to the value of v_match before the next comparison, so we can compare the current row to the last row and keep counts.
Then add an Expression filter to remove each first match.
This will give the desired results.
You can't delete rows from a file, so you'll have to generate a new file containing only the rows you want.
Here's a simple solution.
First, join your files using a left join between A as a main flow, and B as a lookup.
In the tMap, using an output filter, you only write to the output file the rows from A that don't match anything in B (row2.code == null), or those that have a match but not a first match.
The trick is to use a Numeric.sequence with the code as the id of the sequence: if the sequence returns a value other than 1, you know you've already seen that line. If it's the first occurrence of the code, the sequence starts at 1 and returns 1, so the row is filtered out.
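Independent of Talend, the requirement itself can be sketched in a few lines of Python, using a counter of rows to delete:

```python
from collections import Counter

# For each row in file B, remove exactly one matching row from file A,
# scanning A in order so it is always the first remaining match that
# disappears. Data mirrors the question's example files.
file_a = [(1, 45), (1, 45), (2, 50), (2, 60), (3, 70), (3, 70), (3, 70), (3, 70)]
file_b = [(1, 45), (3, 70), (3, 70)]

to_remove = Counter(file_b)   # how many copies of each row to drop
result = []
for row in file_a:
    if to_remove[row] > 0:
        to_remove[row] -= 1   # consume one deletion for this row
    else:
        result.append(row)

print(result)
```

Since B can contain the same code-amount pair multiple times, the Counter naturally drops that many copies from A.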
I have a JSON array with a couple of records, all of which have 3 fields: lat, lon, v.
I would like to create a select subquery from this array to join with another query. The problem is that I cannot make the example from the PostgreSQL documentation work.
select * from json_populate_recordset(null::x, '[{"a":1,"b":2},{"a":3,"b":4}]')
Should result in:
a | b
---+---
1 | 2
3 | 4
But I only get ERROR: type "x" does not exist Position: 45
You need to pass a composite type to json_populate_recordset (hence the error: no type named "x" exists), whereas a column definition list is passed to json_to_recordset:
select *
from json_to_recordset('[{"a":1,"b":2},{"a":3,"b":4}]') x (a int, b int)
;
a | b
---+---
1 | 2
3 | 4
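For intuition, here is a plain-Python sketch (not SQL) of what json_to_recordset does with that input:

```python
import json

# Parse a JSON array of objects and project each object onto the
# declared column list (a, b), yielding one row per object.
records = json.loads('[{"a":1,"b":2},{"a":3,"b":4}]')
rows = [(rec.get("a"), rec.get("b")) for rec in records]
print(rows)
```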