I have copied some json files into Snowflake from a stage and I have a property name which contains a hyphen.
When I try to query for this property name (as shown below), I get this error.
select my_variant:test-id from mytable;
SQL compilation error: error line 1 at position 44 invalid identifier 'ID'.
I assume it doesn't like the hyphen. Is there any way I can rename this hyphenated name in my variant column so I don't get the error?
You just need to quote the column name in the variant:
select my_variant:"test-id" from mytable;
If you want to update it, see below. It assumes that you have a key per row, so that we can aggregate it back to rebuild the variant at the row level.
Setup test table:
create or replace table test (k int, a variant);
insert into test
select 1, parse_json('{"test-id": 1, "test-id2": "2"}')
union all
select 2, parse_json('{"test-1": 1, "test-2": "2"}');
select * from test;
+---+--------------------+
| K | A                  |
|---+--------------------|
| 1 | {                  |
|   |   "test-id": 1,    |
|   |   "test-id2": "2"  |
|   | }                  |
| 2 | {                  |
|   |   "test-1": 1,     |
|   |   "test-2": "2"    |
|   | }                  |
+---+--------------------+
Update the table:
update test t
set t.a = b.value
from (
    with t as (
        select
            k,
            replace(f.key, '-', '_') as key,
            f.value as value
        from test,
        lateral flatten(a) f
    )
    select
        k, object_agg(key, value) as value
    from t
    group by k
) b
where t.k = b.k
;
select * from test;
+---+--------------------+
| K | A                  |
|---+--------------------|
| 1 | {                  |
|   |   "test_id": 1,    |
|   |   "test_id2": "2"  |
|   | }                  |
| 2 | {                  |
|   |   "test_1": 1,     |
|   |   "test_2": "2"    |
|   | }                  |
+---+--------------------+
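For intuition, the flatten / rename / re-aggregate pipeline above can be sketched in Python (illustrative only, not Snowflake code):

```python
def rename_keys(obj):
    # Mirror the SQL pipeline: FLATTEN the object into (key, value) pairs,
    # REPLACE '-' with '_' in each key, then OBJECT_AGG back into one object.
    return {key.replace('-', '_'): value for key, value in obj.items()}

# The same two rows as in the test table above.
rows = [
    (1, {"test-id": 1, "test-id2": "2"}),
    (2, {"test-1": 1, "test-2": "2"}),
]
updated = [(k, rename_keys(a)) for k, a in rows]
print(updated[0][1])  # {'test_id': 1, 'test_id2': '2'}
```

Because a JSON object cannot hold two identical keys, the `object_agg` step only works if the renamed keys stay unique within each row, which the `group by k` guarantees per key column.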
I have a column item_id that contains data in a JSON-like structure.
+-------+---------------------------------------------------------------------------------------------------------------------------------------+
| id    | item_id                                                                                                                               |
+-------+---------------------------------------------------------------------------------------------------------------------------------------+
| 56711 | {"itemID":["0530#2#1974","0538\/2#2#1974","0538\/3#2#1974","0538\/18#2#1974","0539#2#1974"]}                                          |
| 56712 | {"itemID":["0138528#2#4221","0138529#2#4221","0138530#2#4221","0138539#2#4221","0118623\/2#2#4220"]}                                  |
| 56721 | {"itemID":["2704\/1#1#1356"]}                                                                                                         |
| 56722 | {"itemID":["0825\/2#2#3349","0840#2#3349","0844\/10#2#3349","0844\/11#2#3349","0844\/13#2#3349","0844\/14#2#3349","0844\/15#2#3349"]} |
| 57638 | {"itemID":["0161\/1#2#3364","0162\/1#2#3364","0163\/2#2#3364"]}                                                                       |
| 57638 | {"itemID":["109#1#3365","110\/1#1#3365"]}                                                                                             |
+-------+---------------------------------------------------------------------------------------------------------------------------------------+
I need the last four digits before every comma (if there is one), and those distinct 4-digit values separated into individual columns.
The distinct should happen across id as well, so only one result row with id: 57638 is permitted.
Here is a fiddle with a code draft that is not giving the right answer.
The desired result should look like this:
+-------+-----------+-----------+
| id    | item_id_1 | item_id_2 |
+-------+-----------+-----------+
| 56711 |      1974 |           |
| 56712 |      4220 |      4221 |
| 56721 |      1356 |           |
| 56722 |      3349 |           |
| 57638 |      3364 |      3365 |
+-------+-----------+-----------+
There can be quite a lot of 'item_id_%' columns in the results.
with the_table (id, item_id) as (
values
(56711, '{"itemID":["0530#2#1974","0538\/2#2#1974","0538\/3#2#1974","0538\/18#2#1974","0539#2#1974"]}'),
(56712, '{"itemID":["0138528#2#4221","0138529#2#4221","0138530#2#4221","0138539#2#4221","0118623\/2#2#4220"]}'),
(56721, '{"itemID":["2704\/1#1#1356"]}'),
(56722, '{"itemID":["0825\/2#2#3349","0840#2#3349","0844\/10#2#3349","0844\/11#2#3349","0844\/13#2#3349","0844\/14#2#3349","0844\/15#2#3349"]}'),
(57638, '{"itemID":["0161\/1#2#3364","0162\/1#2#3364","0163\/2#2#3364"]}'),
(57638, '{"itemID":["109#1#3365","110\/1#1#3365"]}')
)
select id
      ,(array_agg(itemid))[1] itemid_1
      ,(array_agg(itemid))[2] itemid_2
from (
    select distinct id
          ,split_part(replace(json_array_elements(item_id::json -> 'itemID')::text, '"', ''), '#', 3)::int itemid
    from the_table
    order by 1, 2
) t
group by id
You can unnest the json array, get the last 4 characters of each element as a number, then do conditional aggregation:
select
    id,
    max(val) filter(where rn = 1) item_id_1,
    max(val) filter(where rn = 2) item_id_2
from (
    select
        id,
        right(val, 4)::int val,
        dense_rank() over(partition by id order by right(val, 4)::int) rn
    from mytable t
    cross join lateral jsonb_array_elements_text(t.item_id -> 'itemID') as x(val)
) t
group by id
You can add more conditional max()s to the outer query to handle more possible values.
Demo on DB Fiddle:
   id | item_id_1 | item_id_2
----: | --------: | --------:
56711 |      1974 |      null
56712 |      4220 |      4421
56721 |      1356 |      null
56722 |      3349 |      null
57638 |      3364 |      3365
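Outside the database, the same unnest / take-the-4-digit-suffix / rank logic can be sketched in Python (illustrative only, using a few of the sample payloads from the question):

```python
import json

# A subset of the sample rows; note 57638 appears twice, as in the question.
rows = [
    (56712, '{"itemID":["0138528#2#4221","0118623/2#2#4220"]}'),
    (57638, '{"itemID":["0161/1#2#3364","0162/1#2#3364"]}'),
    (57638, '{"itemID":["109#1#3365","110/1#1#3365"]}'),
]

def pivot_suffixes(rows):
    # Collect the distinct 4-digit suffixes per id (right(val, 4) in the SQL),
    # then sort them ascending - the analogue of dense_rank() + max() filter.
    by_id = {}
    for rid, payload in rows:
        for item in json.loads(payload)["itemID"]:
            by_id.setdefault(rid, set()).add(int(item[-4:]))
    return {rid: sorted(vals) for rid, vals in by_id.items()}

print(pivot_suffixes(rows))  # {56712: [4220, 4221], 57638: [3364, 3365]}
```

The set per id is what makes the de-duplication work across rows with the same id, which is why the SQL groups by id in the outer query.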
I have created a table with the complex data type array in Hive. The DDL is:
create table testivr (
mobNo string,
callTime string,
refNo int,
callCat string,
menus array <string>,
endType string,
duration int,
transferNode string
)
row format delimited
fields terminated by ','
collection items terminated by '|'
The records loaded are like
9220276765 2011-05-01 21:26:45 29 E ["PRE_HOST10_JINGLE_PP-PREF_WELCOME_PP-PREF_PROMO_PP","M001:4","M477:3","M005:2","M090:5","M465:9"] RAT 218 TR716
Now I need to check whether the first two fields of the array are:
PRE_HOST10_JINGLE_PP-PREF_WELCOME_PP-PREF_PROMO_PP and M001.
I tried using:
select * where menu[0] = "val1" and menu[1] = "val2"
and also like
menu(0) = "val1" and menu(1) = "val2"
I'm getting an error like:
SemanticException [Error 10011]: Line 3:0 Invalid function 'menus'
How to compare them?
If val1 and val2 are hardcoded values, i.e.
val1 = 'PRE_HOST10_JINGLE_PP-PREF_WELCOME_PP-PREF_PROMO_PP'
val2 = 'M001'
you can convert the array to a string and use substr to find the expected values:
--sample records
with testivr as (
select '9220276765' as mobNo
, '2011-05-01 21:26:45' as callTime
, 29 as refNo
, 'E' as callCat
, array("PRE_HOST10_JINGLE_PP-PREF_WELCOME_PP-PREF_PROMO_PP","M001:4","M477:3","M005:2","M090:5","M465:9") as menus
, 'RAT' as endType
, 218 as duration
, 'TR716' as transferNode
union all
select '9220276766'
, '2011-05-02 21:26:45'
, 30
, 'E'
, array("PRE_HOST10_JINGLE_PP-PREF_WELCOME_PP-PREF_PROMO_PP","M001:5","M478:4","M006:3","M091:5","M465:10")
, 'RAT'
, 219
, 'TR717'
union all
select '9220276767'
, '2011-05-03 21:26:45'
, 31
, 'E'
, array("PRE_HOST10_JINGLE_PP-PREF_WELCOME_PP-PREF_PROMO_PP","M002:5","M478:4","M006:3","M091:5","M465:10")
, 'RAT'
, 220
, 'TR718'
)
select mobno,refno,menus
from testivr t
where substr(concat_ws(',',menus),1,50) = 'PRE_HOST10_JINGLE_PP-PREF_WELCOME_PP-PREF_PROMO_PP'
and substr(concat_ws(',',menus),52,4) = 'M001'
;
Temp Table testivr:
+-------------+----------------------+----------+------------+-------------------------------------------------------------------------------------------------------+------------+-------------+-----------------+--+
| t.mobno | t.calltime | t.refno | t.callcat | t.menus | t.endtype | t.duration | t.transfernode |
+-------------+----------------------+----------+------------+-------------------------------------------------------------------------------------------------------+------------+-------------+-----------------+--+
| 9220276765 | 2011-05-01 21:26:45 | 29 | E | ["PRE_HOST10_JINGLE_PP-PREF_WELCOME_PP-PREF_PROMO_PP","M001:4","M477:3","M005:2","M090:5","M465:9"] | RAT | 218 | TR716 |
| 9220276766 | 2011-05-02 21:26:45 | 30 | E | ["PRE_HOST10_JINGLE_PP-PREF_WELCOME_PP-PREF_PROMO_PP","M001:5","M478:4","M006:3","M091:5","M465:10"] | RAT | 219 | TR717 |
| 9220276767 | 2011-05-03 21:26:45 | 31 | E | ["PRE_HOST10_JINGLE_PP-PREF_WELCOME_PP-PREF_PROMO_PP","M002:5","M478:4","M006:3","M091:5","M465:10"] | RAT | 220 | TR718 |
+-------------+----------------------+----------+------------+-------------------------------------------------------------------------------------------------------+------------+-------------+-----------------+--+
Query results:
+-------------+--------+-------------------------------------------------------------------------------------------------------+--+
| mobno | refno | menus |
+-------------+--------+-------------------------------------------------------------------------------------------------------+--+
| 9220276765 | 29 | ["PRE_HOST10_JINGLE_PP-PREF_WELCOME_PP-PREF_PROMO_PP","M001:4","M477:3","M005:2","M090:5","M465:9"] |
| 9220276766 | 30 | ["PRE_HOST10_JINGLE_PP-PREF_WELCOME_PP-PREF_PROMO_PP","M001:5","M478:4","M006:3","M091:5","M465:10"] |
+-------------+--------+-------------------------------------------------------------------------------------------------------+--+
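The position arithmetic in the substr answer (the first element occupies characters 1-50, the comma sits at position 51, and the menu code starts at 52) can be checked with a small Python sketch (illustrative only, not Hive code):

```python
def first_two_match(menus, first, code):
    # Join the array with ',' exactly like concat_ws(',', menus), then compare
    # the two fixed-position slices the Hive query uses. Hive substr is
    # 1-based: substr(s, 1, 50) is s[0:50], substr(s, 52, 4) is s[51:55].
    s = ",".join(menus)
    return s[:len(first)] == first and s[len(first) + 1:len(first) + 1 + len(code)] == code

menus = ["PRE_HOST10_JINGLE_PP-PREF_WELCOME_PP-PREF_PROMO_PP",
         "M001:4", "M477:3", "M005:2", "M090:5", "M465:9"]
print(first_two_match(menus, "PRE_HOST10_JINGLE_PP-PREF_WELCOME_PP-PREF_PROMO_PP", "M001"))  # True
```

Note this only works because the first element has a fixed length of 50 characters; if the menu texts can vary in length, comparing `menus[0]` and a prefix of `menus[1]` directly is the safer route.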
I have a dataset where I need to calculate a value that for each row depends on the value in the previous row of the same column. Or a 1 initially when there is no previous row. I need to do this on different partitions.
The formula looks like this: factor = (previous factor or 1 if it does not exist) * (1 + div / nav)
This needs to be partitioned by Inst_id.
I would prefer to avoid a cursor. Maybe a recursive CTE - but I cannot get my head around it - or another way?
I know this code does not work as I cannot reference the same column, but it is another way of showing what I'm trying to do:
SELECT Dato, Inst_id, nav, div
     , (1 + div / nav) * ISNULL(LAG(factor, 1) OVER (PARTITION BY Inst_id ORDER BY Dato), 1) AS factor
FROM @tmp
So with my test data I need to get these results in the factor column below.
Please ignore rounding issues, as I calculated this in Excel:
date Inst_id nav div factor
11-04-2012 16 57.5700 5.7500 1.09987841
19-04-2013 16 102.8600 10.2500 1.20948130
29-04-2014 16 65.9300 16.7500 1.51675890
08-04-2013 29 111.2736 17.2500 1.15502333
10-04-2014 29 101.9650 16.3000 1.33966395
15-04-2015 29 109.5400 7.5000 1.43138825
27-04-2016 29 94.2500 0.4000 1.43746311
15-04-2015 34 159.1300 11.4000 1.07163954
27-04-2016 34 124.6100 17.6000 1.22299863
26-04-2017 34 139.7900 9.2000 1.30348784
01-04-2016 38 99.4600 0.1000 1.00100543
26-04-2017 38 102.9200 2.1000 1.02143014
Test data:
DECLARE @tmp TABLE(Dato DATE, Inst_id INT, nav DECIMAL(26,19), div DECIMAL(26,19), factor DECIMAL(26,19))
INSERT INTO @tmp (Dato, Inst_id, nav, div) VALUES
('2012-04-11', 16, 57.57, 5.75),
('2013-04-19', 16, 102.86, 10.25),
('2014-04-29', 16, 65.93, 16.75),
('2013-04-08', 29, 111.273577, 17.25),
('2014-04-10', 29, 101.964994, 16.3),
('2015-04-15', 29, 109.54, 7.5),
('2016-04-27', 29, 94.25, 0.4),
('2015-04-15', 34, 159.13, 11.4),
('2016-04-27', 34, 124.61, 17.6),
('2017-04-26', 34, 139.79, 9.2),
('2016-04-01', 38, 99.46, 0.1),
('2017-04-26', 38, 102.92, 2.1)
I'm on a Microsoft SQL Server Enterprise 2016 (and use SSMS 2016).
You can use this (provided DIV and NAV are always > 0, so LOG always gets a positive argument):
SELECT A.*, EXP(SUM(LOG(1 + DIV / NAV)) OVER (PARTITION BY INST_ID ORDER BY DATO)) AS FACT_NEW
FROM @tmp A
Actually, what you need is an equivalent of an aggregate function MULTIPLY() OVER ..., which does not exist.
Using the logarithm identity LOG(M*N) = LOG(M) + LOG(N), you can emulate it; for example:
DECLARE @X1 NUMERIC(10,4) = 5
DECLARE @X2 NUMERIC(10,4) = 7
SELECT @X1 * @X2 AS S1, EXP(LOG(@X1) + LOG(@X2)) AS S2
Output:
+------------+---------+-------------------------+------------------------+--------+------------------+
| Dato | Inst_id | nav | div | factor | FACT_NEW |
+------------+---------+-------------------------+------------------------+--------+------------------+
| 2012-04-11 | 16 | 57.5700000000000000000 | 5.7500000000000000000 | NULL | 1.099878408893 |
| 2013-04-19 | 16 | 102.8600000000000000000 | 10.2500000000000000000 | NULL | 1.20948130303111 |
| 2014-04-29 | 16 | 65.9300000000000000000 | 16.7500000000000000000 | NULL | 1.51675889783963 |
| 2013-04-08 | 29 | 111.2735770000000000000 | 17.2500000000000000000 | NULL | 1.155023325977 |
| 2014-04-10 | 29 | 101.9649940000000000000 | 16.3000000000000000000 | NULL | 1.33966395090911 |
| 2015-04-15 | 29 | 109.5400000000000000000 | 7.5000000000000000000 | NULL | 1.43138824917236 |
| 2016-04-27 | 29 | 94.2500000000000000000 | 0.4000000000000000000 | NULL | 1.43746310646293 |
| 2015-04-15 | 34 | 159.1300000000000000000 | 11.4000000000000000000 | NULL | 1.071639539998 |
| 2016-04-27 | 34 | 124.6100000000000000000 | 17.6000000000000000000 | NULL | 1.22299862758278 |
| 2017-04-26 | 34 | 139.7900000000000000000 | 9.2000000000000000000 | NULL | 1.30348784264639 |
+------------+---------+-------------------------+------------------------+--------+------------------+
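The EXP(SUM(LOG(...)) OVER ...) trick is just a running product computed in log space. A minimal Python equivalent, using the first few rows of the test data (illustrative only):

```python
import math

def running_factors(rows):
    # rows: (inst_id, nav, div) tuples, already sorted by inst_id, then date.
    # Accumulate log(1 + div/nav) per partition and exponentiate - exactly what
    # EXP(SUM(LOG(1 + DIV/NAV)) OVER (PARTITION BY INST_ID ORDER BY DATO)) does.
    out, acc, current = [], 0.0, None
    for inst, nav, div in rows:
        if inst != current:            # new partition -> reset the running sum
            current, acc = inst, 0.0
        acc += math.log(1 + div / nav)
        out.append((inst, math.exp(acc)))
    return out

rows = [(16, 57.57, 5.75), (16, 102.86, 10.25), (16, 65.93, 16.75)]
for inst, f in running_factors(rows):
    print(inst, f)  # approaches 1.09987841, 1.20948130, 1.51675890
```

This matches the expected factor column for Inst_id 16 up to floating-point rounding, which is why the SQL version reproduces the Excel numbers.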
Using recursive CTE:
WITH DataSource AS
(
SELECT *
,ROW_NUMBER() OVER (PARTITION BY Inst_id ORDER BY Dato) AS [rowId]
FROM @tmp
),
RecursiveDataSource AS
(
SELECT *
,CAST((1 + div / nav ) * 1 AS DECIMAL(26,19)) as [factor_calculated]
FROM DataSource
WHERE [rowId] = 1
UNION ALL
SELECT A.*
,CAST((1 + A.div / A.nav ) * R.factor_calculated AS DECIMAL(26,19)) as [factor_calculated]
FROM RecursiveDataSource R
INNER JOIN DataSource A
ON r.[Inst_id] = A.[Inst_id]
AND R.[rowId] + 1 = A.[rowId]
)
SELECT *
FROM RecursiveDataSource
ORDER BY Inst_id, Dato;
I guess you are getting different values in Excel after row 3, because you are not partitioning by Inst_id there.
Working on a pretty large table in SQL Server. The table has some identical rows, and I need to remove the duplicates. The problem is that I cannot alter this table, i.e. I cannot create an ID column.
I could update one column value in one row of each pair of duplicates, then delete those rows afterwards using this value.
How do I update only one of these rows?
For example: the firstly/lastly inserted one, the first occurrence, the newest/oldest...
Thanks!
Table structure:
NrValue | Comment | Value1 | Value2 | Value3       |
--------|---------|--------|--------|--------------|
00000   | data0   | zz     | top    | vivalasvegas |
00100   | NULL    | N/A    | sex    | no           |
00100   | NULL    | N/A    | sex    | no           |
00200   | NULL    | female | sex    | yes          |
00200   | NULL    | female | sex    | yes          |
00300   | NULL    | male   | sex    | yesplease    |
00300   | NULL    | male   | sex    | yesplease    |
00400   | data21  | M      | --     | na           |
00500   | NULL    | F      | ezig   | na           |
So, I could use the 'Comment' column for the update, but I cannot touch anything other than the duplicate rows. I know by NrValue which rows can be updated.
Result would be:
NrValue | Comment | Value1 | Value2 | Value3       |
--------|---------|--------|--------|--------------|
00000   | data0   | zz     | top    | vivalasvegas |
00100   | 1       | N/A    | sex    | no           |
00100   | 2       | N/A    | sex    | no           |
00200   | 3       | female | sex    | yes          |
00200   | 4       | female | sex    | yes          |
00300   | 5       | male   | sex    | yesplease    |
00300   | 6       | male   | sex    | yesplease    |
00400   | data21  | M      | --     | na           |
00500   | NULL    | F      | ezig   | na           |
Lastly I delete rows where NrValue = 00100, 00200 or 00300 AND Comment = 2, 4 or 6.
Use something like
ROW_NUMBER() OVER(PARTITION BY AllRelevantColumns ORDER BY SomeOrderCriteria)
This will generate 1 for the first row of each group, while duplicates get 2 (or 3 ...)
You might place this value in a new column or use this for cleaning...
UPDATE: following your test data...
DECLARE @mockup TABLE(NrValue INT, Comment VARCHAR(100), Value1 VARCHAR(100), Value2 VARCHAR(100), Value3 VARCHAR(100));
INSERT INTO @mockup VALUES
 (00000,'data0','zz','top','vivalasvegas')
,(00100,'NULL','N/A','sex','no')
,(00100,'NULL','N/A','sex','no')
,(00200,'NULL','female','sex','yes')
,(00200,'NULL','female','sex','yes')
,(00300,'NULL','male','sex','yesplease')
,(00300,'NULL','male','sex','yesplease')
,(00400,'data21','M','--','na')
,(00500,'NULL','F','ezig','na');
WITH Numbered AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY NrValue ORDER BY (SELECT NULL)) AS DupNr
,*
FROM @mockup
)
DELETE FROM Numbered
WHERE DupNr>1;
SELECT * FROM @mockup;
This concept is called an updatable CTE: the DELETE FROM Numbered ... actually affects the underlying table.
If NrValue is not enough to identify a row as a duplicate, just add more columns to the PARTITION BY.
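The ROW_NUMBER-over-all-columns idea boils down to "keep the first occurrence of each identical row". A Python sketch of the same rule (illustrative only, using a few rows shaped like the sample data):

```python
def drop_duplicates(rows):
    # Equivalent of numbering rows per (all-columns) group and deleting rn > 1:
    # the first occurrence of each identical row survives, later copies go.
    seen, kept = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            kept.append(row)
    return kept

rows = [("00100", "N/A", "sex", "no"),
        ("00100", "N/A", "sex", "no"),
        ("00200", "female", "sex", "yes")]
print(drop_duplicates(rows))  # the second 00100 row is removed
```

In SQL the "first" row within a group is arbitrary unless the ORDER BY inside OVER() pins it down, which is why the answer uses ORDER BY (SELECT NULL) when any survivor will do.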
You don't need an update; you want to delete duplicates, so why the intermediate step?
Your code could look like this:
declare @t table (col1 int, col2 int);
insert into @t values
(1, 1), (1, 1),
(1, 2), (1, 2), (1, 2), (1, 2),
(3, 2), (3, 2), (3, 2);
with cte as
(
    select *, row_number() over (partition by col1, col2 order by 1/0) rn
    from @t
)
delete cte
where rn > 1;
select *
from @t;
Sorry for not posting this in a comment (row limit, and code formatting gets lost there).