Regex from stringified-array - snowflake-cloud-data-platform

How do you extract regex values from a stringified-array?
Sample data:
with data as (select 1 as id, '["abc_def_123","uvw_xyz_456"]'::string as my_field)
select * from data
my_field could have up to 4 items (if important).
Desired output:
--------------------------------
| id | f1 | f2 | f3 |
--------------------------------
| 1 | abc | def | 123 |
| 1 | uvw | xyz | 456 |
--------------------------------

Try the following:
with data as (
select 1 as id, '["abc_def_123","uvw_xyz_456"]'::string as my_field
),
json_data as (
select id, parse_json(my_field) as json from data
),
flattened_data as (
SELECT id, split(f.value::string, '_') as splitted_c
FROM json_data,
lateral flatten(input => json_data.json) f
)
SELECT
id,
splitted_c[0]::string as f1,
splitted_c[1]::string as f2,
splitted_c[2]::string as f3
from flattened_data;
Result:
+----+-----+-----+-----+
| ID | F1 | F2 | F3 |
|----+-----+-----+-----|
| 1 | abc | def | 123 |
| 1 | uvw | xyz | 456 |
+----+-----+-----+-----+
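The steps in this answer (parse the stringified array, flatten it to one row per element, split each element on the underscore) can be checked outside Snowflake. The following is only a rough Python sketch of that logic, not Snowflake code; the function name `split_items` is made up for illustration.

```python
import json

def split_items(row_id, my_field, sep="_"):
    """Parse a stringified JSON array and split every element on `sep`.

    Hypothetical helper: json.loads plays the role of PARSE_JSON,
    the loop plays the role of LATERAL FLATTEN, and str.split the role of SPLIT.
    """
    rows = []
    for item in json.loads(my_field):
        parts = item.split(sep)
        rows.append((row_id, *parts))
    return rows

rows = split_items(1, '["abc_def_123","uvw_xyz_456"]')
for row in rows:
    print(row)
```

Each tuple corresponds to one output row (id, f1, f2, f3). Elements with fewer than three parts would simply yield shorter tuples, whereas the SQL version would return NULL for the missing array positions.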

Related

Create rows for missing values in SQL (data preparation for decision tree)

Currently, my table looks like this
id | valName | valCount | type
123 | abb | 3 | 2
123 | abc | 2 | 2
123 | b | 5 | 2
251 | aaa | 2 | 1
251 | ab | 2 | 1
251 | abb | 2 | 1
251 | ac | 2 | 1
and so on.
I want to fill in missing valNames for every id and set valCount to 0. If my set of distinct valName was (aaa, aab, ab, abb, abc, ac, b) it would look like this.
id | valName | valCount | type
123 | aaa | 0 | 2
123 | aab | 0 | 2
123 | ab | 0 | 2
123 | abb | 3 | 2
123 | abc | 2 | 2
123 | ac | 0 | 2
123 | b | 5 | 2
251 | aaa | 2 | 1
251 | aab | 0 | 1
251 | ab | 2 | 1
251 | abb | 2 | 1
251 | abc | 0 | 1
251 | ac | 2 | 1
251 | b | 0 | 1
Also, the dataset is quite large, so an efficient query is preferred.
As Dale suggested, this is my attempt. TABLE in the code is the table I am using.
select C.id, C.valName, C.type, COALESCE(D.valCount,0 ) as count
from (
select *
from (select id, min(type) as type
From TABLE
Group by id
) B
cross join
(select distinct valName FROM TABLE) A
) C
left join TABLE D
on C.id = D.id
and C.valName = D.valName
order by C.id
The idea behind this query is to create the id/valname table using cross join and then get valCount using left join.
This query works but is too slow.
Something like this
with unq_id_type_cte(id, [type]) as (
select distinct id, [type] from mytable)
insert mytable(id, valName, valCount, [type])
select uitc.id, t.v, 0, uitc.[type]
from
(values ('aaa'),('aab'),('ab'),('abb'),('abc'),('ac'),('b')) t(v)
cross join
unq_id_type_cte uitc
where not exists
(select 1 from mytable t_in where uitc.id=t_in.id
and t.v=t_in.valName);
If there are performance issues or concerns then the first thing to try imo would be to insert the cte into an indexed temp table.
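To try the insert-where-not-exists idea without a SQL Server instance, here is a small sketch using Python's built-in sqlite3 module. It assumes SQLite syntax rather than T-SQL, and the helper table `all_names` is hypothetical; it stands in for the inline VALUES list in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER, valName TEXT, valCount INTEGER, type INTEGER);
INSERT INTO mytable VALUES
  (123, 'abb', 3, 2), (123, 'abc', 2, 2), (123, 'b', 5, 2),
  (251, 'aaa', 2, 1), (251, 'ab', 2, 1), (251, 'abb', 2, 1), (251, 'ac', 2, 1);

-- Hypothetical helper table holding the full set of distinct valNames.
CREATE TABLE all_names (valName TEXT);
INSERT INTO all_names VALUES ('aaa'), ('aab'), ('ab'), ('abb'), ('abc'), ('ac'), ('b');
""")

# Insert a zero-count row for every (id, valName) pair that is not present yet.
conn.execute("""
INSERT INTO mytable (id, valName, valCount, type)
SELECT u.id, n.valName, 0, u.type
FROM (SELECT DISTINCT id, type FROM mytable) AS u
CROSS JOIN all_names AS n
WHERE NOT EXISTS (
    SELECT 1 FROM mytable AS t
    WHERE t.id = u.id AND t.valName = n.valName
)
""")

filled = conn.execute(
    "SELECT id, valName, valCount, type FROM mytable ORDER BY id, valName"
).fetchall()
for row in filled:
    print(row)
```

Unlike the cross join plus left join query in the question, this variant only writes the missing rows and leaves the existing ones untouched. Whether it is faster on a large table depends on indexing of (id, valName), which this sketch does not cover.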

T-SQL Limited Cross Join

I want to join 2 tables such that I get the NAR for every combination of Type and BillingID where it exists.
Where a BillingID doesn't have a certain Type, then either NULL or 0 is returned for the NAR along with the Type and BillingID.
Is something like this even possible using SQL?
A simplified version of my data is shown below:
Type list:
+----------+
| Type |
+----------+
| NEW |
| CHNG |
| LAP |
+----------+
Data:
+----------+-----------+-----+
| Type | BillingID | NAR |
+----------+-----------+-----+
| NEW | ABC | 5 |
| CHNG | ABC | 15 |
| LAP | ABC | 10 |
| CHNG | DEF | 20 |
+----------+-----------+-----+
Desired result:
+----------+-----------+-----+
| Type | BillingID | NAR |
+----------+-----------+-----+
| NEW | ABC | 5 |
| CHNG | ABC | 15 |
| LAP | ABC | 10 |
| CHNG | DEF | 20 |
| NEW | DEF | 0 |
| LAP | DEF | 0 |
+----------+-----------+-----+
The last 2 rows are what is causing me problems.
I think you can do it like this:
declare @table table (type1 varchar(5))
insert into @table
values
('new'),
('chng'),
('lap')
declare @table2 table (typeid varchar(5),billingid varchar(5),nar int)
insert into @table2
values
( 'NEW', 'ABC', 5 ),
( 'CHNG' , 'ABC', 15 ),
( 'LAP' , 'ABC', 10 ),
( 'CHNG' , 'DEF', 20 )
select Z.*,case when c.nar IS null then 0 else c.nar end as nar from (
select * from @table a
outer apply (select distinct billingid from @table2 b ) p
)Z
left join @table2 c on Z.type1 = c.typeid and Z.billingid = c.billingid
order by billingid
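The same shape (every Type paired with every distinct BillingID, then a left join back to the data with a default of 0) can be sketched with Python's sqlite3 module. This assumes SQLite syntax, so a plain CROSS JOIN replaces OUTER APPLY, and the table names `type_list` and `billing_data` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE type_list (type TEXT);
INSERT INTO type_list VALUES ('NEW'), ('CHNG'), ('LAP');

CREATE TABLE billing_data (type TEXT, billingid TEXT, nar INTEGER);
INSERT INTO billing_data VALUES
  ('NEW', 'ABC', 5), ('CHNG', 'ABC', 15), ('LAP', 'ABC', 10), ('CHNG', 'DEF', 20);
""")

query = """
SELECT t.type, b.billingid, COALESCE(d.nar, 0) AS nar
FROM type_list AS t
CROSS JOIN (SELECT DISTINCT billingid FROM billing_data) AS b  -- all combinations
LEFT JOIN billing_data AS d
       ON d.type = t.type AND d.billingid = b.billingid         -- NAR where it exists
ORDER BY b.billingid, t.type
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

COALESCE does the same job as the CASE expression in the answer; drop it if NULL is preferred over 0 for missing combinations.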

Getting values from a table that's inside a table (unpivot / cross apply)

I'm having a serious problem with one of my import tables. I've imported an Excel file to a SQL Server table. The table ImportExcelFile now looks like this (simplified):
+----------+-------------------+-----------+------------+--------+--------+-----+---------+
| ImportId | Excelfile | SheetName | Field1 | Field2 | Field3 | ... | Field10 |
+----------+-------------------+-----------+------------+--------+--------+-----+---------+
| 1 | C:\Temp\Test.xlsx | Sheet1 | Age / Year | 2010 | 2011 | | 2018 |
| 2 | C:\Temp\Test.xlsx | Sheet1 | 0 | Value1 | Value2 | | Value9 |
| 3 | C:\Temp\Test.xlsx | Sheet1 | 1 | Value1 | Value2 | | Value9 |
| 4 | C:\Temp\Test.xlsx | Sheet1 | 2 | Value1 | Value2 | | Value9 |
| 5 | C:\Temp\Test.xlsx | Sheet1 | 3 | Value1 | Value2 | | Value9 |
| 6 | C:\Temp\Test.xlsx | Sheet1 | 4 | Value1 | Value2 | | Value9 |
| 7 | C:\Temp\Test.xlsx | Sheet1 | 5 | NULL | NULL | | NULL |
+----------+-------------------+-----------+------------+--------+--------+-----+---------+
I now want to insert the values from Field1 to Field10 into the table AgeYear (in my original table there are about 70 columns and 120 rows). The first row (Age / Year, 2010, 2011, ...) is the header row. The column Field1 is the leading column. I want to save the values in the following format:
+-----------+-----+------+--------+
| SheetName | Age | Year | Value |
+-----------+-----+------+--------+
| Sheet1 | 0 | 2010 | Value1 |
| Sheet1 | 0 | 2011 | Value2 |
| ... | ... | ... | ... |
| Sheet1 | 0 | 2018 | Value9 |
| Sheet1 | 1 | 2010 | Value1 |
| Sheet1 | 1 | 2011 | Value2 |
| ... | ... | ... | ... |
| Sheet1 | 1 | 2018 | Value9 |
| ... | ... | ... | ... |
+-----------+-----+------+--------+
I've tried the following query:
-- @columns holds the column list, e.g. 'Field1, Field2, Field3, Field4, ...'
DECLARE @sql NVARCHAR(MAX) =
';WITH cte AS
(
SELECT i.SheetName,
ROW_NUMBER() OVER(PARTITION BY i.SheetName ORDER BY i.SheetName) AS rn,
' + @columns + '
FROM dbo.ImportExcelFile i
WHERE i.Sheetname LIKE ''Sheet1''
)
SELECT SheetName,
age Age,
y.[Year]
FROM cte
CROSS APPLY
(
SELECT Field1 age
FROM dbo.ImportExcelFile
WHERE SheetName LIKE ''Sheet1''
AND ISNUMERIC(Field1) = 1
) a (age)
UNPIVOT
(
[Year] FOR [Years] IN (' + @columns + ')
) y
WHERE rn = 1'
EXEC (@sql)
So far I'm getting the desired ages and years. My problem is that I don't know how I could get the values. With UNPIVOT I don't get the NULL values. Instead it fills the whole table with the same values even if they are NULL in the source table.
Could you please help me?
Perhaps an alternative approach. This is not dynamic, but with the help of a CROSS APPLY and a JOIN...
The drawback is that you'll have to define the 70 fields.
Example
;with cte0 as (
Select A.ImportId
,A.SheetName
,Age = A.Field1
,B.*
From ImportExcelFile A
Cross Apply ( values ('Field2',Field2)
,('Field3',Field3)
,('Field10',Field10)
) B (Item,Value)
)
,cte1 as ( Select * from cte0 where ImportId=1 )
Select A.SheetName
,[Age] = try_convert(int,A.Age)
,[Year] = try_convert(int,B.Value)
,[Value] = A.Value
From cte0 A
Join cte1 B on A.Item=B.Item
Where A.ImportId>1
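As a way to experiment with this unpivot-then-join-to-header idea, here is a sketch using Python's sqlite3 module. SQLite has no CROSS APPLY, so the sketch assumes a UNION ALL per field instead; it uses only Field2 and Field3 and a few made-up rows in the shape of the question's table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ImportExcelFile (
  ImportId INTEGER, SheetName TEXT, Field1 TEXT, Field2 TEXT, Field3 TEXT);
INSERT INTO ImportExcelFile VALUES
  (1, 'Sheet1', 'Age / Year', '2010',   '2011'),
  (2, 'Sheet1', '0',          'Value1', 'Value2'),
  (3, 'Sheet1', '1',          'Value1', 'Value2'),
  (4, 'Sheet1', '2',          NULL,     NULL);
""")

query = """
WITH unpivoted AS (
  -- One row per (source row, field); UNION ALL stands in for CROSS APPLY (VALUES ...).
  SELECT ImportId, SheetName, Field1 AS Age, 'Field2' AS Item, Field2 AS Value
  FROM ImportExcelFile
  UNION ALL
  SELECT ImportId, SheetName, Field1, 'Field3', Field3
  FROM ImportExcelFile
)
SELECT d.SheetName,
       CAST(d.Age AS INTEGER)   AS Age,
       CAST(h.Value AS INTEGER) AS Year,   -- the header row supplies the year
       d.Value
FROM unpivoted AS d
JOIN unpivoted AS h
  ON h.Item = d.Item AND h.SheetName = d.SheetName AND h.ImportId = 1
WHERE d.ImportId > 1
ORDER BY d.ImportId, d.Item
"""
age_year = conn.execute(query).fetchall()
for row in age_year:
    print(row)
```

Because the unpivot is done by hand rather than with UNPIVOT, rows whose value is NULL are kept. The extra match on SheetName is an addition for the multi-sheet case and assumes each sheet's header row has ImportId 1, which may not hold for the real import.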

Selecting the longest string in each field

I am trying to clean up a data set similar in structure to the following table:
dataSource
| ID_dec | ID_base | name | field1 | field2 | field3 |
| 1.01 | 1 | AAA | Cat | Brown | Domesticated |
| 1.02 | 1 | AAA | Cat | Brown | Domesticated |
| 1.03 | 1 | AAA | Feline | NULL | Dom. |
| 1.04 | 1 | AAA | Beautiful cat | NULL | NULL |
| 1.05 | 1 | AAA | NULL | Light Brown | NULL |
| 2.01 | 2 | BBB | Dog | Black | Wild |
| 2.02 | 2 | BBB | Barker | NULL | NULL |
| 3.01 | 3 | CCC | Bird | Yellow | Domesticated |
| 4.01 | 4 | DDD | Snake | NULL | NULL |
| 4.02 | 4 | DDD | NULL | Green | NULL |
| 4.03 | 4 | DDD | NULL | Forest Green | NULL |
| 4.04 | 4 | DDD | NULL | Green | Wild |
| 4.05 | 4 | DDD | NULL | NULL | Wild |
I want to pull the longest string of each combination of field[N] and ID_base, like so:
result
| ID_base | name | field1 | field2 | field3 |
| 1 | AAA | Beautiful cat | Light Brown | Domesticated |
| 2 | BBB | Barker | Black | Wild |
| 3 | CCC | Bird | Yellow | Domesticated |
| 4 | DDD | Snake | Forest Green | Wild |
This has been asked before, but only for a single field. The following SQL gets me the desired result, but feels inefficient when scaled up to the real data set of 37 fields and 5665 rows (4029 ID_bases and the most ID_decs to a single ID_base is 10):
SELECT DISTINCT a.id_base, a.name, b.result, c.result, d.result
FROM
dataSource a
LEFT JOIN
(
SELECT y.id_base, max(y.field1) result
FROM dataSource y
LEFT JOIN
(
SELECT id_base, max(len(field1)) leng
FROM dataSource
GROUP BY id_base
) z
ON y.id_base = z.id_base
WHERE len(y.field1) = z.leng
GROUP BY y.id_base
) b
ON a.id_base = b.id_base
LEFT JOIN
(
SELECT y.id_base, max(y.field2) result
FROM dataSource y
LEFT JOIN
(
SELECT id_base, max(len(field2)) leng
FROM dataSource
GROUP BY id_base
) z
ON y.id_base = z.id_base
WHERE len(y.field2) = z.leng
GROUP BY y.id_base
) c
ON a.id_base = c.id_base
LEFT JOIN
(
SELECT y.id_base, max(y.field3) result
FROM dataSource y
LEFT JOIN
(
SELECT id_base, max(len(field3)) leng
FROM dataSource
GROUP BY id_base
) z
ON y.id_base = z.id_base
WHERE len(y.field3) = z.leng
GROUP BY y.id_base
) d
ON a.id_base = d.id_base
What is the best way to go about this query?
WITH a AS (
SELECT id_base, name, max(len(field1)) l1, max(len(field2)) l2, max(len(field3)) l3
FROM datasource
GROUP BY id_base, name
)
SELECT a.*,
(SELECT TOP 1 field1 FROM datasource WHERE id_base = a.id_base AND len(field1) = a.l1),
(SELECT TOP 1 field2 FROM datasource WHERE id_base = a.id_base AND len(field2) = a.l2),
(SELECT TOP 1 field3 FROM datasource WHERE id_base = a.id_base AND len(field3) = a.l3)
from a
Another simpler variation:
SELECT
t.id_base,
t.name,
(SELECT TOP 1 field1 FROM dataSource WHERE id_base = t.id_base ORDER BY LEN(field1) DESC),
(SELECT TOP 1 field2 FROM dataSource WHERE id_base = t.id_base ORDER BY LEN(field2) DESC),
(SELECT TOP 1 field3 FROM dataSource WHERE id_base = t.id_base ORDER BY LEN(field3) DESC)
FROM (SELECT DISTINCT id_base, name FROM dataSource) t
Another option, using full joins:
Select coalesce(t1.ID_base, t2.ID_base, t3.ID_base) base,
coalesce(t1.Name, t2.Name, t3.Name) Name,
coalesce(t1.field1, t2.field1, t3.field1) field1,
coalesce(t1.field2, t2.field2, t3.field2) field2,
coalesce(t1.field3, t2.field3, t3.field3) field3
from dataSource t1
full join dataSource t2 on t2.ID_base = t1.ID_base
and len(t1.field1) = (Select Max(len(field1)) from dataSource
where ID_base = t1.ID_base)
and len(t2.field2) = (Select Max(len(field2)) from dataSource
where ID_base = t2.ID_base)
full join dataSource t3 on t3.ID_base = t1.ID_base
and len(t3.field3) = (Select Max(len(field3)) from dataSource
where ID_base = t3.ID_base)
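The "simpler variation" (one correlated subquery per field, ordered by length descending) is easy to try with Python's sqlite3 module. The sketch below assumes SQLite syntax, so LENGTH and LIMIT 1 replace LEN and TOP 1, and it loads only a subset of the question's sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataSource (
  ID_dec TEXT, ID_base INTEGER, name TEXT, field1 TEXT, field2 TEXT, field3 TEXT);
INSERT INTO dataSource VALUES
  ('1.01', 1, 'AAA', 'Cat',           'Brown',       'Domesticated'),
  ('1.03', 1, 'AAA', 'Feline',        NULL,          'Dom.'),
  ('1.04', 1, 'AAA', 'Beautiful cat', NULL,          NULL),
  ('1.05', 1, 'AAA', NULL,            'Light Brown', NULL),
  ('2.01', 2, 'BBB', 'Dog',           'Black',       'Wild'),
  ('2.02', 2, 'BBB', 'Barker',        NULL,          NULL);
""")

query = """
SELECT t.ID_base, t.name,
  -- NULL lengths sort last under DESC in SQLite, so a non-NULL value wins if one exists.
  (SELECT field1 FROM dataSource WHERE ID_base = t.ID_base
   ORDER BY LENGTH(field1) DESC LIMIT 1) AS field1,
  (SELECT field2 FROM dataSource WHERE ID_base = t.ID_base
   ORDER BY LENGTH(field2) DESC LIMIT 1) AS field2,
  (SELECT field3 FROM dataSource WHERE ID_base = t.ID_base
   ORDER BY LENGTH(field3) DESC LIMIT 1) AS field3
FROM (SELECT DISTINCT ID_base, name FROM dataSource) AS t
ORDER BY t.ID_base
"""
longest = conn.execute(query).fetchall()
for row in longest:
    print(row)
```

When two values in a group have the same maximum length, which one is returned is not determined by this ordering; add a tiebreaker column to the ORDER BY if that matters.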

Twice Inner Join on same table with Aggregate function

I'm a novice SQL user.
I have a simple table that stores some records nightly:
Table: T1
+----+-----+----+-----------+------------+
| Id | A | AB | Value | Date |
+----+-----+----+-----------+------------+
| 1 | abc | I | -48936.08 | 2013-06-24 |
| 2 | def | A | 431266.19 | 2013-06-24 |
| 3 | xyz | I | -13523.90 | 2013-06-24 |
| 4 | abc | A | 13523.90 | 2013-06-23 |
| 5 | xyz | I | -13523.90 | 2013-06-23 |
| 6 | def | A | 13523.90 | 2013-06-22 |
| 7 | def | I | -13523.90 | 2013-06-22 |
+----+-----+----+-----------+------------+
I would like to get the values of columns A, AB, and Value for the latest Date of each value in column A, filtered on AB = I.
Basically, the result should look like:
+----+-----+----+-----------+------------+
| Id | A | AB | Value | Date |
+----+-----+----+-----------+------------+
| 1 | abc | I | -48936.08 | 2013-06-24 |
| 3 | xyz | I | -13523.90 | 2013-06-24 |
| 7 | def | I | -13523.90 | 2013-06-22 |
+----+-----+----+-----------+------------+
I have tried to use an inner join twice on the same table but failed to come up with the correct result.
Any help would be appreciated.
Thanks :)
This will work with SQL Server 2005+:
;WITH a as
(
SELECT id, A,AB, Value, Date
, row_number() over (partition by A order by Date desc) rn
FROM t1
WHERE AB = 'I'
)
SELECT id, A,AB, Value, Date
FROM a WHERE rn = 1
An equivalent query with quoted identifiers:
; WITH x AS (
SELECT id
, a
, ab
, "value"
, "date"
, Row_Number() OVER (PARTITION BY a ORDER BY "date" DESC) As row_num
FROM your_table
WHERE ab = 'I'
)
SELECT *
FROM x
WHERE row_num = 1
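Both answers rank the rows within each value of A by date and keep the first one. A quick way to check that against the sample data is Python's sqlite3 module; this sketch assumes a SQLite build with window function support (version 3.25 or later) and is not SQL Server specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (Id INTEGER, A TEXT, AB TEXT, "Value" REAL, "Date" TEXT);
INSERT INTO T1 VALUES
  (1, 'abc', 'I', -48936.08, '2013-06-24'),
  (2, 'def', 'A', 431266.19, '2013-06-24'),
  (3, 'xyz', 'I', -13523.90, '2013-06-24'),
  (4, 'abc', 'A',  13523.90, '2013-06-23'),
  (5, 'xyz', 'I', -13523.90, '2013-06-23'),
  (6, 'def', 'A',  13523.90, '2013-06-22'),
  (7, 'def', 'I', -13523.90, '2013-06-22');
""")

query = """
WITH ranked AS (
  SELECT Id, A, AB, "Value", "Date",
         ROW_NUMBER() OVER (PARTITION BY A ORDER BY "Date" DESC) AS rn
  FROM T1
  WHERE AB = 'I'          -- filter first, then rank within each A
)
SELECT Id, A, AB, "Value", "Date"
FROM ranked
WHERE rn = 1              -- latest row per A
ORDER BY Id
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

The dates are stored as ISO-formatted text here, so ordering them as strings matches chronological order. The filter on AB sits inside the CTE, so 'def' resolves to its latest 'I' row (Id 7) rather than being dropped because its latest row overall has AB = 'A'.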
