Split single row value to multiple rows dynamically in Snowflake - snowflake-cloud-data-platform

Split an unstructured address into multiple rows using Snowflake.
Consider the table:
col_A
4402, 4420, 4330, 4502 hecson Blvd SW
2643-2714 Nargay Matle Ct, 2685-2733 Osase Ci
4-60 Brook Ave, 2-55 Day Drive, 6-90 Gale Dr, 27-87 Moile Road, 580 More Road
1200 1550 1750 mart Way 12231 12301 12335 12425 12427 Buck Road
241 and 251 A Street, 260 B Street
7232, 7242, 7252, 7262, 7272, 7282 south hawk St.
100 Jamal Pl,100-148 Oaklohoma Hill,11,15 Turn Pl,160-167,170,172,230-238,242 Burrows St,2200 Bentley St
100,111 goldman Place, 228-290, 306-336 Oaklohoma Hill, 340-400 azerban Place
I need to parse col_A above into multiple rows, one per address present.
For example:
4402, 4420, 4330, 4502 hecson Blvd SW
This value holds 4 different addresses (4 house numbers) in comma-separated format followed by a shared street name, and they need to be parsed into the format below. The same applies to the other formats.
I tried LATERAL FLATTEN to convert them into multiple rows, but I got only the house numbers as the outcome.
If a value is a range such as '2643-2714', it can be kept whole with its street name; individual house numbers should be populated separately.
Expected output:
col_A | col_a_cleansed
4402, 4420, 4330, 4502 hecson Blvd SW | 4402 hecson Blvd SW
4402, 4420, 4330, 4502 hecson Blvd SW | 4420 hecson Blvd SW
4402, 4420, 4330, 4502 hecson Blvd SW | 4330 hecson Blvd SW
4402, 4420, 4330, 4502 hecson Blvd SW | 4502 hecson Blvd SW
2643-2714 Nargay Matle Ct, 2685-2733 Osase Ci | 2643-2714 Nargay Matle Ct
2643-2714 Nargay Matle Ct, 2685-2733 Osase Ci | 2685-2733 Osase Ci
4-60 Brook Ave, 2-55 Day Drive, 6-90 Gale Dr, 27-87 Moile Road, 580 More Road | 4-60 Brook Ave
4-60 Brook Ave, 2-55 Day Drive, 6-90 Gale Dr, 27-87 Moile Road, 580 More Road | 2-55 Day Drive
4-60 Brook Ave, 2-55 Day Drive, 6-90 Gale Dr, 27-87 Moile Road, 580 More Road | 6-90 Gale Dr
4-60 Brook Ave, 2-55 Day Drive, 6-90 Gale Dr, 27-87 Moile Road, 580 More Road | 27-87 Moile Road
4-60 Brook Ave, 2-55 Day Drive, 6-90 Gale Dr, 27-87 Moile Road, 580 More Road | 580 More Road
1200 1550 1750 mart Way 12231 12301 12335 12425 12427 Buck Road | 1200 mart Way
1200 1550 1750 mart Way 12231 12301 12335 12425 12427 Buck Road | 1550 mart Way
1200 1550 1750 mart Way 12231 12301 12335 12425 12427 Buck Road | 1750 mart Way
1200 1550 1750 mart Way 12231 12301 12335 12425 12427 Buck Road | 12231 Buck Road
1200 1550 1750 mart Way 12231 12301 12335 12425 12427 Buck Road | 12301 Buck Road
1200 1550 1750 mart Way 12231 12301 12335 12425 12427 Buck Road | 12335 Buck Road
1200 1550 1750 mart Way 12231 12301 12335 12425 12427 Buck Road | 12425 Buck Road
1200 1550 1750 mart Way 12231 12301 12335 12425 12427 Buck Road | 12427 Buck Road
241 and 251 A Street, 260 B Street | 241 A Street
241 and 251 A Street, 260 B Street | 251 A Street
241 and 251 A Street, 260 B Street | 260 B Street
7232, 7242, 7252, 7262, 7272, 7282 south hawk St. | 7232 south hawk St
7232, 7242, 7252, 7262, 7272, 7282 south hawk St. | 7242 south hawk St
7232, 7242, 7252, 7262, 7272, 7282 south hawk St. | 7252 south hawk St
7232, 7242, 7252, 7262, 7272, 7282 south hawk St. | 7262 south hawk St
7232, 7242, 7252, 7262, 7272, 7282 south hawk St. | 7272 south hawk St
7232, 7242, 7252, 7262, 7272, 7282 south hawk St. | 7282 south hawk St
100 Jamal Pl,100-148 Oaklohoma Hill,11,15 Turn Pl,160-167,170,172,230-238,242 Burrows St,2200 Bentley St | 100 Jamal Pl
100 Jamal Pl,100-148 Oaklohoma Hill,11,15 Turn Pl,160-167,170,172,230-238,242 Burrows St,2200 Bentley St | 100-148 Oaklohoma Hill
100 Jamal Pl,100-148 Oaklohoma Hill,11,15 Turn Pl,160-167,170,172,230-238,242 Burrows St,2200 Bentley St | 11 Turn Pl
100 Jamal Pl,100-148 Oaklohoma Hill,11,15 Turn Pl,160-167,170,172,230-238,242 Burrows St,2200 Bentley St | 15 Turn Pl
100 Jamal Pl,100-148 Oaklohoma Hill,11,15 Turn Pl,160-167,170,172,230-238,242 Burrows St,2200 Bentley St | 160-167 Burrows St
100 Jamal Pl,100-148 Oaklohoma Hill,11,15 Turn Pl,160-167,170,172,230-238,242 Burrows St,2200 Bentley St | 170 Burrows St
100 Jamal Pl,100-148 Oaklohoma Hill,11,15 Turn Pl,160-167,170,172,230-238,242 Burrows St,2200 Bentley St | 172 Burrows St
100 Jamal Pl,100-148 Oaklohoma Hill,11,15 Turn Pl,160-167,170,172,230-238,242 Burrows St,2200 Bentley St | 230-238 Burrows St
100 Jamal Pl,100-148 Oaklohoma Hill,11,15 Turn Pl,160-167,170,172,230-238,242 Burrows St,2200 Bentley St | 242 Burrows St
100,111 goldman Place, 228-290, 306-336 Oaklohoma Hill, 340-400 azerban Place | 100 goldman Place
100,111 goldman Place, 228-290, 306-336 Oaklohoma Hill, 340-400 azerban Place | 111 goldman Place
100,111 goldman Place, 228-290, 306-336 Oaklohoma Hill, 340-400 azerban Place | 228-290 Oaklohoma Hill
100,111 goldman Place, 228-290, 306-336 Oaklohoma Hill, 340-400 azerban Place | 306-336 Oaklohoma Hill
100,111 goldman Place, 228-290, 306-336 Oaklohoma Hill, 340-400 azerban Place | 340-400 azerban Place
I have tried a lateral flatten, but the outcome is not as expected:
SELECT col_A,A.value AS ADDR ,REGEXP_SUBSTR(trim(left(col_a,15),' '), '^+[0-9]+') as start_val
FROM table,
LATERAL SPLIT_TO_TABLE(col_a,',')A

This is hopefully a good starting point. I didn't finish the mart Way / Buck Road combination (the row with no commas), but it's pretty straightforward if you follow the same approach.
Extract the wordy bits from the numbers, then stick them back together: a number with no street name of its own borrows the next street name in the same row.
I'm sure there's a much smarter way to do this; hopefully one of the other answerers will take a peek.
Functions used:
REGEXP_SUBSTR()
LEAD() - note the IGNORE NULLS clause
STRTOK_SPLIT_TO_TABLE()
REPLACE()
with cte as (select '4402, 4420, 4330, 4502 hecson Blvd SW' col_A
union all select '2643-2714 Nargay Matle Ct, 2685-2733 Osase Ci' col_A
union all select '4-60 Brook Ave, 2-55 Day Drive, 6-90 Gale Dr, 27-87 Moile Road, 580 More Road' col_A
union all select '1200 1550 1750 mart Way 12231 12301 12335 12425 12427 Buck Road' col_A
union all select '241 and 251 A Street, 260 B Street' col_A
union all select '7232, 7242, 7252, 7262, 7272, 7282 south hawk St.' col_A
union all select '100 Jamal Pl,100-148 Oaklohoma Hill,11,15 Turn Pl,160-167,170,172,230-238,242 Burrows St,2200 Bentley St' col_A
union all select '100,111 goldman Place, 228-290, 306-336 Oaklohoma Hill, 340-400 azerban Place' col_A)
SELECT
    COL_A
    -- street-name part of the piece: from the first run of letters to the end
    ,TRIM(REGEXP_SUBSTR(TRIM(VALUE),'[A-Za-z]+\\s*[A-Za-z]+.*')) GRAB_ADDRESS
    -- what is left once the street name is removed: the house number or range
    ,TRIM(REPLACE(TRIM(VALUE),COALESCE(GRAB_ADDRESS,''))) GRAB_NUMBER
    -- a piece with no street name borrows the next non-null one from the same source row
    ,GRAB_NUMBER
     ||' '||
     COALESCE(REGEXP_SUBSTR(TRIM(VALUE),'[A-Za-z]+\\s*[A-Za-z]+.*')
             ,LEAD(GRAB_ADDRESS) IGNORE NULLS OVER(PARTITION BY SEQ ORDER BY INDEX ASC)) STICK_TOGETHER
FROM
    CTE,
    -- turn ' and ' into a comma first; the surrounding spaces keep names that merely contain 'and' intact
    TABLE(STRTOK_SPLIT_TO_TABLE(REPLACE(col_A,' and ',','),','))
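The "extract the wordy bits, then stick them back together" idea can also be checked outside Snowflake. The following is a minimal Python sketch, not the answer's query: the function name `split_addresses` and both regexes are assumptions made for illustration. It assumes street names contain no digits, and unlike the SQL above it also attempts the space-separated mart Way / Buck Road row, because it tokenizes on numbers and letter runs rather than on commas.

```python
import re

# A token is either a house number / range ("4402", "2643-2714")
# or a run of letters, spaces and dots (assumed to be a street name).
TOKEN = re.compile(r"\d+(?:-\d+)?|[A-Za-z][A-Za-z .]*")


def split_addresses(col_a):
    """Return one 'number street' string per house number or range."""
    # Treat ' and ' as just another separator between house numbers.
    text = re.sub(r"\s+and\s+", ", ", col_a)
    pending = []   # numbers still waiting for their street name
    result = []
    for token in TOKEN.findall(text):
        if token[0].isdigit():
            pending.append(token)
        else:
            street = token.strip(" .")
            # Every number seen since the last street shares this street.
            result.extend(f"{number} {street}" for number in pending)
            pending = []
    # Numbers with no following street name are kept as they are.
    result.extend(pending)
    return result


for address in split_addresses("241 and 251 A Street, 260 B Street"):
    print(address)
```

Street names that contain digits (for example "5th Ave") would be split incorrectly by this sketch, so real data would need a stricter tokenizer.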

Related

SQL Revision History - Delete old records using sql store_proc

I need to pick up the max date of created_date and last_updated_date and delete the older records. Please help with a stored procedure.
Table Name: Sales
Table records:
Country Region Sales Created_Date Text Last_updated_date
United Kingdom London 99 05-05-18 12:30 ABC NULL
United Kingdom London 100 05-05-18 12:30 ABC 07-05-19 12:30
Canada British 300 06-02-19 12:30 NULL NULL
Canada British 300 06-02-19 12:30 NULL 08-02-19 12:30
India Chennai 499 10-02-19 12:30 XYZ NULL
India Chennai 600 11-02-19 12:30 XYZ NULL
India Chennai 900 12-02-19 12:30 XYZ NULL
Australia Victoria 60 21-02-19 12:30 ASD 22-02-19 12:30
Australia Victoria 90 23-02-19 12:30 ASD 24-02-19 12:30
Required_output:
Country Region Sales Created_Date Text Last_updated_date
United Kingdom London 100 05-05-18 12:30 ABC 07-05-19 12:30
Canada British 300 06-02-19 12:30 NULL 08-02-19 12:30
India Chennai 900 12-02-19 12:30 XYZ NULL
Australia Victoria 90 23-02-19 12:30 ASD 24-02-19 12:30
I tried taking the max of the two date columns, but I am unable to delete the records that have NULL in a date column:
DELETE A1 from Sales A1
Join
(SELECT
Country, Region, Text, Sales,
(SELECT MAX(RevisedDate)
FROM (VALUES (created_date),(last_updated_date)) AS UpdateDate(RevisedDate))
AS RevisedDate
FROM Sales) A2
on A1.[Country] = A2.[Country]
and A1.[Region] = A2.[Region]
and A1.[Text] = A2.[Text]
and A1.[Sales] != A2.[Sales]
where A1.created_date < A2.RevisedDate
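One possible approach, offered as a sketch rather than a finished stored procedure, is to rank the rows of each Country/Region by the later of the two dates and delete everything except the top-ranked row. The example below runs the idea against SQLite through Python's `sqlite3` module (assuming a bundled SQLite of 3.25 or newer for window functions). The ISO date strings are an assumption: the sample dates are read as dd-mm-yy and rewritten so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (Country TEXT, Region TEXT, Sales INTEGER, "
    "Created_Date TEXT, Text TEXT, Last_updated_date TEXT)"
)
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("United Kingdom", "London", 99, "2018-05-05 12:30", "ABC", None),
        ("United Kingdom", "London", 100, "2018-05-05 12:30", "ABC", "2019-05-07 12:30"),
        ("Canada", "British", 300, "2019-02-06 12:30", None, None),
        ("Canada", "British", 300, "2019-02-06 12:30", None, "2019-02-08 12:30"),
        ("India", "Chennai", 499, "2019-02-10 12:30", "XYZ", None),
        ("India", "Chennai", 600, "2019-02-11 12:30", "XYZ", None),
        ("India", "Chennai", 900, "2019-02-12 12:30", "XYZ", None),
        ("Australia", "Victoria", 60, "2019-02-21 12:30", "ASD", "2019-02-22 12:30"),
        ("Australia", "Victoria", 90, "2019-02-23 12:30", "ASD", "2019-02-24 12:30"),
    ],
)

# COALESCE makes a NULL Last_updated_date fall back to Created_Date,
# so rows with NULL dates are ranked instead of being skipped.
conn.execute(
    """
    DELETE FROM Sales
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY Country, Region
                       ORDER BY MAX(Created_Date,
                                    COALESCE(Last_updated_date, Created_Date)) DESC
                   ) AS rn
            FROM Sales
        )
        WHERE rn > 1
    )
    """
)

remaining = conn.execute(
    "SELECT Country, Sales, Last_updated_date FROM Sales ORDER BY Country"
).fetchall()
for row in remaining:
    print(row)
```

In SQL Server the same ROW_NUMBER() ranking can be placed in a CTE and the rows with rn > 1 deleted through it; the two-argument MAX used here is SQLite-specific and would need a CASE expression there.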

SQL Server Query for displaying more than one record in a cell

I have 2 tables:
Employee-
EmployeeID Title EmployeeFirstName EmployeeLastName
1001 Mr Peter Parker
1002 Ms Nancy Hall
HoursWorked-
EmployeeID HoursWorked
1001 15
1001 30
1001 45
1002 15
1002 30
1002 40
I have written the following query:
Select Distinct
E.EmployeeID EmployeeID,
E.Title Title,
E.FirstName EmployeeFirstName,
E.LastName EmployeeLastName,
HW.HoursWorked HoursWorked
From Employee E
Inner Join HoursWorked HW ON E.EmployeeId = HW.EmployeeId
which gives me the following output:
EmployeeID Title EmployeeFirstName EmployeeLastName HoursWorked
1001 Mr Peter Parker 15
1001 Mr Peter Parker 30
1001 Mr Peter Parker 45
1002 Ms Nancy Hall 15
1002 Ms Nancy Hall 30
1002 Ms Nancy Hall 40
I want to display the records in the following format:
EmployeeID Title EmployeeFirstName EmployeeLastName HoursWorked
1001 Mr Peter Parker 15,30,45
1002 Ms Nancy Hall 15,30,40
Please let me know how I can do this.
Use FOR XML to combine the values:
SELECT
E.EmployeeID,
E.Title,
E.FirstName,
E.LastName,
STUFF(
(
SELECT ',' + CONVERT(VARCHAR,HW.HoursWorked)
FROM HoursWorked HW
WHERE E.EmployeeId = HW.EmployeeId
GROUP BY HW.HoursWorked
ORDER BY HW.HoursWorked
FOR XML PATH('')
), 1, 1, ''
) AS HoursWorked
FROM Employee E
GROUP BY E.EmployeeID,
E.Title,
E.FirstName,
E.LastName
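The FOR XML PATH trick is a workaround for engines without a string-aggregation function; SQL Server 2017 and later also offer STRING_AGG for this. As a rough illustration of the same "group, then concatenate" idea, here is a sketch using SQLite's GROUP_CONCAT through Python's `sqlite3` module. The column names FirstName and LastName follow the queries above; SQLite does not guarantee the order of values inside the list in this form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Employee (EmployeeID INTEGER, Title TEXT,
                           FirstName TEXT, LastName TEXT);
    CREATE TABLE HoursWorked (EmployeeID INTEGER, HoursWorked INTEGER);
    INSERT INTO Employee VALUES (1001, 'Mr', 'Peter', 'Parker'),
                                (1002, 'Ms', 'Nancy', 'Hall');
    INSERT INTO HoursWorked VALUES (1001, 15), (1001, 30), (1001, 45),
                                   (1002, 15), (1002, 30), (1002, 40);
    """
)

# One output row per employee; the hours collapse into one comma-separated cell.
rows = conn.execute(
    """
    SELECT E.EmployeeID, E.Title, E.FirstName, E.LastName,
           GROUP_CONCAT(HW.HoursWorked, ',') AS HoursWorked
    FROM Employee E
    INNER JOIN HoursWorked HW ON E.EmployeeID = HW.EmployeeID
    GROUP BY E.EmployeeID, E.Title, E.FirstName, E.LastName
    ORDER BY E.EmployeeID
    """
).fetchall()

hours_by_employee = {row[0]: row[4] for row in rows}
for row in rows:
    print(row)
```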

Data according to date in MSSQL

I have data like this:
RID Region StartDate EndDate
944 Canada 2016-01-09 00:00:00.000 2016-01-16 23:59:59.000
955 Canada 2016-01-17 00:00:00.000 2016-01-24 23:59:59.000
981 Canada 2016-02-01 00:00:00.000 2016-02-08 23:59:59.000
996 Canada 2016-02-09 00:00:00.000 2016-02-16 23:59:59.000
1006 Canada 2016-01-25 00:00:00.000 2016-01-31 23:59:59.000
1020 Canada 2016-02-17 00:00:00.000 2016-02-24 23:59:59.000
1030 Canada 2016-02-25 00:00:00.000 2016-02-29 23:59:59.000
1041 Canada 2016-03-01 00:00:00.000 2016-03-08 23:59:59.000
1046 Canada 2016-03-09 00:00:00.000 2016-03-16 23:59:59.000
1062 Canada 2016-03-17 00:00:00.000 2016-03-24 23:59:59.000
1073 Canada 2016-03-24 00:00:00.000 2016-03-31 23:59:59.000
1083 Canada 2016-04-01 00:00:00.000 2016-04-08 23:59:59.000
1105 Canada 2016-04-09 00:00:00.000 2016-04-16 23:59:59.000
1118 Canada 2016-04-17 00:00:00.000 2016-04-24 23:59:59.000
1128 Canada 2016-04-25 00:00:00.000 2016-04-30 23:59:59.000
1164 Canada 2016-05-01 00:00:00.000 2016-05-08 23:59:59.000
Now I try to select data like this:
select * from tab1 where Region='Canada'
and StartDate ='2016-01-09 00:00:00.000'
and EndDate ='2016-01-24 23:59:59.000'
The desired result is:
RID Region StartDate EndDate
944 Canada 2016-01-09 00:00:00.000 2016-01-16 23:59:59.000
955 Canada 2016-01-17 00:00:00.000 2016-01-24 23:59:59.000
But when I execute this query, the result is empty.
Any solution?
I think you were intending to restrict to a date range, but you actually restricted to two points in time instead. Try this query:
SELECT *
FROM tab1
WHERE Region = 'Canada' AND
StartDate >= '2016-01-09 00:00:00.000' AND
EndDate <= '2016-01-24 23:59:59.000'
Try this.
SELECT *
FROM tab1
WHERE Region = 'Canada'
AND StartDate >='2016-01-09 00:00:00.000'
AND EndDate <='2016-01-24 23:59:59.000'
The range ('between') comparison should work; I tried it. If it is not working, try the CONVERT function on the datetime values:
SELECT *
FROM tab1
WHERE Region = 'Canada' AND
StartDate >= convert(datetime,'2016-01-09 00:00:00.000') AND
EndDate <= convert(datetime,'2016-01-24 23:59:59.000')
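The difference between the two filters can be reproduced on a few of the sample rows. This is only a sketch using SQLite through Python's `sqlite3` module, with the datetimes stored as ISO text (an assumption made so that string comparison matches chronological order): the equality filter asks for a single row that both starts on 2016-01-09 and ends on 2016-01-24, which does not exist, while the range filter returns every period that lies inside that window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab1 (RID INTEGER, Region TEXT, StartDate TEXT, EndDate TEXT)"
)
conn.executemany(
    "INSERT INTO tab1 VALUES (?, ?, ?, ?)",
    [
        (944, "Canada", "2016-01-09 00:00:00.000", "2016-01-16 23:59:59.000"),
        (955, "Canada", "2016-01-17 00:00:00.000", "2016-01-24 23:59:59.000"),
        (1006, "Canada", "2016-01-25 00:00:00.000", "2016-01-31 23:59:59.000"),
        (981, "Canada", "2016-02-01 00:00:00.000", "2016-02-08 23:59:59.000"),
    ],
)
params = ("2016-01-09 00:00:00.000", "2016-01-24 23:59:59.000")

# Original filter: no single row has this exact start AND this exact end.
equality_rids = [
    rid for (rid,) in conn.execute(
        "SELECT RID FROM tab1 WHERE Region = 'Canada' "
        "AND StartDate = ? AND EndDate = ? ORDER BY RID",
        params,
    )
]

# Range filter: every period that starts and ends inside the window.
range_rids = [
    rid for (rid,) in conn.execute(
        "SELECT RID FROM tab1 WHERE Region = 'Canada' "
        "AND StartDate >= ? AND EndDate <= ? ORDER BY RID",
        params,
    )
]

print(equality_rids, range_rids)
```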

How to get the employees with their managers in a Report Format

Can someone help me to resolve the following issues please?
Issues:
- I need a SQL Server query to generate a report listing the employees in order of their job profile hierarchy.
- The CTE query generates wrong output in the first record: it shows a Manager Name and Job Profile where MgrID is NULL.
The sample data is:
Query:
Select * From MyEmp;
Result:
EmpNo EmpName JobProfile DeptNo MgrID LevelID
7839 KING PRESIDENT 10 NULL 01
7698 BLAKE MANAGER 30 7839 02
7782 CLARK MANAGER 10 7839 02
7566 JONES MANAGER 20 7839 02
7654 MARTIN SALESMAN 30 7698 03
7499 ALLEN SALESMAN 30 7698 03
7844 TURNER SALESMAN 30 7698 03
7900 JAMES CLERK 30 7698 03
7521 WARD SALESMAN 30 7698 03
7902 FORD ANALYST 20 7566 03
7369 SMITH CLERK 20 7902 04
7788 SCOTT ANALYST 20 7566 03
7876 ADAMS CLERK 20 7788 04
7934 MILLER CLERK 10 7782 03
The CTE query is:
WITH Subordinates AS
(
(SELECT e.EmpNo, e.EmpName, e.JobProfile, e.LevelID, e.MgrID,
m.EmpName MgrName, m.JobProfile MgrProfile
FROM MyEmp AS e
INNER JOIN
MyEmp AS m ON
e.MgrID is NULL
AND m.MgrID is NULL)
UNION ALL
(SELECT e.EmpNo, e.EmpName, e.JobProfile, e.LevelID, e.MgrID,
sub.EmpName MgrName, sub.JobProfile MgrProfile
FROM MyEmp AS e
INNER JOIN
Subordinates AS sub ON
e.MgrID = sub.EmpNo
)
)
SELECT * FROM Subordinates AS s;
Result:
EmpNo EmpName JobProfile LevelID MgrID MgrName MgrProfile
7839 KING PRESIDENT 01 NULL KING PRESIDENT
7698 BLAKE MANAGER 02 7839 KING PRESIDENT
7782 CLARK MANAGER 02 7839 KING PRESIDENT
7566 JONES MANAGER 02 7839 KING PRESIDENT
7902 FORD ANALYST 03 7566 JONES MANAGER
7788 SCOTT ANALYST 03 7566 JONES MANAGER
7876 ADAMS CLERK 04 7788 SCOTT ANALYST
7369 SMITH CLERK 04 7902 FORD ANALYST
7934 MILLER CLERK 03 7782 CLARK MANAGER
7654 MARTIN SALESMAN 03 7698 BLAKE MANAGER
7499 ALLEN SALESMAN 03 7698 BLAKE MANAGER
7844 TURNER SALESMAN 03 7698 BLAKE MANAGER
7900 JAMES CLERK 03 7698 BLAKE MANAGER
7521 WARD SALESMAN 03 7698 BLAKE MANAGER
The Oracle query with START WITH ... CONNECT BY PRIOR is capable of resolving this problem, but I need a query that works on SQL Server.
Oracle Query
Select e.MgrID,
--m.EmpName MgrName, m.JobProfile MgrProfile,
e.EmpNo, e.EmpName, e.JobProfile
FROM MyEmp e
-- LEFT OUTER JOIN
-- MyEmp AS m ON
-- e.MgrID is NULL
--AND m.MgrID is NULL
START WITH e.MgrID is NULL
CONNECT BY PRIOR EmpNo = MgrID
;
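The wrong first record comes from the anchor member, which self-joins the root row and so reports KING as his own manager. A possible fix, sketched here and not taken from an answer, is to select the root directly with NULL manager columns and to carry a path column for hierarchical ordering. The example runs on SQLite through Python's `sqlite3` module; the `Path` column is an assumption added for sorting. On SQL Server the RECURSIVE keyword is omitted, and the NULL placeholders and path would likely need a CAST so that anchor and recursive column types match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyEmp (EmpNo INTEGER, EmpName TEXT, JobProfile TEXT, "
    "DeptNo INTEGER, MgrID INTEGER, LevelID TEXT)"
)
conn.executemany(
    "INSERT INTO MyEmp VALUES (?, ?, ?, ?, ?, ?)",
    [
        (7839, "KING", "PRESIDENT", 10, None, "01"),
        (7698, "BLAKE", "MANAGER", 30, 7839, "02"),
        (7782, "CLARK", "MANAGER", 10, 7839, "02"),
        (7566, "JONES", "MANAGER", 20, 7839, "02"),
        (7654, "MARTIN", "SALESMAN", 30, 7698, "03"),
        (7499, "ALLEN", "SALESMAN", 30, 7698, "03"),
        (7844, "TURNER", "SALESMAN", 30, 7698, "03"),
        (7900, "JAMES", "CLERK", 30, 7698, "03"),
        (7521, "WARD", "SALESMAN", 30, 7698, "03"),
        (7902, "FORD", "ANALYST", 20, 7566, "03"),
        (7369, "SMITH", "CLERK", 20, 7902, "04"),
        (7788, "SCOTT", "ANALYST", 20, 7566, "03"),
        (7876, "ADAMS", "CLERK", 20, 7788, "04"),
        (7934, "MILLER", "CLERK", 10, 7782, "03"),
    ],
)

report = conn.execute(
    """
    WITH RECURSIVE Subordinates AS (
        -- Anchor: the root has no manager, so the manager columns stay NULL.
        SELECT e.EmpNo, e.EmpName, e.JobProfile, e.LevelID, e.MgrID,
               NULL AS MgrName, NULL AS MgrProfile,
               e.EmpName AS Path
        FROM MyEmp AS e
        WHERE e.MgrID IS NULL
        UNION ALL
        -- Recursive step: attach each employee to the manager found so far.
        SELECT e.EmpNo, e.EmpName, e.JobProfile, e.LevelID, e.MgrID,
               sub.EmpName, sub.JobProfile,
               sub.Path || '/' || e.EmpName
        FROM MyEmp AS e
        INNER JOIN Subordinates AS sub ON e.MgrID = sub.EmpNo
    )
    SELECT EmpNo, EmpName, JobProfile, MgrName, MgrProfile
    FROM Subordinates
    ORDER BY Path
    """
).fetchall()

for row in report:
    print(row)
```

Ordering by the path lists each manager immediately before that manager's reports, which is close to what CONNECT BY produces in Oracle.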

need query without cross join

I have a table which has sales at day level:
sales_day
loc_id day_id sales
124 2013-01-01 100
124 2013-01-02 120
124 2013-01-03 140
124 2013-01-04 160
124 2013-01-05 180
124 2013-01-06 200
124 2013-01-07 220
There is a weekly table which is the aggregate of all the days:
loc_id week_id sales
123 201401 1120
Now I need all of the above in one table, as below:
loc_id day_id sales week_sales
124 2013-01-01 100 1120
124 2013-01-02 120 1120
124 2013-01-03 140 1120
124 2013-01-04 160 1120
124 2013-01-05 180 1120
124 2013-01-06 200 1120
124 2013-01-07 220 1120
There are many locations and many weeks and days.
How can I get exactly this data without a cross join?
Have you tried this:
select loc_id, day_id, sales, week_sales
from table
cross join (
select sum(sales) as week_sales from table
) t
Window analytical function should help you here...
select loc_id,
day_id,
sales,
sum(sales) over(partition by loc_id,date_part('week', day_id)) as week_total_sales
from <table name>
It will sum the sales by location id and the week of the year to give you the total you are looking for.
In your example, 2013-01-07 was included with the other dates, but it isn't actually part of the same calendar week.
It wasn't clear which DBMS you were referring to. The above is for Netezza. For SQL Server etc try changing date_part('week',day_id) to datepart(ww,day_id).
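The window-function approach and the calendar-week caveat can both be seen on the sample rows. This is a sketch on SQLite through Python's `sqlite3` module (3.25 or newer assumed for window functions), using `strftime('%Y-%W', ...)` as a stand-in for the week function of whichever DBMS is in use; `%W` counts weeks starting on Monday, and other engines may define week boundaries differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_day (loc_id INTEGER, day_id TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO sales_day VALUES (?, ?, ?)",
    [
        (124, "2013-01-01", 100),
        (124, "2013-01-02", 120),
        (124, "2013-01-03", 140),
        (124, "2013-01-04", 160),
        (124, "2013-01-05", 180),
        (124, "2013-01-06", 200),
        (124, "2013-01-07", 220),
    ],
)

# Each daily row keeps its own sales and gains the total of its location/week.
rows = conn.execute(
    """
    SELECT loc_id, day_id, sales,
           SUM(sales) OVER (
               PARTITION BY loc_id, strftime('%Y-%W', day_id)
           ) AS week_sales
    FROM sales_day
    ORDER BY day_id
    """
).fetchall()

week_totals = [row[3] for row in rows]
for row in rows:
    print(row)
```

With Monday-based weeks, 2013-01-07 starts a new week, so the first six days share one total and the seventh gets its own; the 1120 in the question corresponds to all seven days summed together.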
