Need guidance in using REGEXP_REPLACE - snowflake-cloud-data-platform


When I run the SQL below, I get the following results:
SELECT distinct id,date,col_t FROM table1
WHERE col_t LIKE '%true%'
OR col_t LIKE '%https%'
OR col_t LIKE '%.html%'
OR col_t LIKE '%_____%'
OR col_t LIKE '%null%'
ID DATE COL_T
1 2022-02-02 true
2 2022-02-02 true
3 2022-02-02 PROMOTIONhttps://www.redbus.com/home
4 2022-02-02 google_goog_ob_27pc4https://www.google.com
5 2022-02-02 goog_gl_a1_store_id.html
6 2022-02-02 abc_xyz_def-example_____
7 2022-02-02 elp_car_parking_____
8 2022-02-02 elp1_car2_sum-grill_____
9 2022-02-02 two_abcd_slp_1_null_null_null_null
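I want to replace the value true with NULL, and strip the unwanted parts of the other values (the https URL, the .html extension, the trailing underscores and the _null parts) by replacing them with an empty string ''.
Can I use REGEXP_REPLACE to get the desired output?
I tried using
REGEXP_REPLACE(COL_T,'\.html|http.*$|_null.*$|____.$|true','')
AS COL_T, but I'm not getting accurate results. The results should look like this: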
ID DATE COL_T
1 2022-02-02 NULL
2 2022-02-02 NULL
3 2022-02-02 PROMOTION
4 2022-02-02 google_goog_ob_27pc4
5 2022-02-02 goog_gl_a1_store_id
6 2022-02-02 abc_xyz_def-example
7 2022-02-02 elp_car_parking
8 2022-02-02 elp1_car2_sum-grill
9 2022-02-02 two_abcd_slp_1

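Try using REPLACE instead:
https://docs.snowflake.com/en/sql-reference/functions/replace.html
REPLACE can only remove fixed text, so the URL and _null parts, whose text varies, are cut off with SPLIT_PART, which returns the text before the first occurrence of the delimiter when you ask for part 1:
SELECT DISTINCT id,
       date,
       CASE
           WHEN col_t = 'true' THEN NULL                                  -- 'true' becomes NULL
           WHEN col_t LIKE '%https%' THEN SPLIT_PART(col_t, 'https', 1)   -- keep the text before the URL
           WHEN col_t LIKE '%null%' THEN SPLIT_PART(col_t, '_null', 1)    -- keep the text before the first _null
           ELSE REPLACE(REPLACE(col_t, '.html', ''), '_____', '')         -- drop .html and the trailing underscores
       END AS col_t
FROM table1
WHERE col_t LIKE '%true%'
   OR col_t LIKE '%https%'
   OR col_t LIKE '%.html%'
   OR col_t LIKE '%^_^_^_^_^_%' ESCAPE '^'
   OR col_t LIKE '%null%'
Note that _ is a single-character wildcard in LIKE, so the pattern '%_____%' matches any value of five or more characters; escaping it (here with ESCAPE '^') matches five literal underscores.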
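To answer the original question directly: a single regular expression anchored at the end of the value can strip all four kinds of suffix at once, while 'true' has to be handled separately, because a regex replacement always returns a string and can never produce NULL (which is why 'true' became '' in the attempt above). The sketch below uses Python's re module only to check the pattern against the sample rows; the helper name clean_col_t is hypothetical. In Snowflake the same idea would presumably be IFF(col_t = 'true', NULL, REGEXP_REPLACE(col_t, '(https.*|\\.html|_null.*|_+)$', '')), with the backslash doubled because Snowflake treats it as an escape character inside single-quoted strings; that expression has not been tested here.

```python
import re

# Assumption (from the sample rows): everything to strip sits at the end of the value.
# Alternatives: a URL starting with "https", a ".html" extension,
# "_null" followed by anything, or a run of trailing underscores.
SUFFIX = re.compile(r"(https.*|\.html|_null.*|_+)$")


def clean_col_t(col_t):
    """Return None for 'true', otherwise col_t with the unwanted suffix removed."""
    if col_t == "true":
        return None
    return SUFFIX.sub("", col_t)


samples = [
    "true",
    "PROMOTIONhttps://www.redbus.com/home",
    "google_goog_ob_27pc4https://www.google.com",
    "goog_gl_a1_store_id.html",
    "abc_xyz_def-example_____",
    "elp_car_parking_____",
    "elp1_car2_sum-grill_____",
    "two_abcd_slp_1_null_null_null_null",
]
for value in samples:
    print(repr(value), "->", repr(clean_col_t(value)))
```

Anchoring with $ is what keeps underscores inside the values (abc_xyz_...) untouched: a bare _+ only matches when the run of underscores reaches the end of the string.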

Related

Use MAX() of a date in the Google Sheets query function, to get the value of another column

I have a table like this (columns A to D; descripcion = description, precio = price, fecha = date):
id descripcion precio fecha
1 gomitas 5 1/2/2020
1 gomitas 2 2/3/2020
2 DRF 56 2/3/2020
3 BULLDOG 8 2/3/2020
1 gomitas 10 1/3/2020
3 BULLDOG 9 1/4/2020
2 DRF 7 1/4/2020
And this is the desired result:
id precio fecha
1 2 2/3/2020
2 7 1/4/2020
3 9 1/4/2020
That is, for each product (grouped by its id), I want the price from the row with the latest date; in short, the most recent price of each product.
I tried this:
=QUERY(A:D,"SELECT A,C, MAX(D) GROUP BY (A)")
But it requires C (the price column) to be in the GROUP BY as well, which doesn't give me what I want.
I found this too:
=query(A:D,"SELECT A,C,D ORDER BY D DESC LIMIT 3")
But it doesn't work for me, because the number of products varies, so a fixed LIMIT doesn't fit.

Try:
=SORTN(SORT({A:A, C:D}, 3, 0), 9^9, 2, 1, 1)
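A plain QUERY can only do the following: the first formula returns each id with its latest date but no price, and the second returns one row per id/price pair instead of one row per id: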
=QUERY(A:D, "select A,max(D) where A is not null group by A label max(D)''")
or
=QUERY(A:D, "select A,C,max(D) where A is not null group by A,C label max(D)''")
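The SORTN(SORT(...)) formula works in two steps: sort the rows newest-first, then keep only the first row for each id. The Python sketch below spells out the same two steps on the sample data (the function name is hypothetical). It assumes the dates are D/M/YYYY, since that is the only reading consistent with the desired output (1/4/2020 has to be later than 2/3/2020).

```python
from datetime import datetime

# Sample rows from the question: (id, descripcion, precio, fecha).
# Assumption: fecha is D/M/YYYY.
ROWS = [
    (1, "gomitas", 5, "1/2/2020"),
    (1, "gomitas", 2, "2/3/2020"),
    (2, "DRF", 56, "2/3/2020"),
    (3, "BULLDOG", 8, "2/3/2020"),
    (1, "gomitas", 10, "1/3/2020"),
    (3, "BULLDOG", 9, "1/4/2020"),
    (2, "DRF", 7, "1/4/2020"),
]


def latest_price_per_id(rows):
    """Return (id, precio, fecha) for the most recent fecha of each id."""
    parsed = [(r[0], r[2], datetime.strptime(r[3], "%d/%m/%Y")) for r in rows]
    # Like SORT(..., 3, 0): newest date first.
    parsed.sort(key=lambda r: r[2], reverse=True)
    latest = {}
    for id_, price, when in parsed:
        # Like SORTN removing duplicates on column 1: keep only the first row per id.
        latest.setdefault(id_, (price, when))
    return [
        (id_, price, f"{when.day}/{when.month}/{when.year}")
        for id_, (price, when) in sorted(latest.items())
    ]


print(latest_price_per_id(ROWS))
# [(1, 2, '2/3/2020'), (2, 7, '1/4/2020'), (3, 9, '1/4/2020')]
```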

Count all dates within a 6 month period rather than overall

I'm trying to count the cars each salesperson has sold in the last six months, but my query counts the sales from all dates.
SELECT SalespersonNo, COUNT (SalespersonNo) AS ['CarsSold']
FROM CarForSale
WHERE DateSold > '01/08/2018'
GROUP BY SalespersonNo;
As I said above, it adds up the sales from all dates instead of only the cars sold in the past 6 months.
These are the results I am getting:
SalespersonNo 'CarsSold'
100001 4
100002 1
100003 1
100004 4
100005 2
100010 1
100011 2
100012 2
100015 1
100017 2
100020 2
I am aiming to get results like this:
SalespersonNo 'CarsSold'
100001 3
100003 1
100004 3
100005 1
100011 2
100015 1
100017 2
100020 1

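You probably want to use conditional aggregation. COUNT only counts non-NULL values, so the CASE expression counts just the sales from the last six months, while the overall total is still available in the same row:
SELECT SalespersonNo,
       COUNT(SalespersonNo) AS [CarsSoldTotal],
       COUNT(CASE WHEN DateSold > DATEADD(mm, -6, GETDATE()) THEN 1 END) AS [CarsSold6Month]
FROM CarForSale
WHERE DateSold > '01/08/2018'
GROUP BY SalespersonNo;
Also check how the literal '01/08/2018' is interpreted: under the default us_english setting (DATEFORMAT mdy), SQL Server reads it as 8 January 2018. If you meant 1 August 2018, the unambiguous 'YYYYMMDD' form, '20180801', avoids depending on the session's date format.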
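A runnable sketch of the conditional aggregation, using Python's built-in sqlite3 module instead of SQL Server: SQLite's date(?, '-6 months') stands in for DATEADD(mm, -6, GETDATE()), a fixed reference date replaces GETDATE() so the output is reproducible, and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: (SalespersonNo, DateSold) as ISO dates,
# which SQLite compares correctly as text.
SALES = [
    (100001, "2018-03-10"),
    (100001, "2018-09-01"),
    (100001, "2018-10-15"),
    (100002, "2018-02-20"),
]


def cars_sold(rows, as_of):
    """Total sales and sales in the six months before as_of, per salesperson."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE CarForSale (SalespersonNo INTEGER, DateSold TEXT)")
    con.executemany("INSERT INTO CarForSale VALUES (?, ?)", rows)
    query = """
        SELECT SalespersonNo,
               COUNT(SalespersonNo) AS CarsSoldTotal,
               -- CASE yields NULL outside the window, and COUNT skips NULLs.
               COUNT(CASE WHEN DateSold > date(?, '-6 months') THEN 1 END) AS CarsSold6Month
        FROM CarForSale
        GROUP BY SalespersonNo
        ORDER BY SalespersonNo
    """
    result = con.execute(query, (as_of,)).fetchall()
    con.close()
    return result


print(cars_sold(SALES, "2019-02-01"))
# [(100001, 3, 2), (100002, 1, 0)]
```

A salesperson with no recent sales still appears with a 0 in the six-month column, which a WHERE-based filter would drop entirely.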

Using FULL JOIN in SQL Server

I have two tables that I want to join together. Each table has one date that the other one doesn't have.
For example, table A has 8 rows (dates) and table B also has 8 rows, but each table has one date that is missing from the other. I want the result to show every date from both tables (9 rows for the sample data below).
table A
RN USERID ClockIn CHECKTIME badgenumber
1 6 8:24AM 2017-03-02 107
1 6 7:57AM 2017-03-03 107
1 6 8:23AM 2017-03-06 107
1 6 8:26AM 2017-03-07 107
1 6 8:57AM 2017-03-08 107
1 6 8:33AM 2017-03-09 107
1 6 8:36AM 2017-03-10 107
1 6 8:15AM 2017-03-13 107
table B
RN USERID ClockOut CHECKTIME badgenumber
1 6 9:31PM 2017-03-01 107
1 6 10:28PM 2017-03-02 107
1 6 8:22PM 2017-03-03 107
1 6 9:18PM 2017-03-06 107
1 6 9:48PM 2017-03-07 107
1 6 9:11PM 2017-03-08 107
1 6 11:31PM 2017-03-09 107
1 6 6:30PM 2017-03-10 107
My query and its result:
SELECT #clockin.ClockIn, #clockOut.ClockOut,#clockin.USERID,#clockin.CHECKTIME
FROM #clockin
FULL JOIN #clockOut
ON #clockin.CHECKTIME=#clockOut.CHECKTIME
where #clockin.userid = 6 and #clockOut.userid = 6
ORDER BY #clockin.userid;
Result screenshot: https://i.stack.imgur.com/IcdSS.png

Because of your WHERE clause, you are filtering out rows where #clockin.userid or #clockOut.userid is NULL (the rows that have no match in the other table). This effectively turns your full join into an inner join. You can use COALESCE() to return the first non-null value of the two columns and compare that to 6, like so:
SELECT #clockin.ClockIn, #clockOut.ClockOut,#clockin.USERID,#clockin.CHECKTIME
FROM #clockin
FULL JOIN #clockOut
ON #clockin.CHECKTIME=#clockOut.CHECKTIME
where coalesce(#clockin.userid,#clockOut.userid)=6
ORDER BY #clockin.userid;

SELECT ClockIn, ClockOut,
ISNULL(ci.USERID, co.USERID) AS USERID,
CONVERT(VARCHAR(10), ISNULL(ci.CHECKTIME, co.CHECKTIME), 101) AS CHECKTIME
FROM #ClockIn AS ci
FULL JOIN #ClockOut AS co ON (co.CHECKTIME = ci.CHECKTIME);
This should give your desired output for the sample data. However, depending on what you want, you may need to add RN, USERID, etc. to the join condition.
ISNULL() replaces a NULL USERID or CHECKTIME from #ClockIn with the value from #ClockOut, so rows that exist only in #ClockOut still show a user and a date.
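The effect of the WHERE clause can be reproduced on the sample data with a small sketch in plain Python rather than SQL Server (function names are hypothetical): a full join on CHECKTIME yields 9 dates, the original filter on both userid columns keeps only the 7 dates present in both tables, and filtering on the COALESCE of the two keeps all 9. Python's None == 6 is False, which mirrors NULL = 6 not being true in SQL.

```python
# Sample data from the question: CHECKTIME -> (USERID, time).
CLOCK_IN = {
    "2017-03-02": (6, "8:24AM"), "2017-03-03": (6, "7:57AM"),
    "2017-03-06": (6, "8:23AM"), "2017-03-07": (6, "8:26AM"),
    "2017-03-08": (6, "8:57AM"), "2017-03-09": (6, "8:33AM"),
    "2017-03-10": (6, "8:36AM"), "2017-03-13": (6, "8:15AM"),
}
CLOCK_OUT = {
    "2017-03-01": (6, "9:31PM"), "2017-03-02": (6, "10:28PM"),
    "2017-03-03": (6, "8:22PM"), "2017-03-06": (6, "9:18PM"),
    "2017-03-07": (6, "9:48PM"), "2017-03-08": (6, "9:11PM"),
    "2017-03-09": (6, "11:31PM"), "2017-03-10": (6, "6:30PM"),
}


def full_join(clock_in, clock_out):
    """FULL JOIN on CHECKTIME: every date from either side, None for the missing side."""
    rows = []
    for day in sorted(clock_in.keys() | clock_out.keys()):
        ci = clock_in.get(day)   # (USERID, ClockIn) or None
        co = clock_out.get(day)  # (USERID, ClockOut) or None
        rows.append({
            "CHECKTIME": day,
            "ClockIn": ci[1] if ci else None,
            "ClockOut": co[1] if co else None,
            "ci_userid": ci[0] if ci else None,
            "co_userid": co[0] if co else None,
        })
    return rows


def where_both(rows, user_id):
    """WHERE ci.userid = 6 AND co.userid = 6: a None (NULL) side never matches."""
    return [r for r in rows if r["ci_userid"] == user_id and r["co_userid"] == user_id]


def where_coalesce(rows, user_id):
    """WHERE COALESCE(ci.userid, co.userid) = 6: keeps rows matched on either side."""
    return [
        r for r in rows
        if (r["ci_userid"] if r["ci_userid"] is not None else r["co_userid"]) == user_id
    ]


joined = full_join(CLOCK_IN, CLOCK_OUT)
print(len(joined), len(where_both(joined, 6)), len(where_coalesce(joined, 6)))
# 9 7 9
```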

Select rows ignoring narrow times

I have a table [EventLog] that contains the reads recorded by a card reader controlling a gate. The same card code [EPC] can be read many times while the card holder stays near the reader.
I want to show the reads for each code on each reader, but ignore repeat reads within a given interval (for example, 2 minutes).
Example: EventLog
ID EPC ReaderID LogTime
1 1234 1 2016-04-15 12:33:55
2 1234 1 2016-04-15 12:34:05
3 1234 1 2016-04-15 12:34:10
4 4321 2 2016-04-15 12:34:12
5 4321 2 2016-04-15 12:34:14
Desired result:
ID EPC ReaderID LogTime
1 1234 1 2016-04-15 12:33:55
4 4321 2 2016-04-15 12:34:12
What I am using now is the window function LAG to compute the difference in minutes between each read and the previous one:
SELECT EPC, ReaderName, PersonName, LogTime
FROM (
SELECT l.EPC, ReaderName, PersonName, LogTime,
DATEDIFF(MINUTE, LAG(LogTime) OVER (PARTITION BY l.EPC, ReaderID ORDER BY LogTime), LogTime) diff_prev
FROM EventLog l
LEFT OUTER JOIN Person p ON p.EPC = l.EPC
INNER JOIN Reader r ON r.ID = l.ReaderID
) tbl
WHERE diff_prev IS NULL OR diff_prev >= @ignoreMinutes
ORDER BY LogTime
Here @ignoreMinutes is a parameter that specifies for how many minutes repeat reads of the same card are ignored.
But this is not correct when a card is read continuously (for example, once per second for 3 hours), because each read is compared only with the read right before it. For example:
ID EPC ReaderID LogTime diff_prev
1 1234 1 2016-04-15 12:33:55 NULL
2 1234 1 2016-04-15 12:34:05 0
3 1234 1 2016-04-15 12:34:10 0
4 1234 1 2016-04-15 12:34:32 0
5 1234 1 2016-04-15 12:34:54 0
6 1234 1 2016-04-15 12:35:14 0
7 1234 1 2016-04-15 12:35:34 0
8 1234 1 2016-04-15 12:35:54 0
9 1234 1 2016-04-15 12:36:04 0
10 1234 1 2016-04-15 12:36:15 0
11 4321 2 2016-04-15 12:44:12 NULL
12 4321 2 2016-04-15 12:44:14 0
As you can see, when executed with @ignoreMinutes = 1, my query returns only two rows (ID = 1 and 11), since all the other rows have diff_prev = 0. But the correct result set is ID = 1, 6, 10, 11: each read that is shown should start a new 1-minute window, and the first read after that window ends should be shown next.
Can you help? Thanks!

Here's a 'candidate' solution I came up with. At least it works correctly on your last example, returning records 1, 6, 10, 11.
DECLARE @intervalSeconds INT
SET @intervalSeconds = 60;
WITH EL AS
(
-- Select the first record for each EPC; this is the anchor (baseline) for the recursion
SELECT
ID,
EPC,
LogTime
FROM EventLog
WHERE LogTime = (SELECT MIN(LogTime) FROM EventLog IEL WHERE IEL.EPC = EventLog.EPC)
-- Add following events
UNION ALL
SELECT
ID,
EPC,
LogTime
FROM
(
SELECT
NextEvent.ID,
NextEvent.EPC,
NextEvent.LogTime,
ROW_NUMBER() OVER(PARTITION BY NextEvent.EPC ORDER BY NextEvent.LogTime) eventNumber
FROM EventLog NextEvent
JOIN
(
SELECT
ID,
ROW_NUMBER() OVER(PARTITION BY EPC ORDER BY LogTime DESC) eventNumber, -- Reverse numbering so the last row gets eventNumber = 1
EPC,
LogTime
FROM EL -- Recursion
) PreviousEvent -- All events selected so far, which are the ones we're interested in
ON PreviousEvent.EPC = NextEvent.EPC
AND PreviousEvent.eventNumber = 1 -- We need only the last one for each EPC
WHERE DATEDIFF(SECOND, PreviousEvent.LogTime, NextEvent.LogTime) > @intervalSeconds
) NextCandidateEvents -- Here we have all events with desired interval offset for each EPC
WHERE NextCandidateEvents.eventNumber = 1 -- We need only the first one for each EPC
)
SELECT * FROM EL
ORDER BY EPC, LogTime
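The rule behind the expected result (1, 6, 10, 11) is that each read must be compared with the last read that was kept, not with the previous read; that is what the recursive CTE does. The same rule is easy to see as a plain loop. The Python sketch below (names are hypothetical) keeps a read when more than the interval has passed since the last kept read, using the same strict > comparison as DATEDIFF(SECOND, ...) > @intervalSeconds, and keys the state by EPC and ReaderID as the question asks (the CTE above partitions by EPC only). Note also that SQL Server's DATEDIFF(MINUTE, ...) counts minute boundaries crossed rather than elapsed minutes (12:34:54 to 12:35:14 returns 1), which is another reason to compare seconds.

```python
from datetime import datetime, timedelta


def ts(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# Sample reads from the question's second example: (ID, EPC, ReaderID, LogTime).
READS = [
    (1, 1234, 1, ts("2016-04-15 12:33:55")),
    (2, 1234, 1, ts("2016-04-15 12:34:05")),
    (3, 1234, 1, ts("2016-04-15 12:34:10")),
    (4, 1234, 1, ts("2016-04-15 12:34:32")),
    (5, 1234, 1, ts("2016-04-15 12:34:54")),
    (6, 1234, 1, ts("2016-04-15 12:35:14")),
    (7, 1234, 1, ts("2016-04-15 12:35:34")),
    (8, 1234, 1, ts("2016-04-15 12:35:54")),
    (9, 1234, 1, ts("2016-04-15 12:36:04")),
    (10, 1234, 1, ts("2016-04-15 12:36:15")),
    (11, 4321, 2, ts("2016-04-15 12:44:12")),
    (12, 4321, 2, ts("2016-04-15 12:44:14")),
]


def kept_reads(reads, interval=timedelta(minutes=1)):
    """Keep a read only if more than `interval` has passed since the last kept
    read for the same (EPC, ReaderID); return the kept IDs in LogTime order."""
    last_kept = {}  # (EPC, ReaderID) -> LogTime of the last kept read
    kept = []
    for read_id, epc, reader_id, log_time in sorted(reads, key=lambda r: (r[3], r[0])):
        key = (epc, reader_id)
        previous = last_kept.get(key)
        if previous is None or log_time - previous > interval:
            last_kept[key] = log_time
            kept.append(read_id)
    return kept


print(kept_reads(READS))
# [1, 6, 10, 11]
```

A single ordered pass like this is linear in the number of reads, whereas the recursive CTE needs one recursion level per kept read.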

Join/UNION ALL to show result in different columns

I am fetching counts from 3 different tables, each with its own conditions, and I need to group them by time interval (for example, 1 hour or 30 minutes).
I need the following output:
Date Interval Success Un-Success Closed CLInotFound
2/20/2016 01:01 – 02:00 5 3 2 13
2/20/2016 02:01 – 03:00 14 9 23 5
2/20/2016 03:01 – 04:00 8 67 89 345
2/20/2016 04:01 – 05:00 2 23 92 12
2/20/2016 05:01 – 06:00 44 55 78 98
2/20/2016 06:01 – 07:00 12 87 56 445
I can calculate each count separately, but when I try to combine them the results come out different.
Query 1 For Success & Un-Success:
SELECT CONVERT(VARCHAR(5), A.InsertionDate ,108) AS 'Interval',
COUNT(CASE WHEN A.call_result = 0 then 1 ELSE NULL END) AS 'Success',
COUNT(CASE WHEN A.call_result = 1 then 1 ELSE NULL END) AS 'Un-Success'
from dbo.AutoRectifier A
WHERE CONVERT(DateTime,A.InsertionDate,101) BETWEEN '2016-02-19 02:10:35.000' AND '2016-02-19 07:15:35.000'
GROUP BY A.InsertionDate;
Query 2 For Closed:
SELECT CONVERT(VARCHAR(5), C.DateAdded ,108) AS 'Interval',
COUNT(*) AS 'Closed' FROM dbo.ChangeTicketState C
WHERE C.SourceFlag = 'S-CNR' AND C.RET LIKE '%CLOSE%'
AND C.DateAdded BETWEEN '2016-02-19 02:10:35.000' AND '2016-02-19 07:15:35.000'
GROUP BY C.DateAdded;
Query 3 For CLI Not Found:
SELECT CONVERT(VARCHAR(5), T.DateAdded ,108) AS 'Interval',
COUNT(*) 'CLI Not Found' FROM dbo.TICKET_INFO T
WHERE T.CONTACT_NUMBER = '' AND T.DateAdded BETWEEN '2016-02-19 02:10:35.000' AND '2016-02-19 07:15:35.000'
GROUP BY T.DateAdded;

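You have several problems to solve in your question.
First, you have to combine Query1, Query2 and Query3 into one result set so that you can group it. You can use UNION ALL for that, but all 3 queries must return the same columns in the same order, because UNION ALL matches columns by position, not by name. So, after the date column, make each query return Success, [Un-Success], Closed, CLInotFound in that order, filling the columns it does not compute with zeros. In Query1, add
0 AS Closed, 0 AS CLInotFound
after the Success and [Un-Success] columns. In Query2, use
0 AS Success, 0 AS [Un-Success], COUNT(*) AS Closed, 0 AS CLInotFound
and in Query3, use
0 AS Success, 0 AS [Un-Success], 0 AS Closed, COUNT(*) AS CLInotFound
Then you can write
select * from Query1
union all
select * from Query2
union all
select * from Query3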
Don't convert the date to varchar in Query1, Query2 and Query3. Return the datetime instead, so it can be used for grouping after the union. So Query1 will look like
SELECT A.InsertionDate AS Date, ...
Query2:
SELECT C.DateAdded AS Date, ...
etc.
Then you can group the results per hour, for instance using GROUP BY SUBSTRING(CONVERT(VARCHAR(20), Date, 120), 1, 13), which keeps the 'yyyy-mm-dd hh' part of the date.
So the final query will look like
SELECT SUBSTRING(CONVERT(VARCHAR(20), Date, 120), 1, 13) AS Interval,
       SUM(Success) AS Success,
       SUM([Un-Success]) AS [Un-Success],
       SUM(Closed) AS Closed,
       SUM(CLInotFound) AS CLInotFound
FROM (
    select * from Query1
    union all
    select * from Query2
    union all
    select * from Query3
) q
GROUP BY SUBSTRING(CONVERT(VARCHAR(20), Date, 120), 1, 13)
Its output formats the Date and Interval fields slightly differently, but it shows the idea.
You can use GROUP BY DATEPART(yy, Date), DATEPART(mm, Date), DATEPART(dd, Date), DATEPART(hh, Date) instead of GROUP BY SUBSTRING(CONVERT(VARCHAR(20), Date, 120), 1, 13) and format it as you wish.
Also, the result set will not contain intervals that have no rows in the original data.
To fix that, you can add a Query4 that returns every required interval with zeros in all the count columns.
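The UNION ALL approach can be tried end to end with Python's built-in sqlite3 module. This is a sketch, not SQL Server: strftime('%Y-%m-%d %H', Date) stands in for SUBSTRING(CONVERT(VARCHAR(20), Date, 120), 1, 13), the column Un-Success is renamed UnSuccess to avoid quoting, the sample rows are made up, and each branch emits one 0/1 flag row per source row instead of pre-grouped counts, which gives the same sums. Intervals here start on the hour (01:00 to 01:59) rather than at 01:01 as in the desired output.

```python
import sqlite3

SCHEMA = """
CREATE TABLE AutoRectifier (InsertionDate TEXT, call_result INTEGER);
CREATE TABLE ChangeTicketState (DateAdded TEXT, SourceFlag TEXT, RET TEXT);
CREATE TABLE TICKET_INFO (DateAdded TEXT, CONTACT_NUMBER TEXT);
"""

QUERY = """
SELECT strftime('%Y-%m-%d %H', Date) AS Interval,
       SUM(Success) AS Success,
       SUM(UnSuccess) AS UnSuccess,
       SUM(Closed) AS Closed,
       SUM(CLInotFound) AS CLInotFound
FROM (
    -- Every branch returns Date, Success, UnSuccess, Closed, CLInotFound in this order.
    SELECT InsertionDate AS Date,
           CASE WHEN call_result = 0 THEN 1 ELSE 0 END AS Success,
           CASE WHEN call_result = 1 THEN 1 ELSE 0 END AS UnSuccess,
           0 AS Closed,
           0 AS CLInotFound
    FROM AutoRectifier
    UNION ALL
    SELECT DateAdded, 0, 0, 1, 0
    FROM ChangeTicketState
    WHERE SourceFlag = 'S-CNR' AND RET LIKE '%CLOSE%'
    UNION ALL
    SELECT DateAdded, 0, 0, 0, 1
    FROM TICKET_INFO
    WHERE CONTACT_NUMBER = ''
) q
GROUP BY strftime('%Y-%m-%d %H', Date)
ORDER BY Interval
"""


def hourly_counts(auto_rows, state_rows, info_rows):
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO AutoRectifier VALUES (?, ?)", auto_rows)
    con.executemany("INSERT INTO ChangeTicketState VALUES (?, ?, ?)", state_rows)
    con.executemany("INSERT INTO TICKET_INFO VALUES (?, ?)", info_rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


# Hypothetical sample rows.
AUTO = [("2016-02-20 01:15:00", 0), ("2016-02-20 01:40:00", 1), ("2016-02-20 02:05:00", 0)]
STATE = [("2016-02-20 01:20:00", "S-CNR", "CLOSED"), ("2016-02-20 02:30:00", "S-CNR", "OPEN")]
INFO = [("2016-02-20 02:45:00", "")]

print(hourly_counts(AUTO, STATE, INFO))
# [('2016-02-20 01', 1, 1, 1, 0), ('2016-02-20 02', 1, 0, 0, 1)]
```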
