Using regex to extract particular text in Snowflake

I would like to fetch the value for the key [Work Order Description:] and print it.
Scripts used:
CREATE TABLE demo3 (id INT, log VARCHAR);
INSERT INTO demo3 (id, log) VALUES
(1, 'Work order submitted on 12-03-2020
Work Order Description:Lights are not working
Work order status:Completed'),
(2, 'Work order submitted on 5-04-2020
Work order Priority:P3
Work Order Description:Electrical equipment issue
Work order status:Completed');
Implemented Solution:
select id, substr(log, regexp_instr(log, 'Work Order Description:') + 23, 300) as log from demo3;
Implemented Solution Output:
id | log
1 | Lights are not working Work order status:Completed
2 | Electrical equipment issue Work order status:Completed
Issue:
The following lines are also attached to the output. Instead, I want to truncate the output at the end of the line.
Desired output:
1, Lights are not working
2, Electrical equipment issue
Any help will be appreciated. Thanks in advance.

You can use REGEXP_SUBSTR() as in:
WITH data AS (
SELECT 2 AS id, 'Work order submitted on 5-04-2020
Work order Priority:P3
Work Order Description:Electrical equipment issue
Work order status:Completed' AS log
)
SELECT id, REGEXP_SUBSTR(log, 'Work Order Description:(.*)', 1, 1, 'e') AS description
FROM data

The 'e' (extract) parameter makes REGEXP_SUBSTR return the first capture group instead of the whole match, and since '.' does not match newlines by default, '(.*)' stops at the end of the line.

What is the proper method to arrange dates in consecutive order for this assignment in Microsoft SQL Server?

First, the database is a normal single-table one, but we have to query it to get to the assignment part. The assignment is to arrange the tasks provided with dates into a format that creates a schedule for the workflow; the motive is to arrange them in that kind of format. I have absolutely no clue, but I have tried a couple of things with GROUP BY and such, and I'm open to any answers. Sorry, I'm new here. Please help, and thank you!
CREATE DATABASE project;
CREATE TABLE project_phases (
    project_id int,
    phase varchar(200),
    start_date date
);
INSERT INTO project_phases (project_id, phase, start_date)
VALUES (1, 'design', '2021-01-01');
INSERT INTO project_phases (project_id, phase, start_date)
VALUES (1, 'development', '2021-01-02');
INSERT INTO project_phases (project_id, phase, start_date)
VALUES (1, 'deployment', '2021-01-03');
It seems like the window function LEAD() OVER () is a good fit here:
SELECT project_id
     , from_phase = phase
     , to_phase   = LEAD(phase, 1) OVER (PARTITION BY project_id ORDER BY start_date)
     , start_date
     , end_date   = LEAD(start_date, 1) OVER (PARTITION BY project_id ORDER BY start_date)
FROM project_phases
Results:
project_id | from_phase | to_phase | start_date | end_date
1 | design | development | 2021-01-01 | 2021-01-02
1 | development | deployment | 2021-01-02 | 2021-01-03
1 | deployment | NULL | 2021-01-03 | NULL

SQL Server: date difference query and trim

I've got a very tricky scenario below; if anyone can help, I'd appreciate it a lot.
I have two tables:
TX
TX_External
TX has two columns: ID & Date_Reported
TX_External has two columns: ID & Date_Received
Essentially I want to find the difference in days between Date_Received from TX_External and Date_Reported from TX. It has to count WORKING DAYS ONLY (no weekends or bank holidays).
So my final output should look like this
Customer ID | Days_Taken
MAIN PROBLEM.
My main issue is that the IDs in the two tables differ ever so slightly. For example:
TX ID = AB_123456_ABC
TX_External ID = AB_123456
As you can see, they are similar and refer to the same transaction, but the TX table ID has random extras attached after the number, like "_ABC", and these could be extra numbers, letters, brackets, etc. (so AB_123456_ABC, AB_123456_E (1), AB_123456_H, and so on). To add to the confusion, the IDs are not one consistent size, i.e. they are not all AB_123456: some could be AB_12345678! So I need to trim back to the second underscore.
How do I trim the TX ID so it exactly matches the TX_External ID (AB_123456)? That would let me do a match, so for each exact ID I can calculate the working days taken. Can I do this all in one query?
Sorry about the massive description; more than happy to explain further.
SELECT *
FROM tx a
CROSS APPLY (
    -- strip everything after the last underscore, but only when the ID
    -- contains at least two underscores (AB_123456_ABC -> AB_123456)
    SELECT CASE WHEN a.ID LIKE '%[_]%[_]%'
                THEN REPLACE(a.ID, REVERSE(LEFT(REVERSE(a.ID), CHARINDEX('_', REVERSE(a.ID)))), '')
                ELSE a.ID END AS txstripped
) b
JOIN TX_External c
CROSS APPLY (
    SELECT CASE WHEN c.ID LIKE '%[_]%[_]%'
                THEN REPLACE(c.ID, REVERSE(LEFT(REVERSE(c.ID), CHARINDEX('_', REVERSE(c.ID)))), '')
                ELSE c.ID END AS txexternal_stripped
) d
    ON b.txstripped = d.txexternal_stripped
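The working-day part of the question is not covered by the ID-stripping query. A minimal sketch, assuming a hypothetical calendar table dbo.Calendar (one row per date, with an IsWorkingDay flag set to 0 for weekends and bank holidays; the table and column names are assumptions, not part of the question's schema):

```sql
-- Count working days between Date_Received and Date_Reported using a
-- hypothetical dbo.Calendar(TheDate, IsWorkingDay) lookup table.
SELECT e.ID AS Customer_ID,
       (SELECT COUNT(*)
          FROM dbo.Calendar cal
         WHERE cal.TheDate >  e.Date_Received
           AND cal.TheDate <= t.Date_Reported
           AND cal.IsWorkingDay = 1) AS Days_Taken
FROM TX_External e
JOIN TX t
  ON LEFT(t.ID, LEN(e.ID)) = e.ID;  -- simplistic prefix match; the
                                    -- underscore-stripping logic is safer
</sql>
```

A calendar table is the usual way to handle bank holidays, since they cannot be derived arithmetically the way weekends can.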

Updating Aggregated tables with new sequence

General Scenario:
I have a table aggregated per user and date with several measures.
The table stores up to 10 records per user and date (could be fewer, depending on the user's activity).
There is a column that holds the sequence of occurrences ordered by date.
Sample:
CREATE TABLE #Main (UserId int , DateId int , MeasureA numeric(20,2) , MeasureB numeric(20,2), PlayDaySeq int)
INSERT INTO #Main
VALUES (188, 20180522 ,75.00, 282287.00, 1),
(188, 20180518 ,250.00, 1431725.00, 2),
(188, 20180514 ,25.00, 35500.00, 3),
(188, 20180513 ,115.00, 67100.00, 4),
(188, 20180511 ,75.00, 10625.00, 5),
(188, 20180510 ,40.00, 2500.00, 6),
(188, 20180509 ,40.00, 750.00, 7),
(188, 20180508 ,160.00, 16250.00, 8),
(188, 20180507 ,135.00, 138200.00, 9),
(188, 20180507 ,150.00, 68875.00, 10)
The Column PlayDaySeq is calculated as ROW_NUMBER () OVER (PARTITION BY UserID ORDER BY DateId DESC)
and here is the table that will hold the new aggregated data for this user:
CREATE TABLE #Inc (UserId int , DateId int , MeasureA numeric(20,2) , MeasureB numeric(20,2), PlayDaySeq int)
INSERT INTO #Inc
VALUES (188, 20180523 ,225.00, 802921.00, 1)
Now a new record is available, so I used the following:
INSERT INTO #Main
SELECT *
FROM #Inc I
WHERE NOT EXISTS
(
SELECT 1
FROM #Main M
WHERE i.UserId = M.UserId
AND i.DateId = M.DateId
)
The question is:
I need to update the PlayDaySeq column so the new record will be 1 and all the rest increment by 1, and then delete the records whose sequence becomes greater than 10.
What is the best way of doing that?
Keep in mind that the #Main table is pretty large (250M records).
I can update the sequence by running the ROW_NUMBER again and then DELETE the ones greater than 10, but I'm looking for the most efficient way to do that.
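For reference, the baseline approach described above (recompute, then trim) can be written as two statements using an updatable CTE; this is only a sketch of what the question already proposes, not a claim that it is the most efficient option:

```sql
-- Recompute the sequence for every row via an updatable CTE...
WITH Ranked AS (
    SELECT PlayDaySeq,
           ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY DateId DESC) AS NewSeq
    FROM #Main
)
UPDATE Ranked SET PlayDaySeq = NewSeq;

-- ...then drop everything past the 10 most recent rows per user.
DELETE FROM #Main WHERE PlayDaySeq > 10;
```

On 250M rows the UPDATE touches nearly every row, which is exactly the cost the question is trying to avoid.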
Updating one row and, as a result, having to update every other record does not sound like a good idea, however infrequent it is. As the comment already mentioned, I don't see the need for such a column.
But you stated you have your reasons, so I will assume that is true.
My suggestion is to drop PlayDaySeq from the table and create a view with the following as an additional column:
ROW_NUMBER () OVER (PARTITION BY UserID ORDER BY DateId DESC) AS PlayDaySeq
Then whatever code was using the table should use the view instead, which keeps the change minimal. But you need to test what the performance is like. You could also consider an indexed view, where SQL Server materializes the results like a table and maintains them automatically on insert; note, however, that indexed views cannot contain ROW_NUMBER(), so that would need a different formulation, and you would again have to test insert performance.
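A minimal sketch of the suggested view (object names are assumed; dbo.Main here stands for the permanent equivalent of #Main with the stored PlayDaySeq column dropped):

```sql
-- Compute PlayDaySeq on the fly instead of storing and maintaining it.
CREATE VIEW dbo.MainWithSeq AS
SELECT UserId, DateId, MeasureA, MeasureB,
       ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY DateId DESC) AS PlayDaySeq
FROM dbo.Main;
```

Readers then query dbo.MainWithSeq exactly as they queried the table, and inserts never need a renumbering pass.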
If I were you, I would be more willing to try a different approach: instead of numbering rows 1, 2, 3, set the sequence to 100, 200, 300. When insertion volumes are small (say 20 records a day), you then never need to update the remaining records; you just insert 101, 102, ..., which still keeps the order correct. A nightly job renumbers the whole table back to 100, 200, 300 for a fresh start the next day (or the code does so only when it runs out of numbers). But given the extra meaning you say the column carries, this may not work at all.

Using SET and BETWEEN in the same query to populate a table

What I am trying to do is populate rows using a counter or some SQL function I am unaware of.
I can use
INSERT INTO EmpSkillsBridge
(EmpIdFK , EmpSkillFK)
Values
(1,1);
This inserts one entire record, and I can even do it in batches of parenthesized tuples.
However, since I am setting up the table for the first time, and since the data is for test (cough, homework, cough) reasons, I am trying to create a whole batch of data at once. I have 200 EmpIDs and 6 different skills. All I want right now is to insert rows for the first 25 EmpIdFK values with EmpSkillFK 1.
If I use a WHERE empIdFk < 26, I get an error.
I tried using a loop, but being new, I got a little lost on how to implement it.
Then I read I could use the BETWEEN statement. So my question is: can I use a SET statement in conjunction with BETWEEN and make the code work that way?
Set into EmpSkillsBridge
(EmpIdFK , EmpSkillFK)
WHERE (EMPID BETWEEN 1 AND 26)
Values
(1,1);
would this be the best way to go around that?
You can do this:
Insert into EmpSkillsBridge
Select empid, 1
from employees
where empid between 1 and 25;
Something like this should work for you (using the question's column names):
insert into EmpSkillsBridge (EmpIdFK, EmpSkillFK)
select empid, 1
from
(
    select ROW_NUMBER() over (order by number) as empid
    from master..spt_values
) v
where empid between 1 and 25

joining latest of various usermetadata tags to user rows

I have a Postgres database with a user table (userid, firstname, lastname) and a usermetadata table (userid, code, content, created datetime). I store various information about each user in the usermetadata table by code, and I keep a full history. So, for example, a user (userid 15) has the following metadata:
15, 'QHS', '20', '2008-08-24 13:36:33.465567-04'
15, 'QHE', '8', '2008-08-24 12:07:08.660519-04'
15, 'QHS', '21', '2008-08-24 09:44:44.39354-04'
15, 'QHE', '10', '2008-08-24 08:47:57.672058-04'
I need to fetch a list of all my users and the most recent value of each of various usermetadata codes. I did this programmatically and it was, of course, godawfully slow. The best I could figure out in SQL was to join sub-selects, which were also slow, and I had to do one for each code.
This is actually not that hard to do in PostgreSQL because it has the "DISTINCT ON" clause in its SELECT syntax (DISTINCT ON isn't standard SQL).
SELECT DISTINCT ON (code) code, content, createtime
FROM metatable
WHERE userid = 15
ORDER BY code, createtime DESC;
That will limit the returned results to the first result per unique code, and if you sort the results by the create time descending, you'll get the newest of each.
I suppose you're not willing to modify your schema, so I'm afraid my answer might not be of much help, but here goes...
One possible solution would be to leave the time field empty until the value is replaced by a newer one, at which point you insert the 'deprecation date'. Another way is to add an 'active' column to the table, but that would introduce some redundancy.
The classic solution would be to have both 'Valid-From' and 'Valid-To' fields where the 'Valid-To' fields are blank until some other entry becomes valid. This can be handled easily by using triggers or similar. Using constraints to make sure there is only one item of each type that is valid will ensure data integrity.
Common to these is that there is a single way of determining the set of current fields. You'd simply select all entries with the active user and a NULL 'Valid-To' or 'deprecation date' or a true 'active'.
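A minimal sketch of the classic valid-from/valid-to design described above, with assumed column names (the original table has only a single created timestamp):

```sql
-- History table where exactly one row per (userid, code) has valid_to IS NULL.
CREATE TABLE usermetadata (
    userid     integer     NOT NULL,
    code       varchar(10) NOT NULL,
    content    text        NOT NULL,
    valid_from timestamptz NOT NULL DEFAULT now(),
    valid_to   timestamptz           -- NULL while this row is the current value
);

-- The current value of every code for a user is then a single plain lookup:
SELECT code, content
FROM usermetadata
WHERE userid = 15 AND valid_to IS NULL;
```

A trigger would close out the previous row (set its valid_to) whenever a new row for the same (userid, code) is inserted.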
You might be interested in taking a look at the Wikipedia entry on temporal databases and the article A consensus glossary of temporal database concepts.
A subselect is the standard way of doing this sort of thing. You just need a Unique Constraint on UserId, Code, and Date - and then you can run the following:
SELECT *
FROM Table
JOIN (
    SELECT UserId, Code, MAX(Date) AS LastDate
    FROM Table
    GROUP BY UserId, Code
) AS Latest
    ON Table.UserId = Latest.UserId
   AND Table.Code = Latest.Code
   AND Table.Date = Latest.LastDate
WHERE Table.UserId = #userId
