SQL Server - extract dates from strings in several formats

I've inherited quite a mess of a database table column called DOB, of type nvarchar - here is just a sample of the data in this column:
DOB: 1998-09-04US
Sex: M Race: White Year of Birth: 1950
12/31/00
January 5th, 1998
Date of Birth: 12/19/1938
AGE; 46
DOB: 11-24-1967
May 31, 1942, Split, Croatia
DOB:   12/28/1986
D.O.B.31-OCT-92
D.O.B.: January 8, 1973
31/07/1974 (44 years old)
Date Of Birth: 08/01/1979
78  (DOB: 12/09/1940)
1961 (56 years old)
12/31/1985 (PRIMARY)
DOB: 05/27/67
8-Jun-43
9/9/78
12/31/84 0:00
NA
Birth Year 2018
nacido el 29 de junio de 1959
I am trying to determine whether there is any way to extract the dates from these fields, with so many varying formats, without using something like RegEx patterns for every single possible variation in this column.
The resulting extracted data would look like this:
1998-09-04
1950
12/31/00
January 5th, 1998
12/19/1938
11-24-1967
May 31, 1942
12/28/1986
31-OCT-92
January 8, 1973
31/07/1974
08/01/1979
12/09/1940
1961
12/31/1985
05/27/67
8-Jun-43
9/9/78
12/31/84
NA
2018
29 de junio de 1959
While it may be a complete pipe dream, I was wondering if this could be accomplished with SQL, with some kind of "if it looks like a date, attempt to extract it" method. And if not out-of-the-box, perhaps with a helper extension or plugin?

It is possible, but there are potential pitfalls, and the pattern list will certainly have to be expanded and maintained over time.
This is a brute-force pattern match in which the longest matching pattern is selected.
Example:
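-- #FindPattern is expected to hold one date shape per row in a Pattern column,
-- where 'A' stands for any letter and '0' for any digit (inferred from the replace() calls below).
Select ID
      ,DOB
      ,Found
 From (
        Select *
              ,Found = substring(DOB,patindex(PatIdx,DOB),PatLen)
              ,RN    = Row_Number() over (Partition By ID Order by PatLen Desc)  -- longest pattern first
         From #YourTable A
         Left Join (
                    Select *
                          ,PatIdx = '%'+replace(replace(Pattern, 'A', '[A-Z]'), '0', '[0-9]') +'%'
                          ,PatLen = len(Pattern)
                     From #FindPattern
                   ) B
           on patindex(PatIdx,DOB)>0
      ) A
 Where RN=1  -- keep only the longest match per ID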
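For comparison, here is a minimal Python sketch of the same "longest matching pattern wins" idea outside the database. The pattern list and the names PATTERNS, to_regex and extract_date are illustrative assumptions, not part of the answer above, and the list would need to grow to cover the remaining formats in the sample (month names, Spanish dates and so on).

```python
import re

# Hypothetical pattern list: 'A' stands for a letter, '0' for a digit.
PATTERNS = [
    "0000-00-00",
    "00/00/0000",
    "00-00-0000",
    "00/00/00",
    "0/0/00",
    "00-AAA-00",
    "0-AAA-00",
    "0000",
]

def to_regex(pattern):
    """Expand the A/0 shorthand into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "A":
            parts.append("[A-Za-z]")
        elif ch == "0":
            parts.append("[0-9]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def extract_date(text):
    """Return the longest substring matching any pattern, or None."""
    best = None
    for pattern in PATTERNS:
        match = to_regex(pattern).search(text)
        if match and (best is None or len(match.group()) > len(best)):
            best = match.group()
    return best

for sample in ["DOB: 1998-09-04US", "D.O.B.31-OCT-92", "1961 (56 years old)", "AGE; 46"]:
    print(sample, "->", extract_date(sample))
```

As with the SQL version, a value such as "AGE; 46" that matches no pattern yields no date, and a bare four-digit year is only returned when nothing longer matches.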


How to Convert Text type to Date type in Google Data Studio?

I am extracting text-format data from Firebase and I want to show it in Looker Studio in date format.
Currently the text data looks like this:
Sample_Date
AAA 1/2
AAA 1/17
AAA /12/7
AAA 12/23
and the goal is to show YYYY/MM/DD in date format.
Sample_Date
2023/01/02
2023/01/17
2022/12/07
2022/12/23
What I tried:
I have extracted the text data like this.
To delete the unnecessary text (e.g. "AAA " and "(W)"), I created a new field and entered the formula below.
CASE
WHEN LENGTH(Sample_Date) = 12 THEN SUBSTR(Sample_Date, -8, 5)
WHEN LENGTH(Sample_Date) = 11 THEN SUBSTR(Sample_Date, -7, 4)
WHEN LENGTH(Sample_Date) = 10 THEN SUBSTR(Sample_Date, -6, 3)
ELSE "Other"
END
Now the table looks like this
Finally, to convert it to date format, I created another field and entered the formula below.
PARSE_DATE("%m/%d", 1/RejectUnnecessaryText)
Finally, the table looks like this.
However, I want to change the year to 2023, and I don't know how.
Here is Publicly editable sample data of Looker Studio.
I’d be grateful if you could give me advice.
I have solved this question by myself.
Field1: Delete unnecessary text.
CASE
WHEN LENGTH(Sample_Date) = 12 THEN SUBSTR(Sample_Date, -8, 5)
WHEN LENGTH(Sample_Date) = 11 THEN SUBSTR(Sample_Date, -7, 4)
WHEN LENGTH(Sample_Date) = 10 THEN SUBSTR(Sample_Date, -6, 3)
ELSE "Other"
END
Field2: Extract current year from another Date event.
EXTRACT(YEAR FROM DateEvent)
Field3: Prefix Field1 with the current year and "/".
CONCAT(Field2, "/", Field1)
Field4: Convert text format to date format.
PARSE_DATE("%Y/%m/%d", Field3)
*Please comment if you know a better way.
Thank you.
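To check the logic of the four fields outside Looker Studio, the same steps can be sketched in Python. This is only an illustration: the function name to_date is made up, a regular expression stands in for the length-based CASE (my own substitution), and the year is passed in explicitly, much as Field2 supplies it from another date.

```python
import re
from datetime import date

def to_date(sample, year):
    """Pull the first 'month/day' pair out of the text and attach the given year."""
    match = re.search(r"(\d{1,2})/(\d{1,2})", sample)
    if match is None:
        return None  # plays the role of the "Other" branch
    month, day = int(match.group(1)), int(match.group(2))
    return date(year, month, day)

print(to_date("AAA 1/17", 2023))
print(to_date("AAA 12/23", 2022))
```

Because the year is an argument, rows from December can be given 2022 and rows from January 2023, which is what the expected output in the question needs.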

How to convert "weekday month day time" to datetime (YYYY-MM-DD)?

I have a column fecha formatted as a string in the form of
Mon Feb 22 07:55:55 CET 2021
How do I convert it to a date format like YYYY-MM-DD? I have tried CAST, CONVERT ...
And I always get the following error message:
Msg 241, Level 16, State 1, Line 4
Conversion failed when converting date and/or time from character string
Assuming, as I mentioned in my comment, that the value is always in the form ddd MMM dd hh:mm:ss tz yyyy, you could do some messy string manipulation to convert the value to a valid date and time data type. Then you can format the date however you want in your presentation layer, as it's strongly typed:
SELECT V.DateString,
TRY_CONVERT(datetime2(0),STUFF(STUFF(LEFT(STUFF(V.DateString,1,8,''),12),3,0,RIGHT(V.DateString,5)),3,0,SUBSTRING(V.DateString,4,4)),113) AS ConvertedDate
FROM (VALUES('Mon Feb 22 07:55:55 CET 2021'))V(DateString);
Of course, the real lesson here is don't store date and time values as a varchar in your database. Date and Time data types exist for a reason; use them.
I don't think there are standard functions to deal with your source strings, so I would combine simpler standard transformations to get that result:
declare @Cadena varchar(50) = 'Mon Feb 22 07:55:55 CET 2021'
select substring(@Cadena, 25, 4) + '-' +
replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(substring(@Cadena, 5, 3), 'Jan', '01'), 'Feb', '02'), 'Mar', '03'), 'Apr', '04'), 'May', '05'), 'Jun', '06'), 'Jul', '07'), 'Aug', '08'), 'Sep', '09'), 'Oct', '10'), 'Nov', '11'), 'Dec', '12') + '-' +
substring(@Cadena, 9, 2)
This will fail if your strings are not always in exactly the same format; in that case more code would be needed.
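If the conversion can happen before the data reaches SQL Server, the same fixed-layout idea is easy to sketch in Python. This is an illustration under the same assumption as the answers above (the string is always weekday, month, day, time, time zone, year); MONTHS and parse_fecha are made-up names, and the time zone token is simply ignored.

```python
from datetime import datetime

# English month abbreviations, as they appear in the sample value.
MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}

def parse_fecha(text):
    """Parse 'Mon Feb 22 07:55:55 CET 2021' into a datetime, dropping the time zone."""
    _weekday, month, day, clock, _tz, year = text.split()
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return datetime(int(year), MONTHS[month], int(day), hours, minutes, seconds)

parsed = parse_fecha("Mon Feb 22 07:55:55 CET 2021")
print(parsed.strftime("%Y-%m-%d"))
```

A string with a different number of tokens or an unknown month raises an error here, which mirrors the caveat above about the format having to be exact.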

Using powerquery to unpivot table with multiple columns into a table with two columns that represent pairs of dates from original table?

Imagine I had a 'horizontal' data set that contained:
Unique Key
Multiple 'pairs' of dates across multiple columns (i.e. Event A Start, Event B Start, Event C Start, etc and separate columns for Event A End, Event B End, Event C End, etc).
A single date (not a pair) for a specific 'Event'.
In essence, looks something like this:
Data Set:
Unique Key | Event A Start | Event A End | Single Date Event | Event B Start | Event B End | 2nd Single Date Event
Key 1      | 1 Jan 2021    | 3 Jan 2021  | 2 Jan 2021        | 5 Jan 2021    | 10 Jan 2021 | 10 Jan 2021
Key 2      | 7 Jan 2021    | 10 Jan 2021 | null              | null          | null        | null
How would I convert the Data Set above into a table like this using PowerQuery?
Expected Output:
Unique Key | Event                 | Start Date | End Date
Key 1      | Event A               | 1 Jan 2021 | 3 Jan 2021
Key 1      | Single Date Event     | null       | 2 Jan 2021
Key 1      | Event B               | 5 Jan 2021 | 10 Jan 2021
Key 1      | 2nd Single Date Event | null       | 10 Jan 2021
Key 2      | Event A               | 7 Jan 2021 | 10 Jan 2021
I've tried:
Unpivot but I can't rename both "Event A Start" and "Event A End" into "Event A". I even tried renaming all "Event [x] Start" as "Event [x]", did a 'unpivot selected' of all "Event [x]'. Then I renamed all "Event [x] End" into "Event [x]" and then performed an unpivot on those columns. Unfortunately, the Key and Event columns don't line up.
Merge Query: I have tried merging one query with another, but it's not quite producing the desired output. I created two separate queries (one with Key, Event and Start Date; another with Key, Event and End Date), but this is not having the desired effect. I think this is because the Single Date Events are 'null'?
I feel I am definitely doing something wrong, so asking here to see if the output that I want is even achievable with PowerQuery based on the input data?
You definitely have to do a bit of extra work on top of an unpivot.
Here's how I'd approach it:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8k6tVDBU0lEyVPBKzFMwMjACcYyROUbIHFNkjqEBLl6sDsRkI6C4OW4teaU5Odip2FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Unique Key" = _t, #"Event A Start" = _t, #"Event A End" = _t, #"Single Date Event" = _t, #"Event B Start" = _t, #"Event B End" = _t, #"2nd Single Date Event" = _t]),
#"Unpivoted Columns" = Table.UnpivotOtherColumns(Source, {"Unique Key"}, "Event", "Value"),
#"Changed Type" = Table.TransformColumnTypes(#"Unpivoted Columns",{{"Value", type date}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each if Text.Contains([Event], "Start") then "Start Date" else "End Date"),
#"Transformed Text" = Table.TransformColumns(#"Added Custom",{{"Event", each if Text.EndsWith(_, "Start") or Text.EndsWith(_, "End") then Text.BeforeDelimiter(_, " ", {0, RelativePosition.FromEnd}) else _, type text}}),
#"Pivoted Column" = Table.Pivot(#"Transformed Text", List.Distinct(#"Transformed Text"[Custom]), "Custom", "Value")
in
#"Pivoted Column"
Steps:
1. Unpivot the date columns
2. Add a new column to tag each row as Start Date / End Date
3. Strip off the " Start" / " End" suffix in the [Event] column
4. Pivot on the new column from Step 2
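The steps above can be sketched in plain Python to make the data flow easier to follow. This is a rough illustration, not a translation of the M code: the function name reshape is made up, rows are assumed to be dictionaries with None for null, and the tag is decided by the column suffix rather than by Text.Contains.

```python
def reshape(rows, key="Unique Key"):
    """Unpivot the date columns, tag each value as start or end, then pivot per (key, event)."""
    result = {}  # (key value, event name) -> {"Start Date": ..., "End Date": ...}
    for row in rows:
        for column, value in row.items():
            if column == key or value is None:
                continue  # unpivoting drops nulls, as in Power Query
            if column.endswith(" Start"):
                event, tag = column[: -len(" Start")], "Start Date"
            elif column.endswith(" End"):
                event, tag = column[: -len(" End")], "End Date"
            else:
                event, tag = column, "End Date"  # single-date events count as an end date
            record = result.setdefault((row[key], event), {"Start Date": None, "End Date": None})
            record[tag] = value
    return [{key: k, "Event": e, **dates} for (k, e), dates in result.items()]

rows = [
    {"Unique Key": "Key 1", "Event A Start": "1 Jan 2021", "Event A End": "3 Jan 2021",
     "Single Date Event": "2 Jan 2021", "Event B Start": "5 Jan 2021",
     "Event B End": "10 Jan 2021", "2nd Single Date Event": "10 Jan 2021"},
    {"Unique Key": "Key 2", "Event A Start": "7 Jan 2021", "Event A End": "10 Jan 2021",
     "Single Date Event": None, "Event B Start": None,
     "Event B End": None, "2nd Single Date Event": None},
]

for line in reshape(rows):
    print(line)
```

Keying the intermediate dictionary on the (key, event) pair is what lines the start and end dates up on one row, which is the part a plain unpivot cannot do on its own.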

How to find out which of the games happened on Mondays only?

I have tried 2 codes, the first one hasn't worked, while the second has. I basically have to display how many games were played on Mondays and show the teams that played them.
MATCH (t:Teams)
WHERE date({year:2019, month: 1 }) > t.Date <= date({year:2018, month:12})
RETURN t.HomeTeam AS HomeTeam,
t.AwayTeam AS AwayTeam,
t.Date AS Date
The result is: (No changes, no records) - nothing
MATCH (t:Teams)
WITH [item in split(t.Date, "/") | toInteger(item)] AS dateComponents
WITH ({day: dateComponents[0], month: dateComponents[1], year: dateComponents[2]}).dayOfWeek AS date
WHERE date = 1
RETURN COUNT(*)
The result is: Count(*) 0
I think there may be a couple of things going on in your first query. The date matching line
WHERE date({year:2019, month: 1 }) > t.Date <= date({year:2018, month:12})
is looking for a date that is less than 2019-01-01 and less than or equal to 2018-12-01. If you are actually looking for a date between those two values, you need to change the second operator to greater than or equal to (>=) for 2018-12.
That said, if Date is actually a string then the date comparison will not work either.
In your second query, it looks like you decided that Date was indeed a string and you split it up but still did not get any results. Although you break the date string up into its components you did not supply the date() function around your date components in this line...
WITH ({day: dateComponents[0], month: dateComponents[1], year: dateComponents[2]}).dayOfWeek AS date
Try this for your second query.
MATCH (t:Teams)
WITH [item in split(t.Date, "/") | toInteger(item)] AS dateComponents
WITH date({day: dateComponents[0], month: dateComponents[1], year: dateComponents[2]}).dayOfWeek AS date
WHERE date = 1
RETURN COUNT(*)
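To sanity-check what the corrected query should return, the same day-of-week test can be sketched in Python. This is an illustration only: is_monday and the sample list are made up, and the strings are assumed to be day/month/year, matching the component order used in the query above. Python's isoweekday() numbers Monday as 1, like dayOfWeek in Cypher.

```python
from datetime import date

def is_monday(text):
    """Return True when a 'day/month/year' string falls on a Monday."""
    day, month, year = (int(part) for part in text.split("/"))
    return date(year, month, day).isoweekday() == 1

# Hypothetical sample values, not taken from the original data set.
games = ["24/12/2018", "25/12/2018", "31/12/2018"]
monday_games = [g for g in games if is_monday(g)]
print(len(monday_games), monday_games)
```

If the stored strings turn out to be month/day/year instead, the unpacking order has to be swapped, and the same applies to the indexes in the Cypher query.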

Calculate days in between certain events based on multiple conditions

I am curious whether one of you can help me calculate how many days (days between TravelStart and TravelEnd) a certain boat (BoatID) has been used only for luxury tours (BoatTourID = Luxury) by different captains (CaptainID),
and now for the weird part: until the next Standard tour (BoatTourID = Standard) starts. I don't want to take cancelled trips (Status = Cancelled) into account.
CaptainID | BoatID    | BoatTourID | Status    | TravelStart | TravelEnd
Jack      | AlphaBoat | Standard   |           | 1-7-2019    | 20-7-2019
Kevin     | AlphaBoat | Luxury     |           | 21-7-2019   | 31-7-2019
Eric      | AlphaBoat | Luxury     | Cancelled | 1-8-2019    | 10-8-2019
Nick      | AlphaBoat | Standard   |           | 11-8-2019   | 20-8-2019
John      | AlphaBoat | Luxury     |           | 21-8-2019   | 30-8-2019
Lionel    | BigBoat   | Standard   |           | 1-8-2019    | 20-8-2019
Jeffrey   | BigBoat   | Luxury     |           | 20-8-2019   | 25-8-2019
Chris     | BigBoat   | Standard   |           | 26-8-2019   | 28-8-2019
In SQL this should give the following results, so in principle the result has the same number of rows as the table:
CaptainID
Jack 0 --since BoatTourID = Standard, it should not be calculated
Kevin 10
Eric 0 --since Status = Cancelled
Nick 0
John 9
Lionel 19
Jeffrey 5
Chris 2
It should be possible to run it in 1 SQL query.
The code I have written so far is very messy and doesn't come close to solving it, so I would rather not post it, since I am hoping for a fresh idea. I will still post it if necessary!
The following query should do what you want:
SELECT
CaptainID
,CASE WHEN BoatTourID = 'Standard' OR [Status] = 'Cancelled' THEN 0
ELSE DATEDIFF(DAY,TravelStart,TravelEnd) END AS [Date Difference]
FROM YourTable
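The row-by-row rule in that CASE expression can be sketched in Python to check it against the sample data. The function name luxury_days is made up for illustration, and the sketch covers only the per-row rule, not the "until the next Standard tour" part of the question.

```python
from datetime import date

def luxury_days(tour, status, start, end):
    """Days between start and end, or 0 for Standard or Cancelled trips."""
    if tour == "Standard" or status == "Cancelled":
        return 0
    return (end - start).days

print(luxury_days("Luxury", "", date(2019, 7, 21), date(2019, 7, 31)))           # Kevin
print(luxury_days("Luxury", "Cancelled", date(2019, 8, 1), date(2019, 8, 10)))   # Eric
print(luxury_days("Luxury", "", date(2019, 8, 21), date(2019, 8, 30)))           # John
```

Note that this rule returns 0 for every Standard row, so it reproduces the expected values for Kevin, Eric, John and Jeffrey but not the non-zero values the question lists for Lionel and Chris.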
You can achieve this with a CASE expression combined with GROUP BY. You can add further columns to the SELECT and GROUP BY clauses as required.
select CaptainID,
DATEDIFF( DAY,
MIN (CASE WHEN
BoatTourID = 'Luxury' and NOT [Status] = 'Cancelled'
THEN TravelStart Else cast( GETDATE() as date ) END ),
MAX (CASE WHEN
BoatTourID = 'Luxury' and NOT [Status] = 'Cancelled'
THEN TravelEnd Else cast ( GETDATE() as date ) END )
) Duration_Days
from YourTable
Group by CaptainID
MIN, MAX and GROUP BY give you one row per CaptainID with the duration across all of that captain's trips. If one row per captain is not a requirement, you can remove the GROUP BY and use just the CASE expression accordingly.
