Using Power Query to unpivot a table with multiple columns into a table with two columns that represent pairs of dates from the original table?

Imagine I had a 'horizontal' data set that contained:
Unique Key
Multiple 'pairs' of dates across multiple columns (i.e. Event A Start, Event B Start, Event C Start, etc and separate columns for Event A End, Event B End, Event C End, etc).
A single date (not a pair) for a specific 'Event'.
In essence, looks something like this:
Data Set
Unique Key | Event A Start | Event A End | Single Date Event | Event B Start | Event B End | 2nd Single Date Event
Key 1 | 1 Jan 2021 | 3 Jan 2021 | 2 Jan 2021 | 5 Jan 2021 | 10 Jan 2021 | 10 Jan 2021
Key 2 | 7 Jan 2021 | 10 Jan 2021 | null | null | null | null
How would I convert the Data Set above into a table like this using PowerQuery?
Expected Output:
Unique Key | Event | Start Date | End Date
Key 1 | Event A | 1 Jan 2021 | 3 Jan 2021
Key 1 | Single Date Event | null | 2 Jan 2021
Key 1 | Event B | 5 Jan 2021 | 10 Jan 2021
Key 1 | 2nd Single Date Event | null | 10 Jan 2021
Key 2 | Event A | 7 Jan 2021 | 10 Jan 2021
I've tried:
Unpivot: I can't rename both "Event A Start" and "Event A End" to "Event A". I even tried renaming every "Event [x] Start" column to "Event [x]" and running 'unpivot selected' on those columns, then renaming every "Event [x] End" column to "Event [x]" and unpivoting those as well. Unfortunately, the Key and Event columns don't line up.
Merge Query: I tried merging one query with another, but it doesn't quite produce the desired output. I created two separate queries (one with Key, Event and Start Date; another with Key, Event and End Date), but this does not have the desired effect. I think this is because the Single Date Events are 'null'?
I feel I am definitely doing something wrong, so asking here to see if the output that I want is even achievable with PowerQuery based on the input data?

You definitely have to do a bit of extra work on top of an unpivot.
Here's how I'd approach it:
let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8k6tVDBU0lEyVPBKzFMwMjACcYyROUbIHFNkjqEBLl6sDsRkI6C4OW4teaU5Odip2FgA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Unique Key" = _t, #"Event A Start" = _t, #"Event A End" = _t, #"Single Date Event" = _t, #"Event B Start" = _t, #"Event B End" = _t, #"2nd Single Date Event" = _t]),
    #"Unpivoted Columns" = Table.UnpivotOtherColumns(Source, {"Unique Key"}, "Event", "Value"),
    #"Changed Type" = Table.TransformColumnTypes(#"Unpivoted Columns",{{"Value", type date}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each if Text.Contains([Event], "Start") then "Start Date" else "End Date"),
    #"Transformed Text" = Table.TransformColumns(#"Added Custom",{{"Event", each if Text.EndsWith(_, "Start") or Text.EndsWith(_, "End") then Text.BeforeDelimiter(_, " ", {0, RelativePosition.FromEnd}) else _, type text}}),
    #"Pivoted Column" = Table.Pivot(#"Transformed Text", List.Distinct(#"Transformed Text"[Custom]), "Custom", "Value")
in
    #"Pivoted Column"
Steps:
1. Unpivot the date columns
2. Add a new column to tag each row as Start Date / End Date
3. Strip off the " Start" / " End" suffix in the [Event] column
4. Pivot on the new column from Step 2
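For readers who want to trace the four steps outside Power Query, here is a minimal Python sketch of the same reshaping on an in-memory copy of the question's Data Set. The names rows and reshape are hypothetical, None stands in for null, and the suffix test uses endswith rather than the answer's Text.Contains, so treat it as an illustration of the logic rather than a translation of the M code.

```python
from datetime import date

# Hypothetical in-memory copy of the question's Data Set (None stands in for null).
rows = [
    {"Unique Key": "Key 1",
     "Event A Start": date(2021, 1, 1), "Event A End": date(2021, 1, 3),
     "Single Date Event": date(2021, 1, 2),
     "Event B Start": date(2021, 1, 5), "Event B End": date(2021, 1, 10),
     "2nd Single Date Event": date(2021, 1, 10)},
    {"Unique Key": "Key 2",
     "Event A Start": date(2021, 1, 7), "Event A End": date(2021, 1, 10),
     "Single Date Event": None,
     "Event B Start": None, "Event B End": None,
     "2nd Single Date Event": None},
]


def reshape(rows, key="Unique Key"):
    """Unpivot, tag, strip the suffix, then pivot back into Start/End columns."""
    pivoted = {}  # (key value, event name) -> output record, in first-seen order
    for row in rows:
        for column, value in row.items():
            # Step 1: unpivot every non-key column, dropping nulls.
            if column == key or value is None:
                continue
            # Step 2: tag the value as a start or an end date.
            kind = "Start Date" if column.endswith(" Start") else "End Date"
            # Step 3: strip the " Start" / " End" suffix from the event name.
            event = column
            for suffix in (" Start", " End"):
                if column.endswith(suffix):
                    event = column[: -len(suffix)]
            # Step 4: pivot the tag back into two columns per (key, event).
            record = pivoted.setdefault(
                (row[key], event),
                {key: row[key], "Event": event, "Start Date": None, "End Date": None},
            )
            record[kind] = value
    return list(pivoted.values())


for record in reshape(rows):
    print(record)
```

Single-date events carry no suffix, so they fall through to "End Date" and keep a null start, which matches the Expected Output in the question.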

Related

Measure does not work for Month Threshold

I built these DAX measures:
_Access_Daily =
CALCULATE (
    DISTINCTCOUNTNOBLANK ( ApplicationAccessLog[ApplicationUserID] ),
    FILTER ( 'Date', 'Date'[DateId] = SELECTEDVALUE ( 'DateSelector'[DateId], MAX ( 'DateSelector'[DateId] ) ) )
) + 0
_Access__PreviousDay =
CALCULATE (
    DISTINCTCOUNTNOBLANK ( ApplicationAccessLog[ApplicationUserID] ),
    FILTER ( 'Date', 'Date'[DateId] = SELECTEDVALUE ( 'DateSelector'[DateId], MAX ( 'DateSelector'[DateId] ) ) - 1 )
) + 0
The Date Selector table is a disconnected table containing dates from the 20th of January to now. DateId is a whole number like 20200131.
The Date table is a standard date table with all the dates between 1970 and 2038. DateId is a whole number like 20200131.
However, it does not seem to work across the month boundary between January and February. If the selected date is 01/02/2020, it does not return the correct result for 31/01/2020.
As mentioned in the comments, the root problem here is that the whole numbers you use are not dates. As a result, when you subtract 1 and cross month (or year) boundaries, there is no calendar intelligence that can adjust the numbers properly.
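To make the boundary problem concrete, here is a small Python sketch (not DAX) that contrasts subtracting 1 from a yyyymmdd whole number with subtracting one day from a real date. The helper names date_id_to_date and date_to_date_id are hypothetical and exist only for this illustration.

```python
from datetime import date, timedelta


def date_id_to_date(date_id: int) -> date:
    """Split a yyyymmdd whole number into a real calendar date."""
    year, rest = divmod(date_id, 10000)
    month, day = divmod(rest, 100)
    return date(year, month, day)


def date_to_date_id(d: date) -> int:
    """Pack a calendar date back into a yyyymmdd whole number."""
    return d.year * 10000 + d.month * 100 + d.day


selected = 20200201

# Plain integer arithmetic knows nothing about calendars.
naive_previous = selected - 1  # 20200200, which is not a day in any month

# Going through a real date crosses the month boundary correctly.
previous = date_to_date_id(date_id_to_date(selected) - timedelta(days=1))  # 20200131

print(naive_previous, previous)
```

No row in the Date table has a DateId of 20200200, so a filter on that value matches nothing.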
Your solution (using 'Date'[DayDateNext]) might work, and if other considerations make this design a must, go with it. However, I'd suggest revisiting the overall approach and using real dates instead of "DateId". You will then be able to use built-in DAX time intelligence, and your code will be more elegant and faster.
For example, if your "Date" and "DateSelector" tables have regular date fields, your code can be re-written as follows:
_Access_Daily =
VAR Selected_Date = SELECTEDVALUE ( 'DateSelector'[Date], MAX ( 'DateSelector'[Date] ) )
VAR Result =
    CALCULATE (
        DISTINCTCOUNTNOBLANK ( ApplicationAccessLog[ApplicationUserID] ),
        'Date'[Date] = Selected_Date
    )
RETURN
    Result + 0
and:
_Access_PreviousDay =
CALCULATE ( [_Access_Daily], PREVIOUSDAY ( 'Date'[Date] ) )

How to find out which of the games happened on Mondays only?

I have tried two queries; the first one hasn't worked, while the second has. I basically have to display how many games were played on Mondays and show the teams that played them.
MATCH (t:Teams)
WHERE date({year:2019, month: 1 }) > t.Date <= date({year:2018, month:12})
RETURN t.HomeTeam AS HomeTeam,
t.AwayTeam AS AwayTeam,
t.Date AS Date
The result is: (No changes, no records) - nothing
MATCH (t:Teams)
WITH [item in split(t.Date, "/") | toInteger(item)] AS dateComponents
WITH ({day: dateComponents[0], month: dateComponents[1], year: dateComponents[2]}).dayOfWeek AS date
WHERE date = 1
RETURN COUNT(*)
The result is: Count(*) 0
I think there may be a couple of things going on in your first query. The date matching line
WHERE date({year:2019, month: 1 }) > t.Date <= date({year:2018, month:12})
is looking for a date that is less than 2019-01-01 and less than or equal to 2018-12-01. If you are actually looking for a date between those two values, you need to change the second operator to greater than or equal to (>=) for 2018-12.
That said, if Date is actually a string then the date comparison will not work either.
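Python reads a chained comparison the same way (a > x <= b means a > x and x <= b), so the effect of the operator can be checked with a short sketch. The sample dates below are made up for illustration, and the function names original_filter and range_filter are hypothetical.

```python
from datetime import date

upper = date(2019, 1, 1)   # date({year: 2019, month: 1})
lower = date(2018, 12, 1)  # date({year: 2018, month: 12})


def original_filter(d: date) -> bool:
    # Same shape as the question's WHERE clause: d < upper AND d <= lower.
    return upper > d <= lower


def range_filter(d: date) -> bool:
    # Corrected shape: d < upper AND d >= lower, i.e. a date in December 2018.
    return upper > d >= lower


in_december = date(2018, 12, 17)    # made-up sample date
before_december = date(2018, 11, 5) # made-up sample date

print(original_filter(in_december), range_filter(in_december))
print(original_filter(before_december), range_filter(before_december))
```

The original shape rejects a date in mid-December and accepts anything on or before 1 December 2018, which is the opposite of a between-style range check.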
In your second query, it looks like you decided that Date was indeed a string and you split it up but still did not get any results. Although you break the date string up into its components you did not supply the date() function around your date components in this line...
WITH ({day: dateComponents[0], month: dateComponents[1], year: dateComponents[2]}).dayOfWeek AS date
Try this for your second query.
MATCH (t:Teams)
WITH [item in split(t.Date, "/") | toInteger(item)] AS dateComponents
WITH date({day: dateComponents[0], month: dateComponents[1], year: dateComponents[2]}).dayOfWeek AS date
WHERE date = 1
RETURN COUNT(*)
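The same parse-then-check logic can be sketched in Python to sanity-check the approach outside the database. This assumes, as the query above does, that the strings are in day/month/year order; the sample strings are made up, and is_monday is a hypothetical helper name.

```python
from datetime import date


def is_monday(text: str) -> bool:
    """Parse a 'day/month/year' string and test whether it falls on a Monday."""
    day, month, year = (int(part) for part in text.split("/"))
    # isoweekday() numbers Monday as 1, the same convention as Cypher's dayOfWeek.
    return date(year, month, day).isoweekday() == 1


# Made-up sample values in the assumed day/month/year format.
samples = ["17/12/2018", "18/12/2018", "24/12/2018"]
monday_games = [s for s in samples if is_monday(s)]
print(len(monday_games), monday_games)
```

If the stored strings were actually month/day/year, the component order would have to be swapped, both here and in the Cypher map passed to date().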

SQL Server - extract dates from strings in several formats

I've inherited quite a mess of a database table column called DOB, of type nvarchar - here is just a sample of the data in this column:
DOB: 1998-09-04US
Sex: M Race: White Year of Birth: 1950
12/31/00
January 5th, 1998
Date of Birth: 12/19/1938
AGE; 46
DOB: 11-24-1967
May 31, 1942, Split, Croatia
DOB:   12/28/1986
D.O.B.31-OCT-92
D.O.B.: January 8, 1973
31/07/1974 (44 years old)
Date Of Birth: 08/01/1979
78  (DOB: 12/09/1940)
1961 (56 years old)
12/31/1985 (PRIMARY)
DOB: 05/27/67
8-Jun-43
9/9/78
12/31/84 0:00
NA
Birth Year 2018
nacido el 29 de junio de 1959
I am trying to determine whether there is any way to extract the dates from these fields, with so many varying formats, without using something like RegEx patterns for every single possible variation in this column.
The resulting extracted data would look like this:
1998-09-04
1950
12/31/00
January 5th, 1998
12/19/1938
11-24-1967
May 31, 1942
12/28/1986
31-OCT-92
January 8, 1973
31/07/1974
08/01/1979
12/09/1940
1961
12/31/1985
05/27/67
8-Jun-43
9/9/78
12/31/84
NA
2018
29 de junio de 1959
While it may be a complete pipe dream, I was wondering if this could be accomplished with SQL, with some kind of "if it looks like a date, attempt to extract it" method. And if not out-of-the-box, perhaps with a helper extension or plugin?
It is possible, but there are potential pitfalls. This will certainly have to be expanded and maintained.
This is a brute-force pattern match in which the longest matching pattern is selected.
Example:
Select ID
      ,DOB
      ,Found
 From (
    Select *
          ,Found = substring(DOB,patindex(PatIdx,DOB),PatLen)
          ,RN = Row_Number() over (Partition By ID Order by PatLen Desc)
     From #YourTable A
     Left Join (
        Select *
              ,PatIdx = '%'+replace(replace(Pattern, 'A', '[A-Z]'), '0', '[0-9]') +'%'
              ,PatLen = len(Pattern)
         From #FindPattern
     ) B
       on patindex(PatIdx,DOB)>0
 ) A
 Where RN=1
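The contents of #FindPattern are not shown above, so the following Python sketch uses a hypothetical pattern list purely to illustrate the idea: each 'A' stands for a letter, each '0' for a digit, every other character is literal, and the longest pattern that matches wins. Case-insensitive matching is an assumption here (it mimics a case-insensitive SQL Server collation), and to_regex and find_date are hypothetical names.

```python
import re

# Hypothetical patterns; the real #FindPattern table is not shown in the answer.
PATTERNS = ["0000-00-00", "00/00/0000", "00-00-0000", "00/00/00", "0000"]


def to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate the A/0 shorthand into a regular expression."""
    parts = []
    for ch in pattern:
        if ch == "A":
            parts.append("[A-Z]")
        elif ch == "0":
            parts.append("[0-9]")
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    # Assumption: ignore case, as a case-insensitive collation would.
    return re.compile("".join(parts), re.IGNORECASE)


def find_date(text: str, patterns=PATTERNS):
    """Return the substring matched by the longest pattern, or None."""
    for pattern in sorted(patterns, key=len, reverse=True):
        match = to_regex(pattern).search(text)
        if match:
            return match.group(0)
    return None


for dob in ["DOB: 1998-09-04US", "Date of Birth: 12/19/1938", "12/31/00", "NA"]:
    print(repr(dob), "->", find_date(dob))
```

As with the SQL version, coverage depends entirely on the pattern list, so formats such as "January 5th, 1998" would need their own entries.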

Looping in Stata

I am trying to run a regression for each id. I also need to narrow it down to a regression for each year within a particular id.
tsset id date
forvalues i=1/3 {
eststo:quietly arch rtr mon tue wed thu fri lag1r lag2r if id == `i' & Year==`i', noconstant arch(1/1) tarch(1/1) garch(1/1) distribution(t)
}
esttab using d:\Return_reg.csv, append cells("b(fmt(8))")
It returns the following error:
no observations.
I suspect it's because years are different within each id.
How should I change the code to achieve my goal?
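As mentioned in the comments, it's a typo (unless your Year variable really only takes the values 1, 2 and 3): the same loop index `i' is used for both id and Year. Furthermore, since this is panel data, you can declare it with xtset (tsset id date is also accepted, because tsset takes an optional panel variable before the time variable). Try the following: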
xtset id date
levelsof Year, local(years) //create list containing all values of year
levelsof id, local(ids) //create list containing all values of id
foreach id in `ids'{
    foreach yr in `years'{
        eststo: quietly arch rtr mon tue wed thu fri lag1r lag2r if id == `id' & Year==`yr', noconstant arch(1/1) tarch(1/1) garch(1/1) distribution(t)
    }
}

Scala Slick Lifted Date GroupBy

I'm using Scala 2.10 with Slick 1.0.0 and trying to do a lifted query.
I have a table, "Logins", where I'm attempting to do a load, and groupBy on a Timestamp column. However, when I attempt to groupBy, I am running into an issue when I try and format the Timestamp field to extract only the day portion, to group the objects by the same day.
Given the objects:
id | requestTimestamp
1 | Jan 1, 2013 01:02:003
2 | Jan 1, 2013 03:04:005
3 | Jan 1, 2013 05:06:007
4 | Jan 2, 2013 01:01:001
I'd like to return a grouping out of the database by day. For the sake of brevity, the mapping from formatted timestamp to id is shown below; the ids would actually be lists of objects:
Jan 1, 2013 -> (1, 2, 3)
Jan 2, 2013 -> (4)
I've got the following slick table object:
private implicit object Logins extends Table[(Int, Timestamp)]("LOGINS") {
  def id = column[Int]("ID", O.PrimaryKey)
  def requestTimeStamp = column[Timestamp]("REQUESTTIMESTAMP", O.NotNull)
  def * = id ~ requestTimeStamp
}
The following Query method:
val q = for {
  l <- Logins if (l.id >= 1 && l.id <= 4)
} yield l
val dayGroupBy = new java.text.SimpleDateFormat("MM/dd/yyyy")
val q1 = q.groupBy(l => dayGroupBy.format(l.requestTimeStamp))
db.withSession {
  q1.list
}
However, instead of getting the expected grouping, I get an exception on the line where I attempt the groupBy:
java.lang.IllegalArgumentException: Cannot format given Object as a Date
Does anyone have any suggestions on properly grouping by Timestamps out of the database?
Timestamp and Date are not the same thing! Try converting the Timestamp to human-readable text using Calendar or SimpleDateTime.
I'm not so sure about the second one, though!
