Convert Date Stored as NUMERIC to DATETIME - sql-server

I am currently working on a query that needs to calculate the difference in days between two different dates. I've had issues with our DATE columns before, because they are all being stored as numeric columns which is a complete pain.
I tried using CONVERT as I had done in the past to try and get the different pieces of the DATETIME string built, but I am not having any luck.
The commented line --convert(datetime,) is where I am having the issue. Basically, I need to convert PO_DATE and LINE_DOCK_DATE to a format that is usable, so I can calculate the difference between the two in days.
USE BWDW
GO
SELECT
[ITEM_NO]
,[ITEM_DESC]
,[HEADER_DUE_DATE]
,[BWDW].[dbo].[DS_tblDimWhs].WHS_SHORT_NAME AS 'Warehouse'
,[BWDW].[dbo].[DS_tblFactPODtl].[PO_NO] AS 'PO NUMBER'
,[BWDW].[dbo].[DS_tblFactPODtl].[PO_DATE] AS 'Start'
,[BWDW].[dbo].[DS_tblFactPODtl].[PO_STATUS] AS 'Status'
,[BWDW].[dbo].[DS_tblFactPODtl].[LINE_DOCK_DATE] AS 'End'
--,(SELECT CONVERT(DATETIME, CONVERT(CHAR(8), [BWDW].[dbo].[DS_tblFactPODtl].[PO_DATE])) FROM dbo.DS_tblFactPODtl)
FROM [BWDW].[dbo].[DS_tblFactPODtl]
INNER JOIN [BWDW].[dbo].[DS_tblDimWhs] ON [BWDW].[dbo].[DS_tblFactPODtl].WAREHOUSE = [BWDW].[dbo].[DS_tblDimWhs].WAREHOUSE
INNER JOIN [BWDW].[dbo].[DS_tblFactPO] ON [BWDW].[dbo].[DS_tblFactPODtl].PO_NO = [BWDW]. [dbo].[DS_tblFactPO].PO_NO
WHERE [BWDW].[dbo].[DS_tblFactPODtl].[PO_STATUS] = 'Closed'
AND [BWDW].[dbo].[DS_tblFactPODtl].[LINE_DOCK_DATE] <> 0
I have a snippet I saved from a previous project I worked on that needed to only display results from today through another year. That had a bunch of CAST and CONVERTS in it, but I tried the same methodology with no success.
In the long run, I want to add a column to each database table to contain a proper datetime column that is usable in the future... but that is another story. I have read numerous posts on stackoverflow that talk about converting to NUMERIC and such, but nothing out of a NUMERIC back to DATETIME.
Example data:
Start | End | Difference
--------------------------------
20110501 | 20111019 | 171
20120109 | 20120116 | 7
20120404 | 20120911 | 160
Just trying to calculate the difference..
MODIFIED PER AARON:
SELECT
FPODtl.[ITEM_NO] AS [Item]
,FPODtl.[ITEM_DESC] AS [Description]
,D.WHS_SHORT_NAME AS [Warehouse]
,FPODtl.[PO_NO] AS [PO NUMBER]
,FPODtl.[PO_DATE] AS [Start]
,FPODtl.[PO_STATUS] AS [Status]
,FPODtl.[LINE_DOCK_DATE] AS [End]
,DATEDIFF
(
DAY,
CASE WHEN ISDATE(CONVERT(CHAR(8), FPODtl.PO_DATE)) = 1
THEN CONVERT(DATETIME, CONVERT(CHAR(8), FPODtl.PO_DATE)) END,
CASE WHEN ISDATE(CONVERT(CHAR(8), FPODtl.[LINE_DOCK_DATE])) = 1
THEN CONVERT(DATETIME, CONVERT(CHAR(8), FPODtl.[LINE_DOCK_DATE])) END
)
FROM [dbo].[DS_tblFactPODtl] AS FPODtl
INNER JOIN [dbo].[DS_tblDimWhs] AS D
ON FPODtl.WAREHOUSE = D.WAREHOUSE
INNER JOIN [dbo].[DS_tblFactPO] AS FPO
ON FPODtl.PO_NO = FPO.PO_NO
WHERE FPODtl.[PO_STATUS] = 'Closed'
AND FPODtl.[LINE_DOCK_DATE] <> 0;

DECLARE #x NUMERIC(10,0);
SET #x = 20110501;
SELECT CONVERT(DATETIME, CONVERT(CHAR(8), #x));
Result:
2011-05-01 00:00:00.000
To compare two:
DECLARE #x NUMERIC(10,0), #y NUMERIC(10,0);
SELECT #x = 20110501, #y = 20111019;
SELECT DATEDIFF
(
DAY,
CONVERT(DATETIME, CONVERT(CHAR(8), #x)),
CONVERT(DATETIME, CONVERT(CHAR(8), #y))
);
Result:
171
More importantly, fix the table. Stop storing dates as numbers. Store them as dates. If you get errors with this conversion, it's because your poor data choice has allowed bad data into the table. You can get around that - potentially - by writing the old version of TRY_CONVERT():
SELECT DATEDIFF
(
DAY,
CASE WHEN ISDATE(col1)=1 THEN CONVERT(DATETIME, col1) END,
CASE WHEN ISDATE(col2)=1 THEN CONVERT(DATETIME, col2) END
)
FROM
(
SELECT
col1 = CONVERT(CHAR(8), col1),
col2 = CONVERT(CHAR(8), col2)
FROM dbo.table
) AS x;
This will produce nulls for any row where there is garbage in either column. Here is a modification to your original query:
SELECT
[ITEM_NO] -- what table does this come from?
,[ITEM_DESC] -- what table does this come from?
,[HEADER_DUE_DATE] -- what table does this come from?
,D.WHS_SHORT_NAME AS [Warehouse] -- don't use single quotes for aliases!
,FPODtl.[PO_NO] AS [PO NUMBER]
,FPODtl.[PO_DATE] AS [Start]
,FPODtl.[PO_STATUS] AS [Status]
,FPODtl.[LINE_DOCK_DATE] AS [End]
,DATEDIFF
(
DAY,
CASE WHEN ISDATE(CONVERT(CHAR(8), FPODtl.PO_DATE)) = 1
THEN CONVERT(DATETIME, CONVERT(CHAR(8), FPODtl.PO_DATE)) END,
CASE WHEN ISDATE(CONVERT(CHAR(8), FPODtl.[LINE_DOCK_DATE])) = 1
THEN CONVERT(DATETIME, CONVERT(CHAR(8), FPODtl.[LINE_DOCK_DATE])) END
)
FROM [dbo].[DS_tblFactPODtl] AS FPODtl
INNER JOIN [dbo].[DS_tblDimWhs] AS D
ON FPODtl.WAREHOUSE = D.WAREHOUSE
INNER JOIN [dbo].[DS_tblFactPO] AS FPO
ON FPODtl.PO_NO = FPO.PO_NO
WHERE FPODtl.[PO_STATUS] = 'Closed'
AND FPODtl.[LINE_DOCK_DATE] <> 0;

If the date stored as a number is like this: 20130226 for today, then the simpler way to convert to DATE or DATETIME would be:
SELECT CONVERT(DATETIME,CONVERT(VARCHAR(8),NumberDate),112)

Here is a quick formula to create a date from parts :
DateAdd( Month, (( #Year - 1900 ) * 12 ) + #Month - 1, #Day - 1 )
Simply use substrings from your original field to extract #Year, #Month and #Day. For instance, if you have a numeric like 19531231 for december 31th, 1953, you could do :
DateAdd( Month, (( SubString(Cast(DateField As Varchar(8)), 1, 4) - 1900 ) * 12 ) +
SubString(Cast(DateField As Varchar(8)), 5, 2) - 1,
SubString(Cast(DateField As Varchar(8)), 7, 2) - 1 )

Related

Evaluating datetime into timewindows

Im trying to establish for any given datetime a tag that is purely dependent on the time part.
However because the time part is cyclic I cant make it work with simple greater lower than conditions.
I tried a lot of casting and shift one time to 24hour mark to kinda break the cycle However it just gets more and more complicated and still doesnt work.
Im using SQL-Server, here is the situation:
DECLARE #tagtable TABLE (tag varchar(10),[start] time,[end] time);
DECLARE #datetimestable TABLE ([timestamp] datetime)
Insert Into #tagtable (tag, [start], [end])
values ('tag1','04:00:00.0000000','11:59:59.9999999'),
('tag2','12:00:00.0000000','19:59:59.9999999'),
('tag3','20:00:00.0000000','03:59:59.9999999');
Insert Into #datetimestable ([timestamp])
values ('2022-07-24T23:05:23.120'),
('2022-07-27T13:24:40.650'),
('2022-07-26T09:00:00.000');
tagtable:
tag
start
end
tag1
04:00:00.0000000
11:59:59.9999999
tag2
12:00:00.0000000
19:59:59.9999999
tag3
20:00:00.0000000
03:59:59.9999999
for given datetimes e.g. 2022-07-24 23:05:23.120, 2022-07-27 13:24:40.650, 2022-07-26 09:00:00.000
the desired result would be:
date
tag
2022-07-25
tag3
2022-07-27
tag2
2022-07-26
tag1
As I wrote i tried to twist this with casts and adding and datediffs
SELECT
If(Datepart(Hour, a.[datetime]) > 19,
Cast(Dateadd(Day,1,a.[datetime]) as Date),
Cast(a.[datetime] as Date)
) as [date],
b.[tag]
FROM #datetimestable a
INNER JOIN #tagtable b
ON SomethingWith(a.[datetime])
between SomethingWith(b.[start]) and SomethingWith(b.[end])
The only tricky bit here is that your tag time ranges can go over midnight, so you need to check that your time is either between start and end, or if it spans midnight its between start and 23:59:59 or between 00:00:00 and end.
The only other piece is splitting your timestamp column into date and time using a CTE, to save having to repeat the cast.
;WITH splitTimes AS
(
SELECT CAST(timestamp AS DATE) as D,
CAST(timestamp AS TIME) AS T
FROM #datetimestable
)
SELECT
DATEADD(
day,
CASE WHEN b.[end]<b.start THEN 1 ELSE 0 END,
a.D) as timestamp,
b.[tag]
FROM [splitTimes] a
INNER JOIN #tagtable b
ON a.T between b.[start] and b.[end]
OR (b.[end]<b.start AND (a.T BETWEEN b.[start] AND '23:59:59.99999'
OR a.T BETWEEN '00:00:00' AND b.[end]))
Live example: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=506aef05b5a761afaf1f67a6d729446c
Since they're all 8-hour shifts, we can essentially ignore the end (though, generally, trying to say an end time is some specific precision of milliseconds will lead to a bad time if you ever use a different data type (see the first section here) - so if the shift length will change, just put the beginning of the next shift and use >= start AND < end instead of BETWEEN).
;WITH d AS
(
SELECT datetime = [timestamp],
date = CONVERT(datetime, CONVERT(date, [timestamp]))
FROM dbo.datetimestable
)
SELECT date = DATEADD(DAY,
CASE WHEN t.start > t.[end] THEN 1 ELSE 0 END,
CONVERT(date, date)),
t.tag
FROM d
INNER JOIN dbo.tagtable AS t
ON d.datetime >= DATEADD(HOUR, DATEPART(HOUR, t.start), d.date)
AND d.datetime < DATEADD(HOUR, 8, DATEADD(HOUR,
DATEPART(HOUR, t.start), d.date));
Example db<>fiddle
Here's a completely different approach that defines the intervals in terms of starts and durations rather than starts and ends.
This allows the creation of tags that can span multiple days, which might seem like an odd capability to have here, but there might be a use for it if we add some more conditions down the line. For example, say we want to be able say "anything from 6pm friday to 9am monday gets the 'out of hours' tag". Then we could add a day of week predicate to the tag definition, and still use the duration-based interval.
I have defined the duration granularity in terms of hours, but of course this can easily be changed
create table #tags
(
tag varchar(10),
startTimeInclusive time,
durationHours int
);
insert #tags
values ('tag1','04:00:00', 8),
('tag2','12:00:00', 8),
('tag3','20:00:00', 8);
create table #dateTimes (dt datetime)
insert #dateTimes
values ('2022-07-24T23:05:23.120'),
('2022-07-27T13:24:40.650'),
('2022-07-26T09:00:00.000');
select dt.dt,
t.tag
from #datetimes dt
join #tags t on cast(dt.dt as time) >= t.startTimeInclusive
and dt.dt < dateadd
(
hour,
t.durationHours,
cast(cast(dt.dt as date) as datetime) -- strip the time from dt
+ cast(t.startTimeInclusive as datetime) -- add back the time from t
);
Maybe I am looking at this to simple, but,
can't you just take the first tag with an hour greater then your hour in table datetimestable.
With an order by desc it should always give you the correct tag.
This will work well as long as you have no gaps in your tagtable
select case when datepart(hour, tag.tagStart) > 19 then dateadd(day, 1, convert(date, dt.timestamp))
else convert(date, dt.timestamp)
end as [date],
tag.tag
from datetimestable dt
outer apply ( select top 1
tt.tag,
tt.tagStart
from tagtable tt
where datepart(Hour, dt.timestamp) > datepart(hour, tt.tagStart)
order by tt.tagStart desc
) tag
It returns the correct result in this DBFiddle
The result is
date
tag
2022-07-25
tag3
2022-07-27
tag2
2022-07-26
tag1
EDIT
If it is possible that there are gaps in the table,
then I think the most easy and solid solution would be to split that row that passes midnight into 2 rows, and then your query can be very simple
See this DBFiddle
select case when datepart(hour, tag.tagStart) > 19 then dateadd(day, 1, convert(date, dt.timestamp))
else convert(date, dt.timestamp)
end as [date],
tag.tag
from datetimestable dt
outer apply ( select tt.tag,
tt.tagStart
from tagtable tt
where datepart(Hour, dt.timestamp) >= datepart(hour, tt.tagStart)
and datepart(Hour, dt.timestamp) <= datepart(hour, tt.tagEnd)
) tag

Check if date falls within Month and Year range

If a user selects a range such as:
Start: November 2016
End: September 2017,
I want to include all results that fall within the range of 2016-11-01 to 2017-09-30.
I tried concatenating together the year, month, and day, however the issue comes that not all months have the same last day. While I know all months start on day 01, a month's end day can be 28, 29, 30, or 31.
Is there a way to do this without constructing the date? SqlServer 2008 doesn't have the EOMONTH function, and I feel like anything more complex than that is not the right solution. I would like to avoid this:
WHERE
DateCol >= '2016' + '-' + '11' + '-01' AND
DateCol <= '2017' + '-' + '09' + '-30'
It really seems to me that the easiest and best answer is to go from the first of the beginning month to the first of the month after the ending month, and make the second comparison not inclusive.
In other words, instead of this:
WHERE
DateCol >= '2016' + '-' + '11' + '-01' AND
DateCol <= '2017' + '-' + '09' + '-30'
simply this:
WHERE
DateCol >= '2016' + '-' + '11' + '-01' AND
DateCol < '2017' + '-' + '10' + '-01'
There is a faster way to do so :
DECLARE #minDate DATE
DECLARE #maxDate DATE
SET #minDate = XXXXX
SET #maxDate = YYYYY
-- Get the first day of the month minDate.
SET #minDate = CONVERT(datetime,CONVERT(varchar(6),#minDate,112)+'01',112)
-- Get the last day of the month minDate.
SET #maxDate = CONVERT(datetime,CONVERT(varchar(6),#maxDate,112)+'01',112)
SET #maxDate = DATEADD(day, -1, DATEADD(month, 1, #maxDate))
SELECT * FROM myTABLE WHERE DateCol >= #minDate AND DateCol <= #maxDate
Or :
SELECT * FROM myTABLE
WHERE DateCol >= CONVERT(datetime,CONVERT(varchar(6),XXXXX,112)+'01',112)
AND DateCol <= DATEADD(day, -1, DATEADD(month, 1, CONVERT(datetime,CONVERT(varchar(6),YYYYY,112)+'01',112)))
Use syntax like CONVERT(datetime,'20170930',112) or CONVERT(datetime,'09-30-2017',110) for XXXXX and YYYYY rather than '2017-09-30' that use SQL Server implicit convertion from char to datetime (rely on the server configuration : can be hazardous!!!)).
Using this syntax is faster because #minDate and #maxDate do not need any evaluation. So that indexes can be used directly...
Otherwise a scalar function that will simulate the eomonth() behaviour could be usefull...
You could use following select statement to get last date of any month (and any year) by passing a field or date to it:
DECLARE #dtDate DATE
SET #dtDate = '09/25/2016'
SELECT CAST(DATEADD(s,-1,DATEADD(mm, DATEDIFF(m,0,#dtDate)+1,0)) AS DATE) AS LastDay_AnyMonth
Please provide some example data and desired result and I will update my answer further.
Here you go:
DECLARE #YourTable TABLE (YourData DATE);
INSERT INTO #YourTable VALUES
('2016-11-01'),
('2016-09-05'),
('2017-03-03'),
('2017-11-11'),
('2017-12-14'),
('2017-09-30');
WITH CTE AS (
SELECT YourData
FROM #YourTable
WHERE YEAR(YourData) =2016 AND MONTH (YourData) >= 11
)
SELECT YourData
FROM #YourTable
WHERE YEAR(YourData) =2017 AND MONTH (YourData) <= 9
UNION ALL
SELECT YourData
FROM CTE;
There is no need to know the end of the month (28 or 30 or 31).
For 2008, you can simply convert the string
Example
Select Date1=convert(date,'November 2016')
,Date2=dateadd(DAY,-1,dateadd(MONTH,1,convert(date,'September 2017')))
Returns
Date1 Date2
2016-11-01 2017-09-30
So the WHERE would be somthing like this
...
Where DateCol between convert(date,'November 2016')
and dateadd(DAY,-1,dateadd(MONTH,1,convert(date,'September 2017')))
A useful construct for performing date-based operations is to make use of a Date Dimension table. You are creating a lookup table that is populated with a lot of information about dates over a large span of time. You can then query the table based on the information that you do have. The table is small enough so that it does not impose significant performance concerns.
In your particular case, you have the month and year. You would plug that into the date dimension table to get the first of the month from the beginning month and the last of the month from the ending month. You now have a time range to search over without any complex logic or calculations on the fly.
Aaron Bertrand explains it in depth here: https://www.mssqltips.com/sqlservertip/4054/creating-a-date-dimension-or-calendar-table-in-sql-server/

Set based solution for processing rows in a SQL table

Can someone steer me in the right direction for solving this issue with a set-based solution versus cursor-based?
Given a table with the following rows:
Date Value
2013-11-01 12
2013-11-12 15
2013-11-21 13
2013-12-01 0
I need a query that will give me a row for each date between 2013-11-1 and 2013-12-1, as follows:
2013-11-01 12
2013-11-02 12
2013-11-03 12
...
2013-11-12 15
2013-11-13 15
2013-11-14 15
...
2013-11-21 13
2013-11-21 13
...
2013-11-30 13
2013-11-31 13
Any advice and/or direction will be appreciated.
The first thing that came to my mind was to fill in the missing dates by looking at the day of the year. You can do this by joining to the spt_values table in the master DB and adding the number to the first day of the year.
DECLARE #Table AS TABLE(ADate Date, ANumber Int);
INSERT INTO #Table
VALUES
('2013-11-01',12),
('2013-11-12',15),
('2013-11-21',13),
('2013-12-01',0);
SELECT
DateAdd(D, v.number, MinDate) Date
FROM (SELECT number FROM master.dbo.spt_values WHERE name IS NULL) v
INNER JOIN (
SELECT
Min(ADate) MinDate
,DateDiff(D, Min(ADate), Max(ADate)) DaysInSpan
,Year(Min(ADate)) StartYear
FROM #Table
) dates ON v.number BETWEEN 0 AND DaysInSpan - 1
Next I would wrap that to make a derived table, and add a subquery to get the most recent number. Your end result may look something like:
DECLARE #Table AS TABLE(ADate Date, ANumber Int);
INSERT INTO #Table
VALUES
('2013-11-01',12),
('2013-11-12',15),
('2013-11-21',13),
('2013-12-01',0);
-- Uncomment the following line to see how it behaves when the date range spans a year end
--UPDATE #Table SET ADate = DateAdd(d, 45, ADate)
SELECT
AllDates.Date
,(SELECT TOP 1 ANumber FROM #Table t WHERE t.ADate <= AllDates.Date ORDER BY ADate DESC)
FROM (
SELECT
DateAdd(D, v.number, MinDate) Date
FROM
(SELECT number FROM master.dbo.spt_values WHERE name IS NULL) v
INNER JOIN (
SELECT
Min(ADate) MinDate
,DateDiff(D, Min(ADate), Max(ADate)) DaysInSpan
,Year(Min(ADate)) StartYear
FROM #Table
) dates ON v.number BETWEEN 0 AND DaysInSpan - 1
) AllDates
Another solution, not sure how it compares to the two already posted performance wise but it's a bit more concise:
Uses a numbers table:
Linky
Query:
DECLARE #SDATE DATETIME
DECLARE #EDATE DATETIME
DECLARE #DAYS INT
SET #SDATE = '2013-11-01'
SET #EDATE = '2013-11-29'
SET #DAYS = DATEDIFF(DAY,#SDATE, #EDATE)
SELECT Num, DATEADD(DAY,N.Num,#SDATE), SUB.[Value]
FROM Numbers N
LEFT JOIN MyTable M ON DATEADD(DAY,N.Num,#SDATE) = M.[Date]
CROSS APPLY (SELECT TOP 1 [Value]
FROM MyTable M2
WHERE [Date] <= DATEADD(DAY,N.Num,#SDATE)
ORDER BY [Date] DESC) SUB
WHERE N.Num <= #DAYS
--
SQL Fiddle
It's possible, but neither pretty nor very performant at scale:
In addition to your_table, you'll need to create a second table/view dates containing every date you'd ever like to appear in the output of this query. For your example it would need to contain at least 2013-11-01 through 2013-12-01.
SELECT m.date, y.value
FROM your_table y
INNER JOIN (
SELECT md.date, MAX(my.date) AS max_date
FROM dates md
INNER JOIN your_table my ON md.date >= my.date
GROUP BY md.date
) m
ON y.date = m.max_date

Normalizing a table

I have a legacy table, which I can't change.
The values in it can be modified from legacy application (application also can't be changed).
Due to a lot of access to the table from new application (new requirement), I'd like to create a temporary table, which would hopefully speed up the queries.
The actual requirement, is to calculate number of business days from X to Y. For example, give me all business days from Jan 1'st 2001 until Dec 24'th 2004. The table is used to mark which days are off, as different companies may have different days off - it isn't just Saturday + Sunday)
The temporary table would be created from a .NET program, each time user enters the screen for this query (user may run query multiple times, with different values, table is created once), so I'd like it to be as fast as possible. Approach below runs in under a second, but I only tested it with a small dataset, and still it takes probably close to half a second, which isn't great for UI - even though it's just the overhead for first query.
The legacy table looks like this:
CREATE TABLE [business_days](
[country_code] [char](3) ,
[state_code] [varchar](4) ,
[calendar_year] [int] ,
[calendar_month] [varchar](31) ,
[calendar_month2] [varchar](31) ,
[calendar_month3] [varchar](31) ,
[calendar_month4] [varchar](31) ,
[calendar_month5] [varchar](31) ,
[calendar_month6] [varchar](31) ,
[calendar_month7] [varchar](31) ,
[calendar_month8] [varchar](31) ,
[calendar_month9] [varchar](31) ,
[calendar_month10] [varchar](31) ,
[calendar_month11] [varchar](31) ,
[calendar_month12] [varchar](31) ,
misc.
)
Each month has 31 characters, and any day off (Saturday + Sunday + holiday) is marked with X. Each half day is marked with an 'H'. For example, if a month starts on a Thursday, than it will look like (Thursday+Friday workdays, Saturday+Sunday marked with X):
' XX XX ..'
I'd like the new table to look like so:
create table #Temp (country varchar(3), state varchar(4), date datetime, hours int)
And I'd like to only have rows for days which are off (marked with X or H from previous query)
What I ended up doing, so far is this:
Create a temporary-intermediate table, that looks like this:
create table #Temp_2 (country_code varchar(3), state_code varchar(4), calendar_year int, calendar_month varchar(31), month_code int)
To populate it, I have a union which basically unions calendar_month, calendar_month2, calendar_month3, etc.
Than I have a loop which loops through all the rows in #Temp_2, after each row is processed, it is removed from #Temp_2.
To process the row there is a loop from 1 to 31, and substring(calendar_month, counter, 1) is checked for either X or H, in which case there is an insert into #Temp table.
[edit added code]
Declare #country_code char(3)
Declare #state_code varchar(4)
Declare #calendar_year int
Declare #calendar_month varchar(31)
Declare #month_code int
Declare #calendar_date datetime
Declare #day_code int
WHILE EXISTS(SELECT * From #Temp_2) -- where processed = 0)
BEGIN
Select Top 1 #country_code = t2.country_code, #state_code = t2.state_code, #calendar_year = t2.calendar_year, #calendar_month = t2.calendar_month, #month_code = t2.month_code From #Temp_2 t2 -- where processed = 0
set #day_code = 1
while #day_code <= 31
begin
if substring(#calendar_month, #day_code, 1) = 'X'
begin
set #calendar_date = convert(datetime, (cast(#month_code as varchar) + '/' + cast(#day_code as varchar) + '/' + cast(#calendar_year as varchar)))
insert into #Temp (country, state, date, hours) values (#country_code, #state_code, #calendar_date, 8)
end
if substring(#calendar_month, #day_code, 1) = 'H'
begin
set #calendar_date = convert(datetime, (cast(#month_code as varchar) + '/' + cast(#day_code as varchar) + '/' + cast(#calendar_year as varchar)))
insert into #Temp (country, state, date, hours) values (#country_code, #state_code, #calendar_date, 4)
end
set #day_code = #day_code + 1
end
delete from #Temp_2 where #country_code = country_code AND #state_code = state_code AND #calendar_year = calendar_year AND #calendar_month = calendar_month AND #month_code = month_code
--update #Temp_2 set processed = 1 where #country_code = country_code AND #state_code = state_code AND #calendar_year = calendar_year AND #calendar_month = calendar_month AND #month_code = month_code
END
I am not an expert in SQL, so I'd like to get some input on my approach, and maybe even a much better approach suggestion.
After having the temp table, I'm planning to do (dates would be coming from a table):
select cast(convert(datetime, ('01/31/2012'), 101) -convert(datetime, ('01/17/2012'), 101) as int) - ((select sum(hours) from #Temp where date between convert(datetime, ('01/17/2012'), 101) and convert(datetime, ('01/31/2012'), 101)) / 8)
Besides the solution of normalizing the table, the other solution I implemented for now, is a function which does all this logic of getting the business days by scanning the current table. It runs pretty fast, but I'm hesitant to call a function, if I can instead add a simpler query to get result.
(I'm currently trying this on MSSQL, but I would need to do same for Sybase ASE and Oracle)
This should fulfill the requirement, "...calculate number of business days from X to Y."
It counts each space as a business day and anything other than an X or a space as a half day (should just be H, according to the OP).
I pulled this off in SQL Server 2008 R2:
-- Calculate number of business days from X to Y
declare #start date = '20120101' -- X
declare #end date = '20120101' -- Y
-- Outer query sums the length of the full_year text minus non-work days
-- Spaces are doubled to help account for half-days...then divide by two
select sum(datalength(replace(replace(substring(full_year, first_day, last_day - first_day + 1), ' ', ' '), 'X', '')) / 2.0) as number_of_business_days
from (
select
-- Get substring start value for each year
case
when calendar_year = datepart(yyyy, #start) then datepart(dayofyear, #start)
else 1
end as first_day
-- Get substring end value for each year
, case
when calendar_year = datepart(yyyy, #end) then datepart(dayofyear, #end)
when calendar_year > datepart(yyyy, #end) then 0
when calendar_year < datepart(yyyy, #start) then 0
else datalength(full_year)
end as last_day
, full_year
from (
select calendar_year
-- Get text representation of full year
, calendar_month
+ calendar_month2
+ calendar_month3
+ calendar_month4
+ calendar_month5
+ calendar_month6
+ calendar_month7
+ calendar_month8
+ calendar_month9
+ calendar_month10
+ calendar_month11
+ calendar_month12 as full_year
from business_days
-- where country_code = 'USA' etc.
) as get_year
) as get_days
A where clause can go on the inner-most query.
It is not an un-pivot of the legacy format, which the OP spends much time on and which will probably take more (and possibly unnecessary) computing cycles. I'm assuming such a thing was "nice to see" rather than part of the requirements. Jeff Moden has great articles on how a tally table could help in that case (for SQL Server, anyway).
It might be necessary to watch trailing spaces depending upon how a particular DBMS is set (notice that I'm using datalength and not len).
UPDATE: Added the OP's requested temp table:
select country_code
, state_code
, dateadd(d, t.N - 1, cast(cast(a.calendar_year as varchar(8)) as date)) as calendar_date
, case substring(full_year, t.N, 1) when 'X' then 0 when 'H' then 4 else 8 end as business_hours
from (
select country_code
, state_code
, calendar_year
, calendar_month
+ calendar_month2
+ calendar_month3
+ calendar_month4
+ calendar_month5
+ calendar_month6
+ calendar_month7
+ calendar_month8
+ calendar_month9
+ calendar_month10
+ calendar_month11
+ calendar_month12
as full_year
from business_days
) as a, (
select a.N + b.N * 10 + c.N * 100 + 1 as N
from (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) a
, (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) b
, (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) c
) as t -- cross join with Tally table built on the fly
where t.N <= datalength(a.full_year)
Given your temp table is slow to create, are you able to pre-calculate it?
If you're able to put a trigger on the existing table, perhaps you could fire a proc which will drop and create the temp table. Or have an agent job which checks to see if the existing table has been updated (raise a flag somewhere) and then recomputes the temp table.
The existing table's structure is so woeful that I wouldn't be surprised if it will always be expensive to normalize it. Pre-calculating is an easy and simple way around that problem.

SQL Query to return 24 hour, hourly count even when no values exist?

I've written a query that groups the number of rows per hour, based on a given date range.
SELECT CONVERT(VARCHAR(8),TransactionTime,101) + ' ' + CONVERT(VARCHAR(2),TransactionTime,108) as TDate,
COUNT(TransactionID) AS TotalHourlyTransactions
FROM MyTransactions WITH (NOLOCK)
WHERE TransactionTime BETWEEN CAST(#StartDate AS SMALLDATETIME) AND CAST(#EndDate AS SMALLDATETIME)
AND TerminalId = #TerminalID
GROUP BY CONVERT(VARCHAR(8),TransactionTime,101) + ' ' + CONVERT(VARCHAR(2),TransactionTime,108)
ORDER BY TDate ASC
Which displays something like this:
02/11/20 07 4
02/11/20 10 1
02/11/20 12 4
02/11/20 13 1
02/11/20 14 2
02/11/20 16 3
Giving the number of transactions and the given hour of the day.
How can I display all hours of the day - from 0 to 23, and show 0 for those which have no values?
Thanks.
UPDATE
Using the tvf below works for me for one day, however I'm not sure how to make it work for a date range.
Using the temp table of 24 hours:
-- temp table to store hours of the day
DECLARE #tmp_Hours TABLE ( WhichHour SMALLINT )
DECLARE #counter SMALLINT
SET #counter = -1
WHILE #counter < 23
BEGIN
SET #counter = #counter + 1
--print
INSERT INTO #tmp_Hours
( WhichHour )
VALUES ( #counter )
END
SELECT MIN(CONVERT(VARCHAR(10),[dbo].[TerminalTransactions].[TransactionTime],101)) AS TDate, [#tmp_Hours].[WhichHour], CONVERT(VARCHAR(2),[dbo].[TerminalTransactions].[TransactionTime],108) AS TheHour,
COUNT([dbo].[TerminalTransactions].[TransactionId]) AS TotalTransactions,
ISNULL(SUM([dbo].[TerminalTransactions].[TransactionAmount]), 0) AS TransactionSum
FROM [dbo].[TerminalTransactions] RIGHT JOIN #tmp_Hours ON [#tmp_Hours].[WhichHour] = CONVERT(VARCHAR(2),[dbo].[TerminalTransactions].[TransactionTime],108)
GROUP BY [#tmp_Hours].[WhichHour], CONVERT(VARCHAR(2),[dbo].[TerminalTransactions].[TransactionTime],108), COALESCE([dbo].[TerminalTransactions].[TransactionAmount], 0)
Gives me a result of:
TDate WhichHour TheHour TotalTransactions TransactionSum
---------- --------- ------- ----------------- ---------------------
02/16/2010 0 00 4 40.00
NULL 1 NULL 0 0.00
02/14/2010 2 02 1 10.00
NULL 3 NULL 0 0.00
02/14/2010 4 04 28 280.00
02/14/2010 5 05 11 110.00
NULL 6 NULL 0 0.00
02/11/2010 7 07 4 40.00
NULL 8 NULL 0 0.00
02/24/2010 9 09 2 20.00
So how can I get this to group properly?
The other issue is that for some days there will be no transactions, and these days also need to appear.
Thanks.
You do this by building first the 23 hours table, the doing an outer join against the transactions table. I use, for same purposes, a table valued function:
create function tvfGetDay24Hours(#date datetime)
returns table
as return (
select dateadd(hour, number, cast(floor(cast(#date as float)) as datetime)) as StartHour
, dateadd(hour, number+1, cast(floor(cast(#date as float)) as datetime)) as EndHour
from master.dbo.spt_values
where number < 24 and type = 'p');
Then I can use the TVF in queries that need to get 'per-hour' basis data, even for missing intervals in the data:
select h.StartHour, t.TotalHourlyTransactions
from tvfGetDay24Hours(#StartDate) as h
outer apply (
SELECT
COUNT(TransactionID) AS TotalHourlyTransactions
FROM MyTransactions
WHERE TransactionTime BETWEEN h.StartHour and h.EndHour
AND TerminalId = #TerminalID) as t
order by h.StartHour
Updated
Example of a TVF that returns 24hours between any arbitrary dates:
create function tvfGetAnyDayHours(#dateFrom datetime, #dateTo datetime)
returns table
as return (
select dateadd(hour, number, cast(floor(cast(#dateFrom as float)) as datetime)) as StartHour
, dateadd(hour, number+1, cast(floor(cast(#dateFrom as float)) as datetime)) as EndHour
from master.dbo.spt_values
where type = 'p'
and number < datediff(hour,#dateFrom, #dateTo) + 24);
Note that since master.dbo.spt_values contains only 2048 numbers, the function will not work between dates further apart than 2048 hours.
You have just discovered the value of the NUMBERS table. You need to create a table with a single column containing the numbers 0 to 23 in it. Then you join again this table using an OUTER join to ensure you always get 24 rows returned.
So going back to using Remus' original function, I've re-used it in a recursive call and storing the results in a temp table:
DECLARE #count INT
DECLARE #NumDays INT
DECLARE #StartDate DATETIME
DECLARE #EndDate DATETIME
DECLARE #CurrentDay DATE
DECLARE #tmp_Transactions TABLE
(
StartHour DATETIME,
TotalHourlyTransactions INT
)
SET #StartDate = '2000/02/10'
SET #EndDate = '2010/02/13'
SET #count = 0
SET #NumDays = DateDiff(Day, #StartDate, #EndDate)
WHILE #count < #NumDays
BEGIN
SET #CurrentDay = DateAdd(Day, #count, #StartDate)
INSERT INTO #tmp_Transactions (StartHour, TotalHourlyTransactions)
SELECT h.StartHour ,
t.TotalHourlyTransactions
FROM tvfGetDay24Hours(#CurrentDay) AS h
OUTER APPLY ( SELECT COUNT(TransactionID) AS TotalHourlyTransactions
FROM [dbo].[TerminalTransactions]
WHERE TransactionTime BETWEEN h.StartHour AND h.EndHour
AND TerminalId = 4
) AS t
ORDER BY h.StartHour
SET #count = #Count + 1
END
SELECT *
FROM #tmp_Transactions
group by datepart('hour', thetime). to show those hours with no values you'd have to left join a table of times against the grouping (coalesce(transaction.amount, 0))
I've run into a version of this problem before. The suggestion that worked the best was to setup a table (temporary, or not) with the hours of the day, then do an outer join to that table and group by datepart('h', timeOfRecord).
I don't remember why, but probably due to lack of flexibility because of the need for the other table, I ended up using a method where I group by whatever datepart I want and order by the datetime, then loop through and fill any spaces that are skipped with a 0. This approach worked well for me because I'm not reliant on the database to do all my work for me, and it's also MUCH easier to write an automated test for it.
Step 1, Create #table or a CTE to generate a hours days table. Outer loop for days and inner loop hours 0-23. This should be 3 columns Date, Days, Hours.
Step 2, Write your main query to also have days and hours columns and alias it so you can join it. CTE's have to be above this main query and pivots should be inside CTE's for it to work naturally.
Step 3, Do a select from step 1 table and Left join this Main Query table
ON A.[DATE] = B.[DATE]
AND A.[HOUR] = B.[HOUR]
You can also create a order by if your date columns like
ORDER BY substring(CONVERT(VARCHAR(15), A.[DATE], 105),4,2)
Guidlines
This will then give you all data for hours and days and including zeros for hours with no matches to do that use isnull([col1],0) as [col1].
You can now graph facts against days and hours.

Resources