Counting records by year and month including zero counts - database

I am using SQL Server Compact Edition and I want to count the number of comments per month for a given tutorial within a date range, including months that have a count of zero. I know I need to join a "calendar" table to my table to account for the missing months, but I need help implementing this correctly.
I have a table of all the comments from different tutorials. This table is called Comments and the columns I need are [Tutorial] (nvarchar) and [DateAdded] (DateTime).
Tutorial | DateAdded
---------+-------------
sample | 2013-09-02
sample | 2013-09-04
sample | 2013-09-12
sample | 2013-09-12
example | 2013-09-15
sample | 2013-09-16
sample | 2013-09-21
sample | 2013-09-30
sample | 2013-10-01
sample | 2013-11-11
sample | 2013-11-11
example | 2013-11-14
sample | 2013-11-15
sample | 2013-11-19
sample | 2013-11-21
sample | 2013-11-25
sample | 2014-02-04
sample | 2014-02-06
And I have a Calendar table which has a year and month column like so:
Year | Month
-----+------
2000 | 01
2000 | 02
. | .
. | .
. | .
2099 | 12
If I were looking for the monthly count of the 'sample' comments from the past year (as of Feb. 14th, 2014), then the ideal output would be:
Tutorial | Year | Month | Count
---------+------+-------+------
sample | 2013 | 09 | 7
sample | 2013 | 10 | 1
sample | 2013 | 11 | 6
sample | 2013 | 12 | 0
sample | 2014 | 01 | 0
sample | 2014 | 02 | 2
I was able to figure out how to do the following query, but I need the months that do not have comments to return 0 as well.
SELECT
Tutorial,
datepart(year, DateAdded) AS Year,
datepart(month, DateAdded) AS Month,
COUNT(*) AS Count From Comments
WHERE
DateAdded > DATEADD(year,-1,GETDATE())
AND
Tutorial='sample'
GROUP BY
Tutorial,
datepart(year, DateAdded),
datepart(month, DateAdded)
Output using sample data from above.
Tutorial | Year | Month | Count
---------+------+-------+------
sample | 2013 | 09 | 7
sample | 2013 | 10 | 1
sample | 2013 | 11 | 6
sample | 2014 | 02 | 2
I know I need to join the tables, but I can't seem to figure out which join to use or how to implement it correctly. Please keep in mind that this is for SQL Server CE, so not all commands from SQL Server can be used.
Thanks so much in advance!

If you have a Calendar table with Month and Year you should try something like
-- assumes Calendar.[Year] and Calendar.[Month] are character columns (e.g. '2013', '09')
SELECT t2.Tutorial, t1.[Month], t1.[Year], COALESCE(t2.[Count], 0) AS Result
FROM Calendar AS t1 LEFT JOIN (
SELECT
Tutorial,
CONVERT(NCHAR(6), DateAdded, 112) AS tutDate,
COUNT(*) AS [Count] FROM Comments
WHERE
DateAdded > DATEADD(year,-1,GETDATE())
AND
Tutorial='sample'
GROUP BY
Tutorial,
CONVERT(NCHAR(6), DateAdded, 112)
) AS t2
ON (t1.[Year] + t1.[Month]) = t2.tutDate
ORDER BY t1.[Year] + t1.[Month]
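A variation on the same idea, as a rough sketch only: it assumes Calendar.[Year] and Calendar.[Month] are integer columns (or convert implicitly to integers), which the question does not confirm, and it has not been tested on SQL Server CE. It joins Comments directly to Calendar, restricts Calendar to the months between one year ago and today, and counts a column from Comments so that months without a match come back as 0.

```sql
SELECT
    cal.[Year],
    cal.[Month],
    COUNT(c.DateAdded) AS [Count]          -- counts only matched rows, so empty months give 0
FROM Calendar AS cal
LEFT JOIN Comments AS c
    ON  c.Tutorial = 'sample'
    AND DATEPART(year,  c.DateAdded) = cal.[Year]
    AND DATEPART(month, c.DateAdded) = cal.[Month]
    AND c.DateAdded > DATEADD(year, -1, GETDATE())
WHERE
    -- keep only calendar months inside the reporting window (yyyymm as a number)
    cal.[Year] * 100 + cal.[Month]
        BETWEEN DATEPART(year, DATEADD(year, -1, GETDATE())) * 100
              + DATEPART(month, DATEADD(year, -1, GETDATE()))
            AND DATEPART(year, GETDATE()) * 100
              + DATEPART(month, GETDATE())
GROUP BY cal.[Year], cal.[Month]
ORDER BY cal.[Year], cal.[Month]
```

The filters on Comments sit in the ON clause rather than the WHERE clause. Putting them in WHERE would discard the unmatched calendar rows and turn the LEFT JOIN back into an inner join.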

What follows is a standalone script you can use to try things out and not touch any of your real database objects in production. The bottom third of the code contains the help with the joins you're looking for.
SQL Server CE will allow you to write a stored procedure, which can in turn be used as the source of a report. Stored procs are nice because they can take input parameters, something that is ideal for doing reporting.
-- create dummy Comments table for prototyping
create table #Comments (
ID int identity(1,1) not null,
Tutorial nvarchar(50) not null,
DateAdded datetime not null,
primary key clustered(DateAdded,ID,Tutorial)
);
-- populate dummy Comments table
declare @startDate datetime = '2000-01-01';
declare @endDate datetime = '2014-02-14';
declare @numTxns int = 5000;
set nocount on;
declare @numDays int = cast(@endDate as int) - cast(@startDate as int) + 1;
declare @i int = 1;
declare @j int = @i + @numTxns;
declare @rnd float;
while @i <= @j
begin
set @rnd = RAND();
insert into #Comments (Tutorial,DateAdded)
select
-- random tutorial titles
coalesce (
case when @rnd < .25 then 'foo' else null end,
case when @rnd between .5 and .75 then 'baz' else null end,
case when @rnd > .75 then 'qux' else null end,
'bar'
) as Tutorial,
-- random dates between @startDate and @endDate
cast(cast(rand() * @numDays + @startDate as int) as datetime) as DateAdded
set @i = @i + 1
end;
-- try deleting some months to see what happens
delete from #Comments
where DateAdded between '2013-11-01' and '2013-11-30'
or DateAdded between '2014-01-01' and '2014-01-31';
set nocount off;
go
-- ### following could easily be rewritten as a stored procedure
-- stored procedure parameters
declare @startDate datetime = '2000-01-01';
declare @endDate datetime = '2014-03-31';
-- pick only one option below
--declare @Tutorial nvarchar(50) = 'foo'; -- this only gets data for Tutorials called 'foo'
declare @Tutorial nvarchar(50) = 'all'; -- this gets data for all tutorials
-- begin stored procedure code
set nocount on;
-- this temp table is an alternative to
-- creating ***and maintaining*** a table full of dates,
-- months, etc., and cluttering up your database
-- in production, it will automatically delete itself
-- once the user has completed running the report.
create table #dates (
DateAdded datetime not null,
YearAdded int null,
MonthAdded int null,
primary key clustered (DateAdded)
);
-- now we put dates into #dates table
-- based on the parameters supplied by
-- the user running the report
declare @date datetime = @startDate;
while @date <= @endDate
begin
insert into #dates
select @date, YEAR(@date), MONTH(@date);
set @date = @date + 1;
end;
-- ## Why put every day of the month in this table?
-- ## I asked for a monthly report, not daily!
-- Yes, but looping through dates is easier, simply add 1 for the next date.
-- You can always build a monthly summary table later if you'd like.
-- This *is* kind of a brute-force solution, but easy to write.
-- More answers to this question in the code below, where they'll make more sense.
set nocount off;
-- now we return the data to the user
-- any month with no Tutorials will still show up in the report
-- but the counts will show as zero
select YearAdded, MonthAdded, SUM(Count_From_Comments) as Count_From_Comments,
SUM(foo) as Count_Foo, SUM(bar) as Count_Bar,
SUM(baz) as Count_Baz, SUM(qux) as Count_Qux
from (
-- ## you can reuse the following code for a detail report by day
-- ## another answer to 'Why not by month?' from above
-- start daily report code
select t1.DateAdded, t1.YearAdded, t1.MonthAdded, t2.Tutorial,
coalesce(Count_From_Comments,0) as Count_From_Comments,
case when t2.Tutorial = 'foo' then 1 else 0 end as foo,
case when t2.Tutorial = 'bar' then 1 else 0 end as bar,
case when t2.Tutorial = 'baz' then 1 else 0 end as baz,
case when t2.Tutorial = 'qux' then 1 else 0 end as qux
from #dates as t1 -- no where clause needed because #dates only contains the ones we want
left join ( -- left join here so that we get all dates, not just ones in #Comments
select *, 1 AS Count_From_Comments
from #Comments
where @Tutorial in (Tutorial,'all')
) as t2
on t1.DateAdded = t2.DateAdded -- ## join on one field instead of two, another answer to 'Why not by month?' from above
-- end daily report code
) as qDetail
group by YearAdded, MonthAdded
order by YearAdded, MonthAdded
-- end stored procedure code
go
-- ## Not required in production code,
-- ## but handy when testing this script.
drop table #dates;
-- #### Since this will be a real table in production
-- #### we definitely only want this for testing!
drop table #Comments;
go
Happy coding.

Related

Create a select statement that returns a record for each day after a given created date

I have a dimension table containing machines.
Each machine has a date-created value.
I would like a SELECT statement that returns, for each day after a certain start date, the number of available machines. A machine is available from its created date onwards.
As I have read-only access to the database, I am not able to create a physical calendar table.
I hope somebody can help me solve this issue.
I assume this is what you want. Based on this sample table:
USE tempdb;
GO
CREATE TABLE dbo.Machines
(
MachineID int,
CreatedDate date
);
INSERT dbo.Machines VALUES(1,'20200104'),(2,'20200202'),(3,'20200214');
Then say you wanted the number of active machines starting on January 1st:
DECLARE @StartDate date = '20200101';
;WITH x AS
(
SELECT n = 0 UNION ALL SELECT n + 1 FROM x
WHERE n < DATEDIFF(DAY, @StartDate, GETDATE())
),
days(d) AS
(
SELECT DATEADD(DAY, x.n, @StartDate) FROM x
)
)
SELECT days.d, MachineCount = COUNT(m.MachineID)
FROM days
LEFT OUTER JOIN dbo.Machines AS m
ON days.d >= m.CreatedDate
GROUP BY days.d
ORDER BY days.d
OPTION (MAXRECURSION 0);
Results:
d MachineCount
---------- ------------
2020-01-01 0
2020-01-02 0
2020-01-03 0
2020-01-04 1
2020-01-05 1
...
2020-01-31 1
2020-02-01 1
2020-02-02 2
2020-02-03 2
...
2020-02-12 2
2020-02-13 2
2020-02-14 3
2020-02-15 3
Clean up:
DROP TABLE dbo.Machines;
(Yes, some people hiss at recursive CTEs. You can replace it with any number of set generation techniques, some I talk about here, here, and here.)
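One possible non-recursive replacement, as a sketch rather than a recommendation: it numbers rows from a cross join of sys.all_objects with itself, which is used here only as a convenient row source that a read-only login can normally query. It assumes dbo.Machines from the sample above still exists and that the start date is not in the future.

```sql
DECLARE @StartDate date = '20200101';

;WITH days(d) AS
(
    -- one row per day from @StartDate through today, no recursion needed
    SELECT TOP (DATEDIFF(DAY, @StartDate, GETDATE()) + 1)
           DATEADD(DAY,
                   CONVERT(int, ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) - 1,
                   @StartDate)
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
SELECT days.d, MachineCount = COUNT(m.MachineID)
FROM days
LEFT OUTER JOIN dbo.Machines AS m
    ON days.d >= m.CreatedDate
GROUP BY days.d
ORDER BY days.d;
```

Because nothing recurses, the OPTION (MAXRECURSION 0) hint is no longer needed.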

How can I replace duplicate strings with increasing order in T-SQL?

I have a single row table:
Id | Description
---------------
1 #Hello#, Its 5 am. #Hello#, Its 9 am. #Hello# its 12 pm.
I want to replace the repeated #Hello# strings with numbered versions in increasing order. I need output like:
Id | Description
---------------
1 #Hello#, Its 5 am. #Hello1#, Its 9 am. #Hello2# its 12 pm
Try this one,
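DECLARE @V_STR NVARCHAR(1000) = (SELECT [Description] FROM [Table1])
,@V_COUNT INT = 1
,@V_TMP NVARCHAR(100) = '#Hello#'
,@V_POS INT
-- skip the first occurrence, which keeps its original text
SET @V_POS = CHARINDEX(@V_TMP, @V_STR, CHARINDEX(@V_TMP, @V_STR) + 1)
WHILE (@V_POS > 0)
BEGIN
-- replace the occurrence at @V_POS with its numbered form, then look for the next one
SELECT @V_STR = STUFF(@V_STR, @V_POS, LEN(@V_TMP), '#Hello' + CAST(@V_COUNT AS NVARCHAR(10)) + '#')
SET @V_COUNT += 1
SET @V_POS = CHARINDEX(@V_TMP, @V_STR, @V_POS + 1)
END
SELECT @V_STR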

SQL Server Pivot Table with multiple column with dates

I have a PIVOT situation.
Source table columns:
Title Description Datetime RecordsCount
A California 2015-07-08 10:44:39.040 5
A California 2015-07-08 12:44:39.040 6
A California 2015-05-08 15:44:39.040 3
B Florida 2015-07-08 16:44:39.040 2
B Florida 2015-05-08 19:44:39.040 4
Now I need this pivoted as
2015-07-08 2015-05-08
Title Description
A California 11 3
B Florida 2 4
If there are two record counts on the same date (regardless of the time), sum them; otherwise display them in separate columns.
Trying to write something like this, but it throws errors.
Select * from #DataQualTest
PIVOT (SUM(RecordCount) FOR DateTime IN (Select Datetime from #DataQualTest) )
AS Pivot_Table
Please help me out with this.
Thanks
Not exactly the word for word solution but this should give you a direction.
create table #tmp
(
country varchar(max)
, date1 datetime
, record int
)
insert into #tmp values ('California', '2010-01-01', 2)
insert into #tmp values ('California', '2010-01-01', 5)
insert into #tmp values ('California', '2012-01-01', 1)
insert into #tmp values ('Florida', '2010-01-01', 3)
insert into #tmp values ('Florida', '2010-01-01', 5)
select * from #tmp
pivot (sum(record) for date1 in ([2010-01-01], [2012-01-01])) as avg
output
country 2010-01-01 2012-01-01
California 7 1
Florida 8 NULL
If you want to be more flexible, you need some pre-processing to get from full timestamps to days (in order for later on the PIVOT's grouping to actually have the anticipated effect):
CREATE VIEW DataQualTestView AS
SELECT
title
, description
, DATEFROMPARTS (DATEPART(yyyy, date_time),
DATEPART(mm, date_time),
DATEPART(dd, date_time)) AS day_from_date_time
, recordsCount
FROM DataQualTest
;
From there you could continue:
DECLARE @query AS NVARCHAR(MAX)
DECLARE @columns AS NVARCHAR(MAX)
SELECT @columns = ISNULL(@columns + ',' , '')
+ QUOTENAME(day_from_date_time)
FROM (SELECT DISTINCT
day_from_date_time
FROM DataQualTestView) AS TheDays
SET @query =
N'SELECT
title
, description
, ' + @columns + '
FROM DataQualTestView
PIVOT(SUM(recordsCount)
FOR day_from_date_time IN (' + @columns + ')) AS Pivoted'
EXEC SP_EXECUTESQL @query
GO
... and would get:
| title | description | 2015-05-08 | 2015-07-08 |
|-------|-------------|------------|------------|
| A | California | 3 | 11 |
| B | Florida | 4 | 2 |
See it in action: SQL Fiddle.
Please comment, if and as this requires adjustment / further detail.
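When the set of dates is known in advance, the same result can also be sketched without PIVOT, using conditional aggregation. This assumes the table and column names from the question (#DataQualTest, Title, Description, [Datetime], RecordsCount) and hard-codes the two dates from the sample data, so it illustrates the idea and does not replace the dynamic version above.

```sql
SELECT
    Title,
    Description,
    -- strip the time portion, then add up the counts that fall on each day
    SUM(CASE WHEN CAST([Datetime] AS date) = '20150708' THEN RecordsCount ELSE 0 END) AS [2015-07-08],
    SUM(CASE WHEN CAST([Datetime] AS date) = '20150508' THEN RecordsCount ELSE 0 END) AS [2015-05-08]
FROM #DataQualTest
GROUP BY Title, Description
ORDER BY Title;
```

Each output column is an ordinary aggregate, so more dates mean more CASE expressions. That is why the dynamic SQL approach is the better fit when the dates are not fixed.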

Holiday Availability Calender - sum available days still left to sell over consecutive days

What I require is the min & max of the BaseDate where available to sell = 1 and there are 3 or more consecutive days still available to sell. However, the sum needs to be excluded if the property's changeover day starts on the same day as the BaseDate, as we are only interested in the gaps that we can't sell due to changeover restrictions. The data would have to be grouped by Code, as we have over 1,000 properties. BaseDates are for 2015 & 2016.
NB: Some properties have more than one changeover day; these are currently held in one comma-separated column, e.g. Saturday, Sunday.
Example Data:-
DECLARE @sampleData TABLE (
Code VARCHAR(5) NOT NULL
, BaseDate DATE NOT NULL
, DayName VARCHAR(9) NOT NULL
, ChangeoverDay VARCHAR(8) NOT NULL
, AvailabletoSell BIT NOT NULL
);
INSERT INTO @sampleData VALUES
('PERCH','2015-05-06','Wednesday','Saturday',0),
('PERCH','2015-05-07','Thursday','Saturday',0),
('PERCH','2015-05-08','Friday','Saturday',0),
('PERCH','2015-05-09','Saturday','Saturday',1), -- Not this one as changeover day is the same as the BaseDate
('PERCH','2015-05-10','Sunday','Saturday',1),
('PERCH','2015-05-11','Monday','Saturday',1),
('PERCH','2015-05-12','Tuesday','Saturday',0),
('PERCH','2015-05-13','Wednesday','Saturday',0),
('PERCH','2015-05-14','Thursday','Saturday',1), -- This one = 3
('PERCH','2015-05-15','Friday','Saturday',1),
('PERCH','2015-05-16','Saturday','Saturday',1),
('PERCH','2015-05-17','Sunday','Saturday',0),
('PERCH','2015-05-18','Monday','Saturday',1), -- This one = 4
('PERCH','2015-05-19','Tuesday','Saturday',1),
('PERCH','2015-05-20','Wednesday','Saturday',1),
('PERCH','2015-05-21','Thursday','Saturday',1),
('PERCH','2015-05-22','Friday','Saturday',0),
('PERCH','2015-05-23','Saturday','Saturday',0),
('PERCH','2015-05-24','Sunday','Saturday',0),
('PERCH','2015-05-25','Monday','Saturday',0),
('PERCH','2015-05-26','Tuesday','Saturday',0),
('PERCH','2015-05-27','Wednesday','Saturday',1), -- Not this one, as only 2 consecutive days
('PERCH','2015-05-28','Thursday','Saturday',1),
('PERCH','2015-05-29','Friday','Saturday',0),
('PERCH','2015-05-30','Saturday','Saturday',0);
I would require the output as below:-
+-------+---------------+-------------+----------------------+
| Code | StartBaseDate | EndBaseDate | TotalAvailabletoSell |
+-------+---------------+-------------+----------------------+
| PERCH | 14/05/2015 | 16/05/2015 | 3 |
| PERCH | 18/05/2015 | 21/05/2015 | 4 |
+-------+---------------+-------------+----------------------+
This gives you what you want. But I feel there's a way to reduce the number of times it touches the table
WITH Groupings AS (
SELECT
Code
,LastChange
,MIN(BaseDate) AS StartBaseDate
,MAX(BaseDate) AS EndBaseDate
,COUNT(*) AS DaysInPeriod
FROM
@sampleData AS s1
CROSS APPLY (
SELECT
MAX(BaseDate) AS LastChange
FROM
@sampleData AS cv
WHERE
s1.BaseDate > cv.BaseDate
AND s1.AvailabletoSell != cv.AvailabletoSell
AND s1.Code = cv.Code
) AS cv
WHERE
s1.AvailabletoSell = 1
GROUP BY
Code
,LastChange
)
SELECT
g.Code
,g.StartBaseDate
,g.EndBaseDate
,CASE WHEN a.DayName = a.ChangeoverDay THEN DaysInPeriod - 1 ELSE DaysInPeriod END AS TotalAvailableToSell
FROM
Groupings AS g
INNER JOIN @sampleData AS a
ON a.BaseDate = g.StartBaseDate AND a.Code = g.Code
WHERE
CASE WHEN a.DayName = a.ChangeoverDay THEN DaysInPeriod - 1 ELSE DaysInPeriod END > 2
The logic is pretty much:
Find the last date where the AvailableToSell flag flipped before "this row"
Group into sets by those dates and count the rows in it
Decrement by 1 if the start date has DayName as the ChangeoverDay
I haven't accounted for your note about ChangeoverDay being a comma-separated field. There are plenty of resources on splitting that out into rows, which you could then join to. But I think you also need to spell out what should happen in that scenario when DayName is in the list of ChangeoverDays.
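On the point about touching the table fewer times, here is one possible sketch using the common "gaps and islands" pattern with ROW_NUMBER. It assumes the sample data is loaded into a table variable named @sampleData in the same batch, that there is exactly one row per Code per day, and that ChangeoverDay holds a single day name. The names IslandKey and IslandStart are introduced here purely for illustration.

```sql
WITH Islands AS (
    -- consecutive available dates share the same (date - row number) value
    SELECT Code, BaseDate, DayName, ChangeoverDay,
           DATEADD(DAY,
                   -CAST(ROW_NUMBER() OVER (PARTITION BY Code ORDER BY BaseDate) AS int),
                   BaseDate) AS IslandKey
    FROM @sampleData
    WHERE AvailabletoSell = 1
),
Marked AS (
    -- tag every row with the first date of its run
    SELECT *, MIN(BaseDate) OVER (PARTITION BY Code, IslandKey) AS IslandStart
    FROM Islands
)
SELECT
    Code,
    MIN(BaseDate) AS StartBaseDate,
    MAX(BaseDate) AS EndBaseDate,
    -- subtract 1 when the run starts on the changeover day
    COUNT(*) - MAX(CASE WHEN BaseDate = IslandStart AND DayName = ChangeoverDay
                        THEN 1 ELSE 0 END) AS TotalAvailableToSell
FROM Marked
GROUP BY Code, IslandKey
HAVING COUNT(*) - MAX(CASE WHEN BaseDate = IslandStart AND DayName = ChangeoverDay
                           THEN 1 ELSE 0 END) > 2
ORDER BY Code, StartBaseDate;
```

This reads @sampleData once and applies the same three steps as the logic listed above. If dates can be missing for a property, the (date - row number) trick still splits runs correctly, because a gap in the dates changes the IslandKey.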

Performance Issue in While Clause

Okay everyone,
Apologies in advance for the length. This one's actually kind of fun, though.
I wrote up a SQL script yesterday that I was semi-proud of because I thought it was quite clever. It turns out it is ruined by performance issues, and I can't even test it because of that, so it may not even be doing what I think it does (sigh).
This problem is best explained with an example:
Column A | Column B | Column C | Column D
Heart | K | 2/1/2013 | 3/1/2013
Heart | K | 2/1/2013 | 3/1/2013
Heart | K | 1/1/2013 | 3/1/2013
Heart | K | 2/1/2013 | 4/1/2013
Spade | 4 | 2/1/2013 | 3/1/2013
Spade | 3 | 2/1/2013 | 3/1/2013
Club | 4 | 2/1/2013 | 3/1/2013
With this table I need to: 1. Starting with the first, update the row with the data following it if the values in Column A match, 2. delete the second row after the update if there was a match, and 3. move on to the next row if there was no match and rerun the same process.
If there's a match, the higher row updates based on the following:
Column A: Nothing
Column B: If both values are the same, keep the value in one, otherwise write 'Multiple'
Column C: Keep the earlier date between the two,
Column D: Keep the later date between the two,
Then I delete the lower row.
My example should result in the following:
Column A | Column B | Column C | Column D
Heart | K | 1/1/2013 | 4/1/2013
Spade | Multiple | 2/1/2013 | 3/1/2013
Club | 4 | 2/1/2013 | 3/1/2013
To do all this I created two table variables, inserted the same data into both, and then cycled through the second (@ScheduleB) looking for matches to update the row in the first table (@ScheduleA). I then deleted the row below the row in @ScheduleA (because it's the same as the one in @ScheduleB). Finally, when there wasn't a match, I moved to the next row in @ScheduleA to start the process over. At least that's what the code's supposed to do -- see below.
The problem is performance is TERRIBLE. I've considered using a Cursor, but don't know if the performance would help there.
Any suggestions?
Declare @ScheduleA Table
(
RowNumber int,
Period nvarchar(MAX),
Program nvarchar(MAX),
ControlAccount Nchar(50),
WorkPackage Nchar(50),
CAM Nchar(50),
EVM Nchar(50),
Duration int,
BLStart datetime,
BLFinish datetime
)
Declare @ScheduleB Table
(
RowNumber int,
Period nvarchar(MAX),
Program nvarchar(MAX),
ControlAccount Nchar(50),
WorkPackage Nchar(50),
CAM Nchar(50),
EVM Nchar(50),
Duration int,
BLStart datetime,
BLFinish datetime
)
Insert INTO @ScheduleA
Select ROW_NUMBER() OVER(order by workpackage desc) as [Row], Period, Program,
ControlAccount, WorkPackage, CAM, EVM, Duration, BLStart, BLFinish
From ScheduleData
where program = @Program and period = @Period
Insert INTO @ScheduleB
Select ROW_NUMBER() OVER(order by workpackage desc) as [Row], Period, Program,
ControlAccount, WorkPackage, CAM, EVM, Duration, BLStart, BLFinish
From ScheduleData
where program = @Program and period = @Period
declare @i int = 1
declare @j int = 2
--Create a loop for the second variable that counts up to the last row of the B table
While @j < (select MAX(ROWNUMBER) + 1 from @ScheduleB)
Begin
--if the tables match by WorkPackage THEN
IF ((select WorkPackage from @ScheduleA where RowNumber = @i) =
(select workpackage from @ScheduleB where RowNumber = @j))
Begin
Update @ScheduleA
--Update the Schedule CAM, BLStart, BLFinish of the A table (if necessary)
set CAM =
Case
--Set values in @ScheduleA Column B based on logic
End,
BLStart =
Case
--Set values in @ScheduleA Column C based on logic
End,
BLFinish =
Case
--Set values in @ScheduleA Column D based on logic
End
Where RowNumber = @i
Delete from @ScheduleA
where RowNumber = @i + 1
set @j = @j + 1 --next row in B
End
ELSE
set @i = @i + 1
END
EDIT: To clarify, column B is NOT an integer column, I was simply using this as an example because cards are pretty easily understood. I've since updated the column to include K's.
Based on your requirements I think a solution like this would work:
SELECT
[column a],
CASE WHEN MAX([column b]) <> MIN([column b]) THEN 'multiple' ELSE CAST(MAX([column b]) AS NVARCHAR(10)) END,
MIN([column c]),
MAX([column d])
FROM [Table] -- placeholder name; TABLE is a reserved word, so it needs brackets
GROUP BY [column a]
EDIT:
SQL Fiddle
