TSQL get COUNT of rows that are missing from right table - sql-server

There was one other SIMILAR answer but it is 2 pages long and my requirement doesn't need that. I have 2 tables, tableA and a tableB, and I need to find the COUNTS of rows that are present in tableA but are not present in tableB OR if update_on in tableB is not today's date.
My tables:
tableA:
release_id book_name release_begin_date
----------------------------------------------------
1122 midsummer 2016-01-01
1123 fool's errand 2016-06-01
1124 midsummer 2016-04-01
1125 fool's errand 2016-08-01
tableB:
release_id book_name updated_on
-----------------------------------------
1122 midsummer 2016-08-17
1123 fool's errand 2016-08-16**
Expected result: Since each book is missing one release id, 1 is count. But in addition fool's errand's existing row in tableB has updated_on date of yesterday and not today, it needs to be counted in count_of_not_updated.
book_name count_of_missing count_of_not_updated
-------------------------------------------------------
midsummer 1 0
fool's errand 1 1
Note: Even though fool's errand is present in tableB, I need to show it in count_of_missing because it's updated_on date is yesterday and not today. I know it has to be a combination of a left join and something else, but the kicker here is not only getting the missing rows from left table but at the same time checking if the updated_on table was today's date and if not, count that row in count_of_not_updated.

select sum(case when b.release_id is null then 1 else 0 end) as noReleaseID
, sum(case when datediff(d, b.release_date, getdate()) > 0 then 1 else 0 end) as releaseDateNotToday
, a.release_id
from tableA a
left outer join tableB b on a.release_id = b.release_id
Group by a.release_id
This example uses a sum function on a case statement to add up the instances where the case statement returns true. Note that the current code assumes, as in your example, that you are looking to count all old release dates from table b - more steps would be required if each book has multiple old release dates in table b, and you only want to compare to the most recent release date.

Try this
DECLARE #tableA TABLE (release_id INT, book_name NVARCHAR(50), release_begin_date DATETIME)
DECLARE #tableB TABLE (release_id INT, book_name NVARCHAR(50), updated_on DATETIME)
INSERT INTO #tableA
VALUES
(1122, 'midsummer', '2016-01-01'),
(1123, 'fool''s errand', '2016-06-01'),
(1124, 'midsummer', '2016-04-01'),
(1125, 'fool''s errand', '2016-08-01')
INSERT INTO #tableB
VALUES
(1122, 'midsummer', '2016-08-17'),
(1123, 'fool''s errand', '2016-08-16')
;WITH TmpTableA
AS
(
SELECT
book_name,
COUNT(1) CountOfTableA
FROM
#tableA
GROUP BY
book_name
), TmpTableB
AS
(
SELECT
book_name,
COUNT(1) CountOfTableB,
SUM(CASE WHEN CONVERT(VARCHAR(11), updated_on, 112) = CONVERT(VARCHAR(11), GETDATE(), 112) THEN 0 ELSE 1 END) count_of_not_updated
FROM
#tableB
GROUP BY
book_name
)
SELECT
A.book_name ,
A.CountOfTableA - ISNULL(B.CountOfTableB, 0) AS count_of_missing,
ISNULL(B.count_of_not_updated, 0) AS count_of_not_updated
FROM
TmpTableA A LEFT JOIN
TmpTableB B ON A.book_name = B.book_name
Result:
book_name count_of_missing count_of_not_updated
-------------------- ---------------- --------------------
fool's errand 1 1
midsummer 1 1

Related

Need to generate rows with missing data in a large dataset - SQL

We are comparing values between months over multiple years. As time moves on the number of years and months in the dataset increases. We are only interested in months where there were values for every year, i.e. a full set.
Consider the following example for 1 month (1) over 3 years (1,2,3) and two activities (101, 102)
Dataset:
Activity Month year Count
------- ---- ------ ------
101 1 1 2
101 1 2 3
101 1 3 1
102 1 1 1
102 1 2 1
In the example above only activity 101 will come into consideration as it satisfies the condition that there must be a count for the activity for month 1 IN year 1, 2 and 3.
Activity 102 doesn't qualify for further analysis as it has no record for year 3.
I would like to generate a record with which I can then evaluate this. The record will effectively generate the new record with the missing row (in this case 102, 1, 3 , 0) to complete the dataset
Activity Month year Count
------- ---- ------ ------
102 1 3 0
We find the problem difficult as the data keeps in growing, the number of activities keep expanding and it is a combination of activity, year and month that need to be evaluated.
An elegant solution will be appreciated.
As I mention in my comment, presumably you have both an Activity table and some kind of Calendar table with details of your activities and the years in your system. As such you can therefore do a CROSS JOIN between these 2 objects and then LEFT JOIN to your table to get the data set you want:
--Create sample objects/data
CREATE TABLE dbo.Activity (Activity int); --Obviously your table has more columns
INSERT INTO dbo.Activity (Activity)
VALUES (101),(102);
GO
CREATE TABLE dbo.Calendar (Year int,
Month int);--Likely your table has more columns
INSERT INTO dbo.Calendar (Year, Month)
VALUES(1,1),
(2,1),
(3,1);
GO
CREATE TABLE dbo.YourTable (Activity int,
Year int,
Month int,
[Count] int);
INSERT INTO dbo.YourTable (Activity,Month, Year, [Count])
VALUES(101,1,1,2),
(101,1,2,3),
(101,1,3,1),
(102,1,1,1),
(102,1,2,1);
GO
--Solution
SELECT A.Activity,
C.Month,
C.Year,
ISNULL(YT.[Count],0) AS [Count]
FROM dbo.Activity A
CROSS JOIN dbo.Calendar C
LEFT JOIN dbo.YourTable YT ON A.Activity = YT.Activity
AND C.[Year] = YT.[Year]
AND C.[Month] = YT.[Month]
WHERE C.Month = 1; --not sure if this is needed
If you don't have an Activity and Calendar table (I suggest, however, you should), then you can use subqueries with a DISTINCT, but note this will be far from performant with large data sets:
SELECT A.Activity,
C.Month,
C.Year,
ISNULL(YT.[Count],0) AS [Count]
FROM (SELECT DISTINCT Activity FROM dbo.YourTable) A
CROSS JOIN (SELECT DISTINCT Year, Month FROM dbo.YourTable) C
LEFT JOIN dbo.YourTable YT ON A.Activity = YT.Activity
AND C.[Year] = YT.[Year]
AND C.[Month] = YT.[Month]
WHERE C.Month = 1; --not sure if this is needed

FInd duplicate rows and show only the earliest

I have the following table:
respid, uploadtime
I need a query that will show all the records that respid is duplicate and show them except the latest (by upload time)
exmple:
4 2014-01-01
4 2014-06-01
4 2015-01-01
4 2015-06-01
4 2016-01-01
In this case the query should return four records (the latest is : 4 2016-01-01 )
Thank you very much.
Use ROW_NUMBER:
WITH cte AS (
SELECT respid, uploadtime,
ROW_NUMBER() OVER (PARTITION BY respid ORDER BY uploadtime DESC) rn
FROM yourTable
)
SELECT respid, uploadtime
FROM cte
WHERE rn > 1
ORDER BY respid, uploadtime;
The logic here is to show all records except those having the first row number value, which would be the latest records for each respid group.
If I interpreted your question correctly, then you want to see all records where respid occurs multiple times, but exclude the last duplicate.
Translating this to SQL could sound like "show all records that have a later record for the same respid". That is exactly what the solution below does. It says that for every row in the result a later record with the same respid must exists.
Sample data
declare #MyTable table
(
respid int,
uploadtime date
);
insert into #MyTable (respid, uploadtime) values
(4, '2014-01-01'),
(4, '2014-06-01'),
(4, '2015-01-01'),
(4, '2015-06-01'),
(4, '2016-01-01'), --> last duplicate of respid=4, not part of result
(5, '2020-01-01'); --> has no duplicate, not part of result
Solution
select mt.respid, mt.uploadtime
from #MyTable mt
where exists ( select top 1 'x'
from #MyTable mt2
where mt2.respid = mt.respid
and mt2.uploadtime > mt.uploadtime );
Result
respid uploadtime
----------- ----------
4 2014-01-01
4 2014-06-01
4 2015-01-01
4 2015-06-01

Create a select statement that returns a record for each day after a given created date

I have a Dimension table containing machines.
Each machine has a date created value.
I would like to have a Select statement that generates for each day after a certain start date the available number of machines. A machine is available after the date created on wards
As I have read only access to the database I am not able to create a physical calendar table
I hope somebody can help me solving my issue
I assume this is what you want. Based on this sample table:
USE tempdb;
GO
CREATE TABLE dbo.Machines
(
MachineID int,
CreatedDate date
);
INSERT dbo.Machines VALUES(1,'20200104'),(2,'20200202'),(3,'20200214');
Then say you wanted the number of active machines starting on January 1st:
DECLARE #StartDate date = '20200101';
;WITH x AS
(
SELECT n = 0 UNION ALL SELECT n + 1 FROM x
WHERE n < DATEDIFF(DAY, #StartDate, GETDATE())
),
days(d) AS
(
SELECT DATEADD(DAY, x.n, #StartDate) FROM x
)
SELECT days.d, MachineCount = COUNT(m.MachineID)
FROM days
LEFT OUTER JOIN dbo.Machines AS m
ON days.d >= m.CreatedDate
GROUP BY days.d
ORDER BY days.d
OPTION (MAXRECURSION 0);
Results:
d MachineCount
---------- ------------
2020-01-01 0
2020-01-02 0
2020-01-03 0
2020-01-04 1
2020-01-05 1
...
2020-01-31 1
2020-02-01 1
2020-02-02 2
2020-02-03 2
...
2020-02-12 2
2020-02-13 2
2020-02-14 3
2020-02-15 3
Clean up:
DROP TABLE dbo.Machines;
(Yes, some people hiss at recursive CTEs. You can replace it with any number of set generation techniques, some I talk about here, here, and here.)

Time and Date Clash - Sql Server

stuck on a project. I wrote this code in sql server which finds the duplicate date matches for a staff member, but I'm stuck when trying to expand it to narrow it down to when the time ranges overlap each other also.
So there is a table called 'Rosters' with columns 'StaffID', 'Date', 'Start', 'End'
SELECT
y.[Date],y.StaffID,y.Start,y.[End]
FROM Rosters y
INNER JOIN (SELECT
[Date],StaffID, COUNT(*) AS CountOf
FROM Rosters
GROUP BY [Date],StaffID
HAVING COUNT(*)>1)
dd ON y.[Date]=dd.[Date] and y.StaffID=dd.StaffID
It returns all duplicate dates for each staff member, I wish to add the logic-
y.Start <= dd.[End] && dd.Start <= y.[End]
Is it possible with the way I'm currently doing it? Any help would be appreciated.
#TT. Sorry, below is probably a better visual explanation -
e.g This would be the roster table
ID Date Start End
1 01/01/2000 8:00 12:00
1 01/01/2000 9:00 11:00
2 01/01/2000 10:00 14:00
2 01/01/2000 8:00 9:00
3 01/01/2000 14:00 18:00
3 02/02/2002 13:00 19:00
And I'm trying to return what is below for the example as they are the only 2 rows that clash for ID, Date, and the Time range (start - end)
ID Date Start End
1 01/01/2000 8:00 12:00
1 01/01/2000 9:00 11:00
This is the logic that you would need to filter your results to overlapping time ranges, though I think this can be handled without your intermediate step of finding the duplicates. If you simply post your source table schema with some test data and your desired output, you will get a much better answer:
declare #t table (RowID int
,ID int
,DateValue date --\
,StartTime Time -- > Avoid using reserved words for your object names.
,EndTime Time --/
);
insert into #t values
(1,1, '01/01/2000', '8:00','12:00' )
,(2,1, '01/01/2000', '9:00','11:00' )
,(3,2, '01/01/2000', '10:00','14:00')
,(4,2, '01/01/2000', '8:00','9:00' )
,(5,3, '01/01/2000', '14:00','18:00')
,(6,3, '02/02/2002', '13:00','19:00');
select t1.*
from #t t1
inner join #t t2
on(t1.RowID <> t2.RowID -- If you don't have a unique ID for your rows, you will need to specify all columns so as no to match on the same row.
and t1.ID = t2.ID
and t1.DateValue = t2.DateValue
and t1.StartTime <= t2.EndTime
and t1.EndTime >= t2.StartTime
)
order by t1.RowID
Try this
with cte as
(
SELECT ROW_NUMBER() over (order by StaffID,Date,Start,End) as rno
,StaffID, Date, Start, End
FROM Rosters
)
select distinct t1.*
from cte t1
inner join cte t2
on(t1.rno <> t2.rno
and t1.StaffID = t2.StaffID
and t1.Date = t2.Date
and t1.Start <= t2.End
and t1.End >= t2.Start
)
order by t1.rno
Made some changes in #iamdave's Answer
If you use SQL Server 2012 up, you can try below script:
declare #roster table (StaffID int,[Date] date,[Start] Time,[End] Time);
insert into #roster values
(1, '01/01/2000', '9:00','11:00' )
,(1, '01/01/2000', '8:00','12:00' )
,(2, '01/01/2000', '10:00','14:00')
,(2, '01/01/2000', '8:00','9:00' )
,(3, '01/01/2000', '14:00','18:00')
,(3, '02/02/2002', '13:00','19:00');
SELECT t.StaffID,t.Date,t.Start,t.[End] FROM (
SELECT y.StaffID,y.Date,y.Start,y.[End]
,CASE WHEN y.[End] BETWEEN
LAG(y.Start)OVER(PARTITION BY y.StaffID,y.Date ORDER BY y.Start) AND LAG(y.[End])OVER(PARTITION BY y.StaffID,y.Date ORDER BY y.Start) THEN 1 ELSE 0 END
+CASE WHEN LEAD(y.[End])OVER(PARTITION BY y.StaffID,y.Date ORDER BY y.Start) BETWEEN y.Start AND y.[End] THEN 1 ELSE 0 END AS IsOverlap
,COUNT (0)OVER(PARTITION BY y.StaffID,y.Date) AS cnt
FROM #roster AS y
) t WHERE t.cnt>1 AND t.IsOverlap>0
StaffID Date Start End
----------- ---------- ---------------- ----------------
1 2000-01-01 08:00:00.0000000 12:00:00.0000000
1 2000-01-01 09:00:00.0000000 11:00:00.0000000

How to get a derive 'N' Date Rows from a single record with From / To date columns?

Title sounds confusing but let me please explain:
I have a table that has two columns that provide a date range, and one column that provides a value. I need to query that table and "detail" the data such as this
Is it possible to do only using TSQL?
Additional Info
The table in question is about 2-3million records long (and growing)
Assuming the range of dates is fairly narrow, an alternative is to use a recursive CTE to create a list of all dates in the range and then join interpolate to it:
WITH LastDay AS
(
SELECT MAX(Date_To) AS MaxDate
FROM MyTable
),
Days AS
(
SELECT MIN(Date_From) AS TheDate
FROM MyTable
UNION ALL
SELECT DATEADD(d, 1, TheDate) AS TheDate
FROM Days CROSS JOIN LastDay
WHERE TheDate <= LastDay.MaxDate
)
SELECT mt.Item_ID, mt.Cost_Of_Item, d.TheDate
FROM MyTable mt
INNER JOIN Days d
ON d.TheDate BETWEEN mt.Date_From AND mt.Date_To;
I've also assumed an that date from and date to represent an inclusive range (i.e. includes both edges) - it is unusual to use inclusive BETWEEN on dates.
SqlFiddle here
Edit
The default MAXRECURSION on a recursive CTE in Sql Server is 100, which will limit the date range in the query to a span of 100 days. You can adjust this to a maximum of 32767.
Also, if you are filtering just a smaller range of dates in your large table, you can adjust the CTE to limit the number of days in the range:
WITH DateRange AS
(
SELECT CAST('2014-01-01' AS DATE) AS MinDate,
CAST('2014-02-16' AS DATE) AS MaxDate
),
Days AS
(
SELECT MinDate AS TheDate
FROM DateRange
UNION ALL
SELECT DATEADD(d, 1, TheDate) AS TheDate
FROM Days CROSS APPLY DateRange
WHERE TheDate <= DateRange.MaxDate
)
SELECT mt.Item_ID, mt.Cost_Of_Item, d.TheDate
FROM MyTable mt
INNER JOIN Days d
ON d.TheDate BETWEEN mt.Date_From AND mt.Date_To
OPTION (MAXRECURSION 0);
Update Fiddle
This can be achieved using Cursors.
I've simulated the test data provided and created another table with the name "DesiredTable" to store the data inside, and created the following cusror which achieved exactly what you are looking for:
SET NOCOUNT ON;
DECLARE #ITEM_ID int, #COST_OF_ITEM Money,
#DATE_FROM date, #DATE_TO date;
DECLARE #DateDiff INT; -- holds number of days between from & to columns
DECLARE #counter INT = 0; -- for loop counter
PRINT '-------- Begin the Date Expanding Cursor --------';
-- defining the cursor target statement
DECLARE Date_Expanding_Cursor CURSOR FOR
SELECT [ITEM_ID]
,[COST_OF_ITEM]
,[DATE_FROM]
,[DATE_TO]
FROM [dbo].[OriginalTable]
-- openning the cursor
OPEN Date_Expanding_Cursor
-- fetching next row data into the declared variables
FETCH NEXT FROM Date_Expanding_Cursor
INTO #ITEM_ID, #COST_OF_ITEM, #DATE_FROM, #DATE_TO
-- if next row is found
WHILE ##FETCH_STATUS = 0
BEGIN
-- calculate the number of days in between the date columns
SELECT #DateDiff = DATEDIFF(day,#DATE_FROM,#DATE_TO)
-- reset the counter to 0 for the next loop
set #counter = 0;
WHILE #counter <= #DateDiff
BEGIN
-- inserting rows inside the new table
insert into DesiredTable
Values (#COST_OF_ITEM, DATEADD(day,#counter,#DATE_FROM))
set #counter = #counter +1
END
-- fetching next row
FETCH NEXT FROM Date_Expanding_Cursor
INTO #ITEM_ID, #COST_OF_ITEM, #DATE_FROM, #DATE_TO
END
-- cleanup code
CLOSE Date_Expanding_Cursor;
DEALLOCATE Date_Expanding_Cursor;
The code fetches every row from your original table, then it calculates the number of days between DATE_FROM and DATE_TO columns, then using this number the script will create identical rows to be inserted inside the new table DesiredTable.
give it a try and let me know of the results.
You can generate an increment table and join it to your date From:
Query:
With inc(n) as (
Select ROW_NUMBER() over (order by (select 1)) -1 From (
Select 1 From (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x1(n)
Cross Join (values(1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) as x2(n)
) as x(n)
)
Select item_id, cost, DATEADD(day, n, dateFrom), n From #dates d
Inner Join inc i on n <= DATEDIFF(day, dateFrom, dateTo)
Order by item_id
Output:
item_id cost Date n
1 100 2014-01-01 00:00:00.000 0
1 100 2014-01-02 00:00:00.000 1
1 100 2014-01-03 00:00:00.000 2
2 105 2014-01-08 00:00:00.000 2
2 105 2014-01-07 00:00:00.000 1
2 105 2014-01-06 00:00:00.000 0
2 105 2014-01-09 00:00:00.000 3
3 102 2014-02-14 00:00:00.000 3
3 102 2014-02-15 00:00:00.000 4
3 102 2014-02-16 00:00:00.000 5
3 102 2014-02-11 00:00:00.000 0
3 102 2014-02-12 00:00:00.000 1
3 102 2014-02-13 00:00:00.000 2
Sample Data:
declare #dates table(item_id int, cost int, dateFrom datetime, dateTo datetime);
insert into #dates(item_id, cost, dateFrom, dateTo) values
(1, 100, '20140101', '20140103')
, (2, 105, '20140106', '20140109')
, (3, 102, '20140211', '20140216');
Yet another way is to create and maintain calendar table, containing all dates for many years (in our app we have table for 30 years or so, extending every year). Then you can just link to calendar:
select <whatever you need>, calendar.day
from <your tables> inner join calendar on calendar.day between <min date> and <max date>
This approach allows to include additional information (holidays etc) in calendar table - sometimes very helpful.

Resources