SQL Server : select data between date range where there is missing data - sql-server

Using SQL Server 2012, I have a script that inserts 4 rows of data into a table each day. The data looks like this:
Sometimes one or more of the rows is not inserted as the source data is incomplete. This looks like this - the second column is called commodityID:
There are times where more than one row could be missing.
I am trying to write a query that will show me what data is missing, so for the example above it would be for commodityID = 2. I have another table which is a calendar (lists all dates) table which I use with the query below to see if no data exists for a given date for all commodities.
SELECT *
FROM [Calendar]
WHERE (NOT EXISTS (SELECT ID, CommodityID, Price, DateEntered
FROM Spectron_DailyPricing
WHERE (CONVERT(date, DateEntered) = Calendar.date)))
AND (date BETWEEN '2015-05-14' AND dateadd(day, -1,GETDATE()))
ORDER BY date asc
I want to be able to run a SQL query that will look through all of the data and show the date and the commodityID that is missing. So for the example above it would show the following.
As my data spans a few years and there is the odd missing row there would be numerous results.

WITH
CTE_Numbers AS (
SELECT * FROM (VALUES (1),(2),(3),(4)) AS a(CommodityID)),
CTE_Calendar AS (
SELECT *
FROM [Calendar] c
CROSS JOIN CTE_Numbers n
WHERE c.date BETWEEN '2015-05-14' AND dateadd(day, -1,GETDATE()))
SELECT *
FROM CTE_Calendar c
LEFT JOIN Spectron_DailyPricing s
ON c.CommodityID = s.CommodityID
AND c.date = CONVERT(date, s.DateEntered)
WHERE
s.CommodityID IS NULL
ORDER BY c.date asc

Related

SQl Server - Where clause uses maximum date in data

I'm struggling with something i thought would be easy.
I have a table that is updated via an append on most days and has a report date field that shows the date the rows were updated.
I want to join to this table but only pull back the records from the date the table was last updated
Most of the time I could get away just looking for yesterdays date as the table is updated most days
Where [reportdate] > DATEADD(DAY, -1, GETDATE())
But as its not always updated daily, I wanted to rule this issue out. Is there anyway of returning the max date?
I was trying to figure out max (date), but I can't figure out the grouping. I need to return all the fields. The below just seems to return the whole table
SELECT max ([ReportDate]) as reportdate
,[GUID]
,[Make]
,[Model]
,[MPxN]
,[PaymentMode]
,[Consent]
,[Category]
,[Fuel]
,[pkCommCompID]
FROM table
group by guid
,[Make]
,[Model]
,[MPxN]
,[PaymentMode]
,[Consent]
,[Category]
,[Fuel]
,[pkCommCompID]
I could get round it with a temp table that just has the max report date and then using this as the left part of a join
SELECT max ([ReportDate]) as reportdate
FROM [DOMCustomers].[dbo].[DCC_Device_Comms_Compiled]
But The SQL is triggered in Excel so temp tables are problematic (i think).
Is there anyway of returning the max date?
Like this:
SELECT *
FROM SomeTable
where ReportDate = (select max(ReportDate) from SomeTable)
Here is a conceptual example.
It will produce a latest row for each car make.
SQL
-- DDL and sample data population, start
DECLARE #tbl TABLE (ID INT IDENTITY PRIMARY KEY, make VARCHAR(20), ReportDate DATETIME);
INSERT INTO #tbl (make, ReportDate) VALUES
('Ford', '2020-12-31'),
('Ford', '2020-10-17'),
('Tesla', '2020-10-25'),
('Tesla', '2020-12-30');
-- DDL and sample data population, end
;WITH rs AS
(
SELECT *
, ROW_NUMBER() OVER (PARTITION BY make ORDER BY ReportDate DESC) AS seq
FROM #tbl
)
SELECT * FROM rs
WHERE seq = 1;
Seems like a DENSE_RANK and TOP would work (assuming ReportDate is a date):
SELECT TOP (1) WITH TIES
[ReportDate]
,[GUID]
,[Make]
,[Model]
,[MPxN]
,[PaymentMode]
,[Consent]
,[Category]
,[Fuel]
,[pkCommCompID]
FROM YourTable
ORDER BY DENSE_RANK() OVER (ORDER BY ReportDate DESC);
If ReportDate is a date and time value, and you want everything for the latest date (ignoring time), then replace ReportDate with CONVERT(date,ReportDate) in the ORDER BY.

T-SQL: GROUP BY, but while keeping a non-grouped column (or re-joining it)?

I'm on SQL Server 2008, and having trouble querying an audit table the way I want to.
The table shows every time a new ID comes in, as well as every time an IDs Type changes
Record # ID Type Date
1 ae08k M 2017-01-02:12:03
2 liei0 A 2017-01-02:12:04
3 ae08k C 2017-01-02:13:05
4 we808 A 2017-01-03:20:05
I'd kinda like to produce a snapshot of the status for each ID, at a certain date. My thought was something like this:
SELECT
ID
,max(date) AS Max
FROM
Table
WHERE
Date < 'whatever-my-cutoff-date-is-here'
GROUP BY
ID
But that loses the Type column. If I add in the type column to my GROUP BY, then I'd get get duplicate rows per ID naturally, for all the types it had before the date.
So I was thinking of running a second version of the table (via a common table expression), and left joining that in to get the Type.
On my query above, all I have to join to are the ID & Date. Somehow if the dates are too close together, I end up with duplicate results (like say above, ae08k would show up once for each Type). That or I'm just super confused.
Basically all I ever do in SQL are left joins, group bys, and common table expressions (to then left join). What am I missing that I'd need in this situation...?
Use row_number()
select *
from ( select *
, row_number() over (partition by id order by date desc) as rn
from table
WHERE Date < 'whatever-my-cutoff-date-is-here'
) tt
where tt.rn = 1
I'd kinda like know how many IDs are of each type, at a certain date.
Well, for that you use COUNT and GROUP BY on Type:
SELECT Type, COUNT(ID)
FROM Table
WHERE Date < 'whatever-your-cutoff-date-is-here'
GROUP BY Type
Basing on your comment under Zohar Peled answer you probably looking for something like this:
; with cte as (select distinct ID from Table where Date < '$param')
select [data].*, [data2].[count]
from cte
cross apply
( select top 1 *
from Table
where Table.ID = cte.ID
and Table.Date < '$param'
order by Table.Date desc
) as [data]
cross apply
( select count(1) as [count]
from Table
where Table.ID = cte.ID
and Table.Date < '$param'
) as [data2]

How to calculate the days between 2 dates from 2 successive IDs

I need some help to create a new column in a database in SQL Server 2008.
I have the following data table
Please have a look at a snapshot of my table
Table
In the blank column I would like to put the difference between the current status date and the next status' date. And for the last ID_Status for each ID_Ticket I would like to have the difference between now date and it's date !
I hope that you got an idea about my problem.
Please share if you have any ideas about how to do .
Many thanks
kind regards
You didn't specify your RDBMS, so I'll post an answer for both since they are almost identical :
SQL-Server :
SELECT ss.id_ticket,ss.id_status,ss.date_status,
DATEDIFF(day,ss.date_status,ss.coalesce(ss.next_date,GETDATE())) as diffStatus
FROM (
SELECT t.*,
(SELECT TOP 1 s.date_status FROM YourTable s
WHERE t.id_ticket = s.id_ticket and s.date_status > t.date_status
ORDER BY s.date_status ASC) as next_date)
FROM YourTable t) ss
MySQL :
SELECT ss.id_ticket,ss.id_status,ss.date_status,
DATEDIFF(ss.date_status,ss.coalesce(ss.next_date,now())) as diffStatus
FROM (
SELECT t.*,
(SELECT s.date_status FROM YourTable s
WHERE t.id_ticket = s.id_ticket and s.date_status > t.date_status
ORDER BY s.date_status ASC limit 1) as next_date)
FROM YourTable t) ss
This basically first use a correlated sub query to bring the next date using limit/top , and then wrap it with another select to calculate the difference between them using DATEDIFF().
Basically it can be done without the wrapping query, but it won't be readable since the correlated query will be inside the DATEDIFF() function, so I prefer this way.

Create a view with sum total across two tables grouped by date (SQL Server 2008)

I currently have a view like this:
CREATE VIEW dbo.audit
WITH schemabinding
AS
SELECT
CONVERT(date, DateAdded) AS dt,
COUNT_BIG(*) AS cnt
FROM
dbo.Table1
GROUP BY
CONVERT(date, DateAdded)
Which returns:
dt cnt
-----------------
3/13/2015 5000
3/12/2015 1324
I'm trying to get a sum total count from both tables grouped by date into a single view. Is this possible?
i.e.
Table 1 Table 2
dt cnt | dt cnt
3/13/2015 5000 | 3/13/2015 1000
3/12/2015 1324 | 3/12/2015 1
To:
View 1
dt cnt
3/13/2015 6000
3/12/2015 1325
It would be nice to keep this in a single view. As it's just a running total of how many new items got added. Any ideas?
Assuming that there are two views and depending on relationship between these two views (based on values from dt columns: View1.dt and View2.dt) you could use a INNER/LEFT/RIGHT or FULL JOIN thus:
SELECT ISNULL(v1.dt, v2.dt) AS dt, ISNULL(v1.cnt, 0) + ISNULL(v2.cnt, 0) AS cnt
FROM dbo.View1 v1 /*INNER/LEFT/RIGHT*/ FULL JOIN dbo.View2 v2 ON v1.dt = v2.dt
I've used FULL JOIN because I assumed that there are values in View1.dt column that doesn't exist in View2.dt column and also there are values in View2.dt column that doesn't exist in View1.dt. More, some dt values could exist in both columns(views).
Note: I assume that second view has the same definition but it uses Table2 as data source: FROM dbo.Table2.
Assuming your data is such that there can be days missing from the tables it's easier to handle the dates by creating a table of dates (one row per day) so that you can join the tables using it, like this:
CREATE VIEW dbo.audit WITH schemabinding AS
select
Dates.Date as dt,
count_big(Table1.date) as ct_1,
count_big(Table2.date) as ct_2
from
Dates
left outer join Table1 on convert(date, Table1.Date) = Dates.Date
left outer join Table2 on convert(date, Table2.Date) = Dates.Date
group by
Dates.Date
SQL Fiddle: http://sqlfiddle.com/#!6/bf116/3
If the tables are huge there might be some problems with performance because SQL Server isn't going to use index for the dates because there is a conversion to date -- and this is in case you have a where clause on the view. If you need something like that an inline table value function might work better because then you can have variables for the date ranges.
If I understand your question correctly, try something like this:
CREATE VIEW dbo.audit WITH schemabinding AS
SELECT CONVERT(Table1.date,DateAdded) AS Table1_dt,
COUNT_BIG(Table1.*) AS Table1_cnt,
CONVERT(Table1.date,DateAdded) AS Table2_dt,
COUNT_BIG(Table2.*) AS Table2_cnt
FROM dbo.Table1 INNER JOIN dbo.Table2 ON(Table1.date = Table2.Date)
GROUP BY CONVERT(Table1,DateAdded)
This solution assumes the same column names in both tables and also the same dates to be selected.

How can I optimize a SQL query that performs a count nested inside a group-by clause?

I have a charting application that dynamically generates SQL Server queries to compute values for each series on a given chart. This generally works quite well, but I have run into a particular situation in which the generated query is very slow. The query looks like this:
SELECT
[dateExpr] AS domainValue,
(SELECT COUNT(*) FROM table1 WHERE [dateExpr]=[dateExpr(maintable)] AND column2='A') AS series1
FROM table1 maintable
GROUP BY [dateExpr]
ORDER BY domainValue
I have abbreviated [dateExpr] because it's a combination of CAST and DATEPART functions that convert a datetime field to a string in the form of 'yyyy-MM-dd' so that I can easily group by all values in a calendar day. The query above returns both those yyyy-MM-dd values as labels for the x-axis of the chart and the values from the data series "series1" to display on the chart. The data series is supposed to count the number of records that fall into that calendar day that also contain a certain value in [column2]. The "[dateExpr]=[dateExpr(maintable)]" expression looks like this:
CAST(DATEPART(YEAR,dateCol) AS VARCHAR)+'-'+CAST(DATEPART(MONTH,dateCol) AS VARCHAR) =
CAST(DATEPART(YEAR,maintable.dateCol) AS VARCHAR)+'-'+CAST(DATEPART(MONTH,maintable.dateCol) AS VARCHAR)
with an additional term for the day (ommitted above for the sake of space). That is the source of the slowness of the query, but I don't know how to rewrite the query so that it returns the same result more efficiently. I have complete control over the generation of the query, so if I could find more efficient SQL that returned the same results, I could modify the query generator appropriately. Any pointers would be greatly appreciated.
I havent tested but i think it can be done by:
SELECT
[dateExpr] AS domainValue,
SUM (CASE WHEN column2='A' THEN 1 ELSE 0 END) AS series1
FROM table1 maintable
GROUP BY [dateExpr]
ORDER BY domainValue
The fastest way to do this would be to use calendar tables. Create a sql table with an entry for every month for next who knows how many years. Then select from that calendar table, joining in the entries from table1 that have dates between the start and end date for the month. Then, if your clustered index is on the dateCol in table1, the query will run very quickly.
EDIT: Example Query. This assumes a months table exists with two columns, StartDate and EndDate where EndDate is the midnight on the first day of the next month. The clustered index on the months table should be on StartDate
SELECT
months.StartDate,
COUNT(*) AS [Count]
FROM months
INNER JOIN table1
ON table1.dateCol >= months.StartDate AND table1.dateCol < months.EndDate
GROUP BY months.StartDate;
With Calendar As
(
Select DateAdd(d, DateDiff(d, 0, Min( dateCol ) ), 0) As [date]
From Table1
Union All
Select DateAdd(d, 1, [date])
From Calendar
Where [date] <= (
Select Max( DateAdd(d, DateDiff(d, 0, dateCol) + 1, 0) )
From Table1
)
)
Select C.date, Count(Table1.PK) As Total
From Calendar As C
Left Join Table1
On Table1.dateCol >= C.date
And Table1.dateCol < DateAdd(d, 1, C.date )
And Table1.column2 = 'A'
Group By C.date
Option (Maxrecursion 0);
Rather than try to force the display format in SQL, you should do that in your report or chart generator. However, what you can do in the SQL is to strip the time portion from the datetime values as I've done in my solution.

Resources