SQL query distinct count using inner join - sql-server

Need help ensuring the below query doesn't return inaccurate results.
select @billed = count(a.[counter]) from [dbo].cxitems a with (nolock)
inner join [dbo].cxitemhist b with (nolock) on a.[counter] = b.cxlink
where b.[eventtype] in ('BILLED','REBILLED')
and b.[datetime] between @begdate and @enddate
The query is "mostly" accurate as is, but there is a slight possibility that the cxitemhist table contains more than one "billed" record for an item within the given date range. I only need to count each item as "Billed" once during the given date range.

You can join on a subquery that limits you to one row for each combination of fields used for the join:
select @billed = count(a.[counter])
from [dbo].cxitems a
inner join (
select distinct cxlink
from [dbo].cxitemhist
where [eventtype] in ('BILLED','REBILLED')
and [datetime] between @begdate and @enddate
) b on a.[counter] = b.cxlink
You can also use the APPLY operator instead of a join here, but you'll have to check against your data to see which gives better performance.

If you only need to count records from the cxitems table, that have any corresponding records from the cxitemhist table, you can use the exists clause with a subquery.
select @billed = count(a.[counter]) from [dbo].cxitems a
where exists(select * from [dbo].cxitemhist b
where a.[counter] = b.cxlink
and b.[eventtype] in ('BILLED','REBILLED')
and b.[datetime] between @begdate and @enddate)
I can't say how this will affect performance without specific data, but it should be comparably fast to your original query.
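To see the over-count and both fixes side by side, here is a minimal sketch that runs the three query shapes against an in-memory SQLite database from Python. The table contents are made up, the eventtime column is a hypothetical stand-in for [datetime], and SQLite syntax differs from T-SQL in places (named parameters instead of @variables, no NOLOCK hint), so treat it as an illustration rather than something to paste into SQL Server.

```python
import sqlite3

# Hypothetical miniature versions of cxitems / cxitemhist (SQLite, not SQL Server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cxitems (counter INTEGER PRIMARY KEY);
    CREATE TABLE cxitemhist (cxlink INTEGER, eventtype TEXT, eventtime TEXT);
    INSERT INTO cxitems VALUES (1), (2), (3);
    -- Item 1 is billed and rebilled inside the range, item 2 is billed once,
    -- item 3 has no billing event inside the range.
    INSERT INTO cxitemhist VALUES
        (1, 'BILLED',   '2024-01-05'),
        (1, 'REBILLED', '2024-01-20'),
        (2, 'BILLED',   '2024-01-10'),
        (3, 'BILLED',   '2024-03-01'),
        (3, 'CREATED',  '2024-01-02');
""")
params = {"begdate": "2024-01-01", "enddate": "2024-01-31"}

# Plain inner join: one output row per matching history row, so item 1 counts twice.
plain_join = conn.execute("""
    SELECT COUNT(a.counter)
    FROM cxitems a
    JOIN cxitemhist b ON a.counter = b.cxlink
    WHERE b.eventtype IN ('BILLED', 'REBILLED')
      AND b.eventtime BETWEEN :begdate AND :enddate
""", params).fetchone()[0]

# Join against a DISTINCT subquery: at most one row per cxlink.
distinct_subquery = conn.execute("""
    SELECT COUNT(a.counter)
    FROM cxitems a
    JOIN (SELECT DISTINCT cxlink
          FROM cxitemhist
          WHERE eventtype IN ('BILLED', 'REBILLED')
            AND eventtime BETWEEN :begdate AND :enddate) b
      ON a.counter = b.cxlink
""", params).fetchone()[0]

# EXISTS: each cxitems row is tested once, however many history rows match.
with_exists = conn.execute("""
    SELECT COUNT(a.counter)
    FROM cxitems a
    WHERE EXISTS (SELECT 1 FROM cxitemhist b
                  WHERE a.counter = b.cxlink
                    AND b.eventtype IN ('BILLED', 'REBILLED')
                    AND b.eventtime BETWEEN :begdate AND :enddate)
""", params).fetchone()[0]

print(plain_join, distinct_subquery, with_exists)
```

With this sample data the plain join counts item 1 twice, while the other two forms count each billed item once.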

Related

SQL Server ORDER BY seems surprisingly slow

Original query:
SELECT V.Date, V.Amount, I.Number
FROM Values V
JOIN Items I ON V.ItemId = I.Id AND I.AssetId = V.AssetId
WHERE I.Type IN (10023, 10025) AND V.AssetId = 100
ORDER BY V.Date
Times out after some long time. After poking about a bit, I commented out the ORDER BY:
SELECT V.Date, V.Amount, I.Number
FROM Values V
JOIN Items I ON V.ItemId = I.Id AND I.AssetId = V.AssetId
WHERE I.Type IN (10023, 10025) AND V.AssetId = 100
--ORDER BY V.Date
This returns two rows in zero millis.
I was under the impression that order by against a JOIN would occur after the query was complete, that is, it would make a temp (name?) table for the results and then order them. Apparently this impression is wrong.
Any suggestions? I don't have SHOWPLAN (et al.) on this server, so I'm a bit in the dark.
ORDER BY can affect the execution plan. If the query does indeed only return two rows, then the timeout is surprising.
I would rewrite the query as:
SELECT V.Date, V.Amount, I.Number
FROM Values V JOIN
Items I
ON V.ItemId = I.Id AND I.AssetId = V.AssetId
WHERE I.Type IN (10023, 10025) AND I.AssetId = 100 -- the only change: filter on I.AssetId instead of V.AssetId
ORDER BY V.Date;
Then the best indexes are Items(AssetId, Type, Id, Number) and Values(ItemId, AssetId, Date, Amount). These are covering indexes for the query.
It's hard to troubleshoot without an execution plan.
First thing I would do is to make sure statistics are up to date. If stats are not up to date, SQL can produce inefficient plans.
If you cannot do this, you can change your query to force the correct plan.
For example, you can use a table variable to ensure ORDER BY is done last.
--declare staging table variable
declare @stage table ([Date] date, [Amount] decimal(19,4), [Number] int);
--insert data into staging table
INSERT INTO @stage([Date], [Amount], [Number])
SELECT V.Date, V.Amount, I.Number
FROM Values V
JOIN Items I ON V.ItemId=I.Id AND I.AssetId=V.AssetId
WHERE I.Type IN (10023, 10025) AND V.AssetId=100;
--retrieve data from staging table with sorting
SELECT * FROM @stage ORDER BY [Date];
This is not ideal, but if you don't have DBA permissions it's the best you can do.
Another thing to try is to use the MAXDOP 1 hint. This tells SQL engine not to use parallel execution which sometimes helps avoid inefficient plans.
SELECT V.Date, V.Amount, I.Number
FROM Values V
JOIN Items I ON V.ItemId=I.Id AND I.AssetId=V.AssetId
WHERE I.Type IN (10023, 10025) AND V.AssetId=100
ORDER BY V.Date
OPTION (MAXDOP 1);
Note that I just added OPTION (MAXDOP 1) to your original query.

Aggregate function not allowed in Set statement in TSQL

I need to update one table with values from another table (msdb.dbo.sysjobhistory). Since I need to get the max values of run_time and run_date, I kept getting an 'aggregate function not allowed in set statement' error. As a workaround, I tried the following, but something isn't right because ALL the values in every row are the same, indicating either an error in the join or something I can't figure out. This is what I have (which is not correct):
UPDATE inventory.dbo.books SET
auth_time = t1.at,
auth_date = t1.ad
FROM (SELECT MAX(run_time) AS at, MAX(run_date) AS ad
FROM msdb.dbo.sysjobhistory h
INNER JOIN inventory.dbo.books t
ON h.job_id = t.jobid) t1
ALSO, I need to be able to convert run_time into decimal(10,2) format (as the auth_time field is that) and run_date into datetime (as auth_date is in datetime format).
You're so close!
Just move your subquery reference from the From Clause to a subquery in the Set for each field.
Try this:
UPDATE b SET
auth_time = (SELECT MAX(h.run_time)
FROM msdb.dbo.sysjobhistory h
WHERE h.job_id = b.jobid) -- correlated to the row being updated
, auth_date = (SELECT MAX(h.run_date)
FROM msdb.dbo.sysjobhistory h
WHERE h.job_id = b.jobid)
FROM inventory.dbo.books b;
The subquery in the SET for each field essentially returns a single value when the subquery is executed. Therefore, the use of a subquery doesn't break any of the rules of the set operation.
If your logic starts getting too complicated and your set fields too numerous to want to repeat everything as a subquery for each field, then you can also use a CTE.
With CTE as (
Select h.job_id
, MAX(h.run_time) as at
, MAX(h.run_date) as ad
FROM msdb.dbo.sysjobhistory h
GROUP BY h.job_id -- one row per job, so each book gets its own maximums
)
Update t
Set auth_time = CTE.at
, auth_date = CTE.ad
From inventory.dbo.books t
INNER JOIN CTE ON CTE.job_id = t.jobid;
Let me know if you have any questions!
You can join to a subquery.
And in that subquery you're allowed to use group by.
Then do the casts or converts when the destination fields are set.
For example:
UPDATE b
SET
auth_time = cast(h.max_run_time as decimal(10,2)),
auth_date = convert(datetime, cast(h.max_run_date as char(8)), 112)
FROM inventory.dbo.books b
JOIN (
select
job_id,
max(run_time) as max_run_time,
max(run_date) as max_run_date
from msdb.dbo.sysjobhistory
group by job_id
) h
ON (b.jobid = h.job_id);
I didn't know what kind of number is expected in auth_time, so this keeps the raw value.
In msdb.dbo.sysjobhistory, run_time is an INT in HHMMSS form and run_date is an INT in YYYYMMDD form.
The run_time value is cast to the DECIMAL of the destination field; run_date goes through char(8) and style 112 (yyyymmdd) to become a DATETIME.
For example: run_time 121525 --> decimal(10,2) 121525.00
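Why every row ended up with the same values is easier to see on a toy dataset. The sketch below uses SQLite from Python with a hypothetical jobhistory table standing in for msdb.dbo.sysjobhistory (integer run_date and run_time, made-up rows). It compares a subquery that does not reference the row being updated with one that does; SQLite's UPDATE syntax is not identical to T-SQL, so this only illustrates the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for inventory.dbo.books and msdb.dbo.sysjobhistory.
    CREATE TABLE books (jobid TEXT PRIMARY KEY, auth_time INTEGER, auth_date INTEGER);
    CREATE TABLE jobhistory (job_id TEXT, run_date INTEGER, run_time INTEGER);
    INSERT INTO books (jobid) VALUES ('A'), ('B');
    INSERT INTO jobhistory VALUES
        ('A', 20240101,  90000),
        ('A', 20240103, 101500),
        ('B', 20240102, 233000);
""")

def snapshot():
    """Return the current contents of books, ordered by jobid."""
    return conn.execute(
        "SELECT jobid, auth_time, auth_date FROM books ORDER BY jobid"
    ).fetchall()

# Subquery joins books again but never refers to the row being updated,
# so it yields one global maximum that is written to every row.
conn.execute("""
    UPDATE books SET
        auth_time = (SELECT MAX(h.run_time) FROM jobhistory h
                     JOIN books t ON h.job_id = t.jobid),
        auth_date = (SELECT MAX(h.run_date) FROM jobhistory h
                     JOIN books t ON h.job_id = t.jobid)
""")
uncorrelated = snapshot()

# Correlated subquery: the WHERE clause ties the aggregate to the current row's jobid.
conn.execute("""
    UPDATE books SET
        auth_time = (SELECT MAX(h.run_time) FROM jobhistory h
                     WHERE h.job_id = books.jobid),
        auth_date = (SELECT MAX(h.run_date) FROM jobhistory h
                     WHERE h.job_id = books.jobid)
""")
correlated = snapshot()

print(uncorrelated)
print(correlated)
```

The first update writes the same pair of maximums to both books; the second gives each book the maximums of its own job.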

Join subquery with min

I'm pulling my hair out over a subquery that I'm using to avoid about 100 duplicates (out of about 40k records). The records that are duplicated are showing up because they have 2 dates in h2.datecreated for a valid reason, so I can't just scrub the data.
I'm trying to get only the earliest date to return. The first subquery (that starts with "select distinct address_id", with the MIN) works fine on its own...no duplicates are returned. So it would seem that the left join (or just plain join...I've tried that too) couldn't possibly see the second h2.datecreated, since it doesn't even show up in the subquery. But when I run the whole query, it's returning 2 values for some ipc.mfgid's, one with the h2.datecreated that I want, and the other one that I don't want.
I know it's got to be something really simple, or something that just isn't possible. It really seems like it should work! This is MSSQL. Thanks!
select distinct ipc.mfgid as IPC, h2.datecreated,
case when ad.Address is null
then ad.buildingname end as Address, cast(trace.name as varchar)
+ '-' + cast(trace.Number as varchar) as ONT,
c.ACCOUNT_Id,
case when h.datecreated is not null then h.datecreated
else h2.datecreated end as Install
from equipmentjoin as ipc
left join historyjoin as h on ipc.id = h.EQUIPMENT_Id
and h.type like 'add'
left join circuitjoin as c on ipc.ADDRESS_Id = c.ADDRESS_Id
and c.GRADE_Code like '%hpna%'
join (select distinct address_id, equipment_id,
min(datecreated) as datecreated, comment
from history where comment like 'MAC: 5%' group by equipment_id, address_id, comment)
as h2 on c.address_id = h2.address_id
left join (select car.id, infport.name, carport.number, car.PCIRCUITGROUP_Id
from circuit as car (NOLOCK)
join port as carport (NOLOCK) on car.id = carport.CIRCUIT_Id
and carport.name like 'lead%'
and car.GRADE_Id = 29
join circuit as inf (NOLOCK) on car.CCIRCUITGROUP_Id = inf.PCIRCUITGROUP_Id
join port as infport (NOLOCK) on inf.id = infport.CIRCUIT_Id
and infport.name like '%olt%' )
as trace on c.ccircuitgroup_id = trace.pcircuitgroup_id
join addressjoin as ad (NOLOCK) on ipc.address_id = ad.id
The typical approach to only getting the lowest row is one of the following. You didn't bother to specify what version of SQL Server you're using, what you want to do with ties, and I have little interest to try to work this into your complex query, so I'll show you an abstract simplification for different versions.
SQL Server 2000
SELECT x.grouping_column, x.min_column, x.other_columns ...
FROM dbo.foo AS x
INNER JOIN
(
SELECT grouping_column, min_column = MIN(min_column)
FROM dbo.foo GROUP BY grouping_column
) AS y
ON x.grouping_column = y.grouping_column
AND x.min_column = y.min_column;
SQL Server 2005+
;WITH x AS
(
SELECT grouping_column, min_column, other_columns,
rn = ROW_NUMBER() OVER (PARTITION BY grouping_column ORDER BY min_column)
FROM dbo.foo
)
SELECT grouping_column, min_column, other_columns
FROM x
WHERE rn = 1;
This subquery:
select distinct address_id, equipment_id,
min(datecreated) as datecreated, comment
from history where comment like 'MAC: 5%' group by equipment_id, address_id, comment
will probably return multiple rows per address, because comment is part of the GROUP BY and is not guaranteed to be the same for both history rows.
Try this instead:
CROSS APPLY (
SELECT TOP 1 H2.DateCreated, H2.Comment -- H2.Equipment_id wasn't used
FROM History H2
WHERE
H2.Comment LIKE 'MAC: 5%'
AND C.Address_ID = H2.Address_ID
ORDER BY DateCreated
) H2
Switch that to OUTER APPLY in case you want rows that don't have a matching desired history entry.
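The effect of grouping by comment can be reproduced on a few made-up rows. This is a rough sketch using SQLite from Python (so no APPLY, and the data is hypothetical): the first query groups the way the question does, and the second uses the join-to-a-grouped-subquery pattern from the SQL Server 2000 example, grouping by address only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical history rows: address 10 has two dates with different comments.
    CREATE TABLE history (address_id INTEGER, equipment_id INTEGER,
                          datecreated TEXT, comment TEXT);
    INSERT INTO history VALUES
        (10, 1, '2023-05-01', 'MAC: 5A'),
        (10, 1, '2023-07-15', 'MAC: 5B'),
        (20, 2, '2023-06-01', 'MAC: 5C');
""")

# Grouping by comment as well: each distinct comment forms its own group,
# so address 10 still comes back twice.
grouped_with_comment = conn.execute("""
    SELECT address_id, equipment_id, MIN(datecreated) AS datecreated, comment
    FROM history
    WHERE comment LIKE 'MAC: 5%'
    GROUP BY equipment_id, address_id, comment
    ORDER BY address_id, datecreated
""").fetchall()

# Group by address only to find the earliest date, then join back to pick up
# the other columns of that earliest row.
earliest_per_address = conn.execute("""
    SELECT x.address_id, x.datecreated, x.comment
    FROM history AS x
    JOIN (SELECT address_id, MIN(datecreated) AS datecreated
          FROM history
          WHERE comment LIKE 'MAC: 5%'
          GROUP BY address_id) AS y
      ON x.address_id = y.address_id
     AND x.datecreated = y.datecreated
    WHERE x.comment LIKE 'MAC: 5%'
    ORDER BY x.address_id
""").fetchall()

print(grouped_with_comment)
print(earliest_per_address)
```

If two rows for one address share the same earliest date, the join-back form returns both, so ties still need a rule of their own.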

SQL joining with parameter problem

I asked this question before:
How do I bring back an entire range of dates in SQL between two dates, even when there is no data?
but I now need to only select incidents that have a Status of "E" for emergency.
I can't put WHERE status='E' though, because that will stop it returning an entry for every single date.
How can I solve this?
Just add it to the LEFT OUTER JOIN ... ON, since this is a condition for the joined rows as far as I understand the question.
Something like this:
WITH DateRange(date) AS (
SELECT @dateFrom dt
UNION ALL
SELECT DATEADD(dd, 1, date) date FROM DateRange WHERE date < @dateTo
)
SELECT DateRange.date, count(incident.id)
FROM DateRange
LEFT OUTER JOIN incident
ON incident.date >= DateRange.date
AND incident.date < DATEADD(dd, 1, DateRange.date)
AND incident.status = 'E'
GROUP BY DateRange.date
ORDER BY DateRange.date
Like you said in your question, putting the condition in your WHERE clause effectively turns your LEFT JOIN into an INNER JOIN.
You should add this to your LEFT JOIN criteria as the intent is that it's a condition of the join.
So something like:
table1
LEFT JOIN table2 ON table1.field = table2.field AND table2.status='E'
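The difference between filtering in ON and filtering in WHERE shows up clearly on a tiny dataset. Below is a minimal sketch in SQLite via Python; the dates and incident tables and their rows are hypothetical, and a plain dates table replaces the recursive DateRange CTE to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: three calendar days, only one emergency incident.
    CREATE TABLE dates (d TEXT);
    CREATE TABLE incident (id INTEGER, d TEXT, status TEXT);
    INSERT INTO dates VALUES ('2024-01-01'), ('2024-01-02'), ('2024-01-03');
    INSERT INTO incident VALUES
        (1, '2024-01-01', 'E'),
        (2, '2024-01-01', 'N'),
        (3, '2024-01-02', 'N');
""")

# Status filter inside ON: non-matching incidents are simply not joined,
# but every date row survives and gets a count of 0.
filter_in_on = conn.execute("""
    SELECT dates.d, COUNT(incident.id)
    FROM dates
    LEFT JOIN incident
           ON incident.d = dates.d
          AND incident.status = 'E'
    GROUP BY dates.d
    ORDER BY dates.d
""").fetchall()

# Status filter in WHERE: rows where incident.status is NULL (no match) or not 'E'
# are discarded after the join, so dates without emergencies disappear.
filter_in_where = conn.execute("""
    SELECT dates.d, COUNT(incident.id)
    FROM dates
    LEFT JOIN incident ON incident.d = dates.d
    WHERE incident.status = 'E'
    GROUP BY dates.d
    ORDER BY dates.d
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```

Counting incident.id rather than * matters here: the unmatched date rows carry a NULL id, which COUNT skips, so they report 0.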

Query Executing Problem

Using SQL 2005: the query takes too much time to execute.
I want to filter by date so that dates that fall on holidays are not displayed, and I am using three tables with inner joins.
When I run the query below, it takes too much time to execute, because I filter CardEventDate against three tables.
Query
SELECT
PERSONID, CardEventDate tmp_cardevent3
WHERE (CardEventDate NOT IN
(SELECT T_CARDEVENT.CARDEVENTDATE
FROM T_PERSON
INNER JOIN T_CARDEVENT ON T_PERSON.PERSONID = T_CARDEVENT.PERSONID
INNER JOIN DUAL_PRO_II_TAS.dbo.T_WORKINOUTTIME ON T_CARDEVENT.CARDEVENTDAY = DUAL_PRO_II_TAS.dbo.T_WORKINOUTTIME.DAYCODE
AND T_PERSON.TACODE = DUAL_PRO_II_TAS.dbo.T_WORKINOUTTIME.TACODE
WHERE (DUAL_PRO_II_TAS.dbo.T_WORKINOUTTIME.HOLIDAY = 'true')
)
)
ORDER BY PERSONID, CardEventDate DESC
Is there any other way to do the date filter for the query above?
I am looking for alternative queries.
I'm pretty sure that it's not the joined tables that are the problem, but rather the "not in" that makes it slow.
Try to use a join instead:
select m.PERSONID, m.CardEventDate
from T_PERSON p
inner join T_CARDEVENT c on p.PERSONID = c.PERSONID
inner join DUAL_PRO_II_TAS.dbo.T_WORKINOUTTIME w
on c.CARDEVENTDAY = w.DAYCODE
and p.TACODE = w.TACODE
and w.HOLIDAY = 'true'
right join tmp_cardevent3 m on m.CardEventDate = c.CardEventDate
where c.CardEventDate is null
order by m.PERSONID, m.CardEventDate desc
(There is a from clause missing from your query, so I don't know what table you are trying to get the data from.)
Edit:
Put tmp_cardevent3 in the correct place.
Have you created indices on all of the columns that you are using to do the joins? In particular, I'd consider indices on PERSONID in T_CARDEVENT, TACODE in both T_PERSON and T_WORKINOUTTIME, and HOLIDAY in T_WORKINOUTTIME.
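For reference, here is the NOT IN filter next to two anti-join rewrites on a made-up dataset, run in SQLite from Python. The card_events and holidays tables are hypothetical simplifications (the three-table holiday lookup is collapsed into one table), and whether a rewrite is actually faster on SQL Server depends on the plan and the indexes, which this sketch cannot show. It only checks that the forms return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified tables: card events plus a list of holiday dates.
    CREATE TABLE card_events (personid INTEGER, eventdate TEXT);
    CREATE TABLE holidays (eventdate TEXT NOT NULL);
    INSERT INTO card_events VALUES
        (1, '2024-01-01'), (1, '2024-01-02'),
        (2, '2024-01-01'), (2, '2024-01-03');
    INSERT INTO holidays VALUES ('2024-01-01');
    -- Index on the column used for the lookup, as suggested in the answer.
    CREATE INDEX ix_holidays_eventdate ON holidays (eventdate);
""")

# Original shape: exclude dates that appear in the holiday list.
not_in = conn.execute("""
    SELECT personid, eventdate
    FROM card_events
    WHERE eventdate NOT IN (SELECT eventdate FROM holidays)
    ORDER BY personid, eventdate DESC
""").fetchall()

# Anti-join: keep only the rows for which the outer join found no holiday.
left_join_is_null = conn.execute("""
    SELECT m.personid, m.eventdate
    FROM card_events m
    LEFT JOIN holidays h ON h.eventdate = m.eventdate
    WHERE h.eventdate IS NULL
    ORDER BY m.personid, m.eventdate DESC
""").fetchall()

# NOT EXISTS: same result, expressed as a correlated existence test.
not_exists = conn.execute("""
    SELECT m.personid, m.eventdate
    FROM card_events m
    WHERE NOT EXISTS (SELECT 1 FROM holidays h WHERE h.eventdate = m.eventdate)
    ORDER BY m.personid, m.eventdate DESC
""").fetchall()

print(not_in)
```

The forms agree here because holidays.eventdate is declared NOT NULL; if the subquery could return a NULL, NOT IN would return no rows at all, which is one more reason to prefer the other two shapes.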
