Multiple aggregate functions with multiple tables in SQL Server

I am trying to report, for each person who entered data, the number of invoices and the number of line items.
The current output is:
Entered_by No of line items
CD 9
CD 136084
deepa 7
deepa 18
dolly 757
dolly 22350
kroshni 666
kroshni 16161
lokesh 4
lokesh 999
MHeera 639
MHeera 20427
nandini 7
nandini 5318
Here the 'No of line items' column mixes the line item counts with the invoice counts. I want the output to look like this:
Entered_by No of line items No of invoices
CD 136084 9
deepa 18 7
dolly 22350 757
Can somebody please help me with this?
Here is the T-SQL query
select ENTERED_BY, count(entered_by) 'NO OF LINE ITEMS'
from im_invoice, im_invoice_line_item, im_invoice_inventory
where invoice_rid = invoice_fk
and invoice_inventory_rid = invoice_inv_fk
and enter_date between dateadd(mm, -3, getdate()) and dateadd(mm,0,getdate())
group by entered_by
union
select entered_by, count(invoice_num) 'NO OF INVOICES' from im_invoice
where enter_date between dateadd(mm, -3, getdate()) and dateadd(mm,0,getdate())
group by entered_by

As Joe said, if you give us a more detailed description we can give you better answers, but until then, a quick and dirty way to accomplish this is as follows:
Get rid of the union
Turn the 2 queries into derived tables
Select from them joining on entered_by.
Eg.
SELECT LineItems.ENTERED_BY, [NO OF LINE ITEMS], [NO OF INVOICES]
FROM
(SELECT ENTERED_BY,COUNT(entered_by) 'NO OF LINE ITEMS'
FROM im_invoice, im_invoice_line_item,im_invoice_inventory
WHERE invoice_rid = invoice_fk
AND invoice_inventory_rid = invoice_inv_fk
AND enter_date BETWEEN dateadd(mm, -3, getdate()) AND dateadd(mm,0,getdate())
GROUP BY entered_by) AS LineItems
INNER JOIN
(SELECT entered_by, count(invoice_num) 'NO OF INVOICES'
FROM im_invoice
WHERE enter_date BETWEEN dateadd(mm, -3, getdate()) AND dateadd(mm,0,getdate())
GROUP BY entered_by ) AS invoices
ON invoices.entered_by = LineItems.ENTERED_BY
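One caveat with the INNER JOIN above: a person who entered invoices but has no line items in the period would drop out of the result. A minimal sketch of a LEFT JOIN variant follows. The table variables @invoice and @line_item and their columns are assumptions made up for illustration, not the asker's actual schema.

```sql
-- Hypothetical miniature tables (names and columns are assumptions).
DECLARE @invoice TABLE (invoice_rid int, entered_by varchar(20), enter_date datetime);
DECLARE @line_item TABLE (line_item_rid int, invoice_fk int);

INSERT INTO @invoice VALUES
    (1, 'CD', GETDATE()), (2, 'CD', GETDATE()), (3, 'deepa', GETDATE());
INSERT INTO @line_item VALUES (10, 1), (11, 1), (12, 2);

-- Start from the invoice counts so that every person with an invoice is kept,
-- then attach the line item counts where they exist.
SELECT inv.entered_by,
       COALESCE(li.line_items, 0) AS [NO OF LINE ITEMS],
       inv.invoices               AS [NO OF INVOICES]
FROM (SELECT entered_by, COUNT(*) AS invoices
      FROM @invoice
      WHERE enter_date >= DATEADD(month, -3, GETDATE())
      GROUP BY entered_by) AS inv
LEFT JOIN (SELECT i.entered_by, COUNT(*) AS line_items
           FROM @invoice AS i
           JOIN @line_item AS l ON l.invoice_fk = i.invoice_rid
           WHERE i.enter_date >= DATEADD(month, -3, GETDATE())
           GROUP BY i.entered_by) AS li
       ON li.entered_by = inv.entered_by
ORDER BY inv.entered_by;
```

With this sample data, 'deepa' has one invoice and no line items, so the LEFT JOIN keeps her row with a line item count of 0, where an INNER JOIN would omit it.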

Related

Getting an "Aggregate" at a GROUP BY query

I have 2 tables which are linked by the serial number (DeviceID).
Table 1 (C) lists all downloaded cyclist data.
Table 2 (T) lists the data about the device and when the last download took place.
Now I want to group by device and get the average speed over the last 3, 6 or 12 months (counted from today).
This works without problems.
However, when I try to get the average speed over the last 3, 6 or 12 months counted from the last download, I get this error:
An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.
Code 1 that goes OK:
SELECT
C.DeviceID
, AVG(C.Speed) AS AVG_Speed
, DATEDIFF(MONTH, C.LogDateTime, GETDATE()) AS Months
FROM Compass C
JOIN Transfer T ON C.DeviceID = T.DeviceID
WHERE DATEDIFF(MONTH, C.LogDateTime, GETDATE()) <= @EvalTimeFrame
GROUP BY C.DeviceID
Code 2 that goes wrong:
SELECT
C.DeviceID
, AVG(C.Speed) AS AVG_Speed
, (DATEDIFF(MONTH, C.LogDateTime, GETDATE())) - (DATEDIFF(MONTH, MAX(T.TransferDateTime), GETDATE())) AS Months
FROM Compass C
JOIN Transfer T ON C.DeviceID = T.DeviceID
WHERE (DATEDIFF(MONTH, C.LogDateTime, GETDATE())) - (DATEDIFF(MONTH, MAX(T.TransferDateTime), GETDATE())) <= @EvalTimeFrame
GROUP BY C.DeviceID
Actually, what I want to have is:
GROUP BY C.DeviceID, (DATEDIFF(MONTH, C.LogDateTime, GETDATE())) - (DATEDIFF(MONTH, MAX(T.TransferDateTime), GETDATE()))
It would help me a lot - any idea?
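The error comes from using MAX() directly in the WHERE clause. One possible way around it, offered as a sketch rather than a definitive answer, is to compute the last transfer per device in a derived table first and then filter against that value. Because DATEDIFF(MONTH, ...) counts month boundaries, the difference of the two DATEDIFF calls in the question reduces to DATEDIFF(MONTH, C.LogDateTime, LastTransfer). The table variables and sample rows below are assumptions for illustration; the real tables are Compass and Transfer.

```sql
DECLARE @EvalTimeFrame int = 3;

-- Hypothetical stand-ins for the Compass and Transfer tables.
DECLARE @Compass TABLE (DeviceID int, Speed decimal(5,1), LogDateTime datetime);
DECLARE @Transfer TABLE (DeviceID int, TransferDateTime datetime);

INSERT INTO @Compass VALUES
    (1, 10.0, '20160601'), (1, 20.0, '20170115'), (1, 30.0, '20170510');
INSERT INTO @Transfer VALUES (1, '20170301'), (1, '20170620');

SELECT C.DeviceID,
       AVG(C.Speed) AS AVG_Speed
FROM @Compass AS C
JOIN (
        -- The aggregate lives here, so the outer WHERE only sees a plain column.
        SELECT DeviceID, MAX(TransferDateTime) AS LastTransfer
        FROM @Transfer
        GROUP BY DeviceID
     ) AS T
  ON T.DeviceID = C.DeviceID
WHERE DATEDIFF(MONTH, C.LogDateTime, T.LastTransfer) <= @EvalTimeFrame
GROUP BY C.DeviceID;
```

In the sample data the last transfer is in June 2017, so with a 3-month frame only the May 2017 log row contributes to the average.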

SQL- Finding a gap that is x amount of months with the same foreign key

I am editing this to clarify my question.
Let's say I have a table that holds patient information. I need to find new patients for this year, and the date of the first prescription that made them count as new. Any time there is a six-month gap, they are considered a new patient.
How do I accomplish this using SQL? I can do this in Java or any other imperative language easily enough, but I am having problems doing this in SQL. I need this script to be run in Crystal by non-SQL users.
Table:
Patient ID Prescription Date
-----------------------------------------
1 12/31/16
1 03/13/17
2 10/10/16
2 05/11/17
2 06/11/17
3 01/01/17
3 04/20/17
4 01/31/16
4 01/01/17
4 07/02/17
So Patients 2 and 4 are considered new patients. Patient 4 is considered a new patient twice, so I need dates for each time patient 4 was considered new 1/1/17 and 7/2/17. Patients 1 and 3 are not considered new this year.
So far I have the code below which tells me if they are new this year, but not if they had another six month gap this year.
SELECT DISTINCT
this_year.patient_id
,this_year.date
FROM (SELECT
patient_id
,MIN(prescription_date) as date
FROM table
WHERE prescription_date BETWEEN '2017-01-01 00:00:00.000' AND '2017-12-31 00:00:00.000'
GROUP BY [patient_id]) AS this_year
LEFT JOIN (SELECT
patient_id
,MAX(prescription_date) as date
FROM table
WHERE prescription_date BETWEEN '2016-01-01 00:00:00.000' AND '2016-12-31 00:00:00.000'
GROUP BY [patient_id]) AS last_year
ON last_year.patient_id = this_year.patient_id
WHERE DATEDIFF(month, last_year.date, this_year.date) > 6
OR last_year.date IS NULL
Patient 2 in your example does not meet the criteria you specified ... that being said ...
You can try something like this ... untested but should be similar (assuming you can put this in a stored procedure):
WITH ordered AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY [Patient ID] ORDER BY [Prescription Date]) rn
FROM table1
)
SELECT o2.[Patient ID], o2.[Prescription Date], DATEDIFF(s, o1.[Prescription Date], o2.[Prescription Date]) diff
FROM ordered o1 JOIN ordered o2
ON o1.[Patient ID] = o2.[Patient ID] AND o1.rn + 1 = o2.rn
WHERE DATEDIFF(m, o1.[Prescription Date], o2.[Prescription Date]) > 6
Replace table1 with the name of your table.
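On SQL Server 2012 or later, the same "compare each row with the previous one" idea can be sketched with LAG instead of a self-join. Two assumptions are baked into this sketch: first-ever prescriptions are not treated as new (which matches the expected result in the question, where Patient 3 is not new), and the gap test uses DATEADD rather than DATEDIFF(m, ...), because DATEDIFF counts month boundaries and would return exactly 6 for 01/01/17 to 07/02/17. The table variable @prescriptions and its column names are made up for illustration.

```sql
-- Sample data from the question, loaded into a hypothetical table variable.
DECLARE @prescriptions TABLE (patient_id int, prescription_date date);
INSERT INTO @prescriptions VALUES
    (1, '2016-12-31'), (1, '2017-03-13'),
    (2, '2016-10-10'), (2, '2017-05-11'), (2, '2017-06-11'),
    (3, '2017-01-01'), (3, '2017-04-20'),
    (4, '2016-01-31'), (4, '2017-01-01'), (4, '2017-07-02');

WITH with_prev AS (
    -- Pair every prescription with the same patient's previous prescription date.
    SELECT patient_id,
           prescription_date,
           LAG(prescription_date) OVER (PARTITION BY patient_id
                                        ORDER BY prescription_date) AS prev_date
    FROM @prescriptions
)
SELECT patient_id,
       prescription_date AS new_patient_date
FROM with_prev
WHERE prescription_date >= '2017-01-01'
  AND prescription_date <  '2018-01-01'
  -- More than six months since the previous prescription.
  -- Rows with no previous prescription (prev_date IS NULL) are filtered out here.
  AND DATEADD(month, 6, prev_date) < prescription_date
ORDER BY patient_id, prescription_date;
```

For the sample data this returns Patient 2 on 2017-05-11 and Patient 4 on 2017-01-01 and 2017-07-02.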
I assume that you mean the patient has not been prescribed in the last 6 months.
SELECT DISTINCT user_id
FROM table_name
WHERE prescribed_date >= DATEADD(month, -6, GETDATE())
This gives you the list of users that have been prescribed in the last 6 months. You want the list of users that are not in this list.
SELECT DISTINCT user_id
FROM table_name
WHERE user_id NOT IN (SELECT DISTINCT user_id
FROM table_name
WHERE prescribed_date >= DATEADD(month, -6, GETDATE()))
You'll need to amend the field and table names.

SQL- Joining tables on common column

This query:
SELECT CID, count(*) as NumOccurences
FROM Violations
WHERE DateOfViolation BETWEEN (dateadd(day, -30, getdate())) AND getdate()
GROUP BY CID
ORDER BY count(*) DESC;
gives the following result:
CID NumOccurences
1921 5
1042 5
1472 5
1543 5
2084 5
2422 5
NumOccurences is verified to be correct. Since CID exists in another table, I want to tie each CID to its intersection, a column in that other table, Placement[CID,Intersection,...], and display that instead.
My desired output is:
Intersection NumOccurences
Elston and Charles 5
Diservey and Pkwy 5
Grand and Chicago 5
...
...
I tried this:
SELECT Intersection, count(DateOfViolation) as NumOccurences
FROM Violations
inner join Placement on Violations.CID = Placement.CID
WHERE DateOfViolation BETWEEN (dateadd(day, -30, getdate())) AND getdate()
GROUP BY Intersection
ORDER BY count(*) DESC;
but get this result (not correct):
Intersection NumOccurences
CALIFORNIA AND DIVERSEY 90
BELMONT AND KEDZIE 83
KOSTNER AND NORTH 82
STONEY ISLAND AND 79TH 78
RIDGE AND CLARK 60
ROOSEVELT AND HALSTED 60
ROOSEVELT AND KOSTNER 60
In fact, I've got no idea what my attempt query is even returning or where it's coming from.
EDIT
Running the query
SELECT CID, count(*) as num
from Placement
where Intersection = 'BELMONT AND KEDZIE'
group by Intersection, Address, CID
order by Intersection, Address, CID
yields
CID num
1372 1
1371 1
1373 1
I think you could do something like this:
SELECT
MIN(Placement.Intersection) AS Intersection,
COUNT(DISTINCT Violations.VID /* Violation ID? */) AS NumOccurences
FROM Violations INNER JOIN Placement ON Violations.CID = Placement.CID
WHERE DateOfViolation
BETWEEN cast(dateadd(day, -30, getdate()) as date) AND cast(getdate() as date)
GROUP BY Violations.CID
ORDER BY NumOccurences DESC;
Also be careful with that date range. I'm not sure whether you're dealing with date or datetime.
You might also try:
SELECT
(
SELECT MIN(Intersection) FROM Placement
WHERE Placement.CID = Violations.CID
) AS Intersection,
COUNT(*) AS NumOccurences
FROM Violations
WHERE DateOfViolation
BETWEEN cast(dateadd(day, -30, getdate()) as date) AND cast(getdate() as date)
GROUP BY CID
ORDER BY NumOccurences DESC;
You may not even need the MIN() in that second one.
There would have to be a one-to-one relationship between CIDs and Intersections for you to get the result you are after.
83 is actually a prime number, which would suggest that not only are there multiple entries for the BELMONT AND KEDZIE intersection in the Placement table, but also that there is more than one CID corresponding to that intersection. The same may be true for other intersections.
Try this:
SELECT Intersection, CID, count(*) as num
from Placement
-- where Intersection = 'BELMONT AND KEDZIE'
group by Intersection, CID
order by Intersection, CID
That will show you how many rows there are for each (intersection, CID) combination in your Placement table (uncomment the where clause to look at 'BELMONT AND KEDZIE' specifically). Then re-ask yourself what you're trying to do.
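If the goal turns out to be one row per camera, labelled with its intersection, a sketch along these lines may help. It groups by both Intersection and CID so that cameras sharing an intersection are not summed together, and it joins to a de-duplicated (CID, Intersection) list as a guard in case Placement holds more than one row per CID. The table variables, sample rows and the VID column are assumptions for illustration only.

```sql
-- Hypothetical miniature versions of Placement and Violations.
DECLARE @Placement TABLE (CID int, Intersection varchar(50), Address varchar(50));
DECLARE @Violations TABLE (VID int, CID int);

INSERT INTO @Placement VALUES
    (1371, 'BELMONT AND KEDZIE', 'Address A'),
    (1372, 'BELMONT AND KEDZIE', 'Address B'),
    (1921, 'EXAMPLE AND SAMPLE', 'Address C'),
    (1921, 'EXAMPLE AND SAMPLE', 'Address D');  -- deliberate duplicate CID
INSERT INTO @Violations VALUES
    (1, 1371), (2, 1371), (3, 1372), (4, 1921), (5, 1921);

SELECT p.Intersection,
       v.CID,
       COUNT(*) AS NumOccurences
FROM @Violations AS v
JOIN (SELECT DISTINCT CID, Intersection FROM @Placement) AS p
  ON p.CID = v.CID
GROUP BY p.Intersection, v.CID
ORDER BY NumOccurences DESC, v.CID;
```

Here CID 1921 still counts 2 violations despite its duplicate Placement row, and the two BELMONT AND KEDZIE cameras are reported separately (2 and 1) instead of as a combined 3.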

Need Top N Row for Large Dataset. Query is taking a long time. How to optimize?

I have two tables (SalesforceTasks and SalesforceContacts) that I am using for a scoring system project. A simple SELECT statement with a ROW_NUMBER() calculation is taking a very long time to run and actually stops querying once it hits a certain number of rows. The query doesn't stop executing, but it stops returning data.
It is a very vanilla process: I need to get the newest date in the SalesforceTasks table and link it to the contact ID in the SalesforceContacts table. The SalesforceTasks table has 2,091,946 rows and the SalesforceContacts table has 446,772 rows.
Here is the query in question:
SELECT SC.ID
,CASE
WHEN DATEDIFF(DD, ST.CREATEDDATE, GETDATE()) BETWEEN 360 AND 1500
THEN 15
WHEN DATEDIFF(DD, ST.CREATEDDATE, GETDATE()) BETWEEN 181 AND 360
THEN 10
WHEN DATEDIFF(DD, ST.CREATEDDATE, GETDATE()) BETWEEN 60 AND 180
THEN 5
ELSE 0
END AS Score
,ROW_NUMBER() OVER (PARTITION BY ST.ACCOUNTID ORDER BY ACTIVITYDATE) AS LastCall
FROM Salesforce.dbo.SalesforceTasks AS ST
JOIN Salesforce.dbo.SalesforceContacts AS SC
ON ST.ACCOUNTID = SC.ACCOUNTID
WHERE STATUS = 'Completed'
AND TYPE LIKE 'Call%'
What is the best plan of attack here? As stated, the query is taking a very, very long time to run. Is there a better way to get the newest date from the SalesforceTasks table?
You could try breaking the statement down into a 2-step process.
First filter records into #temp table and get the datediff without the CASE:
SELECT SC.ID
,DATEDIFF(DD, ST.CREATEDDATE, GETDATE()) AS ScoreDiff
,ROW_NUMBER() OVER (PARTITION BY ST.ACCOUNTID ORDER BY ACTIVITYDATE) AS LastCall
INTO #TEMP
FROM Salesforce.dbo.SalesforceTasks AS ST
JOIN Salesforce.dbo.SalesforceContacts AS SC
ON ST.ACCOUNTID = SC.ACCOUNTID
WHERE STATUS = 'Completed'
AND TYPE LIKE 'Call%'
AND DATEDIFF(DD, ST.CREATEDDATE, GETDATE()) BETWEEN 60 AND 1500
With the reduced dataset, you then perform the Scoring operation:
SELECT Id,
CASE
WHEN ScoreDiff BETWEEN 360 AND 1500
THEN 15
WHEN ScoreDiff BETWEEN 181 AND 360
THEN 10
WHEN ScoreDiff BETWEEN 60 AND 180
THEN 5
ELSE 0
END AS Score,
LastCall
FROM #temp
If the purpose is just to get the latest one, you can try this; otherwise you will need to find another way:
SELECT SC.ID,CASE
WHEN DATEDIFF(DD, ST.CREATEDDATE, GETDATE()) BETWEEN 360 AND 1500
THEN 15
WHEN DATEDIFF(DD, ST.CREATEDDATE, GETDATE()) BETWEEN 181 AND 360
THEN 10
WHEN DATEDIFF(DD, ST.CREATEDDATE, GETDATE()) BETWEEN 60 AND 180
THEN 5
ELSE 0
END AS Score,
SFC.ACTIVITYDATE
FROM Salesforce.dbo.SalesforceTasks AS ST
JOIN Salesforce.dbo.SalesforceContacts AS SC
ON ST.ACCOUNTID = SC.ACCOUNTID
CROSS APPLY
(
SELECT MAX(C2.ID) AS SCID,MAX(C2.ACTIVITYDATE) AS ACTIVITYDATE FROM Salesforce.dbo.SalesforceContacts C2
WHERE C2.ACCOUNTID=SC.ACCOUNTID
GROUP BY C2.ACCOUNTID
HAVING MAX(C2.ID)= SC.ID
) AS SFC
WHERE STATUS = 'Completed'
AND TYPE LIKE 'Call%'
AND DATEDIFF(DD, ST.CREATEDDATE, GETDATE()) BETWEEN 60 AND 1500
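Another angle, sketched here under assumptions: in the original query ROW_NUMBER() is computed after the join, so every task is repeated once per contact on the same account, and the ordering is ascending (oldest first). Ranking the tasks newest first before joining, and keeping only rank 1, gives the join far fewer rows to work with. Whether this is actually faster depends on the indexes available (an index on ACCOUNTID and ACTIVITYDATE might help, but that is a guess). The table variables and sample rows are hypothetical stand-ins for the two Salesforce tables.

```sql
-- Hypothetical miniature versions of the two tables.
DECLARE @Tasks TABLE (ACCOUNTID int, CREATEDDATE datetime, ACTIVITYDATE datetime,
                      [STATUS] varchar(20), [TYPE] varchar(20));
DECLARE @Contacts TABLE (ID int, ACCOUNTID int);

INSERT INTO @Tasks VALUES
    (1, DATEADD(day, -400, GETDATE()), DATEADD(day, -400, GETDATE()), 'Completed', 'Call - Outbound'),
    (1, DATEADD(day, -100, GETDATE()), DATEADD(day, -100, GETDATE()), 'Completed', 'Call - Inbound'),
    (2, DATEADD(day, -200, GETDATE()), DATEADD(day, -200, GETDATE()), 'Completed', 'Call - Outbound');
INSERT INTO @Contacts VALUES (501, 1), (502, 1), (503, 2);

WITH LatestTask AS (
    -- Rank each account's tasks newest first, before touching the contacts table.
    SELECT ACCOUNTID, CREATEDDATE, ACTIVITYDATE,
           ROW_NUMBER() OVER (PARTITION BY ACCOUNTID ORDER BY ACTIVITYDATE DESC) AS rn
    FROM @Tasks
    WHERE [STATUS] = 'Completed'
      AND [TYPE] LIKE 'Call%'
)
SELECT SC.ID,
       CASE
           WHEN d.AgeDays BETWEEN 360 AND 1500 THEN 15
           WHEN d.AgeDays BETWEEN 181 AND 360 THEN 10
           WHEN d.AgeDays BETWEEN 60 AND 180 THEN 5
           ELSE 0
       END AS Score,
       LT.ACTIVITYDATE AS LastCall
FROM LatestTask AS LT
JOIN @Contacts AS SC ON SC.ACCOUNTID = LT.ACCOUNTID
-- Compute the age once so the CASE does not repeat the DATEDIFF call.
CROSS APPLY (SELECT DATEDIFF(DAY, LT.CREATEDDATE, GETDATE()) AS AgeDays) AS d
WHERE LT.rn = 1;
```

In the sample data, account 1's newest call is about 100 days old, so both of its contacts score 5; account 2's newest call is about 200 days old, so its contact scores 10.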

How to calculate overlapping subscription days from orders with sql-server

I have an order table. I want to calculate the number of subscription days left for each user (preferably in a set-based way) as of a specific day.
create table #orders (orderid int, userid int, subscriptiondays int, orderdate date)
insert into #orders
select 1, 2, 10, '2011-01-01'
union
select 2, 1, 10, '2011-01-10'
union
select 3, 1, 10, '2011-01-15'
union
select 4, 2, 10, '2011-01-15'
declare @currentdate date = '2011-01-20'
--userid 1 is expected to have 10 subscriptiondays left (since there are 5 left when the second order is placed)
--userid 2 is expected to have 5 subscriptiondays left
I'm sure this has been done before, I just don't know what to search for.
Pretty much like a running total?
So when I set @currentdate to '2011-01-20' I want this result:
userid subscriptiondays
1 10
2 5
When I set @currentdate to '2011-01-25'
userid subscriptiondays
1 5
2 0
When I set @currentdate to '2011-01-11'
userid subscriptiondays
1 9
2 0
Thanks!
I think you would need to use a recursive common table expression.
EDIT: I've also added a procedural implementation further below instead of using a recursive common table expression. I recommend using that procedural approach, as I think there may be a number of data scenarios that the recursive CTE query that I've included probably doesn't handle.
The query below gives the correct answers for the scenarios that you've provided, but you would probably want to think up some additional complex scenarios and see whether there are any bugs.
For instance, I have a feeling that this query may break down if you have multiple previous orders overlapping with a later order.
with CurrentOrders (UserId, SubscriptionDays, StartDate, EndDate) as
(
select
userid,
sum(subscriptiondays),
min(orderdate),
dateadd(day, sum(subscriptiondays), min(orderdate))
from #orders
where
#orders.orderdate <= @currentdate
-- start with the latest order(s)
and not exists (
select 1
from #orders o2
where
o2.userid = #orders.userid
and o2.orderdate <= @currentdate
and o2.orderdate > #orders.orderdate
)
group by
userid
union all
select
#orders.userid,
#orders.subscriptiondays,
#orders.orderdate,
dateadd(day, #orders.subscriptiondays, #orders.orderdate)
from #orders
-- join any overlapping orders
inner join CurrentOrders on
#orders.userid = CurrentOrders.UserId
and #orders.orderdate < CurrentOrders.StartDate
and dateadd(day, #orders.subscriptiondays, #orders.orderdate) > CurrentOrders.StartDate
)
select
UserId,
sum(SubscriptionDays) as TotalSubscriptionDays,
min(StartDate),
sum(SubscriptionDays) - datediff(day, min(StartDate), @currentdate) as RemainingSubscriptionDays
from CurrentOrders
group by
UserId
;
Philip mentioned a concern about the recursion limit on common table expressions. Below is a procedural alternative using a table variable and a while loop, which I believe accomplishes the same thing.
While I've verified that this alternative code does work, at least for the sample data provided, I'd be glad to hear anyone's comments on this approach. Good idea? Bad idea? Any concerns to be aware of?
declare @ModifiedRows int
declare @CurrentOrders table
(
UserId int not null,
SubscriptionDays int not null,
StartDate date not null,
EndDate date not null
)
insert into @CurrentOrders
select
userid,
sum(subscriptiondays),
min(orderdate),
min(dateadd(day, subscriptiondays, orderdate))
from #orders
where
#orders.orderdate <= @currentdate
-- start with the latest order(s)
and not exists (
select 1
from #orders o2
where
o2.userid = #orders.userid
and o2.orderdate <= @currentdate
-- there does not exist any other order that surpasses it
and dateadd(day, o2.subscriptiondays, o2.orderdate) > dateadd(day, #orders.subscriptiondays, #orders.orderdate)
)
group by
userid
set @ModifiedRows = @@ROWCOUNT
-- perform an extra update here in case there are any additional orders that were made after the start date but before the specified @currentdate
update co set
co.SubscriptionDays = co.SubscriptionDays + #orders.subscriptiondays
from @CurrentOrders co
inner join #orders on
#orders.userid = co.UserId
and #orders.orderdate <= @currentdate
and #orders.orderdate >= co.StartDate
and dateadd(day, #orders.subscriptiondays, #orders.orderdate) < co.EndDate
-- Keep attempting to update rows as long as rows were updated on the previous attempt
while(@ModifiedRows > 0)
begin
update co set
SubscriptionDays = co.SubscriptionDays + overlap.subscriptiondays,
StartDate = overlap.orderdate
from @CurrentOrders co
-- join any overlapping orders
inner join (
select
#orders.userid,
sum(#orders.subscriptiondays) as subscriptiondays,
min(orderdate) as orderdate
from #orders
inner join @CurrentOrders co2 on
#orders.userid = co2.UserId
and #orders.orderdate < co2.StartDate
and dateadd(day, #orders.subscriptiondays, #orders.orderdate) > co2.StartDate
group by
#orders.userid
) overlap on
overlap.userid = co.UserId
set @ModifiedRows = @@ROWCOUNT
end
select
UserId,
sum(SubscriptionDays) as TotalSubscriptionDays,
min(StartDate),
sum(SubscriptionDays) - datediff(day, min(StartDate), @currentdate) as RemainingSubscriptionDays
from @CurrentOrders
group by
UserId
EDIT2: I've made some adjustments to the code above to address various special cases, such as if there just happen to be two orders for a user that both end on the same date.
For instance, changing the setup data to the following caused issues with the original code, which I've now corrected:
insert into #orders
select 1, 2, 10, '2011-01-01'
union
select 2, 1, 10, '2011-01-10'
union
select 3, 1, 10, '2011-01-15'
union
select 4, 2, 6, '2011-01-15'
union
select 5, 2, 4, '2011-01-17'
EDIT3: I've made some additional adjustments to address other special cases. In particular, the previous code ran into issues with the following setup data, which I've now corrected:
insert into #orders
select 1, 2, 10, '2011-01-01'
union
select 2, 1, 6, '2011-01-10'
union
select 3, 1, 10, '2011-01-15'
union
select 4, 2, 10, '2011-01-15'
union
select 5, 1, 4, '2011-01-12'
If my clarifying comment/question is correct, then you want to use DATEDIFF:
DATEDIFF(dd, orderdate, @currentdate)
My interpretation of the problem:
On day X, customer buys a “span” of subscription days (i.e. good for N days)
The span starts on the day of purchase and is good for X through day X + (N - 1)... but see below
If customer purchases a second span after the first expires (or any new span after all existing spans expire), repeat process. (A single 10-day purchase 30 days ago has no impact on a second purchase made today.)
If customer purchases a span while existing span(s) are still in effect, the new span applies to day immediately after end of current span(s) through that date + (N – 1)
This is iterative. If customer buys 10-day spans on Jan 1st, Jan 2nd, and Jan 3rd, it would look something like:
As of 1st: Jan 1 – Jan 10
As of 2nd: Jan 1 – Jan 10, Jan 11 – Jan 20 (in effect, Jan 1 to Jan 20)
As of 3rd: Jan 1 – Jan 10, Jan 11 – Jan 20, Jan 21 – Jan 30 (in effect, Jan 1 to Jan 30)
If this is indeed the problem, then it is a horrible problem to solve in T-SQL. To determine the “effective span” of a given purchase, you have to calculate the effective span of all prior purchases in the order that they were purchased, because of that overall cumulative effect. This is a trivial problem with 1 user and 3 rows, but non-trivial with thousands of users with dozens of purchases (which, presumably, is what you want).
I would solve it like so:
Add column EffectiveDate of datatype date to the table
Build a one-time process to walk through every row user-by-user and orderdate by orderdate, and calculate the EffectiveDate as discussed above
Modify the process used to insert the data to calculate the EffectiveDate at the time a new entry is made. Done this way, you’d only ever have to reference the most recent purchase made by that user.
Wrangle out subsequent issues regarding deleting (cancelled?) or updating (mis-set?) orders
I may be wrong, but I don't see any way to address this using set-based tactics. (Recursive CTEs and the like would work, but they can only recurse to so many levels, and we don't know the limit for this problem -- let alone how often you'll need to run it, or how well it must perform.) I'll watch and upvote anyone who solves this without recursion!
And of course this only applies if my understanding of the problem is correct. If not, please disregard.
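The interpretation above (each new span starts at the later of its order date and the end of the spans already in effect) can be sketched with a recursive CTE that walks each user's orders in date order. This is only a sketch under that interpretation: it uses a table variable @orders in place of #orders, treats the end date as exclusive, and relies on recursion, so it does not answer the no-recursion challenge. OPTION (MAXRECURSION 0) lifts the default limit of 100 recursion levels.

```sql
DECLARE @currentdate date = '2011-01-20';

-- Table variable standing in for #orders (an assumption to keep the sketch self-contained).
DECLARE @orders TABLE (orderid int, userid int, subscriptiondays int, orderdate date);
INSERT INTO @orders VALUES
    (1, 2, 10, '2011-01-01'),
    (2, 1, 10, '2011-01-10'),
    (3, 1, 10, '2011-01-15'),
    (4, 2, 10, '2011-01-15');

WITH numbered AS (
    -- Number each user's orders in purchase order, ignoring future orders.
    SELECT userid, subscriptiondays, orderdate,
           ROW_NUMBER() OVER (PARTITION BY userid ORDER BY orderdate, orderid) AS rn
    FROM @orders
    WHERE orderdate <= @currentdate
),
spans AS (
    -- First order per user: the span simply starts on the order date.
    SELECT userid, rn,
           DATEADD(day, subscriptiondays, orderdate) AS enddate
    FROM numbered
    WHERE rn = 1
    UNION ALL
    -- Later orders: start at the order date, or at the running end date if that is later.
    SELECT n.userid, n.rn,
           DATEADD(day, n.subscriptiondays,
                   CASE WHEN n.orderdate > s.enddate THEN n.orderdate ELSE s.enddate END)
    FROM spans AS s
    JOIN numbered AS n
      ON n.userid = s.userid
     AND n.rn = s.rn + 1
)
SELECT userid,
       CASE WHEN MAX(enddate) > @currentdate
            THEN DATEDIFF(day, @currentdate, MAX(enddate))
            ELSE 0
       END AS subscriptiondays
FROM spans
GROUP BY userid
OPTION (MAXRECURSION 0);
```

With the sample data this returns 10 for user 1 and 5 for user 2. Setting @currentdate to '2011-01-25' gives 5 and 0, and '2011-01-11' gives 9 and 0, matching the expected results in the question. A user with no orders on or before @currentdate does not appear in the output.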
In fact, we need to calculate the sum of subscriptiondays minus the days between the first subscription date and @currentdate, like:
select o.userid,
sum(o.subscriptiondays) -
DATEDIFF(dd,
(select min(a.orderdate)
from #orders as a
where a.userid = o.userid), @currentdate)
from #orders as o
where o.orderdate <= @currentdate
group by o.userid

Resources