I have two tables. Table A has a date field for every day of the year and a rate field for each of those days. Table B has a list of certain dates. How do I get the average of the rate field in Table A for the first and second dates in Table B, then the average rate for the second and third dates in Table B, and so on?
Below is my attempt thus far.
Select Tran_Date, avg(rate)
From [dbo].[10_Year_TBill_Rates] a
left join #cashwithrn b
on a.observation_date = b.Tran_Date
where Tran_Date is null
group by Tran_date
You have to remove the row: where Tran_Date IS NULL:
SELECT Tran_Date, ISNULL(AVG(rate), 0)
FROM [dbo].[10_Year_TBill_Rates] a
LEFT JOIN #cashwithrn b
ON a.observation_date = b.Tran_Date
GROUP BY Tran_date
Since you're only averaging two numbers at any time, it might be easier to do something like this:
; With CTE as (Select b.date
, a.rate
, row_number() over (order by b.date) as RN
from TableB b
left join TableA a
on b.date = a.date)
Select a.date as Date1
, b.date as Date2
, (a.rate + b.rate)/2 as AverageRate from CTE a
left join CTE b
on a.rn + 1 = b.rn
Note that the row_number is unnecessary if you already have sequentially assigned IDs in your table.
The first part retrieves the rate for each of the dates in B, the second part connects that rate with the rate from the row immediately following it, and averages them.
Depending on your version of SQL Server, you might be able to simplify this with lag or lead. If you are uncomfortable with CTEs, you can apply the same logic using a temp table instead.
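As a rough sketch of the lead variant mentioned above, the example below runs the idea against an in-memory SQLite database from Python, since SQLite's LEAD behaves like SQL Server's for this case. The sample rows are made up for illustration, and TableA/TableB are the placeholder names from the answer. Like the CTE version, it averages only the two endpoint rates, not every day in between.

```python
import sqlite3

# In-memory database with made-up sample data (assumption: one rate per day).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (date TEXT PRIMARY KEY, rate REAL);
CREATE TABLE TableB (date TEXT PRIMARY KEY);
INSERT INTO TableA VALUES
  ('2020-01-01', 1.0), ('2020-01-02', 2.0),
  ('2020-01-03', 4.0), ('2020-01-04', 8.0);
INSERT INTO TableB VALUES ('2020-01-01'), ('2020-01-03'), ('2020-01-04');
""")

# LEAD looks at the next row in date order, so no self-join on row numbers is needed.
query = """
SELECT b.date AS Date1,
       LEAD(b.date) OVER (ORDER BY b.date) AS Date2,
       (a.rate + LEAD(a.rate) OVER (ORDER BY b.date)) / 2.0 AS AverageRate
FROM TableB b
LEFT JOIN TableA a ON a.date = b.date
ORDER BY b.date
"""
pair_averages = conn.execute(query).fetchall()

for row in pair_averages:
    print(row)
```

The last date in TableB has no following date, so its Date2 and AverageRate come back as NULL (None in Python), just as the left join on rn + 1 does in the CTE version.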
I am trying to find the average time per month it takes for someone to complete a task, split into two groups: people who have a disability and people who don't.
I have a temp table named #Temp that holds the unique identifier for each person who has a disability. The join value Number is the unique identifier for each person.
The query currently looks like this:
DROP TABLE IF EXISTS #Temp
SELECT *
INTO #Temp
FROM [Table]
WHERE [Disability] = 'Y'
SELECT [MonthName]
, AVG(DATEDIFF(DAY, [DateStarted], [DateEnded])) AS [Average Length In Days For Completion For Disabled Users]
FROM TableName
LEFT JOIN #Temp AS T ON T.[Number] = [Number]
LEFT JOIN [Calendar] AS Cal ON Cal.[Date] = [DateStarted]
WHERE [DateStarted] >= '20220101'
AND T.[Disability] = 'Y'
GROUP BY [MonthName]
ORDER BY [MonthName]
SELECT [MonthName]
, AVG(DATEDIFF(DAY, [DateStarted], [DateEnded])) AS [Average Length In Days For Completion For Non-Disabled Users]
FROM TableName
LEFT JOIN [Calendar] AS Cal ON Cal.[Date] = [DateStarted]
WHERE [DateStarted] >= '20220101'
GROUP BY [MonthName]
ORDER BY [MonthName]
How can I merge both these queries together so that there is one record per month for each average? If I do a subquery, it returns 2 rows per month with the non-disability people having NULL records as I have to group it by disability.
Since avg ignores null values you can combine the two queries using conditional aggregation:
SELECT [MonthName]
, AVG( D.DAYS ) AS [Average Length In Days For Completion For All Users]
, AVG( CASE WHEN T.[Disability] = 'Y' THEN D.DAYS END ) AS [Average Length In Days For Completion For Disabled Users]
, AVG( CASE WHEN T.[Disability] <> 'Y' THEN D.DAYS END ) AS [Average Length In Days For Completion For Non-Disabled Users]
FROM TableName
LEFT JOIN #Temp AS T ON T.[Number] = [Number]
LEFT JOIN [Calendar] AS Cal ON Cal.[Date] = [DateStarted]
CROSS APPLY ( SELECT DATEDIFF( DAY, [DateStarted], [DateEnded] ) AS DAYS ) AS D
WHERE [DateStarted] >= '20220101'
GROUP BY [MonthName]
ORDER BY [MonthName]
The semantics of Disability were not provided by the OP so I have taken the liberty of making an uneducated guess that 'Y' and something else are present for all users, a fact belied by the use of left outer join. Some tweaking of the case conditions may be needed to make the logic correct, e.g. checking for T.[Disability] IS NULL OR T.[Disability] <> 'Y'.
Note: Best practice would be to use a table alias on each column reference. Since the OP declined to share DDL for the tables I have not attempted to add the aliases.
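To illustrate the conditional aggregation described above, here is a minimal sketch run against an in-memory SQLite database from Python. The table names Tasks and DisabledPeople and all sample rows are hypothetical. Because the lookup table holds only people with a disability, the non-disabled branch tests for a missing match (IS NULL), as suggested in the note on the left join. SQLite has no CROSS APPLY or DATEDIFF, so the day count is computed in a derived table with julianday instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tasks (Number INTEGER, DateStarted TEXT, DateEnded TEXT);
CREATE TABLE DisabledPeople (Number INTEGER PRIMARY KEY);
INSERT INTO DisabledPeople VALUES (1);
INSERT INTO Tasks VALUES
  (1, '2022-01-03', '2022-01-13'),  -- 10 days, person with a disability
  (2, '2022-01-05', '2022-01-09'),  --  4 days
  (3, '2022-01-10', '2022-01-16'),  --  6 days
  (1, '2022-02-01', '2022-02-09');  --  8 days, person with a disability
""")

# AVG skips NULLs, so a CASE without ELSE restricts each average to one group.
query = """
SELECT strftime('%Y-%m', t.DateStarted) AS Month,
       AVG(t.Days) AS AvgAll,
       AVG(CASE WHEN p.Number IS NOT NULL THEN t.Days END) AS AvgDisabled,
       AVG(CASE WHEN p.Number IS NULL THEN t.Days END) AS AvgNonDisabled
FROM (SELECT Number, DateStarted,
             julianday(DateEnded) - julianday(DateStarted) AS Days
      FROM Tasks) AS t
LEFT JOIN DisabledPeople AS p ON p.Number = t.Number
GROUP BY Month
ORDER BY Month
"""
monthly = conn.execute(query).fetchall()

for row in monthly:
    print(row)
```

Each month comes back as a single row. In the sample data February has no tasks from non-disabled people, so that column is NULL for February rather than producing a second row.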
I have been trying to filter one table by two dates with an order of importance (date2 > date1) as follows:
SELECT
t1.customer, t1.weights, max(t1.date1) as date1, t1.date2
FROM
(SELECT *
FROM table
WHERE CAST(date2 AS smalldatetime) = '10/29/2017') t2
INNER JOIN
table t1 ON t1.customer = t2.customer
AND t1.date2 = t2.date2
GROUP BY
t1.customer, t1.date2
ORDER BY
t1.customer;
It filters the table correctly by date2 first, but max(t1.date1) doesn't do what I want. I get duplicate customers that share the same (and correct) date2 but show different date1 values. These duplicate records have one thing in common: their weights values differ. What would I need to do to output just the customer records connected to the most recent date1, without taking other columns into consideration?
I am still a noob, so help would be greatly appreciated!
Solution for t-sql (all based on the accepted answer):
SELECT * FROM (
SELECT row_number() over(partition by t1.customer order by t1.date1 desc) as rownum, t1.customer, t1.weights, t1.date1 , t1.date2
FROM
(SELECT *
FROM table
WHERE CAST(date2 AS smalldatetime) = '10/29/2017') t2
INNER JOIN
table t1 ON t1.customer = t2.customer
AND t1.date2 = t2.date2
)t3
where rownum = 1;
If I understood correctly, then instead of GROUP BY logic I would just use a QUALIFY clause with row_number(). Note that QUALIFY exists in Teradata and a few other databases, but not in SQL Server. :)
Try the code below and tell me if it's what you needed. It brings back only one row per customer, choosing the row by sorting on the dates in ascending order. However, I'm unclear on what you mean by the importance of the two dates, so I may be completely off base here. Can you please give an example of input and desired output?
SELECT t1.customer, t1.weights, t1.date1, t1.date2
FROM
(
Select *
FROM table
WHERE Cast(date2 as smalldatetime)='10/29/2017'
) t2
Inner Join table t1
ON t1.customer = t2.customer
AND t1.date2 = t2.date2
Qualify row_number() over(partition by t1.customer order by date2 , date1)=1
Order By t1.customer;
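Where QUALIFY is not available, the same "one row per customer" filter can be written with row_number() in a derived table, as in the T-SQL solution earlier in this thread. The sketch below runs that pattern against an in-memory SQLite database from Python. The table name shipments and the sample rows are hypothetical, and the self-join from the question is left out on the assumption that filtering on date2 directly is equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipments (customer TEXT, weights INTEGER, date1 TEXT, date2 TEXT);
INSERT INTO shipments VALUES
  ('A', 10, '2017-10-01', '2017-10-29'),
  ('A', 12, '2017-10-15', '2017-10-29'),  -- most recent date1 for A on this date2
  ('A',  9, '2017-10-20', '2017-10-22'),  -- different date2, filtered out
  ('B',  7, '2017-10-05', '2017-10-29');
""")

# Number each customer's rows from newest date1 to oldest, then keep row 1.
query = """
SELECT customer, weights, date1, date2
FROM (
    SELECT customer, weights, date1, date2,
           ROW_NUMBER() OVER (PARTITION BY customer ORDER BY date1 DESC) AS rownum
    FROM shipments
    WHERE date2 = '2017-10-29'
) AS ranked
WHERE rownum = 1
ORDER BY customer
"""
latest_per_customer = conn.execute(query).fetchall()

for row in latest_per_customer:
    print(row)
```

Because weights is not part of the PARTITION BY, rows that differ only in weights no longer produce duplicate customers; the weights value simply comes from whichever row has the latest date1.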
select D.[Date], E.emp_name, E.emp_jde, count(C.[agent_no]) calls, count(S.[EMPJDENUM]) sales
from
(select cast([start_date] as date) dte, [agent_no]
from call_table
where [skill_name] like '%5700 sales l%'
and [Agent_Time] != '0'
) C
full outer join
(select [AC#DTE_dt], [EMPJDENUM]
from sales_table
where [ICGCD2] in ('LAWN', 'HORT')
and [CHANNEL]= 'INQ'
and [ITMQTY]>3
) S on c.dte=s.[AC#DTE_dt]
right join
(select [Date]
from Date_table
) D on c.dte=d.[Date] or s.[AC#DTE_dt]=d.[Date]
right join
(select [emp_name], [emp_jde], [agent_no]
from Employee_table
) E on C.[agent_no]=E.agent_no and S.[EMPJDENUM]=E.emp_jde
group by D.[Date], E.emp_name, E.emp_jde
Date Tables -
Note: Not all dates will have both calls and sales.
Additional Tables -
What needs to be accomplished -
1) Join and Aggregate calls and sales by Employee by joining the calls table (on agent_no) and sales (on JDE) table
2) Since not all dates will include both calls and sales - use the date dimension table to ensure all dates are represented
The desired result would look like this -
The query I wrote executes, but it takes so long that I just end up canceling it.
Any help would be appreciated.
Without seeing the query plan, it is a little tricky, but here are a couple of suggestions that might improve the performance:
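1) Remove the leading wildcard, i.e. where [skill_name] like '5700 sales l%' (this assumes every skill name you want starts with that text; a leading wildcard generally prevents an index seek)
2) Put the group by into the subqueries, so the rows are aggregated before they are joined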
I have an example here that implements both of those. (Note that I did some reformatting just to try to understand what your query was doing.)
select D.[Date], E.emp_name, E.emp_jde, C.Calls, S.Sales
from Date_table As D
Left Join (
select cast([start_date] as date) As CallDate, [agent_no], Count(*) As Calls
from call_table
where [skill_name] like '5700 sales l%'
and [Agent_Time] != '0'
Group By Cast([start_date] As date), [agent_no]) As C On D.[Date] = C.CallDate
Left Join (
select [AC#DTE_dt] As SaleDate, [EMPJDENUM], Count(*) As Sales
from sales_table
where [ICGCD2] in ('LAWN', 'HORT')
and [CHANNEL]= 'INQ'
and [ITMQTY]>3
Group By [AC#DTE_dt], [EMPJDENUM]) As S on D.[Date] = s.SaleDate
right join Employee_table As E
on C.[agent_no]=E.agent_no
and S.[EMPJDENUM]=E.emp_jde;
Edit
In order to get a row for each possible combination of date and employee, you will need a cross join of the date table and the employee table.
select D.[Date], E.emp_name, E.emp_jde, C.Calls, S.Sales
from Date_table As D
Cross Join Employee_table as E
Left Join (
select cast([start_date] as date) As CallDate, [agent_no], Count(*) As Calls
from call_table
where [skill_name] like '5700 sales l%'
and [Agent_Time] != '0'
Group By Cast([start_date] As date), [agent_no]) As C
On D.[Date] = C.CallDate
And E.agent_no = C.agent_no
Left Join (
select [AC#DTE_dt] As SaleDate, [EMPJDENUM], Count(*) As Sales
from sales_table
where [ICGCD2] in ('LAWN', 'HORT')
and [CHANNEL]= 'INQ'
and [ITMQTY]>3
Group By [AC#DTE_dt], [EMPJDENUM]) As S
on D.[Date] = s.SaleDate
and E.emp_jde = S.[EMPJDENUM];
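The shape of that query (cross join the two dimension tables, then left join each pre-aggregated fact table) can be tried out on a small scale. The sketch below uses an in-memory SQLite database from Python with heavily simplified, made-up versions of the tables; the sale_date and emp_jde column names in sales_table stand in for [AC#DTE_dt] and [EMPJDENUM], and the filters on skill name, channel and quantity are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Date_table (Date TEXT PRIMARY KEY);
CREATE TABLE Employee_table (emp_name TEXT, emp_jde INTEGER, agent_no INTEGER);
CREATE TABLE call_table (start_date TEXT, agent_no INTEGER);
CREATE TABLE sales_table (sale_date TEXT, emp_jde INTEGER);
INSERT INTO Date_table VALUES ('2023-01-01'), ('2023-01-02');
INSERT INTO Employee_table VALUES ('Ann', 100, 1), ('Bob', 200, 2);
INSERT INTO call_table VALUES ('2023-01-01 09:00:00', 1), ('2023-01-01 10:30:00', 1);
INSERT INTO sales_table VALUES ('2023-01-02', 200);
""")

# CROSS JOIN yields every (date, employee) pair; the LEFT JOINs attach counts
# that were already grouped per day and per employee inside the subqueries.
query = """
SELECT D.Date, E.emp_name, C.Calls, S.Sales
FROM Date_table AS D
CROSS JOIN Employee_table AS E
LEFT JOIN (
    SELECT date(start_date) AS CallDate, agent_no, COUNT(*) AS Calls
    FROM call_table
    GROUP BY date(start_date), agent_no
) AS C ON C.CallDate = D.Date AND C.agent_no = E.agent_no
LEFT JOIN (
    SELECT sale_date AS SaleDate, emp_jde, COUNT(*) AS Sales
    FROM sales_table
    GROUP BY sale_date, emp_jde
) AS S ON S.SaleDate = D.Date AND S.emp_jde = E.emp_jde
ORDER BY D.Date, E.emp_name
"""
calls_and_sales = conn.execute(query).fetchall()

for row in calls_and_sales:
    print(row)
```

With two dates and two employees the result has four rows, and days without calls or sales show NULL counts. Wrapping the counts in COALESCE(..., 0) would turn those into zeros if that reads better in the report.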
I need some help structuring a query to only pull back recurrences that are after a set number of days, in my case 30.
My table structure is as follows:
PatientID Date
1 2015-09-01
1 2015-09-03
2 2015-03-04
2 2015-03-07
2 2015-09-15
In this example, I only want to return rows 1, 3, and 5.
I tried doing a left join of the table on itself, where the date in the second copy is > DATEADD(D,30,Date).
My other thought was a recursive CTE: the first query pulls the min date for each patient, then a union adds rows where the table date is at least 30 days greater than the max of each patient's CTE date. But you can't have a max in the join statement.
I'm pretty stumped. Any advice would be greatly appreciated.
This is how I would do it:
SELECT * FROM MyTable t1
WHERE NOT EXISTS(
SELECT * FROM MyTable t2
WHERE t1.PatientId=t2.PatientId
AND t2.Date < t1.Date
AND DATEDIFF(dd, t2.Date, t1.Date) < 30
)
ORDER BY t1.PatientId, t1.Date ASC
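Here is a small sketch of the NOT EXISTS approach above, run against an in-memory SQLite database from Python with the sample rows from the question. The table name visits is hypothetical, and julianday subtraction stands in for DATEDIFF, which SQLite lacks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visits (PatientID INTEGER, Date TEXT);
INSERT INTO visits VALUES
  (1, '2015-09-01'),
  (1, '2015-09-03'),
  (2, '2015-03-04'),
  (2, '2015-03-07'),
  (2, '2015-09-15');
""")

# Keep a row only if the same patient has no earlier row within the previous 30 days.
query = """
SELECT t1.PatientID, t1.Date
FROM visits t1
WHERE NOT EXISTS (
    SELECT 1
    FROM visits t2
    WHERE t2.PatientID = t1.PatientID
      AND t2.Date < t1.Date
      AND julianday(t1.Date) - julianday(t2.Date) < 30
)
ORDER BY t1.PatientID, t1.Date
"""
recurrences = conn.execute(query).fetchall()

for row in recurrences:
    print(row)
```

This returns rows 1, 3 and 5 of the sample data. One design point to be aware of: each row is compared with every earlier row, including rows that were themselves excluded. For visits on day 0, day 20 and day 40, day 40 is dropped because day 20 is within 30 days of it. If the 30 days should be counted from the last returned row instead, a different approach would be needed.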
I think something like this should work (notepad coding here, so the syntax may be a little off)
WITH CTE AS (
SELECT PatientId, Min(Date) as Date
FROM MyTable
Group BY PatientId)
SELECT A.*
FROM MyTable A
LEFT OUTER JOIN CTE CTE
ON A.PatientId = CTE.PatientId
AND (A.Date = CTE.Date OR A.Date > DATEAdd(dd, 30, CTE.Date))
WHERE CTE.PatientId IS NOT NULL
The objective is below the list of tables.
Tables:
Table: Job
JobID
CustomerID
Value
Year
Table: Customer
CustomerID
CustName
Table: Invoice
SaleAmount
CustomerID
The Objective
Part 1: (easy) I need to select all invoice records and sort by Customer (to play nicely with Crystal Reports)
Select * from Invoice as A inner join Customer as B on A.CustomerID = B.CustomerID
Part 2: (hard) Now, we need to add two fields:
JobID associated with that customer's job that has the Maximum Value (from 2008)
Value associated with that job
Pseudo Code
Select * from
Invoice as A
inner join Customer as B on A.CustomerID = B.CustomerID
inner join
(select JobID, Value from Jobs where Job:JobID has the highest value out of all of THIS customer's jobs from 2008)
General Thoughts
This is fairly easy to do If I am only dealing with one specific customer:
select max(JobID) as MaxJobID, max(Value) as MaxValue from Jobs where Value = (select max(Value) from Jobs where CustomerID = @SpecificCustID and Year = '2008') and CustomerID = @SpecificCustID and Year = '2008'
This subquery determines the max Value for this customer in 2008, and then it's a matter of choosing a single job (can't have dupes) out of potentially multiple jobs from 2008 for that customer that have the same value.
The Difficulty
What happens when we don't have a specific customer ID to compare against? If my goal is to select ALL invoice records and sort by customer, then this subquery needs access to which customer it is currently dealing with. I suppose this can "sort of" be done through the ON clause of the JOIN, but that doesn't really seem to work because the sub-sub query has no access to that.
I'm clearly over my head. Any thoughts?
How about using a CTE? Obviously, I can't test, but here is the idea. You need to replace col1, col2, ..., coln with the stuff you want to select.
WITH Inv( col1, col2, ... coln, RowNumber)
AS
(
SELECT col1, col2, ... coln,
ROW_NUMBER() OVER (PARTITION BY A.CustomerID
ORDER BY A.Value DESC) AS [RowNumber]
FROM Jobs A INNER JOIN Customer B ON A.CustomerID = B.CustomerID
WHERE A.CustomerID = @CustomerID
AND A.Year = @Year
)
SELECT * FROM Inv WHERE RowNumber = 1
If you drop the CustomerID filter, this will return the top-valued row for each customer (though that will hurt performance).
The row_number() function can give you what you need:
Select A.*, B.*, C.JobID, C.Value
from
Invoice as A
inner join Customer as B on A.CustomerID = B.CustomerID
inner join (
select JobID, Value, CustomerID,
ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY Value DESC) AS Ordinal
from Jobs
WHERE Year = 2008
) AS C ON (A.CustomerID = C.customerID AND C.Ordinal = 1)
The ROW_NUMBER() function in this query numbers the rows by Value in descending order, and the PARTITION BY clause restarts the numbering for each distinct CustomerID. This means that the job with the highest Value for each customer always gets Ordinal = 1, so we can join on that.
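As a quick check of that query shape, the sketch below runs it against an in-memory SQLite database from Python. The sample rows are invented, and the extra JobID DESC in the ORDER BY is an addition of this sketch (not part of the answer above) so that ties on Value resolve deterministically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, CustName TEXT);
CREATE TABLE Invoice (SaleAmount INTEGER, CustomerID INTEGER);
CREATE TABLE Jobs (JobID INTEGER PRIMARY KEY, CustomerID INTEGER, Value INTEGER, Year INTEGER);
INSERT INTO Customer VALUES (1, 'Acme'), (2, 'Beta');
INSERT INTO Invoice VALUES (50, 1), (75, 1), (20, 2);
INSERT INTO Jobs VALUES
  (10, 1,  500, 2008),
  (11, 1,  900, 2008),  -- top 2008 job for customer 1
  (12, 1, 2000, 2007),  -- larger, but not from 2008
  (20, 2,  300, 2008);
""")

# Rank each customer's 2008 jobs by Value and join only the top-ranked job.
query = """
SELECT B.CustomerID, B.CustName, A.SaleAmount, C.JobID, C.Value
FROM Invoice AS A
INNER JOIN Customer AS B ON A.CustomerID = B.CustomerID
INNER JOIN (
    SELECT JobID, Value, CustomerID,
           ROW_NUMBER() OVER (PARTITION BY CustomerID
                              ORDER BY Value DESC, JobID DESC) AS Ordinal
    FROM Jobs
    WHERE Year = 2008
) AS C ON A.CustomerID = C.CustomerID AND C.Ordinal = 1
ORDER BY B.CustomerID, A.SaleAmount
"""
invoices_with_top_job = conn.execute(query).fetchall()

for row in invoices_with_top_job:
    print(row)
```

Every invoice row is kept and carries the same top job for its customer. Note that the inner join drops invoices for customers with no 2008 jobs; a left join would keep them with NULL job columns.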
The OVER clause is an awesome but often neglected feature. You can use it in a subquery to pull back your valid jobs, like so:
select
a.*,
c.jobid,
c.maxVal
from
invoice a
inner join customer b on
a.customerid = b.customerid
inner join (select customerid, max(jobid) as jobid, maxVal from
(select customerid,
jobid,
value,
max(value) over (partition by customerid) as maxVal
from jobs
where Year = '2008') s
where s.value = s.maxVal
group by customerid, maxVal) c on
b.customerid = c.customerid
Essentially, that first inner query looks like this:
select
customerid,
jobid,
value,
max(value) over (partition by customerid) as maxVal
from jobs
where Year = '2008'
You'll see that this pulls back all of the jobs, but with that additional column which lets you know what the maximum value is for each customer. With the next subquery, we keep only the rows where value equals maxVal. Additionally, it finds the max JobID based on customerid and maxVal, because we need to pull back one and only one JobID (as per the requirements).
Now, you have a complete listing of CustomerID and JobID that meet the conditions of having the highest JobID that contains the maximum Value for that CustomerID in a given year. All that's left is to join it to Invoice and Customer, and you're good to go.
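To make the two inner steps concrete, the sketch below runs just the job-selection part against an in-memory SQLite database from Python. The sample rows are invented and include a tie on Value, so the max(jobid) step has something to resolve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Jobs (JobID INTEGER PRIMARY KEY, CustomerID INTEGER, Value INTEGER, Year INTEGER);
INSERT INTO Jobs VALUES
  (10, 1, 900, 2008),
  (11, 1, 900, 2008),  -- ties with job 10 on Value
  (12, 1, 400, 2008),
  (20, 2, 300, 2008),
  (21, 2, 999, 2007);  -- not from 2008, ignored
""")

# Step 1 (inner): tag every 2008 job with its customer's maximum Value.
# Step 2 (outer): keep jobs at that maximum and pick the highest JobID among ties.
query = """
SELECT customerid, MAX(jobid) AS jobid, maxVal
FROM (
    SELECT customerid, jobid, value,
           MAX(value) OVER (PARTITION BY customerid) AS maxVal
    FROM Jobs
    WHERE Year = 2008
) AS s
WHERE s.value = s.maxVal
GROUP BY customerid, maxVal
ORDER BY customerid
"""
top_jobs = conn.execute(query).fetchall()

for row in top_jobs:
    print(row)
```

Customer 1 has two jobs at the maximum value of 900, and the grouping step collapses them to the one with the higher JobID, leaving exactly one row per customer to join against.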
Just to be complete, here is the non-row_number solution for those on versions before MSSQL 2005. Personally, I find it easier to follow myself, but that could be biased considering how much time I spend in MSSQL 2000 vs 2005+.
SELECT *
FROM Invoice as A
INNER JOIN Customer as B ON
A.CustomerID = B.CustomerID
INNER JOIN (
SELECT
Jobs.CustomerId,
--MAX in case dupe Values.
--If UC on CustomerId, Value (or CustomerId, Year, Value) then not needed
MAX(JobId) as JobId
FROM Jobs
JOIN (
SELECT
CustomerId,
MAX(Value) as MaxValue
FROM Jobs
WHERE Year = 2008
GROUP BY
CustomerId
) as MaxValue ON
Jobs.CustomerId = MaxValue.CustomerId
AND Jobs.Value = MaxValue.MaxValue
WHERE Year = 2008
GROUP BY
Jobs.CustomerId
) as C ON
B.CustomerID = C.CustomerID