I have two tables Person and Salary.
Person:
PersonId | Name | Surname
--------------------------------
1 John Deer
2 Mark Bear
Salary:
SId | PersonId | Date | Salary
----------------------------------------------------
1 2 2013-01-01 00:00:00.000 100
2 2 2012-01-01 00:00:00.000 90
3 2 2011-01-01 00:00:00.000 80
What I am trying to do is this: if a person has salary records, the result should show the most recent salary info; if the person has no salary record, the salary columns should be NULL. Like this:
Result
------------------------------------------------------------------------
PersonId | Name | Surname | Date | Salary
1 John Deer NULL NULL
2 Mark Bear 2013-01-01 00:00:00.000 100
I know it has to be something like this, but I couldn't get it to work:
SELECT
P.PersonId, P.Name, P.Surname, SL.Date, SL.Salary
FROM
PERSON P
LEFT OUTER JOIN
(SELECT TOP 1 S.PersonId, S.Date, S.Salary
FROM Salary
WHERE S.PersonId = P.PersonId ORDER BY Date DESC) SL
I would start by numbering each person's salaries by date with a CTE and the ROW_NUMBER() function. Ordering by Date descending within each PersonId partition puts the most recent salary in row number 1, which we can then pick out by keeping only RowNum = 1. After that, it becomes a simple LEFT JOIN from Person to the aliased CTE:
WITH RankedSalaries AS
(
SELECT
PersonId
,Date
,Salary
,ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY Date DESC) AS RowNum
FROM
Salary
)
SELECT
p.PersonId
,p.Name
,p.Surname
,s.Date
,s.Salary
FROM
Person p
LEFT JOIN
RankedSalaries s
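ON
p.PersonId = s.PersonId
AND s.RowNum = 1
Note that the RowNum = 1 filter belongs in the join condition: in a WHERE clause it would discard the rows where s.RowNum is NULL, that is, every person without a salary record, and the LEFT JOIN would behave like an inner join.
Alternatively, you could take the contents of the CTE and move it between the parentheses of the query you started (i.e. LEFT JOIN (<CTE query>)). Just remember to keep the RowNum = 1 condition in the ON clause.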
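To make the idea easy to try without a SQL Server instance, here is a minimal sketch that runs the same ROW_NUMBER() pattern against an in-memory SQLite database from Python. SQLite (3.25 or later, for window functions) is only a stand-in here, the function name latest_salary_per_person is made up for the illustration, and the dates are shortened to ISO strings; the query itself keeps the table and column names from the question.

```python
import sqlite3


def latest_salary_per_person():
    """Return one row per person with the latest salary, or NULLs if none."""
    conn = sqlite3.connect(":memory:")
    # Sample data from the question (dates shortened to ISO strings).
    conn.executescript("""
        CREATE TABLE Person (PersonId INTEGER PRIMARY KEY, Name TEXT, Surname TEXT);
        CREATE TABLE Salary (SId INTEGER PRIMARY KEY, PersonId INTEGER,
                             Date TEXT, Salary INTEGER);
        INSERT INTO Person VALUES (1, 'John', 'Deer'), (2, 'Mark', 'Bear');
        INSERT INTO Salary VALUES
            (1, 2, '2013-01-01', 100),
            (2, 2, '2012-01-01', 90),
            (3, 2, '2011-01-01', 80);
    """)
    rows = conn.execute("""
        WITH RankedSalaries AS (
            SELECT PersonId, Date, Salary,
                   ROW_NUMBER() OVER (PARTITION BY PersonId
                                      ORDER BY Date DESC) AS RowNum
            FROM Salary
        )
        SELECT p.PersonId, p.Name, p.Surname, s.Date, s.Salary
        FROM Person p
        LEFT JOIN RankedSalaries s
               ON p.PersonId = s.PersonId
              AND s.RowNum = 1          -- filter in ON keeps unmatched persons
        ORDER BY p.PersonId
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in latest_salary_per_person():
        print(row)
```

John Deer comes back with NULL (None in Python) for Date and Salary because the RowNum filter sits in the ON clause rather than in WHERE.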
Related
I have a table that contains employee bank data
Employee |Bank |Date |Delta
---------------------------------------------------
Smith |Vacation |2023-01-01 |15.0
Smith |Vacation |2023-01-02 |Null
Smith |Vacation |2023-01-03 |Null
Smith |Vacation |2023-01-04 |7.5
I would like to write a statement that updates 2023-01-02 and 2023-01-03 with the Delta value from January 1. Essentially, for each row with a NULL Delta, I want to use the value from the most recent earlier row that has one.
Once complete, I want the table to look like this:
Employee |Bank |Date |Delta
---------------------------------------------------
Smith |Vacation |2023-01-01 |15.0
Smith |Vacation |2023-01-02 |15.0
Smith |Vacation |2023-01-03 |15.0
Smith |Vacation |2023-01-04 |7.5
The source table has a unique index consisting of Employee, Bank and Date descending. There could be up to 2 billion rows in the table.
I currently update the table with the following, but I am wondering if there is a more efficient way to do so?
WITH cte_date
AS (SELECT dd.date_key,
db.balance_key,
feb.employee_key
FROM shared.dim_date dd
CROSS JOIN
(
SELECT DISTINCT
employee_key
FROM wfms.fact_employee_balance
) feb
CROSS JOIN wfms.dim_balance db
WHERE dd.date BETWEEN DATEFROMPARTS(DATEPART(YY, GETDATE()) - 2, 12, 31) AND GETDATE())
SELECT dd.*,
t.delta
INTO wfms.test2
FROM cte_date dd
LEFT JOIN wfms.test1 t ON dd.balance_key = t.balance_key
AND dd.employee_key = t.employee_key
AND t.date_key = (SELECT TOP 1 tt1.date_key
FROM wfms.test1 tt1
WHERE tt1.balance_key = t.balance_key
AND tt1.employee_key = t.employee_key
AND tt1.date_key < dd.date_key);
Just for fun, I wanted to test an idea.
For the moment, let's assume the gaps are not too wide; in this example, at most 7 days.
In terms of query cost relative to the batch, the lag() over() approach was 22% while the Cross Apply was 78%.
Again, just for fun:
Select Employee
,Bank
,Date
,Delta = coalesce(A.Delta
,lag(Delta,1) over (partition by Employee,Bank order by date)
,lag(Delta,2) over (partition by Employee,Bank order by date)
,lag(Delta,3) over (partition by Employee,Bank order by date)
,lag(Delta,4) over (partition by Employee,Bank order by date)
,lag(Delta,5) over (partition by Employee,Bank order by date)
,lag(Delta,6) over (partition by Employee,Bank order by date)
,lag(Delta,7) over (partition by Employee,Bank order by date)
)
From YourTable A
Versus
Select Employee
,Bank
,Date
,Delta = coalesce(A.Delta,B.Delta)
From YourTable A
Cross Apply ( Select top 1 Delta
From YourTable
Where Employee=A.Employee
and A.Bank = Bank
and Delta is not null
and A.Date>=Date
Order By Date desc
) B
Update
Same results with 20 days
Here is another way. Use sum() as a window function to build a running count of the non-null Delta values; this assigns each non-null row and the null rows that follow it to the same group, Grp. Then max(Delta) over each Grp returns the group's single non-null value.
select Employee, Bank, [Date], max (max(Delta))
over (partition by Employee, Bank, Grp) as Delta
from
(
select *, Grp = sum (case when Delta is not null then 1 else 0 end)
over (partition by Employee,Bank
order by [Date])
from YourTable
) t
group by Employee, Bank, [Date], Grp
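The grouping trick above can be checked on the four sample rows with a small sketch. It runs in SQLite through Python as a stand-in for SQL Server, the function name fill_forward_delta is hypothetical, and it assumes (Employee, Bank, Date) is unique as the question states, which is why the outer GROUP BY is dropped and a plain windowed MAX is enough.

```python
import sqlite3


def fill_forward_delta():
    """Fill NULL Delta values with the most recent earlier non-null value."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE YourTable (Employee TEXT, Bank TEXT, Date TEXT, Delta REAL);
        INSERT INTO YourTable VALUES
            ('Smith', 'Vacation', '2023-01-01', 15.0),
            ('Smith', 'Vacation', '2023-01-02', NULL),
            ('Smith', 'Vacation', '2023-01-03', NULL),
            ('Smith', 'Vacation', '2023-01-04', 7.5);
    """)
    rows = conn.execute("""
        SELECT Employee, Bank, Date,
               -- each Grp holds exactly one non-null Delta, so MAX picks it
               MAX(Delta) OVER (PARTITION BY Employee, Bank, Grp) AS Delta
        FROM (
            SELECT *,
                   -- running count of non-null rows: increases by one at
                   -- every row that carries a value, stays flat on NULLs
                   SUM(CASE WHEN Delta IS NOT NULL THEN 1 ELSE 0 END)
                       OVER (PARTITION BY Employee, Bank ORDER BY Date) AS Grp
            FROM YourTable
        ) t
        ORDER BY Employee, Bank, Date
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in fill_forward_delta():
        print(row)
```

Rows that precede the first non-null value of a partition land in group 0 and stay NULL, which seems the sensible outcome when there is nothing to carry forward.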
I'm trying to get the result of 2 queries into a single result set. I'm using SQL Server 2019 Express.
Here is the data I'm working with:
Table Sales
SaleDate | SaleAmt | CustomerID
--------------------------------
11/1/2021 500 123
11/1/2021 100 234
11/1/2021 300 345
11/2/2021 500 456
11/2/2021 100 567
11/2/2021 200 678
Table Customers
CustomerID | CustomerName
--------------------------
123 Jon Doe
234 Jane Doe
456 Bob Doe
678 Jim Doe
Query #1:
select sales.saledate, sum(sales.saleamt) as 'Total Sales from All'
from Sales
group by sales.saledate
Query #2:
select sales.saledate, sum(sales.saleamt) as 'Total Sales from Customers'
from Sales
where sales.customerid in (select customerid from customers)
group by sales.saledate
This is my desired result:
SaleDate | Total Sales from All | Total Sales from Customers
------------------------------------------------------------
11/1/2021 900 600
11/2/2021 800 700
You can join the two aggregated queries on the sale date:
select s1.saledate, All_Total AS 'Total Sales from All', CustomersTotal as 'Total Sales from Customers'
from (
select sales.saledate, sum(sales.saleamt) as All_Total
from Sales
group by sales.saledate
) s1
inner join
(
select sales.saledate, sum(sales.saleamt) as CustomersTotal
from Sales
where sales.customerid in (select customerid from customers)
group by sales.saledate
) s2 on s1.saledate = s2.saledate
You can combine both totals in a single query using a CASE expression:
select s.saledate,
sum(s.saleamt) as [Total Sales from All],
sum(case when exists
(
select *
from customers c
where c.customerid = s.customerid
)
then s.saleamt
end) as [Total Sales from Customers]
from Sales s
group by s.saledate
You can use a LEFT JOIN with conditional aggregation
select
s.saledate,
sum(s.saleamt) as [Total Sales from All],
sum(case when c.customerid is not null then s.saleamt end) as [Total Sales from Customers]
from Sales s
left join customers c on s.customerid = c.customerid
group by s.saledate;
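As a quick check of the conditional aggregation, here is a minimal sketch that loads the sample rows into an in-memory SQLite database from Python and runs the same LEFT JOIN query. SQLite is only a stand-in for SQL Server 2019 Express, the function name sales_totals_by_date and the shortened column aliases are made up for the example, and the dates are rewritten as ISO strings.

```python
import sqlite3


def sales_totals_by_date():
    """Return (SaleDate, total of all sales, total of sales to known customers)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Sales (SaleDate TEXT, SaleAmt INTEGER, CustomerID INTEGER);
        CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
        INSERT INTO Sales VALUES
            ('2021-11-01', 500, 123), ('2021-11-01', 100, 234),
            ('2021-11-01', 300, 345), ('2021-11-02', 500, 456),
            ('2021-11-02', 100, 567), ('2021-11-02', 200, 678);
        INSERT INTO Customers VALUES
            (123, 'Jon Doe'), (234, 'Jane Doe'), (456, 'Bob Doe'), (678, 'Jim Doe');
    """)
    rows = conn.execute("""
        SELECT s.SaleDate,
               SUM(s.SaleAmt) AS TotalAll,
               -- CASE yields NULL for unmatched customers; SUM ignores NULLs
               SUM(CASE WHEN c.CustomerID IS NOT NULL
                        THEN s.SaleAmt END) AS TotalCustomers
        FROM Sales s
        LEFT JOIN Customers c ON s.CustomerID = c.CustomerID
        GROUP BY s.SaleDate
        ORDER BY s.SaleDate
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in sales_totals_by_date():
        print(row)
```

Because CustomerID is unique in Customers, the LEFT JOIN cannot duplicate Sales rows, so the first total is not inflated by the join.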
I want to check each ID across consecutive months: if the same ID is present in two consecutive months, count that ID only in the first month.
If the months are not consecutive, show the distinct IDs grouped by start date month. (Only the start date is considered.)
For example, ID 1 is present in start date months January and February, so it is counted once, in January. However, IDs 2 and 3 are present in January and March, and in February and May, respectively, so I would like ID 2 counted in both January and March, and ID 3 in both February and May.
Current Data
Table1:
ID StartDate EndDate
1 2017-01-12 2017-01-28
1 2017-01-19 2017-01-28
1 2017-01-29 2017-02-11
1 2017-02-01 2017-02-11
1 2017-02-19 2017-02-24
2 2017-01-12 2017-01-28
2 2017-01-19 2017-01-28
2 2017-03-09 2017-03-20
3 2017-02-12 2017-02-28
3 2017-02-19 2017-02-28
3 2017-05-05 2017-05-29
3 2017-05-09 2017-05-29
I tried the logic below, but I know I am missing something here.
select t.* from Table1 t
join Table1 t t1
on t1.ID=t.ID
and datepart(mm,t.StartDate)<> datepart(mm,t1.StartDate)+1
Expected Result:
DistinctCount StartDateMonth(In Numbers)
1 1(Jan)
2 1(Jan)
2 3(March)
3 2(Feb)
3 5(May)
Any help is appreciated!
Here's my solution. The thinking for this is:
1) Round all the dates to the first of the month, then work with the distinct dataset of (ID, StartDateRounded). From your dataset, the result should look like this:
ID StartDateRounded
1 2017-01-01
1 2017-02-01
2 2017-01-01
2 2017-03-01
3 2017-02-01
3 2017-05-01
2) From this consolidated dataset, find all records by ID that do not have a record for the previous month (which means it's not a consecutive month and thus is a beginning of a new data point). This is your final dataset
with DatesTable AS
(
SELECT DISTINCT ID
,DATEADD(month,DateDiff(month,0,StartDate),0) StartDateRounded
,DATEADD(month,DateDiff(month,0,StartDate)+1,0) StartDateRoundedPlusOne
FROM Table1
)
SELECT t1.ID, DatePart(month,t1.StartDateRounded) AS StartDateMonth
FROM DatesTable t1
LEFT JOIN DatesTable t2
ON t1.ID = t2.ID
AND t1.StartDateRounded = t2.StartDateRoundedPlusOne
WHERE t2.ID IS NULL; --Verify no record exists for prior month
sqlfiddler for reference. Let me know if this helps
You just need to use LAG in the inner query to compare values between rows, apply the logic in question in the middle query, and then do a final select.
/*SAMPLE DATA*/
create table #table1
(
ID int not null
, StartDate date not null
, EndDate date null
)
insert into #table1
values (1, '2017-01-12', '2017-01-28')
, (1, '2017-01-19', '2017-01-28')
, (1, '2017-01-29', '2017-02-11')
, (1, '2017-02-01', '2017-02-11')
, (1, '2017-02-19', '2017-02-24')
, (2, '2017-01-12', '2017-01-28')
, (2, '2017-01-19', '2017-01-28')
, (2, '2017-03-09', '2017-03-20')
, (3, '2017-02-12', '2017-02-28')
, (3, '2017-02-19', '2017-02-28')
, (3, '2017-05-05', '2017-05-29')
, (3, '2017-05-09', '2017-05-29')
/*ANSWER*/
--Final Select
select c.ID
, c.StartDateMonth
from (
--Compare record values to rule a record in/out based on OP's logic
select b.ID
, b.StartDateMonth
, case when b.StartDateMonth = b.StartDateMonthPrev then 0 --still the same month?
when b.StartDateMonth = b.StartDateMonthPrev + 1 then 0 --immediately prior month?
when b.StartDateMonth = 1 and b.StartDateMonthPrev = 12 then 0 --Dec/Jan combo
else 1
end as IncludeFlag
from (
--pull StartDateMonth of previous record into current record
select a.ID
, datepart(mm, a.StartDate) as StartDateMonth
, lag(datepart(mm, a.StartDate), 1, NULL) over (partition by a.ID order by a.StartDate asc) as StartDateMonthPrev
from #table1 as a
) as b
) as c
where 1=1
and c.IncludeFlag = 1
Output:
+----+----------------+
| ID | StartDateMonth |
+----+----------------+
| 1 | 1 |
| 2 | 1 |
| 2 | 3 |
| 3 | 2 |
| 3 | 5 |
+----+----------------+
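Try the query below. The window functions are partitioned by ID and ordered by year-month so that each row is compared with the previous month of the same ID:
SELECT ID,MIN(YEARMONTH) AS YEARMONTH
FROM (
SELECT ID
,YEAR([StartDate])*100+MONTH([StartDate]) AS YEARMONTH
,LAG(YEAR([StartDate])*100+MONTH([StartDate]))
OVER(PARTITION BY ID ORDER BY YEAR([StartDate])*100+MONTH([StartDate])) AS PREVYEARMONTH
,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY YEAR([StartDate])*100+MONTH([StartDate])) AS ROW_NO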
FROM #Table1
GROUP BY ID,((YEAR([StartDate])*100)+MONTH([StartDate]))
) AS T
GROUP BY ID
,(CASE WHEN YEARMONTH - PREVYEARMONTH > 1 THEN ROW_NO ELSE 0 END)
ORDER BY ID
Output:
ID YEARMONTH
1 201701
2 201701
2 201703
3 201702
3 201705
Thank you all. Most of the logic seemed to work, but I tried just the one below and it was good enough for me.
SELECT t1.ID, DatePart(month,t1.Startdate) AS StartDateMonth
FROM DatesTable t1
LEFT JOIN DatesTable t2
ON t1.ID = t2.ID
AND DatePart(month,t1.Startdate) = DatePart(month,t2.Startdate)+1
WHERE t2.ID IS NULL;
Thanks again
OK, I wrote my first query without checking it, believing it would work correctly. This is my updated version, which should be faster than the other solutions:
select
id
, (min(st) - 1) % 12 + 1 --this will return start month (1-12, including December)
, (min(st) - 1) / 12 + 1 --this will return year, just in case you need it
from (
select
id, st, gr = st - row_number() over (partition by ID order by st)
from (
select
distinct ID, st = (year(StartDate) - 1) * 12 + month(StartDate)
from
#table2
) t
) t
group by id, gr
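The st - row_number() grouping above is the usual gaps-and-islands trick: consecutive month numbers minus consecutive row numbers give the same value, so each run of consecutive months shares one gr. A minimal sketch of it follows, run in SQLite through Python as a stand-in for SQL Server; the function name first_months_of_runs is hypothetical and strftime replaces YEAR()/MONTH(). It uses the (min(st) - 1) form for the month and year so that December comes back as 12.

```python
import sqlite3

# (ID, StartDate) pairs from the question; EndDate is not needed here.
SAMPLE_ROWS = [
    (1, "2017-01-12"), (1, "2017-01-19"), (1, "2017-01-29"),
    (1, "2017-02-01"), (1, "2017-02-19"),
    (2, "2017-01-12"), (2, "2017-01-19"), (2, "2017-03-09"),
    (3, "2017-02-12"), (3, "2017-02-19"), (3, "2017-05-05"), (3, "2017-05-09"),
]


def first_months_of_runs(rows=SAMPLE_ROWS):
    """Return (ID, month, year) for the first month of each consecutive run."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (ID INTEGER, StartDate TEXT)")
    conn.executemany("INSERT INTO Table1 VALUES (?, ?)", rows)
    result = conn.execute("""
        WITH Months AS (
            -- one row per (ID, month); st is a running month number
            SELECT DISTINCT ID,
                   (CAST(strftime('%Y', StartDate) AS INTEGER) - 1) * 12
                     + CAST(strftime('%m', StartDate) AS INTEGER) AS st
            FROM Table1
        ),
        Numbered AS (
            -- consecutive months share the same st - row_number value
            SELECT ID, st,
                   st - ROW_NUMBER() OVER (PARTITION BY ID ORDER BY st) AS gr
            FROM Months
        )
        SELECT ID,
               (MIN(st) - 1) % 12 + 1 AS StartMonth,
               (MIN(st) - 1) / 12 + 1 AS StartYear
        FROM Numbered
        GROUP BY ID, gr
        ORDER BY ID, MIN(st)
    """).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in first_months_of_runs():
        print(row)
```

Because st counts months across years, a December followed by a January is treated as one consecutive run, which the month-only comparisons cannot do without a special case.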
I'm trying to write this query in SQL Server, but something is wrong. I need some help...
I have a table of item movements and another table of purchases (buy), which holds the cost of each item on each date I bought it. For each row of the first table, I need the last cost as of the movement date, taken from the second table.
In other words, for each movement, search only the second table's records for that item with a date earlier than the movement date, and return the cost from the most recent one.
Examples:
First Table
REF DATE
1 2015-10-15
1 2015-08-30
2 2015-09-11
3 2015-05-22
2 2015-03-08
2 2015-07-15
3 2015-11-14
1 2015-11-20
Second Table (Buy)
REF DATE COST
1 2015-08-20 150
1 2015-10-12 120
2 2015-04-04 270
2 2015-06-15 280
3 2015-03-01 75
3 2015-10-17 80
I need this result:
REF DATE Cost
1 2015-10-15 120
1 2015-08-30 150
2 2015-09-11 280
3 2015-05-22 75
2 2015-03-08 -
2 2015-07-15 280
3 2015-11-14 80
1 2015-11-20 120
Any help appreciated.
You can do it using OUTER APPLY:
SELECT [REF], [DATE], [COST]
FROM Table1 AS t1
OUTER APPLY (
SELECT TOP 1 COST
FROM Table2 AS t2
WHERE t1.REF = t2.REF AND t1.DATE >= t2.DATE
ORDER BY t2.DATE DESC) AS t3
;WITH cte AS (
SELECT ft.*,
st.[Cost],
ROW_NUMBER() OVER (PARTITION BY ft.[Ref],ft.[Date] ORDER BY st.[Date] DESC) RN
FROM FirstTable ft
LEFT JOIN SecondTable st ON ft.[Ref] = st.[Ref]
AND ft.[Date] >= st.[Date]
)
SELECT Ref,
[Date],
[Cost]
FROM cte
WHERE RN = 1
Or, if you don't want to use a CTE:
SELECT
Ref,
[Date],
[Cost]
FROM
(SELECT
ft.*,
st.[Cost],
ROW_NUMBER() OVER (PARTITION BY ft.[Ref],ft.[Date] ORDER BY st.[Date] DESC) RN
FROM
FirstTable ft
LEFT JOIN SecondTable st ON ft.[Ref] = st.[Ref]
AND ft.[Date] >= st.[Date]
) t
WHERE
t.RN = 1
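To check the ROW_NUMBER() version against the expected result, here is a minimal sketch that runs it on the sample data in SQLite through Python. SQLite is a stand-in (it has no OUTER APPLY, so only the join-and-rank variant is shown), the function name cost_as_of_movement is made up for the example, and the FirstTable/SecondTable names follow the second answer.

```python
import sqlite3


def cost_as_of_movement():
    """Return each movement with the cost of the latest purchase on or before it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE FirstTable (Ref INTEGER, Date TEXT);
        CREATE TABLE SecondTable (Ref INTEGER, Date TEXT, Cost INTEGER);
        INSERT INTO FirstTable VALUES
            (1, '2015-10-15'), (1, '2015-08-30'), (2, '2015-09-11'),
            (3, '2015-05-22'), (2, '2015-03-08'), (2, '2015-07-15'),
            (3, '2015-11-14'), (1, '2015-11-20');
        INSERT INTO SecondTable VALUES
            (1, '2015-08-20', 150), (1, '2015-10-12', 120),
            (2, '2015-04-04', 270), (2, '2015-06-15', 280),
            (3, '2015-03-01', 75),  (3, '2015-10-17', 80);
    """)
    rows = conn.execute("""
        WITH cte AS (
            SELECT ft.Ref, ft.Date, st.Cost,
                   -- rank the candidate purchases of each movement, newest first
                   ROW_NUMBER() OVER (PARTITION BY ft.Ref, ft.Date
                                      ORDER BY st.Date DESC) AS RN
            FROM FirstTable ft
            LEFT JOIN SecondTable st
                   ON ft.Ref = st.Ref
                  AND ft.Date >= st.Date   -- only purchases up to the movement
        )
        SELECT Ref, Date, Cost
        FROM cte
        WHERE RN = 1
        ORDER BY Ref, Date
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in cost_as_of_movement():
        print(row)
```

Here the RN = 1 filter can stay in WHERE: the LEFT JOIN has already produced a row with RN = 1 and a NULL Cost for movements with no earlier purchase, such as Ref 2 on 2015-03-08.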
Using SQL Server 2005
Table1
ID FromDate ToDate
001 23-02-2009 25-02-2009
001 27-02-2009 29-02-2009
002 12-02-2009 25-03-2009
...,
Table2
ID Name Total
001 Raja 30
002 Ravi 22
I want to get the total days for each person ID.
Tried Query,
SELECT
table2.Id, table2.name, table2.total,
datediff(day, table1.fromdate, table2.todate)
FROM table1
LEFT OUTER JOIN table2 ON table1.personid = table2.personid
Getting output
ID Name Total Days
001 Raja 30 3
001 Raja 30 3
...,
It should total the days and display them on one line.
Note: if I select a particular date period, it should count only the days within that period.
For example, for
where date between 26-02-2009 and 03-03-2009, it should display
ID Name Total Days
001 Raja 30 3
...,
because I am only counting dates after 25-02-2009.
Expected Output
ID Name Total Days
001 Raja 30 6
002 Ravi 22 16
How to modify my query?
DATEDIFF returns the difference between two dates, in the same way that the difference between 1 and 3 is 2 (3 - 1 = 2); DATEDIFF(d, D1, D2) is effectively D2 - D1. Because you want to count both the first and the last day, you need to offset one of the dates by a day with DATEADD (subtract a day from FromDate or add a day to ToDate):
SELECT table2.id, table2.Name, table2.Total, SUM(DATEDIFF(d, DATEADD(d, -1, table1.FromDate), table1.ToDate)) AS Days
FROM table1
INNER JOIN table2 ON table1.id = table2.id
GROUP BY table2.id, table2.Name, table2.Total
I think a GROUP BY query would be simpler:
SELECT table2.Id, table2.name, table2.total,
SUM(DATEDIFF(day, table1.fromdate, table1.todate) + 1) AS Days
FROM table1
left outer join table2 on
table1.personid = table2.personid
GROUP BY table2.Id, table2.name, table2.total
SELECT table2.Id, table2.name, table2.total,
COALESCE(
(
SELECT SUM(DATEDIFF(day, table1.fromdate, table1.todate) + 1)
FROM table1
WHERE table1.personid = table2.personid
), 0) AS [days]
FROM table2
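The inclusive day count can be tried out with a small sketch in SQLite through Python, used here only as a stand-in for SQL Server 2005. Several things in it are assumptions made for the illustration: julianday() differences replace DATEDIFF, the dates are rewritten as ISO strings, the second range for ID 001 is shifted to 26 to 28 February because 29-02-2009 is not a valid date, person 003 is a hypothetical row with no ranges, and the function name total_days_per_person is made up.

```python
import sqlite3


def total_days_per_person():
    """Return (ID, Name, Total, Days) with Days counted inclusively per range."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Table1 (ID TEXT, FromDate TEXT, ToDate TEXT);
        CREATE TABLE Table2 (ID TEXT PRIMARY KEY, Name TEXT, Total INTEGER);
        INSERT INTO Table1 VALUES
            ('001', '2009-02-23', '2009-02-25'),
            ('001', '2009-02-26', '2009-02-28'),  -- shifted: 29 Feb 2009 is invalid
            ('002', '2009-02-12', '2009-03-25');
        INSERT INTO Table2 VALUES
            ('001', 'Raja', 30),
            ('002', 'Ravi', 22),
            ('003', 'Nobody', 0);                 -- hypothetical: no date ranges
    """)
    rows = conn.execute("""
        SELECT t2.ID, t2.Name, t2.Total,
               -- (ToDate - FromDate) + 1 counts both end days of each range;
               -- COALESCE turns the NULL sum of a person without ranges into 0
               COALESCE(SUM(CAST(ROUND(julianday(t1.ToDate)
                                       - julianday(t1.FromDate)) AS INTEGER) + 1),
                        0) AS Days
        FROM Table2 t2
        LEFT JOIN Table1 t1 ON t1.ID = t2.ID
        GROUP BY t2.ID, t2.Name, t2.Total
        ORDER BY t2.ID
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in total_days_per_person():
        print(row)
```

Starting from Table2 and left-joining Table1 keeps every person in the output, which is the same effect the correlated subquery with COALESCE achieves.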