How to delete the duplicate based on date

I have a table with several duplicate cust_id values. I would like to keep the row whose prendate_next is nearest to the current date and delete the rest of the duplicates. How can I do this? I am new to this.
cust_id prendate_next
1000105737 2014-11-30 00:00:00.000
1000105836 2014-11-20 00:00:00.000
1000143646 2014-11-10 00:00:00.000
1000143646 2015-03-09 00:00:00.000
1000179487 2014-12-05 00:00:00.000
1000182253 2015-01-01 00:00:00.000
1000192740 2014-10-02 00:00:00.000
1000192740 2015-01-10 00:00:00.000
1000199419 2015-09-30 00:00:00.000
1000170578 2014-12-26 00:00:00.000
1000188890 2015-06-23 00:00:00.000
1000189075 2015-03-01 00:00:00.000
1000189075 2015-03-01 00:00:00.000
1000189144 2015-04-04 00:00:00.000

;WITH cte AS (
    SELECT cust_id, prendate_next,
           ROW_NUMBER() OVER (PARTITION BY cust_id
                              ORDER BY ABS(DATEDIFF(DAY, prendate_next, GETDATE()))) AS RowNumber
    FROM MyTable
)
-- Deleting through the CTE removes only the numbered duplicates. Joining back
-- to MyTable on cust_id and prendate_next would also delete the row to keep
-- when two rows are completely identical (e.g. 1000189075).
DELETE FROM cte
WHERE RowNumber > 1;
ABS(DATEDIFF(DAY,prendate_next,GETDATE())) counts how many days prendate_next is from today, whether it lies in the past or in the future.
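Before deleting anything, it can help to preview which row would be kept. This is only a minimal sketch: the table variable @MyTable holds a few of the sample rows, and a fixed @Today stands in for GETDATE() so the result is reproducible. Both names are made up for illustration.

```sql
-- Hypothetical stand-ins: @Today replaces GETDATE(), @MyTable replaces the real table.
DECLARE @Today date = '2014-11-15';
DECLARE @MyTable TABLE (cust_id bigint, prendate_next date);

INSERT INTO @MyTable (cust_id, prendate_next) VALUES
    (1000143646, '2014-11-10'),
    (1000143646, '2015-03-09'),
    (1000192740, '2014-10-02'),
    (1000192740, '2015-01-10'),
    (1000189075, '2015-03-01'),
    (1000189075, '2015-03-01');

-- RowNumber = 1 marks the row closest to @Today within each cust_id.
SELECT cust_id,
       prendate_next,
       ABS(DATEDIFF(DAY, prendate_next, @Today)) AS DaysAway,
       ROW_NUMBER() OVER (PARTITION BY cust_id
                          ORDER BY ABS(DATEDIFF(DAY, prendate_next, @Today))) AS RowNumber
FROM @MyTable
ORDER BY cust_id, RowNumber;
```

With this reference date, 2014-11-10 (5 days away) and 2014-10-02 (44 days away) get RowNumber 1 for their customers. For the two identical 1000189075 rows, which one gets RowNumber 1 is arbitrary.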

Related

SQL Server : subquery to return the latest cost revision of a unit of inventory on hand

I have 2 tables that I am trying to marry together to end up with a result which tells me the standard cost of the item that is still on hand (based on a FIFO costing method). The first table is inventory receipts, which tells me the parts left to consume and the transaction dates of those receipts. The second is a standard cost view, which tells me the cost history of the item (rev = revision number, which increases by 1 each time the standard cost of the part gets updated).
I currently have a solution which works using TOP 1 and ordering by effective date of cost DESC. However, when I run this for the entire inventory list of the company, it takes over 16 minutes because of the cost of the TOP 1 sub-query.
Sample data (inventory receipts on hand):
partID warehouse transDate seqn orderID qtytoconsume
-------------------------------------------------------------
P0003 W01 2019-01-24 00:00:00.000 1 ORD0187 2
P0003 W01 2018-06-24 00:00:00.000 1 ORD0099 3
P0003 W01 2018-11-24 00:00:00.000 1 ORD0165 1
P0003 W04 2018-12-14 00:00:00.000 1 ORD0175 1
P0002 W02 2019-01-14 00:00:00.000 1 ORD0184 4
P0002 W02 2019-03-24 00:00:00.000 1 ORD0199 1
P0002 W03 2018-05-27 00:00:00.000 1 ORD0093 1
P0002 W03 2018-12-06 00:00:00.000 1 ORD0171 2
P0001 W04 2018-09-09 00:00:00.000 1 ORD0146 5
P0001 W02 2019-04-22 00:00:00.000 1 ORD0200 4
P0001 W03 2019-03-29 00:00:00.000 1 ORD0200 2
P0001 W02 2018-02-14 00:00:00.000 1 ORD0061 1
and standard cost view:
partID document effdate rev costamt
-----------------------------------------------------
P0001 IV0001 2018-01-28 00:00:00.000 1 1000.00
P0001 IV0023 2018-06-30 00:00:00.000 2 1200.00
P0001 IV0045 2019-01-01 00:00:00.000 3 1300.00
P0002 IV0001 2018-01-28 00:00:00.000 1 45.00
P0002 IV0013 2018-04-10 00:00:00.000 2 42.00
P0002 IV0045 2019-01-01 00:00:00.000 3 56.00
P0003 IV0001 2018-01-28 00:00:00.000 1 23400.00
P0003 IV0003 2018-02-20 00:00:00.000 2 11200.00
P0003 IV0045 2019-01-01 00:00:00.000 3 15000.00
P0003 IV0047 2019-02-27 00:00:00.000 4 13400.00
P0003 IV0078 2019-05-03 00:00:00.000 5 14670.00
And my result (which equals my expected result), but for large row sets is less than ideal.
partID warehouse transDate seqn orderID qty costamt
-------------------------------------------------------------
P0003 W01 2019-01-24 00:00:00.000 1 ORD0187 2 15000.00
P0003 W01 2018-06-24 00:00:00.000 1 ORD0099 3 11200.00
P0003 W01 2018-11-24 00:00:00.000 1 ORD0165 1 11200.00
P0003 W04 2018-12-14 00:00:00.000 1 ORD0175 1 11200.00
P0002 W02 2019-01-14 00:00:00.000 1 ORD0184 4 56.00
P0002 W02 2019-03-24 00:00:00.000 1 ORD0199 1 56.00
P0002 W03 2018-05-27 00:00:00.000 1 ORD0093 1 42.00
P0002 W03 2018-12-06 00:00:00.000 1 ORD0171 2 42.00
P0001 W04 2018-09-09 00:00:00.000 1 ORD0146 5 1200.00
P0001 W02 2019-04-22 00:00:00.000 1 ORD0200 4 1300.00
P0001 W03 2019-03-29 00:00:00.000 1 ORD0200 2 1300.00
P0001 W02 2018-02-14 00:00:00.000 1 ORD0061 1 1000.00
My query is:
SELECT
ioh.*, sc.costamt, sc.effdate
FROM
inventoryonHand ioh
LEFT JOIN
standardcosts sc ON sc.partID = ioh.partID
AND sc.effdate = (SELECT TOP 1 sc2.effDate
FROM standardcosts sc2
WHERE sc2.partID = sc.partID
AND sc2.effDate < ioh.transDate
ORDER BY sc2.partID ASC, sc2.effDate DESC);
Thanks so much guys!

You can try this (assuming partID and transDate together are unique in your inventoryonHand table; otherwise partition by that table's key):
select * from (
select f1.*,
f2.effdate, f2.costamt, f2.rev,
row_number() over(partition by f1.partid, f1.transdate order by f2.effdate desc, f2.rev desc) as lasteffDaterank
from inventoryonHand f1
left outer join standardcosts f2 on f1.partid=f2.partid and f2.effDate < f1.transDate
) tmp
where lasteffDaterank=1

You could try to simplify the subquery using max().
(SELECT max(sc2.effdate)
 FROM standardcosts sc2
 WHERE sc2.partid = sc.partid
 AND sc2.effdate < ioh.transdate)
For performance, try an index on standardcosts (partid ASC, effdate DESC).
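A sketch of such an index, assuming standardcosts is a base table. If it is a view, as the question suggests, the index would have to go on the underlying table instead. The temp table, the index name and the INCLUDE column are assumptions for illustration, not part of the original schema.

```sql
-- Hypothetical stand-in for the real cost table.
CREATE TABLE #standardcosts (
    partID   varchar(10)   NOT NULL,
    document varchar(10)   NOT NULL,
    effdate  datetime      NOT NULL,
    rev      int           NOT NULL,
    costamt  decimal(12,2) NOT NULL
);

-- Key order follows the lookup: equality on partID, then the latest effdate first.
-- INCLUDE (costamt) is an optional addition so the cost can be read from the index alone.
CREATE NONCLUSTERED INDEX IX_standardcosts_partID_effdate
    ON #standardcosts (partID ASC, effdate DESC)
    INCLUDE (costamt);
```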

You can try this too; I'm not really sure it's better ;)
select f1.*, f3.*
from inventoryonHand f1
outer apply
(
select top 1 f2.costamt from standardcosts f2
where f1.partid=f2.partid and f2.effDate < f1.transDate
order by f2.effdate desc, f2.rev desc
) f3
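To try the OUTER APPLY approach without touching real data, here is a self-contained sketch using a few rows from the question. The table variables @inventoryonHand and @standardcosts are stand-ins for the real table and view.

```sql
-- Stand-ins for the real objects, loaded with a subset of the question's sample rows.
DECLARE @inventoryonHand TABLE (partID varchar(10), warehouse varchar(10), transDate date,
                                seqn int, orderID varchar(10), qtytoconsume int);
DECLARE @standardcosts   TABLE (partID varchar(10), document varchar(10), effdate date,
                                rev int, costamt decimal(12,2));

INSERT INTO @inventoryonHand VALUES
    ('P0003', 'W01', '2019-01-24', 1, 'ORD0187', 2),
    ('P0003', 'W01', '2018-06-24', 1, 'ORD0099', 3),
    ('P0001', 'W02', '2018-02-14', 1, 'ORD0061', 1);

INSERT INTO @standardcosts VALUES
    ('P0001', 'IV0001', '2018-01-28', 1, 1000.00),
    ('P0001', 'IV0023', '2018-06-30', 2, 1200.00),
    ('P0003', 'IV0001', '2018-01-28', 1, 23400.00),
    ('P0003', 'IV0003', '2018-02-20', 2, 11200.00),
    ('P0003', 'IV0045', '2019-01-01', 3, 15000.00);

-- For each receipt, pick the latest cost revision effective before the receipt date.
SELECT ioh.partID, ioh.transDate, ioh.orderID, c.costamt, c.effdate
FROM @inventoryonHand ioh
OUTER APPLY (
    SELECT TOP 1 sc.costamt, sc.effdate
    FROM @standardcosts sc
    WHERE sc.partID = ioh.partID
      AND sc.effdate < ioh.transDate
    ORDER BY sc.effdate DESC, sc.rev DESC
) c
ORDER BY ioh.partID, ioh.transDate;
```

For these rows the costs come out as 1000.00 (ORD0061), 11200.00 (ORD0099) and 15000.00 (ORD0187), which matches the expected result in the question.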

Order by Descending and continue with the same set of group

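I want the records in descending order of DATE, but with the rows of each group kept together. For example, the MAX date here is 2018-10-25 00:00:00.000, which belongs to REC = 5, so the first 3 records should be those of REC = 5, followed by the group with the next-highest date, and so on.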
REC DATE
===========================
1 2018-01-02 00:00:00.000
1 2018-01-03 00:00:00.000
1 2018-01-04 00:00:00.000
2 2018-06-01 00:00:00.000
2 2018-06-02 00:00:00.000
3 2018-03-01 00:00:00.000
3 2018-05-02 00:00:00.000
3 2018-01-03 00:00:00.000
3 2018-08-04 00:00:00.000
3 2018-10-05 00:00:00.000
4 2018-10-06 00:00:00.000
5 2018-10-25 00:00:00.000
5 2018-05-03 00:00:00.000
5 2018-09-09 00:00:00.000
This is what I have tried, but without success:
SELECT t1.REC, t1.DATE
FROM TEMP AS t1
INNER JOIN (SELECT REC, MAX(DATE) AS MaxDate
FROM TEMP
GROUP BY REC) AS t2
ON (t1.REC = t2.REC AND t1.DATE = t2.MaxDate)
Expected result should be something like this:
REC DATE
===============================
5 2018-10-25 00:00:00.000
..........{Remaining dates of `REC` 5}
4 2018-10-06 00:00:00.000
..........{Remaining dates of `REC` 4}
3 2018-10-05 00:00:00.000
..........{Remaining dates of `REC` 3}
2 2018-06-02 00:00:00.000
..........{Remaining dates of `REC` 2}
1 2018-01-04 00:00:00.000
..........{Remaining dates of `REC` 1}

max_date is maximum date per REC
SELECT *, max_date = MAX(DATE) OVER (PARTITION BY REC)
FROM yourtable
ORDER BY max_date DESC, DATE DESC
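Here is a runnable sketch of the same idea on a subset of the sample rows. The table variable @Temp is a stand-in for the real table, and the extra REC sort key is an addition (not part of the answer above) so that two groups sharing the same maximum date do not interleave.

```sql
-- Stand-in for the real table, loaded with a subset of the question's rows.
DECLARE @Temp TABLE (REC int, [DATE] date);

INSERT INTO @Temp (REC, [DATE]) VALUES
    (1, '2018-01-02'), (1, '2018-01-04'),
    (3, '2018-03-01'), (3, '2018-10-05'),
    (4, '2018-10-06'),
    (5, '2018-10-25'), (5, '2018-05-03');

-- Sort groups by their latest date, then rows within each group by date.
SELECT REC, [DATE]
FROM (
    SELECT REC, [DATE],
           MAX([DATE]) OVER (PARTITION BY REC) AS max_date
    FROM @Temp
) t
ORDER BY max_date DESC, REC, [DATE] DESC;
```

This returns the groups in the order 5, 4, 3, 1, each with its own dates descending.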

MSSQL - split records per week_start and week_end

I have a table similar to the one represented below.
myID | some data | start_date | end_date
1 Tom 2016-01-01 2016-05-09
2 Mike 2015-03-01 2017-03-09
...
I have a function that when provided with start_date, end_date, interval (for example weeks)
returns me data as below. (splits the start and end dates to week intervals)
select * from my_function('2016-01-01','2016-01-12', 'ww')
2015-12-28 00:00:00.000 | 2016-01-03 00:00:00.000 15W53
2016-01-04 00:00:00.000 | 2016-01-10 00:00:00.000 16W1
2016-01-11 00:00:00.000 | 2016-01-17 00:00:00.000 16W2
I would like to be able to write a query that returns all of the values from the first table, but splits the start date and end date into multiple rows using the function.
myID | some data | Week_start_date | Week_end_date | (optional)week_num
1 Tom 2015-12-28 2016-01-03 15W53
1 Tom 2016-01-04 2016-01-10 16W1
1 Tom 2016-01-11 2016-01-17 16W2
...
2 Mike etc....
Could someone please help me with creating such a query ?

-- The column names returned by my_function are assumed here.
select a.myID, a.some_data, b.Week_start_date, b.Week_end_date, b.week_num
from #a a
cross apply my_function(a.start_date, a.end_date, 'ww') b
I tried it with sample data like this:
create table #a
(
myID int, some_data varchar(50) , start_date date, end_date date)
insert into #a values
(1,'Tom','2016-01-01','2016-05-09'),
(2,'Mike','2015-03-01','2017-03-09')
Here I am keeping the function result in a temp table:
create table #b
(
a datetime,b datetime, c varchar(50)
)
insert into #b values
('2015-12-28 00:00:00.000','2016-01-03 00:00:00.000','15W53'),
('2016-01-04 00:00:00.000','2016-01-10 00:00:00.000','16W1 '),
('2016-01-11 00:00:00.000','2016-01-17 00:00:00.000','16W2 ')
select myID,some_data,b.a,b.b,b.c from #a cross apply
(select * from #b)b
The output looks like this:
myID some_data a b c
1 Tom 2015-12-28 00:00:00.000 2016-01-03 00:00:00.000 15W53
1 Tom 2016-01-04 00:00:00.000 2016-01-10 00:00:00.000 16W1
1 Tom 2016-01-11 00:00:00.000 2016-01-17 00:00:00.000 16W2
2 Mike 2015-12-28 00:00:00.000 2016-01-03 00:00:00.000 15W53
2 Mike 2016-01-04 00:00:00.000 2016-01-10 00:00:00.000 16W1
2 Mike 2016-01-11 00:00:00.000 2016-01-17 00:00:00.000 16W2

Based on your current result and expected result, the only difference I see is myID,
so you will need to frame your query like this:
;with cte
as
(
select * from my_function('2016-01-01','2016-01-12', 'ww')
)
select dense_rank() over (order by somedata) as col,
* from cte
DENSE_RANK assigns the same value to rows with equal ORDER BY values and the next consecutive value to the next distinct value, unlike RANK, which leaves gaps after ties.
Look here for more info:
https://stackoverflow.com/a/7747342/2975396

Update table conditionally based on two columns and multiple rows in another table

I need to update a foreign key in table 1 with the correct entry from table 2. The correct entry is the one with the latest effective date that is not after the period start date, i.e. the period start falls between that effective date and the next effective date in table 2. If there are multiple entries in table 2 with the same effective date, use the modified date column as a tie breaker and pick the most recent one. Here is the basic table structure (all dates are in Date format):
Table 1
pK1 PeriodStartDate pK2
1 2016-04-01 00:00:00.000
2 2016-07-01 00:00:00.000
Table 2
pK2 EffectiveFrom ModifiedDate
3 2016-03-01 00:00:00.000 2016-04-01 00:00:00.000
4 2016-05-01 00:00:00.000 2016-06-01 00:00:00.000
5 2016-05-01 00:00:00.000 2016-06-02 00:00:00.000
So in the above example table 1 would look like this:
pK1 PeriodStartDate pK2
1 2016-04-01 00:00:00.000 3
2 2016-07-01 00:00:00.000 5
This is because for row 1 it falls between March 1st and May 1st (from table 2). And for row 2 it is after the last date, but as there are two similar start dates we choose the last modified.
I'm not sure of the solution. I was trying something like this:
UPDATE table1
SET pK2 = table2.pK2
FROM table2
WHERE PeriodStartDate > (SELECT FIRST(table2.EffectiveFrom) FROM table2)
I'm just not sure how to find an entry that is bounded by another row (and then needs another column for the tie breaker)

First off, you need to apply a row_number() over Table2, partitioned on PeriodStart (the EffectiveFrom column in the question) and ordered by ModifiedDate (desc). Call this MaxModified; 1 is always the most recently modified record.
pK2 PeriodStart ModifiedDate MaxModified
3 2016-03-01 00:00:00.000 2016-04-01 00:00:00.000 1
5 2016-05-01 00:00:00.000 2016-06-02 00:00:00.000 1
4 2016-05-01 00:00:00.000 2016-06-01 00:00:00.000 2
Then, for only where MaxModified=1, you add a new "id" to this so we can line up a start date, with the next rows start date (our end date). This is also done with the row_number() function ordered by the PeriodStart.
pK2 PeriodStart ModifiedDate MaxModified myID
3 2016-03-01 00:00:00.000 2016-04-01 00:00:00.000 1 1
5 2016-05-01 00:00:00.000 2016-06-02 00:00:00.000 1 2
Then we take that result and join it to itself offset by one row to get an end date value for each original row.
pK2 PeriodStart ModifiedDate MaxModified myID PeriodEnd
3 2016-03-01 00:00:00.000 2016-04-01 00:00:00.000 1 1 2016-05-01 00:00:00.000
5 2016-05-01 00:00:00.000 2016-06-02 00:00:00.000 1 2 NULL
Once we have that, it's a simple matter of joining on the start/end dates to get our pK2 value.
Full script...
DECLARE @Table1 TABLE (pK1 INT, PeriodStart DATETIME, pK2 INT)
DECLARE @Table2 TABLE (pK2 INT, PeriodStart DATETIME, ModifiedDate DATETIME)
INSERT INTO @Table1
VALUES (1,'2016-04-01',NULL),
       (2,'2016-07-01',NULL)
INSERT INTO @Table2
VALUES (3,'2016-03-01','2016-04-01'),
       (4,'2016-05-01','2016-06-01'),
       (5,'2016-05-01','2016-06-02')
;WITH OrderedList AS
(
    SELECT *,
           ROW_NUMBER() OVER(PARTITION BY PeriodStart ORDER BY ModifiedDate DESC) AS MaxModified
    FROM @Table2
), X AS
(
    SELECT *,
           ROW_NUMBER() OVER(ORDER BY PeriodStart) AS myID
    FROM OrderedList
    WHERE MaxModified=1
), Y AS
(
    SELECT L.*, R.PeriodStart AS PeriodEnd
    FROM X L
    LEFT JOIN X R ON L.myID=R.myID-1 AND R.MaxModified=1
    WHERE L.MaxModified=1
)
UPDATE T SET pK2=Y.pK2
FROM @Table1 T
LEFT JOIN Y ON T.PeriodStart >= Y.PeriodStart AND T.PeriodStart < COALESCE(Y.PeriodEnd,CURRENT_TIMESTAMP)
SELECT *
FROM @Table1
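As an alternative to the script above (a different technique, not the answer's own method), the same lookup can be sketched with OUTER APPLY and TOP 1. It picks the latest EffectiveFrom not after the period start and breaks ties on the most recent ModifiedDate. The table variables mirror the question's sample data, and the "on or before" boundary is an assumption taken from the answer's >= comparison.

```sql
-- Stand-ins for the question's tables, using the question's column names.
DECLARE @Table1 TABLE (pK1 int, PeriodStartDate date, pK2 int);
DECLARE @Table2 TABLE (pK2 int, EffectiveFrom date, ModifiedDate date);

INSERT INTO @Table1 (pK1, PeriodStartDate, pK2) VALUES
    (1, '2016-04-01', NULL),
    (2, '2016-07-01', NULL);

INSERT INTO @Table2 (pK2, EffectiveFrom, ModifiedDate) VALUES
    (3, '2016-03-01', '2016-04-01'),
    (4, '2016-05-01', '2016-06-01'),
    (5, '2016-05-01', '2016-06-02');

-- For each Table1 row, take the latest effective entry; ties go to the latest ModifiedDate.
UPDATE T
SET pK2 = X.pK2
FROM @Table1 T
OUTER APPLY (
    SELECT TOP 1 T2.pK2
    FROM @Table2 T2
    WHERE T2.EffectiveFrom <= T.PeriodStartDate
    ORDER BY T2.EffectiveFrom DESC, T2.ModifiedDate DESC
) X;

SELECT pK1, PeriodStartDate, pK2
FROM @Table1
ORDER BY pK1;
```

With the sample data this sets pK2 to 3 for row 1 and to 5 for row 2, as the question expects. Unlike the COALESCE(..., CURRENT_TIMESTAMP) bound in the script above, this version does not depend on today's date.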

SQL DATEFROMPARTS - Fails on YEAR(GETDATE())

I'm trying to create several date variables based on an original value, but with the current year. The only code I can get to run seems overly complex and gives me inaccurate results:
, DATEADD(DAY,DATEPART(DAYOFYEAR, o.AnnualReviewDate),
DATEADD(YEAR,YEAR(GETDATE())-1900,0)) AS ARDateCurr
, DATEADD(DAY,DATEPART(DAYOFYEAR, o.AnnualReviewDate)+30,
DATEADD(YEAR,YEAR(GETDATE())-1900,0)) AS ARDatePlus30
Why does:
DECLARE @Now AS DATE = GETDATE()
DECLARE @Year AS INT = DATEPART(YEAR,@Now)
...
, DATEFROMPARTS(YEAR(@Now),MONTH(o.AnnualReviewDate)-1,
DAY(o.AnnualReviewDate)) AS ARDateMin30
give me the error message:
'Cannot construct data type date, some of the arguments have values which are not valid.'

At a guess, are these the values you're looking for:
declare @t table (AnnualReviewDate datetime)
insert into @t (AnnualReviewDate) values
('20121004'),('20090924'),('20101007'),('20141008'),('20090508'),
('20120229')
select
    AnnualReviewDate,
    r.ARCurrDue,
    DATEADD(day,-30,ARCurrDue) as ARDateMin30,
    DATEADD(day,30,ARCurrDue) as ARDatePlus30
from @t t
cross apply (SELECT DATEADD(year,DATEDIFF(year,AnnualReviewDate,GETDATE())
    ,AnnualReviewDate) as ARCurrDue) r
Results:
AnnualReviewDate ARCurrDue ARDateMin30 ARDatePlus30
----------------------- ----------------------- ----------------------- -----------------------
2012-10-04 00:00:00.000 2015-10-04 00:00:00.000 2015-09-04 00:00:00.000 2015-11-03 00:00:00.000
2009-09-24 00:00:00.000 2015-09-24 00:00:00.000 2015-08-25 00:00:00.000 2015-10-24 00:00:00.000
2010-10-07 00:00:00.000 2015-10-07 00:00:00.000 2015-09-07 00:00:00.000 2015-11-06 00:00:00.000
2014-10-08 00:00:00.000 2015-10-08 00:00:00.000 2015-09-08 00:00:00.000 2015-11-07 00:00:00.000
2009-05-08 00:00:00.000 2015-05-08 00:00:00.000 2015-04-08 00:00:00.000 2015-06-07 00:00:00.000
2012-02-29 00:00:00.000 2015-02-28 00:00:00.000 2015-01-29 00:00:00.000 2015-03-30 00:00:00.000
That is, ARCurrDue should be the same day and month as AnnualReviewDate, but in the current year, and then the other columns are plus and minus 30 days from that?
You'll note I've included a 29th February in the sample so you can see what's computed for it (you should always think about what the requirement is for such dates and include them in sample data)
The magic here is using this expression to reset a date's year to the current one, without affecting the month and day (except for Feb 29):
DATEADD(year,DATEDIFF(year,AnnualReviewDate,GETDATE())
,AnnualReviewDate)
The inner expression (DATEDIFF) is "how many year boundaries have passed since AnnualReviewDate". The outer expression then adds that same number of whole years onto AnnualReviewDate. Note that this same expression even works if AnnualReviewDate is a date in the future.

You shouldn't use integer operations in a DATETIME calculation. Do this instead:
DATEFROMPARTS(YEAR(@Now),MONTH(DATEADD(M,-1,o.AnnualReviewDate)),DAY(o.AnnualReviewDate)) AS ARDateMin30
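A small sketch of why the original expression fails, with one way to avoid rebuilding the date from parts. MONTH(...) - 1 is 0 for any January date, and DATEFROMPARTS rejects month 0. Even with the month taken from DATEADD, combining it with the original day can still produce an invalid date such as 31 February. Doing all the arithmetic with DATEADD sidesteps both problems. The fixed @Now and the sample dates are assumptions for illustration.

```sql
-- Fixed reference date instead of GETDATE(), so the output is reproducible.
DECLARE @Now date = '2015-06-15';
DECLARE @t TABLE (AnnualReviewDate date);

INSERT INTO @t (AnnualReviewDate) VALUES
    ('2012-01-15'),   -- January: MONTH(...) - 1 = 0
    ('2012-03-31'),   -- day 31 does not exist in February
    ('2012-10-04');

SELECT AnnualReviewDate,
       MONTH(AnnualReviewDate) - 1 AS MonthMinusOne,
       -- Move the date into @Now's year first, then step back one month.
       DATEADD(MONTH, -1,
               DATEADD(YEAR, DATEDIFF(YEAR, AnnualReviewDate, @Now), AnnualReviewDate)
       ) AS OneMonthBefore
FROM @t;
```

OneMonthBefore comes out as 2014-12-15, 2015-02-28 and 2015-09-04 for the three rows. If the requirement is exactly 30 days rather than one calendar month, DATEADD(DAY, -30, ...) as in the earlier answer is the closer fit.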