SQL Delete Rows with Duplicate Key Keeping Most Recent - sql-server

Dearest Professionals,
I have a table that sometimes has rows created with duplicate invoice numbers (EMP_ID). In these rows, there are separate date (FILE_DATE) and time (FILE_TIME) columns (genius database design there). I need to remove the older rows for any duplicated EMP_ID in this database, keeping the row with the most recent date (from FILE_DATE) + time (from FILE_TIME).
Both FILE_DATE and FILE_TIME are date/time fields in the database. The software we use writes to this table, storing the invoice date in the FILE_DATE column as YYYY-MM-DD 00:00:00.000 (the zeros all hard coded). The FILE_TIME field then holds 1900-01-01 HH:mm:ss.SSS, with the 1900-01-01 hard coded (the timestamp is the time the row was written to the database).
So, long story short, I need to marry these two together, taking the date portion of FILE_DATE and the time portion of FILE_TIME, to find the most recent row (if duplicates of EMP_ID exist) and delete all duplicates that are not the most recent by the married FILE_DATE & FILE_TIME.
Here is a sample of what a Before & After situation would look like.
BEFORE:
AFTER:
Any and all help would be insanely appreciated.

Using some good old CTE "magic":
WITH CTE AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY EMP_ID
                              ORDER BY FILE_DATE DESC, FILE_TIME DESC) AS RN
    FROM YourTable)
DELETE FROM CTE
WHERE RN > 1;
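Before running the delete, it can help to preview which rows it would remove. The following is a minimal sketch, assuming both columns are datetime as described in the question and reusing the placeholder name YourTable. FILE_STAMP is a made-up alias for the "married" date and time; it is shown only for inspection, since ordering by the two columns separately already gives the same result.

```sql
WITH CTE AS (
    SELECT EMP_ID,
           -- Shift FILE_TIME (dated 1900-01-01) onto the day stored in FILE_DATE.
           -- FILE_STAMP is a hypothetical alias, not a column in the table.
           DATEADD(DAY, DATEDIFF(DAY, '19000101', FILE_DATE), FILE_TIME) AS FILE_STAMP,
           ROW_NUMBER() OVER (PARTITION BY EMP_ID
                              ORDER BY FILE_DATE DESC, FILE_TIME DESC) AS RN
    FROM YourTable
)
SELECT EMP_ID, FILE_STAMP, RN
FROM CTE
WHERE RN > 1          -- these are the rows the DELETE would remove
ORDER BY EMP_ID, RN;
```

If the preview lists only the older duplicates, swapping the final SELECT for the DELETE removes exactly those rows.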

I think this can be accomplished using MAX and GROUP BY:
select B.EMP_ID
     , B.File_date
     , Max(B.File_Time) as MaxFileTime
     , B.DESC_TEXT_1
from Before B
group by B.EMP_ID
       , B.File_date
       , B.DESC_TEXT_1

Related

SQL Server: Slowly Changing Dimension Type 2 on historical records

I am trying to set up a SCD of Type 2 for historical records within my Customer table. Attached is how the Customer table is set up alongside the expected outcome. Note that the Customer table in practice has 2 million distinct Customer IDs. I tried to use the query below, but the Start_Date and End_Date are repeating for each row.
SELECT t.Customer_ID, t.Lifecyle_ID, t.Date As Start_Date,
LEAD(t.Date) OVER (ORDER BY t.Date) AS End_Date
FROM Customer AS t
I think a three-step query is likely needed.
1. Use LEAD and LAG, partitioned by Customer and ordered by date, to peek at the next row's values for both Date and Lifecycle.
2. Use a CASE statement to emit a value for End Date when the current row's Lifecycle <> the next row's Lifecycle (otherwise emit NULL). Now do the same using LAG for the Effective Date.
3. Group By or Distinct on the output from Step #2.
Hopefully that makes sense. I'll try to post a code example later today, but hopefully that's enough to get you started.
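One way the idea could be sketched is shown here. It is a variant rather than a literal transcription of the three steps: instead of CASE plus DISTINCT, it keeps only the rows where the lifecycle changes and then derives the end date from the next change. Table and column names (Customer, Customer_ID, Lifecyle_ID, Date) are taken from the question; the CTE names and Prev_Lifecycle are made up for illustration, and the sketch assumes Lifecyle_ID is never NULL.

```sql
WITH marked AS (
    -- Look back one row per customer to see the previous lifecycle
    SELECT Customer_ID, Lifecyle_ID, [Date],
           LAG(Lifecyle_ID) OVER (PARTITION BY Customer_ID
                                  ORDER BY [Date]) AS Prev_Lifecycle
    FROM Customer
),
starts AS (
    -- Keep the first row of each customer and every row where the lifecycle changes
    SELECT Customer_ID, Lifecyle_ID, [Date] AS Start_Date
    FROM marked
    WHERE Prev_Lifecycle IS NULL
       OR Prev_Lifecycle <> Lifecyle_ID
)
SELECT Customer_ID, Lifecyle_ID, Start_Date,
       -- The period ends where the customer's next period starts; NULL means still current
       LEAD(Start_Date) OVER (PARTITION BY Customer_ID
                              ORDER BY Start_Date) AS End_Date
FROM starts;
```

The PARTITION BY Customer_ID is what stops the dates from repeating across customers, which the query in the question was missing.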

How to get just the first (datetime2 data type) value from each date?

I have a SQL query for filtering the working time of employees. I need to create a report (using Visual Studio reports) that displays the arrival time for every employee for each day of the month. My data has several records (arrival times) for the same date. All I want to do is show just the first arrival time, not the second and later ones. The column is called Start_Session and is of type datetime2.
How can I filter it? Is it possible?
It seems you aren't understanding the other question that Larnu linked you to in the comments. The answer to that question is not suggesting you want to be using a top(1), but instead filtering for the first row in a defined group. Per those answers, that can be achieved with the row_number() window function. In your case, this would look something like the following:
with r as
(
    select Employee
          ,cast(ArrivalTime as date) as ArrivalDate
          ,ArrivalTime
          ,row_number() over (partition by Employee, cast(ArrivalTime as date) order by ArrivalTime) as rn
    from YourTable
)
select Employee
      ,ArrivalDate
      ,ArrivalTime
from r
where rn = 1;

How to aggregate over two columns with duplicates in both, and then some?

This is probably very simple but I'm stupid and stuck and failed to find a thread that quite matched my problem...
I need to do an insert from a table, say tblGameRecords, that looks something like this:
tblGameRecords(ID:match_no, soccer_team_id, stadium, fake_injuries, hair_wax, date)
...into another table, tblTeamRecords, that needs to look like this:
tblTeamRecords(ID:soccer_team_id, stadium, fake_injuries, hair_wax, date)
Now, my problem is that in tblGameRecords:
1. There are natural multiple occurrences of the same soccer_team_id's.
2. There are natural multiple occurrences of the same date.
3. There are sometimes multiple occurrences of the same soccer_team_id on the same date (sigh...)
I want to insert into tblTeamRecord one row per soccer_team_id. I want the earliest record of that team from tblGameRecords.
If the team makes its entrée in tblGameRecords as a duplicate, several times on the same date, I'm fine with any one row of those, because the other columns need to be filled with the respective values from that row, regardless of the actual values which may or may not differ from the other duplicates.
And I'm obviously having trouble formulating a query that lets me narrow down these multiples to just one. This is part of a stored procedure btw.
You can use ROW_NUMBER to number the rows within each soccer_team_id (partitioning by it) in date order, and then insert only the rows where the row number equals one:
;with cte as (
    select soccer_team_id, stadium, fake_injuries, hair_wax, date,
           row_number() over(partition by soccer_team_id order by date) as row_no
    from tblGameRecords
)
insert into tblTeamRecords(soccer_team_id, stadium, fake_injuries, hair_wax, date)
select soccer_team_id, stadium, fake_injuries, hair_wax, date
from cte
where row_no = 1

Factoring public holidays in to a SQL code

Apologies if this is a simple one. I'm looking for some help with the following:
SELECT *
FROM (
SELECT TOP 7
RIGHT (CONVERT (VARCHAR, CompletedDate, 108), 8) AS Time,
WorkType
FROM Table
WHERE WorkType = 'WorkType1'
OR DATEPART (DW, CompletedDate) IN ('7','1')
AND WorkType = 'WorkType2'
ORDER BY CompletedDate DESC) Table
ORDER BY CompletedDate ASC
Multiple events run every day, and the above searches for the last one scheduled to run each day, and pulls the time from it for the past 7 days. This time marks the end of the day's events, and is the value I'm after.
Events run at a different order on weekends, so I search for a different WorkType. WorkType 1 is unique to weekdays. WorkType2 is run both at weekdays and weekends, however it is not the final event on a weekday, so I don't search for it then.
However, this kind of falls apart when public holidays such as bank holidays come into play, as they use the weekend timings. I still need to capture these times, but the above skips over them. If I were to remove or expand the DATEPART, I would end up with duplicate values for each day that don't mark the final job of the day.
What changes can I make to this to capture these lost holiday timings, without manually going through and checking every holiday? Is there a way that I can return a value for WorkType2 if WorkType1 does not appear on a day?
I suggest a materialized calendar table with one row per date along with the desired WorkType for that day. That will allow you to simply join on to the calendar table to determine the proper WorkType value without embedding the logic in the query itself.
With this table loaded with all dates for your reporting domain:
CREATE TABLE dbo.WorkTypeCalendar(
      CalendarDate date NOT NULL
          CONSTRAINT PK_Calendar PRIMARY KEY CLUSTERED
    , WorkType varchar(10) NOT NULL
);
GO
The query can be refactored as below:
SELECT [Time], WorkType
FROM ( SELECT TOP 7
              RIGHT(CONVERT(varchar, t.CompletedDate, 108), 8) AS [Time]
            , t.WorkType
            -- CompletedDate must be returned so the outer ORDER BY can use it
            , t.CompletedDate
       FROM Table1 AS t
       JOIN WorkTypeCalendar AS c ON t.WorkType = c.WorkType
            AND t.CompletedDate >= c.CalendarDate
            AND t.CompletedDate < DATEADD(DAY, 1, c.CalendarDate)
       ORDER BY t.CompletedDate DESC
     ) AS Table1
ORDER BY CompletedDate ASC
You also might consider making this a generalized utility calendar table. See http://www.dbdelta.com/calendar-table-and-datetime-functions/ for a complete example of such a table and a script to load US holidays that you can adjust for your needs and locale.
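As a rough illustration of loading dbo.WorkTypeCalendar, the sketch below fills one year with the weekday/weekend defaults described in the question and then switches a couple of holidays to the weekend WorkType. The year and the two holiday dates are sample values chosen for illustration only; a real load would use the holiday list for your locale.

```sql
-- Hypothetical load: one row per day for a sample year
DECLARE @d date = '20240101';

WHILE @d < '20250101'
BEGIN
    INSERT INTO dbo.WorkTypeCalendar (CalendarDate, WorkType)
    VALUES (@d,
            -- 1900-01-01 was a Monday, so a remainder of 5 or 6 means Saturday or Sunday;
            -- this avoids depending on the session's DATEFIRST or language settings
            CASE WHEN DATEDIFF(DAY, '19000101', @d) % 7 IN (5, 6)
                 THEN 'WorkType2'
                 ELSE 'WorkType1'
            END);
    SET @d = DATEADD(DAY, 1, @d);
END;

-- Sample holidays (illustrative dates only) follow the weekend schedule
UPDATE dbo.WorkTypeCalendar
SET WorkType = 'WorkType2'
WHERE CalendarDate IN ('20240101', '20241225');
```

Keeping the holiday list in the table means the reporting query never needs to change when a new holiday is added.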

SQL Query to determine VAT rate

I'm looking to create a 3 column VAT_Parameter table with the following columns:
VATID, VATRate, EffectiveDate
However, I can't get my head around how I would identify which vat rate applies to an invoice date.
for example if the table was populated with:
1, 17.5, 1/4/1991
2, 15, 1/1/2009
3, 20, 4/1/2011
Say for example I have an invoice dated 4/5/2010, how would an SQL query select the correct VAT rate for that date?
select top 1 *
from VatRate
where EffectiveDate <= @InvoiceDate
order by EffectiveDate desc
Or, with a table of invoices
select id, invoicedate, rate
from
(
    select inv.id, inv.invoicedate, vatrate.rate,
           ROW_NUMBER() over (partition by inv.id order by vatrate.effectivedate desc) rn
    from inv
    inner join vatrate
        on inv.invoicedate >= vatrate.effectivedate
) v
where rn = 1
PS. The rules for the rate of VAT to be charged when the rate changes are more complicated than just the invoice date. For example, the date of supply also matters.
I've run into this kind of thing before. There are two choices I can think of:
1. Expand the table to have two dates: EffectiveFrom and EffectiveTo. (You'll have to have a convention about whether each of these is exclusive or inclusive - but that's always a problem when using dates). This raises the problem of validating that the table population, as a whole, makes sense. e.g. that you don't end up with one row with Rate1 effective from 1/1/2000-1/1/2002, and another (overlapping) with Rate2 effective from 30/10/2001-1/1/2003. Or an uncovered gap in time, where no rate applies. Since this sounds like a very slowly-changing table, populated occasionally (by people who know what they're doing?), this could be the best solution. The SQL to get the effective rate would then be simple:
SELECT VATRate FROM VATTable WHERE (EffectiveFrom<=[YourInvoiceDate]) AND (EffectiveTo>=[YourInvoiceDate])
or
2. Use your existing table structure, and use some slightly more complicated SQL to determine the effective rate for an invoice.
Using your existing structure, something like this would work:
SELECT VATTable.VATRate
FROM VATTable
INNER JOIN
    (SELECT Max(EffectiveDate) AS LatestDate
     FROM VATTable
     WHERE EffectiveDate <= [YourInvoiceDate]) latest
    ON VATTable.EffectiveDate = latest.LatestDate
An easier option may just be to denormalise your data structure and store the VAT rate in the invoice table itself.
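For the first option (EffectiveFrom and EffectiveTo), the validation concern about overlaps and gaps could be checked with a query along these lines. This is only a sketch under stated assumptions: the table is the hypothetical VATTable(VATRate, EffectiveFrom, EffectiveTo) from that answer, both columns are of type date with no time part, and both ends are treated as inclusive, so each period should start the day after the previous one ends. LEAD requires SQL Server 2012 or later.

```sql
WITH ordered AS (
    -- Pair each period with the start of the period that follows it
    SELECT VATRate, EffectiveFrom, EffectiveTo,
           LEAD(EffectiveFrom) OVER (ORDER BY EffectiveFrom) AS NextFrom
    FROM VATTable
)
SELECT VATRate, EffectiveFrom, EffectiveTo, NextFrom,
       CASE WHEN NextFrom <= EffectiveTo THEN 'overlap' ELSE 'gap' END AS Problem
FROM ordered
WHERE NextFrom IS NOT NULL                          -- the latest period has no successor
  AND NextFrom <> DATEADD(DAY, 1, EffectiveTo);     -- anything but "next day" is a problem
```

An empty result means the periods tile the timeline with no overlaps and no uncovered days.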
