Copy Distinct Records Based on 3 Cols - sql-server

I have a lot of data in a table called Temp, and it contains duplicates.
The rows are not duplicated in full; they share the same values in 3 columns: HouseNo, DateofYear, TimeOfDay.
I want to copy only the distinct rows from "Temp" into another table, "ThermData."
Basically, I want to copy all rows from Temp to ThermData that are distinct on those 3 columns, something like distinct(HouseNo,DateofYear,TimeOfDay).
I know that syntax doesn't exist, so I'm looking for an alternative.
Please help me out. I have tried lots of things but haven't solved it yet.
Sample data: the repeated values look like this.
I want to delete the duplicate rows based on the values of HouseNo, DateofYear, TimeOfDay.
HouseNo DateofYear TimeOfDay Count
102 10/1/2009 0:00:02 AM 2
102 10/1/2009 1:00:02 AM 2
102 10/1/2009 10:00:02 AM 2

Here is a Northwind example based on the Orders table.
There are duplicates based on the (EmployeeID , ShipCity , ShipCountry) columns.
If you only execute the code between these 2 lines:
/* Run everything below this line to show crux of the fix */
/* Run everything above this line to show crux of the fix */
you'll see how it works. Basically:
(1) You run a GROUP BY on the 3 columns of interest. (derived1Duplicates)
(2) Then you join back to the table using these 3 columns. (on ords.EmployeeID = derived1Duplicates.EmployeeID and ords.ShipCity = derived1Duplicates.ShipCity and ords.ShipCountry = derived1Duplicates.ShipCountry)
(3) Then for each group, you tag them with Cardinal numbers (1,2,3,4,etc) (using ROW_NUMBER())
(4) Then you keep the row in each group that has the cardinal number of "1". (where derived2DuplicatedEliminated.RowIDByGroupBy = 1)
Use Northwind
GO
/* Table variable that receives the de-duplicated rows */
declare @DestinationVariableTable table (
    NotNeededButForFunRowIDByGroupBy int not null ,
    NotNeededButForFunDuplicateCount int not null ,
    [OrderID] [int] NOT NULL,
    [CustomerID] [nchar](5) NULL,
    [EmployeeID] [int] NULL,
    [OrderDate] [datetime] NULL,
    [RequiredDate] [datetime] NULL,
    [ShippedDate] [datetime] NULL,
    [ShipVia] [int] NULL,
    [Freight] [money] NULL,
    [ShipName] [nvarchar](40) NULL,
    [ShipAddress] [nvarchar](60) NULL,
    [ShipCity] [nvarchar](15) NULL,
    [ShipRegion] [nvarchar](15) NULL,
    [ShipPostalCode] [nvarchar](10) NULL,
    [ShipCountry] [nvarchar](15) NULL
)
INSERT INTO @DestinationVariableTable (NotNeededButForFunRowIDByGroupBy , NotNeededButForFunDuplicateCount , OrderID,CustomerID,EmployeeID,OrderDate,RequiredDate,ShippedDate,ShipVia,Freight,ShipName,ShipAddress,ShipCity,ShipRegion,ShipPostalCode,ShipCountry )
Select RowIDByGroupBy , MyDuplicateCount , OrderID,CustomerID,EmployeeID,OrderDate,RequiredDate,ShippedDate,ShipVia,Freight,ShipName,ShipAddress,ShipCity,ShipRegion,ShipPostalCode,ShipCountry
From
(
/* Run everything below this line to show crux of the fix */
    Select
    /* Number the rows inside each (EmployeeID, ShipCity, ShipCountry) group */
    RowIDByGroupBy = ROW_NUMBER() OVER(PARTITION BY ords.EmployeeID , ords.ShipCity , ords.ShipCountry ORDER BY ords.OrderID )
    , derived1Duplicates.MyDuplicateCount
    , ords.*
    from
    [dbo].[Orders] ords
    join
    (
        /* One row per group, with the number of rows in that group */
        select EmployeeID , ShipCity , ShipCountry , COUNT(*) as MyDuplicateCount from [dbo].[Orders] GROUP BY EmployeeID , ShipCity , ShipCountry /*HAVING COUNT(*) > 1*/
    ) as derived1Duplicates
    on ords.EmployeeID = derived1Duplicates.EmployeeID and ords.ShipCity = derived1Duplicates.ShipCity and ords.ShipCountry = derived1Duplicates.ShipCountry
/* Run everything above this line to show crux of the fix */
)
as derived2DuplicatedEliminated
/* Keep only the first row of each group */
where derived2DuplicatedEliminated.RowIDByGroupBy = 1
select * from @DestinationVariableTable
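Applied to the tables in the question, the same ROW_NUMBER() idea might look like the sketch below. It assumes ThermData has the same three key columns as Temp; any other columns are unknown here and would need to be added to both column lists. Since the question does not say which of the duplicate rows should win, the ordering inside each group is left arbitrary.

```sql
-- Sketch only: assumes ThermData(HouseNo, DateofYear, TimeOfDay) exists.
WITH numbered AS (
    SELECT HouseNo, DateofYear, TimeOfDay,
           -- number the rows inside each duplicate group;
           -- ORDER BY (SELECT NULL) means "any order will do"
           ROW_NUMBER() OVER (PARTITION BY HouseNo, DateofYear, TimeOfDay
                              ORDER BY (SELECT NULL)) AS rn
    FROM Temp
)
INSERT INTO ThermData (HouseNo, DateofYear, TimeOfDay)
SELECT HouseNo, DateofYear, TimeOfDay
FROM numbered
WHERE rn = 1;   -- one row per (HouseNo, DateofYear, TimeOfDay)
```

If a particular row of each group should be kept (for example the latest one), replace the ORDER BY inside OVER() with the column that decides it.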

Related

Select only the most recent datarows [duplicate]

I have a table that takes multiple entries for specific products, you can create a sample like this:
CREATE TABLE test(
[coltimestamp] [datetime] NOT NULL,
[col2] [int] NOT NULL,
[col3] [int] NULL,
[col4] [int] NULL,
[col5] [int] NULL)
GO
Insert Into test
values ('2021-12-06 12:31:59.000',1,8,5321,1234),
('2021-12-06 12:31:59.000',7,8,4047,1111),
('2021-12-06 14:38:07.000',7,8,3521,1111),
('2021-12-06 12:31:59.000',10,8,3239,1234),
('2021-12-06 12:31:59.000',27,8,3804,1234),
('2021-12-06 14:38:07.000',27,8,3957,1234)
You can think of col2 as the product number if you like.
What I need is a query for this kind of table that returns one row per col2 value: where a col2 value appears more than once, it must pick the row with the most recent timestamp.
In other words, I need the most recent entry for each product.
So for the sample, the result has two fewer rows: the rows with the older timestamp for col2 = 7 and col2 = 27 are removed.
Thanks in advance for your help.
Use ROW_NUMBER() to number the rows for each col2 value in descending order of timestamp, then keep only row number 1.
;with cte as(
Select rn=row_number() over(partition by col2 order by coltimestamp desc),*
From test -- the sample table from the question
)
Select * from cte
Where rn=1;

SQL Server 2016 Compare values from multiple columns in multiple rows in single table

USE dev_db
GO
CREATE TABLE T1_VALS
(
[SITE_ID] [int] NULL,
[LATITUDE] [numeric](10, 6) NULL,
[UNIQUE_ID] [int] NULL,
[COLLECT_RANK] [int] NULL,
[CREATED_RANK] [int] NULL,
[UNIQUE_ID_RANK] [int] NULL,
[UPDATE_FLAG] [int] NULL
)
GO
INSERT INTO T1_VALS
(SITE_ID,LATITUDE,UNIQUE_ID,COLLECT_RANK,CREATED_RANK,UNIQUE_ID_RANK)
VALUES
(207442,40.900470,59664,1,1,1),
(207442,40.900280,61320,1,1,2),
(204314,40.245220,48685,1,2,2),
(204314,40.245910,59977,1,1,1),
(202416,39.449530,9295,1,1,2),
(202416,39.449680,62264,1,1,1)
I generated the COLLECT_RANK and CREATED_RANK columns from two date columns (not shown here) and the UNIQUE_ID_RANK column from the UNIQUE_ID column shown here.
I used a SELECT with an OVER clause and a ranking function to generate these columns. A _RANK value of 1 means the latest date or the greatest UNIQUE_ID value. I thought my solution would be pretty straightforward using these rank values via array and cursor processing, but I seem to have painted myself into a corner.
My problem: I need to choose the LONGITUDE value and its UNIQUE_ID based on the following business rules, and set the update value (1) for that record in its UPDATE_FLAG column.
Select the record with the most recent Collection Date (i.e. RANK value = 1) for a given SITE_ID. If multiple records exist with the same Collection Date (i.e. same RANK value), select the record with the most recent Created Date (RANK value = 1) for a given SITE_ID. If multiple records exist with the same Created Date, select the record with the highest Unique ID for a given SITE_ID (i.e. RANK value = 1).
Your suggestions would be most appreciated.
I think you can use top and order by:
select top 1 t1.*
from t1_vals t1
order by collect_rank asc, created_rank asc, unique_id desc;
If you want this per site, which might be what your question is asking, then use row_number():
select t1.*
from (select v.*,
row_number() over (partition by site_id order by collect_rank asc, created_rank asc, unique_id desc) as seqnum
from t1_vals v
) t1
where seqnum = 1;
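The question also asks for UPDATE_FLAG to be set on the chosen record. One way to do that, sketched here rather than taken from the answer above, is to put the same row_number() in a CTE and update through it; SQL Server allows this when the updated column comes from a single base table. The CTE name `ranked` is made up for illustration.

```sql
-- Sketch: flag the preferred row per SITE_ID in T1_VALS.
WITH ranked AS (
    SELECT UPDATE_FLAG,
           ROW_NUMBER() OVER (PARTITION BY SITE_ID
                              ORDER BY COLLECT_RANK ASC,
                                       CREATED_RANK ASC,
                                       UNIQUE_ID DESC) AS seqnum
    FROM T1_VALS
)
UPDATE ranked
SET UPDATE_FLAG = 1      -- writes through to T1_VALS
WHERE seqnum = 1;        -- only the first row of each site
```

Rows that are not chosen keep whatever UPDATE_FLAG they already had, so reset the column first if the statement is run repeatedly.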

SQL Query for Outer Join with Group By

I have a Months table with MonthName and MonthNumber. The fiscal year starts in July, so I have assigned the values to the months like this:
MonthName=July and MonthNumber=1
MonthName=August and MonthNumber=2.
I have another domain table, BudgetCategory, which has BudgetCategoryId and BudgetCategoryName.
The PurchaseOrder table has OrderID, PurchaseMonth, BudgetCategoryId.
Now I want a query that finds the monthly purchases, SUM(TotalCost), for every BudgetCategory. If there are no purchases for a BudgetCategoryId, I want to display zero in the report.
Schema of Table:
CREATE TABLE [dbo].[BudgetCategory](
[BudgetCategoryId] [numeric](18, 0) NOT NULL,
[BudgetCategoryName] [varchar](50) NULL,
[TotalBudget] [nvarchar](50) NULL)
CREATE TABLE [dbo].[PurchaseOrder](
[OrderId] [bigint] NOT NULL,
[BudgetCategoryId] [bigint] NULL,
[PurchaseMonth] [nvarchar](50) NULL,
[QTY] [bigint] NULL,
[CostPerItem] [decimal](10, 2) NULL,
[TotalCost] [decimal](10, 2) NULL)
CREATE TABLE [dbo].[MonthTable](
[MonthNumber] [bigint] NULL,
[MonthName] [nvarchar](30) NULL)
Try this:
select a.BudgetCategoryName,
ISNULL(c.MonthName,'No purchase') as Month,
sum(ISNULL(TotalCost,0)) as TotalCost
from #BudgetCategory a left join #PurchaseOrder b on a.BudgetCategoryId = b.BudgetCategoryId
left join #MonthTable c on b.PurchaseMonth = c.[MonthName]
group by a.BudgetCategoryName,c.MonthName
order by a.BudgetCategoryName
Tested with this data
INSERT #BudgetCategory
VALUES (1,'CategoryA',1000),
(2,'CategoryB',2000),
(3,'CategoryC',1500),
(4,'CategoryD',2000)
INSERT #PurchaseOrder (OrderId,BudgetCategoryId,TotalCost,PurchaseMonth)
VALUES (1,1,550,'July'),
(2,1,700,'July'),
(3,2,600,'August')
INSERT #MonthTable
VALUES
(1,'July'),
(2,'August')
It will produce this results:
Let me know if this helps:
SELECT b.*, m.MonthNumber, q.[BudgetCategoryId], q.[PurchaseMonth], ISNULL(q.[TotalCost],0) AS [TotalCost]
FROM [dbo].[BudgetCategory] b
LEFT JOIN
(
SELECT [BudgetCategoryId], [PurchaseMonth], sum([TotalCost]) [TotalCost]
FROM [dbo].[PurchaseOrder] p
GROUP BY p.[BudgetCategoryId], [PurchaseMonth]
) q ON b.BudgetCategoryId = q.BudgetCategoryId
LEFT JOIN [dbo].[MonthTable] m ON q.[PurchaseMonth] = m.[MonthName]
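If the report needs a row for every category in every month, with zero where nothing was bought, neither query above produces the missing combinations. A possible variation, offered as a sketch against the schema in the question, is to build the full category and month grid with a CROSS JOIN first and then LEFT JOIN the purchases onto it.

```sql
-- Sketch: one row per (category, month), zero when there are no purchases.
SELECT bc.BudgetCategoryId,
       bc.BudgetCategoryName,
       m.MonthNumber,
       m.MonthName,
       ISNULL(SUM(po.TotalCost), 0) AS TotalCost
FROM dbo.BudgetCategory AS bc
CROSS JOIN dbo.MonthTable AS m              -- every category x every month
LEFT JOIN dbo.PurchaseOrder AS po
       ON po.BudgetCategoryId = bc.BudgetCategoryId
      AND po.PurchaseMonth    = m.MonthName -- PurchaseMonth holds the month name
GROUP BY bc.BudgetCategoryId, bc.BudgetCategoryName,
         m.MonthNumber, m.MonthName
ORDER BY bc.BudgetCategoryName, m.MonthNumber;
```

Ordering by MonthNumber keeps the months in fiscal-year order (July first) rather than alphabetical order.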

SQL Server table design to define WHERE condition

I have an existing stored procedure that has a lot of hard-coded IF conditions. The procedure checks the values of the following input fields and displays the relevant message. The fields are:
• BrandId
• ProductId
• SchemeId
• RegionId
The existing Message table:
MsgId MsgText
1 AAAA
2 BBBB
3 CCCC
4 MMMM
Existing stored proc. pseudo code:
IF(@BrandId in (5,10))
    IF(@ProductId in (5))
        SELECT 'BBBB' as MsgText
    END IF
END IF
IF(@SchemeId in (1,5,10))
    SELECT 'AAAA' as MsgText
IF(@SchemeId =2 AND @RegionId=4)
    SELECT 'BBBB' as MsgText
IF (@RegionId=6)
    SELECT 'MMMM' as MsgText
In order to remove the hard-coding and rewrite the procedure cleanly from scratch, I want to design new tables that store MsgIds against a BrandId/ProdId/PlanId/SchemeId value or against a combination of these fields (e.g. SchemeId = 2 AND RegionId = 4). With this kind of design I can directly fetch the relevant MsgId for a specific field or combination of fields.
Could anybody suggest table designs to meet the requirement?
Based on your responses to the comments, this might work out.
create table dbo.[Messages] (
MessageId int not null
, MessageText nvarchar(1024) not null
, constraint pk_Messages primary key clustered (MessageId)
);
insert into dbo.[Messages] (MessageId,MessageText) values
(1,'AAAA')
, (2,'BBBB')
, (13,'MMMM');
create table dbo.Messages_BrandProduct (
BrandId int not null
, ProductId int not null
, MessageId int not null
, constraint pk_Messages_BrandProduct primary key clustered
(BrandId, ProductId, MessageId)
);
insert into dbo.Messages_BrandProduct (BrandId, ProductId, MessageId) values
(5,5,2)
,(10,5,2);
create table dbo.Messages_SchemeRegion (
SchemeId int not null
, RegionId int not null
, MessageId int not null
, constraint pk_Messages_SchemeRegion primary key clustered
(SchemeId, RegionId, MessageId)
);
insert into dbo.Messages_SchemeRegion (SchemeId, RegionId, MessageId)
select SchemeId = 1, RegionId , MessageId = 1 from dbo.Regions
union all
select SchemeId = 5, RegionId , MessageId = 1 from dbo.Regions
union all
select SchemeId = 10, RegionId , MessageId = 1 from dbo.Regions
union all
select SchemeId = 2, RegionId = 4, MessageId = 2
union all
select SchemeId , RegionId = 6, MessageId = 13 from dbo.Schemes;
In your procedure you could pull the messages like this:
select m.MessageId, m.MessageText
from dbo.Messages_BrandProduct mbp
inner join dbo.[Messages] m on mbp.MessageId=m.MessageId
where mbp.BrandId = @BrandId and mbp.ProductId = @ProductId
union -- union all if you don't need to deduplicate messages
select m.MessageId, m.MessageText
from dbo.Messages_SchemeRegion msr
inner join dbo.[Messages] m on msr.MessageId=m.MessageId
where msr.SchemeId = @SchemeId and msr.RegionId = @RegionId;
This should do it.
CREATE TABLE [dbo].[IDs](
[BrandID] [int] NOT NULL,
[ProductID] [int] NOT NULL,
[SchemeID] [int] NOT NULL,
[RegionID] [int] NOT NULL,
[MsgID] [int] NOT NULL
)
You can adjust the table and column names as needed. Cheers.

SQL Query to return an item within range or nearest range

I have a table of ranges that looks like
CREATE TABLE [dbo].[WeightRange](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Description] [nvarchar](50) NULL,
[LowerBound] [decimal](18, 2) NULL,
[UpperBound] [decimal](18, 2) NULL,
[GroupID] [int] NULL
)
Given a weight and group id I need to find the matching (or nearest) range id.
Example
WeightRanges
1, 0-100kgs, 0, 100, 1
2, 101-250kgs, 101, 250, 1
3, 501-1000kgs, 501, 1000, 1
If the weight is 10 then it should return id 1, if the weight is 1500 it should return id 3, and if the weight is 255 it should return id 2. I have left the group out of the example for simplicity.
At this stage I don't really want to change the database design.
I'd use a CASE statement to create a column with the "distance", and then order by distance and take the first item.
Snippet which may help:
SELECT TOP 1 d.id
FROM (
    SELECT id, CASE WHEN (@weight >= LowerBound)
                     AND (@weight <= UpperBound) THEN 0
               WHEN (@weight < LowerBound) THEN LowerBound-@weight
               WHEN (@weight > UpperBound) THEN @weight-UpperBound
           END AS distance
    FROM WeightRange
) d
WHERE d.distance IS NOT NULL
ORDER BY d.distance ASC
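The question also mentions a group id, which the snippet leaves out. A sketch of how it could be included follows; the variable names @weight and @groupId and their sample values are assumptions for illustration, and the inline initialisation in DECLARE needs SQL Server 2008 or later.

```sql
-- Sketch: nearest range for a weight within one group.
DECLARE @weight  decimal(18, 2) = 255;   -- sample input
DECLARE @groupId int            = 1;     -- sample input

SELECT TOP 1 d.ID
FROM (
    SELECT ID,
           CASE WHEN @weight BETWEEN LowerBound AND UpperBound THEN 0
                WHEN @weight < LowerBound THEN LowerBound - @weight
                WHEN @weight > UpperBound THEN @weight - UpperBound
           END AS distance               -- NULL when a bound is NULL
    FROM dbo.WeightRange
    WHERE GroupID = @groupId             -- restrict to the requested group
) d
WHERE d.distance IS NOT NULL
ORDER BY d.distance ASC, d.ID ASC;       -- ID breaks ties deterministically
```

With the three sample ranges from the question in group 1, a weight of 255 is 5 away from range 2 and 246 away from range 3, so range 2 is picked, as the question expects.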
I think this stored function should do the trick - it uses a CTE (Common Table Expression) internally, so it'll work with SQL Server 2005 and up:
CREATE FUNCTION dbo.FindClosestID(@WeightValue DECIMAL(17,2))
RETURNS INT
AS BEGIN
    DECLARE @ReturnID INT;
    -- distance from the weight to every lower and upper bound
    WITH WeightDistance AS
    (
        SELECT ID, ABS(LowerBound - @WeightValue) AS Distance
        FROM WeightRange
        UNION ALL
        SELECT ID, ABS(UpperBound - @WeightValue) AS Distance
        FROM WeightRange
    )
    -- the ID whose bound is closest wins
    SELECT TOP 1 @ReturnID = ID
    FROM WeightDistance
    ORDER BY Distance;
    RETURN @ReturnID;
END
These queries will return the following values:
SELECT
dbo.FindClosestID(75.0),
dbo.FindClosestID(300.0),
dbo.FindClosestID(380.0),
dbo.FindClosestID(525.0),
dbo.FindClosestID(1500.0)
1 2 3 3 3
Marc
