Multiple Column Duplicate Criteria - sql-server

I am using SQL Server. This is my sample data set:
IDNO| Consigment | SO_Number | Acc Number | OfficeNumber|PL9 |Remarks
--- | -----------| ----------| -----------| ------------|-------|-------
1 | AA12345MY | 1024450191| 8800400431 |B213 |W449401|Stay
2 | AA12345MY | 1024450192| 8800400431 |B213 |W449401|Remove
3 | BA12345MY | 1024460121| 8800400726 |K678 |W229790|Stay
4 | BA12345MY | 1024460124| 8800400726 |K678 |W229790|Remove
I want to put a remarks on row 2 and 4 as it is a duplicates.
Duplicate criteria must match these 4 columns:
Consigment
Acc Number
OfficeNumber
PL9
I am removing the youngest SO number (which one is the latest)
I haven't got a clue on how to start as I never found a perfect reference
Regards,
Fadlisham Fadzil

One approach here to create a CTE which labels duplicate records and then delete from that CTE:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Consigment, [Acc Number], OfficeNumber, PL9
ORDER BY SO_Number) rn
FROM yourTable
)
DELETE FROM cte
WHERE rn > 1;

Related

sql selection of one value from several identical

I have the result of executing a query. it collects data from several tables. he is such a:
|Name|date |number|Id
|alex|01-01-2021 |1111 | 1
|mike|01-01-2021 |2222 | 2
|alex|02-01-2021 |1111 | 3
|alex|03-01-2021 |1111 | 4
|john|04-01-2021 |3333 | 5
i need to get the following result:
|Name|date |number| Id
|mike|01-01-2021|2222 | 2
|alex|any value |1111 | Any value
|john|04-01-2021|3333 | 5
I need to select one of the repeated values and show it.I have a large query with many columns. here I gave only a short version to explain the essence of the problem
select Name,max(date) as date,number
from atable
group by Name, number
You may use this CTE and manage which date (first or last) you will get
WITH data AS (
SELECT
Name,
date,
number,
row_number() OVER (PARTITION BY Name ORDER BY date) AS row_num
FROM test01
)
SELECT
Name,
date,
number
FROM data
WHERE row_num = 1

How to repeat values in case of null values on left join

I have a table with a calendar, and a table with rates. In the table with the rates, there are no values existing for days in the weekend. I'm trying to join the two, in order to have a table where there is a rate for all days, and I need the rates in the weekend to be the latest available rate. Instad of it showing NULL values, as it would when you make a left join and the record doesn't exist, it should just take the latest available, repeating the previous value.
I have the below code, which works, but it takes 2 min to do on 7,397 rows, which is way too long.
Does anyone know a faster way to get the same results?
SELECT
c.CalendarID,
MAX(r.RateID)
FROM Dim_Calendar c
LEFT JOIN Dim_Rates r ON r.RateDate <= c.CalendarID
What I get without <= and just an = is the following
CalendarID | RateID
20131001 | 2
20131002 | 3
20131003 | 4
20131004 | 5
20131005 | NULL
20131006 | NULL
20131007 | 6
And this is the desired table:
CalendarID | RateID
20131001 | 2
20131002 | 3
20131003 | 4
20131004 | 5
20131005 | 5
20131006 | 5
20131007 | 6
You can use LAG() window function:
SELECT c.CalendarID,
COALESCE(
r.RateID,
LAG(r.RateID, 1) OVER (ORDER BY c.CalendarID),
LAG(r.RateID, 2) OVER (ORDER BY c.CalendarID)
) RateID
FROM Dim_Calendar c LEFT JOIN Dim_Rates r
ON r.RateDate = c.CalendarID
ORDER BY c.CalendarID
See the demo.
Results:
> CalendarID | RateID
> ---------: | :-----
> 20131001 | 2
> 20131002 | 3
> 20131003 | 4
> 20131004 | 5
> 20131005 | 5
> 20131006 | 5
> 20131007 | 6
You could use a correlated subquery to fill the gaps:
SELECT
c.CalendarID,
(SELECT TOP 1 r.RateID FROM Dim_Rates r
WHERE r.RateDate <= c.CalendarID AND r.RateID IS NOT NULL
ORDER BY r.RateDate DESC) AS RateID
FROM Dim_Calendar c
ORDER BY c.CalendarID;
This query can be improved by using the following index:
CREATE INDEX idx ON Dim_Rates (RateDate, RateID);
As pointed out, you need to check for proper and covering indexing. It appears you are running a against a DW DB and if that is the case then you can replace the CTE with indexed temp tables if the esitmated row count approximation is way off in the query plan.
;WITH NormalizedData AS
(
SELECT
RateID,CalendarID,
VirtualGroupID = SUM(LastRecordBeforeGap) OVER (ORDER BY CalendarID ROWS UNBOUNDED PRECEDING)
FROM
(
SELECT RateID,CalendarID,
LastRecordBeforeGap = CASE WHEN LEAD(RateID) OVER(ORDER BY CalendarID) IS NULL AND RateID IS NOT NULL THEN 1 ELSE 0 END
FROM
Dim_Calendar c
LEFT JOIN Dim_Rates r ON r.RateDate = c.CalendarID
)AS x
)
SELECT
RateID = ISNULL(RateID, SUM(RateID) OVER(PARTITION BY VirtualGroupID)),
CalendarID
FROM
NormalizedData

SQL: doing several joins to evaluate different columns

I'm quite new to SQL but use it a lot now in my work now (Microsoft SQL Server).
So the issue is this: I collect data that is atypical for a certain column.
Let's say I got different Burgers and they should have a standardized calories value. So I did this with a query
------------------------------------------
| Burger | calories | numBurgers | Rank |
------------------------------------------
| Chicken| 600 | 20 | 1 |
| Chicken| 400 | 3 | 2 |
| Beef | 700 | 35 | 1 |
| Beef | 850 | 4 | 2 |
-------------------------------------------
To get a list of all the "wrong" burgers I use a temporary table and filter out GroupRank = 1
USE database;
GO
WITH GapRanking AS
(
SELECT TOP 100 PERCENT Burger, calories, COUNT(calories),
ROW_NUMBER() OVER(PARTITION BY Burger ORDER BY COUNT(calories) DESC) AS Rank
)
SELECT * FROM GapRanking
WHERE Rank <> 1
...
I get all combinations of Burgers and calories that are not "standard"
Then I do an Inner Join with the original table and all columns on the one above.
SELECT * FROM BaseTable as base
INNER JOIN
(SELECT * FROM GapRanking
WHERE Rank <> 1) AS err
ON (base.Burgers = err.Burgers
AND base.calories = err.calories)
This way I get a table with complete information about the "not-standard" burgers. So far so good.
Now I want to add other rows where there is a deviation in another criteria, price for example, not just calories and add it to the list if its not already there.
So I thought of UNION or JOIN.
So what is the best approach. UNION the above query with the same query just different column (price instead of calories)?
Or do a JOIN with the same query just different column (price instead of calories)?
The code gets quite "ugly" and I'm not sure if I do the right approach here.
Also because of me using the temporary table using WITH a UNION does not seem possible so easily.
I'm really glad for any ideas here. Cheers
use sub-query and join below is just sudo-code not actual you can follow like this way
select t1.*, t2.required_colum
(SELECT TOP 100 PERCENT Burger, calories, COUNT(calories),
ROW_NUMBER() OVER(PARTITION BY Burger ORDER BY COUNT(calories) DESC) AS Rank
) as t1
join
(SELECT TOP 100 PERCENT Burger, calories, COUNT(calories),
ROW_NUMBER() OVER(PARTITION BY Burger ORDER BY COUNT(calories) DESC) AS Rank
) as t2
on t1.colname = t2.colname
where t1.Rank != 1 and t2.Rank != 1

TSQL Conditional Where or Group By?

I have a table like the following:
id | type | duedate
-------------------------
1 | original | 01/01/2017
1 | revised | 02/01/2017
2 | original | 03/01/2017
3 | original | 10/01/2017
3 | revised | 09/01/2017
Where there may be either one or two rows for each id. If there are two rows with same id, there would be one with type='original' and one with type='revised'. If there is one row for the id, type will always be 'original'.
What I want as a result are all the rows where type='revised', but if there is only one row for a particular id (thus type='original') then I want to include that row too. So desired output for the above would be:
id | type | duedate
1 | revised | 02/01/2017
2 | original | 03/01/2017
3 | revised | 09/01/2017
I do not know how to construct a WHERE clause that conditionally checks whether there are 1 or 2 rows for a given id, nor am I sure how to use GROUP BY because the revised date could be greater than or less than than the original date so use of aggregate functions MAX or MIN don't work. I thought about using CASE somehow, but also do not know how to construct a conditional that chooses between two different rows of data (if there are two rows) and display one of them rather than the other.
Any suggested approaches would be appreciated.
Thanks!
you can use row number for this.
WITH T AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Type DESC) AS RN
FROM YourTable
)
SELECT *
FROM T
WHERE RN = 1
Is something like this sufficient?
SELECT *
FROM mytable m1
WHERE type='revised'
or 1=(SELECT COUNT(*) FROM mytable m2 WHERE m2.id=m1.id)
You could use a subquery to take the MAX([type]). In this case it works for [type] since alphabetically we want revised first, then original and "r" comes after "o" in the alphabet. We can then INNER JOIN back on the same table with the matching conditions.
SELECT T2.*
FROM (
SELECT id, MAX([type]) AS [MAXtype]
FROM myTABLE
GROUP BY id
) AS dT INNER JOIN myTable T2 ON dT.id = T2.id AND dT.[MAXtype] = T2.[type]
ORDER BY T2.[id]
Gives output:
id type duedate
1 revised 2017-02-01
2 original 2017-03-01
3 revised 2017-09-01
Here is the sqlfiddle: http://sqlfiddle.com/#!6/14121f/6/0

Netezza: Show dates even if 0 data for that day

I have this query through an odbc connection in excel for a refreshable report with data for every 4 weeks. I need to show the dates in each of the 4 weeks even if there is no data for that day because this data is then linked to a Graph. Is there a way to do this?
thanks.
Select b.INV_DT, sum( a.ORD_QTY) as Ordered, sum( a.SHIPPED_QTY) as Shipped
from fct_dly_invoice_detail a, fct_dly_invoice_header b, dim_invoice_customer c
where a.INV_HDR_SK = b.INV_HDR_SK
and b.DIM_INV_CUST_SK = c.DIM_INV_CUST_SK
and a.SRC_SYS_CD = 'ABC'
and a.NDC_NBR is not null
**and b.inv_dt between CURRENT_DATE - 16 and CURRENT_DATE**
and b.store_nbr in (2851, 2963, 3249, 3385, 3447, 3591, 3727, 4065, 4102, 4289, 4376, 4793, 5209, 5266, 5312, 5453, 5569, 5575, 5892, 6534, 6571, 7110, 9057, 9262, 9652, 9742, 10373, 12392, 12739, 13870
)
group by 1
The general purpose solution to this is to create a date dimension table, and then perform an outer join to that date dimension table on the INV_DT column.
There are tons of good resources you can search for on creating a good date dimension table, so I'll just create a quick and dirty (and trivial) example here. I highly recommend some research in that area if you'll be doing a lot of BI/reporting.
If our table we want to report from looks like this:
Table "TABLEZ"
Attribute | Type | Modifier | Default Value
-----------+--------+----------+---------------
AMOUNT | BIGINT | |
INV_DT | DATE | |
Distributed on random: (round-robin)
select * from tablez order by inv_dt
AMOUNT | INV_DT
--------+------------
1 | 2015-04-04
1 | 2015-04-04
1 | 2015-04-06
1 | 2015-04-06
(4 rows)
and our report looks like this:
SELECT inv_dt,
SUM(amount)
FROM tablez
WHERE inv_dt BETWEEN CURRENT_DATE - 5 AND CURRENT_DATE
GROUP BY inv_dt;
INV_DT | SUM
------------+-----
2015-04-04 | 2
2015-04-06 | 2
(2 rows)
We can create a date dimension table that contains a row for every date (or ate last 1024 days in the past and 1024 days in the future using the _v_vector_idx view in this example).
create table date_dim (date_dt date);
insert into date_dim select current_date - idx from _v_vector_idx;
insert into date_dim select current_date + idx +1 from _v_vector_idx;
Then our query would look like this:
SELECT d.date_dt,
SUM(amount)
FROM tablez a
RIGHT OUTER JOIN date_dim d
ON a.inv_dt = d.date_dt
WHERE d.date_dt BETWEEN CURRENT_DATE -5 AND CURRENT_DATE
GROUP BY d.date_dt;
DATE_DT | SUM
------------+-----
2015-04-01 |
2015-04-02 |
2015-04-03 |
2015-04-04 | 2
2015-04-05 |
2015-04-06 | 2
(6 rows)
If you actually needed a zero value instead of a NULL for the days where you had no data, you could use a COALESCE or NVL like this:
SELECT d.date_dt,
COALESCE(SUM(amount),0)
FROM tablez a
RIGHT OUTER JOIN date_dim d
ON a.inv_dt = d.date_dt
WHERE d.date_dt BETWEEN CURRENT_DATE -5 AND CURRENT_DATE
GROUP BY d.date_dt;
DATE_DT | COALESCE
------------+----------
2015-04-01 | 0
2015-04-02 | 0
2015-04-03 | 0
2015-04-04 | 2
2015-04-05 | 0
2015-04-06 | 2
(6 rows)
I agree with #ScottMcG that you need to get the list of dates. However if you are in a situation where you aren't allowed to create a table. You can simplify things. All you need is a table that has at least 28 rows. Using your example, this should work.
select date_list.dt_nm, nvl(results.Ordered,0) as Ordered, nvl(results.Shipped,0) as Shipped
from
(select row_number() over(order by sub.arb_nbr)+ (current_date -28) as dt_nm
from (select rowid as arb_nbr
from fct_dly_invoice_detail b
limit 28) sub ) date_list left outer join
( Select b.INV_DT, sum( a.ORD_QTY) as Ordered, sum( a.SHIPPED_QTY) as Shipped
from fct_dly_invoice_detail a inner join
fct_dly_invoice_header b
on a.INV_HDR_SK = b.INV_HDR_SK
and a.SRC_SYS_CD = 'ABC'
and a.NDC_NBR is not null
**and b.inv_dt between CURRENT_DATE - 16 and CURRENT_DATE**
and b.store_nbr in (2851, 2963, 3249, 3385, 3447, 3591, 3727, 4065, 4102, 4289, 4376, 4793, 5209, 5266, 5312, 5453, 5569, 5575, 5892, 6534, 6571, 7110, 9057, 9262, 9652, 9742, 10373, 12392, 12739, 13870)
inner join
dim_invoice_customer c
on b.DIM_INV_CUST_SK = c.DIM_INV_CUST_SK
group by 1 ) results
on date_list.dt_nm = results.inv_dt

Resources