Merging duplicated rows in SQL table - sql-server

FirstDate  LastDate  BaseCUR  ConvertedCUR  RATE
20070501   20070531  USD      EUR           1.369748
20070601   20070615  USD      EUR           1.354772
20070616   20070702  USD      EUR           1.354772
20070703   20070727  USD      EUR           1.343621
20070728   20070731  USD      EUR           1.343621
20070801   20070831  USD      EUR           1.376050
20070901   20071002  USD      EUR           1.369748
Here is a sample of my data. My task is to merge each run of consecutive rows that share the same 'RATE' into a single row, spanning from the first row's 'FirstDate' to the last row's 'LastDate'.
I've tried using ROW_NUMBER() to group the duplicate 'RATE' values, but it also groups rows that are not adjacent to each other, so I still can't merge them correctly.
Is it possible to implement this query without using any if-else or while loop?
The result must look like this:
FirstDate  LastDate  BaseCUR  ConvertedCUR  RATE
20070501   20070531  USD      EUR           1.369748
20070601   20070702  USD      EUR           1.354772  <<< the dates are merged
20070703   20070731  USD      EUR           1.343621  <<< the dates are merged
20070801   20070831  USD      EUR           1.376050
20070901   20071002  USD      EUR           1.369748
Any solutions or directions would be greatly appreciated, thank you in advance.

You can use the following query:
SELECT MIN(FirstDate) AS FirstDate, MAX(LastDate) AS LastDate,
       BaseCUR, ConvertedCUR, RATE
FROM (
    SELECT *,
           ROW_NUMBER() OVER (ORDER BY FirstDate) -
           RANK() OVER (PARTITION BY RATE ORDER BY FirstDate) AS rnk
    FROM mytable ) t
GROUP BY BaseCUR, ConvertedCUR, RATE, rnk
ORDER BY FirstDate
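Grouping by RATE and rnk identifies islands of consecutive rows with the same RATE: within an island, ROW_NUMBER and RANK both increase by 1 per row, so their difference rnk stays constant, while a later, non-adjacent run of the same RATE gets a different rnk. The MIN and MAX functions then give the starting and ending dates of each island.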
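To see the island grouping at work, here is a minimal sketch that runs the same query against the sample rows from the question. It assumes an in-memory SQLite database (3.25 or newer, for window functions) as a stand-in for SQL Server, and the column types are guesses.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (FirstDate TEXT, LastDate TEXT, "
    "BaseCUR TEXT, ConvertedCUR TEXT, RATE REAL)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [
        ("20070501", "20070531", "USD", "EUR", 1.369748),
        ("20070601", "20070615", "USD", "EUR", 1.354772),
        ("20070616", "20070702", "USD", "EUR", 1.354772),
        ("20070703", "20070727", "USD", "EUR", 1.343621),
        ("20070728", "20070731", "USD", "EUR", 1.343621),
        ("20070801", "20070831", "USD", "EUR", 1.376050),
        ("20070901", "20071002", "USD", "EUR", 1.369748),
    ],
)

# rnk = overall row number minus the rank within the same RATE.
# It stays constant inside a run of adjacent rows with equal RATE.
merged = conn.execute(
    """
    SELECT MIN(FirstDate) AS FirstDate, MAX(LastDate) AS LastDate,
           BaseCUR, ConvertedCUR, RATE
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (ORDER BY FirstDate)
               - RANK() OVER (PARTITION BY RATE ORDER BY FirstDate) AS rnk
        FROM mytable
    ) t
    GROUP BY BaseCUR, ConvertedCUR, RATE, rnk
    ORDER BY FirstDate
    """
).fetchall()

for row in merged:
    print(row)
```

The two 1.369748 rows stay separate because they are not adjacent, while the 1.354772 and 1.343621 pairs each collapse into one row.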

I would solve this in two steps. Here is pseudocode for the approach I would take:
1. UPDATE the table, setting FirstDate=MIN(FirstDate) and LastDate=MAX(LastDate), grouping by all the other columns.
2. Use ROW_NUMBER to eliminate the resulting duplicate rows.

I'm assuming that "merge" means getting the earliest FirstDate and the latest LastDate for each combination of RATE, BaseCUR and ConvertedCUR.
If that is the case:
SELECT MIN(FirstDate), MAX(LastDate), BaseCUR, ConvertedCUR, RATE
FROM YourTable
GROUP BY BaseCUR, ConvertedCUR, RATE;
P.S. Please excuse the format, I'm using mobile version on a going jetty.

Related

How can I retrieve "exception" data from a table without knowing the data in advance?

I have a table that updates all the time.
The table maintains a list that links stores to clubs, and manages, among other things, "discount percentages" per store + club.
Table name: Policy_supplier
Column: POLXSUP_DISCOUNT
Suppose all the "vendors" in the table are marked with a 10% discount, and someone accidentally enters 8% or 15% (or even NULL) for one vendor.
How do I write a query to retrieve the "abnormal" vendor?

You can find the mode of your discounts and then pick out the records that aren't equal to that mode (the IS NULL test is needed because <> never matches a NULL discount):
WITH mode_discount AS (
    SELECT TOP 1 POLXSUP_DISCOUNT
    FROM Policy_supplier
    GROUP BY POLXSUP_DISCOUNT
    ORDER BY COUNT(*) DESC
)
SELECT *
FROM Policy_supplier
WHERE POLXSUP_DISCOUNT <> (SELECT POLXSUP_DISCOUNT FROM mode_discount)
   OR POLXSUP_DISCOUNT IS NULL;
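As a rough illustration of the mode-based approach, the sketch below runs it on a handful of made-up rows. It assumes SQLite as a stand-in for SQL Server (so LIMIT 1 replaces TOP 1), a hypothetical SupplierID column, and an extra IS NULL test, since a <> comparison never matches NULL.

```python
import sqlite3

# Assumptions: SQLite instead of SQL Server; SupplierID and the sample
# discounts are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Policy_supplier (SupplierID INTEGER, POLXSUP_DISCOUNT REAL)"
)
conn.executemany(
    "INSERT INTO Policy_supplier VALUES (?, ?)",
    [(1, 10.0), (2, 10.0), (3, 8.0), (4, 10.0), (5, None), (6, 15.0), (7, 10.0)],
)

# The CTE picks the most frequent discount (the mode); the outer query
# returns every row that differs from it, including NULL discounts.
outliers = conn.execute(
    """
    WITH mode_discount AS (
        SELECT POLXSUP_DISCOUNT
        FROM Policy_supplier
        GROUP BY POLXSUP_DISCOUNT
        ORDER BY COUNT(*) DESC
        LIMIT 1
    )
    SELECT SupplierID, POLXSUP_DISCOUNT
    FROM Policy_supplier
    WHERE POLXSUP_DISCOUNT IS NULL
       OR POLXSUP_DISCOUNT <> (SELECT POLXSUP_DISCOUNT FROM mode_discount)
    ORDER BY SupplierID
    """
).fetchall()

print(outliers)
```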

You can use the OVER clause with aggregates to calculate an aggregate over a data range and include it in the results. For example,
SELECT avg(POLXSUP_DISCOUNT)
from Policy_supplier
Would return a single average value while
SELECT POLXSUP_DISCOUNT, avg(POLXSUP_DISCOUNT) OVER()
from Policy_supplier
Would return the overall average in each row. Typically OVER is used with a PARTITION BY clause. If you wanted the average per supplier you could have written AVG() OVER(PARTITION BY supplierID).
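The difference between a plain aggregate and an aggregate with OVER can be sketched on three made-up rows. This assumes SQLite (3.25+) in place of SQL Server and a hypothetical supplierID column.

```python
import sqlite3

# Assumptions: SQLite stands in for SQL Server; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Policy_supplier (supplierID INTEGER, POLXSUP_DISCOUNT REAL)"
)
conn.executemany(
    "INSERT INTO Policy_supplier VALUES (?, ?)",
    [(1, 10.0), (1, 20.0), (2, 30.0)],
)

# Plain aggregate: the whole table collapses into a single row.
overall = conn.execute(
    "SELECT AVG(POLXSUP_DISCOUNT) FROM Policy_supplier"
).fetchall()

# Windowed aggregates: every row is kept, with the overall average and
# the per-supplier average attached to it.
per_row = conn.execute(
    """
    SELECT supplierID,
           POLXSUP_DISCOUNT,
           AVG(POLXSUP_DISCOUNT) OVER () AS overall_avg,
           AVG(POLXSUP_DISCOUNT) OVER (PARTITION BY supplierID) AS supplier_avg
    FROM Policy_supplier
    ORDER BY supplierID, POLXSUP_DISCOUNT
    """
).fetchall()

print(overall)
print(per_row)
```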
To find anomalies, you could use one of the percentile functions, e.g. PERCENTILE_CONT (available in SQL Server 2012 and later). For example:
select PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY POLXSUP_DISCOUNT) over()
from Policy_Supplier
Will return a discount value below which you'll find 95% of the records. The other 5% of discounts that are above this are probably anomalies.
Similarly, PERCENTILE_CONT(0.05) will return a discount below which you'll find 5% of the records.
You can combine both to find potentially exceptional records, e.g.:
with percentiles as (
    select ID,
           POLXSUP_DISCOUNT,
           PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY POLXSUP_DISCOUNT) over() as pct95,
           PERCENTILE_CONT(0.05) WITHIN GROUP (ORDER BY POLXSUP_DISCOUNT) over() as pct05
    from Policy_Supplier)
select ID, POLXSUP_DISCOUNT
from percentiles
where POLXSUP_DISCOUNT > pct95 or POLXSUP_DISCOUNT < pct05

How to get multiple average values using subqueries

There are many accountants and each of them has jobs (paid by the hour) and I need to get the accountant name of every accountant who has an average job cost higher than the overall average of job costs. How do I do this?
SELECT Accountant_Name, AVG(job_cost) as 'Average'
FROM job_view
WHERE Average > (SELECT AVG (job_cost) AS AV
FROM job_view)
GROUP BY Accountant_Name;
Everything needed is in a view named job_view. The above code is not working; any help with modifications would be appreciated. Thanks in advance.

This should do it for you:
SELECT Accountant_Name
, AVG(Job_Cost) as 'Average'
FROM Job_View
GROUP BY Accountant_Name
HAVING AVG(Job_Cost) > (SELECT AVG(Job_Cost) FROM Job_View)
As per your comment, the error you're getting at WHERE Average > occurs because a column alias defined in the SELECT list is not visible in the WHERE clause; normally you would have to repeat the full expression there instead.
But because the Average column is an aggregate, the condition cannot go in WHERE at all; it has to go in the HAVING clause, which filters on aggregate values after grouping.
Why all this? Because the clauses of a query are logically evaluated in a fixed order: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY.
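Here is a small sketch of the HAVING version on invented data. It assumes SQLite in place of SQL Server and uses a plain table named job_view as a stand-in for the asker's view.

```python
import sqlite3

# Assumptions: SQLite instead of SQL Server; job_view is a table here,
# and the accountants and costs are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_view (Accountant_Name TEXT, job_cost REAL)")
conn.executemany(
    "INSERT INTO job_view VALUES (?, ?)",
    [("Ann", 100.0), ("Ann", 300.0), ("Bob", 50.0), ("Bob", 150.0), ("Cy", 200.0)],
)

# Overall average is 160. HAVING compares each accountant's own average
# (computed after GROUP BY) against that scalar subquery.
above_average = conn.execute(
    """
    SELECT Accountant_Name, AVG(job_cost) AS Average
    FROM job_view
    GROUP BY Accountant_Name
    HAVING AVG(job_cost) > (SELECT AVG(job_cost) FROM job_view)
    ORDER BY Accountant_Name
    """
).fetchall()

print(above_average)
```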

You'll still need to Group by Accountant_Name
SELECT Accountant_Name, AVG(job_cost) as 'Average'
FROM job_view
GROUP BY Accountant_Name
Having AVG(job_cost) > (SELECT AVG (job_cost) FROM job_view);

SQL Get Second Record

I am looking to retrieve only the second (duplicate) record from a data set. For example in the following picture:
Inside the UnitID column there are two separate records for 105. I want the returned data set to contain only the second 105 record. Additionally, I want this query to return the second record for all duplicates, not just 105.
I have tried everything I can think of, although I am not that experienced, and I cannot figure it out. Any help would be greatly appreciated.

You need to use GROUP BY for this.
Here's an example (I can't read your first column name, so I'm calling it JobUnitK):
SELECT MAX(JobUnitK), Unit
FROM JobUnits
WHERE DispatchDate = 'oct 4, 2015'
GROUP BY Unit
HAVING COUNT(*) > 1
I'm assuming JobUnitK is your ordering/id field. If it's not, just replace MAX(JobUnitK) with MAX(FieldIOrderWith).

Use the RANK function. Rank the rows with OVER (PARTITION BY UnitId ORDER BY your ordering column) and pick the rows with rank 2.
For reference -
https://msdn.microsoft.com/en-IN/library/ms176102.aspx

Assuming SQL Server 2005 and up, you can use the Row_Number windowing function:
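WITH DupeCalc AS (
    SELECT
        -- no comma between PARTITION BY and ORDER BY
        DupID = Row_Number() OVER (PARTITION BY UnitID ORDER BY JobUnitKeyID),
        *
    FROM JobUnits
    WHERE DispatchDate = '20151004'
)
SELECT *
FROM DupeCalc
WHERE DupID >= 2
-- ORDER BY is not allowed inside a CTE without TOP, so it goes here
ORDER BY UnitID DESC
;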
This is better than a solution that uses Max(JobUnitKeyID) for multiple reasons:
There could be more than one duplicate, in which case you would have to use Min(JobUnitKeyID) together with UnitID and join back on UnitID where JobUnitKeyID <> MinJobUnitKeyID.
Using Min or Max requires you to join back to the same data (which will be inherently slower).
If the ordering key you use turns out to be non-unique, you won't be able to pull the right number of rows with either one.
If the ordering key consists of multiple columns, the query using Min or Max explodes in complexity.
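A minimal sketch of the Row_Number approach on invented rows follows. It assumes SQLite (3.25+) as a stand-in for SQL Server, so the T-SQL "DupID = ..." alias form is written as "... AS DupID", and the sample data is hypothetical.

```python
import sqlite3

# Assumptions: SQLite instead of SQL Server; rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE JobUnits (JobUnitKeyID INTEGER, UnitID INTEGER, DispatchDate TEXT)"
)
conn.executemany(
    "INSERT INTO JobUnits VALUES (?, ?, ?)",
    [
        (1, 101, "20151004"),
        (2, 105, "20151004"),
        (3, 105, "20151004"),
        (4, 107, "20151004"),
        (5, 107, "20151004"),
        (6, 107, "20151004"),
        (7, 105, "20151005"),  # different date, filtered out
    ],
)

# Number the rows within each UnitID; anything numbered 2 or higher is a
# duplicate of the first row for that unit.
duplicates = conn.execute(
    """
    WITH DupeCalc AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY UnitID ORDER BY JobUnitKeyID) AS DupID,
               JobUnitKeyID,
               UnitID
        FROM JobUnits
        WHERE DispatchDate = '20151004'
    )
    SELECT JobUnitKeyID, UnitID, DupID
    FROM DupeCalc
    WHERE DupID >= 2
    ORDER BY UnitID, JobUnitKeyID
    """
).fetchall()

print(duplicates)
```

Changing the filter to DupID = 2 would return exactly the second record per unit and skip any third or later duplicates.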

Compute sum for distinct order numbers in ssrs report

I'm using a SQL Server 2008R2 Database and SSRS Report Builder 3.0
I'm trying to compute the sum of the amount owed for each order ID (I need to show the item IDs), but when I do, the amount owed shows 400 instead of 200 in line 4 and 100 instead of 50 in line 7; line 9 is correct. As a result, the Total line is way off.
=Sum(Fields!owe.Value)
The report is grouped by the campus.
I understand that SSRS is probably not the best place to do this computation, but I don't know how to do it outside of SSRS. I tried DISTINCT and GROUP BY so far, with no results.
Below is how I need the report to look.
Thanks in advance.
Incorrect amounts are
Another example as it should display the subtotals

I would modify the SQL to produce an extra column just for purposes of summing the Owe on an OrderId. Use the Row Number to get the first item in each order, and only supply the Owe value for that item for each order:
WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY OrderId ORDER BY ItemId) AS rn
    FROM MyTable
    WHERE (whatever filters you use)
)
SELECT *,
       CASE WHEN rn=1 THEN Owe ELSE 0 END AS OrderOwe
FROM cte
ORDER BY Campus, CustomerId, OrderId, ItemId
Then simply change the expression for the "Owe" textbox in your SubTotal row to this:
=Sum(Fields!OrderOwe.Value)
And you will get the sum of the Owe per order instead of per item.
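The effect of the extra OrderOwe column can be sketched as follows. This assumes SQLite (3.25+) in place of SQL Server, omits the placeholder WHERE filter, and uses invented rows in which each order's Owe is repeated on every item line.

```python
import sqlite3

# Assumptions: SQLite instead of SQL Server; the table contents are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (Campus TEXT, CustomerId INTEGER, "
    "OrderId INTEGER, ItemId INTEGER, Owe REAL)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)",
    [
        ("North", 1, 10, 1, 200.0),
        ("North", 1, 10, 2, 200.0),
        ("North", 2, 11, 1, 50.0),
        ("North", 2, 11, 2, 50.0),
        ("North", 3, 12, 1, 75.0),
    ],
)

# Only the first item of each order (rn = 1) carries the Owe amount.
rows = conn.execute(
    """
    WITH cte AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY OrderId ORDER BY ItemId) AS rn
        FROM MyTable
    )
    SELECT OrderId, ItemId, Owe,
           CASE WHEN rn = 1 THEN Owe ELSE 0 END AS OrderOwe
    FROM cte
    ORDER BY Campus, CustomerId, OrderId, ItemId
    """
).fetchall()

naive_total = sum(r[2] for r in rows)  # counts Owe once per item line
order_total = sum(r[3] for r in rows)  # counts Owe once per order
print(naive_total, order_total)
```

Summing Owe double-counts the two-item orders, whereas summing OrderOwe counts each order once, which is what the SubTotal expression needs.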

If Owe is always the same for every item in an order, you could instead divide the Sum of Owe by the Count of items within the order group, which would give you the correct results in all the cases above.

SQL Query to determine VAT rate

I'm looking to create a 3 column VAT_Parameter table with the following columns:
VATID, VATRate, EffectiveDate
However, I can't get my head around how I would identify which vat rate applies to an invoice date.
for example if the table was populated with:
1, 17.5, 1/4/1991
2, 15, 1/1/2009
3, 20, 4/1/2011
Say for example I have an invoice dated 4/5/2010, how would an SQL query select the correct VAT rate for that date?

select top 1 *
from VatRate
where EffectiveDate <= @InvoiceDate
order by EffectiveDate desc
Or, with a table of invoices:
select id, invoicedate, rate
from
(
    select
        inv.id, inv.invoicedate, vatrate.rate,
        ROW_NUMBER() over (partition by inv.id order by vatrate.effectivedate desc) rn
    from inv
    inner join vatrate
        on inv.invoicedate >= vatrate.effectivedate
) v
where rn = 1
PS. The rules for the rate of VAT to be charged when the rate changes are more complicated than just the invoice date. For example, the date of supply also matters.
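The single-date lookup can be sketched like this. It assumes SQLite as a stand-in for SQL Server (LIMIT 1 instead of TOP 1), ISO-formatted date strings, and that the dates in the question are written day/month/year; the helper name vat_rate_on is hypothetical.

```python
import sqlite3

# Assumptions: SQLite instead of SQL Server; dates stored as ISO strings,
# reading the question's dates as day/month/year.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VatRate (VATID INTEGER, Rate REAL, EffectiveDate TEXT)")
conn.executemany(
    "INSERT INTO VatRate VALUES (?, ?, ?)",
    [(1, 17.5, "1991-04-01"), (2, 15.0, "2009-01-01"), (3, 20.0, "2011-01-04")],
)


def vat_rate_on(connection, invoice_date):
    """Return the rate whose EffectiveDate is the latest one on or before
    invoice_date, or None if no rate was in effect yet."""
    row = connection.execute(
        """
        SELECT Rate
        FROM VatRate
        WHERE EffectiveDate <= ?
        ORDER BY EffectiveDate DESC
        LIMIT 1
        """,
        (invoice_date,),
    ).fetchone()
    return row[0] if row else None


print(vat_rate_on(conn, "2010-05-04"))
```

ISO date strings sort in chronological order, which is why a plain string comparison is enough in this sketch; in SQL Server a DATE column would be used instead.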

I've run into this kind of thing before. There are two choices I can think of:
1. Expand the table to have two dates: EffectiveFrom and EffectiveTo. (You'll have to have a convention about whether each of these is exclusive or inclusive - but that's always a problem when using dates). This raises the problem of validating that the table population, as a whole, makes sense. e.g. that you don't end up with one row with Rate1 effective from 1/1/2000-1/1/2002, and another (overlapping) with Rate2 effective from 30/10/2001-1/1/2003. Or an uncovered gap in time, where no rate applies. Since this sounds like a very slowly-changing table, populated occasionally (by people who know what they're doing?), this could be the best solution. The SQL to get the effective rate would then be simple:
SELECT VATRate FROM VATTable WHERE (EffectiveFrom<=[YourInvoiceDate]) AND (EffectiveTo>=[YourInvoiceDate])
or
2. Use your existing table structure, and use some slightly more complicated SQL to determine the effective rate for an invoice.
Using your existing structure, something like this would work:
SELECT VATTable.VATRate
FROM VATTable
INNER JOIN
    (SELECT Max(EffectiveDate) AS LatestDate
     FROM VATTable
     WHERE EffectiveDate <= [YourInvoiceDate]) latest
    ON VATTable.EffectiveDate = latest.LatestDate

An easier option may just be to denormalise your data structure and store the VAT rate in the invoice table itself.
