Unable to remove duplicates in SQL Query with JOIN and DISTINCT - sql-server

I have a somewhat abstract question with a real-world example. I'm running a query that has a problem with the tables I am joining.
In my first draft of the query, with a DISTINCT and only the one INNER JOIN that is needed, the sums come out correct.
Those sums then need to be broken into 4 separate totals depending on certain values. When I add the table that holds those values to my JOIN or WHERE clause, the query sums each value once for every matching row in that table, which inflates the totals.
My Query:
SELECT DISTINCT SUM(CASE WHEN Tax_Records.TaxValue = '0.06' THEN Bill_Summary.NonSalesTax
WHEN Tax_Records.TaxValue = '0.065' THEN Bill_Summary.NonSalesTax
WHEN Tax_Records.TaxValue = '0.07' THEN Bill_Summary.NonSalesTax
WHEN Tax_Records.TaxValue = '0.075' THEN Bill_Summary.NonSalesTax ELSE 0.0 END)
AS 'UnTaxable Sales'
FROM Order_Records INNER JOIN Bill_Summary ON Order_Records.RowNum = Bill_Summary.OrderNumID
LEFT JOIN Tax_Records ON Order_Records.OZipCode = Tax_Records.tZipCode
WHERE Order_Records.Date Between 'DATE' And 'DATE'
AND Order_Records.cState = 'state'
GROUP BY Tax_Records.TaxValue
My query runs without errors, but I get the wrong totals. If I remove the LEFT JOIN and its corresponding items in the SELECT statement, I get the correct totals.
The Tax_Records table has no relation to any other table in the database, so I know that putting it in the join will cause issues.
I changed my query to see why I'm getting the incorrect totals, and it's because it sums up a value once for each of the cases in my SELECT.
For instance, if there's a Bill_Summary row with a value of 5, it will sum up that 5 four times, once for each tax value. So I know why it does that, but I want to know how I can add the information from the Tax_Records table to my query to derive the 4 values from my original correct totals.
I've tried different JOINs, embedded SELECTs, and CTEs, but nothing works correctly.
EDIT: All this data is coming from orders placed by customers.
What we want to see is the total value of tax collected under a certain state tax rate in a period of 1 month, for example from March 1st to April 1st.
All the sales charged with a 6% tax rate equal $50.
All the sales charged with a 6.5% tax rate equal $65.
All the sales charged with a 7% tax rate equal $20.
All the sales charged with a 7.5% tax rate equal $15.
If I run a query without joining the Tax_Records table, I get my correct total of $145.
Now I want to show the total broken up into the 4 values shown earlier by matching the zip codes in the Order_Records table with the zip codes in the Tax_Records table.
Here is what happens when I do that. Take the 7.5% value, where the total of those sales is $15, made up of one sale of $8 and another of $7. If I join the Tax_Records table, the query counts the $8 sale once each for 6%, 6.5%, 7%, and 7.5%, and does the same for the $7 order, so my total for 7.5% shows as $60 instead of the $15 it should be.

You can try like this
select * from demo;
+------+-------+
| id   | des   |
+------+-------+
| 1    | afgg  |
| 2    | aaaaa |
+------+-------+
select * from test;
+------+---------+
| id   | name    |
+------+---------+
| 2    | aaaaa   |
| 1    | assdasa |
+------+---------+
select id as id,des as description,'' as id,'' as name from demo UNION select '' as id ,''as description,id as id,name as name from test;
+------+-------------+------+---------+
| id   | description | id   | name    |
+------+-------------+------+---------+
| 1    | afgg        |      |         |
| 2    | aaaaa       |      |         |
|      |             | 2    | aaaaa   |
|      |             | 1    | assdasa |
+------+-------------+------+---------+
4 rows in set (0.00 sec)
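For the original question about inflated totals, one possible approach is to collapse Tax_Records to one row per zip code and tax rate before joining, so that each order matches at most one tax row. This is only a sketch, since the real schema is not shown. The example below uses Python's sqlite3 module rather than SQL Server so that it is self-contained. The City column and the sample rows are hypothetical, and it assumes the fan-out comes from Tax_Records holding several rows for the same zip code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Order_Records (RowNum INTEGER, OZipCode TEXT);
CREATE TABLE Bill_Summary  (OrderNumID INTEGER, NonSalesTax REAL);
-- City is a hypothetical column: it is one way a zip code could repeat.
CREATE TABLE Tax_Records   (tZipCode TEXT, City TEXT, TaxValue TEXT);

INSERT INTO Order_Records VALUES (1, '11111'), (2, '11111'), (3, '22222');
INSERT INTO Bill_Summary  VALUES (1, 8.0), (2, 7.0), (3, 20.0);
-- Zip 11111 appears four times, so every order in it matches four rows.
INSERT INTO Tax_Records VALUES
    ('11111', 'A', '0.075'), ('11111', 'B', '0.075'),
    ('11111', 'C', '0.075'), ('11111', 'D', '0.075'),
    ('22222', 'E', '0.07');
""")

# Joining the raw table multiplies each order by its number of zip matches.
naive_sql = """
SELECT t.TaxValue, SUM(b.NonSalesTax)
FROM Order_Records AS o
INNER JOIN Bill_Summary AS b ON o.RowNum = b.OrderNumID
LEFT JOIN Tax_Records AS t ON o.OZipCode = t.tZipCode
GROUP BY t.TaxValue
ORDER BY t.TaxValue
"""

# Joining a de-duplicated derived table keeps one tax row per zip and rate.
deduped_sql = """
SELECT t.TaxValue, SUM(b.NonSalesTax)
FROM Order_Records AS o
INNER JOIN Bill_Summary AS b ON o.RowNum = b.OrderNumID
LEFT JOIN (SELECT DISTINCT tZipCode, TaxValue FROM Tax_Records) AS t
       ON o.OZipCode = t.tZipCode
GROUP BY t.TaxValue
ORDER BY t.TaxValue
"""

naive_totals = dict(conn.execute(naive_sql).fetchall())
deduped_totals = dict(conn.execute(deduped_sql).fetchall())
print(naive_totals)    # the 7.5% bucket is counted four times over
print(deduped_totals)  # the 7.5% bucket is 8 + 7
```

If a single zip code carries several different rates, DISTINCT alone will not help, and the data would need some rule for picking the one rate that applies to each order.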

Related

How to efficiently match on dates in SQL Server?

I am trying to return the first registration for a person based on the minimum registration date and then return full information. The data looks something like this:
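+--------------+----------+-----------+----------+--------------------+------------------+-------------------------+-----------+--------------------+
| Warehouse_ID | SourceID | firstName | lastName | firstProgramSource | firstProgramName | firstProgramCreatedDate | totalPaid | totalRegistrations |
+--------------+----------+-----------+----------+--------------------+------------------+-------------------------+-----------+--------------------+
| 12345        | 1        | Max       | Smith    | League             | Kid Hockey       | 2017-06-06              | $100      | 3                  |
| 12345        | 6        | Max       | Smith    | Activity           | Figure Skating   | 2018-09-26              | $35       | 1                  |
+--------------+----------+-----------+----------+--------------------+------------------+-------------------------+-----------+--------------------+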
The end goal is to return one row per person that looks like this:
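+--------------+----------+-----------+----------+--------------------+------------------+-------------------------+-----------+--------------------+
| Warehouse_ID | SourceID | firstName | lastName | firstProgramSource | firstProgramName | firstProgramCreatedDate | totalPaid | totalRegistrations |
+--------------+----------+-----------+----------+--------------------+------------------+-------------------------+-----------+--------------------+
| 12345        | 1        | Max       | Smith    | League             | Kid Hockey       | 2017-06-06              | $135      | 4                  |
+--------------+----------+-----------+----------+--------------------+------------------+-------------------------+-----------+--------------------+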
So, this would aggregate the totalPaid and totalRegistrations variables based on the Warehouse_ID and would pull the rest of the information based on the min(firstProgramCreatedDate) specific to the Warehouse_ID.
This will end up in Tableau, so what I've recently tried ignores aggregating totalPaid and totalRegistrations for now (I can get that in another query pretty easily). The query I'm using seems to work, but it is extremely slow: it appears to go row by row through more than 50,000 rows.
select M.*
from (
select Warehouse_ID, min(FirstProgramCreatedDate) First
from vw_FirstRegistration
group by Warehouse_ID
) B
left join vw_FirstRegistration M on B.Warehouse_ID = M.Warehouse_ID
where B.First in (M.FirstProgramCreatedDate)
order by B.Warehouse_ID
Any advice on how I can achieve my goal without this query taking an hour plus to run?
A combination of the ROW_NUMBER windowing function, plus the OVER clause on a SUM expression should perform pretty well.
Here's the query:
SELECT TOP (1) WITH TIES
v.Warehouse_ID
,v.SourceID
,v.firstName
,v.lastName
,v.firstProgramSource
,v.firstProgramName
,v.firstProgramCreatedDate
,SUM(v.totalPaid) OVER (PARTITION BY v.Warehouse_ID) AS totalPaid
,SUM(v.totalRegistrations) OVER (PARTITION BY v.Warehouse_ID) AS totalRegistrations
FROM
#vw_FirstRegistration AS v
ORDER BY
ROW_NUMBER() OVER (PARTITION BY v.Warehouse_ID
ORDER BY CASE WHEN v.firstProgramCreatedDate IS NULL THEN 1 ELSE 0 END,
v.firstProgramCreatedDate)
And here's a Rextester demo: https://rextester.com/GNOB14793
Results (I added another kid...):
+--------------+----------+-----------+----------+--------------------+------------------+-------------------------+-----------+--------------------+
| Warehouse_ID | SourceID | firstName | lastName | firstProgramSource | firstProgramName | firstProgramCreatedDate | totalPaid | totalRegistrations |
+--------------+----------+-----------+----------+--------------------+------------------+-------------------------+-----------+--------------------+
| 12345        | 1        | Max       | Smith    | League             | Kid Hockey       | 2017-06-06              | 135.00    | 4                  |
| 12346        | 6        | Joe       | Jones    | Activity           | Other Activity   | 2017-09-26              | 125.00    | 4                  |
+--------------+----------+-----------+----------+--------------------+------------------+-------------------------+-----------+--------------------+
EDIT: Changed the ORDER BY based on comments.
Try using ROW_NUMBER() with PARTITION BY.
For more information please refer to:
https://learn.microsoft.com/en-us/sql/t-sql/functions/row-number-transact-sql?view=sql-server-2017
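The ROW_NUMBER() with PARTITION BY suggestion can also be written as a derived table filtered on rn = 1, a form that does not rely on TOP (1) WITH TIES. The sketch below runs it with Python's sqlite3 module (window functions need SQLite 3.25 or later) on the two sample rows from the question. The trimmed column list and the plain table standing in for the view are simplifications for illustration, not the asker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- A plain table stands in for the vw_FirstRegistration view.
CREATE TABLE vw_FirstRegistration (
    Warehouse_ID INTEGER, SourceID INTEGER, firstProgramName TEXT,
    firstProgramCreatedDate TEXT, totalPaid REAL, totalRegistrations INTEGER
);
INSERT INTO vw_FirstRegistration VALUES
    (12345, 1, 'Kid Hockey',     '2017-06-06', 100, 3),
    (12345, 6, 'Figure Skating', '2018-09-26',  35, 1);
""")

first_registration_sql = """
SELECT Warehouse_ID, SourceID, firstProgramName, firstProgramCreatedDate,
       totalPaid, totalRegistrations
FROM (
    SELECT v.Warehouse_ID, v.SourceID, v.firstProgramName,
           v.firstProgramCreatedDate,
           -- Totals across every row of the same person.
           SUM(v.totalPaid) OVER (PARTITION BY v.Warehouse_ID) AS totalPaid,
           SUM(v.totalRegistrations)
               OVER (PARTITION BY v.Warehouse_ID) AS totalRegistrations,
           -- Earliest non-NULL date gets rn = 1 within each person.
           ROW_NUMBER() OVER (
               PARTITION BY v.Warehouse_ID
               ORDER BY CASE WHEN v.firstProgramCreatedDate IS NULL
                             THEN 1 ELSE 0 END,
                        v.firstProgramCreatedDate
           ) AS rn
    FROM vw_FirstRegistration AS v
) AS ranked
WHERE rn = 1
"""

rows = conn.execute(first_registration_sql).fetchall()
print(rows)  # one row per Warehouse_ID, carrying the summed totals
```

Because the view is read once instead of being joined to itself, this shape usually avoids the row-by-row behaviour described in the question, although the actual plan depends on how the view is defined.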

SQL command tallies totals into 2nd table

I have a SQL command that SUMS up incidents from TableA and imports the totals into TableB. Then another command that calculates the totals from B and INSERTS INTO TableC. Is it possible to include in TableC the names of those that have the recorded incidents? (Right now it only SUMS up totals and reports as a whole with no names)
I'll give some examples:
TableB
Day 1
Name | Incidents
Tim | 1
Frank | 2
Jay | 1
Day 2
Name | incidents
Tim | 1
Frank | 1
Jay | 1
TableC
Name | Incidents
Tim | 2
Frank | 3
Jay | 2
TableC continues to accumulate data, while TableB will be dropped and re-recorded daily.
Here is the SQL command to fill TableB:
SELECT [Name], SUM(TableAColumnA) AS TableBColumnB INTO TableB FROM TableA GROUP BY [Name]
Here is the SQL I've tried to populate TableC:
INSERT INTO TableC(ImportDate, DayofData, Name, ColumnBTableB)
SELECT GETDATE() AS ImportDate, DATEADD(day, -1, GETDATE()) AS DayofData,
(SELECT SUM(ColumnBTableB) FROM TableB);
What this does is give NULL value to Name and calculate all incidents recorded in TableB.ColumnB. I basically need to show the names of those that had contributed to the total of incidents into TableC. TableC looks like this:
TableC
Name | Incidents | ImportDate | DayofData
NULL | 4 | today's date/time | yesterday's date/time
Was hoping to do something like this.
TableC
Name | incidents | totalincidents | importdate | dayofdata
Tim | 1 | 4 | today's date/time | yesterday's date/time
Is this possible, or do I need to calculate it into a separate table entirely? Or is this just wishful thinking gone too far?
If you could do without TotalIncidents, you would use GROUP BY:
INSERT INTO TableC(ImportDate, DayofData, Name, Incidents)
SELECT GETDATE() AS ImportDate, DATEADD(day, -1, GETDATE()) AS DayofData, Name, Incidents
FROM (SELECT Name, SUM(ColumnBTableB) AS Incidents
FROM TableB
GROUP BY Name) AS t;
Since TotalIncidents can be obtained from other data by query:
SELECT SUM(Incidents) AS TotalIncidents
FROM TableC
WHERE DayOfData BETWEEN CONVERT(datetime, '1/24/2016', 101)
AND CONVERT(datetime, '1/25/2016', 101);
Do you really need to store TotalIncidents as a column? It just adds complexity.
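To make the point above concrete, here is a minimal sketch of storing only the per-name rows in TableC and deriving TotalIncidents when reading. It uses Python's sqlite3 module instead of SQL Server, so datetime('now') stands in for GETDATE(), and the column name Incidents in TableB is a simplification of the asker's column names. The Day 1 figures come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableB (Name TEXT, Incidents INTEGER);
INSERT INTO TableB VALUES ('Tim', 1), ('Frank', 2), ('Jay', 1);

CREATE TABLE TableC (ImportDate TEXT, DayofData TEXT,
                     Name TEXT, Incidents INTEGER);

-- One row per name; no stored grand total.
INSERT INTO TableC (ImportDate, DayofData, Name, Incidents)
SELECT datetime('now'), datetime('now', '-1 day'), Name, Incidents
FROM (SELECT Name, SUM(Incidents) AS Incidents
      FROM TableB
      GROUP BY Name) AS t;
""")

# The grand total is computed at query time with a window aggregate.
report_sql = """
SELECT Name, Incidents, SUM(Incidents) OVER () AS TotalIncidents
FROM TableC
ORDER BY Name
"""
report = conn.execute(report_sql).fetchall()
print(report)
```

Once TableC holds more than one day, adding PARTITION BY DayofData inside the OVER clause would give a total per day rather than across the whole table.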

Find the top ranked unique item for each grouping in a set

Given the following dataset which contains a series of products for a customer, along with a number of related products for each, I want to pick the top ranked unique Related Product ID for each of the Product IDs.
Sample Data
This table shows what the data looks like for a single Customer. There will be multiple Customers.
The items selected in yellow are an example of what the results would look like for this example Customer ID.
So, a single Product ID may have multiple Related Product IDs. For a single customer with, say 6 Product IDs, I want to return the top ranked Related Product ID for each individual Product ID.
Rules
The catch is, that I want to eliminate duplication as much as possible. So if the same Related Product ID is the top ranked for more than one Product ID, the selection should move down to the next highest ranked Related Product ID.
The goal is to, where possible, provide a unique (within each Customer ID) Related Product ID for each Product ID.
Where it is not possible for a unique Related Product ID to be selected (because there are only duplicate Related Product IDs available), then the top ranked should be selected.
Results
For Product 2, the Related Product ID 23194 is the highest ranked, but it is not unique, so is skipped in favour of 23287. For Product 4, we could use either 23194 or 23300, but because neither is unique, we take the highest ranked item.
I've tried doing this using a recursive CTE, but this will iterate through the items and allocate the Related Product on the first Products before finding out if the Related Products are repeated later in the set.
How else can I approach this?
You can use ROW_NUMBER and COUNT OVER():
SQL Fiddle
;WITH Cte AS(
SELECT *,
RN = (RelatedProductRanking + COUNT(*) OVER(PARTITION BY ProductID)) *
COUNT(*) OVER(PARTITION BY RelatedProductID)
FROM tbl
),
CteRnk AS(
SELECT *,
RNK = ROW_NUMBER() OVER(PARTITION BY ProductID ORDER BY RN)
FROM Cte
)
SELECT
CustomerID, ProductRanking, ProductID, RelatedProductRanking, RelatedProductID
FROM CteRnk
WHERE RNK = 1
ORDER BY ProductRanking, RelatedProductRanking
RESULT
| CustomerID | ProductRanking | ProductID | RelatedProductRanking | RelatedProductID |
|------------|----------------|-----------|-----------------------|------------------|
| 12436 | 1 | 14553 | 1 | 14481 |
| 12436 | 2 | 33017 | 2 | 23287 |
| 12436 | 3 | 14203 | 1 | 14289 |
| 12436 | 4 | 23038 | 1 | 23194 |
| 12436 | 5 | 15120 | 1 | 14520 |
| 12436 | 6 | 23014 | 1 | 23300 |

Finding Location of Duplicate Column [duplicate]

I have a table
+--------+--------+--------+--------+--------+
| Market | Sales1 | Sales2 | Sales3 | Sales4 |
+--------+--------+--------+--------+--------+
| 68     | 1      | 2      | 3      | 4      |
| 630    | 5      | 3      | 7      | 8      |
| 190    | 9      | 10     | 11     | 12     |
+--------+--------+--------+--------+--------+
I want to find duplicates between all the above sales fields. In above example markets 68 and 630 have a duplicate Sales value that is 3.
My problem is displaying the Market having duplicate sales.
This problem would be incredibly simple to solve if you normalised your table.
Then you would just have the columns Market | Sales, or if the 1, 2, 3, 4 are important you could have Market | Quarter | Sales (or some other relevant column name).
Given that your table isn't in this format, you could use a CTE to make it so and then select from it, e.g.
WITH cte AS (
SELECT Market, Sales1 AS Sales FROM MarketSales
UNION ALL
SELECT Market, Sales2 FROM MarketSales
UNION ALL
SELECT Market, Sales3 FROM MarketSales
UNION ALL
SELECT Market, Sales4 FROM MarketSales
)
SELECT a.Market
,b.Market
FROM cte a
INNER JOIN cte b ON b.Market > a.Market
WHERE a.Sales = b.Sales
You can also do this without the CTE; you just need a big WHERE clause comparing all the combinations of Sales columns.
Supposing the data size is not so big, make a new temporary table joining all the data, with the columns:
Sales
Market
Then select grouping by Sales and keep the groups with a count bigger than 1:
select Max(Sales), Count(*) as Qty
from #temporary
group by Sales
having Count(*) > 1
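The grouping idea can be combined with the unpivoting CTE to list the markets themselves, which is what the question asks for. The sketch below uses the three sample rows from the question and Python's sqlite3 module in place of SQL Server. COUNT(DISTINCT Market) is used here instead of COUNT(*), an assumption made so that a value repeated within a single market is not reported.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MarketSales (Market INTEGER, Sales1 INTEGER, Sales2 INTEGER,
                          Sales3 INTEGER, Sales4 INTEGER);
INSERT INTO MarketSales VALUES
    (68,  1,  2,  3,  4),
    (630, 5,  3,  7,  8),
    (190, 9, 10, 11, 12);
""")

duplicate_sql = """
WITH cte AS (
    -- Unpivot the four Sales columns into (Market, Sales) pairs.
    SELECT Market, Sales1 AS Sales FROM MarketSales
    UNION ALL
    SELECT Market, Sales2 FROM MarketSales
    UNION ALL
    SELECT Market, Sales3 FROM MarketSales
    UNION ALL
    SELECT Market, Sales4 FROM MarketSales
)
SELECT Market, Sales
FROM cte
WHERE Sales IN (
    -- Sales values that occur in more than one market.
    SELECT Sales FROM cte GROUP BY Sales HAVING COUNT(DISTINCT Market) > 1
)
ORDER BY Market
"""

duplicates = conn.execute(duplicate_sql).fetchall()
print(duplicates)  # each (Market, Sales) pair whose value is shared
```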
