sum column with duplicates in another table - sql-server

Wrong Result
So i have two tables
Order
Staging
Order Table having column structure
+-------+---------+-------------+---------------+----------+
| PO | cashAmt | ClaimNumber | TransactionID | Supplier |
+-------+---------+-------------+---------------+----------+
| 12345 | 100 | 99876 | abc123 | 0101 |
| 12346 | 50 | 99875 | abc123 | 0102 |
| 12345 | 100 | 99876 | abc123 | 0101 |
+-------+---------+-------------+---------------+----------+
Staging Table having column structure
+----------+------------+-------------+---------------+
| PONumber | paymentAmt | ClaimNumber | TransactionID |
+----------+------------+-------------+---------------+
| 12345 | 100 | 99876 | abc123 |
| 12346 | 50 | 99875 | abc123 |
+----------+------------+-------------+---------------+
The query i am executing is
select sum(cashAmt) CheckAmount, count(ClaimNumber) TotalLines
FROM [order] with (nolock)
WHERE TransactionID='abc123'
union
select sum(paymentAmt) CheckAmount, count(ClaimNumber) TotalLines
from Staging with (nolock)
where TransactionID='abc123'
but the sum is getting messed up because there is duplicate in one of the tables.
How can i edit that i get only uniques from the order table and the sums are correct

First ask yourself why are there duplicates in the Orders table? There must be a reason why they are there. I would deal with that first.
That issue aside, if the duplicates in the Orders table have a purpose and yet are not to be considered for this particular query, then you should be able to leave out the duplicates by simply changing the query to use DISTINCT on whatever field in the Orders table can reliably identify a duplicate.
select Distinct fieldname sum(cashAmt)... etc.

Assuming duplicates in your table are OK.
Not sure why you are using no lock, it seems like it shouldn't be included.
You could use a table variable to store the distinct values. You'll need to adjust the data types in the table variable to match your table structure.
I haven't tested the code below but it should look something like this.
DECLARE #OrderTmp TABLE (
cashAmt MyNumericColumn numeric(10,2)
, ClaimNumber int
, TransactionID Int
)
INSERT INTO #OrderTmp
select Distinct
cashAmt
,ClaimNumber
,TransactionID
FROM
[order]
WHERE TransactionID='abc123'
SELECT DISTINCT
select sum(cashAmt) CheckAmount, count(ClaimNumber) TotalLines
FROM #OrderTmp
where TransactionID='abc123'
union
select sum(paymentAmt) CheckAmount, count(ClaimNumber) TotalLines
from Staging
where TransactionID='abc123'

Related

SQL - split data into two columns based on identifier

I have a table that contains the product categories/headers and product names but all in one column. I need to split it out into separate Category and Product columns. I also have a helper column Header_Flag which can be used to determine if the row contains the header or product name. The input table looks like this:
|---------------------|------------------|
| Product | Header_Flag |
|---------------------|------------------|
| Furniture | Y |
| Bed | N |
| Table | N |
| Chair | N |
| Cosmetics | Y |
| Lip balm | N |
| Lip stick | N |
| Eye liner | N |
| Apparel | Y |
| Shirt | N |
| Trouser | N |
|---------------------|------------------|
The output format I'm looking for would be like this:
|---------------------|------------------|
| Category | Product |
|---------------------|------------------|
| Furniture | Bed |
| Furniture | Table |
| Furniture | Chair |
| Cosmetics | Lip balm |
| Cosmetics | Lip stick |
| Cosmetics | Eye liner |
| Apparel | Shirt |
| Apparel | Trouser |
|---------------------|------------------|
With the data is it stands, you cannot get the results you are after. To be able to achieve this, you need to be able to order your data, using an ORDER BY clause, and ordering on either of these column does not achieve the same result as the sample data:
CREATE TABLE dbo.YourTable (Product varchar(20),HeaderFlag char(1));
GO
INSERT INTO dbo.YourTable
VALUES('Furniture','Y'),
('Bed','N'),
('Table','N'),
('Chair','N'),
('Cosmetics','Y'),
('Lipbalm','N'),
('Lipstick','N'),
('Eyeliner','N'),
('Apparel','Y'),
('Shirt','N'),
('Trouser','N');
GO
SELECT *
FROM dbo.YourTable
ORDER BY Product;
GO
SELECT *
FROM dbo.YourTable
ORDER BY HeaderFlag
GO
DROP TABLE dbo.YourTable;
AS you can see, the orders both differ.
If you add a column you can order on though (I'm going to use an IDENTITY) then you can achieve this:
CREATE TABLE dbo.YourTable (I int IDENTITY, Product varchar(20),HeaderFlag char(1));
GO
INSERT INTO dbo.YourTable
VALUES('Furniture','Y'),
('Bed','N'),
('Table','N'),
('Chair','N'),
('Cosmetics','Y'),
('Lipbalm','N'),
('Lipstick','N'),
('Eyeliner','N'),
('Apparel','Y'),
('Shirt','N'),
('Trouser','N');
GO
SELECT *
FROM dbo.YourTable YT
ORDER BY I;
Then you can use a cumulative COUNT to put the values into groups and get the header:
WITH Grps AS(
SELECT YT.I,
YT.Product,
YT.HeaderFlag,
COUNT(CASE YT.HeaderFlag WHEN 'Y' THEN 1 END) OVER (ORDER BY I ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM dbo.YourTable YT),
Split AS(
SELECT G.I,
MAX(CASE G.HeaderFlag WHEN 'Y' THEN Product END) OVER (PARTITION BY G.Grp) AS Category,
G.Product,
G.HeaderFlag
FROM Grps G)
SELECT S.Category,
S.Product
FROM Split S
WHERE HeaderFlag = 'N'
ORDER BY S.I;

SQL Server find sum of values based on criteria within another table

I have a table consisting of ID, Year, Value
---------------------------------------
| ID | Year | Value |
---------------------------------------
| 1 | 2006 | 100 |
| 1 | 2007 | 200 |
| 1 | 2008 | 150 |
| 1 | 2009 | 250 |
| 2 | 2005 | 50 |
| 2 | 2006 | 75 |
| 2 | 2007 | 65 |
---------------------------------------
I then create a derived, aggregated table consisting of an ID, MinYear, and MaxYear
---------------------------------------
| ID | MinYear | MaxYear |
---------------------------------------
| 1 | 2006 | 2009 |
| 2 | 2005 | 2007 |
---------------------------------------
I then want to find the sum of Values between the MinYear and MaxYear foreach ID in the aggregated table, but I am having trouble determining a proper query.
The final table should look something like this
----------------------------------------------------
| ID | MinYear | MaxYear | SumVal |
----------------------------------------------------
| 1 | 2006 | 2009 | 700 |
| 2 | 2005 | 2007 | 190 |
----------------------------------------------------
Right now I can perform all the joins to create the second table. But then I use a fast forward cursor to iterate through each record of the second table with the code inside the for loop looking like the following
DECLARE #curMin int
DECLARE #curMax int
DECLARE #curID int
FETCH Next FROM fastCursor INTo #curISIN, #curMin , #curMax
WHILE ##FETCH_STATUS = 0
BEGIN
SELECT Sum(Value) FROM ValTable WHERE Year >= #curMin and Year <= #curMax and ID = #curID
Group By ID
FETCH Next FROM fastCursor INTo #curISIN, #curMin , #curMax
Having found the sum of values between specified years, I can connect it back to the second table and I wind up the desired result (the third table).
However, the second table in reality is roughly 4 million rows, so this iteration is extremely time consuming (~generating 300 results a minute) and presumably not the best solution.
My question is, is there a way to generate the third table's results without having to use a cursor/for loop?
During a group by the sum will only be for the ID in question -- since the min year and max year is for the ID itself then you don't need to double query. The query below should give you exactly what you need. If you have a different requirement let me know.
SELECT ID, MIN(YEAR) as MinYear, MAX(YEAR) as MaxYear, SUM(VALUE) as SUMVALUE
FROM tablenameyoudidnotsay
GROUP BY ID
You could use query as bellow
TableA is your first table, and TableB is the second one
SELECT *,
(select SUM(Value) FROM TableA where tablea.ID=TableB.ID AND tableA.Year BETWEEN
TableB.MinYear AND TableB.MaxYear) AS SumValue
from TableB
You can put your criteria into a join and obtain the result all as one set which should be faster:
SELECT b.Id, b.MinYear, b.MaxYear, sum(a.Value)
FROM Table2 b
JOIN Table1 a ON a.Id=b.Id AND b.MinYear <= a.Year AND b.MaxYear >= a.Year
GROUP BY b.Id, b.MinYear, b.MaxYear

Check if records exists at least once in LEFT JOINED Table

I have an Images, Orders and OrderItems table, I want to match for any images, if any has already been bought by the User passed as parameters by displaying true or false in an IsBought column.
Select Images.Id,
Images.Title,
Images.Description,
Images.Location,
Images.PriceIT,
Images.PostedAt,
CASE WHEN OrderItems.ImageId = Images.Id THEN CAST(1 AS BIT)
ELSE CAST(0 AS BIT) END
AS 'IsBought'
FROM Images
INNER JOIN Users as u on Images.UserId = u.Id
LEFT JOIN Orders on Orders.UserId = #userId
LEFT JOIN OrderItems on Orders.Id = OrderItems.OrderId and OrderItems.ImageId = Images.Id
Group By Images.Id,
Images.Title,
Images.Description,
Images.Location,
Images.PriceIT,
Images.PostedAt,
OrderItems.ImageId,
Orders.UserId
When I use this CASE WHEN I have duplicates when the item has been bought where IsBought is True and the duplicate is False.
In the case where the Item has never been bought, there is no duplicates, IsBought is just equal to False
----------------------------------
| User | type |
----------------------------------
| Id | nvarchar(450) |
----------------------------------
| .......|
----------------------------------
----------------------------------
| Orders | type |
----------------------------------
| Id | nvarchar(255) |
----------------------------------
| UserId | nvarchar(450) |
----------------------------------
| ........................... |
----------------------------------
----------------------------------
| OrderItems | type |
----------------------------------
| Id | nvarchar(255) |
----------------------------------
| OrderId | nvarchar(255) |
----------------------------------
| ImageId | int |
----------------------------------
----------------------------------
| Images | type |
----------------------------------
| Id | int |
----------------------------------
| UserId | nvarchar(450) |
----------------------------------
| Title | nvarchar(MAX) |
----------------------------------
| Description| nvarhar(MAX) |
----------------------------------
| ......................... |
----------------------------------
Any ideas on how I could just have one row per Images with IsBought set to true or false but not duplicates?
I would like something like this:
----------------------------------------------------------------------------
| Id | Title | Description | Location | PriceIT | Location | IsBought |
----------------------------------------------------------------------------
| 1 | Eiffel Tower | .... | ...... | 20.0 | Paris | true |
----------------------------------------------------------------------------
| 2 | Tore di Pisa | .... | ...... | 20.0 | Italia | false |
---------------------------------------------------------------------------
| etc ......
---------------------------------------------------------------------------
Your query logic looks suspicious. It is unusual to see a join that consists only of a comparison of a column from the unpreserved table to a parameter. I suspect that you don't need a join to users at all since you seem to be focused on things "bought" by a person and not things "created" (which is implied by the name "author") by that same person. And a group by clause with no aggregate is often a cover-up for a logically flawed query.
So start over. You want to see all images apparently. For each, you simply want to know if that image is associated with any order of a given person.
select img.*, -- you would, or course, only select the columns needed
(select count(*) from Sales.SalesOrderDetail as orddet
where orddet.ProductID = img.ProductID) as [Order Count],
(select count(*) from Sales.SalesOrderDetail as orddet
inner join Sales.SalesOrderHeader as ord
on orddet.SalesOrderID = ord.SalesOrderID
where orddet.ProductID = img.ProductID
and ord.CustomerID = 29620
) as [User Order Count],
case when exists(select * from Sales.SalesOrderDetail as orddet
inner join Sales.SalesOrderHeader as ord
on orddet.SalesOrderID = ord.SalesOrderID
where orddet.ProductID = img.ProductID
and ord.CustomerID = 29620) then 1 else 0 end as [Has Ordered]
from Production.ProductProductPhoto as img
where img.ProductID between 770 and 779
order by <something useful>;
Notice the aliases - it is much easier to read a long query when you use aliases that are shorter but still understandable (i.e., not single letters). I've included 3 different subqueries to help you understand correlation and how you can build your logic to achieve your goal and help debug any issues you find.
This is based on AdventureWorks sample database - which you should install and use as a learning tool (and to help facilitate discussions with others using a common data source). Note that I simply picked a random customer ID value - you would use your parameter. I filtered the query to a range of images to simplify debugging. Those are very simple but effective methods to help write and debug sql.

How to INSERT rows based on other rows?

I need to run a query that will INSERT new rows into a SQL Server join table.
Suppose I have the following tables to describe which products a store sells and in which states:
products:
+------------+--------------+
| product_id | product_name |
+------------+--------------+
| 1 | Laptop |
| 2 | Aspirin |
| 3 | Mattress |
+------------+--------------+
stores:
+----------+------------+
| store_id | store_name |
+----------+------------+
| 1 | Walmart |
| 2 | Best Buy |
| 3 | Sam's Club |
+----------+------------+
products_stores_states:
+------------+----------+-------+
| product_id | store_id | state |
+------------+----------+-------+
| 1 | 2 | AL |
| 1 | 2 | AR |
| 2 | 2 | AL |
| 2 | 2 | AR |
| 3 | 2 | AL |
| 3 | 2 | AR |
+------------+----------+-------+
So here we see that Best Buy sells all 3 products in AL and AR.
What I need to do is somehow insert rows into the products_stores_states table to add AZ for all products it currently sells.
With a small dataset, I could do this manually, row by row:
INSERT INTO products_stores_states (product_id, store_id, state) VALUES
(1,2,'AZ'),
(2,2,'AZ'),
(3,2,'AZ');
Since this is a large dataset, this is not really an option.
How would I go about inserting a new state for Best Buy for every product_id that the products_stores_states table already contains for Best Buy?
Bonus: If a query could be made to do this for multiple states that the same time, that would be even better.
Right now, I cannot wrap my head around how to do this, but I assume there would need to be a subquery to get the list of matching product_id values I need to use.
The following query will do what you want to do
DECLARE #temp TABLE (
state VARCHAR(20)
)
-- we are inserting state names into a temp table to use it further
INSERT INTO #temp (state)
VALUES
('AZ'),
('MA'),
('TX');
INSERT INTO products_stores_states(product_id, store_id, state )
SELECT
T.product_id,
T.store_id,
temp.state
FROM(
SELECT DISTINCT
Product_id, store_id
FROM
products_stores_states
WHERE store_id = 2 -- the store_id for which you want to make changes
) AS T
CROSS JOIN
#temp AS temp
At first, we are storing the state names into a table variable. Then we need to select only the distinct store_id and product_id combinations for a specific store.
Then we should insert the distinct values cross join with the table variable where we stored state names.
Here is the live demo.
Hope, this helps! Thanks.
If you are positive the inserted state is "NEW" to the table, something like this would work, changing the state variable to whatever you want to insert the new records.
DECLARE #State CHAR(2), #StoreId INT;
SET #State = 'AZ';
SET #StoreId = 2;
INSERT INTO products_stores_states (product_id, store_id, state)
SELECT DISTINCT product_id, #StoreId, #State
FROM dbo.products_stores_states
WHERE store_id = #StoreId;
You could first see what this statement would add using this:
DECLARE #State CHAR(2)
SET #State = 'AZ';
SET #StoreId = 2;
SELECT DISTINCT product_id, #StoreId, #State
FROM dbo.products_stores_states
WHERE store_id = #StoreId;

How to efficiently match on dates in SQL Server?

I am trying to return the first registration for a person based on the minimum registration date and then return full information. The data looks something like this:
Warehouse_ID SourceID firstName lastName firstProgramSource firstProgramName firstProgramCreatedDate totalPaid totalRegistrations
12345 1 Max Smith League Kid Hockey 2017-06-06 $100 3
12345 6 Max Smith Activity Figure Skating 2018-09-26 $35 1
The end goal is to return one row per person that looks like this:
Warehouse_ID SourceID firstName lastName firstProgramSource firstProgramName firstProgramCreatedDate totalPaid totalRegistrations
12345 1 Max Smith League Kid Hockey 2017-06-06 $135 4
So, this would aggregate the totalPaid and totalRegistrations variables based on the Warehouse_ID and would pull the rest of the information based on the min(firstProgramCreatedDate) specific to the Warehouse_ID.
This will end up in Tableau, so what I've recently tried ignores aggregating totalPaid and totalRegistrations for now (I can get that in another query pretty easily). The query I'm using seems to work, but it is taking forever to run; it seems to be going row by row for >50,000 rows, which is taking forever.
select M.*
from (
select Warehouse_ID, min(FirstProgramCreatedDate) First
from vw_FirstRegistration
group by Warehouse_ID
) B
left join vw_FirstRegistration M on B.Warehouse_ID = M.Warehouse_ID
where B.First in (M.FirstProgramCreatedDate)
order by B.Warehouse_ID
Any advice on how I can achieve my goal without this query taking an hour plus to run?
A combination of the ROW_NUMBER windowing function, plus the OVER clause on a SUM expression should perform pretty well.
Here's the query:
SELECT TOP (1) WITH TIES
v.Warehouse_ID
,v.SourceID
,v.firstName
,v.lastName
,v.firstProgramSource
,v.firstProgramName
,v.firstProgramCreatedDate
,SUM(v.totalPaid) OVER (PARTITION BY v.Warehouse_ID) AS totalPaid
,SUM(v.totalRegistrations) OVER (PARTITION BY v.Warehouse_ID) AS totalRegistrations
FROM
#vw_FirstRegistration AS v
ORDER BY
ROW_NUMBER() OVER (PARTITION BY v.Warehouse_ID
ORDER BY CASE WHEN v.firstProgramCreatedDate IS NULL THEN 1 ELSE 0 END,
v.firstProgramCreatedDate)
And here's a Rextester demo: https://rextester.com/GNOB14793
Results (I added another kid...):
+--------------+----------+-----------+----------+--------------------+------------------+-------------------------+-----------+--------------------+
| Warehouse_ID | SourceID | firstName | lastName | firstProgramSource | firstProgramName | firstProgramCreatedDate | totalPaid | totalRegistrations |
+--------------+----------+-----------+----------+--------------------+------------------+-------------------------+-----------+--------------------+
| 12345 | 1 | Max | Smith | League | Kid Hockey | 2017-06-06 | 135.00 | 4 |
| 12346 | 6 | Joe | Jones | Activity | Other Activity | 2017-09-26 | 125.00 | 4 |
+--------------+----------+-----------+----------+--------------------+------------------+-------------------------+-----------+--------------------+
EDIT: Changed the ORDER BY based on comments.
Try to use ROW_NUMBER() with PARTITIYION BY.
For more information please refer to:
https://learn.microsoft.com/en-us/sql/t-sql/functions/row-number-transact-sql?view=sql-server-2017

Resources