SQL Counting multiple columns for same ID in same table - sql-server

I have a table with columns GameID, GoalID, PlayerID, Assist1ID, Assist2ID (all integers). PlayerID translates as the ID of the person who actually scored the goal, but Assist1ID and Assist2ID also get player IDs.
I am trying to get a dataset with the distinct PlayerIDs (from the PlayerID column or either of the assist columns), a count of goals (PlayerID column), and a count of assists (the sum of the counts from columns Assist1ID and Assist2ID where that PlayerID occurs). A PlayerID will never be in more than one of those columns for the same goal.
I have been trying several approaches, mostly with UNION ALL, as well as some SUM/CASE but I am just not getting it. Should I be using a temporary table for this, or is there a way to check the rows, and if the PlayerID.
Example: (note that GoalID and GameID aren't really important in this case)
GameID | GoalID | PlayerID | Assist1ID | Assist2ID
1 | 1 | 1876 | 2098 | 1097
1 | 2 | 2098 | 1829 | 1876
1 | 3 | 2098 | 1876 | ----
My query should return:
PlayerID | Goals | Assists
1876 | 1 | 2
2098 | 2 | 1
1829 | 0 | 1
1097 | 0 | 1
etc
Is this actually possible, or will I have to do some work in the code part of things?

To make sure you get a result row for every player involved, whether they only scored, only assisted, or did both, you must go through your data three times and glue the rows together with UNION ALL. Then count.
select playerid, sum(goal) as goals, sum(assist) as assists
from
(
select playerid, 1 as goal, 0 as assist from mytable
union all
select assist1id, 0 as goal, 1 as assist from mytable
union all
select assist2id, 0 as goal, 1 as assist from mytable
) t -- SQL Server requires an alias on a derived table
where playerid is not null -- skip empty assist slots
group by playerid;
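As a quick check of the UNION ALL approach above, here is a minimal sketch that runs it against the three sample rows from the question. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for SQL Server, and it assumes the "----" in the sample means NULL; the table name mytable comes from the answer.

```python
import sqlite3

# SQLite stands in for SQL Server here (an assumption made for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (GameID INT, GoalID INT, PlayerID INT, "
    "Assist1ID INT, Assist2ID INT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1876, 2098, 1097),
        (1, 2, 2098, 1829, 1876),
        (1, 3, 2098, 1876, None),  # no second assist (assumed NULL)
    ],
)

query = """
SELECT playerid, SUM(goal) AS goals, SUM(assist) AS assists
FROM (
    SELECT playerid,              1 AS goal, 0 AS assist FROM mytable
    UNION ALL
    SELECT assist1id AS playerid, 0,         1           FROM mytable
    UNION ALL
    SELECT assist2id AS playerid, 0,         1           FROM mytable
) AS t
WHERE playerid IS NOT NULL   -- drop the empty assist slot
GROUP BY playerid
ORDER BY playerid
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The rows agree with the expected output in the question: 1876 has 1 goal and 2 assists, 2098 has 2 goals and 1 assist, and 1829 and 1097 each have 0 goals and 1 assist.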

It can be done this way, but I have a feeling that there might be a simpler solution.
SELECT PlayerID, SUM(Goals), SUM(Assists)
FROM (
SELECT PlayerID,Count(*) AS Goals,0 AS Assists FROM Goals GROUP BY PlayerID UNION ALL
SELECT Assist1ID,0,Count(*) FROM Goals GROUP BY Assist1ID UNION ALL
SELECT Assist2ID,0,Count(*) FROM Goals GROUP BY Assist2ID
) T
WHERE NOT PlayerID IS NULL
GROUP BY PlayerID

-- SUM(condition) only works in MySQL; CASE also works in SQL Server
SELECT x.playerid
, SUM(CASE WHEN y.playerid = x.playerid THEN 1 ELSE 0 END) goals
, SUM(CASE WHEN x.playerid IN (y.assist1id, y.assist2id) THEN 1 ELSE 0 END) assists
FROM
( SELECT playerid FROM my_table
UNION
SELECT assist1id FROM my_table
UNION
SELECT assist2id FROM my_table
) x
LEFT JOIN my_table y
ON x.playerid IN (y.playerid, y.assist1id, y.assist2id)
WHERE x.playerid IS NOT NULL
GROUP BY x.playerid;

Related

Sql Server - display a second record below first one with other data

I have an sql table with the below data:
Id department Amount
1 Accounting 10000
2 Catering 5000
3 Cleaning 5000
I want to return the data as below:
Id department Amount
1 Accounting 10000
1 50%
2 Catering 5000
2 25%
3 Cleaning 5000
3 25%
This means every record returns a second record just below it that displays its percentage of the total amount. I have tried to use a PIVOT table, but I still cannot position the second row just below the first related one.
Has anyone ever done something similar? I just need some guidelines.
create table #T(Id int, Dept varchar(10),Amount int)
insert into #T
values(1,'Accounting',10000),(2,'Catering',5000),(3,'Cleaning',5000)
declare @Totll float = (Select sum(Amount) from #T)
Select *
from #T
union
select Id,Convert(varchar(50), (Amount/@Totll)*100)+'%',0
from #T
order by Id,Amount desc
Use a CTE to calculate the total of the amounts.
Then use UNION ALL for your table and the query which calculates the percentages:
with cte as (select sum(amount) sumamount from tablename)
select id, department, amount
from tablename
union all
select id, concat(100 * amount / (select sumamount from cte), '%'), null
from tablename
order by id, amount desc
Results:
> id | department | amount
> -: | :--------- | -----:
> 1 | Accounting | 10000
> 1 | 50% | null
> 2 | Catering | 5000
> 2 | 25% | null
> 3 | Cleaning | 5000
> 3 | 25% | null
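The CTE answer above can be reproduced with a minimal sketch like the following. It assumes SQLite (through Python's sqlite3 module) rather than SQL Server, so string concatenation uses || instead of CONCAT(); the table name tablename is taken from the answer. Note that amount is an integer column, so the percentage is computed with integer division.

```python
import sqlite3

# SQLite stands in for SQL Server; "tablename" follows the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INT, department TEXT, amount INT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [(1, "Accounting", 10000), (2, "Catering", 5000), (3, "Cleaning", 5000)],
)

query = """
WITH cte AS (SELECT SUM(amount) AS sumamount FROM tablename)
SELECT id, department, amount
FROM tablename
UNION ALL
-- || is SQLite's concatenation operator; the answer uses CONCAT()
SELECT id, (100 * amount / (SELECT sumamount FROM cte)) || '%', NULL
FROM tablename
ORDER BY id, amount DESC   -- NULL sorts last under DESC, so the % row follows
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The ordering trick is that the percentage row carries a NULL amount, which sorts after the real amount when ordering descending, so each percentage row lands directly under its own record.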

Find the top ranked unique item for each grouping in a set

Given the following dataset which contains a series of products for a customer, along with a number of related products for each, I want to pick the top ranked unique Related Product ID for each of the Product IDs.
Sample Data
This table shows what the data looks like for a single Customer. There will be multiple Customers.
The items selected in yellow are an example of what the results would look like for this example Customer ID.
So, a single Product ID may have multiple Related Product IDs. For a single customer with, say 6 Product IDs, I want to return the top ranked Related Product ID for each individual Product ID.
Rules
The catch is, that I want to eliminate duplication as much as possible. So if the same Related Product ID is the top ranked for more than one Product ID, the selection should move down to the next highest ranked Related Product ID.
The goal is to, where possible, provide a unique (within each Customer ID) Related Product ID for each Product ID.
Where it is not possible for a unique Related Product ID to be selected (because there are only duplicate Related Product IDs available), then the top ranked should be selected.
Results
For Product 2, the Related Product ID 23194 is the highest ranked, but it is not unique, so is skipped in favour of 23287. For Product 4, we could use either 23194 or 23300, but because neither is unique, we take the highest ranked item.
I've tried doing this using a recursive CTE, but this will iterate through the items and allocate the Related Product on the first Products before finding out if the Related Products are repeated later in the set.
How else can I approach this?
You can use ROW_NUMBER and COUNT OVER():
;WITH Cte AS(
SELECT *,
RN = (RelatedProductRanking + COUNT(*) OVER(PARTITION BY ProductID)) *
COUNT(*) OVER(PARTITION BY RelatedProductID)
FROM tbl
),
CteRnk AS(
SELECT *,
RNK = ROW_NUMBER() OVER(PARTITION BY ProductID ORDER BY RN)
FROM Cte
)
SELECT
CustomerID, ProductRanking, ProductID, RelatedProductRanking, RelatedProductID
FROM CteRnk
WHERE RNK = 1
ORDER BY ProductRanking, RelatedProductRanking
RESULT
| CustomerID | ProductRanking | ProductID | RelatedProductRanking | RelatedProductID |
|------------|----------------|-----------|-----------------------|------------------|
| 12436 | 1 | 14553 | 1 | 14481 |
| 12436 | 2 | 33017 | 2 | 23287 |
| 12436 | 3 | 14203 | 1 | 14289 |
| 12436 | 4 | 23038 | 1 | 23194 |
| 12436 | 5 | 15120 | 1 | 14520 |
| 12436 | 6 | 23014 | 1 | 23300 |

SQL Server: how to create sequence number column

I have a Sales table with the following data:
| SalesId | CustomerId | Amount |
|---------|------------|--------|
| 1 | 1 | 100 |
| 2 | 2 | 75 |
| 3 | 1 | 30 |
| 4 | 3 | 49 |
| 5 | 1 | 93 |
I would like to insert a column into this table that tells us the number of times the customer has made a purchase. So it'll be like:
| SalesId | CustomerId | Amount | SalesNum |
|---------|------------|--------|----------|
| 1 | 1 | 100 | 1 |
| 2 | 2 | 75 | 1 |
| 3 | 1 | 30 | 2 |
| 4 | 3 | 49 | 1 |
| 5 | 1 | 93 | 3 |
So I can see that in salesId = 5, that is the 3rd transaction for customerId = 1. How can I write such a query to insert / update such column? I am on MS SQL but I am also interested in the MYSQL solution should I need to do this there in the future.
Thank you.
ps. Apology for the table formatting. Couldn't figure out how to format it nicely.
You need ROW_NUMBER() to assign a sequence number. I'd strongly advise against storing this value, though, since you would need to recalculate it whenever the underlying rows change. Instead, you may be best off creating a view if you need it regularly:
CREATE VIEW dbo.SalesWithRank
AS
SELECT SalesID,
CustomerID,
Amount,
SalesNum = ROW_NUMBER() OVER(PARTITION BY CustomerID ORDER BY SalesID)
FROM Sales;
GO
ROW_NUMBER() will not assign duplicates within the same group. For example, if you were numbering the rows by Amount and a customer had two sales of 100, they would not get the same SalesNum; in the absence of any other ordering criteria in the ROW_NUMBER() function, their relative order is not deterministic. If you want sales with the same amount to have the same SalesNum, you need to use either RANK or DENSE_RANK. DENSE_RANK leaves no gaps in the sequence, e.g. 1, 1, 2, 2, 3, whereas RANK skips ahead to the row's position after ties, e.g. 1, 1, 3, 3, 5.
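To see the difference between the three functions side by side, here is a small sketch. It assumes SQLite 3.25 or later (through Python's sqlite3 module) as a stand-in for SQL Server, and the five rows are made up for illustration: one customer with two tied pairs of amounts. SalesId is added as a tie-breaker to ROW_NUMBER() so its result is deterministic.

```python
import sqlite3

# Hypothetical data: one customer whose amounts contain ties.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (SalesId INT, CustomerId INT, Amount INT)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [(1, 1, 100), (2, 1, 100), (3, 1, 30), (4, 1, 30), (5, 1, 10)],
)

query = """
SELECT SalesId,
       Amount,
       -- SalesId breaks ties so the numbering is deterministic
       ROW_NUMBER() OVER (PARTITION BY CustomerId
                          ORDER BY Amount DESC, SalesId) AS row_num,
       RANK()       OVER (PARTITION BY CustomerId
                          ORDER BY Amount DESC)          AS rnk,
       DENSE_RANK() OVER (PARTITION BY CustomerId
                          ORDER BY Amount DESC)          AS dense_rnk
FROM Sales
ORDER BY Amount DESC, SalesId
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

row_nums = [r[2] for r in rows]     # 1, 2, 3, 4, 5
ranks = [r[3] for r in rows]        # 1, 1, 3, 3, 5
dense_ranks = [r[4] for r in rows]  # 1, 1, 2, 2, 3
```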
If you must do this as an update then you can use:
WITH CTE AS
( SELECT SalesID,
CustomerID,
Amount,
SalesNum,
NewSalesNum = ROW_NUMBER() OVER(PARTITION BY CustomerID ORDER BY SalesID)
FROM Sales
)
UPDATE CTE
SET SalesNum = NewSalesNum;
MySQL before version 8.0 does not have ranking functions, so you need to use user variables to achieve a rank by keeping track of the value from the previous row. This is not allowed in views, so you would need to repeat this logic wherever you need the row number:
SELECT s.SalesID,
s.Amount,
@r:= CASE WHEN @c = s.CustomerID THEN @r + 1 ELSE 1 END AS SalesNum,
@c:= CustomerID AS CustomerID
FROM Sales AS s
CROSS JOIN (SELECT @c:= 0, @r:= 0) AS var
ORDER BY s.CustomerID, s.SalesID;
The ORDER BY is critical here, so to sort the results differently without affecting the ranking you need to use a subquery:
SELECT SalesID,
Amount,
CustomerID,
SalesNum
FROM ( SELECT s.SalesID,
s.Amount,
@r:= CASE WHEN @c = s.CustomerID THEN @r + 1 ELSE 1 END AS SalesNum,
@c:= CustomerID AS CustomerID
FROM Sales AS s
CROSS JOIN (SELECT @c:= 0, @r:= 0) AS var
ORDER BY s.CustomerID, s.SalesID
) AS s
ORDER BY s.SalesID;
Again, I would recommend against storing the value, but if you must in MySQL you would use:
UPDATE Sales
INNER JOIN
( SELECT s.SalesID,
@r:= CASE WHEN @c = s.CustomerID THEN @r + 1 ELSE 1 END AS NewSalesNum,
@c:= CustomerID AS CustomerID
FROM Sales AS s
CROSS JOIN (SELECT @c:= 0, @r:= 0) AS var
ORDER BY s.CustomerID, s.SalesID
) AS s2
ON Sales.SalesID = s2.SalesID
SET SalesNum = s2.NewSalesNum;
Using a correlated subquery:
Select *, (Select count(customerid)
from ##tmp t
where t.salesid <= s.salesid
and t.customerid = s.customerid) as SalesNum
from ##tmp s
Try this -
SELECT SalesId, CustomerId, Amount,
SalesNum = ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY SalesId)
FROM YOURTABLE
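The correlated-subquery and ROW_NUMBER() answers should give the same numbering as long as SalesId is unique. The sketch below checks that on the question's sample rows; it assumes SQLite 3.25 or later (through Python's sqlite3 module) in place of SQL Server, and a table named Sales as in the question.

```python
import sqlite3

# Sample rows from the question; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (SalesId INT, CustomerId INT, Amount INT)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [(1, 1, 100), (2, 2, 75), (3, 1, 30), (4, 3, 49), (5, 1, 93)],
)

# Window-function version.
window_rows = conn.execute("""
    SELECT SalesId, CustomerId, Amount,
           ROW_NUMBER() OVER (PARTITION BY CustomerId
                              ORDER BY SalesId) AS SalesNum
    FROM Sales
    ORDER BY SalesId
""").fetchall()

# Correlated-subquery version: count this customer's sales up to this row.
subquery_rows = conn.execute("""
    SELECT s.SalesId, s.CustomerId, s.Amount,
           (SELECT COUNT(*)
            FROM Sales t
            WHERE t.CustomerId = s.CustomerId
              AND t.SalesId <= s.SalesId) AS SalesNum
    FROM Sales s
    ORDER BY s.SalesId
""").fetchall()

for row in window_rows:
    print(row)
```

The subquery form needs no window functions, but it rescans the table for every row, so the window-function form is usually the better fit where it is available.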

Finding Location of Duplicate Column [duplicate]

I have a table
+--------+--------+--------+--------+--------+
| Market | Sales1 | Sales2 | Sales3 | Sales4 |
+--------+--------+--------+--------+--------+
| 68 | 1 | 2 | 3 | 4 |
| 630 | 5 | 3 | 7 | 8 |
| 190 | 9 | 10 | 11 | 12 |
+--------+--------+--------+--------+--------+
I want to find duplicates between all the above sales fields. In above example markets 68 and 630 have a duplicate Sales value that is 3.
My problem is displaying the Market having duplicate sales.
This problem would be incredibly simple to solve if you normalised your table.
Then you would just have the columns Market | Sales, or if the 1, 2, 3, 4 are important you could have Market | Quarter | Sales (or some other relevant column name).
Given that your table isn't in this format, you could use a CTE to make it so and then select from it, e.g.
WITH cte AS (
SELECT Market, Sales1 AS Sales FROM MarketSales
UNION ALL
SELECT Market, Sales2 FROM MarketSales
UNION ALL
SELECT Market, Sales3 FROM MarketSales
UNION ALL
SELECT Market, Sales4 FROM MarketSales
)
SELECT a.Market
,b.Market
FROM cte a
INNER JOIN cte b ON b.Market > a.Market
WHERE a.Sales = b.Sales
You can also do this without the CTE; you just need a big WHERE clause comparing all the combinations of Sales columns.
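The unpivot-then-self-join idea can be tried on the question's three rows with a sketch like this one. It assumes SQLite (through Python's sqlite3 module) in place of SQL Server and a table named MarketSales as in the answer; the fourth branch of the UNION ALL reads Sales4 so that every sales column is covered, and the matching value is returned alongside the two markets.

```python
import sqlite3

# Sample rows from the question; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MarketSales (Market INT, Sales1 INT, Sales2 INT, "
    "Sales3 INT, Sales4 INT)"
)
conn.executemany(
    "INSERT INTO MarketSales VALUES (?, ?, ?, ?, ?)",
    [(68, 1, 2, 3, 4), (630, 5, 3, 7, 8), (190, 9, 10, 11, 12)],
)

query = """
WITH cte AS (
    -- one (Market, Sales) row per sales column
    SELECT Market, Sales1 AS Sales FROM MarketSales
    UNION ALL
    SELECT Market, Sales2 FROM MarketSales
    UNION ALL
    SELECT Market, Sales3 FROM MarketSales
    UNION ALL
    SELECT Market, Sales4 FROM MarketSales
)
SELECT a.Market AS MarketA, b.Market AS MarketB, a.Sales
FROM cte a
INNER JOIN cte b
    ON b.Market > a.Market      -- each pair once, never a market with itself
   AND a.Sales = b.Sales
ORDER BY a.Market, b.Market
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On the sample data this reports the single pair of markets 68 and 630 sharing the value 3.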
Supposing the data size is not too big,
make a new temporary table combining all the data, with the columns:
Sales
Market
then select grouping by Sales and keep the groups with a count greater than 1:
select Max(Sales) as Sales, Count(*) as Qty
from #temporary
group by Sales
having Count(*) > 1
