Running count of duplicate values

Running count of duplicate values - sql-server

I have a table showing pallets and the amount of product ("units") on those pallets. Individual pallets can have multiple records due to multiple possible defect codes. This means when I am trying to sum the total units on all pallets, the same pallet could get counted more than once, which is undesirable. I would like (but don't know how) to add a running tally column to show how many times a specific pallet ID has appeared so that I can filter out any record where the count is greater than 1:
| Pallet_ID | Units | Defect_Code | COUNT |
+-----------+-------+-------------+-------+
| A1 | 100 | 03 | 1 |
| A1 | 100 | 05 | 2 |
| B1 | 95 | 03 | 1 |
| C1 | 300 | 05 | 1 |
| C1 | 300 | 06 | 2 |
| D1 | 210 | 03 | 1 |
| A1 | 100 | 10 | 3 |
| D1 | 210 | 03 | 2 |
In the above example, the correct sum total of units should be 705. A solution in SQL or in DAX would work (although I lean towards SQL). I have searched for a long time but could not find a solution that fits this particular scenario. Many thanks in advance for your time and consideration!

You may use the windowing function row_number() with the over clause where you partition by the pallet. Within each partition you can control which row is assigned the number 1 by using the order by inside the over clause.
select
*
from (
select
Pallet_ID
, Units
, Defect_Code
, row_number() over(partition by Pallet_ID order by defect_code) as count_of
from yourtable
)
where count_of = 1
Note I have arbitrability use the column defect_code to order by as I don't know what other columns may exist. If your table has a date/time value for when the row was created you could use this instead, or perhaps the unique key of the table.
side note:
I would not recommend using column alias of "count" as it's a SQL reserved word

Related

Maximum Daisy Chain Length

I have a bunch of value pairs (Before, After) by users in a table. In ideal scenarios these values should form an unbroken chain. e.g.
| UserId | Before | After |
|--------|--------|-------|
| 1 | 0 | 10 |
| 1 | 10 | 20 |
| 1 | 20 | 30 |
| 1 | 30 | 40 |
| 1 | 40 | 30 |
| 1 | 30 | 52 |
| 1 | 52 | 0 |
Unfortunately, these records originate in multiple different tables and are imported into my investigation table. The other values in the table do not lend themselves to ordering (e.g. CreatedDate) due to some quirks in the system saving them out of order.
I need to produce a list of users with gaps in their data. e.g.
| UserId | Before | After |
|--------|--------|-------|
| 1 | 0 | 10 |
| 1 | 10 | 20 |
| 1 | 20 | 30 |
// Row Deleted (30->40)
| 1 | 40 | 30 |
| 1 | 30 | 52 |
| 1 | 52 | 0 |
I've looked at the other Daisy Chaining questions on SO (and online in general), but they all appear to be on a given problem space, where one value in the pair is always lower than the other in a predictable fashion. In my case, there can be increases or decreases.
Is there a way to quickly calculate the longest chain that can be created? I do have a CreatedAt column that would provide some (very rough) relative ordering - When the date is more than about 10 seconds apart, we could consider them orderable)

Are you not therefore simply after this to get the first row where the "chain" is broken?
SELECT UserID, Before, After
FROM dbo.YourTable YT
WHERE NOT EXISTS (SELECT 1
FROM dbo.YourTable NE
WHERE NE.After = YT.Before)
AND YT.Before != 0;
If you want to last row where the row where the "chain" is broken, just swap the aliases on the columns in the WHERE in the NOT EXISTS.

the following performs hierarchical recursion on your example data and calculates a "chain" count column called 'h_level'.
;with recur_cte([UserId], [Before], [After], h_level) as (
select [UserId], [Before], [After], 0
from dbo.test_table
where [Before] is null
union all
select tt.[UserId], tt.[Before], tt.[After], rc.h_level+1
from dbo.test_table tt join recur_cte rc on tt.UserId=rc.UserId
and tt.[Before]=rc.[After]
where tt.[Before]<tt.[after])
select * from recur_cte;
Results:
UserId Before After h_level
1 NULL 10 0
1 10 20 1
1 20 30 2
1 30 40 3
1 30 52 3
Is this helpful? Could you further define which rows to exclude?

If you want users that have more than one chain:
select t.UserID
from <T> as t left outer join <T> as t2
on t2.UserID = t.UserID and t2.Before = t.After
where t2.UserID is null
group by t.UserID
having count(*) > 1;

Ranking within multiple groups & Efficient query for multiple table updates

I'm trying to add rank by sales by month and also change the date column to a 'month end' field that would show only last day of month.
Can i do two sets in a row like that without adding an update?
I'm looking for top 2 within each month - does limit and group by work?
I feel like this is right and most efficient query, but its not working - any help appreciated!!
UPDATE table1
SET DATE=EOMONTH(DATE) AS MONTH_END;
ALTER TABLE table1
ADD COLUMN RANK INT AFTER sales;
UPDATE table1
SET RANK=
RANK() OVER(PARTITION BY cust ORDER BY sales DESC);
LIMIT 2
orig table
+------+----------+-------+--+
| CUST | DATE | SALES | |
+------+----------+-------+--+
| 36 | 3-5-2018 | 50 | |
| 37 | 3-15-18 | 100 | |
| 38 | 3-25-18 | 65 | |
| 37 | 4-5-18 | 95 | |
| 39 | 4-21-18 | 500 | |
| 40 | 4-45-18 | 199 | |
+------+----------+-------+--+
desired output
+------+-----------+-------+------+
| CUST | Month End | SALES | Rank |
+------+-----------+-------+------+
| | | | |
| 37 | 3-31-18 | 100 | 1 |
| 38 | 3-31-18 | 65 | 2 |
| 39 | 4-30-18 | 500 | 1 |
| 40 | 4-30-18 | 199 | 2 |
+------+-----------+-------+------+

I do not know why you want EOMONTH as a stored value, but what you have for that will work.
I would not use [rank] as a column name as I avoid any words that are used in SQL, maybe [sales_rank] or similar.
ALTER TABLE table1
ADD COLUMN [sales_rank] INT AFTER sales;
with cte as (
select
cust
, DENSE_RANK() OVER(PARTITION BY cust ORDER BY sales DESC) as ranking
from table1
)
update cte
set sales_rank = ranking
where ranking < 3
;
LIMIT 2 is not something that can be used in SQL Server by the way, and it sure can't be used "per grouping". When you use a "window function" such as rank() or dense_rank() you can use the output of those in the where clause of the next "layer". i.e. use those functions in a subquery (or cte) and then use a where clause to filter rows by the calculated values.
Also note I used dense_rank() to guarantee that no rank numbers are skipped, so that the subsequent where clause will be effective.

SQL Server Query for Partitioning Data

I have a requirement of assigning sequential Numbers to students. The problem is the data must be partitioned by course first and then the Number must be assigned starting from say 1 to say 1000.
Each Course should have at least a gap of say 20 ( may differ ) to accommodate a student in the same course in case, someone, if left out as of now appears later.
and so on.
I have tried partitioning and Recursive CTE but haven't succeeded to get this kind of series for assigning finally the RollNumber.
Any help would be very much anticipated.
Thank You.

You can do this in two steps with a subquery. First get your row_number() partitioned by course and order by student id, then you can bump each partition by 20 by counting the previous 1 values returned by your row_number() and multiplying by 20.
SELECT
s_no,
course,
rownumber + (SUM(CASE WHEN rownumber = 1 THEN 1 ELSE 0 END) OVER (ORDER BY course, s_no ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) * 20) - 20
FROM
(
SELECT
s_no,
course,
ROW_NUMBER() OVER (PARTITION BY course ORDER BY s_no) rownumber
FROM test
) sub
ORDER BY course, s_no;
+------+--------+-----------+
| s_no | course | rownumber |
+------+--------+-----------+
| 1 | A | 1 |
| 2 | A | 2 |
| 3 | A | 3 |
| 1 | B | 21 |
| 2 | B | 22 |
| 3 | B | 23 |
| 1 | C | 41 |
| 2 | C | 42 |
| 3 | C | 43 |
+------+--------+-----------+
This isn't exactly as your desired output, but I think it's the same as what you are after. You can monkey with the math in that main query though and bump each partitions starting position to whatever you want.

TSQL Multiple column unpivot with named rows possible?

I know there are several unpivot / cross apply discussions here but I was not able to find any discussion that covers my problem. What I've got so far is the following:
SELECT Perc, Salary
FROM (
SELECT jobid, Salary_10 AS Perc10, Salary_25 AS Perc25, [Salary_Median] AS Median
FROM vCalculatedView
WHERE JobID = '1'
GROUP BY JobID, SourceID, Salary_10, Salary_25, [Salary_Median]
) a
UNPIVOT (
Salary FOR Perc IN (Perc10, Perc25, Median)
) AS calc1
Now, what I would like is to add several other columns, eg. one named Bonus which I also want to put in Perc10, Perc25 and Median Rows.
As an alternative, I also made a query with cross apply, but here, it seems as if you can not "force" sort the rows like you can with unpivot. In other words, I can not have a custom sort, but only a sort that is according to a number within the table, if I am correct? At least, here I do get the result like I wish to have, but the rows are in a wrong order and I do not have the rows names like Perc10 etc. which would be nice.
SELECT crossapplied.Salary,
crossapplied.Bonus
FROM vCalculatedView v
CROSS APPLY (
VALUES
(Salary_10, Bonus_10)
, (Salary_25, Bonus_25)
, (Salary_Median, Bonus_Median)
) crossapplied (Salary, Bonus)
WHERE JobID = '1'
GROUP BY crossapplied.Salary,
crossapplied.Bonus
Perc stands for Percentile here.
Output is intended to be something like this:
+--------------+---------+-------+
| Calculation | Salary | Bonus |
+--------------+---------+-------+
| Perc10 | 25 | 5 |
| Perc25 | 35 | 10 |
| Median | 27 | 8 |
+--------------+---------+-------+
Do I miss something or did I something wrong? I'm using MSSQL 2014, output is going into SSRS. Thanks a lot for any hint in advance!
Edit for clarification: The Unpivot-Method gives the following output:
+--------------+---------+
| Calculation | Salary |
+--------------+---------+
| Perc10 | 25 |
| Perc25 | 35 |
| Median | 27 |
+--------------+---------+
so it lacks the column "Bonus" here.
The Cross-Apply-Method gives the following output:
+---------+-------+
| Salary | Bonus |
+---------+-------+
| 35 | 10 |
| 25 | 5 |
| 27 | 8 |
+---------+-------+
So if you compare it to the intended output, you'll notice that the column "Calculation" is missing and the row sorting is wrong (note that the line 25 | 5 is in the second row instead of the first).
Edit 2: View's definition and sample data:
The view basically just adds computed columns of the table. In the table, I've got Columns like Salary and Bonus for each JobID. The View then just computes the percentiles like this:
Select
Percentile_Cont(0.1)
within group (order by Salary)
over (partition by jobID) as Salary_10,
Percentile_Cont(0.25)
within group (order by Salary)
over (partition by jobID) as Salary_25
from Tabelle
So the output is like:
+----+-------+---------+-----------+-----------+
| ID | JobID | Salary | Salary_10 | Salary_25 |
+----+-------+---------+-----------+-----------+
| 1 | 1 | 100 | 60 | 70 |
| 2 | 1 | 100 | 60 | 70 |
| 3 | 2 | 150 | 88 | 130 |
| 4 | 3 | 70 | 40 | 55 |
+----+-------+---------+-----------+-----------+
In the end, the view will be parameterized in a stored procedure.

Might this be your approach?
After your edits I understand, that your solution with CROSS APPLY would comes back with the right data, but not in the correct output. You can add constant values to your VALUES and do the sorting in a wrapper SELECT:
SELECT wrapped.Calculation,
wrapped.Salary,
wrapped.Bonus
FROM
(
SELECT crossapplied.*
FROM vCalculatedView v
CROSS APPLY (
VALUES
(1,'Perc10',Salary_10, Bonus_10)
, (2,'Perc25',Salary_25, Bonus_25)
, (3,'Median',Salary_Median, Bonus_Median)
) crossapplied (SortOrder,Calculation,Salary, Bonus)
WHERE JobID = '1'
GROUP BY crossapplied.SortOrder,
crossapplied.Calculation,
crossapplied.Salary,
crossapplied.Bonus
) AS wrapped
ORDER BY wrapped.SortOrder

Adding a row value more than once to STDEV()

I have the following table:
| rowNumber | amount | count |
| 1 | 1000 | 2 |
| 2 | 1500 | 3 |
| 3 | 1750 | 3 |
| 4 | 2000 | 1 |
Now if I want to get the stdev how can I make the amount of the row 1 be inserted in the function's expression twice, the amount of the row 2 be inserted 3 times and so on... Right now each amount is inserted in a temp table the necessary times and we get the stdev from that table, but I want to see if there is a better and more efficient way to do so.
Thanks.

You could join onto a numbers table
SELECT STDEV(amount)
FROM YourTable JOIN Numbers ON N <= YourTable.[count]
or write a custom CLR aggregate that takes both parameters and does the corresponding calculation.