I have the following table:
| rowNumber | amount | count |
|-----------|--------|-------|
| 1         | 1000   | 2     |
| 2         | 1500   | 3     |
| 3         | 1750   | 3     |
| 4         | 2000   | 1     |
Now, if I want to get the STDEV, how can I make the amount of row 1 be counted twice in the function's calculation, the amount of row 2 be counted 3 times, and so on? Right now each amount is inserted into a temp table the necessary number of times and the STDEV is computed from that table, but I want to see if there is a better, more efficient way to do this.
Thanks.
You could join onto a numbers table
SELECT STDEV(amount)
FROM YourTable JOIN Numbers ON N <= YourTable.[count]
or write a custom CLR aggregate that takes both parameters and does the corresponding calculation.
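For example, a minimal sketch assuming SQL Server, a table named YourTable(amount, [count]), and per-row counts no larger than 1000 (the inline tally stands in for a physical Numbers table):
-- Minimal sketch: expand each row [count] times via an inline tally, then aggregate
WITH Numbers AS (
    SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
    FROM sys.all_objects
)
SELECT STDEV(t.amount) AS weighted_stdev
FROM YourTable AS t
JOIN Numbers AS n
    ON n.N <= t.[count];   -- each amount appears [count] times in the aggregate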
I have a table showing pallets and the amount of product ("units") on those pallets. Individual pallets can have multiple records due to multiple possible defect codes. This means when I am trying to sum the total units on all pallets, the same pallet could get counted more than once, which is undesirable. I would like (but don't know how) to add a running tally column to show how many times a specific pallet ID has appeared so that I can filter out any record where the count is greater than 1:
| Pallet_ID | Units | Defect_Code | COUNT |
+-----------+-------+-------------+-------+
| A1 | 100 | 03 | 1 |
| A1 | 100 | 05 | 2 |
| B1 | 95 | 03 | 1 |
| C1 | 300 | 05 | 1 |
| C1 | 300 | 06 | 2 |
| D1 | 210 | 03 | 1 |
| A1 | 100 | 10 | 3 |
| D1 | 210 | 03 | 2 |
In the above example, the correct sum total of units should be 705. A solution in SQL or in DAX would work (although I lean towards SQL). I have searched for a long time but could not find a solution that fits this particular scenario. Many thanks in advance for your time and consideration!
You can use the window function row_number() with an over clause that partitions by the pallet. Within each partition you can control which row is assigned the number 1 by using the order by inside the over clause.
select
    *
from (
    select
          Pallet_ID
        , Units
        , Defect_Code
        , row_number() over(partition by Pallet_ID order by Defect_Code) as count_of
    from yourtable
) as d
where count_of = 1
Note I have arbitrarily used the column defect_code to order by, as I don't know what other columns may exist. If your table has a date/time value for when the row was created you could use that instead, or perhaps the unique key of the table.
side note:
I would not recommend using the column alias "count", as it is a SQL reserved word.
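To get the expected total of 705, a small sketch that builds on the same derived table (assuming the table is named yourtable, as above):
-- Sketch: sum the units counting each pallet only once
select sum(Units) as total_units
from (
    select
          Pallet_ID
        , Units
        , row_number() over(partition by Pallet_ID order by Defect_Code) as count_of
    from yourtable
) as d
where count_of = 1;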
I have a bunch of value pairs (Before, After) by users in a table. In ideal scenarios these values should form an unbroken chain. e.g.
| UserId | Before | After |
|--------|--------|-------|
| 1 | 0 | 10 |
| 1 | 10 | 20 |
| 1 | 20 | 30 |
| 1 | 30 | 40 |
| 1 | 40 | 30 |
| 1 | 30 | 52 |
| 1 | 52 | 0 |
Unfortunately, these records originate in multiple different tables and are imported into my investigation table. The other values in the table do not lend themselves to ordering (e.g. CreatedDate) due to some quirks in the system saving them out of order.
I need to produce a list of users with gaps in their data. e.g.
| UserId | Before | After |
|--------|--------|-------|
| 1 | 0 | 10 |
| 1 | 10 | 20 |
| 1 | 20 | 30 |
// Row Deleted (30->40)
| 1 | 40 | 30 |
| 1 | 30 | 52 |
| 1 | 52 | 0 |
I've looked at the other Daisy Chaining questions on SO (and online in general), but they all appear to be on a given problem space, where one value in the pair is always lower than the other in a predictable fashion. In my case, there can be increases or decreases.
Is there a way to quickly calculate the longest chain that can be created? I do have a CreatedAt column that would provide some (very rough) relative ordering: when the dates are more than about 10 seconds apart, we could consider them orderable.
Are you not therefore simply after this to get the first row where the "chain" is broken?
SELECT UserID, Before, After
FROM dbo.YourTable YT
WHERE NOT EXISTS (SELECT 1
FROM dbo.YourTable NE
WHERE NE.After = YT.Before)
AND YT.Before != 0;
If you want the last row where the "chain" is broken, just swap the aliases on the columns in the WHERE clause of the NOT EXISTS.
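For example, a sketch of that swapped version (same dbo.YourTable as above):
-- Sketch: last row of a broken chain (column aliases swapped relative to the query above)
SELECT UserID, Before, After
FROM dbo.YourTable YT
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.YourTable NE
                  WHERE NE.Before = YT.After)
  AND YT.After != 0;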
The following performs hierarchical recursion on your example data and calculates a "chain" count column called h_level.
;with recur_cte([UserId], [Before], [After], h_level) as (
    select [UserId], [Before], [After], 0
    from dbo.test_table
    where [Before] is null
    union all
    select tt.[UserId], tt.[Before], tt.[After], rc.h_level+1
    from dbo.test_table tt
    join recur_cte rc on tt.[UserId]=rc.[UserId]
        and tt.[Before]=rc.[After]
    where tt.[Before]<tt.[After]
)
select * from recur_cte;
Results:
UserId  Before  After  h_level
1       NULL    10     0
1       10      20     1
1       20      30     2
1       30      40     3
1       30      52     3
Is this helpful? Could you further define which rows to exclude?
If you want users that have more than one chain:
select t.UserID
from <T> as t left outer join <T> as t2
on t2.UserID = t.UserID and t2.Before = t.After
where t2.UserID is null
group by t.UserID
having count(*) > 1;
Say I have two tables: A and B
Table A
+----+-------+
| id | value |
+----+-------+
| 1 | 20 |
| 2 | 20 |
| 3 | 10 |
| 4 | 0 |
+----+-------+
Table B
+----+-------+
| id | value |
+----+-------+
| 1 | 20 |
| 2 | 10 |
| 3 | 30 |
| 4 | 20 |
| 5 | 20 |
| 6 | 10 |
+----+-------+
If I do SELECT value, COUNT(*) AS occurrence FROM A GROUP BY value, I'll get:
+-------+------------+
| value | occurrence |
+-------+------------+
| 20 | 2 |
| 10 | 1 |
| 0 | 1 |
+-------+------------+
Based on this grouping of table A, I want to delete occurrence records from table B with the same values. In other words, I want to delete from B 2 records with value 20, 1 record with value 10, and 1 record with value 0. (Other conditions include 'do nothing if no record exists' and 'smallest id first', but I think these conditions are pretty trivial compared to the bulk of this question.)
Table B after deleting should be:
+----+-------+
| id | value |
+----+-------+
| 3 | 30 |
| 5 | 20 |
| 6 | 10 |
+----+-------+
From the official TOP documentation, it doesn't seem like I can use the result of a JOIN as the TOP expression.
We could use ROW_NUMBER with CTEs here:
WITH cteA AS (
    SELECT value, COUNT(*) cnt
    FROM A
    GROUP BY value
),
cteB AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY value ORDER BY id) rn
    FROM B
)
DELETE b
FROM cteB b
INNER JOIN cteA a
    ON b.value = a.value
WHERE
    b.rn <= a.cnt;
The logic here is that we use ROW_NUMBER to keep track of the order of each value in the B table. Then, we join to bring in the counts of each value in the A table, and we only delete B records whose row number is less than or equal to the A count.
See the demo link below to verify that the logic is correct. Note that I use a select there, not a delete, but the correct rows are being targeted for deletion.
Demo
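If you want to preview the rows that the delete would remove (this is essentially what the demo's select does), a sketch using the same CTEs:
-- Sketch: preview which B rows the delete above would target
WITH cteA AS (
    SELECT value, COUNT(*) cnt
    FROM A
    GROUP BY value
),
cteB AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY value ORDER BY id) rn
    FROM B
)
SELECT b.id, b.value, b.rn, a.cnt
FROM cteB b
INNER JOIN cteA a ON b.value = a.value
WHERE b.rn <= a.cnt;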
I have a requirement to assign sequential numbers to students. The problem is that the data must be partitioned by course first, and then the number must be assigned starting from, say, 1 up to, say, 1000.
Each course should have a gap of at least, say, 20 (the size may differ) to accommodate a student in the same course in case someone who is left out now turns up later, and so on.
I have tried partitioning and a recursive CTE but haven't succeeded in getting this kind of series for finally assigning the RollNumber.
Any help would be very much appreciated.
Thank You.
You can do this in two steps with a subquery. First get a row_number() partitioned by course and ordered by student id; then you can bump each partition by 20 by counting how many partitions have started so far (the number of times row_number() has returned 1 up to the current row) and multiplying by 20.
SELECT
    s_no,
    course,
    rownumber
      + (SUM(CASE WHEN rownumber = 1 THEN 1 ELSE 0 END)
         OVER (ORDER BY course, s_no ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) * 20)
      - 20 AS RollNumber
FROM
(
    SELECT
        s_no,
        course,
        ROW_NUMBER() OVER (PARTITION BY course ORDER BY s_no) rownumber
    FROM test
) sub
ORDER BY course, s_no;
+------+--------+------------+
| s_no | course | RollNumber |
+------+--------+------------+
| 1    | A      | 1          |
| 2    | A      | 2          |
| 3    | A      | 3          |
| 1    | B      | 21         |
| 2    | B      | 22         |
| 3    | B      | 23         |
| 1    | C      | 41         |
| 2    | C      | 42         |
| 3    | C      | 43         |
+------+--------+------------+
This isn't exactly your desired output, but I think it's essentially what you are after. You can adjust the math in the main query to bump each partition's starting position to whatever you want.
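As an alternative (my own variation, not something the asker tried), DENSE_RANK over the course column gives the same per-course offset without the running SUM:
-- Sketch: number the courses 0, 1, 2, ... with DENSE_RANK and offset each
-- course's block by 20 (assumes the same table test(s_no, course) as above)
SELECT
    s_no,
    course,
    ROW_NUMBER() OVER (PARTITION BY course ORDER BY s_no)
      + (DENSE_RANK() OVER (ORDER BY course) - 1) * 20 AS RollNumber
FROM test
ORDER BY course, s_no;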
I have a query:
select min(timestamp) from table
This table has 60+ million rows, and daily I delete a few off the end. To determine whether or not there is any data old enough to delete, I run the query above. There is an index on timestamp ascending, containing only that one column, and the query plan in Oracle causes this to be a full index scan. Should this not be the definition of a seek?
edit including plan:
| Id | Operation                 | Name       | Rows | Bytes | Cost (%CPU)| Time     |
| 0  | SELECT STATEMENT          |            | 1    | 8     | 4 (0)      | 00:00:01 |
| 1  | SORT AGGREGATE            |            | 1    | 8     |            |          |
| 2  | INDEX FULL SCAN (MIN/MAX) | NEVENTS_I2 | 1    | 8     | 4 (100)    | 00:00:01 |
Can you post the actual query plan? Are you sure that it is not doing a min/max index full scan? As you can see in this example, we're getting the MIN value from a 100,000 row table using a min/max index full scan with only a handful of consistent gets.
SQL> create table foo (
2 col1 date not null
3 );
Table created.
SQL> insert into foo
2 select sysdate + level
3 from dual
4 connect by level <= 100000;
100000 rows created.
SQL> create index idx_foo_col1
2 on foo( col1 );
Index created.
SQL> analyze table foo compute statistics for all indexed columns;
Table analyzed.
SQL> set autotrace on;
<<Note that I ran this statement once just to get the delayed block cleanout to
happen so that the consistent gets number wouldn't be skewed. You could run a
different query as well>>
1* select min(col1) from foo
SQL> /
MIN(COL1)
---------
02-FEB-11
Execution Plan
----------------------------------------------------------
Plan hash value: 817909383
--------------------------------------------------------------------------------
-----------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
Time |
--------------------------------------------------------------------------------
-----------
| 0 | SELECT STATEMENT | | 1 | 7 | 2 (0)|
00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 7 | |
|
| 2 | INDEX FULL SCAN (MIN/MAX)| IDX_FOO_COL1 | 1 | 7 | 2 (0)|
00:00:01 |
--------------------------------------------------------------------------------
-----------
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2 consistent gets
0 physical reads
0 redo size
532 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
At first I thought that the index would only be used if the column is declared NOT NULL. I tested with the following setup:
SQL> CREATE TABLE my_table (ts TIMESTAMP);
Table created
SQL> INSERT INTO my_table
2 SELECT systimestamp + ROWNUM * INTERVAL '1' SECOND
3 FROM dual CONNECT BY LEVEL <= 100000;
100000 rows inserted
SQL> CREATE INDEX ix ON my_table(ts);
Index created
SQL> EXPLAIN PLAN FOR SELECT MIN(ts) FROM my_table;
Explained
SQL> SELECT * FROM TABLE(dbms_xplan.display);
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 69 (2)| 00:00:0
| 1 | SORT AGGREGATE | | 1 | 13 | |
| 2 | INDEX FULL SCAN (MIN/MAX)| IX | 90958 | 1154K| |
--------------------------------------------------------------------------------
Here we notice that the index is used, but all rows from the index are read. If we specify that the column is not null we get a much better plan:
SQL> ALTER TABLE my_table MODIFY ts NOT NULL;
Table altered
SQL> EXPLAIN PLAN FOR SELECT MIN(ts) FROM my_table;
Explained
SQL> SELECT * FROM TABLE(dbms_xplan.display);
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 2 (0)| 00:00:0
| 1 | SORT AGGREGATE | | 1 | 13 | |
| 2 | INDEX FULL SCAN (MIN/MAX)| IX | 90958 | 1154K| 2 (0)| 00:00:0
--------------------------------------------------------------------------------
In fact this is the same plan that is also used if we add a WHERE clause (Oracle will read a single row from the index):
SQL> EXPLAIN PLAN FOR SELECT MIN(ts) FROM my_table WHERE ts IS NOT NULL;
Explained
SQL> SELECT * FROM TABLE(dbms_xplan.display);
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 2 (0)| 00:00:
| 1 | SORT AGGREGATE | | 1 | 13 | |
| 2 | FIRST ROW | | 90958 | 1154K| 2 (0)| 00:00:
| 3 | INDEX FULL SCAN (MIN/MAX)| IX | 90958 | 1154K| 2 (0)| 00:00:
--------------------------------------------------------------------------------
This last plan shows (line 2) that Oracle is indeed performing a "seek".
Just wanted to home in on the fact that an "INDEX FULL SCAN (MIN/MAX)" is simply not the same as an "INDEX FULL SCAN". An INDEX FULL SCAN really does scan the entire index (possibly with filtering). However, an INDEX FULL SCAN (MIN/MAX) or INDEX RANGE SCAN (MIN/MAX) only gets the smallest or largest leaf block (from the range), but can only be employed as long as the column is NOT NULL (which is a bit silly, and really a bug, since a NULL value is by definition neither the smallest nor the largest value). The (MIN/MAX) optimization is an implicit FIRST_ROWS action, and doesn't need the "WHERE ... IS NOT NULL" query condition to perform the optimization. Interestingly, the MIN/MAX optimization is normally not considered by the CBO for function-based indexes; that's another little bug.
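In practical terms, the two approaches demonstrated in the earlier answer map onto this: either declare the column NOT NULL, or keep it nullable and exclude NULLs yourself. A sketch with hypothetical names (your_table / created_ts):
-- Option 1: declare the column NOT NULL so the CBO can use the (MIN/MAX) path
ALTER TABLE your_table MODIFY (created_ts NOT NULL);
SELECT MIN(created_ts) FROM your_table;
-- Option 2: leave the column nullable and exclude NULLs explicitly,
-- which produced the FIRST ROW / INDEX FULL SCAN (MIN/MAX) plan shown above
SELECT MIN(created_ts) FROM your_table WHERE created_ts IS NOT NULL;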