Case statement not correctly matching expected values - sql-server

I'm trying to generate some randomized data, and I've been using newid() to seed functions since it is called once for every row and is guaranteed to return a different result each time. However I'm frequently getting values that are somehow not equal to any integers in the expected range.
I've tried a few variations, including a highly upvoted one, but they all result in the same issue. I've put it into a script that shows the problem:
declare @test table (id uniqueidentifier)
insert into @test
select newid() from sys.objects
select
    floor(rand(checksum(id)) * 4),
    case isnull(floor(rand(checksum(id)) * 4), -1)
        when 0 then 0
        when 1 then 1
        when 2 then 2
        when 3 then 3
        when -1 then -1
        else 999
    end,
    floor(rand(checksum(newid())) * 4),
    case isnull(floor(rand(checksum(newid())) * 4), -1)
        when 0 then 0
        when 1 then 1
        when 2 then 2
        when 3 then 3
        when -1 then -1
        else 999
    end
from @test
I expect the results to always be in the range 0 to 3 for all four columns. When the unique identifiers are retrieved from a table, the results are always correct (first two columns.) Similarly, when they're output on the fly they're also correct (third column.) But when they're compared on the fly to integers in a case statement, it often returns a value outside the expected range.
Here's an example, these are the first 20 rows when I ran it just now. As you can see there are '999' instances in the last column that shouldn't be there:
0 0 3 1
3 3 3 1
0 0 3 3
3 3 2 999
1 1 2 999
3 3 2 1
2 2 0 999
0 0 0 0
3 3 2 0
1 1 3 999
3 3 0 999
2 2 2 2
1 1 3 0
2 2 3 0
3 3 1 999
0 0 1 999
3 3 1 1
0 0 0 3
3 3 0 999
0 0 1 0
At first I thought maybe the type coercion was different than I expected, and the result of rand() * int was a float not an int. So I wrapped it all in floor to force it to be an int. Then I thought perhaps there's an odd null value creeping in, but with my case statement a null would be returned as -1, and there are none.
I've run this on two different SQL Server 2012 SP1 instances; both give the same sort of results.

In the fourth column, isnull(floor(rand(checksum(newid())) * 4), -1) is evaluated up to five times for each row, once for each branch of the case, and each evaluation can return a different value. So it can return 2 and not match 0, then 3 and not match 1, then 1 and not match 2, then 0 and not match 3, then fail to match -1, fall through to the else and return 999.
This can be seen if you get the execution plan, and look at the XML, there is a line [whitespace added.]:
<ScalarOperator ScalarString="
CASE WHEN isnull(floor(rand(checksum(newid()))*(4.000000000000000e+000)),(-1.000000000000000e+000))=(0.000000000000000e+000) THEN (0)
ELSE CASE WHEN isnull(floor(rand(checksum(newid()))*(4.000000000000000e+000)),(-1.000000000000000e+000))=(1.000000000000000e+000) THEN (1)
ELSE CASE WHEN isnull(floor(rand(checksum(newid()))*(4.000000000000000e+000)),(-1.000000000000000e+000))=(2.000000000000000e+000) THEN (2)
ELSE CASE WHEN isnull(floor(rand(checksum(newid()))*(4.000000000000000e+000)),(-1.000000000000000e+000))=(3.000000000000000e+000) THEN (3)
ELSE CASE WHEN isnull(floor(rand(checksum(newid()))*(4.000000000000000e+000)),(-1.000000000000000e+000))=(-1.000000000000000e+000) THEN (-1)
ELSE (999)
END
END
END
END
END
">
Placing the expression in a CTE seems to keep the recomputes from happening:
; WITH T AS (SELECT isnull(floor(rand(checksum(newid())) * 4), -1) AS C FROM @test)
SELECT CASE C
when 0 then 0
when 1 then 1
when 2 then 2
when 3 then 3
when -1 then -1
else 999 END
FROM T
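To make the difference between "recomputed for every branch" and "computed once" concrete, here is a small Python analogy. It is only a sketch of the logic, not of SQL Server itself: pick() is a made-up stand-in for floor(rand(checksum(newid())) * 4), and the two functions are hypothetical names for the two evaluation orders.

```python
import random

def pick(rng):
    # Hypothetical stand-in for floor(rand(checksum(newid())) * 4):
    # every call produces a fresh value in 0..3.
    return rng.randrange(4)

def case_reevaluated(rng):
    # Mimics the expanded plan: the expression is recomputed for each WHEN branch.
    for expected in (0, 1, 2, 3, -1):
        if pick(rng) == expected:
            return expected
    return 999

def case_evaluated_once(rng):
    # Mimics storing the value first and then comparing the stored value.
    value = pick(rng)
    for expected in (0, 1, 2, 3, -1):
        if value == expected:
            return expected
    return 999

rng = random.Random(0)
reevaluated = [case_reevaluated(rng) for _ in range(1000)]
once = [case_evaluated_once(rng) for _ in range(1000)]
print("999s when re-evaluated:", reevaluated.count(999))
print("999s when evaluated once:", once.count(999))
```

When the value is drawn again for each branch, a row can miss all four matches and end up at 999; when it is drawn once, one of the branches 0 to 3 always matches.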


Find the number of transitions for certain value within a list of values

I have a table with the columns "ID" and "Values", and I want to know how many times the value "A" changes to another value, as shown below.
ID Values
1 A
1 A
1 A
1 B
1 A
1 B
1 B
1 C
1 C
1 C
1 A
2 A
2 A
2 B
2 A
2 B
2 C
2 B
Expected Result:
ID Values Desired Output
1 A 0
1 A 0
1 A 1
1 B 0
1 A 1
1 B 0
1 B 0
1 C 0
1 C 0
1 C 0
1 A 0
2 A 0
2 A 1
2 B 0
2 A 1
2 B 0
2 C 0
2 B 0
The final table should be like this:
ID Number of Transitions
1 2
2 2
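You just need LEAD() to look at the next value. LEAD() needs an ordering inside the window; ordering_col here stands for whatever column defines the row order in your table:
select id, values, lead(values) over (partition by id order by ordering_col) as next_value
from table
Then you can compare next_value with values, and apply an iff(values = 'A' and next_value != 'A', 1, 0).
Then just SUM() that flag and GROUP BY id.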
You could also treat this as a regexp problem where you want to count how many times a given pattern occurs for each id. The missing piece in your question is which column dictates the order in which the values appear for each id. You'll need that for either of the solutions:
select id, regexp_count(listagg(val,',') within group (order by ordering_col), 'A,[^A]')
from t
group by id;
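If it helps to see the comparison logic outside SQL, here is a minimal Python sketch of the same idea: look at each value together with the one before it and count the places where "A" is followed by something else. The function name count_transitions and the row layout are made up for illustration, and the rows are assumed to already be in the intended order for each id.

```python
from collections import defaultdict

def count_transitions(rows, target="A"):
    # rows: (id, value) pairs, assumed to be in the intended order for each id.
    counts = defaultdict(int)
    previous = {}
    for row_id, value in rows:
        counts[row_id] += 0  # make sure ids without any transition still appear
        if previous.get(row_id) == target and value != target:
            counts[row_id] += 1
        previous[row_id] = value
    return dict(counts)

# The sample data from the question, one character per row.
rows = [(1, v) for v in "AAABABBCCCA"] + [(2, v) for v in "AABABCB"]
print(count_transitions(rows))  # {1: 2, 2: 2}
```

A trailing "A" with nothing after it is not counted, which matches the 0 in the last row for ID 1 in the expected result.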

Counting the occurrence of a unique number in an array - MATLAB

I have an array that looks something like...
1 0 0 1 2 2 1 1 2 1 0
2 1 0 0 0 1 1 0 0 2 1
1 2 2 1 1 1 2 0 0 1 0
0 0 0 1 2 1 1 2 0 1 2
however my real array is (50x50).
I am relatively new to MATLAB and need to be able to count how many times each unique value occurs in each row and column. For example, there are four '1's in row 2 and three '0's in column 3. I need to be able to do this with my real array.
It would help even more if these quantities of unique values were in arrays of their own also.
PLEASE use simple language, or else i will get lost, for example if representing an array, don't call it x, but perhaps column_occurances_array... for me please :)
What I would do is iterate over each row of your matrix and calculate a histogram of occurrences for each row. Use histc to calculate the occurrences of each row. The thing that is nice about histc is that you are able to specify where the bins are to start accumulating. These correspond to the unique entries for each row of your matrix. As such, use unique to compute these unique entries.
Now, I would use arrayfun to iterate over all of your rows in your matrix, and this will produce a cell array. Each element in this cell array will give you the counts for each unique value for each row. Therefore, assuming your matrix of values is stored in A, you would simply do:
vals = arrayfun(@(x) [unique(A(x,:)); histc(A(x,:), unique(A(x,:)))], 1:size(A,1), 'uni', 0);
Now, if we want to display all of our counts, use celldisp. Using your example, and with the above code combined with celldisp, this is what I get:
vals{1} =
0 1 2
3 5 3
vals{2} =
0 1 2
5 4 2
vals{3} =
0 1 2
3 5 3
vals{4} =
0 1 2
4 4 3
What the above display is saying is that for the first row, you have 3 zeros, 5 ones and 3 twos. The second row has 5 zeros, 4 ones and 2 twos and so on. These are just for the rows. If you want to do these for columns, you have to modify your code slightly to operate along columns:
vals = arrayfun(@(x) [unique(A(:,x)) histc(A(:,x), unique(A(:,x)))].', 1:size(A,2), 'uni', 0);
By using celldisp, this is what we get:
vals{1} =
0 1 2
1 2 1
vals{2} =
0 1 2
2 1 1
vals{3} =
0 2
3 1
vals{4} =
0 1
1 3
vals{5} =
0 1 2
1 1 2
vals{6} =
1 2
3 1
vals{7} =
1 2
3 1
vals{8} =
0 1 2
2 1 1
vals{9} =
0 2
3 1
vals{10} =
1 2
3 1
vals{11} =
0 1 2
2 1 1
This means that in the first column, we see 1 zero, 2 ones and 1 two, etc. etc.
I absolutely agree with rayryeng! However, here is some code which might be easier to understand for you as a beginner. It is without cell arrays or arrayfuns and quite self-explanatory:
%% initialize your array randomly for demonstration:
numRows = 50;
numCols = 50;
yourArray = round(10*rand(numRows,numCols));
%% count how often each value occurs per row and per column
% find all occurring numbers in yourArray
occVals = unique(yourArray(:));
% unique already returns sorted values, so this sort is only for clarity
occVals = sort(occVals);
% now we could create a matrix occMat_row of dimension |occVals| x numRows
% where occMat_row(i,j) represents how often the ith value occurs in the
% jth row, analogously occMat_col:
occMat_row = zeros(length(occVals),numRows);
occMat_col = zeros(length(occVals),numCols);
for k = 1:length(occVals)
    occMat_row(k,:) = sum(yourArray == occVals(k),2)';
    occMat_col(k,:) = sum(yourArray == occVals(k),1);
end
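For comparison, the same per-row and per-column tally can be sketched in a few lines of Python using the 4x11 example from the question. This is not MATLAB, and the names example_array, row_occurrences and column_occurrences are chosen here for illustration; the point is only to show the counting idea with descriptive names.

```python
from collections import Counter

example_array = [
    [1, 0, 0, 1, 2, 2, 1, 1, 2, 1, 0],
    [2, 1, 0, 0, 0, 1, 1, 0, 0, 2, 1],
    [1, 2, 2, 1, 1, 1, 2, 0, 0, 1, 0],
    [0, 0, 0, 1, 2, 1, 1, 2, 0, 1, 2],
]

# One Counter per row: maps each value to how often it occurs in that row.
row_occurrences = [Counter(row) for row in example_array]
# zip(*example_array) walks the array column by column.
column_occurrences = [Counter(column) for column in zip(*example_array)]

print(row_occurrences[1][1])     # number of 1s in row 2
print(column_occurrences[2][0])  # number of 0s in column 3
```

Indexing starts at 0 in Python, so row 2 is row_occurrences[1] and column 3 is column_occurrences[2].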

3+ dimensional truth table in APL

I would like to enumerate all the combinations (tuples of values) of 3 or more finite-valued variables which satisfy a given condition. In math notation:
For example (inspired by Project Euler problem 9), with a in 1..3, b in 1..4 and c in 1..5, all tuples where a ≤ b ≤ c.
The truth tables for two variables at a time are easy enough:
a ∘.≤ b
1 1 1 1
0 1 1 1
0 0 1 1
b ∘.≤ c
1 1 1 1 1
0 1 1 1 1
0 0 1 1 1
0 0 0 1 1
After much head-scratching, I managed to combine them, by computing the ∧ of every 4-valued row of the former with each 4-valued column of the latter, and disclosing (⊃) on the correct axis, between 1 and 2:
⎕← tt ← ⊃[1.5] (⊂[2] a ∘.≤ b) ∘.∧ (⊂[1] b ∘.≤ c)
1 1 1 1 1
0 1 1 1 1
0 0 1 1 1
0 0 0 1 1

0 0 0 0 0
0 1 1 1 1
0 0 1 1 1
0 0 0 1 1

0 0 0 0 0
0 0 0 0 0
0 0 1 1 1
0 0 0 1 1
Then I could use its ravel to filter all possible tuples of values:
⊃ (,tt) / , a ∘., b ∘., c
1 1 1
1 1 2
1 1 3
1 1 4
1 1 5
1 2 2
1 2 3
...
3 3 5
3 4 4
3 4 5
Is this the best approach to this particular class of problems in APL?
Is there an easier or faster formula for this example, or for the general case?
More generally, comparing my (naïve?) array approach above to traditional scalar languages, I can see that I'm translating each loop into an additional dimension: 3 nested loops become a 3-rank truth table:
for c in 1..NC:
    for b in 1..min(c, NB):
        for a in 1..min(b, NA):
            collect (a,b,c)
But in a scalar language one can effect optimizations along the way, for example breaking loops as soon as possible, or choosing the loop boundaries dynamically. In this case I don't even need to test for a ≤ b ≤ c, because it's implicit in the loop boundaries.
In this example both approaches have O(N³) complexity, so their runtime will only differ by a factor. But I'm wondering: how could I write the array solution in a more optimized way, if I needed to do so?
Are there any good books or online resources that address algorithmic issues or best practices in APL?
Here's an alternative approach. I'm not sure if it would run faster.
Following your algorithm for scalar languages, the possible values of c are
⎕IO←0
c←1+⍳NC
In the inner loops the values for b and a are
b←1+⍳¨NB⌊c
a←1+⍳¨¨NA⌊b
If we combine those
r←(⊂¨¨¨a,¨¨¨b),¨¨¨c
we get a nested array of (a,b,c) triplets which can be flattened and rearranged in a matrix
r←∊r
(((⍴r)÷3),3)⍴r
ADD:
Morten Kromberg sent me the following solution. On Dyalog APL it's ~ 30 times more efficient than the one above:
⎕IO←1
AddDim←{0≡⍵:⍪⍳⍺ ⋄ n←0⌈⍺-x←¯1+⊢/⍵ ⋄ (n⌿⍵),∊x+⍳¨n}
TTable←{⊃AddDim/⌽0,⍵}
TTable 3 4 5
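For readers who want to check any of the APL results against the scalar version described in the question, here is a minimal Python sketch of the loop with dynamic bounds. The function name nondecreasing_triples is made up here, and the loops are nested with a outermost so that the tuples come out in the same order as the filtered list shown in the question.

```python
def nondecreasing_triples(na, nb, nc):
    # The loop bounds enforce a <= b <= c, so no explicit test is needed.
    for a in range(1, na + 1):
        for b in range(a, nb + 1):
            for c in range(b, nc + 1):
                yield (a, b, c)

triples = list(nondecreasing_triples(3, 4, 5))
print(len(triples), triples[0], triples[-1])  # 28 (1, 1, 1) (3, 4, 5)
```

Only valid tuples are ever generated, whereas the truth-table approach builds all 3×4×5 = 60 cells and then filters them.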

Can't figure out how sum(1) is working in this query

So I am trying to combine two queries, and to do that I need to figure out what is going on in this one. I'm still relatively new to SQL Server and I've been forced to dive right into some complicated queries, so sometimes I get stuck on simple things like this. My problem is that the Sum(1) function is used and I'm not entirely sure how. I believe it is counting duplicates, but I cannot tell what information it bases that on.
this is the query
SELECT
qryReinsuranceDPA1.POLICY_NO,
qryReinsuranceDPA1.PHASE_CODE,
qryReinsuranceDPA1.SUB_PHASE_CODE,
qryReinsuranceDPA1.ProdType,
TotalDPA = Sum(case when [SumOfNetDefExtraAdj] Is Null then [SumOfNetDefPremiumAdj] else [SumOfNetDefPremiumAdj] + SumOfNetDefExtraAdj end),
Sum(1) AS Expr1
FROM qryPolicyListforNYDefPRemAsset_Re RIGHT JOIN qryReinsuranceDPA1
ON
qryReinsuranceDPA1.POLICY_NO = qryPolicyListforNYDefPRemAsset_Re.POLICY_NO AND
qryReinsuranceDPA1.PHASE_CODE= qryPolicyListforNYDefPRemAsset_Re.PHASE_CODE AND
qryReinsuranceDPA1.SUB_PHASE_CODE = qryPolicyListforNYDefPRemAsset_Re.SUB_PHASE_CODE
GROUP BY qryReinsuranceDPA1.POLICY_NO,
qryReinsuranceDPA1.PHASE_CODE,
qryReinsuranceDPA1.SUB_PHASE_CODE,
qryReinsuranceDPA1.ProdType
--HAVING (((Sum(1))<>1))
GO
And this is a small sample of what it produces (the actual results number around 77,000 rows):
POLICY_NO PHASE_CODE SUB_PHASE_CODE ProdType TotalDPA Expr1
228433800 0 1 TERM 282.324223 1
228439200 0 1 PERM 53.17048634 1
228439200 6 1 PERM 10.3805065 1
228441500 0 1 PERM 526.6883742 1
228441500 0 2 PERM 10.63320899 1
228441700 0 1 PERM 20.86247317 1
228448100 0 1 PERM 345.2117169 1
228460200 0 1 TERM 302.7574933 1
228464900 0 1 TERM 191.2597906 1
228468000 0 1 PERM 8445.190912 1
228473600 0 1 TERM 339.8413682 **2**
228473800 0 1 TERM 686.1766864 **2**
228477200 0 1 TERM 583.7580207 1
228481200 0 1 TERM 362.9472595 1
228481200 0 2 PERM 4.217792443 1
228482500 0 1 PERM 1894.303507 1
228482500 1 1 TERM 1312.183889 1
228491600 0 1 TERM 325.0796843 **2**
228494400 0 1 PERM 748.2710255 1
228501000 0 1 TERM 47.78070676 1
228501100 0 1 TERM 47.78070676 1
228501300 0 1 PERM 365.5651862 1
228501300 0 2 PERM 12.20547324 1
228501300 1 1 TERM 706.0961491 1
228501300 1 2 PERM 12.46769547 1
228502000 0 1 PERM 6562.164879 1
228502000 0 2 PERM 184.7741277 1
The right most column is the result of the Sum(1) and what I want to know is when and why does it produce a 2.
sum(1) adds 1 for every row in the group, so in a GROUP BY query like this one it returns the same number as count(*): a count of all the rows within the group.
It will therefore return a value of 2 when, for a given value for each of POLICY_NO, PHASE_CODE, SUB_PHASE_CODE and ProdType there are two rows in the selected dataset (before grouping).
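A small Python sketch of the same grouping may make this easier to picture. The rows below are hypothetical (made-up keys, not taken from the tables in the question) and stand for the joined rows before grouping; the point is only that adding 1 per row and counting rows give the same number for each group.

```python
from itertools import groupby

# Hypothetical joined rows before grouping:
# (POLICY_NO, PHASE_CODE, SUB_PHASE_CODE, ProdType)
joined_rows = sorted([
    ("P1", 0, 1, "TERM"),
    ("P2", 0, 1, "PERM"),
    ("P1", 0, 1, "TERM"),  # same key as the first row, e.g. two matches in the join
    ("P2", 6, 1, "PERM"),
])

results = {}
for key, group in groupby(joined_rows):
    members = list(group)
    sum_of_ones = sum(1 for _ in members)  # what SUM(1) computes
    row_count = len(members)               # what COUNT(*) computes
    results[key] = (sum_of_ones, row_count)

for key, (sum_of_ones, row_count) in results.items():
    print(key, sum_of_ones, row_count)
```

Here the key ("P1", 0, 1, "TERM") appears twice before grouping, so both numbers are 2 for that group, just like the rows showing 2 in the Expr1 column.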

Matlab counting elements in array

Hey guys I just have a quick question regarding counting elements in an array.
the array is something like this
B = [1 0 1 0 0 -1; 1 1 1 0 -1 -1; 0 1 -1 0 0 1]
From this array I want to create an array structure called column counts and another called row counts. I really do want to create an array structure, even if it is a less efficient process.
Basically I want to go through the array and, for each row and each column, total the number of times each of these values occurs. For instance, for the first row I want the following output.
Row Counts
-1 0 1
1 3 2
thanks in advance
You can use the hist function to do this.
fprintf('Row counts\n');
disp([-1 0 1])
fprintf('\n')
for row = 1:size(B,1)
    % count the -1s, 0s and 1s in this row of B
    disp(hist(B(row,:), [-1 0 1]));
end
yields
Row counts
-1 0 1
1 3 2
2 1 3
1 3 2
I don't fully understand your question, but if you want to count the occurrences of an element in a Matlab array you can do something like:
% Find value 3 in array A
A =[ 1 4 5 3 3 1 2 4 2 3 ];
count = sum( A == 3 )
Comparing A == 3 makes Matlab fill an array with 0s and 1s, where a 1 means that the element at that position in A is the value you were looking for. So you can count the occurrences by summing the values in the array A == 3.
Edit: you can access the different dimensions like that:
A = [ 1 2 3 4; 1 2 3 4; 1 2 3 4 ]; % 3rows x 4columns matrix
count1 = sum( A(:,1) == 2 ); % count occurrences in the first column
count2 = sum( A(:,3) == 2 ); % ' ' third column
count3 = sum( A(2,:) == 2 ); % ' ' second row
You always access given rows or columns like that.
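As a rough cross-check of the comparison-and-sum idea, here is the same counting written as a short Python sketch for the matrix B from the question. The names values, row_counts and column_counts are chosen here for illustration; each inner list holds the counts of -1, 0 and 1 in that order.

```python
B = [
    [1, 0, 1, 0, 0, -1],
    [1, 1, 1, 0, -1, -1],
    [0, 1, -1, 0, 0, 1],
]
values = (-1, 0, 1)

# For each row, compare every element with each value and sum the matches.
row_counts = [[sum(x == v for x in row) for v in values] for row in B]
# zip(*B) walks B column by column.
column_counts = [[sum(x == v for x in col) for v in values] for col in zip(*B)]

print(row_counts)
print(column_counts)
```

The first entry of row_counts is [1, 3, 2], which is the "Row Counts" line asked for in the question.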
