SAS EG: How to compare cell values in an array loop?

I am currently trying to compare cell values on the same row over multiple columns, but having issues with referencing the correct cells.
My data currently is this:
col1 col2 col3 col4 col5 col6
a    b    c    d    e    f
a    b    c    d    e    e
a    b    c    d    d    d
I would like to compare col{i} to col{i+1} and drop values when repeated to give:
col1 col2 col3 col4 col5 col6
a    b    c    d    e    f
a    b    c    d    e    -
a    b    c    d    -    -
My current code is:
data want;
set have;
array c{*} col;
do i = 1 to dim(c);
do j = i+1;
if c{i} = c{j} then .;
else c{i};
end;
end;
run;
TIA

data want;
set have;
array c{*} col:;
do i = dim(c) to 2 by -1; *no reason to check #1;
if c{i} = c{i-1} then call missing(c{i}); *if identical to prior, clear out;
end;
run;
You don't need two loops, just one, since you're only checking the element "before" (or "after", but "before" is easier to reason about, at least for me). Walk backwards from the last element down to element 2, compare each one with the one before it, and if they are identical, clear it out.
Importantly, this goes in reverse order, which is what handles the d d d case above. If you go left to right, col5 is cleared first, so col6 is then compared against a missing value and the last d survives.
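To make the point about direction concrete, here is a small Python sketch of the same comparison. It treats one row as a list and uses None in place of a SAS missing value (both are assumptions for illustration, not SAS behaviour). The two functions differ only in scan direction.

```python
def clear_repeats_right_to_left(row):
    """Blank any value equal to its left neighbour, scanning from the end like the SAS answer."""
    out = list(row)
    for i in range(len(out) - 1, 0, -1):  # dim(c) down to 2, in 0-based terms
        if out[i] == out[i - 1]:
            out[i] = None  # stands in for call missing(c{i})
    return out


def clear_repeats_left_to_right(row):
    """Same comparison scanning forward; once a cell is cleared, its right neighbour sees None."""
    out = list(row)
    for i in range(1, len(out)):
        if out[i] == out[i - 1]:
            out[i] = None
    return out


row = ["a", "b", "c", "d", "d", "d"]
print(clear_repeats_right_to_left(row))  # ['a', 'b', 'c', 'd', None, None]
print(clear_repeats_left_to_right(row))  # ['a', 'b', 'c', 'd', None, 'd']
```

Scanning backwards means each comparison always looks at an element that has not been modified yet, which is why only the reverse loop clears both trailing d values.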

For data containing multiple runs of repeated values, where you want only the distinct consecutive values packed to the left, you will need to track an insertion index.
Example: variable j tracks the insertion point.
data have;
input (col1-col6) ($) #1 (kol1-kol6) ($);
format col: kol: $1.;
datalines;
a b c d e f
a b c d e e
a b c d d d
a a b b c c
a a b b a a
. b b b c d
a a . . c c
run;
data want(keep=col: kol:);
set have;
array c col1-col6;
j = 1;
do i = 2 to dim(c);
if c(i) ne c(j) then do;
j = j + 1;
if i ne j then do;
c(j) = c(i);
call missing(c(i));
end;
end;
end;
do j = j+1 to i-1;
call missing(c{j});
end;
run;
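The insertion-index logic of the want step can be sketched in Python as follows. It is a hypothetical translation for illustration only: lists stand in for the array, None for a missing value, and indices are 0-based. As in the SAS code, a missing value is treated as an ordinary value when runs are compared.

```python
def compact_consecutive(row):
    """Keep one value per run of equal neighbours and pack the kept values to the left.

    j is the insertion point (last kept slot), i scans ahead, mirroring the SAS step.
    """
    c = list(row)
    if not c:
        return c
    j = 0
    for i in range(1, len(c)):
        if c[i] != c[j]:      # c[i] starts a new run
            j += 1
            if i != j:
                c[j] = c[i]   # move the value into the insertion slot
                c[i] = None
    for k in range(j + 1, len(c)):  # clear everything after the last kept value
        c[k] = None
    return c


print(compact_consecutive(["a", "a", "b", "b", "a", "a"]))  # ['a', 'b', 'a', None, None, None]
```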
If you want only the unique values of the array, you can compare each element against the values already kept (a simple nested-loop, "bubbly" search) when the number of elements is smallish, say fewer than 10.
/* uniqueness via a bubbly search */
data want_b;
set have;
array c col1-col6;
j=0;
do i = 1 to dim(c);
if missing(c{i}) then continue;
do k = 1 to j; * bubble, bubble;
if c{k} = c{i} then do;
call missing(c{i});
leave;
end;
end;
if missing(c{i}) then continue;
j = j + 1;
if j < i then do;
c{j} = c{i};
call missing(c{i});
end;
end;
run;
When the number of elements increases you can use a hash to be more efficient whilst ensuring uniqueness.
/* uniqueness via hash lookup */
data want_h(keep=col: kol:);
set have;
array c col1-col6;
if _n_ = 1 then do;
declare hash v();
length value $20; * must be at least as long as longest of c{*} variable ;
v.defineKey('value');
v.defineData('i');
v.defineDone();
call missing(value);
end;
j = 0;
do i = 1 to dim(c);
if not missing(c{i}) then if v.check(key:c{i}) ne 0 then do;
v.add(key:c{i},data:i);
j = j + 1;
if i ne j then
c(j) = c(i);
end;
end;
do j = j+1 to dim(c);
call missing(c{j});
end;
v.clear();
run;
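For comparison, the same "first occurrence wins, pack left" result can be sketched in Python with a built-in set, which is itself a hash table and so plays the role of the SAS hash object v. This is an illustrative translation under the assumption that None represents a missing value; like both SAS versions above, it skips missing values entirely.

```python
def unique_packed(row):
    """Keep the first occurrence of each non-missing value, packed left and padded with None."""
    seen = set()  # plays the role of the hash object v
    kept = []
    for value in row:
        if value is None:        # missing values are skipped
            continue
        if value not in seen:    # like v.check(key:c{i}) ne 0
            seen.add(value)      # like v.add(...)
            kept.append(value)
    return kept + [None] * (len(row) - len(kept))


print(unique_packed(["a", "a", "b", "b", "a", "a"]))  # ['a', 'b', None, None, None, None]
```

Each membership test is an average O(1) hash lookup, which is why this scales better than the nested comparison as the number of elements grows.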

Related

Standard SQL - Not able to loop over main loop

When using nested LOOP or WHILE loops in BigQuery, it seems I am not able to iterate again over the outer loop. The problem can be reproduced with the code below.
DECLARE i int64 DEFAULT 0;
DECLARE j int64 DEFAULT 0;
DECLARE k int64 DEFAULT 0;
WHILE i < 3 DO
SET i = i + 1;
WHILE j < 2 DO
SET j = j + 1;
IF j = 2 THEN
SET k = k+7;
END IF;
EXECUTE IMMEDIATE """
WITH test AS(SELECT @i2 AS i, @j2 AS j, @k2 AS k)
SELECT * FROM test
"""
USING i AS i2, j AS j2, k AS k2;
END WHILE;
END WHILE;
As output, BigQuery gives me back two iterations (of the inner loop):
Row i j k
1   1 1 0

Row i j k
1   1 2 7
I would expect that, when we end the inner while loop, we would go to the outer one and start over.
Ending up in something like:
Row i j k
1   2 2 7
What is the right way to do this? When using the same setup but with a LOOP and BREAK condition, the results are exactly the same as explained above. When using CONTINUE instead of BREAK, my query runs forever / keeps hanging at the second statement.
I would expect that, when we end the inner while loop, we would go to the outer one and start over
It actually behaves exactly as you expected. To check this, run the code below with the extra line and you will see the proof:
DECLARE i int64 DEFAULT 0;
DECLARE j int64 DEFAULT 0;
DECLARE k int64 DEFAULT 0;
WHILE i < 3 DO
SET i = i + 1;
SELECT i; # insert this line to check correctness
WHILE j < 2 DO
SET j = j + 1;
IF j = 2 THEN
SET k = k+7;
END IF;
EXECUTE IMMEDIATE """
WITH test AS(SELECT @i2 AS i, @j2 AS j, @k2 AS k)
SELECT * FROM test
"""
USING i AS i2, j AS j2, k AS k2;
END WHILE;
END WHILE;
So for i = 2 (and i = 3), j is already 2, so WHILE j < 2 DO evaluates to false and the inner loop is skipped.
What is the right way to do this?
It depends on what you are trying to achieve, but usually this is done by resetting j inside the outer loop, as in the example below:
DECLARE i int64 DEFAULT 0;
DECLARE j int64 DEFAULT 0;
DECLARE k int64 DEFAULT 0;
WHILE i < 3 DO
SET i = i + 1;
SET j = 0; # reset j
WHILE j < 2 DO
SET j = j + 1;
IF j = 2 THEN
SET k = k+7;
END IF;
EXECUTE IMMEDIATE """
WITH test AS(SELECT @i2 AS i, @j2 AS j, @k2 AS k)
SELECT * FROM test
"""
USING i AS i2, j AS j2, k AS k2;
END WHILE;
END WHILE;
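To see what each version of the script emits without running BigQuery, here is a Python simulation of the same control flow. It is only a model of the loop logic (the hypothetical reset_j flag switches the SET j = 0 line on or off); each tuple is one row the EXECUTE IMMEDIATE SELECT would return.

```python
def run_nested(reset_j):
    """Simulate the nested WHILE loops and return the (i, j, k) rows the inner loop selects."""
    rows = []
    i = j = k = 0
    while i < 3:
        i += 1
        if reset_j:
            j = 0  # the "SET j = 0" line from the fixed script
        while j < 2:
            j += 1
            if j == 2:
                k += 7
            rows.append((i, j, k))
    return rows


print(run_nested(False))  # [(1, 1, 0), (1, 2, 7)]
print(run_nested(True))   # [(1, 1, 0), (1, 2, 7), (2, 1, 7), (2, 2, 14), (3, 1, 14), (3, 2, 21)]
```

Note that k is never reset, so with the fix it keeps accumulating across outer iterations; reset it alongside j if that is not what you want.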

SQL SERVER -- Finding all Values in column A based on Column B

       Column A  Column B  Column C
Row1   1,3,4     4         0
Row2   2,5,6     6         0
Row3   1,2,3     3         1
What I want is the easiest way to check, using column B, whether column A contains all the values it is supposed to have (1 up to the value in column B, e.g. 1, 2, 3 and 4 when column B is 4), and to return 1 in a new column C if it does.
So for row 1 and row 2, column C would be 0, because neither of them stores all of the values it is supposed to have (row 1 was supposed to have 1,2,3,4 and row 2 was supposed to have 1,2,3,4,5,6). Row 3 column C would be 1, because it has all the values it is supposed to have. I can't just count the number of values either, because sometimes there are repeated values in column A.
I am trying to code this in a way that isn't too long, because I have to do this up to Column B = 100.
Thank you all!
Coming from C/C++/Java-type languages, the simple approach is to look at each value in each row, something like this (not in any particular language):
for (int i = 0; i < NUM_ROWS; i++)
{
int bV = columns['B'][i]
list<int> aV = columns['A'][i]
if (aV.length != bV) {
// If A doesn't have B items in it then
// they can't possibly match.
columns['C'][i] = 0;
}
else {
// have to inspect each element
int ok = 1
for(int j = 1; j <= bV; j++) {
if (aV[j] != j) {
ok = 0
break
}
}
columns['C'][i] = ok
}
}
If your language supports list comparisons and the maximum allowed value in column B isn't too high, you could predefine a list for each value of B and then just compare all the values of A:
if (columns['A'][i] == expectedLists[colB]) {
columns['C'][i]= 1
}
else {
columns['C'][i]= 0
}
You could also do something similar with string representations of the lists.
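The element-by-element comparison in the pseudocode above assumes column A is sorted and has no repeats, but the question says repeats occur. A set-based check avoids both issues. The Python sketch below swaps in that technique; it assumes column A is a comma-separated string and that extra values outside 1..B should simply be ignored (an assumption, since the question does not say).

```python
def has_all_values(a_text, b):
    """Return 1 if the comma-separated list a_text contains every integer 1..b, else 0.

    A set ignores both duplicates and ordering in column A.
    """
    values = {int(part) for part in a_text.split(",") if part.strip()}
    return 1 if set(range(1, b + 1)) <= values else 0  # subset test


for a, b in [("1,3,4", 4), ("2,5,6", 6), ("1,2,3", 3)]:
    print(a, b, has_all_values(a, b))  # C column: 0, 0, 1
```

Because the required set is built from B on the fly, this works the same way for B = 4 or B = 100 with no predefined lists.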

Vectorizing code that complements some elements of a binary array

I have a matrix A of dimension m-by-n composed of zeros and ones, and a matrix J of dimension m-by-1 reporting some integers from [1,...,n].
I want to construct a matrix B of dimension m-by-n such that for i = 1,...,m
B(i,j) = A(i,j) for j=1,...,n-1
B(i,n) = abs(A(i,n)-1)
If sum(B(i,:)) is odd then B(i,J(i)) = abs(B(i,J(i))-1)
This code does what I want:
m = 4;
n = 5;
A = [1 1 1 1 1; ...
0 0 1 0 0; ...
1 0 1 0 1; ...
0 1 0 0 1];
J = [1;2;1;4];
B = zeros(m,n);
for i = 1:m
B(i,n) = abs(A(i,n)-1);
for j = 1:n-1
B(i,j) = A(i,j);
end
if mod(sum(B(i,:)),2)~=0
B(i,J(i)) = abs(B(i,J(i))-1);
end
end
Can you suggest more efficient algorithms that do not use the nested loop?
No for loops are required for your question. It just needs effective use of the colon operator and logical indexing, as follows:
% First initialize B to all zeros
B = zeros(size(A));
% Assign all but last columns of A to B
B(:, 1:end-1) = A(:, 1:end-1);
% Assign the last column of B based on the last column of A
B(:, end) = abs(A(:, end) - 1);
% Set all cells to required value
% Original code which does not work: B(oddRow, J(oddRow)) = abs(B(oddRow, J(oddRow)) - 1);
% Correct code:
% Find all rows in B with an odd sum
oddRow = find(mod(sum(B, 2), 2) ~= 0);
for ii = 1:numel(oddRow)
B(oddRow(ii), J(oddRow(ii))) = abs(B(oddRow(ii), J(oddRow(ii))) - 1);
end
I guess for the last part it is best to use a for loop.
Edit: See the neat trick by EBH to do the last part without a for loop.
Just to add to @ammportal's good answer, the last part can also be done without a loop by using linear indices. For that, sub2ind is useful. Adapting the last part of the previous answer:
% Find all rows in B with an odd sum
oddRow = find(mod(sum(B, 2), 2) ~= 0);
% convert the locations to linear indices
ind = sub2ind(size(B),oddRow,J(oddRow));
B(ind) = abs(B(ind)- 1);
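To show why the linear-index trick works, here is a Python sketch that mimics MATLAB's column-major storage with a flat list. The helper sub2ind below is a hypothetical re-implementation of the MATLAB function's 1-based, column-major formula ((col - 1) * m + row), written for illustration; it is not a Python library call.

```python
def sub2ind(shape, rows, cols):
    """1-based (row, col) pairs -> 1-based column-major linear indices, like MATLAB's sub2ind."""
    m, _ = shape
    return [(c - 1) * m + r for r, c in zip(rows, cols)]


def complement_b(A, J):
    m, n = len(A), len(A[0])
    # B copies A, with the last column complemented
    B = [row[:-1] + [abs(row[-1] - 1)] for row in A]
    # Flatten column-major, the way MATLAB stores B in memory
    flat = [B[r][c] for c in range(n) for r in range(m)]
    odd_rows = [i + 1 for i in range(m) if sum(B[i]) % 2 != 0]   # 1-based, like find(...)
    ind = sub2ind((m, n), odd_rows, [J[r - 1] for r in odd_rows])
    for k in ind:
        flat[k - 1] = abs(flat[k - 1] - 1)  # B(ind) = abs(B(ind) - 1)
    # Rebuild rows from the column-major list
    return [[flat[c * m + r] for c in range(n)] for r in range(m)]


A = [[1, 1, 1, 1, 1], [0, 0, 1, 0, 0], [1, 0, 1, 0, 1], [0, 1, 0, 0, 1]]
J = [1, 2, 1, 4]
print(complement_b(A, J))  # [[1, 1, 1, 1, 0], [0, 0, 1, 0, 1], [1, 0, 1, 0, 0], [0, 1, 0, 1, 0]]
```

With the question's data only row 4 has an odd sum, so only B(4, J(4)) = B(4, 4), linear index 16, is flipped.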

Count items in one cell array in another cell array (MATLAB)

I have 2 cell arrays, "celldata" and "data". Both of them store strings. I would like to check, for each element of "celldata", whether it occurs in each element of "data" or not. For example, celldata = {'AB'; 'BE'; 'BC'} and data = {'ABCD' 'BCDE' 'ACBE' 'ADEBC '}. The expected output would be s=3 and v=1 for AB, s=2 and v=2 for BE, and s=2 and v=2 for BC, because I only need to count whether the characters of each 'celldata' string appear in that order.
The code I wrote is shown below. Any help would be certainly appreciated.
My code:
s=0; % support counter
v=0; % violate counter
SV=[]; % array to store the support
VV=[]; % array to store the violate
pairs = ['AB'; 'BE'; 'BC']
%celldata = cellstr(pairs)
celldata = {'AB'; 'BE'; 'BC'}
data={'ABCD' 'BCDE' 'ACBE' 'ADEBC '} % 3 AB, 2 BE, 2 BC
for jj=1:length(data)
for kk=1:length(celldata)
res = regexp( data(jj),celldata(kk) )
m = cell2mat(res);
e=isempty(m) % check res array is empty or not
if e == 0
s = s + 1;
SV(jj)=s;
v=v;
else
s=s;
v= v+1;
VV(jj)=v;
end
end
end
If I am understanding your variables correctly, s is the number of cells in which the substring AB, BE or BC does not appear, and v is the number in which it does. If this is accurate, then
v = cellfun(@(x) length(cell2mat(strfind(data, x))), celldata);
s = numel(data) - v;
gives
v = [1;1;3];
s = [3;3;1];
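The answer counts contiguous substrings, while the question's "sequence" wording suggests characters that only need to appear in order (so AB would match ACBE). The Python sketch below computes both readings so you can see which one matches what you want. It uses the question's naming (s = strings that match, v = strings that do not), which is the reverse of the answer's naming; the helper names are hypothetical.

```python
def contains_substring(text, pattern):
    """Contiguous match, which is what strfind tests in the answer above."""
    return pattern in text


def contains_subsequence(text, pattern):
    """Ordered but not necessarily adjacent match, e.g. 'AB' in 'ACBE'."""
    it = iter(text)
    return all(ch in it for ch in pattern)  # each 'in' consumes the iterator up to the match


def support_violate(celldata, data, match):
    """Return {pattern: (s, v)}: s strings that match, v strings that do not."""
    result = {}
    for pattern in celldata:
        s = sum(1 for text in data if match(text, pattern))
        result[pattern] = (s, len(data) - s)
    return result


celldata = ["AB", "BE", "BC"]
data = ["ABCD", "BCDE", "ACBE", "ADEBC "]
print(support_violate(celldata, data, contains_substring))    # {'AB': (1, 3), 'BE': (1, 3), 'BC': (3, 1)}
print(support_violate(celldata, data, contains_subsequence))  # {'AB': (3, 1), 'BE': (2, 2), 'BC': (3, 1)}
```

Under the in-order reading AB and BE give the s values stated in the question, while BC comes out as s=3, since ABCD, BCDE and ADEBC all contain a B followed later by a C.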

Finding same value in rows & columns of a 2D array

Hi guys, I want to solve sudoku puzzles in MATLAB. My problem is that I need to find repeated values in every row, every column and every 3x3 subarray.
Our 2D array is 9x9 and populated randomly with values 1-9.
I wrote this to find repeated values in rows, but I don't know how to do it for columns and 3x3 subarrays.
conflict_row = 0;
for i=1:9
temp = 0;
for j=1:9
if (temp==A(i,j))
conflict_row = conflict_row+1;
end
temp = A(i,j);
end
end
Sorry I'm a newbie.
Find values that are present in all columns:
v = find(all(any(bsxfun(@eq, A, permute(1:size(A,1), [3 1 2])),1),2));
Find values that are present in all rows:
v = find(all(any(bsxfun(@eq, A, permute(1:size(A,2), [3 1 2])),2),1));
Find values that are present in all 3x3 blocks: reshape the matrix as in this answer by A. Donda to turn each block into a 3D slice, then reshape each block into a column, and apply the first snippet (values present in all columns):
m = 3; %// columns per block
n = 3; %// rows per block
B = permute(reshape(permute(reshape(A, size(A, 1), n, []), [2 1 3]), n, m, []), [2 1 3]);
B = reshape(B,m*n,[]);
v = find(all(any(bsxfun(@eq, B, permute(1:size(B,1), [3 1 2])),1),2));
Probably not the fastest solution, but why don't you make a function of it and use it once for rows and once for columns:
function conflict_row = get_conflict(A)
% Count adjacent equal values along each row of A
conflict_row = 0;
for i=1:9
temp = 0;
for j=1:9
if (temp==A(i,j))
conflict_row = conflict_row+1;
end
temp = A(i,j);
end
end
end
And then you call it twice
conflict_row = get_conflict(A); % Rows
Transpose A to turn the columns into rows and reuse the same function:
conflict_col = get_conflict(A.');
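The "write it once, reuse it" idea extends to the 3x3 blocks as well: gather each row, column or block into a group and count repeats within each group. The Python sketch below is illustrative only (the function names are hypothetical). Unlike get_conflict, which compares each cell only with its left neighbour, it counts every extra copy of a value within a group, which is what a sudoku check needs.

```python
def count_duplicates(groups):
    """Total number of extra copies of values across all groups (0 means no conflicts)."""
    return sum(len(group) - len(set(group)) for group in groups)


def grid_rows(grid):
    return [list(r) for r in grid]


def grid_columns(grid):
    return [list(col) for col in zip(*grid)]  # transpose, like A.' in MATLAB


def grid_blocks(grid, size=3):
    out = []
    for br in range(0, len(grid), size):
        for bc in range(0, len(grid[0]), size):
            out.append([grid[r][c] for r in range(br, br + size) for c in range(bc, bc + size)])
    return out


def conflicts(grid):
    return {
        "rows": count_duplicates(grid_rows(grid)),
        "columns": count_duplicates(grid_columns(grid)),
        "blocks": count_duplicates(grid_blocks(grid)),
    }


# A known-valid 9x9 grid built from a shifted pattern, then one cell changed
valid = [[(r * 3 + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
print(conflicts(valid))  # {'rows': 0, 'columns': 0, 'blocks': 0}
```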
If you want to work within a single column, you could do something like this (sorry, this is in C#; I don't know what language you are working in):
int currentCol = 0;
foreach (var item in myMultiArray)
{
int currentColValue = item[currentCol];
}
This works because myMultiArray is an array of arrays: the foreach performs the row iteration for you, and you just pick out the column you need with the currentCol value.
