In a data step in which I have defined an array, I can use the SUM function, but the COUNT function doesn't work. How can I count the number of values that are not zero within an array?
SUM_ARRAY = sum(of A1-A20);     /* works */
COUNT_ARRAY = count(of A1-A20); /* error: "The COUNT function call has too many arguments" */
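COUNT is a character function. It counts how many times a substring occurs in a string, so it only accepts a string, a substring and optional modifiers; that is why twenty numeric arguments are rejected. The function you probably want instead is N, which counts nonmissing numeric values. DIM and HBOUND return the number of elements in the array.
Unfortunately, none of these counts specific values; N only excludes missing values.
Looping through the array is one way to count the values that are not 0: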
array _a(*) a1-a20;
count0 = 0;                                  /* number of values that are not 0 */
do i = 1 to dim(_a);
   if _a(i) ne 0 then count0 = count0 + 1;   /* note: a missing value also passes ne 0 */
end;
COUNT can be coerced to do this, if your data is agreeable. I'm not sure it's better than the loop operation timewise or structurally, but it's at least an interesting solution.
Basically, we join the values with ; as the delimiter, add a ; at the start and the end, count the occurrences of ;0;, and subtract that count from the total number of elements.
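data _null_;
   call streaminit(7);
   array a[20];
   do _i = 1 to 20;
      a[_i] = max(0, rand('Uniform') - 0.3);   /* roughly 30% of values are 0 */
   end;
   put a[*]=;
   /* ';;' keeps adjacent zeros from sharing a delimiter: SAS leaves the
      COUNT result undefined when occurrences of ';0;' overlap */
   nonzero = dim(a) - count(';' || catx(';;', of a[*]) || ';', ';0;');
   put nonzero=;
run;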
Related
How can I delete all-zero pages from a 3D matrix in a loop?
I have come up with the following code, though it is not 'entirely' correct, if at all. I am using MATLAB 2019b.
%pseudo data
x = zeros(3,2,2);
y = ones(3,2,2);
positions = 2:4;
y(positions) = 0;
xy = cat(3,x,y); %this is a 3x2x4 array; (:,:,1) and (:,:,2) are all zeros,
% (:,:,3) is ones and zeros, and (:,:,4) is all ones
%my aim is to delete the arrays that are entirely zeros i.e. xy(:,:,1) and xy(:,:,2),
%and this is what I have come up with; it doesn't delete the arrays but instead,
%all the ones.
for ii = 1:size(xy,3)
    for idx = find(xy(:,:,ii) == 0)
        xy(:,:,ii) = strcmp(xy, []);
    end
end
Use any to find indices of the slices with at least one non-zero value. Use these indices to extract the required result.
idx = any(any(xy)); % idx = any(xy,[1 2]); for >=R2018b
xy = xy(:,:,idx);
I am unsure what you'd expect your code to do, especially given you're comparing strings in all-numerical arrays. Here's a piece of code which does what you desire:
x = zeros(3,2,2);
y = ones(3,2,2);
positions = 2:4;
y(positions) = 0;
xy = cat(3,x,y);
idx = true(size(xy,3),1);          % initialise: keep every page by default
for ii = 1:size(xy,3)
    if nnz(xy(:,:,ii)) == 0        % the page (third dimension) is all zeros
        idx(ii) = false;           % exclude it
    end
end
xy = xy(:,:,idx);                  % reindex to get rid of all-zero pages
The trick here is that nnz(xy(:,:,ii)) is zero iff all elements on the given page (third dimension) are zero. Testing sum(xy(:,:,ii),'all')==0 instead would also drop pages whose values cancel out, such as a page holding 1 and -1. When a page is all zeros, exclude it from idx. Then, in the last row, simply re-index with logical indexing to retain only pages with at least one nonzero element.
You can do it even faster, without a loop, by using the vector-dimension form sum(a,[1 2]) (R2018b or later) on a logical nonzero mask:
idx = sum(xy ~= 0, [1 2]) ~= 0;
xy = xy(:,:,idx);
I have an array sorted in ascending order. I want to replace the m largest numbers found at even positions in the array with 0.
My algorithm that I thought looks like this:
k=1;
for j=n:1 %%starting from last position to first
    if(rem(j,2)==0 && (k<=m)) %%checking if the position is even & not getting over m numbers
        B(j) = 0;
        k = k + 1;
    end
end
Can anyone point out why it is not working? Thank you!
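Your loop never runs: for n > 1, n:1 is an empty range because the default step is +1, so it would have to be for j = n:-1:1. A loop-free alternative: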
even = (n-rem(n,2)) : -2 : 1; % even indices in descending order
B( even(1:m) ) = 0; % set to zero
Note how n-rem(n,2) ensures that we start from the last even index into B.
PS: It is best not to use j as a variable name in MATLAB, since it denotes the imaginary unit (as does i).
I believe this should do the trick. It works for vectors with both an odd and an even number of elements (M is the number of entries to zero, the question's m).
n = numel(B);
B((n-mod(n,2)):-2:(n-mod(n,2)-2*(M-1))) = 0;
or
n = mod(numel(B),2);
B(end-n:-2:end-n-2*(M-1)) = 0;
I prefer Shai's solution, but if your vector is huge and M is relatively small, I would go with this approach, as it avoids creating a vector of length numel(B)/2.
I have an array of variables and an array of flags, both of length 77. For every observation, the array of flags consists of consecutive 0's followed by consecutive 1's (i.e., once a flag is 1, every flag at a later index is also 1). I am trying to calculate the mean/std/min/max of the variables whose corresponding flag is 0. This is my macro:
%macro meanof_precancel(input, meanstat);
   j = 77;
   do i = 1 to 77;
      if cancelled_{i} = 1 then do;
         j = i - 1;
         call symputx('lastactive', j);
         leave;
      end;
   end;
   if j = 0 then &meanstat = 0;
   else &meanstat = mean(of &input.1-&input.&lastactive);
%mend;
I am having difficulty finding out how to resolve the line:
else &meanstat = mean(of &input.1-&input.&lastactive);
Does anybody have a strategy to resolve it to something like the following, for j = 33:
else mean_stats = mean(of total_1-total_33);
Thanks in advance.
I used another approach in the end, although it requires creating 77 new variables. I created a new array that sets each value to missing whenever its corresponding flag is 1, and took the mean of this new array. For those interested:
%macro meanof_precancel(input, meanstat);
   array &input.temp{77};
   do i = 1 to 77;
      if not cancelled_{i} then
         &input.temp{i} = &input{i};
      else &input.temp{i} = .;
   end;
   &meanstat = mean(of &input.temp{*});
%mend;
As you figured out, you can only apply the function to the entire array, and the reason you were having issues with:
else &meanstat = mean(of &input.1-&input.&lastactive);
is that CALL SYMPUTX only runs when the DATA step executes, whereas &lastactive is resolved earlier, when the step is compiled, so the macro variable does not yet hold the value you need.
Here is the corresponding SAS documentation:
Problem Trying to Reference a SYMPUT-Assigned Value Before It Is Available
One of the most common problems in using SYMPUT is trying to reference a macro variable value assigned by SYMPUT before that variable is created. The failure generally occurs because the statement referencing the macro variable compiles before execution of the CALL SYMPUT statement that assigns the variable's value. The most important fact to remember in using SYMPUT is that it assigns the value of the macro variable during program execution, but macro variable references resolve during the compilation of a step, a global statement used outside a step, or an SCL program.
As a result:
• You cannot use a macro variable reference to retrieve the value of a macro variable in the same program (or step) in which SYMPUT creates that macro variable and assigns it a value.
http://support.sas.com/documentation/cdl/en/mcrolref/61885/HTML/default/viewer.htm#a000210266.htm
This approach is destructive to the original data, so be careful, but it lets you calculate std/mean/min/max etc. from the original array.
%macro precancel_stat(input, statvar, stat);
   j = 77;
   do i = 1 to 77;
      if cancelled_{i} = 1 then do;
         j = i - 1;
         do k = i to 77;
            &input.{k} = .;   /* blank out every value from the first cancel on */
         end;
         i = 77;              /* force the outer loop to end */
      end;
   end;
   if j = 0 then &statvar = 0;
   else &statvar = &stat.(of &input.{*});
%mend;
/* test dataset */
data test;
   array sum_me{77} sum1 - sum77;
   array cancelled_{77} cancelled1 - cancelled77;
   do k = 1 to 10;
      do i = 1 to 77;
         sum_me{i} = i;
         if i lt 33+k then cancelled_{i} = 0; else cancelled_{i} = 1;
      end;
      output;
   end;
run;
/* test the macro calls */
data testit;
   set test (drop= i k);
   array sum_me{77} sum1 - sum77;
   array cancelled_{77} cancelled1 - cancelled77;
   %precancel_stat(sum_me,meanstat,mean);
   %precancel_stat(sum_me,StDev,STD);
   %precancel_stat(sum_me,MinVal,Min);
   %precancel_stat(sum_me,MaxVal,Max);
   %precancel_stat(sum_me,SumVal,sum);
run;
proc print data=testit;
run;
You can't use call symput that way, because the timing is wrong; you need to know the value of &lastactive. during compilation, but you don't actually know it until the data has been looked at.
You can certainly do this with a helper array. I would use a temporary array for this purpose, if you're going to do it that way (array &input.temp[77] _temporary_;) as it won't be written out uselessly to the final dataset and resides only in memory.
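Honestly, you might be best off just having two variables: the mean variable and a counter (your j is that already, basically). Instead of putting the values in the temporary array, start meanvar at 0 for each observation and, for each element before the first cancelled flag, just
meanvar=meanvar+input[i];
And then at the end of the loop (guarding against j = 0, as your macro already does)
if j > 0 then meanvar=meanvar/j;
That seems more efficient, although it only covers the mean; std/min/max would each need their own running variables.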
I want to test a function func(par1,par2,par3) with all combinations of the parameters par1, par2 and par3 and store the output in a .mat file. My code looks like this at the moment:
n1 = 3;
n2 = 1;
n3 = 2;
parList1 = rand(1,n1); % n1,n2,n3 is just some integer
parList2 = rand(1,n2); % the lists are edited by hand in the actual script
parList3 = rand(1,n3);
saveFile = matfile('file.mat','Writable',true);
% allocate memory
saveFile.output = NaN(numel(parList1),numel(parList2),numel(parList3));
counter1 = 0;
for par1 = parList1
    counter1 = counter1 + 1;
    counter2 = 0; % reset inner counter
    for par2 = parList2
        counter2 = counter2 + 1;
        counter3 = 0; % reset inner counter
        for par3 = parList3
            counter3 = counter3 + 1;
            saveFile.output(counter1,counter2,counter3) = sum([par1,par2,par3]);
        end
    end
end
This works except when parList3 has only one item, i.e. when n3 = 1. Then saveFile.output has a singleton dimension and I get the error
Variable 'output' has 2 dimensions in the file, this does not match the 3 dimensions in the indexing subscripts.
Is there an elegant way to fix this?
The expression in the for statement needs to be a row array, not a column array as in your example. The loops will exit after the first value with your code. Set a breakpoint on the saveFile.output command to see what I mean. With a column array, par1 will not be a scalar as desired, but the whole parList1 column. With a row array, par1 will iterate through each value of parList1 as intended
Another thing is that you need to reset your inner counters (counter2 and counter3), or your second and third dimensions will grow larger than you expected.
The n3=1 problem is expected behavior, because matfile defines each variable with a fixed number of dimensions and will treat saveFile.output as 2D. Once you have fixed those issues, you can solve the n3=1 problem by changing the line
saveFile.output(counter1,counter2,counter3) = sum([par1,par2,par3]);
to
if n3==1, saveFile.output(counter1,counter2) = sum([par1,par2,par3]);
else saveFile.output(counter1,counter2,counter3) = sum([par1,par2,par3]);
end
I have since realized that in MAT-files all trailing singleton dimensions beyond the first two are removed.
In my actual program, I decided to save the data in the file linearly and to work around matfile's lack of linear indexing by using the functions sub2ind and ind2sub.
I want to merge two arrays into one by interleaving their elements. For example:
A1= 1,1
A2= 2,2
then A3 = 1,2,1,2
For example:
A1= 1
A2= 2,2,2,2
then A3 = 1,2,2,2,2
For example:
A1= 1,1,1,1
A2= 2,2
then A3 = 1,2,1,2,1,1
In the last example, when I ran my code, I got 1,2,1,2,1,20.
In the second-to-last, I got 1,2,32767,2,2.
So I guess my code is wrong. Once I have taken all the elements of the shorter array, I fill the rest of A3 with the remaining elements of the longer one. But I couldn't figure out what goes wrong; can you help me?
code:
int *p3 = arr3; // arr3 is A3, arr1 is A1, etc.; all sizes are defined
int index;
int index1 = 0;
int index2 = 0;
for (index = 0; index < sizeofArr3; index++)
{
    if (index % 2 == 0)
    {
        if (index1 <= sizeofArr1)
            *(p3++) = arr1[index1++];
        else
            *(p3++) = arr2[index2++];
    }
    else
    {
        if (index2 <= sizeofArr2)
            *(p3++) = arr2[index2++];
        else
            *(p3++) = arr1[index1++];
    }
}
It's this line:
if (index1 <= sizeofArr1)
and the equivalent one for index2 and sizeofArr2. You should be using < rather than <=.
The reason has to do with C's zero-based arrays. For an array of size N, the element indexes are 0 through N-1 inclusive. Because you're allowing it to access element N (the (N+1)th element), you're actually invoking undefined behaviour.
Theoretically, the implementation can do anything in that case, up to and including destruction of the universe. I guess you're lucky that it just decided to give you results that were slightly awry :-)
Should <= sizeofArr1 and <= sizeofArr2 actually be < sizeofArr1 and < sizeofArr2? How are you calculating your sizes?
The tests in the loops should be:
if (index1 < sizeofArr1)
with < rather than <=, assuming that the sizeofArr1 is a count of the number of elements in the array, rather than the maximum valid index in the array. When the arrays are the same length, this discrepancy doesn't matter (so the first sequence was OK), but when the arrays are different lengths, it does matter.