Creating Dataset with random Values in SAS - loops

I want to create a random dataset, something like this:
ptno visits sex race
1 1 1 0
1 2 1 0
1 3 1 0
2 1 2 1
2 2 2 1
2 3 2 1
3 1 1 0
3 2 1 0
3 3 1 0
The values should be randomly generated. I want to know if I can do this dynamically using do loops. Thanks in advance for helping.

data want ;
length ptno visits sex race 8. ;
do ptno = 1 to 100 ;
_visits = ceil(ranuni(0)*5) ; /* between 1 & 5 */
sex = ceil(ranuni(0)*2) ; /* between 1 & 2 */
race = floor(ranuni(0)*2) ; /* between 0 & 1 */
do visits = 1 to _visits ;
output ;
end ;
end ;
drop _visits ;
run ;
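For readers who want to check the nested-loop logic outside SAS, here is a minimal Python sketch of the same idea: draw the per-patient values once, then emit one row per visit. It is only an illustration. The function name `make_patients` and its `seed` parameter are assumptions and do not appear in the answer above, and Python's generator will not reproduce the numbers SAS would give.

```python
import random


def make_patients(n_patients=100, max_visits=5, seed=None):
    """Return one dict per (patient, visit), mirroring the SAS data step."""
    rng = random.Random(seed)  # hypothetical seed parameter, for repeatable runs
    rows = []
    for ptno in range(1, n_patients + 1):
        # Per-patient values are drawn once, outside the visit loop.
        n_visits = rng.randint(1, max_visits)  # 1..max_visits inclusive
        sex = rng.randint(1, 2)                # 1 or 2
        race = rng.randint(0, 1)               # 0 or 1
        for visit in range(1, n_visits + 1):
            rows.append({"ptno": ptno, "visits": visit, "sex": sex, "race": race})
    return rows


if __name__ == "__main__":
    for row in make_patients(n_patients=3, seed=42):
        print(row)
```

Because sex and race are drawn before the inner loop, every visit of a patient carries the same values, which is what the desired output in the question shows.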

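The SAS routine call ranuni(seed, x) produces a random variate from a uniform distribution on (0, 1). Below, sex is 2 if x is greater than 0.5 and 1 otherwise, and race is 1 if y is less than 0.5 and 0 otherwise. Because the seed is rebuilt from ptno (i) on every iteration, the same ptno always gets the same sex and race across all of its visits.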
data want;
do i=100 to 110;
do j=1 to 5;
seed1=i+4567;
call ranuni(seed1,x);
seed2=i+1234;
call ranuni(seed2,y);
ptno=i;
visit=j;
sex=(x>0.5)+1;
race=(y<0.5);
output;
end;
end;
keep ptno--race;
run;
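The key idea in this answer is that a seed derived from the patient number makes the draw repeatable for that patient. A rough Python sketch of that idea follows. The name `patient_attributes` is hypothetical, the offsets 4567 and 1234 are copied from the SAS code, and the actual values will differ from SAS because the generators are not the same.

```python
import random


def patient_attributes(ptno):
    """Derive sex and race from generators seeded by the patient number."""
    x = random.Random(ptno + 4567).random()  # uniform on [0, 1)
    y = random.Random(ptno + 1234).random()
    sex = 2 if x > 0.5 else 1   # same rule as sex=(x>0.5)+1
    race = 1 if y < 0.5 else 0  # same rule as race=(y<0.5)
    return sex, race


if __name__ == "__main__":
    for ptno in range(100, 103):
        for visit in range(1, 4):
            # The same ptno yields the same sex and race on every visit.
            print(ptno, visit, *patient_attributes(ptno))
```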

Related

MATLAB - Fill a zeros matrix based on an array

So I have this data:
F =
1
1
2
3
1
2
1
1
and a zeros matrix:
NM =
0 0 0
0 0 0
0 0 0
I have a rule: each pair of consecutive elements in F forms a connection. From the F data, the connections should be
1&1, 1&2, 2&3, 3&1, 1&2, 2&1, 1&1
Each connection gives a row and a column index into the NM matrix, and every time a connection occurs, the value at that position must be increased by 1.
So from the connections above, the new matrix should be
NNM=
2 2 0
1 0 1
1 0 0
I'm trying to code it like this:
[G H]=size(NM)
for i=1:G
j=2:G
if F(i)==A(j)
(NM(i,j))+1
else
NM(i,j)=0
end
end
NNM=NM
But the NM matrix does not change.
What should I do?
Is this what you are trying to do?
F = [1 1 2 3 1 2 1 1];
NM = zeros(3, 3);
for i=1:(numel(F)-1)
NM(F(i), F(i+1))=NM(F(i), F(i+1))+1;
end
You can use sparse (and then convert to full) as follows:
NM = full(sparse(F(1:end-1), F(2:end), 1));
% Each row of list is one (row, column) connection.
list = [1 1 ; 1 2 ; 2 3 ; 3 1 ; 1 2 ; 2 1 ; 1 1 ] ;
nx = size(list, 1) ;
NM = zeros(3) ;
for i = 1:nx
    % One pass per connection is enough; add 1 at its position.
    NM(list(i,1),list(i,2)) = NM(list(i,1),list(i,2)) + 1 ;
end
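To cross-check the expected matrix from the question, the consecutive-pair counting can also be sketched in Python. This is an illustration under assumptions: the function name `count_transitions` is made up here, and indices are shifted by one because Python lists start at 0 while MATLAB starts at 1.

```python
def count_transitions(values, size):
    """Count how often each value is directly followed by each other value."""
    counts = [[0] * size for _ in range(size)]
    # zip pairs every element with its successor: (F1, F2), (F2, F3), ...
    for current, following in zip(values, values[1:]):
        counts[current - 1][following - 1] += 1  # shift 1-based values to 0-based
    return counts


if __name__ == "__main__":
    F = [1, 1, 2, 3, 1, 2, 1, 1]
    for row in count_transitions(F, 3):
        print(row)
```

For the F in the question this gives the rows 2 2 0, 1 0 1 and 1 0 0, matching NNM.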

SAS - Find and print first non-zero value from a dataset in columns

I have a data set with ID in rows and months in columns, as the one shown below.
I want to create an auxiliary column that records the first non-zero value in each row.
ID M1 M2 M3 M4 M5 Auxiliary column
1 0 0 8 8 7 8
2 7 7 7 . . 7
3 0 0 0 0 9 9
4 0 9 9 9 8 9
5 1 1 1 1 1 1
6 0 2 2 1 1 2
Currently I am using this code, but I haven't been able to get the results I am looking for. Any ideas?
data new_ops04;
set new_ops03;
array MONTHS (24) M1-M24;
RETAIN AUXILIARY_COLUMN 0;
do i=1 to 24;
IF MONTHS(i) ne 0 and AUXILIARY_COLUMN = 0 THEN
AUXILIARY_COLUMN = MONTHS(i);
end;
drop i;
run;
Thanks a lot!
You're very close. Just drop the retain statement:
data new_ops04;
set new_ops03;
array MONTHS (24) M1-M24;
AUXILIARY_COLUMN = 0;
do i=1 to 24;
IF MONTHS(i) ne 0 and AUXILIARY_COLUMN = 0 THEN
AUXILIARY_COLUMN = MONTHS(i);
end;
drop i;
run;
You also need to consider what happens if the first value(s) are missing: a missing value (.) is not equal to 0, so it would be picked up as the first non-zero value.
I would handle this use case in PROC SQL. But the problem in your code is that the loop does not stop when it reaches the first non-zero value. So:
flag = 0;
do i=1 to 24 until (flag);
    /* the do ... end block makes both assignments conditional */
    if MONTHS(i) ne 0 then do;
        AUXILIARY_COLUMN = MONTHS(i);
        flag = 1;
    end;
end;
drop i flag;
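The "stop at the first hit" logic, together with the point about missing values, can be sketched in Python as follows. This is only an illustration: `first_nonzero` is a hypothetical name, and `None` stands in for a SAS missing value (.). Unlike a plain `ne 0` test, this version skips missing values explicitly.

```python
def first_nonzero(values):
    """Return the first value that is neither missing (None) nor zero."""
    for value in values:
        if value is not None and value != 0:
            return value  # returning here stops the scan at the first hit
    return None  # every value was zero or missing


if __name__ == "__main__":
    table = {
        1: [0, 0, 8, 8, 7],
        2: [7, 7, 7, None, None],
        3: [0, 0, 0, 0, 9],
    }
    for row_id, months in table.items():
        print(row_id, first_nonzero(months))
```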

Retain the value of a variable within a group in SAS

I want to create a variable Var2 that becomes 1 at the first observation where Var1 is equal to 1 and then stays 1 until the end of the by group defined by ID.
Here is the minimal working example:
ID Year Var1
1 1 .
1 2 0
1 3 .
1 4 1
1 5 .
And I want to create the following output:
ID Year Var1 Var2
1 1 . .
1 2 0 0
1 3 . 0
1 4 1 1
1 5 . 1
My current code is as follows:
DATA data1;
SET data0;
BY ID YEAR ;
IF LAST.ID THEN END = _N_;
IF Var1 > 0 THEN CNT=_N_;
RUN;
DATA data2;
SET data1;
BY ID YEAR ;
Var2 = 0;
IF Var1 = 1 THEN DO;
DO I = CNT TO END;
Var[I] = 1;
END;
END;
RUN;
However, SAS does not loop along observations.
I'm not sure what your example is doing, but this is fairly straightforward.
data want;
set have;
by id;
retain var2;
if first.id then var2=0;
if var1=1 then var2=1;
run;
Retain var2 to keep its value across observations, and then set it to 1 when you see a 1 in var1; finally, set it to 0 when you see a first.id row.
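The retain pattern amounts to carrying one piece of state from row to row and resetting it at each new ID. A minimal Python sketch of that pattern is shown below. It assumes the rows are already sorted by ID, uses `None` for a missing Var1, and the name `flag_after_first_one` is made up for illustration.

```python
def flag_after_first_one(rows):
    """rows is a list of (id, var1) pairs sorted by id; returns var2 per row."""
    result = []
    current_id = object()  # sentinel that matches no real id
    var2 = 0
    for row_id, var1 in rows:
        if row_id != current_id:  # plays the role of first.id
            current_id = row_id
            var2 = 0
        if var1 == 1:
            var2 = 1  # stays 1 until the next id starts
        result.append(var2)
    return result


if __name__ == "__main__":
    have = [(1, None), (1, 0), (1, None), (1, 1), (1, None)]
    print(flag_after_first_one(have))
```

Like the SAS answer, this yields 0 rather than a missing value for rows before the first non-missing Var1.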

Shuffle, then find and replace duplicates in two dimensional array - without sorting

I'm looking for an efficient algorithm (or any at all) for this tricky problem. I'll simplify it here; in my application, the array is about 10000 times bigger :)
I have a 2D array like this:
0 2 1 3 4
1 2 0 4 3
0 2 1 3 4
4 1 2 3 0
Yes, every row contains the values 0 to 4, but in a different order. The order matters! I can't just sort it and solve this the easy way :)
Then I shuffle it by choosing random indexes and swapping them a couple of times. Example result:
0 1 1 1 4
1 2 2 4 3
0 2 3 3 4
4 2 0 3 0
Now I see duplicates in the rows, which is not good. The algorithm should find these duplicates and replace them with values that do not create another duplicate in that row, for example:
0 1 2 3 4
1 2 0 4 3
0 2 3 1 4
4 2 0 3 1
Can you share your ideas? Maybe there is already a well-known algorithm for this problem? I'd be grateful for any hint.
EDIT
Clarification for T_G: After the shuffle, a row can't exchange values with other rows. The algorithm needs to find the duplicates and replace each one with any remaining value that does not create another duplicate.
After shuffling:
0 1 1 1 4
1 2 2 4 3
0 2 3 3 4
4 2 0 3 0
Steps:
I have 0; I don't see any other zeros. Next.
I have 1; I see another 1; I should change it (the second one); there is no 2 in this row, so let's change this duplicate 1 to 2.
I have 1; I see another 1; I should change it (the second one); there is no 3 in this row, so let's change this duplicate 1 to 3. And so on.
So if you input this row:
0 0 0 0 0 0 0 0 0
You should get:
0 1 2 3 4 5 6 7 8
Try something like this:
// Requires <stdint.h> and <string.h>
// Iterate over the matrix line by line
for(uint32_t line_no = 0; line_no < max_line_num; line_no++) {
    // Counters for each symbol 0-4; index is the symbol, value is its count
    uint8_t counters[5];
    // Clear counters before use (memset takes destination, value, size)
    memset(counters, 0, sizeof(counters));
    // Compute counters over the 5 columns of this line
    for(int i = 0; i < 5; i++)
        counters[matrix[line_no][i]]++;
    // Index of a possibly unused symbol; by default it is 4
    int j = 4;
    // Iterate over the line in reversed order
    for(int i = 4; i >= 0; i--)
        if(counters[matrix[line_no][i]] > 1) { // found dup
            while(counters[j] != 0) // find unused symbol "j"
                j--;
            counters[matrix[line_no][i]]--; // decrease dup counter
            matrix[line_no][i] = j; // substitute dup with symbol j
            counters[j]++; // this symbol j is now used
        } // for + if
} // for lines
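The steps in the question keep the first occurrence of each value and replace later duplicates with the smallest value not yet present, which is a slightly different order from the reverse scan above. A minimal Python sketch of the question's variant follows. It assumes every row of length n contains only values from 0 to n-1, and the function name `fix_duplicates` is hypothetical.

```python
def fix_duplicates(row):
    """Replace repeated values with the unused values, smallest first."""
    n = len(row)
    # Values in 0..n-1 that never appear in the row, in ascending order.
    unused = iter(sorted(set(range(n)) - set(row)))
    seen = set()
    fixed = []
    for value in row:
        if value in seen:
            value = next(unused)  # a later duplicate gets the next free value
        seen.add(value)
        fixed.append(value)
    return fixed


if __name__ == "__main__":
    shuffled = [
        [0, 1, 1, 1, 4],
        [1, 2, 2, 4, 3],
        [0, 2, 3, 3, 4],
        [4, 2, 0, 3, 0],
    ]
    for row in shuffled:
        print(fix_duplicates(row))
```

Each row is processed in a single pass with set lookups, so the cost grows roughly linearly with the row length apart from sorting the unused values.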

Counting the occurrences of each unique number in an array - MATLAB

I have an array that looks something like...
1 0 0 1 2 2 1 1 2 1 0
2 1 0 0 0 1 1 0 0 2 1
1 2 2 1 1 1 2 0 0 1 0
0 0 0 1 2 1 1 2 0 1 2
however my real array is (50x50).
I am relatively new to MATLAB and need to be able to count how many times each unique value occurs in each row and column; for example, there are four '1's in row 2 and three '0's in column 3. I need to be able to do this with my real array.
It would help even more if these quantities of unique values were in arrays of their own also.
PLEASE use simple language, or else i will get lost, for example if representing an array, don't call it x, but perhaps column_occurances_array... for me please :)
What I would do is iterate over each row of your matrix and calculate a histogram of occurrences for each row. Use histc to calculate the occurrences of each row. The thing that is nice about histc is that you are able to specify where the bins are to start accumulating. These correspond to the unique entries for each row of your matrix. As such, use unique to compute these unique entries.
Now, I would use arrayfun to iterate over all of your rows in your matrix, and this will produce a cell array. Each element in this cell array will give you the counts for each unique value for each row. Therefore, assuming your matrix of values is stored in A, you would simply do:
vals = arrayfun(@(x) [unique(A(x,:)); histc(A(x,:), unique(A(x,:)))], 1:size(A,1), 'uni', 0);
Now, if we want to display all of our counts, use celldisp. Using your example, and with the above code combined with celldisp, this is what I get:
vals{1} =
0 1 2
3 5 3
vals{2} =
0 1 2
5 4 2
vals{3} =
0 1 2
3 5 3
vals{4} =
0 1 2
4 4 3
What the above display is saying is that for the first row, you have 3 zeros, 5 ones and 3 twos. The second row has 5 zeros, 4 ones and 2 twos and so on. These are just for the rows. If you want to do these for columns, you have to modify your code slightly to operate along columns:
vals = arrayfun(@(x) [unique(A(:,x)) histc(A(:,x), unique(A(:,x)))].', 1:size(A,2), 'uni', 0);
By using celldisp, this is what we get:
vals{1} =
0 1 2
1 2 1
vals{2} =
0 1 2
2 1 1
vals{3} =
0 2
3 1
vals{4} =
0 1
1 3
vals{5} =
0 1 2
1 1 2
vals{6} =
1 2
3 1
vals{7} =
1 2
3 1
vals{8} =
0 1 2
2 1 1
vals{9} =
0 2
3 1
vals{10} =
1 2
3 1
vals{11} =
0 1 2
2 1 1
This means that in the first column, we see 1 zero, 2 ones and 1 two, etc. etc.
I absolutely agree with rayryeng! However, here is some code which might be easier to understand for you as a beginner. It is without cell arrays or arrayfuns and quite self-explanatory:
%% initialize your array randomly for demonstration:
numRows = 50;
numCols = 50;
yourArray = round(10*rand(numRows,numCols));
%% do some stuff of what you are asking for
% find all occurring numbers in yourArray
occVals = unique(yourArray(:));
% unique already returns sorted values, so this sort is only for clarity
occVals = sort(occVals);
% now we can create a matrix occMat_row of dimension |occVals| x numRows
% where occMat_row(i,j) represents how often the ith value occurs in the
% jth row, and analogously occMat_col for the columns:
occMat_row = zeros(length(occVals),numRows);
occMat_col = zeros(length(occVals),numCols);
for k = 1:length(occVals)
occMat_row(k,:) = sum(yourArray == occVals(k),2)';
occMat_col(k,:) = sum(yourArray == occVals(k),1);
end
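As a cross-check on the counts discussed in this thread, the same tallies can be sketched in Python with the standard library. This is an illustration only. The names `count_by_row_and_column`, `row_occurrences` and `column_occurrences` are made up for readability and are not part of either answer.

```python
from collections import Counter


def count_by_row_and_column(array):
    """Return one Counter per row and one Counter per column."""
    row_occurrences = [Counter(row) for row in array]
    # zip(*array) regroups the values column by column.
    column_occurrences = [Counter(column) for column in zip(*array)]
    return row_occurrences, column_occurrences


if __name__ == "__main__":
    example = [
        [1, 0, 0, 1, 2, 2, 1, 1, 2, 1, 0],
        [2, 1, 0, 0, 0, 1, 1, 0, 0, 2, 1],
        [1, 2, 2, 1, 1, 1, 2, 0, 0, 1, 0],
        [0, 0, 0, 1, 2, 1, 1, 2, 0, 1, 2],
    ]
    rows, columns = count_by_row_and_column(example)
    print(rows[1][1])     # number of 1s in the second row
    print(columns[2][0])  # number of 0s in the third column
```

A Counter returns 0 for a value that never occurs, so looking up an absent value does not raise an error.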
