Compute recursion for one column conditionally on values in another column (C)

I was given a dataset named Temp.dat with 2 columns. I first stored it in an array of structures, struct data_t data[100], so that I could sort the rows in increasing order of the first column (column 0 = min(failure time, censoring time); column 1 = 1 for an observed death, 0 for a censored observation). A portion of the sorted dataset has the following form:
0.064295 1
0.070548 1
0.070850 1
0.071508 0
0.077981 1
0.086628 1
0.088239 1
0.090754 1
0.093260 0
0.094090 1
0.094367 1
0.097019 1
0.099336 1
0.103765 1
0.103961 1
0.111674 0
0.122609 0
0.123730 1
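A minimal sketch of the sorting step with the standard qsort, assuming data_t holds the time in a double field x and the status (1 or 0) in a double field y, which is how the code further down reads data[i].x and data[i].y. The names compare_by_time and sort_by_time are hypothetical:

```c
#include <stdlib.h>

/* Assumed layout: x = min(failure time, censoring time), y = 1 (death) or 0 (censored). */
typedef struct {
    double x;
    double y;
} data_t;

/* qsort comparator: order records by increasing time x. */
static int compare_by_time(const void *pa, const void *pb)
{
    const data_t *a = pa;
    const data_t *b = pb;
    if (a->x < b->x) return -1;
    if (a->x > b->x) return 1;
    return 0;
}

/* Sort n records in place by their first column. */
void sort_by_time(data_t *data, size_t n)
{
    qsort(data, n, sizeof data[0], compare_by_time);
}
```

Calling sort_by_time(data, 100) once, before any derived column is built, keeps every later step working on time-ordered rows.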
Now I want to write C code that forms time intervals whose right endpoint is always a row with 1 in the second column (column 1). It should look like the following:
Expected output (a 3rd column, the time interval, added):
0.064295 1 [0 0.064295)
0.070548 1 [0.064295 0.070548)
0.070850 1 [0.070548 0.070850)
0.071508 0 [0.070850 0.077981) ---> Skip 0.071508 here because of the 0 in column 1
0.077981 1 [0.070850 0.077981)
0.086628 1 [0.077981 0.086628)
0.088239 1 [0.086628 0.088239)
0.090754 1 [0.088239 0.090754)
0.093260 0 [0.090754 0.094090)
0.094090 1 [0.090754 0.094090)
0.094367 1 [0.094090 0.094367)
0.097019 1 [0.094367 0.097019)
0.099336 1 [0.097019 0.099336)
0.103765 1 [0.099336 0.103765)
0.103961 1 [0.103765 0.103961)
0.111674 0 [0.103961 0.123730)
0.122609 0 [0.103961 0.123730)
0.123730 1 [0.103961 0.123730)
So far I have not been able to write the code for this, so any help with this step would be sincerely appreciated.
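One way to build the interval column, sketched under this reading of the expected output above: the left endpoint of row i is the time of the latest death (1 in column 1) strictly before row i, or 0 if there is none, and the right endpoint is the time of the first death at or after row i. The function name make_intervals and the handling of trailing censored rows (they keep their own time as the end) are assumptions:

```c
#include <stddef.h>

/*
 * For every row i of a time-sorted dataset, build the interval [start[i], end[i]):
 *   start[i] = time of the latest death strictly before row i (0 if none),
 *   end[i]   = time of the first death at or after row i.
 * status[i] is 1 for a death and 0 for a censored row.
 * Assumption: trailing censored rows with no later death keep their own time as end.
 */
void make_intervals(const double *t, const int *status, size_t n,
                    double *start, double *end)
{
    double last_death = 0.0;            /* forward pass: previous death time */
    for (size_t i = 0; i < n; i++) {
        start[i] = last_death;
        if (status[i] == 1)
            last_death = t[i];
    }

    int have_next = 0;                  /* backward pass: next death time */
    double next_death = 0.0;
    for (size_t k = n; k-- > 0;) {
        if (status[k] == 1) {
            next_death = t[k];
            have_next = 1;
        }
        end[k] = have_next ? next_death : t[k];
    }
}
```

Two linear passes are enough: the forward pass carries the previous death time and the backward pass carries the next one, so no nested search loop is needed.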
Next, I wrote the following code to get the output shown below. Note that column 2 is not what I want, but it is the best I could get so far.
double arr[8][MAX];              /* one row per derived column, one column per observation */
double total = 100;              /* size of the risk set at time 0 */
arr[6][0] = total;               /* number at risk */
arr[7][0] = 1;                   /* KM estimate starts at S = 1 */
for (int i = 0; i < MAX; i++) {
    double start = 0;
    if (i) start = data[i - 1].x;
    arr[0][i] = data[i].x;       /* observed time */
    arr[1][i] = data[i].y;       /* 1 = death, 0 = censored */
    arr[2][i] = start;           /* interval start */
    arr[3][i] = data[i].x;       /* interval end */
    /* keep track of number of deaths and censors at each time t_i */
    if (arr[1][i] == 1) {
        arr[4][i] = 1.0;         /* one death */
        arr[5][i] = 0.0;
    } else {
        arr[4][i] = 0.0;
        arr[5][i] = 1.0;         /* one censored observation */
    }
}
Sample Output
0.064295 1 [0.060493 0.064295) 1.000000 0.000000 191.000000 0.950000
0.070548 1 [0.064295 0.070548) 1.000000 0.000000 190.000000 0.945000
0.070850 1 [0.070548 0.070850) 1.000000 0.000000 189.000000 0.940000
0.071508 0 [0.070850 0.071508) 1.000000 0.000000 188.000000 0.940000
0.077981 1 [0.071508 0.077981) 0.000000 1.000000 187.000000 0.935000
0.086628 1 [0.077981 0.086628) 1.000000 0.000000 186.000000 0.929973
0.088239 1 [0.086628 0.088239) 1.000000 0.000000 185.000000 0.924946
0.090754 1 [0.088239 0.090754) 1.000000 0.000000 184.000000 0.919919
0.093260 0 [0.090754 0.093260) 1.000000 0.000000 183.000000 0.919919
The last column (arr[7] in the code) is the Kaplan-Meier (KM) estimate of the survival function. It was computed with the following rules:
1. If the i-th entry in column 1 is 0, the i-th KM value is simply set equal to the previous, (i-1)-th, KM value.
2. If the i-th entry in column 1 is 1 but is preceded by one or more successive 0s (for example, the last entry of column 1 is immediately preceded by two 0s), the i-th KM value is the (i-1)-th KM value * (1 - 1/r_j), where r_j is the j-th entry of the number-at-risk column (arr[6] in the code) and row j is the nearest row whose column-1 entry is 1. For example, the last 4 rows of column 1 are 1 0 0 1, so the last KM value is 0.890096*(1 - 1/177), where 177 is the first number-at-risk entry whose column-1 entry is 1 (rather than 0).
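For comparison, the textbook Kaplan-Meier product-limit update is S(t_i) = S(t_{i-1}) * (1 - d_i/r_i), where d_i is the number of deaths at t_i and r_i is the number still at risk just before t_i; censored rows leave S unchanged but drop out of the risk set for later rows. A minimal sketch, assuming rows sorted by time, one observation per row, no tied times, and a risk set of n rows at time 0 (so r_i = n - i); km_estimate is a hypothetical name:

```c
#include <stddef.h>

/*
 * Kaplan-Meier product-limit estimate for time-sorted rows without ties.
 * status[i] = 1 for a death, 0 for a censored row.
 * surv[i] receives S(t_i), the estimate just after row i.
 */
void km_estimate(const int *status, size_t n, double *surv)
{
    double s = 1.0;                          /* S(0) = 1 */
    for (size_t i = 0; i < n; i++) {
        double at_risk = (double)(n - i);    /* rows i..n-1 are still at risk */
        if (status[i] == 1)
            s *= 1.0 - 1.0 / at_risk;        /* one death at t_i */
        surv[i] = s;                         /* censored rows keep the previous value */
    }
}
```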
Tasks left to finish: first, I need to build the correct column 2 so that, for an arbitrary input t within the range of column 0, the code returns the corresponding KM value.
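Because the KM estimate is a right-continuous step function, S(t) for an arbitrary t is the value at the last observed time t_i <= t (and 1 before the first observation). A minimal lookup sketch using binary search, assuming times[] is sorted ascending and surv[i] holds S just after times[i]; km_at is a hypothetical name:

```c
#include <stddef.h>

/*
 * Evaluate the Kaplan-Meier step function at an arbitrary time t.
 * times[] must be sorted ascending; surv[i] is S just after times[i].
 * Returns 1.0 for t before the first observed time.
 */
double km_at(const double *times, const double *surv, size_t n, double t)
{
    /* Binary search for the number of rows with times[i] <= t. */
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (times[mid] <= t)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? 1.0 : surv[lo - 1];
}
```

With the question's data, t = 0.071 lands on the row for 0.070850, since 0.071508 is already greater than t.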
Second, I want to compute the variance of the KM estimator using Greenwood's formula:
Var(S(t)) = S(t)^2 * (sum over t_i <= t of) d_i / (r_i * (r_i - d_i)),
where S(t) is the KM estimate at time t (the last column above, arr[7]), d_i is the number of deaths at time t_i (arr[4][i]), and r_i is the number at risk at t_i (arr[6][i]). For example, if t = 0.071, then t_i takes only 3 values from column 0: 0.064295, 0.070548 and 0.070850. I came up with the following code, which runs, but I am not sure the output is correct:
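#define N 100                /* number of rows; adjust to the data */
double sigma[N];             /* Var(S(t_i)) by Greenwood's formula */
double sum = 0.0;            /* running sum of d_i / (r_i * (r_i - d_i)) */
for (int i = 0; i < N; i++) {
    double d = arr[4][i];    /* deaths at t_i */
    double r = arr[6][i];    /* number at risk at t_i */
    if (d > 0 && r > d)      /* the term is undefined when everyone at risk dies */
        sum += d / (r * (r - d));
    sigma[i] = pow(arr[7][i], 2) * sum;   /* pow needs <math.h> */
    printf("%.6f\n", sigma[i]);           /* "%.0lf" would round these to 0 */
}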
Sample Output
0.004775
0.004750
0.004725
0.004700
0.004675
0.004700
0.004650
0.004625
0.004600
0.004575
0.004600
0.004550
0.004525
0.004500
0.004475
0.004450
0.004425
0.004450
0.004450
0.004400
0.004375
0.004350
0.004325
0.004300
0.004275
0.004250
0.004225
0.004200
0.004175
0.004149
0.004124
0.004150
0.004099
0.004074
0.004100
0.004049
0.004024
0.004051
0.003999
0.003974
0.004001
0.003949
0.003976
0.003923
0.003898
0.003926
0.003873
0.003848
0.003823
0.003797
0.003772
0.003747
0.003775
0.003722
0.003750
0.003696
0.003725
0.003671
0.003700
0.003646
0.003676
0.003621
0.003595
0.003570
0.003544
0.003519
0.003549
0.003494
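A sketch of Greenwood's variance for the same layout, assuming time-sorted rows without ties, r_i = n - i at risk at each row, d_i equal to 1 for a death and 0 for a censored row, and surv[] already holding the KM estimate; greenwood_variance is a hypothetical name. Here d_i is the number of deaths at t_i, not a cumulative count:

```c
#include <stddef.h>

/*
 * Greenwood's formula: Var(S(t_i)) = S(t_i)^2 * sum_{t_j <= t_i} d_j / (r_j * (r_j - d_j)).
 * status[i] = 1 for a death, 0 for censored; surv[i] = S(t_i); r_i = n - i.
 */
void greenwood_variance(const int *status, const double *surv, size_t n, double *var)
{
    double sum = 0.0;                          /* running sum over t_j <= t_i */
    for (size_t i = 0; i < n; i++) {
        double r = (double)(n - i);            /* number at risk at t_i */
        double d = status[i] == 1 ? 1.0 : 0.0; /* deaths at t_i (0 or 1 here) */
        if (d > 0.0 && r > d)                  /* undefined when r == d; S is 0 there anyway */
            sum += d / (r * (r - d));
        var[i] = surv[i] * surv[i] * sum;
    }
}
```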

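This is a partial answer. First, let's declare the array as arr[MAX][8], which means MAX rows and 8 columns. This makes it easier to sort the data, since each observation is then one contiguous row that qsort can move as a unit.
Next, let's create dummy data (0.100, 0.101, ...) that is easier to look at.
To find the end of each interval (the 5th printed column), you can use an additional loop (for(int j = i; j < count; j++){...}) that searches for the next row with a non-zero value in column 1.
We also keep a running count (dead_count) that is incremented each time arr[i][1] is zero. Note that in this dataset a 0 in column 1 marks a censored observation, not a death.
The survival column is then approximated as 1 - (double)dead_count/(double)count; this ignores censoring, so it is not the full Kaplan-Meier product-limit estimate.
An MCVE would look like this: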
#include <stdlib.h>
#include <stdio.h>
int compare_2d_array(const void *pa, const void *pb)
{
double a = *(double*)pa;
double b = *(double*)pb;
if(a > b) return 1;
if(a < b) return -1;
return 0;
}
int main(void)
{
double arr[][8] =
{
{ 0.100, 1, 0, 0, 0, 0, 0 , 0 }, //initialize columns
{ 0.101, 1 }, // we can skip adding the zeros, it's done automatically
{ 0.102, 1 },
{ 0.103, 0 },
{ 0.104, 1 },
{ 0.105, 1 },
{ 0.106, 1 },
{ 0.107, 1 },
{ 0.108, 0 },
{ 0.109, 1 },
{ 0.110, 1 },
{ 0.111, 1 },
{ 0.112, 1 },
{ 0.113, 1 },
{ 0.114, 1 },
{ 0.115, 0 },
{ 0.116, 0 },
{ 0.117, 1 },
};
int count = sizeof(arr)/sizeof(*arr);
//sort
qsort(arr, count, sizeof(arr[0]), compare_2d_array);
int dead_count = 0;
for(int i = 0; i < count; i++)
{
double start = i ? arr[i - 1][0] : 0;
//if arr[i][1] is zero, end should be the time of the next row with a non-zero status
double end = arr[i][0];
for(int j = i; j < count; j++)
{
end = arr[j][0];
if(arr[j][1])
break;
}
arr[i][2] = start;
arr[i][3] = end;
arr[i][4] = arr[i][1];
arr[i][5] = !arr[i][1];
if(!arr[i][1])
dead_count++;
printf("%3d %.6lf %.0lf [%.6lf %.6lf) %.0lf %.0lf %3d %.6lf\n",
i,
arr[i][0],
arr[i][1],
start,
end,
arr[i][4],
arr[i][5],
count - i, 1 - (double)dead_count/(double)count );
}
return 0;
}
Output:
0 0.100000 1 [0.000000 0.100000) 1 0 18 1.000000
1 0.101000 1 [0.100000 0.101000) 1 0 17 1.000000
2 0.102000 1 [0.101000 0.102000) 1 0 16 1.000000
3 0.103000 0 [0.102000 0.104000) 0 1 15 0.944444
4 0.104000 1 [0.103000 0.104000) 1 0 14 0.944444
5 0.105000 1 [0.104000 0.105000) 1 0 13 0.944444
6 0.106000 1 [0.105000 0.106000) 1 0 12 0.944444
7 0.107000 1 [0.106000 0.107000) 1 0 11 0.944444
8 0.108000 0 [0.107000 0.109000) 0 1 10 0.888889
9 0.109000 1 [0.108000 0.109000) 1 0 9 0.888889
10 0.110000 1 [0.109000 0.110000) 1 0 8 0.888889
11 0.111000 1 [0.110000 0.111000) 1 0 7 0.888889
12 0.112000 1 [0.111000 0.112000) 1 0 6 0.888889
13 0.113000 1 [0.112000 0.113000) 1 0 5 0.888889
14 0.114000 1 [0.113000 0.114000) 1 0 4 0.888889
15 0.115000 0 [0.114000 0.117000) 0 1 3 0.833333
16 0.116000 0 [0.115000 0.117000) 0 1 2 0.777778
17 0.117000 1 [0.116000 0.117000) 1 0 1 0.777778

Related

Matlab array that decreases from the center

I've been trying to make a 2-dimensional array that has the largest number in the center, and numbers around it decrement by one like this:
[0 0 0 0 0 0 0;
0 1 1 1 1 1 0;
0 1 2 2 2 1 0;
0 1 2 3 2 1 0;
0 1 2 2 2 1 0;
0 1 1 1 1 1 0;
0 0 0 0 0 0 0]
Any help?
This is easy using implicit expansion:
M = 7; % desired size. Assumed to be odd
t = [0:(M-1)/2 (M-3)/2:-1:0].';
result = min(t, t.');
Alternatively, you can use the gallery function with the 'minij' option to produce one quadrant of the result, and then extend symmetrically:
M = 7; % desired size. Assumed to be odd
result = gallery('minij',(M+1)/2)-1;
result = [result result(:,end-1:-1:1)];
result = [result; result(end-1:-1:1,:)];
Another approach, using padarray from the Image Processing toolbox:
result = 0;
for k = 1:(M-1)/2;
result = padarray(result+1, [1 1]);
end
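The same matrix can be built in C from a closed form: each entry is its distance to the nearest border, min(i, j, M-1-i, M-1-j) with 0-based indices, which is what min(t, t.') computes in the first answer. A minimal sketch filling a row-major buffer; center_pyramid is a hypothetical name:

```c
#include <stddef.h>

/*
 * Fill an m-by-m row-major matrix so each entry is its distance to the
 * nearest border: 0 on the edge, increasing by one toward the center.
 */
void center_pyramid(int *out, size_t m)
{
    for (size_t i = 0; i < m; i++) {
        size_t di = i < m - 1 - i ? i : m - 1 - i;      /* distance to top/bottom */
        for (size_t j = 0; j < m; j++) {
            size_t dj = j < m - 1 - j ? j : m - 1 - j;  /* distance to left/right */
            out[i * m + j] = (int)(di < dj ? di : dj);
        }
    }
}
```

Unlike the MATLAB snippets, which assume an odd size, this also works for even M, where the center becomes a 2-by-2 block of equal values.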

How to find the longest interval of 1's in a list [matlab]

I need to find the longest interval of 1's in a matrix, and the position of the first "1" in that interval.
For example, if I have the matrix [1 0 0 1 1 1 0 0 0 0 1 1 1 1 1 1 1 ]
I need to get both the length, 7, and the position of the first 1 in that interval, 11.
Any suggestions on how to proceed would be appreciated.
Using this answer as a basis, you can do the following:
a = [1 0 0 1 1 1 0 0 0 0 1 1 1 1 1 1 1 ]
dsig = diff([0 a 0]);
startIndex = find(dsig > 0);
endIndex = find(dsig < 0) - 1;
duration = endIndex-startIndex+1;
duration
startIdx = startIndex(duration == max(duration))
endIdx = endIndex(duration == max(duration))
This outputs:
duration =
1 3 7
startIdx =
11
endIdx =
17
Please note that this probably needs double-checking for cases other than your example. Nevertheless, I think this is the right direction. If not, the linked answer has more information and alternatives.
If there are multiple intervals of 1's with the same length, it will only give the position of the first interval.
A=round(rand(1,20)) %// test vector
[~,p2]=find(diff([0 A])==1); %// finds where a string of 1's starts
[~,p3]=find(diff([A 0])==-1); %// finds where a string of 1's ends
le=p3-p2+1; %// length of each interval of 1's
ML=max(le); %// length of longest interval
ML %// display ML
p2(le==ML) %// find where strings of maximum length begin (per Marcin's answer)
I thought of a brute-force approach:
clc; clear all; close all;
A = [1 0 0 1 1 1 0 0 0 0 1 1 1 1 1 1 1];
index = 0;        % start of the longest run found so far
globalCount = 0;  % length of the longest run found so far
count = 0;        % length of the current run
runStart = 0;     % start of the current run
for i = 1 : length(A)
    if A(i) == 1
        if count == 0
            runStart = i;   % a new run of 1's begins here
        end
        count = count + 1;
    end
    if A(i) == 0 || i == length(A)
        if count > globalCount   % strict >, so the first of equally long runs is kept
            globalCount = count;
            index = runStart;
        end
        count = 0;
    end
end
globalCount, index
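A single-pass C version of the same scan, tracking the current run and recording its start only when it beats the best so far. This is a sketch; longest_ones is a hypothetical name, and the reported position is 1-based to match the MATLAB answers:

```c
#include <stddef.h>

/*
 * Find the longest run of 1's in a[0..n-1].
 * Returns its length and stores the 1-based position of its first 1 in *start
 * (0 if the array has no 1's). Ties keep the first run.
 */
size_t longest_ones(const int *a, size_t n, size_t *start)
{
    size_t best = 0, best_start = 0, run = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] == 1) {
            run++;
            if (run > best) {               /* strict >: first of equal runs wins */
                best = run;
                best_start = i + 2 - run;   /* 1-based index of the run's first 1 */
            }
        } else {
            run = 0;
        }
    }
    *start = best_start;
    return best;
}
```

For the example vector it returns 7 with a start position of 11.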

map a matrix with another matrix

I have a question about masking a matrix with another matrix that contains only 1s and 0s.
Here is an example of my problem: A is a matrix of doubles
A = [ 1 4 3;
2 3 4;
4 3 1;
4 5 5;
1 2 1];
B is a matrix with ones and zeros:
B = [ 0 0 0;
0 0 0;
1 1 1;
1 1 1;
0 0 0];
I want to achieve a matrix C which is the result of A mapped by B, just like that:
C = [ 0 0 0;
0 0 0;
4 3 1;
4 5 5;
0 0 0];
I tried B as a logical array and as a matrix. Both lead to the same error:
"Subscript indices must either be real positive integers or logicals."
Just multiply A and B element-wise:
C = A.*B
I like Dan's solution, but this would be another way:
C = zeros(size(A));
C(B==1) = A(B==1);
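In C, with both matrices stored as flat row-major arrays, the element-wise product A.*B becomes one loop. This sketch (apply_mask is a hypothetical name) uses a conditional instead of a multiplication; the two differ only when A holds NaN or Inf outside the mask, because under IEEE 754 NaN*0 and Inf*0 are NaN, whereas the conditional yields 0 there, like the second MATLAB answer:

```c
#include <stddef.h>

/*
 * Element-wise mask: c[k] = a[k] where b[k] is non-zero, otherwise 0.
 * Matrices are flat arrays of rows*cols elements, so shape does not matter.
 */
void apply_mask(const double *a, const int *b, double *c, size_t count)
{
    for (size_t k = 0; k < count; k++)
        c[k] = b[k] ? a[k] : 0.0;
}
```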

Longest subsequence with alternating increasing and decreasing values

Given an array, we need to find the length of the longest subsequence with alternating increasing and decreasing values.
For example, if the array is
7 4 8 9 3 5 2 1 then L = 6, for 7,4,8,3,5,2 or 7,4,9,3,5,1, etc.
The subsequence could also start with an increase (first a small element, then a big one).
What could be the most efficient solution for this? I had a DP solution in mind. And if we were to do it by brute force, how would we do it (O(n^3)?)?
And it's not a homework problem.
You can indeed use a dynamic programming approach here. For the sake of simplicity, assume we only need the maximal length of such a sequence in seq (it is easy to tweak the solution to recover the sequence itself).
For each index we will store 2 values:
maximal length of alternating sequence ending at that element where last step was increasing (say, incr[i])
maximal length of alternating sequence ending at that element where last step was decreasing (say, decr[i])
also by definition we assume incr[0] = decr[0] = 1
Then each incr[i] and decr[i] can be found recursively (the value stays 1 if no such j exists):
incr[i] = max(decr[j])+1, where j < i and seq[j] < seq[i]
decr[i] = max(incr[j])+1, where j < i and seq[j] > seq[i]
Required length of the sequence will be the maximum value in both arrays, complexity of this approach is O(N*N) and it requires 2N of extra memory (where N is the length of initial sequence)
Simple example in C:
int seq[N]; // initial sequence
int incr[N], decr[N];
for (int i = 0; i < N; ++i)   // every element on its own is a sequence of length 1
    incr[i] = decr[i] = 1;
for (int i = 1; i < N; ++i){
    for (int j = 0; j < i; ++j){
        if (seq[j] < seq[i])
        {
            // handle "increasing" step - need to check previous "decreasing" value
            if (decr[j]+1 > incr[i]) incr[i] = decr[j] + 1;
        }
        if (seq[j] > seq[i])
        {
            // handle "decreasing" step - need to check previous "increasing" value
            if (incr[j]+1 > decr[i]) decr[i] = incr[j] + 1;
        }
    }
}
int best = 0;   // the answer is the maximum value in both arrays
for (int i = 0; i < N; ++i){
    if (incr[i] > best) best = incr[i];
    if (decr[i] > best) best = decr[i];
}
How the algorithm works:
step 0 (initial values):
seq = 7 4 8 9 3 5 2 1
incr = 1 1 1 1 1 1 1 1
decr = 1 1 1 1 1 1 1 1
step 1: take the value at index 1 ('4') and check the previous values. 7 > 4, so we make a "decreasing" step from index 0 to index 1; new sequence values:
incr = 1 1 1 1 1 1 1 1
decr = 1 2 1 1 1 1 1 1
step 2: take the value 8 and iterate over the previous values:
7 < 8, make increasing step: incr[2] = MAX(incr[2], decr[0]+1):
incr = 1 1 2 1 1 1 1 1
decr = 1 2 1 1 1 1 1 1
4 < 8, make increasing step: incr[2] = MAX(incr[2], decr[1]+1):
incr = 1 1 3 1 1 1 1 1
decr = 1 2 1 1 1 1 1 1
etc...
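A self-contained version of the DP above as a C function, with the initialization and the final maximum filled in. longest_alternating is a hypothetical name, and the arrays are allocated with malloc so the length is not fixed at compile time:

```c
#include <stdlib.h>

/*
 * Length of the longest subsequence whose consecutive steps alternate
 * between increasing and decreasing (either direction first), O(n^2) DP.
 * incr[i]: best length ending at i with a final increasing step.
 * decr[i]: best length ending at i with a final decreasing step.
 * Returns -1 if memory allocation fails.
 */
int longest_alternating(const int *seq, size_t n)
{
    if (n == 0)
        return 0;
    int *incr = malloc(n * sizeof *incr);
    int *decr = malloc(n * sizeof *decr);
    if (incr == NULL || decr == NULL) {
        free(incr);
        free(decr);
        return -1;
    }
    int best = 1;
    for (size_t i = 0; i < n; i++) {
        incr[i] = decr[i] = 1;                /* a single element */
        for (size_t j = 0; j < i; j++) {
            if (seq[j] < seq[i] && decr[j] + 1 > incr[i])
                incr[i] = decr[j] + 1;        /* extend a sequence that last went down */
            if (seq[j] > seq[i] && incr[j] + 1 > decr[i])
                decr[i] = incr[j] + 1;        /* extend a sequence that last went up */
        }
        if (incr[i] > best) best = incr[i];
        if (decr[i] > best) best = decr[i];
    }
    free(incr);
    free(decr);
    return best;
}
```

For the example sequence 7 4 8 9 3 5 2 1 it returns 6.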

A question about matrix manipulation

Given a 1*N matrix or an array, how do I find the first 4 elements which have the same value and then store the index for those elements?
PS:
I'm just curious. What if we want to find the first 4 elements whose value differences are within a certain range, say below 2? For example, M=[10,15,14.5,9,15.1,8.5,15.5,9.5], the elements I'm looking for will be 15,14.5,15.1,15.5 and the indices will be 2,3,5,7.
If you want the first value present 4 times in the array 'tab' in Matlab, you can use
num_min = 4
val=NaN;
for i = tab
if sum(tab==i) >= num_min
val = i;
break
end
end
ind = find(tab==val, num_min);
For instance, with
tab = [2 4 4 5 4 6 4 5 5 4 6 9 5 5]
you get
val =
4
ind =
2 3 5 7
Here is my MATLAB solution:
array = randi(5, [1 10]); %# random array of integers
n = unique(array)'; %'# unique elements
[r,~] = find(cumsum(bsxfun(@eq,array,n),2) == 4, 1, 'first');
if isempty(r)
val = []; ind = []; %# no answer
else
val = n(r); %# the value found
ind = find(array == val, 4); %# indices of elements corresponding to val
end
Example:
array =
1 5 3 3 1 5 4 2 3 3
val =
3
ind =
3 4 9 10
Explanation:
First of all, we extract the list of unique elements. In the example used above, we have:
n =
1
2
3
4
5
Then using the BSXFUN function, we compare each unique value against the entire vector array we have. This is equivalent to the following:
result = zeros(length(n),length(array));
for i=1:length(n)
result(i,:) = (array == n(i)); %# row-by-row
end
Continuing with the same example we get:
result =
1 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0
0 0 1 1 0 0 0 0 1 1
0 0 0 0 0 0 1 0 0 0
0 1 0 0 0 1 0 0 0 0
Next we call CUMSUM on the result matrix to compute the cumulative sum along the rows. Each row will give us how many times the element in question appeared so far:
>> cumsum(result,2)
ans =
1 1 1 1 2 2 2 2 2 2
0 0 0 0 0 0 0 1 1 1
0 0 1 2 2 2 2 2 3 4
0 0 0 0 0 0 1 1 1 1
0 1 1 1 1 2 2 2 2 2
Then we compare that against four, cumsum(result,2)==4 (since we want the location where an element appears for the fourth time):
>> cumsum(result,2)==4
ans =
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
Finally we call FIND to look for the first 1 in column-wise order: if we traverse the matrix from the previous step column by column, the row of the first 1 is the index of the element we are looking for. In this case it is the third row (r=3), so the third element of the unique vector is the answer, val = n(r). Note that if multiple elements were repeated 4 times or more in the original array, the one that first appears for the fourth time would show up first as a 1 when going column by column in the above expression.
Finding the indices of the corresponding answer value is a simple call to FIND...
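The same "first value to reach its 4th occurrence" rule, sketched in plain C without a map: for each position, count how often that value has occurred so far, and stop at the first position where the count reaches k. This assumes exact equality is what "same value" means (fine for integers stored as doubles); first_k_equal is a hypothetical name and the indices it returns are 0-based:

```c
#include <stddef.h>

/*
 * Find the value whose k-th occurrence comes earliest when scanning a[]
 * from the left, and store the 0-based indices of its first k occurrences
 * in idx[0..k-1]. Returns 1 on success, 0 if no value occurs k times.
 * O(n^2) time, no extra memory.
 */
int first_k_equal(const double *a, size_t n, size_t k, size_t *idx)
{
    if (k == 0)
        return 0;
    for (size_t i = 0; i < n; i++) {
        size_t seen = 0;                 /* occurrences of a[i] in a[0..i] */
        for (size_t j = 0; j <= i; j++)
            if (a[j] == a[i])
                seen++;
        if (seen == k) {                 /* a[i] is the k-th occurrence */
            size_t m = 0;
            for (size_t j = 0; j <= i; j++)
                if (a[j] == a[i])
                    idx[m++] = j;
            return 1;
        }
    }
    return 0;
}
```

For the example array 1 5 3 3 1 5 4 2 3 3 it finds the value 3 at 0-based indices 2 3 8 9, i.e. the 1-based 3 4 9 10 shown above.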
Here is C++ code:
// needs <algorithm>, <cstddef>, <iostream>, <map> and <vector>
std::map<int,std::vector<int> > dict;   // value -> indices where it occurs
std::vector<int> ans(4);                // here we will store the indices
bool noanswer=true;
//my_vector is the vector we must analyze
for(std::size_t i=0;i<my_vector.size();++i)
{
    std::vector<int> &temp = dict[my_vector[i]];
    temp.push_back(static_cast<int>(i));
    if(temp.size()==4)//this value has now appeared 4 times: we found the answer
    {
        std::copy(temp.begin(),temp.end(),ans.begin() );
        noanswer = false;
        break;
    }
}
if(noanswer)
    std::cout<<"No Answer!"<<std::endl;
Ignore this and use Amro's mighty solution . . .
Here is how I'd do it in Matlab. The matrix can be any size and contain any range of values, and this should work. This solution automatically finds a value and then the indices of its first 4 elements, without being given the search value a priori.
tab = [2 5 4 5 4 6 4 5 5 4 6 9 5 5]
%this is a loop to find the indices of groups of 4 identical elements
tot = zeros(size(tab));
for nn = 1:numel(tab)
idxs=find(tab == tab(nn), 4, 'first');
if numel(idxs)<4
tot(nn) = Inf;
else
tot(nn) = sum(idxs);
end
end
%find the first 4 identical
bestTot = find(tot == min(tot), 1, 'first' );
%store the indices you are interested in.
indiciesOfInterst = find(tab == tab(bestTot), 4, 'first')
Since I couldn't easily understand some of the solutions, I made this one:
l = 10; m = 5; array = randi(m, [1 l])
A = zeros(l,m); % m is the maximum (possible) value in array
A(sub2ind([l,m],1:l,array)) = 1;
s = sum(A,1);
b = find(s(array) == 4,1);
% now in b is the index of the first element
if (~isempty(b))
find(array == array(b))
else
disp('nothing found');
end
I find this easier to visualize. It puts a 1 in an l-by-m matrix at every position given by an element's index (row) and value (column). The columns are then summed and mapped back to the original array. Drawback: if the array contains very large values, A may become relatively large too.
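Your PS question is more complicated. I didn't have time to check every case, but here is the idea:
M = [10,15,14.5,9,15.1,8.5,15.5,9.5]
num_min = 4;
delta = 2;
[Ms, iMs] = sort(M);           % sorted values and their original positions
dMs = diff(Ms);
best_last = Inf;               % latest original index in the best group so far
best_start = NaN;              % start of the best group in the sorted order
n = 0;
for i = 1:length(dMs)
    if dMs(i) <= delta
        n = n + 1;
    else
        n = 0;
    end
    if n >= (num_min-1)
        s = i - num_min + 2;           % group = sorted positions s..i+1
        last = max(iMs(s:i+1));        % position in M where the group is complete
        if last < best_last
            best_last = last;
            best_start = s;
        end
    end
end
ind = sort(iMs(best_start + (0:num_min-1)))
val = M(ind)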
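A sketch of one possible reading of the PS, in C: find the group of k elements that is completed earliest while all its values lie within a window of width less than delta (max - min < delta). This is a stricter criterion than chaining sorted neighbours whose gaps are at most delta, since a chain of small gaps can still span more than delta. first_k_close is a hypothetical name; indices are 0-based and idx is only meaningful when the function returns 1:

```c
#include <stddef.h>

/*
 * Find the earliest-completed group of k elements with max - min < delta.
 * For each prefix end e, every a[j] in the prefix is tried as the window
 * minimum; the first e at which some window [a[j], a[j] + delta) holds
 * k elements wins, and their 0-based indices are stored in idx.
 * Returns 1 on success, 0 if no such group exists. O(n^3), for short arrays.
 */
int first_k_close(const double *a, size_t n, size_t k, double delta, size_t *idx)
{
    if (k == 0)
        return 0;
    for (size_t e = 0; e < n; e++) {
        for (size_t j = 0; j <= e; j++) {
            size_t m = 0;
            for (size_t i = 0; i <= e && m < k; i++)
                if (a[i] >= a[j] && a[i] - a[j] < delta)
                    idx[m++] = i;
            if (m == k)
                return 1;
        }
    }
    return 0;
}
```

For M = [10,15,14.5,9,15.1,8.5,15.5,9.5] with k = 4 and delta = 2 it returns indices 1 2 4 6, i.e. the 1-based 2,3,5,7 from the question.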
