x and y lengths differ in an apply - arrays

I'm trying to run an apply function over an array. The idea is to look at the value in the risk factor column: if it is 1, use OnsetFunction, and if it is 0, use HighOnsetFunction. This would then produce a column of values which populates another column in the array.
> apply(OutComes, 1, function(x) { if(x["Risk_Factor"] == 1)
> + {OnsetFunction()}
> + else{ HighOnsetFunction()}})
I'm having trouble with the apply function above and keep getting this message.
>Error in xy.coords(x, y) : 'x' and 'y' lengths differ
There are only five rows in the array at the moment as I'm trying to make sure the code works on a small group before I extend it to be many people, but I'm not sure what the x and y are. I've seen this message with graphs, but never with this before.

I think you want ifelse, but you are using apply with an if.
Try:
ifelse(OutComes$Risk_Factor==1, OnsetFunction(), HighOnsetFunction())
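For comparison, the same row-wise choice can be sketched in Python. The names onset_function and high_onset_function are hypothetical stand-ins for OnsetFunction and HighOnsetFunction, which the question does not show; they are assumed here to return a single value per call.

```python
# Hypothetical stand-ins for OnsetFunction() and HighOnsetFunction();
# the real functions are not shown in the question.
def onset_function():
    return 50.0


def high_onset_function():
    return 35.0


def onset_column(risk_factors):
    """Pick one value per row depending on that row's risk factor."""
    return [onset_function() if rf == 1 else high_onset_function()
            for rf in risk_factors]


risk_factor = [1, 0, 0, 1, 1]  # five rows, as in the question
onset = onset_column(risk_factor)
print(onset)
```

Note the difference from ifelse in R, which evaluates each branch expression as a whole vector and then selects element-wise; this sketch calls only the chosen function for each row.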


Excel: #CALC! error (Nested Array) when using MAP functions for counting interval overlaps

I am struggling with the following formula, it works for some scenarios but not in all of them. The name input has the data set that is failing, getting an #CALC! error with the description "Nested Array":
=LET(input, {"N1",0,0;"N1",0,10;"N1",10,20},
names, INDEX(input,,1), namesUx, UNIQUE(names), dates, FILTER(input, {0,1,1}),
byRowResult, BYROW(namesUx, LAMBDA(name,
LET(set, FILTER(dates, names=name),
startDates, INDEX(set,,1), endDates, INDEX(set,,2), onePeriod, IF(ROWS(startDates)=1, TRUE, FALSE),
IF(onePeriod, IF(startDates <= IF(endDates > 0, endDates, startDates + 1),0, 1),
LET(seq, SEQUENCE(ROWS(startDates)),
mapResult, MAP(startDates, endDates, seq, LAMBDA(start,end,idx,
LET(incIdx, 1-N(ISNUMBER(XMATCH(seq,idx))),
startInc, FILTER(startDates, incIdx), endInc, FILTER(endDates, incIdx),
MAP(startInc, endInc,LAMBDA(ss,ee, N(AND(start <= ee, end >= ss))))
))),
SUM(mapResult)))
))), HSTACK(namesUx, byRowResult)
)
If we replace the input values in previous formula with the following range: A2:C4, in G1:H1 would be the expected output:
Provided also a graphical representation to visualize the intervals and their corresponding overlap. From the screenshot, we have 2 overlaps.
If we use the above formula for the same range we get the following output:
If we hover the #CALC! cell, it informs about the specific error:
Let's explain the input data and what the formula does:
Input data
First column: N1, N2, N3, represents names
Second Column: Start of the interval (I am using numeric values, but in my real situation will be dates)
Third Column: End of the interval (I am using numeric values, but in my real situation will be dates)
Formula
The purpose of the formula is to identify for each unique names, how many intervals overlap. The calculation goes by each row (BYROW) of the unique names and for each pair of start-end values, counts the overlaps with respect to the other start-end values. I use FILTER to exclude the current start-end pair with the following condition: FILTER(startDates, incIdx) and I tested it works properly.
The condition to exclude the start data of the current name of the iteration of BYROW is the following:
1-N(ISNUMBER(XMATCH(seq,idx)))
and used as second input argument of the FILTER function.
The rest is just to check the overlap range condition.
I separate the logic for a name with only one interval from the rest because the calculation is different. For a single interval I just want to check that the end date comes after the start date and treat the special case of 0. I tested this particular case and it works.
Testing and workarounds
I have already isolated where the issue is and when it happens. The problem occurs in the following call:
MAP(startInc, endInc,LAMBDA(ss,ee, N(AND(start <= ee, end >= ss))))
when startInc and endInc have more than one row. It has nothing to do with the content of the LAMBDA function. I can use:
MAP(startInc, endInc,LAMBDA(ss,ee, 1))
and it still fails. The problem is with the input arrays startInc and endInc. If I use any other array, for example the following, it doesn't work either:
MAP(seq,LAMBDA(ss, 1))
I get a similar result using names, startDates, etc.; even {1;2;3} fails. If I use idx it works, because it is not an array. Therefore the error happens with any type of array or range.
I have also tested that the input arguments are correct having the correct shape and values. For example replacing the MAP function with: TEXTJOIN(",",, startInc)&" ; " (and also with endInc) and replacing SUM with CONCAT to concatenate the result.
In terms of input data I tested the following scenarios:
{"N1",0,0;"N1",0,10} -> Works
{"N1",0,0;"N1",0,10;"N2",10,0;"N2",10,20;"N3",20,10} -> Works
{"N1",0,0;"N1",0,10;"N1",10,20} -> Error
{"N1",0,0;"N1",0,10;"N1",10,0} -> Error
{"N1",0,0;"N1",0,10;"N1",10,0;"N1",20,10} -> Error
{"N1",0,0;"N1",0,10;"N2",10,0;"N2",10,20;"N2",20,10} -> Error
The cases that work do so because an array of size 1 goes to the MAP function (the number of duplicated names is less than 3).
I did some research on the internet about the #CALC! error, but there are not many details about it, and only a very trivial case is documented. I didn't find any indication of a limit on nested calls of the new array functions: BYROW, MAP, etc.
In conclusion, it seems that the following nested structure produces this error:
=MAP({1;2;3}, LAMBDA(n, MAP({4;5;6}, LAMBDA(s, TRUE))))
even for a trivial case like this.
On the contrary, the following works:
=MAP({1;2;3}, LAMBDA(n, REDUCE("",{4;5;6}, LAMBDA(a,s, TRUE))))
because the output of REDUCE is not an array.
Any suggestions on how to circumvent this limitation in my original formula? Is this a real case of an array function that cannot use another array as input? Is it a bug?
As @JosWoolley pointed out:
LAMBDA's calculation parameter should return a single value and not an
array
I hadn't seen it that way, nor deduced it from the definition of the #CALC! Nested Array error:
The nested array error occurs when you try to input an array formula
that contains an array. To resolve the error, try removing the second
array...For example, =MUNIT({1,2}) is asking Excel to return
a 1x1 array, and a 2x2 array, which isn't currently supported.
=MUNIT(2) would calculate as expected
So the alternative is to remove this second MAP call. The following link gave me an idea of how to do it: Identify overlapping dates and times in Excel. Using SUMPRODUCT or SUM can serve the purpose.
=LET(input, {"N1",0,0;"N1",0,10;"N1",10,20},
names, INDEX(input,,1), namesUx, UNIQUE(names), dates, FILTER(input, {0,1,1}),
byRowResult, BYROW(namesUx, LAMBDA(name,
LET(set, FILTER(dates, names=name),
startDates, INDEX(set,,1), endDates, INDEX(set,,2),
onePeriod, IF(ROWS(startDates)=1, TRUE, FALSE),
IF(onePeriod, IF(startDates <= IF(endDates > 0, endDates, startDates + 1),0, 1),
LET(seq, SEQUENCE(ROWS(startDates)),
mapResult, MAP(startDates, endDates, seq, LAMBDA(start,end,idx,
LET(incIdx, 1-N(ISNUMBER(XMATCH(seq,idx))),
startInc, FILTER(startDates, incIdx), endInc, FILTER(endDates, incIdx),
SUMPRODUCT((startInc <= end) * (endInc >= start ))
))),SUM(mapResult)))/2
))), HSTACK(namesUx, byRowResult)
)
We need to divide the result by 2 because we are counting each overlap in both directions: A overlaps with B and vice versa.
It can be further simplified because there is no need to build the names startInc and endInc to exclude the range we are checking for overlap. We can include it and subtract one overlap. This is the way to do it:
=LET(input, {"N1",0,0;"N1",0,10;"N1",10,20},
names, INDEX(input,,1), namesUx, UNIQUE(names), dates, FILTER(input, {0,1,1}),
byRowResult, BYROW(namesUx, LAMBDA(name,
LET(set, FILTER(dates, names=name),
startDates, INDEX(set,,1), endDates, INDEX(set,,2),
onePeriod, IF(ROWS(startDates)=1, TRUE, FALSE),
IF(onePeriod, IF(startDates <= IF(endDates > 0,
endDates, startDates + 1),0, 1),
SUM(MAP(startDates, endDates, LAMBDA(start,end,
SUMPRODUCT((startDates <= end) * (endDates >= start ))-1)))/2)
))), HSTACK(namesUx, byRowResult)
)
Here is the output after replacing the array input with the corresponding range A2:C4. A graphical representation of the intervals (highlighted) is also provided, and cell G2 contains the previous formula:
Note: Since we are using SUMPRODUCT with a single input, it can be replaced with SUM.
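The counting idea in the simplified formula (count every match including the interval itself, subtract one, then halve the total) can be sketched outside Excel. This is a minimal Python sketch, not a translation of the formula: the function name count_overlaps is an assumption, it assumes every interval has start <= end so that each interval matches itself, and it leaves out the single-interval special case handled by onePeriod.

```python
from collections import defaultdict


def count_overlaps(rows):
    """Count overlapping interval pairs per name.

    rows: iterable of (name, start, end) tuples, assuming start <= end.
    """
    by_name = defaultdict(list)
    for name, start, end in rows:
        by_name[name].append((start, end))

    result = {}
    for name, intervals in by_name.items():
        total = 0
        for start, end in intervals:
            # Count every interval overlapping this one, itself included,
            # then subtract 1 for the self-match.
            hits = sum(1 for s, e in intervals if s <= end and e >= start)
            total += hits - 1
        # Each overlapping pair was counted once from each side.
        result[name] = total // 2
    return result


print(count_overlaps([("N1", 0, 0), ("N1", 0, 10), ("N1", 10, 20)]))  # {'N1': 2}
```

For the failing input from the question this gives 2 overlaps for N1, which agrees with the expected output described above.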

Compute if loop SPSS

Ultimately, I want to change scores of 0 to 1, scores of 1 to 2, and scores of 2 to 3. I thought one way to do that was using +1, but I realize I could also use a more complicated if then series.
Here is what I did so far:
I used the existing variable (x) to create a new variable (y=x+1) using SPSS syntax. I only want to do this for variables with values >=0 (this was my approach to excluding cells with missing data; the range for x is 0-2).
I can create x+1, but it overwrites the existing variables.
DO REPEAT x =var_1 TO var_86.
if (x>=0) x=(x+1).
end repeat.
exe.
I tried this modification, but it doesn't work:
DO REPEAT x = var_1 TO var_86 / y = var_1a TO var_86a.
IF (x >= 0) y=x +1.
END REPEAT.
EXE.
The error message is:
DO REPEAT The form VARX TO VARY to refer to a range of variables has
been used incorrectly. When using VARX TO VARY to create new
variables, X must be an integer less than or equal to the integer Y.
(Can't use A3 TO A1.)
I tried many other configurations including vectors and loops but haven't yet figured out how to do this computation across the range of variables without overwriting the existing ones. Thanks in advance for any recommendations.
The message you are getting is because SPSS doesn't understand the form var_1a TO var_86a.
For the x to y form to work the number has to be at the end of the name, so for example varA_1 to varA_86 should work.
While you're at it, here's a simple way to go about your task:
recode var_1 TO var_86 (0=1)(1=2)(2=3) into varA_1 TO varA_86.
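The effect of the recode (map 0, 1, 2 to 1, 2, 3 into a new variable and leave the original untouched) can be illustrated with a short Python sketch. The names RECODE and recode are assumptions for illustration, and missing data is represented here by None.

```python
# Mapping taken from the recode statement: (0=1)(1=2)(2=3).
RECODE = {0: 1, 1: 2, 2: 3}


def recode(values):
    """Return a new list; values not in the mapping (e.g. missing) become None."""
    return [RECODE.get(v) for v in values]


var_1 = [0, 2, None, 1]   # original variable, left unchanged
varA_1 = recode(var_1)    # new variable
print(varA_1)  # [1, 3, None, 2]
```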

Comparing two images by their histograms

My project is about image querying. As a first step, I compare two images by their histograms: if the histograms are alike, the program should tell me that the pictures are similar. The problem is that whatever images I enter, it tells me the two are not alike.
A=imread('C:\Users\saba\Desktop\images\q4.jpg');%reading images as array to variable 'a'&'b'
B = imread('C:\Users\saba\Desktop\images\q1.jpg');
j=rgb2gray(A);
i=rgb2gray(B);
subplot(2,2,1);imshow(A);
subplot(2,2,2);imshow(B);
subplot(2,2,3);imshow(j);
subplot(2,2,4);imshow(i);
if histeq(j)==histeq(i)
disp('The images are same')%output display
else
disp('the images are not same')
end
In order to compare directly with the == operator the images would have to be the same images. If you are wanting to do this, you could just check if i==j, provided they are the same size.
As far as I know, there is no builtin function or toolbox which checks whether or not two images are similar. One rough method you could use is seeing how different the sums of the pixel values for each of the rows and the columns are:
maxColumnDifference = max(abs(sum(j, 1) - sum(i, 1)));
maxRowDifference = max(abs(sum(j, 2) - sum(i, 2)));
You could then have some tolerance which the sums must be within; it should be a function of the size of the image. To give a standardised answer (0-255) of how different the row or the column is, just divide each of the sums by the number of pixels.
maxColumnDifference = max(abs(sum(j, 1)/size(j,1) - sum(i, 1)/size(i,1)));
maxRowDifference = max(abs(sum(j, 2)/size(j,2) - sum(i, 2)/size(i,2)));
You could then determine if they are similar with something like:
tolerance = 50;
if (maxRowDifference < tolerance) && (maxColumnDifference < tolerance)
disp('Images are similarish');
else
disp('Images are not similar enough for this poor tool to recognise');
end
Note that this is all speculation, not tested at all, and there is probably a better way of doing it.
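The row and column comparison described above can be sketched in plain Python on small grayscale images stored as lists of rows. This is only an illustration of the same rough heuristic; the function names and the sample pixel values are assumptions, and the tolerance of 50 is the one suggested in the answer.

```python
def column_means(img):
    """Mean pixel value of each column (img is a list of equal-length rows)."""
    rows = len(img)
    return [sum(row[c] for row in img) / rows for c in range(len(img[0]))]


def row_means(img):
    """Mean pixel value of each row."""
    return [sum(row) / len(row) for row in img]


def roughly_similar(a, b, tolerance=50):
    """True if every row mean and column mean differs by less than tolerance."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return False  # the comparison only makes sense for equal sizes
    max_col = max(abs(x - y) for x, y in zip(column_means(a), column_means(b)))
    max_row = max(abs(x - y) for x, y in zip(row_means(a), row_means(b)))
    return max_row < tolerance and max_col < tolerance


a = [[0, 0], [255, 255]]
b = [[10, 5], [250, 240]]   # close to a
c = [[255, 255], [0, 0]]    # a flipped vertically
print(roughly_similar(a, b))  # True
print(roughly_similar(a, c))  # False
```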

Split array into smaller unequal-sized arrays dependent on array column values

I'm quite new to MATLAB and this problem is really driving me insane:
I have a huge array of 2 columns and about 31,000 rows. One of the two columns holds a spatial coordinate on a grid, the other a dependent parameter. What I want to do is the following:
I. I need to split the array into smaller parts defined by the spatial column; let's say the spatial coordinate are ranging from 0 to 500 - I now want arrays that give me the two column values for spatial coordinate 0-10, then 10-20 and so on. This would result in 50 arrays of unequal size that cover a spatial range from 0 to 500.
II. Secondly, I would need to calculate the average values of the resulting columns of every single array so that I obtain per array one 2-dimensional point.
III. Thirdly, I could plot these points and I would be super happy.
Sadly, I'm super confused since I miserably fail at step I. - Maybe there is even an easier way than to split the giant array in so many small arrays - who knows..
I would be really really happy for any suggestion.
Thank you,
Arne
First of all, since you wish a data structure of array of different size you will need to place them in a cell array so you could try something like this:
res = arrayfun(@(x)arr(arr(:,1)==x,:), unique(arr(:,1)), 'UniformOutput', 0);
The previous code returns a cell array with the array split according to its first column. With @(x)arr(arr(:,1)==x,:) you define a function of x, and arrayfun(function, ..., 'UniformOutput', 0) applies that function to each element of the following arguments (taking a single value of each argument to evaluate the function). Note that arr must be numeric; if it is not, you should map your values to numeric values or select them another way.
In the same way you could do
uo = 'UniformOutput';
res = arrayfun(@(x){arr(arr(:,1)==x,:), mean(arr(arr(:,1)==x,2))}, unique(arr(:,1)), uo, 0);
You will probably want to flatten the returned value; check the function cat. You could do:
res = cat(1,res{:})
How to plot your data depends on its format, so I can't help without knowing what the data looks like, but you could try plotting inside a loop over your 'res' variable or something similar.
Step I indeed comes with some difficulties. Once these are solved, I guess steps II and III can easily be solved. Let me make some suggestions for step I:
You first define the maximum value (maxValue = 500;) and the step size (stepSize = 10;). Now it is possible to iterate through all steps and create your new vectors.
for k=1:maxValue/stepSize
...
end
As every resulting array will have different dimensions, I suggest you save the vectors in a cell array:
Y = cell(maxValue/stepSize,1);
Use the find function to find the rows of the entries for each matrix. At each step k, the range of values of interest will be (k-1)*stepSize to k*stepSize.
row = find( (k-1)*stepSize <= X(:,1) & X(:,1) < k*stepSize );
You can now create the matrix for step k by
Y{k,1} = X(row,:);
Putting everything together you should be able to create the cell array Y containing your matrices and continue with the other tasks. You could also save the average of each value range in a second column of the cell array Y:
Y{k,2} = mean( Y{k,1}(:,2) );
I hope this helps you with your task. Note that these are only suggestions and there may be different (maybe more appropriate) ways to handle this.
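Steps I and II together (group rows into 10-wide coordinate ranges, then average each group into one point) can be sketched in Python as follows. The function name bin_means is an assumption; max_value and step mirror maxValue and stepSize from the answer, and the bins are half-open like the find condition above, so a coordinate exactly equal to max_value is not assigned to any bin.

```python
def bin_means(rows, max_value=500, step=10):
    """Group (coordinate, value) rows into bins of width `step` and
    return one (mean coordinate, mean value) point per non-empty bin."""
    n_bins = max_value // step
    bins = [[] for _ in range(n_bins)]
    for coord, value in rows:
        k = int(coord // step)  # bin k covers k*step <= coord < (k+1)*step
        if 0 <= k < n_bins:
            bins[k].append((coord, value))

    points = []
    for members in bins:
        if members:  # skip ranges that contain no rows
            n = len(members)
            points.append((sum(c for c, _ in members) / n,
                           sum(v for _, v in members) / n))
    return points


data = [(1, 2.0), (3, 4.0), (12, 10.0)]
print(bin_means(data))  # [(2.0, 3.0), (12.0, 10.0)]
```

The resulting list of points is what step III would plot.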

Finding whether a value is equal to the value of any array element in MATLAB

Can anyone tell me if there is a way (in MATLAB) to check whether a certain value is equal to any of the values stored within another array?
The way I intend to use it is to check whether an element index in one matrix is equal to the values stored in another array (where the stored values are the indices of the elements which meet a certain criteria).
So, if the indices of the elements which meet the criteria are stored in the matrix below:
criteriacheck = [3 5 6 8 20];
Going through the main array (called array) and checking if the index matches:
for i = 1:numel(array)
if i == 'Any value stored in criteriacheck'
%# "Do this"
end
end
Does anyone have an idea of how I might go about this?
The excellent answer previously given by @woodchips applies here as well:
Many ways to do this. ismember is the first that comes to mind, since it is a set membership action you wish to take. Thus
X = primes(20);
ismember([15 17],X)
ans =
0 1
Since 15 is not prime, but 17 is, ismember has done its job well here.
Of course, find (or any) will also work. But these are not vectorized in the sense that ismember was. We can test to see if 15 is in the set represented by X, but to test both of those numbers will take a loop, or successive tests.
~isempty(find(X == 15))
~isempty(find(X == 17))
or,
any(X == 15)
any(X == 17)
Finally, I would point out that tests for exact values are dangerous if the numbers may be true floats. Tests against integer values as I have shown are easy. But tests against floating point numbers should usually employ a tolerance.
tol = 10*eps;
any(abs(X - 3.1415926535897932384) <= tol)
you could use the find command
if (~isempty(find(criteriacheck == i)))
% do something
end
Note: Although this answer doesn't address the question in the title, it does address a more fundamental issue with how you are designing your for loop (the solution of which negates having to do what you are asking in the title). ;)
Based on the for loop you've written, your array criteriacheck appears to be a set of indices into array, and for each of these indexed elements you want to do some computation. If this is so, here's an alternative way for you to design your for loop:
for i = criteriacheck
%# Do something with array(i)
end
This will loop over all the values in criteriacheck, setting i to each subsequent value (i.e. 3, 5, 6, 8, and 20 in your example). This is more compact and efficient than looping over each element of array and checking if the index is in criteriacheck.
NOTE: As Jonas points out, you want to make sure criteriacheck is a row vector for the for loop to function properly. You can form any matrix into a row vector by following it with the (:)' syntax, which reshapes it into a column vector and then transposes it into a row vector:
for i = criteriacheck(:)'
...
The original question "Can anyone tell me if there is a way (in MATLAB) to check whether a certain value is equal to any of the values stored within another array?" can be solved without any loop.
Just use the setdiff function.
I think the INTERSECT function is what you are looking for.
C = intersect(A,B) returns the values common to both A and B. The
values of C are in sorted order.
http://www.mathworks.de/de/help/matlab/ref/intersect.html
The question if i == 'Any value stored in criteriacheck' can also be answered this way if you consider i a trivial matrix. However, you are probably better off with any(i==criteriacheck)
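For readers coming from other languages, the two checks discussed in this thread (exact membership for integer indices, and membership within a tolerance for floating-point values) look roughly like this in Python. The function names and the default tolerance are assumptions for illustration.

```python
import math

criteriacheck = [3, 5, 6, 8, 20]
lookup = set(criteriacheck)  # a set gives fast exact-membership tests


def is_member(i):
    """Exact test, appropriate for integer indices."""
    return i in lookup


def is_member_float(x, values, tol=1e-9):
    """Test with an absolute tolerance, for floating-point values."""
    return any(math.isclose(x, v, rel_tol=0.0, abs_tol=tol) for v in values)


print(is_member(5))                        # True
print(is_member(4))                        # False
print(0.1 + 0.2 == 0.3)                    # False: exact comparison fails
print(is_member_float(0.1 + 0.2, [0.3]))   # True
```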
