How to compare each matrix to mean and return value in Matlab - arrays

For example, let's consider
a = fix(8 * randn(10,5));
and mean(a) would give me the mean of each column.
So, what I was planning to do was compare the mean of the first column to each of its entries until the end of the column, then proceed to the next column and compare its mean with each of its entries.
I was able to get this code here (I know there are multiple for loops, but that's the best I could come up with; any alternative answer would be greatly appreciated):
if(ndims(a)==2)
b = mean(a);
for c = 1:size(a,2)
for d = 1:size(a)
for e = 1:size(b,2)
if(a(d,c)>b(1,c))
disp(1);
else
disp(false);
end
end
end
end
else
disp('Input should be a 2D matrix');
end
I don't know if this is the right answer. Could anyone tell me?
Thanks in advance.

It seems you want to know whether each entry is greater than its column-mean.
This is done efficiently with bsxfun:
result = bsxfun(@gt, a, mean(a,1));
Example:
a =
3 1 3 2
5 2 3 1
1 3 5 2
The column-means, given by mean(a,1), are
ans =
3.000000000000000 2.000000000000000 3.666666666666667 1.666666666666667
Then
>> result = bsxfun(@gt, a, mean(a,1))
result =
0 0 0 1
1 0 0 0
0 1 1 1
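On MATLAB R2016b and later, implicit expansion should allow the same comparison without bsxfun. A minimal sketch using the example matrix above (the name result2 is only illustrative and does not appear in the answer):

```matlab
% Example matrix from the answer above
a = [3 1 3 2; 5 2 3 1; 1 3 5 2];

% bsxfun form (also works in releases before R2016b)
result = bsxfun(@gt, a, mean(a,1));

% Implicit expansion (R2016b+): the 1-by-4 row of column means is
% expanded along the rows of a automatically
result2 = a > mean(a,1);

% Both forms should agree element by element
same = isequal(result, result2);
```

If the code has to run on older releases, the bsxfun form is the safer choice.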

If you are trying to do what I think you are (print one if the average value of a column is greater than the value in that column, zero otherwise) you can eliminate a lot of loops doing the following (using your same a and b):
for ii=1:length(b)
c(:,ii) = b(ii) > a(:,ii);
end
c will be your array of ones and zeros.

Related

A matrix and a column vector containing indices, how to iterate with no loop?

I have a big matrix (500212x7) and a column vector like below
matrix F vector P
0001011 4
0001101 3
1101100 6
0000110 1
1110000 7
The vector contains indices considered within the matrix rows. P(1) is meant to point at F(1,4), P(2) at F(2,3) and so on.
I want to negate a bit in each row in F in a column pointed by P element (in the same row).
I thought of things like
F(:,P(1)) = ~F(:,P(1));
F(:,P(:)) = ~F(:,P(:));
but of course these scenarios won't produce the result I expect as the first line won't make P element change and the second one won't even let me start the program because a full vector cannot make an index.
The idea is I need to do this for all F and P rows (changing/incrementing "simultaneously") but take the value of P element.
I know this is easily achieved with for loop but due to large dimensions of the F array such a way to solve the problem is completely unacceptable.
Is there any kind of Matlab wizardry that lets solving such a task with the use of matrix operations?
> I know this is easily achieved with for loop but due to large dimensions of the F array such a way to solve the problem is completely unacceptable.
You should never make such an assumption. First implement the loop, then check to see if it really is too slow for you or not, then worry about optimizing.
Here I'm comparing Luis' answer and the trivial loop:
N = 500212;
F = rand(N,7) > 0.6;
P = randi(7,N,1);
timeit(@()method1(F,P))
timeit(@()method2(F,P))
function F = method1(F,P)
ind = (1:size(F,1)) + (P(:).'-1)*size(F,1); % create linear index
F(ind) = ~F(ind); % negate those entries
end
function F = method2(F,P)
for ii = 1:numel(P)
F(ii,P(ii)) = ~F(ii,P(ii));
end
end
Timings are 0.0065 s for Luis' answer, and 0.0023 s for the loop (MATLAB Online R2019a).
It is especially true for very large arrays, that loops are faster than vectorized code, if the vectorization requires creating an intermediate array. Memory access is expensive, using more of it makes the code slower.
Lessons: don't dismiss loops, don't prematurely try to optimize, and don't optimize without comparing.
Another solution:
xor( F, 1:7 == P )
Explanation:
1:7 == P generates one-hot arrays.
xor will cause a bit to retain its value against a 0, and flip it against a 1
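As a small check of the xor idea, here is a sketch on the 5-by-7 example from the question. The names mask and G are illustrative, and the comparison relies on implicit expansion (R2016b or later), so P is assumed to be a column vector:

```matlab
% Small example from the question
F = logical([0 0 0 1 0 1 1;
             0 0 0 1 1 0 1;
             1 1 0 1 1 0 0;
             0 0 0 0 1 1 0;
             1 1 1 0 0 0 0]);
P = [4; 3; 6; 1; 7];            % must be a column vector

% Row k of the mask is true only in column P(k)
mask = (1:size(F,2)) == P;

% xor flips F where mask is true and leaves it unchanged elsewhere
G = xor(F, mask);
disp(double(G))
```

Note that the mask is a full logical array of the same size as F, so this approach uses more memory than the loop.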
Not sure if it qualifies as wizardry, but linear indexing does exactly what you want:
F = [0 0 0 1 0 1 1; 0 0 0 1 1 0 1; 1 1 0 1 1 0 0; 0 0 0 0 1 1 0; 1 1 1 0 0 0 0];
P = [4; 3; 6; 1; 7];
ind = (1:size(F,1)) + (P(:).'-1)*size(F,1); % create linear index
F(ind) = ~F(ind); % negate those entries
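The linear index can also be built with sub2ind, which some readers may find easier to follow than the manual arithmetic. A sketch on the same data (rows and ind_manual are illustrative names, not from the answer):

```matlab
F = [0 0 0 1 0 1 1; 0 0 0 1 1 0 1; 1 1 0 1 1 0 0; 0 0 0 0 1 1 0; 1 1 1 0 0 0 0];
P = [4; 3; 6; 1; 7];

rows = (1:size(F,1)).';                   % row subscripts 1..5 as a column
ind  = sub2ind(size(F), rows, P(:));      % linear index of F(k, P(k)) for each k

% Same index computed by hand: row + (column - 1) * number of rows
ind_manual = rows + (P(:) - 1) * size(F,1);

F(ind) = ~F(ind);                         % negate one entry per row
```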

Countif the Result of Subtracting Two Arrays Exceeds a Certain Value in Excel

I am new to array formulae and am having trouble with the following scenario:
I have the following matrix:
F G H I J ... R S T U V
1 0 0 1 1
0 1 1 1 2 3 1 2
2 0 2 3 1 2 0 1 0 0
2 1 0 0 1 0 0 3 0 0
My goal is to count the number of rows within which the difference between the sum of columns F:J and the sum of columns R:V is greater than a threshold. Critically, only rows with full data should be included: row 1 (where there are only values for columns F1:J1) and row 2 (where there are only some values for columns F2:J2) should be ignored.
If the threshold = 2.5, then the solution is 1. That is, row 3 is the only row with complete data where the difference between the sum of F3:J3 (8) and the sum of R3:V3 (3) is greater than 2.5 (i.e., 5 > 2.5).
I have tried to put together the following formula, rather pathetically, based on the teachings of @Tom Sharpe and @QHarr:
=COUNT(IF(SUBTOTAL(9,OFFSET(F1,ROW(F1:F4)-ROW(F1),0,1,COLUMNS(F1:J1)))-SUBTOTAL(9,OFFSET(R1,ROW(R1:R4)-ROW(R1),0,1,COLUMNS(R1:V1)))>2.5,IF(AND(SUBTOTAL(2,OFFSET(F1,ROW(F1:F4)-ROW(F1),0,1,COLUMNS(F1:J1)))=COLUMNS(F1:J1),SUBTOTAL(2,OFFSET(R1,ROW(R1:R4)-ROW(R1),0,1,COLUMNS(R1:V1)))=COLUMNS(R1:V1)),SUBTOTAL(9,OFFSET(F1,ROW(F1:F4)-ROW(F1),0,1,COLUMNS(F1:J1)))),IF(AND(SUBTOTAL(2,OFFSET(F1,ROW(F1:F4)-ROW(F1),0,1,COLUMNS(F1:J1)))=COLUMNS(F1:J1),SUBTOTAL(2,OFFSET(R1,ROW(R1:R4)-ROW(R1),0,1,COLUMNS(R1:V1)))=COLUMNS(R1:V1)),SUBTOTAL(9,OFFSET(R1,ROW(R1:V1)-ROW(R1),0,1,COLUMNS(R1:V1))))))
But it seems to always produce a value of 1, even if I edit the matrix such that the difference between the sum of F4:J4 and R4:v4 also exceeds 2.5. Sadly I am struggling to understand why and would appreciate any guidance on the matter.
As an array formula in one cell without volatile functions:
=SUM((MMULT(--(LEN(F2:J5)*LEN(R2:V5)>0),--TRANSPOSE(COLUMN(F2:J2)>0))=5)*(MMULT(F2:J5-R2:V5,TRANSPOSE(--(COLUMN(F2:J2)>0)))>2.5))
should do the trick :D
Maybe, in say X1 (assuming you have labelled your columns):
=COUNTIF(Y:Y,TRUE)
In Y1 whatever your chosen cutoff (eg 2.5) and in Y2:
=((COUNTBLANK(F2:J2)+COUNTBLANK(R2:V2)=0)*SUM(F2:J2)-SUM(R2:V2))>Y$1
copied down to suit.
Try this:
=SUMPRODUCT((MMULT(F1:J4-R1:V4,--(ROW(INDIRECT("1:"&COLUMNS(F1:J4)))>0))>2.5)*(MMULT((LEN(F1:J4)>0)+(LEN(R1:V4)>0),--(ROW(INDIRECT("1:"&COLUMNS(F1:J4)))>0))=(COLUMNS(F1:J4)+COLUMNS(R1:V4))))
I think this will do it, replacing your AND's by multiplies (*):
=SUMPRODUCT(--((SUBTOTAL(9,OFFSET(F1,ROW(F1:F4)-ROW(F1),0,1,COLUMNS(F1:J1)))-SUBTOTAL(9,OFFSET(R1,ROW(R1:R4)-ROW(R1),0,1,COLUMNS(R1:V1)))>2.5)*(SUBTOTAL(2,OFFSET(F1,ROW(F1:F4)-ROW(F1),0,1,COLUMNS(F1:J1)))=COLUMNS(F1:J1))*(SUBTOTAL(2,OFFSET(R1,ROW(R1:R4)-ROW(R1),0,1,COLUMNS(R1:V1)))=COLUMNS(R1:V1))>0))
It could be simplified a bit more but a bit short of time.
Just another option...
=IF(NOT(OR(IFERROR(MATCH(TRUE,ISBLANK(F1:J1),0),FALSE),IFERROR(MATCH(TRUE,ISBLANK(R1:V1),0),FALSE))), SUBTOTAL(9,F1:J1)-SUBTOTAL(9,R1:V1), "Missing Value(s)")
My approach was a little different from what you tried to adapt from @TomSharp in that I first validate that the cells have data (are not blank) and then perform the calculation, otherwise returning an error message. This is still an array formula, so when you enter it, press Ctrl+Shift+Enter.
The condition part of the opening IF() checks that none of the cells in either range is blank: if MATCH(TRUE,ISBLANK(range),0) finds a match, a cell is blank (bad); if there is no match, i.e. no blank cells, MATCH returns an #N/A error (good), which IFERROR turns into FALSE. So FALSE from both ranges means no blank cells were found.
Then the threshold condition becomes:
=COUNTIF(X1:X4,">"&Threshold)' Note: no Array formula here
I gave the threshold (cell W6) a named range for readability.

matlab: how to speed up the count of consecutive values in a cell array

I have the 137x19 cell array Location(1,4).loc and I want to find the number of times that horizontal consecutive values are present in Location(1,4).loc. I have used this code:
x=Location(1,4).loc;
y={x(:,1),x(:,2)};
for ii=1:137
cnt(ii,1)=sum(strcmp(x(:,1),y{1,1}{ii,1})&strcmp(x(:,2),y{1,2}{ii,1}));
end
y={x(:,1),x(:,2),x(:,3)};
for ii=1:137
cnt(ii,2)=sum(strcmp(x(:,1),y{1,1}{ii,1})&strcmp(x(:,2),y{1,2}{ii,1})&strcmp(x(:,3),y{1,3}{ii,1}));
end
y={x(:,1),x(:,2),x(:,3),x(:,4)};
for ii=1:137
cnt(ii,3)=sum(strcmp(x(:,1),y{1,1}{ii,1})&strcmp(x(:,2),y{1,2}{ii,1})&strcmp(x(:,3),y{1,3}{ii,1})&strcmp(x(:,4),y{1,4}{ii,1}));
end
y={x(:,1),x(:,2),x(:,3),x(:,4),x(:,5)};
for ii=1:137
cnt(ii,4)=sum(strcmp(x(:,1),y{1,1}{ii,1})&strcmp(x(:,2),y{1,2}{ii,1})&strcmp(x(:,3),y{1,3}{ii,1})&strcmp(x(:,4),y{1,4}{ii,1})&strcmp(x(:,5),y{1,5}{ii,1}));
end
... continue for all the columns. This code run and gives me the correct result but it's not automated and it's slow. Can you give me ideas to automate and speed up the code?
I think I will write an answer to this since I've not done so for a while.
First convert your cell array to a matrix; this makes the following steps much easier. Then diff is the way to go:
A = randi(5,[137,19]);
DiffA = diff(A')'; %// DiffA is 137 by 18: each entry is a value minus the value to its left in the same row.
So a 0 in DiffA means that 2 consecutive numbers in A are equal, and 2 consecutive 0s mean that 3 consecutive numbers in A are equal.
idx = DiffA==0;
cnt(:,1) = sum(idx,2);
To do 3 consecutive number counts, you could do something like:
idx2 = abs(DiffA(:,1:end-1))+abs(DiffA(:,2:end)) == 0;
cnt(:,2) = sum(idx2,2);
Or use another diff. The abs is used so that a negative number plus a positive number cannot happen to give 0; with it, only 0 + 0 gives 0. You can now continue this pattern by doing:
idx3 = abs(DiffA(:,1:end-2))+abs(DiffA(:,2:end-1))+abs(DiffA(:,3:end)) == 0;
cnt(:,3) = sum(idx3,2);
In loop format:
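absDiffA = abs(DiffA);
% W is not defined in the original answer; it is assumed here to be the number of run lengths to test (at most size(DiffA,2))
for ii = 1:W
idx = (absDiffA == 0); % zeros mark runs of ii+1 equal values
cnt(:,ii) = sum(idx,2);
absDiffA = absDiffA(:,1:end-1) + absDiffA(:,2:end); % widen the window by one column
end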
NOTE: this method counts [0,0,0] twice when evaluating 2 consecutives, and once when evaluating 3 consecutives.
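The conversion step is not spelled out above. One possible way, assuming the cell array holds character vectors, is to map each distinct string to an integer code with unique and then apply diff. The 3-by-4 cell array x below is a made-up stand-in for Location(1,4).loc, and the sketch follows this answer's interpretation of "consecutive" (equal neighbours within a row):

```matlab
% Hypothetical 3-by-4 cell array of char vectors (stand-in for the real data)
x = {'a','a','a','b';
     'c','d','d','d';
     'e','f','g','h'};

% Map each distinct string to an integer code; ic indexes x(:)
[~, ~, ic] = unique(x);
A = reshape(ic, size(x));

% Differences between horizontally adjacent entries (3-by-3 here)
D = diff(A, 1, 2);

cnt2 = sum(D == 0, 2);                                    % pairs of equal neighbours
cnt3 = sum(abs(D(:,1:end-1)) + abs(D(:,2:end)) == 0, 2);  % runs of three equal values
```

Using diff(A,1,2) takes differences along the second dimension directly, which avoids the two transposes.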

How to increment some of elements in an array by specific values in MATLAB

Suppose we have an array
A = zeros([1,10]);
We have several indexes with possible duplicate say:
indSeq = [1,1,2,3,4,4,4];
How can we increase A(i) by the number of i in the index sequence i.e. A(1) = 2, A(2) = 1, A(3) = 1, A(4) = 3?
The code A(indSeq) = A(indSeq)+1 does not work.
I know that I can use the following for loop to achieve the goal, but I wonder if there is anyway that we can avoid for-loop? We can assume that the indSeq is sorted.
A for-loop solution:
for i=1:length(indSeq)
A(indSeq(i)) = A(indSeq(i))+1;
end;
You can use accumarray for such a label based counting job, like so -
accumarray(indSeq(:),1)
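To get the counts into a vector of the same length as A, the output size can be passed as a third argument. A short sketch with the data from the question (counts is an illustrative name):

```matlab
A = zeros(1,10);
indSeq = [1,1,2,3,4,4,4];

% Count occurrences of each index; [10 1] forces a 10-by-1 result so that
% indices which never occur get a count of 0
counts = accumarray(indSeq(:), 1, [numel(A) 1]);

% Add the counts to A (transpose because A is a row vector)
A = A + counts.';
```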
Benchmarking
As suggested in the other answer, you can also use hist/histc. Let's benchmark these two for a large datasize. The benchmarking code I used had -
%// Create huge random array filled with ints that are duplicated & sorted
maxn = 100000;
N = 10000000;
indSeq = sort(randi(maxn,1,N));
disp('--------------------- With HISTC')
tic,histc(indSeq,unique(indSeq));toc
disp('--------------------- With ACCUMARRAY')
tic,accumarray(indSeq(:),1);toc
Runtime output -
--------------------- With HISTC
Elapsed time is 1.028165 seconds.
--------------------- With ACCUMARRAY
Elapsed time is 0.220202 seconds.
This is run-length encoding, and the following code should do the trick for you.
A=zeros(1,10);
indSeq = [1,1,2,3,4,4,4,7,1];
indSeq=sort(indSeq); %// if your input is always sorted, you don't need to do this
pos = [1; find(diff(indSeq(:)))+1; numel(indSeq)+1];
A(indSeq(pos(1:end-1)))=diff(pos)
which returns
A =
3 1 1 3 0 0 1 0 0 0
This algorithm was written by Luis Mendo for MATL.
I think what you are looking for is the number of occurrences of each unique value in the array. This can be accomplished with:
[num, val] = hist(indSeq,unique(indSeq));
the output of your example is:
num = 2 1 1 3
val = 1 2 3 4
So num is the number of times each val occurs; e.g. the number 1 occurs 2 times in your example.

Excel- make array from votes on a score

I'm a whiz at Matlab, but apparently I can't figure out Excel for my life today. I have a spreadsheet where I keep track of votes. So I record x number of votes for each score, i.e. on a scale of 1 to 5, 3 people voted 4, 2 people voted 3, and 1 person voted 1. I want to find the median of these votes, but I need to turn them into an array first, otherwise I'm just taking the median of the numbers of votes. I'm having trouble with getting arrays to work in this case. I need to build an array, with the above example, that looks like {4 4 4 3 3 1}, and then I can take the median of that (I assume I can just use the regular median function on an array?).
I realize the problem here is that I don't really know excel very well. So I guess I'm just asking for an answer, which is frowned upon when I can't show much work myself. But can someone give me a hint?
This one intrigued me, I'm sure there is a way to do this with an array formula but they have never been my strong point. For the time being here is a VBA solution:
Function MedianArray(rngScore As Range, rngCount As Range) As Double
Dim arrS() As Variant, arrC() As Variant, arrM() As Variant
Dim i As Integer, j As Integer, k As Integer
Dim d As Double
arrS = rngScore
arrC = rngCount
d = WorksheetFunction.Sum(rngCount)
ReDim arrM(1 To d, 1 To 1)
k = 1
For i = 1 To UBound(arrS, 2)
For j = 0 To arrC(1, i) - 1
arrM(k, 1) = arrS(1, i)
k = k + 1
Next j
Next i
MedianArray = WorksheetFunction.Median(arrM())
End Function
Given you say you don't know much about VBA here's how you do it:
From Excel press Alt + F11 to open the VB Editor
In the VB Editor menus select Insert -> Module
Paste in the code above
In the cell where you need median value type =MedianArray(B1:F1,B2:F2), assuming your scores are in row 1 columns B through F and the counts are directly below.
Hope this helps.
I'll let someone else post a VBA solution, but here's what I did using just formulas:
                     A   B   C   D   E
1 Running Total:     1   1   3   6   6    Median
2 Greater/lesser:    <   <   =   >   >    3.5
3 Values:            1   2   3   4   5
4 Counts:            1       2   3
Rows 3 and 4 are your original values and counts of values. Row 1 is the running total of the counts, going from left to right. Row 2 represents whether row 1 is greater than, less than, or equal to half the total sum of the counts row (here 6 / 2 = 3).
If there's no = in row 2, then you just need to get the value from the first column with a >. This is achieved with an HLookup.
If there is an = in row 2, then you need to get the average of the value in the = column and the value of the first > column.
I'd like to know if there's a more elegant way!
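Since the asker is comfortable with Matlab, the running-total logic described above can be sketched there as a cross-check. This is only an illustration under the example's data, with blank counts treated as 0; the variable names are made up for the sketch:

```matlab
values = [1 2 3 4 5];
counts = [1 0 2 3 0];          % blank cells in the counts row treated as 0

c    = cumsum(counts);         % running total: [1 1 3 6 6]
half = sum(counts) / 2;        % half the total number of votes: 3

k = find(c >= half, 1);        % first column whose running total reaches half
if c(k) == half
    % the "=" case: average this value with the first value past the halfway point
    kNext = find(c > half, 1);
    med = (values(k) + values(kNext)) / 2;
else
    % the ">" case: the median is the value in this column
    med = values(k);
end

% Cross-check by expanding the votes explicitly (repelem needs R2015a+)
nz = counts > 0;
expanded = repelem(values(nz), counts(nz));   % [1 3 3 4 4 4]
```

Both routes give 3.5 for this example, matching the Median cell in the table.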
