I have three arrays A, B, and C = A - B. They are split into columns and laid out in Excel like this:
A, B, C, A, B, C, A, B, C... D, E
I want D to be the sum of all C > 0 and E to be the sum of all C < 0. The problem is that C is split up for easy human readability, so I only want to pick out every third column.
My solution:
Following a variation on the method given here and here, I used a simple test array of data:
1 0 1 1
1 1 0 1
-1 -1 -1 1
0 -1 -1 0
-1 -1 1 0
1 1 0 1
I will pull out the even columns and do the conditional sums:
=SUMPRODUCT((MOD(COLUMN(A1:D1),2)=0)*(A1:D1>0),A1:D1)
=SUMPRODUCT((MOD(COLUMN(A1:D1),2)=0)*(A1:D1<0),A1:D1)
Which produces the correct result:
1 0
2 0
1 -1
0 -1
0 -1
2 0
But I am absolutely baffled as to why this works. For one thing, I didn't put in the double negative (--), so upon getting a "TRUE" or "FALSE" value, the formula should have spit an error at me. For another, this works just fine even though I'm not running it as a CSE array function in excel. And the part I get least of all is the arguments for SUMPRODUCT().
MOD() is just acting as a filter for the conditional, that I get, but I don't understand how it's handling A1:D1 when all it gets from COLUMN is a single number. COLUMN(A1:D1) just returns a single scalar value in excel, the first column in the range, in this case 1. How is that being turned into the needed array [1, 3], especially since I'm not using CSE?
SUMPRODUCT is natively an array-type formula, which is why you do not need CSE:
=SUMPRODUCT(A1:A6,B1:B6)
This will iterate and do A1*B1+A2*B2+...
So using this functionality we can do:
=SUMPRODUCT((MOD(COLUMN(A1:D1),2)))
Which will iterate through the columns and return 2: MOD(1,2)+MOD(2,2)+MOD(3,2)+MOD(4,2) = 1+0+1+0
The reason you do not need the double unary (--) is that you have the * in the expression. Any arithmetic done on a Boolean (TRUE/FALSE) coerces it to its numeric value (1/0).
The -- is shorthand for -1 * -1 *
You would need the -- if you did this:
=SUMPRODUCT(--(MOD(COLUMN(A1:D1),2)=0),--(A1:D1>0),A1:D1)
So with this you would end up multiplying a series of 1's and 0's against the series in A1:D1.
When either is false it will return 0 and 0 times anything is 0.
So only when both are TRUE or 1 does the value in the corresponding cell get added to the iterating sum.
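The iteration described above can be mimicked outside Excel. The following is only a minimal Python sketch of what the formula computes, not how Excel implements it; the function name sumproduct_even_columns and the keep argument are made up for illustration.

```python
def sumproduct_even_columns(row, keep):
    """Sum the values in even-numbered (1-based) columns that satisfy keep()."""
    total = 0
    for col, value in enumerate(row, start=1):  # COLUMN(A1:D1) -> 1, 2, 3, 4
        is_even = (col % 2 == 0)                # MOD(COLUMN(...), 2) = 0
        passes = keep(value)                    # A1:D1 > 0  or  A1:D1 < 0
        # Multiplying two booleans coerces them to 1/0, like * does in Excel.
        total += (is_even * passes) * value
    return total


row = [-1, -1, -1, 1]  # third row of the test data
positive_sum = sumproduct_even_columns(row, lambda v: v > 0)
negative_sum = sumproduct_even_columns(row, lambda v: v < 0)
print(positive_sum, negative_sum)  # prints: 1 -1
```

Each cell only contributes when both conditions evaluate to 1, which matches the "1 -1" shown for that row in the question.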
Related
I have an arbitrary number of 2d arrays of equal width but possibly non-equal height. They can consist of 1s, 0s, or a wildcard * which can match either a 1 or a 0. The wildcards are always in the same columns. I want to return all possible rows that are consistent with at least one row in every array simultaneously, and contain no wild cards.
For a concrete example, consider the three 2d arrays
    1 0 1 *        * 0 1 0        * 1 * 1
a = 1 1 1 *    b = * 1 1 0    c = * 0 * 0
    0 1 0 *        * 0 0 1
A possible row in the solution might be 1 0 1 0. It is consistent with the top row of a, the top row of b, and the second row of c. By contrast, a row that should not be in the solution is 0 1 0 1, since it is not consistent with any row of b despite being consistent with the bottom row of a and the top row of c.
Beyond doing an inefficient brute-force check I'm rather stuck. It seems like there should be a faster way. Are there any tricks that might help solve this problem efficiently?
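To pin down the matching rule, here is a minimal Python sketch of the brute-force baseline mentioned above. It is not the faster method being asked for. The names matches and consistent_rows are hypothetical, and the three arrays are written as strings read off the example.

```python
from itertools import product


def matches(pattern, row):
    """A row is consistent with a pattern if every non-wildcard cell agrees."""
    return all(p == "*" or p == r for p, r in zip(pattern, row))


def consistent_rows(arrays):
    """Brute force: try every 0/1 row and keep those matching some row of each array."""
    width = len(arrays[0][0])
    return [
        "".join(row)
        for row in product("01", repeat=width)
        if all(any(matches(p, row) for p in arr) for arr in arrays)
    ]


a = ["101*", "111*", "010*"]
b = ["*010", "*110", "*001"]
c = ["*1*1", "*0*0"]
print(consistent_rows([a, b, c]))  # prints: ['1010']
```

The cost grows as 2^width times the total number of rows, which is the inefficiency the question wants to avoid.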
CONTEXT
I have a large number of columns with categoricals, all with different, unrankable choices. To make my life easier for analysis, I'd like to take each of them and convert it to several columns with logicals. For example:
1 GENRE
2 Pop
3 Classical
4 Jazz
...would turn into...
1 Pop Classical Jazz
2 1 0 0
3 0 1 0
4 0 0 1
PROBLEM
I've tried using ind2vec but this only works with numericals or logicals. I've also come across this but am not sure it works with categoricals. What is the right function to use in this case?
If you want to convert from a categorical vector to a logical array, you can use the unique function to generate column indices, then perform your encoding using any of the options from this related question:
% Sample data:
data = categorical({'Pop'; 'Classical'; 'Jazz'; 'Pop'; 'Pop'; 'Jazz'});
% Get unique categories and create indices:
[genre, ~, index] = unique(data)
genre =
Classical
Jazz
Pop
index =
3
1
2
3
3
2
% Create logical matrix:
mat = logical(accumarray([(1:numel(index)).' index], 1))
mat =
6×3 logical array
0 0 1
1 0 0
0 1 0
0 0 1
0 0 1
0 1 0
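ind2vec does work once the categories are mapped to indices: call the cellstr function to convert the categorical data into a cell array of strings, then use ismember to get the index of each entry.
This code may help (adapted from this, with only minor changes):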
data = categorical({'Pop'; 'Classical'; 'Jazz';});
GENRE = cellstr(data); %change categorical data into cell strings
[~, loc] = ismember(GENRE, unique(GENRE));
genre = ind2vec(loc')';
Gen=full(genre);
array2table(Gen, 'VariableNames', unique(GENRE))
Running this code returns:
ans =
Classical Jazz Pop
_________ ____ ___
0 0 1
1 0 0
0 1 0
You can call unique(GENRE) to check the categories (as cell strings). Meanwhile, logical(Gen) (or logical(full(genre))) contains the logical columns that you need.
P.S. A categorical array might be faster than a cell array of strings, but the ind2vec function doesn't work with it. unique and accumarray might be the better option.
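Both answers follow the same two steps: find the sorted unique categories, then mark each row's column by index. As a language-neutral illustration, here is a minimal Python sketch of that idea. The function name one_hot is hypothetical and this is not a MATLAB API.

```python
def one_hot(labels):
    """Return the sorted unique categories and one logical row per label."""
    categories = sorted(set(labels))                     # like unique(data)
    index = {cat: i for i, cat in enumerate(categories)}  # category -> column
    matrix = [
        [col == index[label] for col in range(len(categories))]
        for label in labels
    ]
    return categories, matrix


data = ["Pop", "Classical", "Jazz", "Pop", "Pop", "Jazz"]
categories, matrix = one_hot(data)
print(categories)  # prints: ['Classical', 'Jazz', 'Pop']
for row in matrix:
    print([int(flag) for flag in row])
```

The columns come out in alphabetical order, which matches the Classical, Jazz, Pop ordering in the MATLAB output above.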
I want to write a do loop over all possible values of an array's elements. More specifically, I have an array, say A(:), of size n, and each element of A can be 0 or 1. I want to iterate over all possible combinations of values of the elements of A. Of course a simple way is
do A(1)=0, 1
do A(2)=0, 1
....
! do something with array A
end do
end do
but the size of my array is large and this method is not very suitable. Is there a better way to do this?
Since this is binary only, why not (mis-)use integers for this task? Just increment the integer by one for each of the combinations and read out the corresponding bits using btest:
program test
implicit none
integer, parameter :: n = 3
integer :: i
integer :: A(n)
integer :: idx(n) = [(i, i=0,n-1)] ! bit positions start at zero
! Use a loop to increment the integer
do i=0,2**n-1
! Get the bits at the lowest positions and map true to "1" and false to "0"
A = merge( 1, 0, btest( i, idx ) )
print *,A
enddo
end program
This simple code fills A with all combinations (one at a time) and prints them successively.
./a.out
0 0 0
1 0 0
0 1 0
1 1 0
0 0 1
1 0 1
0 1 1
1 1 1
Note that Fortran only has signed integers, so the highest bit is not usable here. This leaves you up to 2^(32-1) combinations for (default) 4 byte integers if you start from zero as shown here (array length up to 31).
To get the full range, do the loop in the following way:
do i=-huge(1)-1,huge(1)
This gives you the full 2^32 distinct variations for arrays of length 32.
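The same counter-and-bit-test idea can be sketched in Python, where integers are not limited to 32 bits. This is only an illustration of the technique, and the generator name bit_patterns is made up.

```python
def bit_patterns(n):
    """Yield every length-n list of 0/1 values, lowest bit first."""
    for i in range(2 ** n):
        # (i >> pos) & 1 plays the role of btest(i, pos) mapped to 1/0.
        yield [(i >> pos) & 1 for pos in range(n)]


for pattern in bit_patterns(3):
    print(pattern)
```

For n = 3 this produces the eight patterns in the same order as the Fortran program, starting with [0, 0, 0] and [1, 0, 0].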
Let's say I have a vector A = [-1,2];
Each element in A is described by the actual number and sign. So each element has a 2 dimensional feature-set.
I would like to generate a matrix, in this case 2x2 where the columns correspond to the element, and rows correspond to the presence of a feature. The presence of a feature is described by 1's and 0's. So, if an element is positive, it is 1, if the element is the number 1, then the result is 1 as well. In the case above I would get:
                             Element 1   Element 2
Is this a 1?                     1           0
Is this a positive number?       0           1
What is the smartest way to go about accomplishing this? Obviously if statements would work, but I feel that there should be a faster, much smarter way of going about this. I am coding this in matlab by the way, and I would appreciate any help.
@Benoit_11's solution is a fine one. Here's a similar but maybe simpler solution. You could try both and see which is faster if you care about speed.
features = [abs(A) == 1; A > 0];
This assumes A is a row vector in order to get the output in the format you specified.
Simple way using ismember for the first condition and a logical operation for the 2nd condition. ismember outputs a logical array which you can plug into the output you need (here called DescribeA), and likewise when you check for values greater than 0 using the > operator.
%// Test array
A = [-1,2,1,-10,5,-3,1]
%// Initialize output
DescribeA = zeros(2,numel(A));
%// 1st condition. Check if values are 1 or -1
DescribeA(1,:) = ismember(A,1)|ismember(A,-1);
%// Check if they are > 0
DescribeA(2,:) = A>0;
Output in Command Window:
A =
-1 2 1 -10 5 -3 1
DescribeA =
1 0 1 0 0 0 1
0 1 1 0 1 0 1
I feel there is a smarter way for the 1st condition but I can't seem to find it.
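Both answers build one row per feature from elementwise comparisons. As a cross-check of the expected output, here is a minimal Python sketch of the same two tests. The function name describe is hypothetical.

```python
def describe(values):
    """Return two feature rows: magnitude equals 1, and value is positive."""
    is_one = [int(abs(v) == 1) for v in values]   # like abs(A) == 1
    is_positive = [int(v > 0) for v in values]    # like A > 0
    return [is_one, is_positive]


for feature_row in describe([-1, 2, 1, -10, 5, -3, 1]):
    print(feature_row)
# prints: [1, 0, 1, 0, 0, 0, 1]
#         [0, 1, 1, 0, 1, 0, 1]
```

Using the absolute value covers both 1 and -1 in a single comparison, which is one way to shorten the first condition.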
I have a matrix in Matlab(2012) with 3 columns and X number of rows, X is defined by the user, so varies each time. For this example though I will use a fixed 5x3 matrix.
So I would like to perform an iterative function on each row within the matrix, while the value in the third column is below a certain value. Then store the new values within the same matrix, so overwrite the original values.
The code below is a simplified version of the problem.
M=[-2 -5 -3 -2 -4]; %Vector containing random values
Vf_X=M+1; %Defining the first column of the matrix
Vf_Y=M+2; %Defining the second column of the matrix
Vf_Z=M; %Defining the third column of the matrix
Vf=[Vf_X',Vf_Y',Vf_Z']; %Creating the matrix
while Vf(:,3)<0
Vf=Vf+1;
end
disp(Vf)
The result I get is
1 2 0
-2 -1 -3
0 1 -1
1 2 0
-1 0 -2
Ideally I would like to get this result instead
1 2 0
1 2 0
1 2 0
1 2 0
1 2 0
The while will not start if any value is above zero to begin with and stops as soon as one value goes above zero.
I hope this makes sense and I have supplied enough information
Thank you for your time and help.
Your current problem is that you stop iterating the very moment any of the values in the third column breaks the condition. Correct me if I'm wrong, but what I think you want is to continue iterating on the remaining rows until every row's third-column value has broken the condition.
You could do that like this:
inds = Vf(:,3) < 0; % start with only the rows whose third column is still below zero
while any(inds)
Vf(inds,:) = Vf(inds,:)+1;
inds = Vf(:,3) < 0;
end
Of course, for the simple addition you provide, there is a better and faster way:
inds = Vf(:,3)<0;
Vf(inds,:) = bsxfun(@minus, Vf(inds,:), Vf(inds,3));
But for general functions, the while above will do the trick.
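The masked while loop can also be written out step by step. Here is a minimal Python sketch of the same per-row logic, assuming plain nested lists instead of a MATLAB matrix. The name raise_rows and its threshold parameter are hypothetical.

```python
def raise_rows(matrix, threshold=0):
    """Add 1 to every row whose third column is below threshold, until none remain."""
    rows = [list(r) for r in matrix]            # work on a copy
    active = [r[2] < threshold for r in rows]   # like inds = Vf(:,3) < 0
    while any(active):
        for r, is_active in zip(rows, active):
            if is_active:
                for j in range(len(r)):
                    r[j] += 1                   # like Vf(inds,:) + 1
        active = [r[2] < threshold for r in rows]
    return rows


Vf = [[-1, 0, -2], [-4, -3, -5], [-2, -1, -3], [-1, 0, -2], [-3, -2, -4]]
for row in raise_rows(Vf):
    print(row)  # every row prints as [1, 2, 0]
```

Rows that already satisfy the condition are never touched, and the loop ends only when no row is still active.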