Combine rows of a cell array that share the same element

For example I have a cell array like:
Column1    Column2
'aaaa'     4
'bbbb'     5
'cccc'     2
'cccc'     0
'dddd'     0
'dddd'     3
'eeee'     0
'ffff'     0
What I want is to merge the rows that have the same element in the first column. The result I want to obtain is:
'aaaa'     4
'bbbb'     5
'cccc'     2
'dddd'     3
'eeee'     0
'ffff'     0
I'm looking for an answer without for loops.

Find all completely unique strings (i.e. ffff_0 and ffff__1 are unique, but aaaa_1 and aaaa___1 are obviously not unique; apparently the underscores represent formatting).
Once you have that, do the same thing with just the letters.
I am pretty sure you will have to do that (above) in some capacity to get your desired output, and if that's the case, I think you are right on the edge of the speed tradeoff between for loops and all that extra memory allocation and sorting through values finding unique ones.

Try this:
arr([arr{:,2}] ~= 0,:)
This indexes arr as follows:
rows: all rows of arr whose second column does not equal 0,
columns: all columns.
Might be a syntax error in there somewhere, been a while since I used Matlab...
Edit: New answer
non_zero = transpose([arr{:,2}] ~= 0);
arr = arr(non_zero | ~ismember(arr(:,1),arr(non_zero,1)),:)
Essentially what I'm doing: Get all rows such that the right hand side is not zero, OR the left-hand side is not a member of the left-hand sides of the non-zero rows. The latter condition will only be satisfied for rows with zero with no matching left-hand side in the non-zero rows (and hence not repeats). Now keep in mind that this will still not work if you have any duplicate rows (same left and right-hand side). If that's a possibility then do this:
non_zero = transpose([arr{:,2}] ~= 0);
arr = arr(non_zero | ~ismember(arr(:,1),arr(non_zero,1)),:);
[~,U] = unique(arr(:,1));
arr = arr(U,:)
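The same selection logic can be sketched in Python, purely as an illustration of the rule described above (keep non-zero rows, keep zero rows whose key never has a non-zero value, then drop repeated keys). The name merge_rows and the list-of-tuples layout are assumptions made for this sketch, and unlike MATLAB's unique it keeps the original row order instead of sorting.

```python
# Rows as (key, value) pairs, mirroring the example cell array.
rows = [
    ("aaaa", 4), ("bbbb", 5), ("cccc", 2), ("cccc", 0),
    ("dddd", 0), ("dddd", 3), ("eeee", 0), ("ffff", 0),
]

def merge_rows(rows):
    """Hypothetical helper: one row per key, preferring non-zero values."""
    # Keys that have at least one non-zero value.
    nonzero_keys = {key for key, value in rows if value != 0}
    # Keep a row if it is non-zero, or if its key never has a non-zero value.
    kept = [(k, v) for k, v in rows if v != 0 or k not in nonzero_keys]
    # Drop repeated keys, keeping the first occurrence of each.
    seen = set()
    merged = []
    for k, v in kept:
        if k not in seen:
            seen.add(k)
            merged.append((k, v))
    return merged

merged = merge_rows(rows)
print(merged)
```

For the example data this leaves one row each for 'aaaa' through 'ffff', with 'cccc' paired with 2 and 'dddd' with 3.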

Related

Python: Finding the row index of a value in 2D array when a condition is met

I have a 2D array PointAndTangent of dimension 8500 x 5. The data is row-wise with 8500 data rows and 5 data values for each row. I need to extract the row index of an element in 4th column when this condition is met, for any s:
abs(PointAndTangent[:,3] - s) <= 0.005
I just need the row index of the first match for the above condition. I tried using the following:
index = np.all([[abs(s - PointAndTangent[:, 3])<= 0.005], [abs(s - PointAndTangent[:, 3]) <= 0.005]], axis=0)
i = int(np.where(np.squeeze(index))[0])
which doesn't work. I get the following error:
i = int(np.where(np.squeeze(index))[0])
TypeError: only size-1 arrays can be converted to Python scalars
I am not so proficient with NumPy in Python. Any suggestions would be great. I am trying to avoid using for loop as this is small part of a huge simulation that I am trying.
Thanks!
Possible Solution
I used the following
idx = (np.abs(PointAndTangent[:,3] - s)).argmin()
It seems to work. It returns the row index of the nearest value to s in the 4th column.
You were almost there. np.where is one of the most abused functions in numpy. Half the time, you really want np.nonzero, and the other half, you want to use the boolean mask directly. In your case, you want np.flatnonzero or np.argmax:
mask = abs(PointAndTangent[:,3] - s) <= 0.005
mask is a 1D boolean array that is True where the condition is met and False elsewhere. You can get the indices of all the True entries with flatnonzero and select the first one:
index = np.flatnonzero(mask)[0]
Alternatively, you can select the first one directly with argmax:
index = np.argmax(mask)
The solutions behave differently in the case when there are no rows meeting your condition. The former does indexing, so it will raise an error. The latter will return zero, which can also be a real result.
Both can be written as a one-liner by replacing mask with the expression that was assigned to it.
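To make the no-match difference concrete, here is a minimal sketch. The function name first_match, the tol parameter, and the sample column are assumptions for illustration; returning None for "no match" is one possible convention, not the only one.

```python
import numpy as np

def first_match(column, s, tol=0.005):
    """Return the index of the first entry within tol of s, or None if there is none."""
    mask = np.abs(column - s) <= tol
    hits = np.flatnonzero(mask)          # indices of all True entries
    return int(hits[0]) if hits.size else None

# Made-up stand-in for PointAndTangent[:, 3].
column = np.array([0.0, 0.5, 1.0, 1.002, 2.0])

print(first_match(column, 1.0))   # two entries qualify; the first index is returned
print(first_match(column, 5.0))   # nothing qualifies

# argmax on an all-False mask returns 0, which looks like a match at row 0.
print(int(np.argmax(np.abs(column - 5.0) <= 0.005)))
```

Checking hits.size before indexing avoids the IndexError that np.flatnonzero(mask)[0] would raise on an empty result.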

Split array into smaller unequal-sized arrays depending on array-column values

I'm quite new to MatLab and this problem really drives me insane:
I have a huge array of 2 columns and about 31,000 rows. One of the two columns holds a spatial coordinate on a grid, the other a dependent parameter. What I want to do is the following:
I. I need to split the array into smaller parts defined by the spatial column. Let's say the spatial coordinates range from 0 to 500: I now want arrays that give me the two column values for spatial coordinates 0-10, then 10-20, and so on. This would result in 50 arrays of unequal size that cover a spatial range from 0 to 500.
II. Secondly, I would need to calculate the average values of the resulting columns of every single array so that I obtain per array one 2-dimensional point.
III. Thirdly, I could plot these points and I would be super happy.
Sadly, I'm super confused since I miserably fail at step I. - Maybe there is even an easier way than to split the giant array in so many small arrays - who knows..
I would be really really happy for any suggestion.
Thank you,
Arne
First of all, since you want a collection of arrays of different sizes, you will need to place them in a cell array. You could try something like this:
res = arrayfun(@(x)arr(arr(:,1)==x,:), unique(arr(:,1)), 'UniformOutput', 0);
This returns a cell array in which arr is split according to the values in its first column. @(x)arr(arr(:,1)==x,:) defines an anonymous function of x, and arrayfun(function, ..., 'UniformOutput', 0) applies that function to each element of the following argument, here each unique value of the first column. Note that arr must be numeric; if it is not, you should map your values to numeric values or select the rows some other way.
In the same way, you could also compute the mean of the second column for each group:
uo = 'UniformOutput';
res = arrayfun(@(x){arr(arr(:,1)==x,:), mean(arr(arr(:,1)==x,2))}, unique(arr(:,1)), uo, 0);
You will probably want to flatten the returned value; check the function cat. You could do:
res = cat(1,res{:})
How to plot your data depends on its format, so I can't help without knowing what the data look like, but you could try plotting inside a loop over your 'res' variable or something similar.
Step I indeed comes with some difficulties. Once these are solved, I guess steps II and III can easily be solved. Let me make some suggestions for step I:
You first define the maximum value (maxValue = 500;) and the step size (stepSize = 10;). Now it is possible to iterate through all steps and create your new vectors.
for k=1:maxValue/stepSize
...
end
As every resulting array will have different dimensions, I suggest you save the vectors in a cell array:
Y = cell(maxValue/stepSize,1);
Use the find function to find the rows of the entries for each matrix. At each step k, the range of values of interest will be (k-1)*stepSize to k*stepSize.
row = find( (k-1)*stepSize <= X(:,1) & X(:,1) < k*stepSize );
You can now create the matrix for step k by
Y{k,1} = X(row,:);
Putting everything together you should be able to create the cell array Y containing your matrices and continue with the other tasks. You could also save the average of each value range in a second column of the cell array Y:
Y{k,2} = mean( Y{k,1}(:,2) );
I hope this helps you with your task. Note that these are only suggestions and there may be different (maybe more appropriate) ways to handle this.
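For comparison, the binning and averaging steps can be sketched without an explicit loop in NumPy. This is only an illustration under stated assumptions: the name bin_means, the tiny sample array, and the choice to clip out-of-range coordinates into the first or last bin are all made up for this sketch.

```python
import numpy as np

def bin_means(data, step=10.0, max_value=500.0):
    """Average both columns of data within each bin [k*step, (k+1)*step)."""
    n_bins = int(max_value / step)
    # Bin index per row; clipping puts a coordinate equal to max_value in the last bin.
    idx = np.clip((data[:, 0] // step).astype(int), 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    sum_x = np.bincount(idx, weights=data[:, 0], minlength=n_bins)
    sum_y = np.bincount(idx, weights=data[:, 1], minlength=n_bins)
    filled = counts > 0                  # skip empty bins to avoid dividing by zero
    return np.column_stack((sum_x[filled] / counts[filled],
                            sum_y[filled] / counts[filled]))

# Column 0: spatial coordinate, column 1: dependent parameter (made-up values).
data = np.array([[1.0, 2.0], [9.0, 4.0], [12.0, 10.0], [25.0, 7.0]])
points = bin_means(data, step=10.0, max_value=30.0)
print(points)
```

Each row of points is the mean coordinate and mean parameter of one non-empty bin, so it can be passed directly to a plotting routine.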

Sum along absolute values in an Array in Matlab

My array contains a string in the first row
how can I sum the array from the 2nd row to the Nth/1442th row (as in my example) disregarding the negative signs present in the column?
for example, my code for an array called data2 is:
S = sum(data2(2,15):data2(1442,15));
so sum all of the elements from row 2 to row 1442 in column 15.
This doesn't work but it also does not have anything to deal with the absolute value of whatever row its checking
data is from a .csv:
You should do something like this:
sum(abs(data(2:1442,15)));
The abs function will find the absolute value of each value in the array (i.e. disregard the negative sign). data(2:1442,15) will grab rows 2-1442 of the 15th column, as you wanted.
EDIT: apparently data is a cell array, so you could do the following, I think:
sum(abs([data{2:1442,15}]));
Ok so it looks like you have a constant column so
data2(2,15) = -0.02
and further down
data2(1442,15) = -0.02 %(I would assume)
So when you form:
data2(2,15):data2(1442,15)
this is essentially like trying to create an array, but with only a single value, since:
-0.02:-0.02
ans =
-0.0200
which of course gives:
>> sum(-0.02:-0.02)
ans =
-0.0200
What you want should be more like:
sum(data2(2:1442,15))
That way, the index: 2:1442, forms a vector of all the row references for you.
To disregard the negative signs:
your_answer = sum(abs(data2(2:1442,15)))
EDIT: For a cell array this works:
sum(abs(cell2mat(data2(2:1442,15))))
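For readers working in NumPy instead of MATLAB, the same idea looks like the sketch below. The small array is made up for illustration; the point is the index translation, since NumPy is 0-based with end-exclusive slices, so MATLAB's data(2:4, 2) corresponds to data[1:4, 1].

```python
import numpy as np

# Made-up stand-in for the CSV data: 4 rows, 2 columns.
data = np.array([[0.0, -1.0],
                 [0.0,  2.0],
                 [0.0, -3.0],
                 [0.0,  4.0]])

# Rows 2..4 of column 2 in MATLAB terms, with the signs disregarded.
total = np.abs(data[1:4, 1]).sum()
print(total)
```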

How to copy consecutive values to an array

copyto = zeros(10)
what = ones(3)
where = 2
copyto[where:len(what)+where] = what
Is there a way to copy all values from a smaller array into a bigger array at a specific position, without providing the upper index? The way I thought it would work was
copyto[where:] = what
but this gives me
ValueError: operands could not be broadcast together with shapes
Thanks!
On the left and right hand sides of the assignment you must have arrays with the same shape, so that a one-to-one correspondence between the individual elements exists. In your case the array (view) copyto[where:] has 8 elements, while what has 3, so your assignment is not well defined. (Or to put it otherwise: there is no unique way to assign three values to eight variables, therefore the assignment is ill defined.)
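A minimal sketch of an assignment whose shapes do match is given here. The helper name paste is hypothetical; it only wraps the slice arithmetic from the question so the upper index does not have to be spelled out at each call site.

```python
import numpy as np

def paste(dst, src, start):
    """Hypothetical helper: copy src into dst beginning at index start."""
    # The slice has exactly len(src) elements, so the shapes agree.
    dst[start:start + len(src)] = src
    return dst

copyto = np.zeros(10)
what = np.ones(3)
where = 2

paste(copyto, what, where)
print(copyto)
```

If src would run past the end of dst, the slice is shorter than src and NumPy raises the same broadcasting ValueError, so an overrun is not silently truncated.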

Why does `x[0]` return a zero-length vector?

Say I have a vector, for example, x <- 1:10, then x[0] returns a zero-length vector of the same class as x, here integer(0).
I was wondering if there is a reason behind that choice, as opposed to throwing an error, or returning NA as x[11] would? Also, if you can think of a situation where having x[0] return integer(0) is useful, thank you for including it in your answer.
As seen in ?"["
NA and zero values are allowed: rows of an index matrix containing a
zero are ignored, whereas rows containing an NA produce an NA in the
result.
So an index of 0 just gets ignored. We can see this in the following
x <- 1:10
x[c(1, 3, 0, 5, 0)]
#[1] 1 3 5
So if the only index we give it is 0 then the appropriate response is to return an empty vector.
My crack at it, as I am not a programmer and certainly do not contribute to the R source: I think it may be because you need some sort of placeholder to state that something occurred here but nothing was returned. This becomes more apparent with things like table and split. For instance, when you make a table of values and a cell has zero entries, you need a way to record that the cell has no values. It would not be appropriate to have x[0] == 0, as the result is not the numeric value zero but the absence of any value.
So in the following splits we need a placeholder, and integer(0) holds the place of "no values returned", which is not the same as 0. Notice that the second one returns numeric(0), which is still a placeholder, indicating that the empty result is numeric.
with(mtcars, split(as.integer(gear), list(cyl, am, carb)))
with(mtcars, split(gear, list(cyl, am, carb)))
So in a way my x[FALSE] retort is true in that it holds the place of the non existent zero spot in the vector.
All right this balonga I just spewed is true until someone disputes it and tears it down.
PS page 19 of this guide (LINK) state that integer() and integer(0) are empty integer.
Related SO post: How to catch integer(0)?
Since the array indices are 1-based, index 0 has no meaning. The value is ignored as a vector index.
