I know mapping 2D array into 1D array has been asked many times, but I did not find a solution that would fit a where the column count varies.
So I want get a 1-dimensional index from this 2-dimensional array
Col> _0____1____2__
Row 0 |_0__|_1__|_2__|
V 1 |_3__|_4__|
2 |_5__|_6__|_7__|
3 |_8__|_9__|
4 |_10_|_11_|_12_|
5 |_13_|_14_|
The normal formula index = row * columns + column does not work, since after the 2nd row the index is out of place.
What is the correct formula here?
EDIT:
The specific issue is that I have a list of items in with the layout like in the grid, but a one dimensional array for the data. So while looping through the elements in the UI, I need to get the correct data, but can only get the row and column for that element. I need to find a way to turn a row/column value into an index for the data-array
Bad picture trying to explain it
A truly optimal answer (or even a provably correct one) will depend on the language you are using and how it lays out memory for such arrays.
However, taking your question simply at face value, you have to know what the actual length of each row is in order to calculate a 1D index.
So either the row length follows some pattern that can be inferred from the data, or you have (or can write) a rlen = rowLength( 2dTable, RowNumber) function.
Then, depending on how big the tables are and how fast you need to run, you can calculate a 1D index from the 2d table by adding all the previous row lengths until the current row length is less than the 2d column index.
or build a 1d table of the row lengths (or commulative rowlengths) so you can scan it and so only call your rowlength function for each row only once.
With a better description of your problem, you might get a better answer...
For your example which alternates between 3 and 2 columns you can construct a formula:
index = (row / 2) * (3 + 2) + (row % 2 ? 3 : 0) + column
(C-like syntax, assuming integer division)
In general though, the one and only way to implement what you're doing here, jagged arrays, is to make an array of arrays, a.k.a. an Iliffe vector. That means, use the row number as index into an array of pointers which point to the individual row arrays containing the actual data.
You can have an additional 1D array having the length of the columns say "length". Then your formula is index=sum {length(i)}+column. i runs from 0 to row.
Related
I have a 2D array PointAndTangent of dimension 8500 x 5. The data is row-wise with 8500 data rows and 5 data values for each row. I need to extract the row index of an element in 4th column when this condition is met, for any s:
abs(PointAndTangent[:,3] - s) <= 0.005
I just need the row index of the first match for the above condition. I tried using the following:
index = np.all([[abs(s - PointAndTangent[:, 3])<= 0.005], [abs(s - PointAndTangent[:, 3]) <= 0.005]], axis=0)
i = int(np.where(np.squeeze(index))[0])
which doesn't work. I get the follwing error:
i = int(np.where(np.squeeze(index))[0])
TypeError: only size-1 arrays can be converted to Python scalars
I am not so proficient with NumPy in Python. Any suggestions would be great. I am trying to avoid using for loop as this is small part of a huge simulation that I am trying.
Thanks!
Possible Solution
I used the following
idx = (np.abs(PointAndTangent[:,3] - s)).argmin()
It seems to work. It returns the row index of the nearest value to s in the 4th column.
You were almost there. np.where is one of the most abused functions in numpy. Half the time, you really want np.nonzero, and the other half, you want to use the boolean mask directly. In your case, you want np.flatnonzero or np.argmax:
mask = abs(PointAndTangent[:,3] - s) <= 0.005
mask is a 1D array with ones where the condition is met, and zeros elsewhere. You can get the indices of all the ones with flatnonzero and select the first one:
index = np.flatnonzero(mask)[0]
Alternatively, you can select the first one directly with argmax:
index = np.argmax(mask)
The solutions behave differently in the case when there are no rows meeting your condition. Three former does indexing, so will raise an error. The latter will return zero, which can also be a real result.
Both can be written as a one-liner by replacing mask with the expression that was assigned to it.
So I m trying to find some alternative to sumifs in excel where each condition needs to be checked in a 2D range instead of a 1D range.
For example, in the below table I want the sum of values in column V for rows where A12 ("IJ") is present in range A2:C8 (P), B12 ("NM") is present in the range D2:F8 (S) and C12 ("XX") is present in range G2:I8 (A)
I am trying to find a solution involving an array-based formula (without VBA).
Like for example in the below-given formulas,
SUMPRODUCT((B2:B8'=A12)*J2:J8) will give an array-based calculation as follows
SUMPRODUCT({TRUE;FALSE;TRUE;FALSE;FALSE;TRUE;FALSE}*{22;79;45;67;43;72;52})
= SUMPRODUCT({22;0;45;0;0;72;0})
=139
It is easy when there is only one condition needs to be checked but like sumifs, I intend to check multiple conditions, but as soon as I add other conditions, the array becomes multidimensional and gives the wrong answer.
Example:
SUMPRODUCT((A2:C8=A12)*(D2:F8=B12)*J2:J8) breaks down to
=SUMPRODUCT(
{FALSE,TRUE,FALSE;FALSE,FALSE,FALSE;FALSE,TRUE,FALSE;FALSE,FALSE,FALSE;FALSE,FALSE,FALSE;FALSE,TRUE,FALSE;FALSE,FALSE,FALSE}*
{TRUE,FALSE,FALSE;FALSE,FALSE,FALSE;TRUE,FALSE,FALSE;FALSE,FALSE,FALSE;FALSE,FALSE,FALSE;TRUE,FALSE,FALSE;FALSE,FALSE,FALSE}
*J2:J8)
in the background what is happening is (example for 3rd row)
SUMPRODUCT( ({FALSE, TRUE ,FALSE} * {TRUE,FALSE,FALSE}) * 45 )
= SUMPRODUCT({FALSE,FALSE,FALSE} *45 )
=0
SUMPRODUCT(({FALSE,TRUE ,FALSE} + {TRUE,FALSE,FALSE}) * 45 )
= SUMPRODUCT({TRUE,TRUE,FALSE} *45 )
= 90
#expected answer =45
Can someone help me understand where I am going wrong or what I am missing?
If there is any other way then suggestions are always welcome.
Please note this is a dummy data actual data is very big for each header (P,S,A) there are values in 10 columns respectively and the number of rows is also very large.
Try this...
=SUMPRODUCT( ((A2:A8=A12)+(B2:B8=A12)+(C2:C8=A12)) * ((D2:D8=B12)+(E2:E8=B12)+(F2:F8=B12)) * ((G2:G8=C12)+(H2:H8=C12)+(I2:I8=C12)) * J2:J8 )
For SUMPRODUCT to work, the shape of the Boolean array needs to match the shape of the array you wish to conditionally sum.
J2:J8 is seven rows tall by one column wide.
The above formula creates an array of 1s and 0s from your three criteria ranges and shapes it into seven rows tall by one column wide.
At that point, SUMPRODUCT can do it's normal thing because the criteria array matches the dimension of the sum array J2:J8.
MATLAB is well-known for being column-major. Consequently, manipulating entries of an array that are in the same column is faster than manipulating entries that are on the same row.
In that case, why do so many built-in functions, such as linspace and logspace, output row vectors rather than column vectors? This seems to me like a de-optimization...
What, if any, is the rationale behind this design decision?
It is a good question. Here are some ideas...
My first thought was that in terms of performance and contiguous memory, it doesn't make a difference if it's a row or a column -- they are both contiguous in memory. For a multidimensional (>1D) array, it is correct that it is more efficient to index a whole column of the array (e.g. v(:,2)) rather than a row (e.g. v(2,:)) or other dimension because in the row (non-column) case it is not accessing elements that are contiguous in memory. However, for a row vector that is 1-by-N, the elements are contiguous because there is only one row, so it doesn't make a difference.
Second, it is simply easier to display row vectors in the Command Window, especially since it wraps the rows of long arrays. With a long column vector, you will be forced to scroll for much shorter arrays.
More thoughts...
Perhaps row vector output from linspace and logspace is just to be consistent with the fact that colon (essentially a tool for creating linearly spaced elements) makes a row:
>> 0:2:16
ans =
0 2 4 6 8 10 12 14 16
The choice was made at the beginning of time and that was that (maybe?).
Also, the convention for loop variables could be important. A row is necessary to define multiple iterations:
>> for k=1:5, k, end
k =
1
k =
2
k =
3
k =
4
k =
5
A column will be a single iteration with a non-scalar loop variable:
>> for k=(1:5)', k, end
k =
1
2
3
4
5
And maybe the outputs of linspace and logspace are commonly looped over. Maybe? :)
But, why loop over a row vector anyway? Well, as I say in my comments, it's not that a row vector is used for loops, it's that it loops through the columns of the loop expression. Meaning, with for v=M where M is a 2-by-3 matrix, there are 3 iterations, where v is a 2 element column vector in each iteration. This is actually a good design if you consider that this involves slicing the loop expression into columns (i.e. chunks of contiguous memory!).
I'm quite new to MatLab and this problem really drives me insane:
I have a huge array of 2 column and about 31,000 rows. One of the two columns depicts a spatial coordinate on a grid the other one a dependent parameter. What I want to do is the following:
I. I need to split the array into smaller parts defined by the spatial column; let's say the spatial coordinate are ranging from 0 to 500 - I now want arrays that give me the two column values for spatial coordinate 0-10, then 10-20 and so on. This would result in 50 arrays of unequal size that cover a spatial range from 0 to 500.
II. Secondly, I would need to calculate the average values of the resulting columns of every single array so that I obtain per array one 2-dimensional point.
III. Thirdly, I could plot these points and I would be super happy.
Sadly, I'm super confused since I miserably fail at step I. - Maybe there is even an easier way than to split the giant array in so many small arrays - who knows..
I would be really really happy for any suggestion.
Thank you,
Arne
First of all, since you wish a data structure of array of different size you will need to place them in a cell array so you could try something like this:
res = arrayfun(#(x)arr(arr(:,1)==x,:), unique(arr(:,1)), 'UniformOutput', 0);
The previous code return a cell array with the array splitted according its first column with #(x)arr(arr(:,1)==x,:) you are doing a function on x and arrayfun(function, ..., 'UniformOutput', 0) applies function to each element in the following arguments (taken a single value of each argument to evaluate the function) but you must notice that arr must be numeric so if not you should map your values to numeric values or use another way to select this values.
In the same way you could do
uo = 'UniformOutput';
res = arrayfun(#(x){arr(arr(:,1)==x,:), mean(arr(arr(:,1)==x,2))), unique(arr(:,1)), uo, 0);
You will probably want to flat the returning value, check the function cat, you could do:
res = cat(1,res{:})
Plot your data depends on their format, so I can't help if i don't know how the data are, but you could try to plot inside a loop over your 'res' variable or something similar.
Step I indeed comes with some difficulties. Once these are solved, I guess steps II and III can easily be solved. Let me make some suggestions for step I:
You first define the maximum value (maxValue = 500;) and the step size (stepSize = 10;). Now it is possible to iterate through all steps and create your new vectors.
for k=1:maxValue/stepSize
...
end
As every resulting array will have different dimensions, I suggest you save the vectors in a cell array:
Y = cell(maxValue/stepSize,1);
Use the find function to find the rows of the entries for each matrix. At each step k, the range of values of interest will be (k-1)*stepSize to k*stepSize.
row = find( (k-1)*stepSize <= X(:,1) & X(:,1) < k*stepSize );
You can now create the matrix for a stepk by
Y{k,1} = X(row,:);
Putting everything together you should be able to create the cell array Y containing your matrices and continue with the other tasks. You could also save the average of each value range in a second column of the cell array Y:
Y{k,2} = mean( Y{k,1}(:,2) );
I hope this helps you with your task. Note that these are only suggestions and there may be different (maybe more appropriate) ways to handle this.
I am not 100% what the role of the 1: is here. At which index to start the copy? But then why not two such parameters for the rank 2 array?
To be a little more explicit:
In Fortran90 and up you access a single array value by giving a single index, and access a subarray (a window of the array) by giving a range of indices separated by a colon.
a(1) = 0.
a(2:5) = (/3.14,2.71,1.62,0./)
You can also give a step size.
a(1:5:2) = (/0,2.71,0./)
And finally, you can leave out values and a default will be inserted. So if a runs from index 1 to 5 then I could write the above as
a(::2) = (/0,2.71,0./)
and the 1 and 5 are implied. Obviously, you shouldn't leave these out if it makes the code unclear.
With a multidimensional array, you can mix and match these on each dimension, as in your example.
You're taking a slice of array2, namely the elements in the D'th column from row 1 to C and putting them in the slice of array1 which is elements 1 through A
So both slices are 1-dimensional arrays
Slice may not be the correct term in Fortran