Possible Duplicates:
Implementing a matrix, which is more efficient - using an Array of Arrays (2D) or a 1D array?
Performance of 2-dimensional array vs 1-dimensional array
I was looking at one of my buddy's molecular dynamics code bases the other day, and he had represented some 2D data as a 1D array. Rather than having to use two indices, he only has to keep track of one, and a little math figures out what position an element would be in if the array were 2D. So in the case of this 2D array:
two_D = [[0, 1, 2],
[3, 4, 5]]
It would be represented as:
one_D = [0, 1, 2, 3, 4, 5]
If he needed to know what was in position (1,1) of the 2D array he would do some simple algebra and get 4.
Is there any performance boost gained by using a 1D array rather than a 2D array? The data in the arrays can be accessed millions of times during the computation.
I hope the explanation of the data structure is clear...if not let me know and I'll try to explain it better.
Thank you :)
EDIT
The language is C
For a 2-D array of width W and height H, you can represent it as a 1-D array of length W*H, where each index
(x, y)
of the 2-D array (x being the column and y the row) is mapped to the index
i = y*W + x
in the 1-D array. Similarly, you can use the inverse mapping:
y = i / W
x = i % W
If you make W a power of 2 (W = 2^m), you can use the hack
y = i >> m;
x = i & (W - 1);
This only works when W is a power of 2, and a compiler will most likely miss this micro-optimization, so you'd have to implement it yourself. Modulus is a slow operator in C/C++, so making it disappear is advantageous.
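As a concrete illustration, here is a minimal, self-contained C sketch of both mappings (the sizes W, H, and M are just example values, not anything from the question):
#include <stdio.h>

#define W 4   /* width (number of columns); a power of two here */
#define H 3   /* height (number of rows) */
#define M 2   /* W == 1 << M */

int main(void)
{
    int a[W * H];                       /* 1-D storage for an H x W grid */

    /* forward mapping: element at column x, row y lives at index y*W + x */
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            a[y * W + x] = y * 10 + x;

    /* inverse mapping: recover (x, y) from a flat index i */
    for (int i = 0; i < W * H; i++) {
        int y = i / W;                  /* or i >> M, since W is a power of two */
        int x = i % W;                  /* or i & (W - 1) */
        printf("a[%d] = %d -> (x=%d, y=%d)\n", i, a[i], x, y);
    }
    return 0;
}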
Also, with large 2-D arrays, keep in mind that the computer stores them in memory as a 1-D array anyway and figures out the indices using essentially the mappings listed above.
Far more important than how you compute these mappings is how the array is accessed. There are two orders, row major and column major, and the order in which you traverse the array matters more than anything else because it determines whether you are using the cache to your advantage. Please read http://en.wikipedia.org/wiki/Row-major_order .
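To make the caching point concrete, here is a small, self-contained C sketch of the two traversal orders over the same flat array; on most machines the first loop nest is noticeably faster because it walks memory sequentially (the sizes are illustrative):
enum { NROWS = 1024, NCOLS = 1024 };
static double grid[NROWS * NCOLS];      /* stored row by row, as C does */

/* cache-friendly: the inner loop moves along a row (consecutive addresses) */
double sum_row_major(void)
{
    double s = 0.0;
    for (int y = 0; y < NROWS; y++)
        for (int x = 0; x < NCOLS; x++)
            s += grid[y * NCOLS + x];
    return s;
}

/* cache-unfriendly: the inner loop jumps NCOLS elements on every step */
double sum_column_major(void)
{
    double s = 0.0;
    for (int x = 0; x < NCOLS; x++)
        for (int y = 0; y < NROWS; y++)
            s += grid[y * NCOLS + x];
    return s;
}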
Take a look at Performance of 2-dimensional array vs 1-dimensional array
Often 2D arrays are implemented as 1D arrays. Sometimes 2D arrays are implemented by a 1D array of pointers to 1D arrays. The first case obviously has no performance penalty compared to a 1D array, because it is identical to a 1D array. The second case might have a slight performance penalty due to the extra indirection (and additional subtle effects like decreased cache locality).
Which kind is used differs from system to system, so without information about what you're using there's really no way to be sure. I'd advise just testing the performance if it's really important to you. And if the performance isn't that important, then don't worry about it.
For C, 2D arrays are 1D arrays with syntactic sugar, so the performance is identical.
You didn't mention which language this is regarding or how the 2D array would be implemented. In C, 2D arrays are actually implemented as 1D arrays, where C automatically performs the arithmetic on the indices to access the right element. So it would do what your friend does anyway behind the scenes.
In other languages a 2d array might be an array of pointers to the inner arrays, in which case accessing an element would be array lookup + pointer dereference + array lookup, which is probably slower than the index arithmetic, though it would not be worth optimizing unless you know that this is a bottleneck.
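A brief C sketch of the two layouts these answers contrast (the sizes and the helper function are illustrative, not from the question):
#include <stdlib.h>

/* A true C 2-D array: one contiguous block; the compiler does the index
   arithmetic, so mat[y][x] ends up at &mat[0][0] + y*4 + x. */
int mat[3][4];

/* An "array of arrays" built from pointers: each row is a separate
   allocation, so p[y][x] costs an extra pointer dereference.
   (Error handling and freeing are omitted for brevity.) */
int **make_jagged(int rows, int cols)
{
    int **p = malloc(rows * sizeof *p);
    for (int y = 0; y < rows; y++)
        p[y] = malloc(cols * sizeof **p);
    return p;
}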
oneD_index = 3 * y + x;
Where x is the position within the row and y is the row index. Instead of 3, use the width of your rows (the number of columns). This way you can convert your 2D coordinates to a 1D index.
Related
I have a 2D array, and have computed necessary updates along a given dimension of it using a 1D array (said updates can't be computed in place as earlier calculations would override values needed in later calculations). I thus want to copy the updates into my 2D array. The most obvious way to do this would, at first glance, appear to be to use Array slicing and Array.blit.
I have tried the approach of extracting the relevant dimension using array slicing, and then blitting across to that, but that doesn't update the values inside the 2D array. I think what is happening is that a new, separate, 1D array is being created when I make the slice, and the values are being blitted into that new array, which of course is dropped a moment later when it goes back out of scope.
I suppose you could say that I was expecting the slicing to return a view into the 2D array which would work for the blit function call, but instead the slicing actually returns a new array with the values copied into it (which, thinking about it, is what slicing does otherwise, I believe).
Currently I am using a workaround whereby I create a 2D array, where one of the dimensions is only 1 element wide (thus effectively re-creating a 1D array), and then using Array2D.blit. I would prefer to do it directly though, both because I find this ugly, and moreover because it would be quite useful elsewhere in my program where I can't just declare a 1D array as 2D.
My first approach:
let srcArray = Array.zeroCreate srcArrayLength
... // do relevant computation
srcArray.[index] <- result
... // finish computation
Array.blit srcArray 0 destArray.[index, *] 0 srcArrayLength
My current approach:
let srcArray = Array2D.zeroCreate 1 srcArrayLength
... // do relevant computation
srcArray.[0,index] <- result
... // finish computation
Array2D.blit srcArray 0 0 destArray index 0 1 srcArrayLength
The former approach has no effect on my destination 2D array. The latter approach works where I use it, but as I said above it isn't nice, and cannot be used in another situation, where I have a jagged 2D array (i.e. 'a[][]) that I would like to blit across from.
How might I go about achieving my aim? I thought of Span/Memory, but it wasn't clear to me if and how they could be used here. Alternatively, if you can spot a better way to do this that doesn't involve blit, I'm all-virtual-ears.
I figured out a fairly good solution to this, with the help of someone over in the F# Foundation Slack. Since nobody else has posted an answer, I'll put this one up.
Both Array.Copy (note that that is the .NET Array.Copy method, not the F#-specific Array.copy) and Buffer.BlockCopy were suggested to me. Array.Copy still complains about mismatching array types, but Buffer.BlockCopy ignores the dimensionality of the supplied array, and merely copies the specified number of bytes from one location to another. Using this and relying on the fact that 2D arrays are really stored as 1D arrays in row-major order (the same as C, I believe), it is quite possible to overwrite the last dimension of a multi-dimensional array reasonably cleanly.
I updated the code from the 'current approach' in my question to the below:
let srcArray = Array.zeroCreate srcArrayLength
... //do relevant computation
srcArray.[index] <- result
... //finish computation
Buffer.BlockCopy(srcArray, 0, destArray, firstDimIndex * lengthOfSecondDim * sizeof<'a>, lengthOfSecondDim * sizeof<'a>)
Not only does it do the job in a way which I personally find a bit tidier, but it has a side-benefit in that it is noticeably faster than the second approach described in the question - I haven't yet run a benchmark to quantify the difference though.
I am running the following code:
for i in range(1000):
    My_Array = numpy.concatenate((My_Array, New_Rows[i]), axis=0)
The above code is slow. Is there any faster approach?
This is basically what happens in all algorithms based on arrays.
Each time you change the size of the array, it needs to be resized and every element needs to be copied. That is happening here too (some implementations reserve some empty slots, e.g. doubling the internal storage with each growth).
If you have your data at np.array creation time, just add it all at once (memory will be allocated only once then!).
If not, collect it with something like a linked list (allowing O(1) append operations). Then read it into your np.array at once (again, only one memory allocation).
This is not much of a numpy-specific topic, but much more about data structures.
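As a language-agnostic illustration of that data-structure point, here is a minimal sketch in C of a growable array that doubles its capacity; the names and the doubling factor are just assumptions for the example, not anything numpy does literally:
#include <stdlib.h>

/* Illustrative growable array: appending is amortized O(1) because the
   buffer is doubled (and the elements copied) only occasionally.
   A zero-initialized vec is a valid empty vector; error handling omitted. */
typedef struct {
    double *data;
    size_t  len;
    size_t  cap;
} vec;

void vec_append(vec *v, double x)
{
    if (v->len == v->cap) {                     /* out of room: grow */
        size_t new_cap = v->cap ? 2 * v->cap : 8;
        v->data = realloc(v->data, new_cap * sizeof *v->data);
        v->cap = new_cap;
    }
    v->data[v->len++] = x;                      /* usual case: no copying */
}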
Edit: as this quite vague answer got some upvotes, I feel the need to make clear that my linked-list approach is one possible example. As indicated in the comment, Python's lists are more array-like (and definitely not linked lists). But the core fact is: list.append() in Python is fast (amortized O(1)), while that's not true for numpy arrays! There is also a small part about the internals in the docs:
How are lists implemented?
Python’s lists are really variable-length arrays, not Lisp-style linked lists. The implementation uses a contiguous array of references to other objects, and keeps a pointer to this array and the array’s length in a list head structure.
This makes indexing a list a[i] an operation whose cost is independent of the size of the list or the value of the index.
When items are appended or inserted, the array of references is resized. Some cleverness is applied to improve the performance of appending items repeatedly; when the array must be grown, some extra space is allocated so the next few times don’t require an actual resize.
(bold annotations by me)
Maybe creating an empty array with the correct size and then populating it?
If you have a list of arrays with the same dimensions you could
import numpy as np
arr = np.zeros((len(l),) + l[0].shape)
for i, v in enumerate(l):
    arr[i] = v
This works much faster for me; it only requires one memory allocation.
It depends on what New_Rows[i] is, and what kind of array do you want. If you start with lists (or 1d arrays) that you want to join end to end (to make a long 1d array) just concatenate them all at once. Concatenate takes a list of any length, not just 2 items.
np.concatenate(New_Rows, axis=0)
or maybe use an intermediate list comprehension (for more flexibility)
np.concatenate([row for row in New_Rows])
or, closer to your example:
np.concatenate([New_Rows[i] for i in range(1000)])
But if New_Rows elements are all the same length, and you want a 2d array, one New_Rows value per row, np.array does a nice job:
np.array(New_Rows)
np.array([i for i in New_Rows])
np.array([New_Rows[i] for i in range(1000)])
np.array is designed primarily to build an array from a list of lists.
np.concatenate can also build in 2d, but the inputs need to be 2d to start with. vstack and stack can take care of that. But all those stack functions use some sort of list comprehension followed by concatenate.
In general it is better/faster to iterate or append with lists, and apply np.array (or concatenate) just once. Appending to a list is fast; much faster than making a new array.
I think #thebeancounter's solution is the way to go.
If you do not know the exact size of your numpy array ahead of time, you can also take an approach similar to how the vector class is implemented in C++.
To be more specific, you can wrap the numpy ndarray in a new class with a default size that is larger than your current needs. When the numpy array is almost fully populated, copy the current array to a larger one.
Assume you have a large list of 2D numpy arrays, with the same number of columns and different numbers of rows, like this:
x = [numpy_array1(r_1, c),......,numpy_arrayN(r_n, c)]
concatenate like this:
while len(x) != 1:
    if len(x) == 2:
        x = np.concatenate((x[0], x[1]))
        break
    for i in range(0, len(x), 2):
        if (i+1) == len(x):
            x[0] = np.concatenate((x[0], x[i]))
        else:
            x[i] = np.concatenate((x[i], x[i+1]))
    x = x[::2]
OK, I have some array A[4][4], and another A[16], which are both just different representations of each other. Now, I am given an element's position in the 2D array, but I have to access it from the 1D array. I.e., if I'm told to access the element A[2][0] in my 1D array, how would I do that?
In this simple example, A[2][0] maps to A[8] since you are requesting the first element in the third group of four. Similarly, A[0][2] maps to A[2] since you are requesting the third element of the first group of four. In general (A[i][j]) you are requesting the (i*4+j)-th element of A.
In the even more general case, you are requesting the (i*width+j)-th element.
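A quick, self-contained C sketch of that mapping (assuming row-major order, which is what C uses; the fill values are arbitrary):
#include <assert.h>

int main(void)
{
    int two_d[4][4], one_d[16];

    /* the element at row i, column j sits at flat index i*4 + j */
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            two_d[i][j] = i * 4 + j;        /* arbitrary fill values */
            one_d[i * 4 + j] = two_d[i][j];
        }

    assert(one_d[8] == two_d[2][0]);        /* A[2][0] -> A[8] */
    assert(one_d[2] == two_d[0][2]);        /* A[0][2] -> A[2] */
    return 0;
}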
This depends on your programming language and your choice of array type. Depending on the language, arrays are stored either in row-major order or column-major order.
Edit: Java does not have multidimensional arrays; as per the documentation, in Java a multi-dimensional array is structured as an array of arrays, i.e. an array whose elements are references to array objects. This means that each row can be of a different length, and therefore the storage format is neither row-major nor column-major.
Row-major order is used in C/C++, PL/I, Python, Speakeasy and others. Column-major order is used in Fortran, MATLAB, GNU Octave, R, Julia, Rasdaman, and Scilab.
In some languages, you can also choose the order (e.g. MATLAB)
For row-major order, A[2][0] would be at A[2*4+0] (where 4 is the size of one row):
offset = row*NUMCOLS + column
For column-major order, A[2][0] would be at A[0*4+2] (where 4 is the size of one column):
offset = row + column*NUMROWS
It really depends on your programming language!
Does anyone know how to do array matrix multiplication in MATLAB? I.e. I have two 3-dimensional arrays consisting of sets of matrices in the first 2 dimensions, and I would like to multiply each matrix in the first array with the corresponding one in the second array. For example, if
A=randn(3,3);
B=cat(3,A,A);
I would like [[operation]] such that
B[[operation]]B = cat(3,A*A, A*A)
done in efficient vector form.
Many thanks in advance.
I have used MULTIPROD from the Mathworks FileExchange for N-D array multiplication before. It is basically an extension of bsxfun to N-D arrays, and works quite nicely (and fast) - although the interface is a bit cumbersome.
I have a 3D array in MATLAB, with size(myArray) = [100 100 50]. Now, I'd like to get a specific layer, specified by an index in the first dimension, in the form of a 2D matrix.
I tried myMatrix = myArray(myIndex,:,:);, but that gives me a 3D array with size(myMatrix) = [1 100 50].
How do I tell MATLAB that I'm not interested in the first dimension (since there's only one layer), so it can simplify the matrix?
Note: I will need to do this with the second index also, rendering size(myMatrix) = [100 1 50] instead of the desired [100 50]. A solution should be applicable to both cases, and preferably to the third dimension as well.
Use the squeeze function, which removes singleton dimensions.
Example:
A=randn(4,50,100);
B=squeeze(A(1,:,:));
size(B)
ans =
50 100
This is generalized and you needn't worry about which dimension you're indexing along. All singleton dimensions are squeezed out.
reshape(myArray(myIndex,:,:),[100,50])
squeeze, reshape and permute are probably the three most important functions when dealing with N-D matrices. Just to have an example how to use the third function:
A=randn(4,50,100);
B=permute(A(1,:,:),[2,3,1])