Choose conditionally column-wise or row-wise iteration - arrays

I would like to iterate over an array, but I want to pick whether the iteration is column-wise or row-wise. In other words, I want to decide at runtime, based on a condition, whether rows or columns go in the outer loop. The dummy implementation would of course be:
if cond:
    for row in rows:
        for col in cols:
            ar[row][col]
else:
    for col in cols:
        for row in rows:
            ar[row][col]
Now, is there a compressed way to express the above implementation?
Unfortunately, going over all cases (my array is 4-dimensional, so I have 16 cases) is not the best way to go.
So, is there any algorithmic approach that compresses these loops into one loop?

Statically allocated arrays of a single element type can be treated as 1-dimensional if you're careful: you can calculate the N-th offset into the flat storage (perhaps with a helper function) and iterate over those flat indices rather than over the I,J,K,L-th members.
Alternatively, if your language doesn't allocate arrays statically, or you can't use one for some reason (fully dynamic type system?), you can put the starting pointers into a new array and iterate over that instead (which is cheap, as it will only have 4 members in your case), using each as a starting point.
You may find you need to build a logical view into the first array from the second, with a different ordering.
In both cases you will likely want to wrap with the modulo % operator, which lets you keep adding your sub-indices together while computing offsets that wrap around the length of the array.
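The question is language-agnostic; as one concrete sketch, here is a Python/NumPy version of the single-loop idea, where a runtime axis_order tuple (a name invented for this example) decides which axis acts as the outer loop:
import itertools
import numpy as np

def iterate(ar, axis_order):
    # Visit every element of ar; axis_order[0] plays the role of the outer loop.
    ranges = [range(ar.shape[axis]) for axis in axis_order]
    for permuted in itertools.product(*ranges):
        # Undo the permutation so the tuple indexes the array correctly.
        idx = [0] * ar.ndim
        for axis, value in zip(axis_order, permuted):
            idx[axis] = value
        yield tuple(idx), ar[tuple(idx)]

ar = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
for idx, value in iterate(ar, axis_order=(3, 2, 1, 0)):   # last axis outermost
    pass   # process ar[idx] here
The same trick works without NumPy in languages with manual offset arithmetic: compute the flat offset from the permuted indices and the array strides.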

Related

Dynamic array guarantee clarification

In Skiena's Algorithm Design Manual, he mentions at one point:
The primary thing lost using dynamic arrays is the guarantee that each array access takes constant time in the worst case. Now all the queries will be fast, except for those relatively few queries triggering array doubling. What we get instead is a promise that the nth array access will be completed quickly enough that the total effort expended so far will still be O(n).
I'm struggling to understand this. How will an array query expand the array?
Dynamic arrays are arrays whose size does not need to be specified up front (think of an ArrayList in Java). Under the hood, a dynamic array is implemented using a regular array, and because it is a regular array, the ArrayList implementation does have to pick a size for that underlying array.
So the typical way to handle this is to initialize the underlying array with a certain number of elements; when it reaches its capacity, the array is doubled in size.
Because of this, adding to a dynamic array takes constant time most of the time, but occasionally it doubles the size of the 'under the hood' standard array, which takes longer than a normal add.
If your confusion lies with his use of the word 'query', I believe he means 'adding to or removing from the array', because a simple 'get' query shouldn't be affected by the underlying array's size.
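As an illustration only (the class and its names are hypothetical, not from Skiena or any particular library), here is a minimal Python sketch of the doubling strategy described above: most appends just write into a free slot, and the occasional append copies everything into a backing array twice the size.
class DynamicArray:
    def __init__(self):
        self._capacity = 4
        self._size = 0
        self._data = [None] * self._capacity

    def append(self, value):
        if self._size == self._capacity:            # backing array is full
            self._capacity *= 2                     # double it ...
            new_data = [None] * self._capacity
            new_data[:self._size] = self._data      # ... and copy every element
            self._data = new_data
        self._data[self._size] = value
        self._size += 1

    def __getitem__(self, i):                       # a plain 'get' never resizes
        if not 0 <= i < self._size:
            raise IndexError(i)
        return self._data[i]
After n appends the copies sum to roughly 2n element moves at most, which is why the total effort stays O(n) even though an individual append can occasionally be slow.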

Most efficient way to add a constant to a column of an array in LabVIEW?

I want to add a constant to the second column of an array.
I do this as shown below, where for illustration the values are as follows (the LabVIEW screenshots are not reproduced here):
What is the most efficient way of adding a constant to an array column?
With a question about efficiency you should supply numbers. For anything smaller than a 1000 x 1000 2D array I can't measure the difference. Usually it is best to simply test it.
Here is the code for testing (same answer as crossrulz):
With a 10000 x 10000 array option 2 becomes about 10 times faster.
One comment: unless you are in a very demanding situation, readability is usually preferred over efficiency. In my opinion option 2 is more readable, since it has no for loop and the constant is presented as a constant instead of an array.
Index Array to get the second column, add your constant, and then Replace Array Subset to replace the second column.
But you can get more efficient than that by using the In Place Element structure. The image below shows two different ways to add 5 to a column. The second one avoids making a memory copy of the entire array. Indexing out a column with Index Array and then modifying it requires a shift of the underlying memory layout, even though the column is going to be put back with Replace Array Subset. The In Place Element structure gives LabVIEW enough context to recognize that the Add can be done without data copies.
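LabVIEW block diagrams can't be reproduced in text here, but as a rough NumPy analogue of the two routes above (purely illustrative, not LabVIEW):
import numpy as np

a = np.arange(12.0).reshape(4, 3)

# Copy-based route (like Index Array -> Add -> Replace Array Subset):
col = a[:, 1].copy()        # explicit copy of the second column
col += 5
a[:, 1] = col

# In-place route (like the In Place Element structure):
a[:, 1] += 5                # adds directly into the existing storage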

Numpy concatenate is slow: any alternative approach?

I am running the following code:
for i in range(1000):
    My_Array = numpy.concatenate((My_Array, New_Rows[i]), axis=0)
The above code is slow. Is there any faster approach?
This is basically what happens in any algorithm based on contiguous arrays.
Each time you change the size of the array, the whole thing needs to be reallocated and every element copied. That is happening here too. (Some implementations reserve extra empty slots, e.g. doubling the internal memory each time the array grows.)
If you have all your data at np.array creation time, just add it all at once (memory will then be allocated only once!).
If not, collect the pieces with something like a linked list (allowing O(1) appends), then read them into your np.array all at once (again only one memory allocation).
This is not really a numpy-specific topic, but much more about data structures.
Edit: as this quite vague answer got some upvotes, I feel the need to make clear that my linked-list approach is just one possible example. As indicated in the comments, Python's lists are more array-like (and definitely not linked lists). But the core fact is: list.append() in Python is fast (amortized O(1)), while that's not true for numpy arrays! There is also a small section about the internals in the docs:
How are lists implemented?
Python’s lists are really variable-length arrays, not Lisp-style linked lists. The implementation uses a contiguous array of references to other objects, and keeps a pointer to this array and the array’s length in a list head structure.
This makes indexing a list a[i] an operation whose cost is independent of the size of the list or the value of the index.
When items are appended or inserted, the array of references is resized. Some cleverness is applied to improve the performance of appending items repeatedly; when the array must be grown, some extra space is allocated so the next few times don’t require an actual resize.
(bold annotations by me)
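A minimal sketch of the "collect first, convert once" pattern this answer recommends (the shapes and the np.random.rand stand-in are made up for illustration):
import numpy as np

rows = []
for i in range(1000):
    new_row = np.random.rand(1, 3)            # stand-in for New_Rows[i]
    rows.append(new_row)                      # plain list append: amortized O(1)
My_Array = np.concatenate(rows, axis=0)       # a single allocation at the very end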
Maybe create an empty array with the correct size and then populate it?
If you have a list l of arrays with the same dimensions you could:
import numpy as np
arr = np.zeros((len(l),) + l[0].shape)
for i, v in enumerate(l):
    arr[i] = v
This works much faster for me; it only requires one memory allocation.
It depends on what New_Rows[i] is, and what kind of array you want. If you start with lists (or 1d arrays) that you want to join end to end (to make a long 1d array), just concatenate them all at once. Concatenate takes a list of any length, not just 2 items.
np.concatenate(New_Rows, axis=0)
or maybe use an intermediate list comprehension (for more flexibility)
np.concatenate([row for row in New_Rows])
or closer to your example.
np.concatenate([New_Rows[i] for i in range(1000)])
But if New_Rows elements are all the same length, and you want a 2d array, one New_Rows value per row, np.array does a nice job:
np.array(New_Rows)
np.array([i for i in New_Rows])
np.array([New_Rows[i] for i in range(1000)])
np.array is designed primarily to build an array from a list of lists.
np.concatenate can also build in 2d, but the inputs need to be 2d to start with. vstack and stack can take care of that. But all those stack functions use some sort of list comprehension followed by concatenate.
In general it is better/faster to iterate or append with lists, and apply np.array (or concatenate) just once. Appending to a list is fast; much faster than making a new array.
I think #thebeancounter's solution is the way to go.
If you do not know the exact size of your numpy array ahead of time, you can also take an approach similar to how the vector class is implemented in C++.
To be more specific, you can wrap the numpy ndarray in a new class with a default size larger than your current needs. When the numpy array is almost fully populated, copy the current array into a larger one.
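A rough sketch of that wrapper idea (the class and method names are hypothetical, not an existing library API):
import numpy as np

class GrowableArray:
    # Wraps an ndarray that is larger than currently needed and grows it
    # geometrically, so most appends are just a row write.
    def __init__(self, ncols, capacity=16):
        self._data = np.empty((capacity, ncols))
        self._size = 0

    def append_row(self, row):
        if self._size == self._data.shape[0]:        # backing array is full
            bigger = np.empty((2 * self._data.shape[0], self._data.shape[1]))
            bigger[:self._size] = self._data          # one copy, then reuse
            self._data = bigger
        self._data[self._size] = row
        self._size += 1

    @property
    def array(self):
        return self._data[:self._size]                # view of the filled part only
The occasional doubling copy keeps the total copying cost proportional to the final size, much like Python's own list growth strategy.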
Assume you have a large list of 2D numpy arrays, with the same number of columns and different numbers of rows, like this:
x = [numpy_array1(r_1, c),......,numpy_arrayN(r_n, c)]
concatenate like this:
while len(x) != 1:
    if len(x) == 2:
        x = np.concatenate((x[0], x[1]))
        break
    for i in range(0, len(x), 2):
        if (i + 1) == len(x):
            x[0] = np.concatenate((x[0], x[i]))
        else:
            x[i] = np.concatenate((x[i], x[i + 1]))
    x = x[::2]

Delete a specific element from a Fortran array

Is there a function in Fortran that deletes a specific element in an array, such that the array upon deletion shrinks its length by the number of elements deleted?
Background:
I'm currently working on a project which contains sets of populations with corresponding descriptions of the individuals (i.e., age, death age, and so on).
A method I use is to loop through the array, find which elements I need, place them in another array, and deallocate the previous array; before the next time step, this array is moved back into the original one before going through the subroutines again to find the elements that are no longer needed.
You can use the PACK intrinsic function and intrinsic assignment to create an array value that is comprised of selected elements from another array. Assuming array is allocatable, and the elements to be removed are nominated by a logical mask logical_mask that is the same size as the original value of array:
array = PACK(array, .NOT. logical_mask)
Succinct syntax for a single element nominated by its index is:
array = [array(:index-1), array(index+1:)]
Depending on your Fortran processor, the above statements may result in the compiler creating temporaries that may impact performance. If this is problematic then you will need to use the subroutine approach that you describe.
Maybe you want to look into linked lists. You can insert and remove items and the list automatically resizes. This resource is pretty good.
http://www.iag.uni-stuttgart.de/IAG/institut/abteilungen/numerik/images/4/4c/Pointer_Introduction.pdf
To continue the discussion, the solution you might want to implement depends on the number of delete and access operations you do, where you insert/delete the elements (the first, the last, randomly in the set?), how you access the data (from first to last, randomly in the set?), and your efficiency requirements in terms of CPU and memory.
Then you might want to go for a linked list, or for static or dynamic vectors (other types of data structures might also fit your needs better).
For example:
a static vector can be used when you want to access a lot of elements randomly and know the maximum number nmax of elements in the vector. Simply use an array of nmax elements with an associated length variable that tracks the last element. A deletion can simply and quickly be done by exchanging the last element with the deleted one and reducing the length (see the sketch after this list).
a dynamic vector can be implemented when you don't know the maximum number of elements. In order to avoid a systematic array allocation+copy+deallocation for each deletion/insertion, you fix the maximum number of elements (as above) and only augment its size (e.g. nmax becomes 10*nmax, then reallocate and copy) when reaching the limit (the reverse scheme can also be implemented to reduce the number of elements).
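The question is about Fortran, but as a language-neutral illustration here is a short Python sketch of the static-vector bookkeeping from the first point (all names are illustrative):
class StaticVector:
    def __init__(self, nmax):
        self.data = [None] * nmax    # fixed-size backing storage
        self.length = 0              # number of live elements

    def push(self, value):
        self.data[self.length] = value
        self.length += 1

    def delete(self, index):
        # O(1) deletion: overwrite the deleted slot with the last live
        # element, then shrink the logical length (order is not preserved).
        self.length -= 1
        self.data[index] = self.data[self.length]
        self.data[self.length] = None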

Is there a way to map a list of integers to a unique number or a unique hash?

The permutation of the list of integers should also be preserved in the hash -- i.e., lists containing the same numbers in a different order should have different hashes.
One way to do this would be to concatenate the list of integers into a string, but this could be an expensive comparison test if the list is massive.
Context: If I already have 5 large arrays 'analyzed' and hashed away, I would be able to quickly check whether an incoming array is new or not.
https://en.wikipedia.org/wiki/Pigeonhole_principle
"In mathematics, the pigeonhole principle states that if n items are put into m containers, with n > m, then at least one container must contain more than one item"
It is certainly possible to create a unique number; it's just that it's hilariously huge.
Consider
[1,2,3]
A simple list, but to make sure we have enough holes for our pigeons, we would need space for the largest integer in each slot. So assuming 4 bytes per item, we would need a 12-byte (96-bit) integer to store the hash uniquely, or about 7.9e28 different values. And that's only 3 integers.
No, an efficient hash is rarely unique, but a good hash is unlikely to have collisions for similar values.
To answer your question about checking for existence, consider the following:
If you have an array of n items, in order to hash it you need to take n steps. In order to check for existence, you need, at worst, n steps to check each item in turn.
In either case, you are going to be spending about the same amount of time comparing arrays.
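For the existence check described in the question, one common (non-unique, order-sensitive) approach is to hash the packed integers and keep the digests of the arrays already analyzed; the helper below is only a sketch with made-up names:
import hashlib
import struct

def digest(ints):
    # Pack the integers in order, so [1, 2, 3] and [3, 2, 1] hash differently.
    packed = struct.pack(f"{len(ints)}q", *ints)
    return hashlib.sha256(packed).hexdigest()

seen = {digest([1, 2, 3]), digest([4, 5, 6])}     # previously analyzed arrays
print(digest([3, 2, 1]) in seen)                  # False: order matters
Computing the digest still takes O(n) for an array of n items, as the answer notes, but it only has to be done once per stored array.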
An array structure seems to be a perfect choice where the index differentiates between elements, or you can use a list of elements where each element has an index value assigned just before insertion.
Never use a String as a list structure, because it has its own properties, like immutability (in the case of Java).
