Turning thousands of elements from a light curve disk into an array

I am a bit new to this, and have been assigned something quite simple. However, I can't seem to get the job done.
I'm working on light curves for a research project, and the data has up to 10,000 elements (sometimes more). My professor asked me to simply turn all these elements into an n-element array, so that when we print the data all 10,000 elements don't load at once and we can just call the element we want from the array.
How do I go about approaching this? Feel free to ask for more clarification!
I read the data file with my prof's routine, called ldsk; it tells me how big the file is.

My professor asked me to simply turn all these elements into an array with n-elements...
Let me use the variable ej for the j-th element of an array. Assuming each of your ~10,000 values corresponds to a different element ej, you can create the array as follows:
array = [e1, e2, e3, e4, ..., eN]
where N = ~10,000 and the ... was used to avoid having to type out 10,000 terms. Does that make more sense?
Make sure you use []'s, not ()'s, for arrays in IDL. Newer versions under certain compile options can mistake ()'s for a function call.
we can just call the element we want from the array.
Once created, you can index the array and print to the screen using the following (assume you want the j-th element):
PRINT, array[j]
where j can be any number between 0 and N - 1 (since IDL starts indexing from zero).

Storing and replacing values in array continuously

I'm trying to read the amplitude of a waveform and light a green, yellow, or red indicator depending on the amplitude of the signal. I'm fairly new to LabVIEW and couldn't get an idea to work that would have worked in any other programming language I know. What I'm trying to do is take the value of the signal and, every time it updates, store the amplitude at an index of a large array, with each measurement being stored at index n+1 of the array.
After a certain number of data points I want to start over and replace values in the array (I use a Formula Node with the modulus for this). By keeping a finite number of indices to check for the max value, I restrict my amplitude check to a certain time period.
However, my problem is that whenever I use Replace Array Subset to insert a new value at index n, all the other indices get erased, rendering it pretty much useless. I was thinking it's the Initialize Array causing problems, but I just can't seem to wrap my head around what to do here.
I tried creating just basic arrays on the front panel, but those are either control or indicator arrays and can't seem to be both written to and read from; it's either a control (read but not write) or an indicator (write but not read)? Maybe it's just not possible to do what I had in mind in an elegant way in LabVIEW. If it's not possible to do this with arrays in LabVIEW, I will look for a different way to do it.
I'm pretty sure I got most of the rest of the code down except for an unfinished part here and there. It's just my issue with the arrays not working as I want them to.
I expected the array to retain its previously entered data at index n-1 when index n is written, and only to be replaced once the index has come back around to that specific point.
Instead, it's as if a new array is initialized every time a new index is written.
download link for the VI
What you want to do:
Transport the content of the modified array into the next iteration of the WHILE loop.
What happens:
On each iteration, the content of the array is the same. It is the content of the initial array you created outside.
To solve this, right-click the orange square on the left border of the loop, and make it a "shift register". The symbol changes, and a similar symbol appears on the right border. Now, wire the modified array to the symbol on the right. What flows out into that symbol on the right comes in from the left symbol on the next iteration.
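LabVIEW has no textual code to paste here, but the role of the shift register is easier to see in a text-language analogy. A rough Python sketch of the behaviour the loop should have (the buffer size and the sample source are made up for illustration):

import random

BUFFER_SIZE = 50                             # samples kept for the max check; value is made up

# Equivalent of the "Initialize Array" node: the buffer is created once, before the loop.
amplitudes = [0.0] * BUFFER_SIZE

for i in range(200):                         # stand-in for the LabVIEW while loop
    sample = random.random()                 # stand-in for reading the waveform amplitude
    amplitudes[i % BUFFER_SIZE] = sample     # "Replace Array Subset": overwrite one slot
    peak = max(amplitudes)                   # max-amplitude check over the retained window
    # 'amplitudes' survives into the next iteration -- that persistence is what the shift
    # register provides. Without it, every iteration would see the freshly initialized
    # array again, which is exactly the behaviour described in the question.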
Edit:
I have optimized your code a little. There is a modulo function, and a case structure can handle ranges: "..3" means "values less than or equal to 3", the next case is "Default", and the next is "7..". Unfortunately, this only works for integers; otherwise, one would use nested case structures with the < comparator or similar.
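For reference, those range cases map onto ordinary comparisons like this (the cut-offs 3 and 7 are simply the ones used in the case labels above; the actual amplitude limits are whatever your application needs):

def light_for(amplitude):
    if amplitude <= 3:        # the "..3" case
        return "green"
    elif amplitude >= 7:      # the "7.." case
        return "red"
    else:                     # the "Default" case
        return "yellow"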

How to blit from a 1D array along a dimension of a 2D array?

I have a 2D array, and have computed necessary updates along a given dimension of it using a 1D array (said updates can't be computed in place, as earlier calculations would overwrite values needed in later calculations). I thus want to copy the updates into my 2D array. The most obvious way to do this would, at first glance, appear to be to use array slicing and Array.blit.
I have tried the approach of extracting the relevant dimension using array slicing, and then blitting across to that, but that doesn't update the values inside the 2D array. I think what is happening is that a new, separate, 1D array is being created when I make the slice, and the values are being blitted into that new array, which of course is dropped a moment later when it goes back out of scope.
I suppose you could say that I was expecting the slicing to return a view into the 2D array which would work for the blit function call, but instead the slicing actually returns a new array with the values copied into it (which, thinking about it, is what slicing does otherwise, I believe).
Currently I am using a workaround whereby I create a 2D array, where one of the dimensions is only 1 element wide (thus effectively re-creating a 1D array), and then using Array2D.blit. I would prefer to do it directly though, both because I find this ugly, and moreover because it would be quite useful elsewhere in my program where I can't just declare a 1D array as 2D.
My first approach:
let srcArray = Array.zeroCreate srcArrayLength
... // do relevant computation
srcArray.[index] <- result
... // finish computation
Array.blit srcArray 0 destArray.[index, *] 0 srcArrayLength
My current approach:
let srcArray = Array2D.zeroCreate 1 srcArrayLength
... // do relevant computation
srcArray.[0,index] <- result
... // finish computation
Array2D.blit srcArray 0 0 destArray index 0 1 srcArrayLength
The former approach has no effect on my destination 2D array. The latter approach works where I use it, but as I said above it isn't nice, and cannot be used in another situation, where I have a jagged 2D array (i.e. 'a[][]) that I would like to blit across from.
How might I go about achieving my aim? I thought of Span/Memory, but it wasn't clear to me if and how they could be used here. Alternatively, if you can spot a better way to do this that doesn't involve blit, I'm all-virtual-ears.
I figured out a fairly good solution to this, with the help of someone over in the F# Foundation Slack. Since nobody else has posted an answer, I'll put this one up.
Both Array.Copy (note that that is the .NET Array.Copy method, not the F#-specific Array.copy) and Buffer.BlockCopy were suggested to me. Array.Copy still complains about mismatching array types, but Buffer.BlockCopy ignores the dimensionality of the supplied array, and merely copies the specified number of bytes from one location to another. Using this and relying on the fact that 2D arrays are really stored as 1D arrays in row-major order (the same as C, I believe), it is quite possible to overwrite the last dimension of a multi-dimensional array reasonably cleanly.
I updated the code from the 'current approach' in my question to the below:
let srcArray = Array.zeroCreate srcArrayLength
... //do relevant computation
srcArray.[index] <- result
... //finish computation
Buffer.BlockCopy(srcArray, 0, destArray, firstDimIndex * lengthOfSecondDim * sizeof<'a>, lengthOfSecondDim * sizeof<'a>)
Not only does it do the job in a way which I personally find a bit tidier, but it has a side-benefit in that it is noticeably faster than the second approach described in the question - I haven't yet run a benchmark to quantify the difference though.
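If it helps to see why that byte-offset arithmetic lands on the right row, here is the same row-major idea written out with NumPy, purely as an illustration (the array sizes are made up; the real code above stays in F#):

import numpy as np

dest = np.zeros((4, 6))           # 2D array stored contiguously in row-major order
src = np.arange(6, dtype=float)   # the freshly computed row of updates
row = 2                           # plays the role of firstDimIndex above

# Row 'row' starts at flat offset row * (number of columns); writing that many
# elements from there overwrites exactly one row of the 2D array.
flat = dest.reshape(-1)           # a view onto the same memory, not a copy
flat[row * dest.shape[1]:(row + 1) * dest.shape[1]] = src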

Numpy concatenate is slow: any alternative approach?

I am running the following code:
for i in range(1000):
    My_Array = numpy.concatenate((My_Array, New_Rows[i]), axis=0)
The above code is slow. Is there any faster approach?
This is basically what happens in any approach based on plain arrays.
Each time you change the size of the array, a new one needs to be allocated and every element needs to be copied. That is happening here too. (Some implementations reserve empty slots, e.g. doubling the internal memory with each growth.)
If you have all your data at np.array creation time, just add it all at once (memory will then be allocated only once!).
If not, collect the items in something like a linked list (allowing O(1) append operations), then read them into your np.array at once (again, only one memory allocation).
This is not so much a numpy-specific topic as a question of data structures.
Edit: as this quite vague answer got some upvotes, I feel the need to make clear that my linked-list approach is only one possible example. As indicated in the comments, Python's lists are more array-like (and definitely not linked lists). But the core fact is: list.append() in Python is fast (amortized O(1)), while that's not true for numpy arrays! There is also a short section about the internals in the docs:
How are lists implemented?
Python’s lists are really variable-length arrays, not Lisp-style linked lists. The implementation uses a contiguous array of references to other objects, and keeps a pointer to this array and the array’s length in a list head structure.
This makes indexing a list a[i] an operation whose cost is independent of the size of the list or the value of the index.
When items are appended or inserted, the array of references is resized. Some cleverness is applied to improve the performance of appending items repeatedly; when the array must be grown, some extra space is allocated so the next few times don’t require an actual resize.
(bold annotations by me)
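As a minimal sketch of the collect-then-convert pattern (the New_Rows data here is a made-up stand-in for the arrays in the question):

import numpy as np

New_Rows = [np.random.rand(5) for _ in range(1000)]   # stand-in data

rows = []                                  # plain Python list: append is amortized O(1)
for i in range(1000):
    rows.append(New_Rows[i])               # no copying of the rows collected so far

My_Array = np.concatenate(rows, axis=0)    # one allocation and one copy at the end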
Maybe create an empty array with the correct size and then populate it?
If you have a list l of arrays with the same dimensions, you could do:
import numpy as np
arr = np.zeros((len(l),) + l[0].shape)
for i, v in enumerate(l):
    arr[i] = v
This works much faster for me; it only requires one memory allocation.
It depends on what New_Rows[i] is, and what kind of array you want. If you start with lists (or 1d arrays) that you want to join end to end (to make a long 1d array), just concatenate them all at once. Concatenate takes a list of any length, not just 2 items.
np.concatenate(New_Rows, axis=0)
or maybe use an intermediate list comprehension (for more flexibility)
np.concatenate([row for row in New_Rows])
or, closer to your example:
np.concatenate([New_Rows[i] for i in range(1000)])
But if New_Rows elements are all the same length, and you want a 2d array, one New_Rows value per row, np.array does a nice job:
np.array(New_Rows)
np.array([i for i in New_Rows])
np.array([New_Rows[i] for i in range(1000)])
np.array is designed primarily to build an array from a list of lists.
np.concatenate can also build in 2d, but the inputs need to be 2d to start with. vstack and stack can take care of that. But all those stack functions use some sort of list comprehension followed by concatenate.
In general it is better/faster to iterate or append with lists, and apply np.array (or concatenate) just once. Appending to a list is fast, much faster than making a new array each time.
I think #thebeancounter's solution is the way to go.
If you do not know the exact size of your numpy array ahead of time, you can also take an approach similar to how the vector class is implemented in C++.
To be more specific, you can wrap the numpy ndarray in a new class with a default capacity that is larger than your current needs. When the numpy array is almost fully populated, copy the current array into a larger one.
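A minimal sketch of that idea (the starting capacity and the doubling factor are arbitrary choices):

import numpy as np

class GrowableArray:
    """Append-only 1D buffer that doubles its capacity when it runs out of room."""

    def __init__(self, capacity=1024, dtype=float):
        self._data = np.empty(capacity, dtype=dtype)
        self._size = 0

    def append(self, value):
        if self._size == len(self._data):
            # Capacity exhausted: allocate a larger buffer and copy once.
            bigger = np.empty(2 * len(self._data), dtype=self._data.dtype)
            bigger[:self._size] = self._data
            self._data = bigger
        self._data[self._size] = value
        self._size += 1

    def view(self):
        # The populated part of the buffer, returned without copying.
        return self._data[:self._size]

Appending n items this way costs O(n) amortized in total, which is the same trick list.append relies on.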
Assume you have a large list of 2D numpy arrays, with the same number of columns and different numbers of rows, like this:
x = [numpy_array1(r_1, c),......,numpy_arrayN(r_n, c)]
concatenate like this:
while len(x) != 1:
    if len(x) == 2:
        x = np.concatenate((x[0], x[1]))
        break
    for i in range(0, len(x), 2):
        if (i+1) == len(x):
            x[0] = np.concatenate((x[0], x[i]))
        else:
            x[i] = np.concatenate((x[i], x[i+1]))
    x = x[::2]

Delete a specific element from a Fortran array

Is there a function in Fortran that deletes a specific element in an array, such that the array upon deletion shrinks its length by the number of elements deleted?
Background:
I'm currently working on a project that contains sets of populations with corresponding descriptions of the individuals (e.g., age, age at death, and so on).
The method I currently use is to loop through the array, find which elements I need, place them in another array, and deallocate the previous array; before the next time step, this array is moved back into the original array before going through the subroutines that once again find the elements that are not needed.
You can use the PACK intrinsic function and intrinsic assignment to create an array value that is comprised of selected elements from another array. Assuming array is allocatable, and the elements to be removed are nominated by a logical mask logical_mask that is the same size as the original value of array:
array = PACK(array, .NOT. logical_mask)
Succinct syntax for a single element nominated by its index is:
array = [array(:index-1), array(index+1:)]
Depending on your Fortran processor, the above statements may result in the compiler creating temporaries that may impact performance. If this is problematic then you will need to use the subroutine approach that you describe.
Maybe you want to look into linked lists. You can insert and remove items and the list automatically resizes. This resource is pretty good.
http://www.iag.uni-stuttgart.de/IAG/institut/abteilungen/numerik/images/4/4c/Pointer_Introduction.pdf
To continue the discussion, the solution you might want to implement depends on the number of delete and access operations you do, where you insert/delete the elements (the first, the last, randomly in the set?), how you access the data (from first to last, randomly in the set?), and your efficiency requirements in terms of CPU and memory.
Then you might want to go for a linked list or for static or dynamic vectors (other types of data structures might also fit your needs better).
For example:
a static vector can be used when you want to access a lot of elements randomly and know the maximum number nmax of elements in the vector. Simply use an array of nmax elements with an associated length variable that tracks the last element. A deletion can be done simply and quickly by exchanging the last element with the deleted one and reducing the length (see the sketch after this list).
a dynamic vector can be implemented when you don't know the maximum number of elements. In order to avoid a systematic allocation+copy+deallocation for each deletion/insertion, you fix a maximum number of elements (as above) and only augment the size (e.g. nmax becomes 10*nmax, then reallocate and copy) when the limit is reached (the reverse scheme can also be implemented to reduce the number of elements).
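The bookkeeping for the static-vector case is language independent; here is a compact Python sketch of the idea (the names are made up, and a Fortran version would just use an array plus an integer length variable):

import numpy as np

class StaticVector:
    """Fixed-capacity vector with O(1) append and O(1) order-breaking deletion."""

    def __init__(self, nmax):
        self.data = np.empty(nmax)     # allocated once, never reallocated
        self.length = 0                # tracks the number of live elements

    def append(self, value):
        self.data[self.length] = value
        self.length += 1

    def delete(self, i):
        # Exchange slot i with the last live element and shrink the logical length;
        # fast, but the order of the remaining elements is not preserved.
        self.data[i] = self.data[self.length - 1]
        self.length -= 1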

algorithm/data structure for this "enumerating all possibilities" task (combinatorial objects)

This is probably a common question that arises in search/store situations and there is a standard answer. I'm trying to do this from intuition and am somewhat out of my comfort zone.
I'm attempting to generate all of a certain kind of combinatorial object. Each object of size n can be generated from an object of size n-1, usually in multiple ways. From the single object of size 2, my search generates 6 objects of size 3, about 140 objects of size 4, and about 29,000 objects of size 5. As I generate the objects, I store them in a globally declared array. Before storing each object, I have to check all the previous ones stored for that size, to make sure I didn't generate it already from an earlier (n-1)-object. I currently do this in a naive way, which is just that I go through all the objects currently sitting in the array and compare them to the one currently being generated. Only if it's different from every single one there do I add it to the array and increment the number of objects currently in there. The new object is just added as the most recent object in the array, it is not sorted, and so this is obviously inefficient, and I can't hope to generate the objects of size 6 in this way.
(To give an idea of the problem of the growth of the array: the first couple of 4-objects, from among the 140 or so, give rise to over 2000 new 5-objects in a fraction of a second. By the time I've gotten to the last few 4-objects, with over 25,000 5-objects already stored, each 4-object generates only a handful of previously unseen 5-objects, but takes several seconds for the process for each 4-object. There is very little correlation between the order I generate new objects in, and their eventual position as a consequence of the comparison function I'm using.)
Obviously if I had a sorted array of objects, it would be much more efficient to find out whether I'm looking at a new object: using a binary midpoint search strategy I'd only have to look at roughly log_2(n) of the n objects currently stored, instead of all n of them. But placing the newly generated object at the right place in an array means moving half of the existing ones, on average, to make room for it. (I would implement this with an array of pointers pointing to the unsorted array of object structs, so that I only had to move pointers instead of moving data, but it still seems like a lot of pointers to have to repoint at each insert.)
The other option would be to place the objects in a linked list, as insertion is very cheap in that situation. But then I wouldn't have random access to the elements in the linked list--you can only find the right place to insert the newly generated object (if it's actually new) by traversing the list node by node and comparing. On average you'd have to traverse half the list before finding the right insertion point, which doesn't sound any better than repointing half the pointers.
Is there a third choice I'm missing? I could accomplish this very easily if I had both random access to stored elements so I could find the insertion point quickly (in log_2(n) steps), and I could insert new objects very cheaply, like in a linked list. Am I dreaming?
To summarise: I need to be able to determine whether an object is new or duplicates an existing one, and I need to be able to insert an object at the right place. I don't ever need to delete an object. Thank you.
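For concreteness, the sorted-array bookkeeping described in the question (binary search to decide whether a candidate is new, then a shifting insert) can be sketched as follows, assuming the objects have a total order and equality defined on them:

import bisect

store = []                                      # kept sorted under the chosen comparison

def add_if_new(obj):
    """Return True and insert obj if it has not been generated before."""
    pos = bisect.bisect_left(store, obj)        # O(log n) comparisons to locate the spot
    if pos < len(store) and store[pos] == obj:
        return False                            # duplicate: already generated earlier
    store.insert(pos, obj)                      # the O(n) shift the question worries about
    return True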
