Is there a function in Fortran that deletes a specific element in an array, such that the array upon deletion shrinks its length by the number of elements deleted?
Background:
I'm currently working on a project which contains sets of populations with corresponding descriptions of the individuals (e.g., age, death-age, and so on).
The method I currently use is to loop through the array, find the elements I need, place them in another array, and deallocate the original array. Before the next time step, the new array is moved back into the original one, which then goes through the subroutines again to find the elements that are no longer needed.
You can use the PACK intrinsic function and intrinsic assignment to create an array value that is comprised of selected elements from another array. Assuming array is allocatable, and the elements to be removed are nominated by a logical mask logical_mask that is the same size as the original value of array:
array = PACK(array, .NOT. logical_mask)
Succinct syntax for a single element nominated by its index is:
array = [array(:index-1), array(index+1:)]
Depending on your Fortran processor, the above statements may result in the compiler creating temporaries that may impact performance. If this is problematic then you will need to use the subroutine approach that you describe.
Maybe you want to look into linked lists. You can insert and remove items and the list automatically resizes. This resource is pretty good.
http://www.iag.uni-stuttgart.de/IAG/institut/abteilungen/numerik/images/4/4c/Pointer_Introduction.pdf
To continue the discussion, the solution you might want to implement depends on the number of deletions and accesses you perform, where you insert or delete elements (at the start, at the end, randomly in the set?), how you access the data (from first to last, randomly in the set?), and what your efficiency requirements are in terms of CPU and memory.
Then you might want to go for a linked list, or for static or dynamic vectors (other types of data structures might also better fit your needs).
For example:
A static vector can be used when you want to access a lot of elements randomly and know the maximum number nmax of elements in the vector. Simply use an array of nmax elements with an associated length variable that tracks the last element. A deletion can be done simply and quickly by overwriting the deleted element with the last one and reducing the length (note that this does not preserve the order of the elements).
A dynamic vector can be implemented when you don't know the maximum number of elements. To avoid a systematic allocation + copy + deallocation at each deletion/insertion, you fix the maximum number of elements (as above) and only increase its size (e.g. nmax becomes 10*nmax, then reallocate and copy) when reaching the limit (the reverse can also be implemented to shrink the array once many elements have been removed).
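The static vector described above can be sketched as follows. This is a language-neutral illustration written in Python rather than Fortran; the class name FixedVector and its methods are made up for the example, and the deletion deliberately gives up element order.

```python
class FixedVector:
    """Fixed-capacity buffer plus a length counter (hypothetical helper)."""

    def __init__(self, nmax):
        self.data = [None] * nmax   # storage is allocated once, up front
        self.length = 0             # number of slots currently in use

    def append(self, value):
        if self.length == len(self.data):
            raise OverflowError("vector is full")
        self.data[self.length] = value
        self.length += 1

    def swap_remove(self, i):
        """Delete element i in O(1) by overwriting it with the last one."""
        if not 0 <= i < self.length:
            raise IndexError(i)
        last = self.length - 1
        removed = self.data[i]
        self.data[i] = self.data[last]   # last element fills the hole
        self.data[last] = None           # slot is now unused
        self.length -= 1
        return removed

    def items(self):
        """Return only the live elements."""
        return self.data[:self.length]


ages = FixedVector(nmax=8)
for age in (31, 47, 12, 68):
    ages.append(age)
ages.swap_remove(1)      # delete 47; 68 takes its slot
print(ages.items())      # [31, 68, 12]
```

No allocation or copying happens on deletion, which is the point of the approach; the price is that the surviving elements are no longer in their original order.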
I am learning about arrays, singly linked lists and doubly linked lists these days, and this question came up:
"What is the best option between these three data structures when it comes to fast searching, less memory, easy insertion and updating of things?"
As far as I know, an array cannot be the answer because it has a fixed size. If we want to insert a new thing, it won't always be possible. A doubly linked list can do the task, but two pointers are needed for each node, so there will be a memory problem. So I think a singly linked list will fulfill all the given requirements. Am I right? Please correct me if I am missing any point. There is also one more question: instead of choosing one of them, can I combine one or more of the data structures given here to meet all the requirements?
"What is the best option between these three data structures when it comes to fast searching, less memory, easily insertion and updating of things".
As far as I can tell Arrays serve the purpose.
Fast search: You could do a binary search if the array is sorted. You don't get that option with a linked list.
Less memory: Arrays will take the least memory (but it must be contiguous memory).
Insertion: Appending to an array is a matter of a[i] = "value" (inserting in the middle also means shifting the later elements). If the array size is exceeded, simply copy the data into a new, larger array. That is exactly how HashMaps / ArrayLists work under the covers.
Updating things: Only arrays provide you with random access. a[i] = "new value" is an O(1) update if you know the index.
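To make the "fast search" point concrete, here is a small Python sketch. The function names are invented for the example, and it assumes the array is already sorted; bisect_left comes from Python's standard library.

```python
from bisect import bisect_left


def contains_sorted(sorted_items, target):
    """Binary search: O(log n), but only valid on a sorted array."""
    i = bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target


def contains_linear(items, target):
    """Linear scan: O(n), the only option for an unsorted array or a linked list."""
    for item in items:
        if item == target:
            return True
    return False


values = [3, 8, 15, 23, 42, 57]
print(contains_sorted(values, 23))   # True
print(contains_sorted(values, 24))   # False
```

Binary search needs to jump to the middle element in one step, which is exactly the random access a linked list does not offer.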
Each of those has its own benefits and downsides.
For search speed, I'd say arrays are better suited due to the quick lookup times.
Since an array is a sequence of same-size elements, retrieving the value at an index is just memoryLocation + index * elementSize. For a linked list, the whole list needs traversing.
Arrays also win in the "less memory" category, since there's no need to store extra pointers.
For insertions, arrays are slow. You'll need to traverse the array, copy contents to a new array, assign the new array, delete the old one...
Insertions go much quicker in singly or doubly linked lists, because it's just a matter of changing one or two pointers.
In the end, it all just depends on the use case. Are you inserting a lot? Then you probably want to consider a non-array structure.
Do you need many quick lookups? Consider those arrays again. Etc..
See also this question.
A linked list is usually the best choice when we don’t know in advance the number of elements we will have to store or the number can change dynamically.
Arrays have slow insertion and deletion times. To insert an element to the front or middle of the array, the first step is to ensure that there is space in the array for the new element, otherwise, the array needs to be RESIZED. This is an expensive operation. The next step is to open space for the new element by shifting every element after the desired index. Likewise, for deletion, shifting is required after removing an element. This implies that insertion time for arrays is Big O of n (O(n)) as n elements must be shifted.
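The shifting steps described above can be sketched like this in Python. The helper names are hypothetical, and the buffer is a preallocated list with a separate length counter so that the element moves are explicit instead of hidden inside list.insert.

```python
def insert_with_shift(buf, length, index, value):
    """Insert value at index in buf[:length]; return (new_length, moves)."""
    if length == len(buf):
        raise OverflowError("no free slot: the array would have to be resized")
    if not 0 <= index <= length:
        raise IndexError(index)
    moves = 0
    # Open a gap by moving every element from index onwards one slot right.
    for j in range(length, index, -1):
        buf[j] = buf[j - 1]
        moves += 1
    buf[index] = value
    return length + 1, moves


def delete_with_shift(buf, length, index):
    """Delete buf[index] from buf[:length]; return (new_length, moves)."""
    if not 0 <= index < length:
        raise IndexError(index)
    moves = 0
    # Close the gap by moving every later element one slot left.
    for j in range(index, length - 1):
        buf[j] = buf[j + 1]
        moves += 1
    buf[length - 1] = None
    return length - 1, moves


buf = [10, 20, 30, 40, None, None]
n, moves = insert_with_shift(buf, 4, 0, 5)
print(buf[:n], moves)    # [5, 10, 20, 30, 40] 4
```

Inserting at the front moves all n existing elements, inserting at the end moves none, which is where the O(n) worst case comes from.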
Using static arrays, we can save some extra memory in comparison to linked lists because we do not need to store pointers to the next node.
A doubly linked list supports fast insertion/removal at both ends. This is used in an LRU cache, where you need to add a new item at the front and remove the oldest item from the end.
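A minimal sketch of the LRU idea, in Python. It leans on collections.OrderedDict, which in CPython keeps its ordering in a doubly linked list internally, rather than hand-rolling the list; the class name LRUCache and its capacity parameter are assumptions made for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Most recently used entry at the front, oldest entry at the end."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key, last=False)   # touch: move to front
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key, last=False)   # new or updated: to front
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=True)         # evict oldest, from the end


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")           # "a" is now the most recently used
cache.put("c", 3)        # evicts "b"
print(cache.get("b"))    # None
```

Both the move to the front and the eviction from the end only touch the ends of the list, so neither cost grows with the number of cached items.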
This is probably a common question that arises in search/store situations and there is a standard answer. I'm trying to do this from intuition and am somewhat out of my comfort zone.
I'm attempting to generate all of a certain kind of combinatorial object. Each object of size n can be generated from an object of size n-1, usually in multiple ways. From the single object of size 2, my search generates 6 objects of size 3, about 140 objects of size 4, and about 29,000 objects of size 5. As I generate the objects, I store them in a globally declared array. Before storing each object, I have to check all the previous ones stored for that size, to make sure I didn't generate it already from an earlier (n-1)-object. I currently do this in a naive way, which is just that I go through all the objects currently sitting in the array and compare them to the one currently being generated. Only if it's different from every single one there do I add it to the array and increment the number of objects currently in there. The new object is just added as the most recent object in the array, it is not sorted, and so this is obviously inefficient, and I can't hope to generate the objects of size 6 in this way.
(To give an idea of the problem of the growth of the array: the first couple of 4-objects, from among the 140 or so, give rise to over 2000 new 5-objects in a fraction of a second. By the time I've gotten to the last few 4-objects, with over 25,000 5-objects already stored, each 4-object generates only a handful of previously unseen 5-objects, but takes several seconds for the process for each 4-object. There is very little correlation between the order I generate new objects in, and their eventual position as a consequence of the comparison function I'm using.)
Obviously if I had a sorted array of objects, it would be much more efficient to find out whether I'm looking at a new object: using a binary midpoint search strategy I'd only have to look at roughly log_2(n) of the n objects currently stored, instead of all n of them. But placing the newly generated object at the right place in an array means moving half of the existing ones, on average, to make room for it. (I would implement this with an array of pointers pointing to the unsorted array of object structs, so that I only had to move pointers instead of moving data, but it still seems like a lot of pointers to have to repoint at each insert.)
The other option would be to place the objects in a linked list, as insertion is very cheap in that situation. But then I wouldn't have random access to the elements in the linked list--you can only find the right place to insert the newly generated object (if it's actually new) by traversing the list node by node and comparing. On average you'd have to traverse half the list before finding the right insertion point, which doesn't sound any better than repointing half the pointers.
Is there a third choice I'm missing? I could accomplish this very easily if I had both random access to stored elements so I could find the insertion point quickly (in log_2(n) steps), and I could insert new objects very cheaply, like in a linked list. Am I dreaming?
To summarise: I need to be able to determine whether an object is new or duplicates an existing one, and I need to be able to insert an object at the right place. I don't ever need to delete an object. Thank you.
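For reference, the sorted-array variant described above (binary search for the position, then insert) might look like this in Python. The function name add_if_new is made up, and the example assumes the objects can be compared with < and ==, which tuples can.

```python
from bisect import bisect_left


def add_if_new(sorted_objects, obj):
    """Insert obj at its sorted position unless an equal object is stored.

    The lookup is O(log n); the insert still shifts the tail of the list.
    """
    i = bisect_left(sorted_objects, obj)
    if i < len(sorted_objects) and sorted_objects[i] == obj:
        return False                 # duplicate: already generated earlier
    sorted_objects.insert(i, obj)
    return True


store = []
for candidate in [(2, 1, 3), (1, 3, 2), (2, 1, 3), (1, 2, 3)]:
    add_if_new(store, candidate)
print(store)    # [(1, 2, 3), (1, 3, 2), (2, 1, 3)]
```

The comparison count drops from n to about log_2(n) per candidate, while the shifting cost discussed in the post remains. If only the new-or-duplicate test matters and the objects can be hashed, a hash-based set is another common option, although the post itself does not consider it.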
I want to make a 2D array "data" with the following dimensions: data(T,N)
T is a constant, and I don't know anything about N to begin with. Is it possible to do something like this in Fortran?
do i = 1, T
   ! check a few flags
   if (all_flags_ok) then
      c = c + 1
      data(i,c) = some_value
   end if
end do
Basically I have no idea about the second dimension. Depending on some flags, if those flags are fine, I want to keep adding more elements to the array.
How can I do this?
There are several possible solutions. You could make data an allocatable array and guess the maximum value for N. As long as you don't exceed N, you keep adding data items. If a new item would exceed the array size, you create a temporary array, copy data to the temporary array, deallocate data and reallocate it with a larger dimension.
Another design choice would be to use a linked list. This is more flexible in that the length is indefinite. You lose "random access" in that the list is chained rather than indexed. You create a user-defined type that contains various data, e.g., scalars, arrays, whatever, and also a pointer. When you add a list item, the pointer points to that next item. This is possible in Fortran 90 and later, since pointers are supported.
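The grow-by-copy strategy from the paragraph above, sketched in Python for a single growing dimension (in the question that would be N). The class name GrowableVector and the doubling factor are assumptions made for the example; any factor larger than one works in the same way.

```python
class GrowableVector:
    """Preallocated storage that is replaced by a larger copy when full."""

    def __init__(self, guess):
        self.data = [0.0] * max(1, guess)   # initial guess for the size
        self.count = 0                      # number of items actually stored

    def append(self, value):
        if self.count == len(self.data):
            bigger = [0.0] * (2 * len(self.data))   # temporary, larger array
            bigger[:self.count] = self.data         # copy the existing items
            self.data = bigger                      # old storage is dropped
        self.data[self.count] = value
        self.count += 1

    def items(self):
        return self.data[:self.count]


v = GrowableVector(guess=2)
for x in range(5):
    v.append(float(x))
print(v.items())      # [0.0, 1.0, 2.0, 3.0, 4.0]
print(len(v.data))    # 8
```

Growing by a factor rather than by one slot keeps the number of copies small: five appends here trigger only two reallocations.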
I suggest searching the web or reading a book about these data structures.
Assuming what you wrote is more-or-less how your code really goes, then you assuredly do know one thing: N cannot be greater than T. You would not have to change your do-loop, but you will definitely need to initialize data before the loop.
I want to store a small number of items (fewer than 255) which have a constant size (a C char) and be able to do the following operations:
Insert a value at an arbitrary position and have the other items preserve their previous order.
Delete an item and have the other items preserve their order (as above).
Find the next and previous of an item.
I have tried using an array and writing a function that adds a value by moving all the items after it one place forward. The same thing can be done for deleting, but it is too inefficient. Of course, I do not mind having to use a library, as long as it is readily available and free.
Array - access: O(1), insert: O(n)
Double-linked list - access O(n), previous/next: O(1), insert(*): O(1)
RB tree with the number of children stored in each node: O(log n) for all operations.
(*): You need to traverse the list first to get to the position (O(n)).
Note: no, the array is not messy, it's really simple to implement. Also as you can see, depending on the usage, it can be quite efficient.
Based on the number of elements, and on your remark about the array implementation, you should stick to arrays.
You could use a doubly linked list for it. However, this won't work if you want to keep the array behaviour, e.g. accessing elements quickly by their index (O(1) for an array, O(n) for a linked list).
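For a collection this small, the plain-array approach can be sketched as below. The example is in Python rather than C, uses a bytearray to stand in for an array of chars, and the helper name neighbours is invented for the illustration.

```python
def neighbours(buf, i):
    """Return (previous, next) of the item at index i, or None at an edge."""
    prev_item = buf[i - 1] if i > 0 else None
    next_item = buf[i + 1] if i + 1 < len(buf) else None
    return prev_item, next_item


items = bytearray(b"abde")
items.insert(2, ord("c"))      # shift "de" right, order is preserved
del items[0]                   # shift everything left, order is preserved
print(items)                   # bytearray(b'bcde')
print(neighbours(items, 1))    # (98, 100), i.e. 'b' and 'd'
```

With fewer than 255 one-byte items, each insert or delete moves at most a couple of hundred bytes, while previous and next are simply the adjacent indices.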
I am very puzzled about this. Everywhere there is written "linked lists are faster than arrays" but no one makes the effort to say WHY. Using plain logic I can't understand how a linked list can be faster. In an array all cells are next to each other so as long as you know the size of each cell it's easy to reach one cell instantly. For example if there is a list of 10 integers and I want to get the value in the fourth cell then I just go directly to the start of the array+24 bytes and read 8 bytes from there.
On the other hand, when you have a linked list and you want to get the element in the fourth place, you have to start from the beginning or end of the list (depending on whether it's a singly or doubly linked list) and go from one node to the next until you find what you're looking for.
So how the heck can going step by step be faster than going directly to an element?
This question title is misleading.
It asserts that linked lists are faster than arrays without limiting the scope well. There are a number of times when arrays can be significantly faster and there are a number of times when a linked list can be significantly faster: the particular case of linked lists "being faster" does not appear to be supported.
There are two things to consider:
The theoretical bounds of linked-lists vs. arrays in a particular operation; and
the real-world implementation and usage pattern including cache-locality and allocations.
As far as the access of an indexed element goes: the operation is O(1) in an array and, as pointed out, is very fast (just an offset). The operation is O(k) in a linked list (where k is the index and may always be << n, depending), but if the linked list is already being traversed then this is O(1) per step, which is "the same" as an array. Whether an array traversal (for(i=0;i<len;i++)) is faster (or slower) depends upon the particular implementation/language/run-time.
However, if there is a specific case where the array is not faster for either of the above operations (seek or traversal), it would be interesting to see it dissected in more detail. (I am sure it is possible to find a language with a very degenerate implementation of arrays over lists cough Haskell cough)
Happy coding.
My simple usage summary: Arrays are good for indexed access and operations which involve swapping elements. The non-amortized re-size operation and extra slack (if required), however, may be rather costly. Linked lists amortize the re-sizing (and trade slack for a "pointer" per-cell) and can often excel at operations like "chopping out or inserting a bunch of elements". In the end they are different data-structures and should be treated as such.
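The difference between the O(1) offset and the O(k) walk discussed above can be sketched in Python. All names here are made up for the example, and array('q') is assumed to use 8-byte items, as it does on common platforms, to match the 24-byte offset in the question.

```python
import sys
from array import array


class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def build_list(values):
    """Build a singly linked list and return its head node."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def nth(head, k):
    """Walk k links from the head; return (value, number of steps taken)."""
    node, steps = head, 0
    while node is not None and steps < k:
        node = node.next
        steps += 1
    if node is None:
        raise IndexError(k)
    return node.value, steps


def read_at(arr, k):
    """Read element k straight from the raw bytes at offset k * itemsize."""
    raw = arr.tobytes()
    offset = k * arr.itemsize
    chunk = raw[offset:offset + arr.itemsize]
    return int.from_bytes(chunk, sys.byteorder, signed=True)


numbers = array("q", [0, 11, 22, 33, 44, 55, 66, 77, 88, 99])
print(read_at(numbers, 3))                  # 33, one offset computation
print(nth(build_list(list(numbers)), 3))    # (33, 3), three links followed
```

The array read costs the same for any k, while the list walk grows with k; when every element is visited in order anyway, each step of the walk is a single pointer hop.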
Like most problems in programming, context is everything. You need to think about the expected access patterns of your data, and then design your storage system appropriately. If you insert something once, and then access it 1,000,000 times, then who cares what the insert cost is? On the other hand, if you insert/delete as often as you read, then those costs drive the decision.
Depends on which operation you are referring to. Adding or removing elements is a lot faster in a linked list than in an array.
Iterating sequentially over the list one by one is more or less the same speed in a linked list and an array.
Getting one specific element in the middle is a lot faster in an array.
And the array might waste space, because very often when expanding the array, more elements are allocated than needed at that point in time (think ArrayList in Java).
So you need to choose your data structure depending on what you want to do:
many insertions and iterating sequentially --> use a LinkedList
random access and ideally a predefined size --> use an array
Because no memory is moved when an insertion is made in the middle of a linked list.
For the case you presented, it's true: arrays are faster, since you need only arithmetic to go from one element to another. Linked lists require indirection and fragment memory.
The key is to know what structure to use and when.
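The "no memory is moved" point can be sketched as follows, in Python with hypothetical helper names. It assumes you already hold a reference to the node before the insertion point; finding that node is a separate O(n) walk.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def insert_after(node, value):
    """O(1): splice in a new node by re-pointing a single link."""
    node.next = Node(value, node.next)
    return node.next


def remove_after(node):
    """O(1): unlink the node that follows; no other element is touched."""
    removed = node.next
    if removed is not None:
        node.next = removed.next
    return removed


def to_list(head):
    """Collect the values for display."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


head = Node("a")
b = insert_after(head, "b")
insert_after(b, "d")
insert_after(b, "c")       # insert in the middle, nothing is shifted
print(to_list(head))       # ['a', 'b', 'c', 'd']
remove_after(head)         # drop "b"
print(to_list(head))       # ['a', 'c', 'd']
```

Each operation changes one or two links regardless of the list length, which is the case where a linked list beats an array.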
Linked lists are preferable over arrays when:
a) you need constant-time insertions/deletions from the list (such as in real-time computing where time predictability is absolutely critical)
b) you don't know how many items will be in the list. With arrays, you may need to re-declare and copy memory if the array grows too big
c) you don't need random access to any elements
d) you want to be able to insert items in the middle of the list (such as a priority queue)
Arrays are preferable when:
a) you need indexed/random access to elements
b) you know the number of elements in the array ahead of time so that you can allocate the correct amount of memory for the array
c) you need speed when iterating through all the elements in sequence. You can use pointer math on the array to access each element, whereas in a linked list you need to look up each node by following the pointer from the previous one, which may result in page faults and therefore in performance hits.
d) memory is a concern. Filled arrays take up less memory than linked lists. Each element in the array is just the data. Each linked list node requires the data as well as one (or more) pointers to the other elements in the linked list.
Array Lists (like those in .Net) give you the benefits of arrays, but dynamically allocate resources for you so that you don't need to worry too much about list size, and you can delete items at any index without having to re-shuffle the elements yourself. Performance-wise, array lists are slower than raw arrays.
Reference:
Lamar answer
https://stackoverflow.com/a/393578/6249148
LinkedList is Node-based meaning that data is randomly placed in memory and is linked together by nodes (objects that point to another, rather than being next to one another)
Array is a set of similar data objects stored in sequential memory locations
The advantage of a linked list is that data doesn’t have to be sequential in memory. When you add/remove an element, you are simply changing the pointer of a node to point to a different node, not actually moving elements around. If you don’t have to add elements towards the end of the list, then accessing data is faster, due to iterating over fewer elements. There are also variations of the linked list, such as the doubly linked list, whose nodes point to both the previous and the next node.
The advantage of an array is that yes you can access any element O(1) time if you know the index, but if you don’t know the index, then you will have to iterate over the data.
The downside of an array is the fact that its data is stored sequentially in memory. If you want to insert an element at index 1, then you have to move every single element after it to the right. Also, the array has to keep resizing itself as it grows, basically copying itself in order to make a new array with a larger capacity. If you want to remove an element at the beginning, then you will have to move all the other elements to the left.
Arrays are good when you know the index, but are costly as they grow.
The reason why people talk highly about linked lists is because the most useful and efficient data structures are node based.