What's the best way to delete elements from a collection while iterating over it?

Suppose you want to delete an element from inside an iteration:
a = ['a','b','c','d','e'];
for i = 0 to len(a) {
    print a[i];
    if (i == 1) a.remove(i);
};
The output is a b d e, with c missing. This is a common bug that happens because the array changed while the loop was still running over it. Some workarounds include keeping a list of elements to delete and processing it after the loop, and adjusting the index after a deletion. How do you deal with this problem?

The most obvious approach is to iterate from the end of the array to the beginning:
a = ['a','b','c','d','e'];
for i = len(a)-1 downto 0 {
    print a[i];
    if (i == 1) a.remove(i);
};
Many languages have iterators that support deleting elements during forward iteration by telling the iterator to do the deletion. However, this does not always work (e.g., an iterator for the list returned by Java's Arrays.asList will not support deleting elements, because the list does not "own" the backing array).
If you have to forward iterate using an index, at least subtract 1 from the index when you delete the element. That way you won't skip elements.
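For concreteness, here is a minimal sketch of that adjustment in C (remove_at and remove_all are illustrative names, not from the question): after deleting at position i, the next element shifts down into i, so the loop simply does not advance, which has the same effect as subtracting 1 and then incrementing.

#include <stddef.h>
#include <string.h>

/* Remove a[i] from an array of n elements; returns the new length. */
size_t remove_at(char a[], size_t n, size_t i)
{
    memmove(&a[i], &a[i + 1], (n - i - 1) * sizeof a[0]);
    return n - 1;
}

/* Forward iteration that deletes matching elements without skipping any. */
size_t remove_all(char a[], size_t n, char victim)
{
    for (size_t i = 0; i < n; ) {
        if (a[i] == victim)
            n = remove_at(a, n, i);   /* stay at i: a new element just moved here */
        else
            i++;
    }
    return n;
}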

It depends on whether your iterator access -- including the loop, which in your example iterates from 0 to 5, even though position 5 may no longer exist by the end -- is by concrete position in the collection, or by abstract reference to the next element.
For instance, iterators in Java are of the abstract type, so you can delete the pointed-to item and then reliably continue iterating through the remainder of them.
In a language like C, when iterating through an array you would more typically encounter the difficulty you describe and risk writing buggy code. Perhaps the best general solution is to accumulate the "to-delete" set of things which you want to delete, and then process them as a separate step. Only if you have full knowledge of the internal representation of the collection and how iteration works can you safely perform the deletion and then adjust the iterator -- and the termination condition of the loop -- correctly to continue iterating. This will, however, often be the case anyway if you are working with an array. It's just a question of whether the additional adjustment code is simpler than the code to maintain the "to-delete" set.
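As a rough sketch of that two-step approach in C (the int array, the caller-provided mark array, and the name delete_marked are all assumptions for illustration):

#include <stdbool.h>
#include <stddef.h>

/* Phase 2 of "mark, then delete": compact the array in a single pass,
   keeping only the elements that were not marked during iteration. */
size_t delete_marked(int a[], size_t n, const bool to_delete[])
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (!to_delete[i])
            a[kept++] = a[i];
    return kept;   /* the new logical length */
}

Nothing moves until the loop that decides what to delete has finished, so the iteration never fights with the mutation.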

You can delete elements even inside an iteration via an iterator.
std::vector<int>::iterator itVec = vectInt.begin();
while (itVec != vectInt.end())
{
    if (*itVec == 1022)
        itVec = vectInt.erase(itVec);   // erase returns the next valid iterator
    else
        ++itVec;
}
The iterator is responsible for knowing where the end of the container is (instead of you recording the length up front), and erase both frees the element and returns a valid iterator to the next one.


Delete a specific element from a Fortran array

Is there a function in Fortran that deletes a specific element in an array, such that the array upon deletion shrinks its length by the number of elements deleted?
Background:
I'm currently working on a project which contains sets of populations with corresponding descriptions of the individuals (e.g., age, death-age, and so on).
A method I use is to loop through the array, find the elements I need, place them in another array, and deallocate the previous array. Before the next time step, this array is moved back into the original array, which then goes through the subroutines to find, once again, the elements that are not needed.
You can use the PACK intrinsic function and intrinsic assignment to create an array value that is comprised of selected elements from another array. Assuming array is allocatable, and the elements to be removed are nominated by a logical mask logical_mask that is the same size as the original value of array:
array = PACK(array, .NOT. logical_mask)
Succinct syntax for a single element nominated by its index is:
array = [array(:index-1), array(index+1:)]
Depending on your Fortran processor, the above statements may result in the compiler creating temporaries that may impact performance. If this is problematic then you will need to use the subroutine approach that you describe.
Maybe you want to look into linked lists. You can insert and remove items and the list automatically resizes. This resource is pretty good.
http://www.iag.uni-stuttgart.de/IAG/institut/abteilungen/numerik/images/4/4c/Pointer_Introduction.pdf
To continue the discussion, the solution you might want to implement depends on the number of delete operations and accesses you do, where you insert/delete the elements (at the beginning, at the end, randomly in the set?), how you access the data (from the first to the last, randomly?), and what your efficiency requirements are in terms of CPU and memory.
Then you might want to go for a linked list or for static or dynamic vectors (other types of data structures might also fit your needs better).
For example:
a static vector can be used when you want to access a lot of elements randomly and know the maximum number nmax of elements in the vector. Simply use an array of nmax elements with an associated length variable that will track the last element. A deletion can simply and quickly be done by exchanging the last element with the deleted one and reducing the length.
a dynamic vector can be implemented when you don't know the maximum number of elements. In order to avoid a systematic allocate+copy+deallocate cycle for each deletion/insertion, you fix a maximum number of elements (as above) and only grow the array (e.g. nmax becomes 10*nmax, then reallocate and copy) when the limit is reached (the reverse scheme can also be implemented to shrink it).
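For concreteness, a minimal C sketch of both vector flavours described above (NMAX, struct vec, and struct dvec are made-up names for this illustration):

#include <stdlib.h>

#define NMAX 1000   /* assumed fixed capacity for the static vector */

/* Static vector: O(1) deletion by swapping with the last element
   (order is not preserved). */
struct vec { int data[NMAX]; size_t len; };

void vec_delete(struct vec *v, size_t i)
{
    v->data[i] = v->data[--v->len];   /* overwrite victim with last, shrink */
}

/* Dynamic vector: grow geometrically so reallocation+copy happens
   only rarely, not on every insertion. */
struct dvec { int *data; size_t len, cap; };

int dvec_push(struct dvec *v, int x)
{
    if (v->len == v->cap) {
        size_t ncap = v->cap ? 2 * v->cap : 16;
        int *p = realloc(v->data, ncap * sizeof *p);
        if (!p) return -1;   /* out of memory */
        v->data = p;
        v->cap = ncap;
    }
    v->data[v->len++] = x;
    return 0;
}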

Is this a good way to speed up array processing?

So, today I woke up with this single idea.
Just suppose you have a long list of things, an array, and you have to check each one of them to find the one that matches what you're looking for. To do this, you could maybe use a for loop. Now, imagine that the one you're looking for is almost at the end of the list but you don't know it. So, in that case, assuming the order in which you check the elements doesn't matter, it would be more convenient for you to start from the last element rather than the first one, just to save some time and maybe memory. But then, what if your element is almost at the beginning?
That's when I thought: what if I could start checking the elements from both ends of the list at the same time?
So, after several tries, I came up with this raw sample code (which is written in JS) that, in my opinion, would solve what we were defining above:
function fx(list) {
    var len = list.length;
    // To save some time as we were saying, we could check first if the array isn't as long as we were expecting
    if (len <= 1) {
        // If it's not, then we just process the only element (if there is one) anyway
        /*
        ...
        list[0]
        ...
        */
        return;
    } else {
        // So, now here's the thing. The number of loops won't be the length of the list but just half of it.
        for (var i = 0, j = len - 1; i <= j; i++, j--) {
            // And inside each loop we process both the first and last unchecked elements, and so on until we reach the middle or find the one we're looking for, whichever happens first
            /*
            ...
            list[i]
            list[j]
            ...
            */
        }
    }
    return;
}
Anyway, I'm still not totally sure whether this would really speed up the process, slow it down, or make no difference at all. That's why I need your help, guys.
In your own experience, what do you think? Is this really a good way to make this kind of process faster? If it is or it isn't, why? Is there a way to improve it?
Thanks, guys.
Your proposed algorithm is good if you know that the item is likely to be at the beginning or end but not in the middle, bad if it's likely to be in the middle, and merely overcomplicated if it's equally likely to be anywhere in the list.
In general, if you have an unsorted list of n items then you potentially have to check all of them, and that will always take time which is at least proportional to n (this is roughly what the notation “O(n)” means) — there are no ways around this, other than starting with a sorted or partly-sorted list.
In your scheme, the loop runs for only n/2 iterations, but it does about twice as much work in each iteration as an ordinary linear search (from one end to the other) would, so it's roughly equal in total cost.
If you do have a partly-sorted list (that is, you have some information about where the item is more likely to be), then starting with the most likely locations first is a fine strategy. (Assuming you're not frequently looking for items which aren't in the list at all, in which case nothing helps you.)
If you work from both ends, then you'll get the worst performance when the item you're looking for is near the middle. No matter what you do, sequential searching is O(n).
If you want to speed up searching a list, you need to use a better data structure, such as a sorted list, hash table, or B-tree.
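For instance, in C the standard library already covers the sorted-array case (a minimal sketch using qsort and bsearch from <stdlib.h>; the array contents here are made up):

#include <stdio.h>
#include <stdlib.h>

/* three-way comparator for ints, usable by both qsort and bsearch */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    int a[] = { 42, 7, 19, 3, 88, 61 };
    size_t n = sizeof a / sizeof a[0];
    int key = 61;

    qsort(a, n, sizeof a[0], cmp_int);                      /* sort once: O(n log n) */
    int *hit = bsearch(&key, a, n, sizeof a[0], cmp_int);   /* each lookup: O(log n) */
    printf("%s\n", hit ? "found" : "not found");
    return 0;
}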

Prolog - the same functionality but without findall

Does anyone know how I could implement a predicate doing what this one does but without "findall"?
Thank you a lot.
domains
    oferta = rebaixat ; normal
    element = string
    list = element*

database
    producte(string, integer, oferta)

predicates
    nondeterm reduced2(list)

clauses
    producte("Enciam", 2, rebaixat).
    producte("Peix", 5, normal).
    producte("Llet", 1, rebaixat).
    producte("Formatge", 5, normal).

    reduced2(Vals) :-
        findall(Val, producte(Val, _, rebaixat), Vals).

goal
    write("Amb findall"), nl,
    reduced2(List).
I don't know much about Visual Prolog, but I will try to give a general answer. It depends on whether you want to find a findall/3 replacement for a specific case or in general, though.
In the specific case, you can use an accumulator. In your case, this would be a list into which the values are added as you find them. Something like:
acc(In, List) :-
    ...,             % do something to generate a possible next value Val
    ...,             % check that Val is not already in the list In
    !,
    Out = [Val|In],  % adds to the head, but is faster than using append
    acc(Out, List).
acc(List, List).
I.e., when you can't find another possible value, you return the list of values that you found. Note that this may be slow if you have to accumulate a lot of values and generating the next possible value is done via backtracking. Also, this will not let you generate duplicates, so it's not an exact replacement for findall/3.
If you want a general replacement for findall/3, where you can specify a goal and a variable or term that will contain the instance to be accumulated, then you won't get around using some sort of non-logical global variable. After finding the next value, you add it to what's been stored in the global variable, and cause backtracking. If generating the next value fails, you retrieve the content of the global variable and return it.

Infinity as sentinel in mergesort?

I am currently reading Cormen's "Introduction to Algorithms" and I found something called a sentinel.
It's used in the mergesort algorithm as a tool to decide when one of the two merging lists is exhausted. Cormen uses the infinity symbol for the sentinels in his pseudocode and I would like to know how such an infinite value can be implemented in C.
A sentinel is just a dummy value. For strings, you might use a NULL pointer, since that's not a sensible thing to have in a list. For integers, you might use a value unlikely to occur in your data set, e.g. if you are dealing with a list of ages, then you can use the age -1 to mark the end of the list.
You can get an "infinite value" for floats (C99 has INFINITY in <math.h>), but it's not the best idea. For arrays, pass the size explicitly; for lists, use a null pointer sentinel.
In C, when sorting an array, you usually know the size, so you could actually sort a range [begin, end) in which end is one past the end of the array. E.g. int a[n] could be sorted as sort(a, a + n).
This allows you to do two things:
call your sort recursively with the part of the array you haven't sorted yet (merge sort is a recursive algorithm)
use end as a sentinel.
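As a sketch of that style, here is a top-down merge sort in C over the half-open range [begin, end), where the end pointers themselves do the job of sentinels (msort is an illustrative name; a production version would check malloc's return value):

#include <stdlib.h>
#include <string.h>

/* Sort the half-open range [begin, end); "end" points one past the last
   element, so the pointer comparisons mark where each run stops and no
   sentinel value is needed. */
void msort(int *begin, int *end)
{
    size_t n = end - begin;
    if (n < 2)
        return;
    int *mid = begin + n / 2;
    msort(begin, mid);   /* recurse on [begin, mid) */
    msort(mid, end);     /* recurse on [mid, end)   */

    int *tmp = malloc(n * sizeof *tmp);
    int *l = begin, *r = mid, *out = tmp;
    while (l < mid && r < end)          /* bounds checks against the "end"s */
        *out++ = (*l <= *r) ? *l++ : *r++;
    while (l < mid) *out++ = *l++;
    while (r < end) *out++ = *r++;
    memcpy(begin, tmp, n * sizeof *tmp);
    free(tmp);
}

An array int a[n] is then sorted with msort(a, a + n).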
If you know the elements in your list can range over the full span of the given data type, the code you are looking at won't work, and you'll have to come up with something else, which I am sure can be done. I have that book in front of me right now, and the solution below works if you know the values range from the smallest value of the data type to the largest minus one at most.
Open the book back up to page 31 and take a look at the Merge function. The lines causing you problems are lines 8 and 9, where the sentinel value of infinity is being used. We know the two sub-arrays are each sorted already and that we just need to merge them to get the array that is twice as big, in sorted order. This means that the largest element in each half is at the end of its sub-array, and that the larger of those two is the largest element of the merged array we will have once the merge function completes. All we need to do is determine the larger of those two values, increment it by one, and use that as our sentinel. So, lines 8 and 9 of the code should be replaced by the following 6 lines of code:
if L[n1] < R[n2]
    largest = R[n2]
else
    largest = L[n1]
L[n1 + 1] = largest + 1
R[n2 + 1] = largest + 1
That should work for you. I have a test tomorrow in my algorithms course on this stuff and I came across your post here and thought I'd help you out. The authors' use of sentinels in this book is something that has always bugged me, and I absolutely can not stand how much they are in love with recursion. Iteration is faster and in my opinion usually easier to come up with and grasp.
The trick is that with sentinels you don't have to check array bounds when incrementing the index of whichever list you take from in the inner while loops. Hence you need sentinels that are larger than all other elements. In C++ I usually use std::numeric_limits<TYPE>::max().
The C equivalents are macros like INT_MAX, UINT_MAX, LONG_MAX etc. from <limits.h>. Those are good sentinels. If you need two different sentinels, use ..._MAX and ..._MAX - 1.
This is all assuming you're merging two lists that are ordered ascending.
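To make that concrete, here is a minimal C sketch of the CLRS-style merge using INT_MAX as the sentinel (it assumes int elements that never legitimately equal INT_MAX, and uses C99 variable-length arrays for the temporaries):

#include <limits.h>
#include <string.h>

/* Merge the sorted sub-arrays a[lo..mid] and a[mid+1..hi] (inclusive).
   The INT_MAX sentinels mean the inner loop never has to check whether
   either run is exhausted. */
void merge(int a[], int lo, int mid, int hi)
{
    int n1 = mid - lo + 1, n2 = hi - mid;
    int L[n1 + 1], R[n2 + 1];   /* +1 slot in each for the sentinel */

    memcpy(L, &a[lo], n1 * sizeof(int));
    memcpy(R, &a[mid + 1], n2 * sizeof(int));
    L[n1] = INT_MAX;   /* sentinels: larger than any real element */
    R[n2] = INT_MAX;

    for (int i = 0, j = 0, k = lo; k <= hi; k++)
        a[k] = (L[i] <= R[j]) ? L[i++] : R[j++];
}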

How will you delete duplicate odd numbers from a linked list?

Requirements/constraint:
delete only duplicates
keep one copy
list is not initially sorted
How can this be implemented in C?
(An algorithm and/or code would be greatly appreciated!)
If the list is very long and you want reasonable performance and you are OK with using an extra O(log n) of memory, you can sort it in O(n log n) using qsort or merge sort:
http://swiss-knife.blogspot.com/2010/11/sorting.html
Then you can remove the duplicates in O(n) (the total is O(n log n) + O(n)).
If your list is very tiny, you can do as jswolf19 suggests, and you will get n(n-1)/2 comparisons in the worst case.
There are several different ways of detecting/deleting duplicates:
Nested loops
Take the next value in sequence, then scan until the end of the list to see if this value occurs again. This is O(n²) -- although I believe the bounds can be argued lower? -- but the actual performance may be better as only scanning from i to end (not 0 to end) is done and it may terminate early. This does not require extra data aside from a few variables.
(See Christoph's answer as how this could be done just using a traversal of the linked list and destructive "appending" to a new list -- e.g. the nested loops don't have to "feel" like nested loops.)
Sort and filter
Sort the list (mergesort can be modified to work on linked lists) and then detect duplicate values (they will be side-by-side now). With a good sort this is O(n*lg(n)). The sorting phase usually is/can be destructive (e.g. you have "one copy") but it has been modified ;-)
Scan and maintain a look-up
Scan the list and as the list is scanned add the values to a lookup. If the lookup already contains said values then there is a duplicate! This approach can be O(n) if the lookup access is O(1). Generally a "hash/dictionary" or "set" is used as the lookup, but if only a limited range of integrals are used then an array will work just fine (e.g. the index is the value). This requires extra storage but no "extra copy" -- at least in the literal reading.
For small values of n, big-O is pretty much worthless ;-)
Happy coding.
I'd either
mergesort the list followed by a linear scan to remove duplicates
use an insertion-sort based algorithm which already removes duplicates when re-building the list
The former will be faster, the latter is easier to implement from scratch: Just construct a new list by popping off elements from your old list and inserting them into the new one by scanning it until you hit an element of greater value (in which case you insert the element into the list) or equal value (in which case you discard the element).
Well, you can sort the list first and then check for duplicates, or you could do one of the following:
for i from 0 to list.length-1
    for j from i+1 to list.length-1
        if list[i] == list[j]
            //delete one of them
        fi
    loop
loop
This is probably the most unoptimized piece of crap, but it'll probably work.
Iterate through the list, holding a pointer to the previous node each time you move on to the next one. Inside that outer loop, iterate through the rest of the list to check for a duplicate. If there is one, set the previous node's next pointer to skip past the duplicate, then break out of the inner loop and restart the whole process, until no duplicates remain.
You can do this in linear time using a hash table.
You'd want to scan through the list sequentially. Each time you encounter an odd numbered element, look it up in your hash table. If that number is already in the hash table, delete it from the list, if not add it to the hash table and continue.
Basically the idea is that for each element you scan in the list, you are able to check in constant time whether it is a duplicate of a previous element that you've seen. This takes only a single pass through your list and will take at worst a linear amount of memory (worst case is that every element of the list is a unique odd number, thus your hash table is as long as your list).
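To make that concrete, here is a sketch in C (the node layout, NBUCKETS, and the helper name seen_before are assumptions for this illustration, and freeing the table's own chain nodes is omitted for brevity):

#include <stdbool.h>
#include <stdlib.h>

struct node { int value; struct node *next; };

#define NBUCKETS 1024   /* assumed fixed table size; a real one might grow */

struct seen { int value; struct seen *next; };

/* Returns true if v was seen before; otherwise records it and returns false. */
static bool seen_before(struct seen *table[], int v)
{
    unsigned h = (unsigned)v % NBUCKETS;
    for (struct seen *s = table[h]; s; s = s->next)
        if (s->value == v)
            return true;
    struct seen *rec = malloc(sizeof *rec);
    rec->value = v;
    rec->next = table[h];
    table[h] = rec;
    return false;
}

/* Deletes duplicate odd values, keeping the first occurrence of each. */
void remove_duplicate_odds(struct node **head)
{
    struct seen *table[NBUCKETS] = { 0 };
    struct node **p = head;
    while (*p) {
        struct node *n = *p;
        if (n->value % 2 != 0 && seen_before(table, n->value)) {
            *p = n->next;   /* unlink the duplicate... */
            free(n);        /* ...and free it */
        } else {
            p = &n->next;   /* keep the node, advance */
        }
    }
}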
