I have a problem for which I have not found a sufficiently efficient solution. I need to speed up a circular buffer with a fixed size of 1,000,000 elements. It is currently implemented as a singly linked list.
For now, I have changed the implementation to use an array instead of the linked list, with a write pointer and a read pointer so that I never have to shift the array's contents. I need to do A LOT of lookups in my FIFO, and I also need to delete items at arbitrary indexes (I know this breaks the FIFO rule).
At first I was thinking of a sorted index table that mirrors the FIFO array. Lookup would be O(log n), but every time I update the FIFO I would also need to update the index table. This is the part I have not managed to do efficiently (with low complexity).
Any hints about an implementation that keeps track of the FIFO's order and gives good performance for insert/delete/search operations?
Thanks.
One approach would be to use:
An array with n elements to store the items
A Fenwick tree with n elements to store the occupancy.
We use the Fenwick Tree to write a 1 whenever an element is present, or 0 if the element is not present.
Once you have this structure, you can find the k-th present element and perform deletions in O(log n) time. (The actual implementation details may be a bit fiddly due to the FIFO wraparound - it may help to keep track of the total occupancy in the array and the occupancy from the pointer to the first element until the end of the array.)
Note that this structure will allow you to delete items anywhere, but only to insert items at the end of the FIFO - it is not clear whether this matches your requirements?
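As a rough illustration of the occupancy idea, here is a minimal C++ sketch of a Fenwick tree that stores a 1 for each present slot and can locate the k-th present slot. The names OccupancyTree, add, countBefore and findKth are made up for this example, and the sketch deliberately ignores the FIFO wraparound mentioned above (slot 0 is assumed to be the oldest possible position).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fenwick (binary indexed) tree over n slots.
// Conceptually slot i holds 1 if an element is present there, 0 otherwise.
class OccupancyTree {
public:
    explicit OccupancyTree(std::size_t n) : tree_(n + 1, 0) {}

    // Add delta at 0-based slot i (+1 marks it present, -1 marks it deleted).
    void add(std::size_t i, int delta) {
        for (std::size_t x = i + 1; x < tree_.size(); x += x & (~x + 1)) {
            tree_[x] += delta;
        }
    }

    // Number of present slots in the half-open range [0, i).
    int countBefore(std::size_t i) const {
        int sum = 0;
        for (std::size_t x = i; x > 0; x -= x & (~x + 1)) {
            sum += tree_[x];
        }
        return sum;
    }

    // 0-based slot of the k-th present element (k starts at 1).
    // Precondition: 1 <= k <= number of present elements.
    std::size_t findKth(int k) const {
        assert(k >= 1);
        std::size_t step = 1;
        while (step * 2 < tree_.size()) step *= 2;  // largest power of two <= n
        std::size_t pos = 0;
        for (; step > 0; step /= 2) {
            std::size_t next = pos + step;
            if (next < tree_.size() && tree_[next] < k) {
                pos = next;          // the whole block at 'next' lies before the answer
                k -= tree_[next];
            }
        }
        return pos;  // pos slots precede the answer, so pos is its 0-based index
    }

private:
    std::vector<int> tree_;  // 1-based internal indexing
};
```

Deleting the element in slot i is then add(i, -1), and both that and findKth take O(log n) steps. Handling a wrapped buffer would need an extra offset computation on top of this.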
Why is an array considered a data structure? How is an array a data structure in terms of efficiency? Please explain by giving some examples.
It's a data structure because it's a collection of data together with the tools to work with it.
Primary features:
Extremely fast lookup by index.
Extremely fast index-order traversal.
Minimal memory footprint (not so with the optional modifications I mentioned).
Insertion is normally O(N) because you may need to copy the array when you reallocate the array to make space for new elements. However, you can bring the cost of appending down to amortized O(1) by over-allocating (i.e. by doubling the size of the array every time you reallocate).[1]
Deletion is O(N) because you will need to shift N/2 elements on average. You could keep track of the number of unused elements at the start and end of the array to make removals from the ends O(1).[1]
Lookup by index is O(1). It's a simple pointer addition.
Lookup by value is O(N). If the data is ordered, one can use a binary search to reduce this to O(log N).
Keeping track of the first used element and the last used element would technically qualify as a different data structure because the functions to access the data structure are different, but it would still be called an array.
Hashes provide an excellent mechanism to extract values corresponding to some given key in almost O(1) time. But it never preserves the order in which the keys are inserted. So is there any data structure which can simulate the best of array as well as hash, that is, return the value corresponding to a given key in O(1) time, as well as returning the nth value inserted in O(1) time? The ordering should be maintained, i.e., if the hash is {a:1,b:2,c:3}, and something like del hash[b] has been done, nth(2) should return {c,3}.
Examples:
hash = {};
hash[a] = 1;
hash[b] = 2;
hash[c] = 3;
nth(2); //should return 2
hash[d] = 4;
del hash[c];
nth(3); //should return 4, as 'd' has been shifted up
Using modules like TIE::Hash or similar stuff won't do, the onus is on me to develop it from scratch!
It depends on how much memory may be allocated for this data structure. For O(N) space there are several choices:
It's easy to get a data structure with O(1) time for each of these operations: "get value by key", "get nth value inserted", "insert" - but only when "delete" time is O(N). Just use a combination of a hash map and an array, as explained by ppeterka.
Less obvious, but still simple is O(sqrt N) for "delete" and O(1) for all other operations.
A little bit more complicated is to "delete" in O(N^(1/4)), O(N^(1/6)), or, in the general case, in O(M*N^(1/M)) time.
It is most likely impossible to decrease "delete" time to O(log N) while retaining O(1) for the other operations. But it is possible if you accept O(log N) time for every operation. Solutions based on a binary search tree or on a skip list allow it. One option is an order statistic tree: augment every node of a binary search tree with a counter storing the number of elements in the sub-tree under that node, then use the counters to find the nth node. Another option is an indexable skip list. One more option is the O(M*N^(1/M)) solution with M=log(N).
And I don't think you can get O(1) "delete" without increasing time for other operations even more.
If unlimited space is available, you can do every operation in O(1) time.
O(sqrt N) "delete"
You can use a combination of two data structures to find value by key and to find value by its insertion order. First one is a hash map (mapping key to both value and a position in other structure). Second one is tiered vector, which maps position to both value and key.
Tiered vector is a relatively simple data structure, it may be easily developed from scratch. Main idea is to split array into sqrt(N) smaller arrays, each of size sqrt(N). Each small array needs only O(sqrt N) time to shift values after deletion. And since each small array is implemented as circular buffer, small arrays can exchange a single element in O(1) time, which allows to complete "delete" operation in O(sqrt N) time (one such exchange for each sub-array between deleted value and first/last sub-array). Tiered vector allows insertion into the middle also in O(sqrt N), but this problem does not require it, so we can just append a new element at the end in O(1) time. To access element by its position, we need to determine starting position of circular buffer for sub-array, where element is stored, then get this element from circular buffer; this needs also O(1) time.
Since hash map remembers a position in tiered vector for each of its keys, it should be updated when any element in tiered vector changes position (O(sqrt N) hash map updates for each "delete").
O(M*N^(1/M)) "delete"
To optimize the "delete" operation even more, you can use the approach described in this answer. It deletes elements lazily and uses a trie to adjust an element's position, taking the deleted elements into account.
O(1) for every operation
You can use a combination of three data structures to do this. First one is a hash map (mapping key to both value and a position in the array). Second one is an array, which maps position to both value and key. And third one is a bit set, one bit for each element of the array.
"Insert" operation just adds one more element to the array's end and inserts it into hash map.
"Delete" operation just unsets corresponding bit in the bit set (which is initialized with every bit = 1). Also it deletes corresponding entry from hash map. (It does not move elements of array or bit set). If, after "delete" the bit set has more than some constant proportion of elements deleted (like 10%), the whole data structure should be re-created from scratch (this allows O(1) amortized time).
"Find by key" is trivial, only hash map is used here.
"Find by position" requires some pre-processing. Prepare a 2D array. One index is the position we search. Other index is current state of our data structure, the bit set, reinterpreted as an index. Calculate population count for each prefix of every possible bit set and store prefix length, indexed by both population count and the bit set itself. Having this 2D array ready, you can perform this operation by first indexing by position and current "state" in this 2D array, then by indexing in the array with values.
Time complexity for every operation is O(1) (for insert/delete it is O(1) amortized). Space complexity is O(N 2^N).
In practice, using the whole bit set to index an array limits the allowed value of N to the pointer size (usually 64), and it is limited even further by available memory. To alleviate this, we can split both the array and the bit set into sub-arrays of size N/C, where C is some constant. Now we can use a smaller 2D array to find the nth element in each sub-array. And to find the nth element in the whole structure, we need an additional structure to record the number of valid elements in each sub-array. This is a structure of constant size C, so every operation on it is also O(1). This additional structure may be implemented as an array, but it is better to use some logarithmic-time structure like an indexable skip list. After this modification, time complexity for every operation is still O(1); space complexity is O(N 2^(N/C)).
Now that the question is clear to me too (better late than never...), here are my proposals:
You could maintain two hashes: one keyed by the keys, and one keyed by insertion order. This, however, is very ugly and slow to maintain when deleting or inserting in between. It would give the same almost O(1) access time both ways.
You could use a hash for the keys and maintain an array for the insertion order. This is a lot nicer than the two-hash variant. Deleting is still not very fast, but I think it is still a lot quicker than with the two-hash approach. This also gives true O(1) access to the nth element.
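The second proposal (a hash for the keys plus an array for insertion order) could look roughly like the C++ sketch below. The class name OrderedMap and its methods are invented for illustration, keys are assumed to be strings and values ints, and nth is 1-based to match the question's examples. Note how erase has to re-index everything after the removed slot, which is where the O(N) delete cost comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

class OrderedMap {
public:
    // Insert a new key at the end, or overwrite the value of an existing key.
    void set(const std::string& key, int value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            items_[it->second].second = value;
            return;
        }
        index_[key] = items_.size();
        items_.emplace_back(key, value);
    }

    // Lookup by key: average O(1). Returns nullptr if the key is absent.
    const int* get(const std::string& key) const {
        auto it = index_.find(key);
        return it == index_.end() ? nullptr : &items_[it->second].second;
    }

    // Value of the n-th surviving insertion (1-based): O(1).
    int nth(std::size_t n) const {
        assert(n >= 1 && n <= items_.size());
        return items_[n - 1].second;
    }

    // Delete by key: O(N), because later elements shift and must be re-indexed.
    bool erase(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return false;
        std::size_t pos = it->second;
        index_.erase(it);
        items_.erase(items_.begin() + static_cast<std::ptrdiff_t>(pos));
        for (std::size_t i = pos; i < items_.size(); ++i) {
            index_[items_[i].first] = i;
        }
        return true;
    }

private:
    std::vector<std::pair<std::string, int>> items_;      // insertion order
    std::unordered_map<std::string, std::size_t> index_;  // key -> position in items_
};
```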
At first, I misunderstood the question, and gave a solution that gives O(1) key lookup, and O(n) lookup of nth element:
In Java, there is the LinkedHashMap for this particular task.
I think however that if someone finds this page, this might not be totally useless, so I leave it here...
There is no data structure that is O(1) for everything you listed. In particular, any data structure with random dynamic insertion/deletion in the middle AND sorted/indexed access cannot have maintenance time lower than O(log N). To maintain such a dynamic collection you have to rely either on the "less than" operator (binary, thus O(log2 N)) or on some computed organization (typically O(sqrt N), by using sqrt(N) sub-arrays). Note that O(sqrt N) > O(log N).
So, no.
You might reach O(1) for everything including keeping order with the linked list+hash map, and if access is mostly sequential, you could cache nth(x), to access nth(x+/-1) in O(1).
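For reference, the linked list + hash map combination can be sketched in C++ with std::list and std::unordered_map. The name LinkedMap and its methods are hypothetical. This gives O(1) (average) insert, delete and key lookup while preserving insertion order for traversal, but reaching the nth element still means walking the list.

```cpp
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

class LinkedMap {
    using Item = std::pair<std::string, int>;

public:
    // Append a new key at the end of the order, or overwrite an existing value.
    void set(const std::string& key, int value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = value;
            return;
        }
        order_.emplace_back(key, value);
        index_[key] = std::prev(order_.end());
    }

    // Unlink the node directly via its stored iterator: no traversal needed.
    bool erase(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return false;
        order_.erase(it->second);
        index_.erase(it);
        return true;
    }

    // Returns nullptr if the key is absent.
    const int* find(const std::string& key) const {
        auto it = index_.find(key);
        return it == index_.end() ? nullptr : &it->second->second;
    }

    // Items in insertion order, for sequential traversal.
    const std::list<Item>& items() const { return order_; }

private:
    std::list<Item> order_;                                         // insertion order
    std::unordered_map<std::string, std::list<Item>::iterator> index_;  // key -> node
};
```

std::list iterators stay valid when other nodes are inserted or erased, which is what makes storing them in the hash map safe here.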
I guess only a plain array will give you O(1); the best variant is to look for a solution that gives O(n) in the worst case. You can also use a really, really bad approach: using the key as an index into a plain array. I guess there is a way to transform any key into an index into a plain array.
#include <string>
std::string memoryMap[0x10000];  // direct-address table: the key is the array index
int main() {
    int key = 100;
    std::string value = "Hello, World!";
    memoryMap[key] = value;
}
I want to store a small number of items (fewer than 255) which have a constant size (a C char) and be able to do the following operations:
Append a value to an arbitrary position and have the other items preserve their previous order.
Delete an item and have the other items preserve their order(as above).
Find the next and previous of an item.
I have tried using an array and writing a function that adds a value by moving all items after it one place forward. The same thing can be done for deleting, but it is too inefficient. Of course, I do not mind having to use a library, as long as it is readily available and free.
Array - access: O(1), insert: O(n)
Double-linked list - access O(n), previous/next: O(1), insert(*): O(1)
RB tree with the number of children stored in each node: O(log n) for all operations.
(*): You need to traverse the list first to get to the position (O(n)).
Note: no, the array is not messy, it's really simple to implement. Also as you can see, depending on the usage, it can be quite efficient.
Based on the number of elements, and despite your remark about the array implementation, you should stick to arrays.
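To show how little work the array version involves at this size, here is a small C++ sketch using memmove to open or close a gap. SmallCharList, insertAt, removeAt and kCapacity are names made up for this example. The next and previous of the item at position pos are simply positions pos+1 and pos-1.

```cpp
#include <cstddef>
#include <cstring>

constexpr std::size_t kCapacity = 255;  // assumed upper bound from the question

struct SmallCharList {
    std::size_t size = 0;
    char data[kCapacity];
};

// Insert value at position pos, shifting later items one place forward.
// Returns false if the list is full or pos is past the end.
bool insertAt(SmallCharList& list, std::size_t pos, char value) {
    if (list.size >= kCapacity || pos > list.size) return false;
    std::memmove(list.data + pos + 1, list.data + pos, list.size - pos);
    list.data[pos] = value;
    ++list.size;
    return true;
}

// Remove the item at position pos, shifting later items one place back.
// Returns false if pos is out of range.
bool removeAt(SmallCharList& list, std::size_t pos) {
    if (pos >= list.size) return false;
    std::memmove(list.data + pos, list.data + pos + 1, list.size - pos - 1);
    --list.size;
    return true;
}
```

In the worst case each call moves at most 254 bytes in a single memmove.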
You could use a double-linked list for it. However, this won't work if you want to keep the array behaviour (e.g. accessing elements quickly (O(1), for a LL it's O(n)) by their index)
Requirements/constraint:
delete only duplicates
keep one copy
list is not initially sorted
How can this be implemented in C?
(An algorithm and/or code would be greatly appreciated!)
If the list is very long, you want reasonable performance, and you are OK with allocating an extra log(n) of memory, you can sort in n log(n) using qsort or merge sort:
http://swiss-knife.blogspot.com/2010/11/sorting.html
Then you can remove duplicates in n (the total is n log(n) + n).
If your list is very small, you can do as jswolf19 suggests, and you will get n(n-1)/2 in the worst case.
There are several different ways of detecting/deleting duplicates:
Nested loops
Take the next value in sequence, then scan until the end of the list to see if this value occurs again. This is O(n^2) -- although I believe the bounds can be argued lower? -- but the actual performance may be better as only scanning from i to end (not 0 to end) is done and it may terminate early. This does not require extra data aside from a few variables.
(See Christoph's answer as how this could be done just using a traversal of the linked list and destructive "appending" to a new list -- e.g. the nested loops don't have to "feel" like nested loops.)
Sort and filter
Sort the list (mergesort can be modified to work on linked lists) and then detect duplicate values (they will be side-by-side now). With a good sort this is O(n*lg(n)). The sorting phase usually is/can be destructive (e.g. you have "one copy") but it has been modified ;-)
Scan and maintain a look-up
Scan the list and as the list is scanned add the values to a lookup. If the lookup already contains said values then there is a duplicate! This approach can be O(n) if the lookup access is O(1). Generally a "hash/dictionary" or "set" is used as the lookup, but if only a limited range of integrals are used then an array will work just fine (e.g. the index is the value). This requires extra storage but no "extra copy" -- at least in the literal reading.
For small values of n, big-O is pretty much worthless ;-)
Happy coding.
I'd either
mergesort the list followed by a linear scan to remove duplicates
use an insertion-sort based algorithm which already removes duplicates when re-building the list
The former will be faster, the latter is easier to implement from scratch: Just construct a new list by popping off elements from your old list and inserting them into the new one by scanning it until you hit an element of greater value (in which case you insert the element into the list) or equal value (in which case you discard the element).
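A minimal C++ sketch of the insertion-sort variant described above follows. The Node struct and the function name insertionSortUnique are assumptions for this example, as is the choice that nodes were allocated with new (so discarded duplicates are deleted). Note that the resulting list is sorted, so the original order is not preserved.

```cpp
struct Node {
    int value;
    Node* next;
};

// Rebuilds the list in ascending order, dropping any value already present.
// Returns the head of the new list; the old head pointer must not be reused.
Node* insertionSortUnique(Node* head) {
    Node* sorted = nullptr;
    while (head != nullptr) {
        // Pop the first node off the old list.
        Node* node = head;
        head = head->next;

        // Walk the new list until we hit an element that is >= node->value.
        Node** link = &sorted;
        while (*link != nullptr && (*link)->value < node->value) {
            link = &(*link)->next;
        }

        if (*link != nullptr && (*link)->value == node->value) {
            delete node;          // equal value found: discard the duplicate
        } else {
            node->next = *link;   // greater value (or end): insert here
            *link = node;
        }
    }
    return sorted;
}
```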
Well, you can sort the list first and then check for duplicates, or you could do one of the following:
for i from 0 to list.length-1
    for j from i+1 to list.length-1
        if list[i] == list[j]
            //delete one of them
        fi
    loop
loop
This is probably the most unoptimized piece of crap, but it'll probably work.
Iterate through the list, keeping a pointer to the previous object each time you move on to the next one. Inside that loop, iterate through the whole list again to check for a duplicate of the current object. If there is a duplicate, then back in the main loop get the object after the current one, set the previous object's next pointer to the object you just retrieved, break out of the loop, and restart the whole process until there are no duplicates.
You can do this in linear time using a hash table.
You'd want to scan through the list sequentially. Each time you encounter an odd numbered element, look it up in your hash table. If that number is already in the hash table, delete it from the list, if not add it to the hash table and continue.
Basically the idea is that for each element you scan in the list, you are able to check in constant time whether it is a duplicate of a previous element that you've seen. This takes only a single pass through your list and will take at worst a linear amount of memory (worst case is that every element of the list is a unique odd number, thus your hash table is as long as your list).
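The single-pass idea can be sketched in C++ with std::unordered_set as the hash table. The Node struct and the name removeDuplicates are assumptions for this example, nodes are assumed to be allocated with new, and this version treats every value as a candidate duplicate rather than only odd ones. The first occurrence of each value is kept, so the original order of the survivors is preserved.

```cpp
#include <unordered_set>

struct Node {
    int value;
    Node* next;
};

// Removes later occurrences of any value already seen, in one pass.
// The head node is never removed, so the head pointer stays valid.
void removeDuplicates(Node* head) {
    std::unordered_set<int> seen;
    Node* prev = nullptr;
    Node* cur = head;
    while (cur != nullptr) {
        if (!seen.insert(cur->value).second) {
            // Value was already in the set: unlink and free this node.
            // prev is non-null here because the first node is always new.
            prev->next = cur->next;
            delete cur;
            cur = prev->next;
        } else {
            prev = cur;
            cur = cur->next;
        }
    }
}
```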
I would like to write a piece of code for inserting a number into a sorted array at the appropriate position (i.e. the array should still remain sorted after insertion)
My data structure doesn't allow duplicates.
I am planning to do something like this:
Find the right index where I should be putting this element using binary search
Create space for this element, by moving all the elements from that index down.
Put this element there.
Is there any other better way?
If you really have an array and not a better data structure, that's optimal. If you're flexible on the implementation, take a look at AA Trees - They're rather fast and easy to implement. Obviously, takes more space than array, and it's not worth it if the number of elements is not big enough to notice the slowness of the blit as compared to pointer magic.
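The three steps from the question map fairly directly onto the C++ standard library. Below is a sketch assuming the sorted array is a std::vector<int>; the function name insertSorted is made up. std::lower_bound does the binary search, and vector::insert does the shifting.

```cpp
#include <algorithm>
#include <vector>

// Inserts value into an ascending, duplicate-free vector, keeping it sorted.
// Returns false (and leaves the vector unchanged) if value is already present.
bool insertSorted(std::vector<int>& sorted, int value) {
    // Binary search for the first element that is not less than value: O(log n).
    auto it = std::lower_bound(sorted.begin(), sorted.end(), value);
    if (it != sorted.end() && *it == value) {
        return false;  // duplicates are not allowed
    }
    // Shifts every element from 'it' onward by one place: O(n).
    sorted.insert(it, value);
    return true;
}
```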
Does the data have to be sorted completely all the time?
If not, and you only need quick access to the smallest or largest element, a binary heap gives constant access time to that element and O(log n) insertion and deletion time.
Moreover, it can satisfy your condition that the memory be contiguous, since you can implement a binary heap on top of an array (i.e., for the node stored at array[n], array[2n+1] is the left child and array[2n+2] is the right child).
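To make the array layout concrete, here is a small C++ sketch of a min-heap stored in a std::vector using the 2n+1 / 2n+2 child indexing. The class name MinHeap and its methods are invented for this example. In real code, std::priority_queue or std::push_heap / std::pop_heap would do the same job.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

class MinHeap {
public:
    // Append at the end, then sift up while smaller than the parent: O(log n).
    void push(int value) {
        data_.push_back(value);
        std::size_t i = data_.size() - 1;
        while (i > 0) {
            std::size_t parent = (i - 1) / 2;
            if (data_[parent] <= data_[i]) break;
            std::swap(data_[parent], data_[i]);
            i = parent;
        }
    }

    // Smallest element: O(1).
    int top() const {
        assert(!data_.empty());
        return data_.front();
    }

    // Move the last element to the root, then sift down: O(log n).
    void pop() {
        assert(!data_.empty());
        data_.front() = data_.back();
        data_.pop_back();
        std::size_t i = 0;
        for (;;) {
            std::size_t left = 2 * i + 1;
            std::size_t right = 2 * i + 2;
            std::size_t smallest = i;
            if (left < data_.size() && data_[left] < data_[smallest]) smallest = left;
            if (right < data_.size() && data_[right] < data_[smallest]) smallest = right;
            if (smallest == i) break;
            std::swap(data_[i], data_[smallest]);
            i = smallest;
        }
    }

    bool empty() const { return data_.empty(); }

private:
    std::vector<int> data_;  // children of index n live at 2n+1 and 2n+2
};
```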
A heap based implementation of a tree would be more efficient if you are inserting a lot of elements - log n for both locating/removing and inserting operations.