Dynamic array auto shrink in C

When designing a dynamic array that can shrink, what is a good method to keep track of the highest used index and, when the object held at that index is deleted, to find the new highest used index?
Right now I can only think of having a simple int last_index and charging the cost of maintaining this variable to func_delete, which must check whether it is deleting the highest element and, if so, scan each smaller index looking for a non-NULL entry. (The array will always be NULL-initialized.)
if (last_index == deleted_index) {
    // scan downward past NULL slots to find the new highest used index
    while (last_index > 0 && array[--last_index] == NULL)
        ;
    // if the array is now only half full, I realloc
}
Are there any other smart ways to do this?

It looks okay to me, but logic-wise, if the function func_delete always deletes the highest item, then there shouldn't be any NULL values at smaller indices. So this should do:
if (last_index == deleted_index && array[last_index] != NULL) {
    // delete last item
    free(array[last_index]);
    // set to NULL and decrement
    array[last_index--] = NULL;
}
Edit:
Based on your comments, I understand what you're trying to do. I think you could just keep track of the highest used index in the insertion function instead:
void array_insert(array *arr, element e, int index)
{
    if (index > arr->highest_index) {
        arr->highest_index = index;
    }
    // insert element
}
And when you delete, you check whether the index is the highest or not, like you're doing. I don't think there's a better way to do that without complicating things further. For example, you could keep another sorted list of indices; that way, when you delete the highest index you find the next one in constant time, but like I said, it complicates things. However, I think another data structure might be more useful: a linked list, for example, is more efficient when randomly deleting nodes, but not when randomly inserting nodes.
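For what it's worth, here is a minimal sketch of how the delete side could look, combining the downward scan with the "shrink when at most half full" idea from the question. The dyn_array type, the func_delete signature and the capacity field are assumptions for illustration, not taken from the original code:

#include <stdlib.h>

typedef struct {
    void **items;     // NULL-initialized slots
    int capacity;     // allocated slots
    int last_index;   // highest used index, -1 if empty
} dyn_array;

// Hypothetical delete: frees the item, then maintains last_index and
// shrinks the backing store once at most half of it is in use.
void func_delete(dyn_array *a, int deleted_index)
{
    free(a->items[deleted_index]);
    a->items[deleted_index] = NULL;

    if (deleted_index == a->last_index) {
        // scan downward past NULL slots for the new highest used index
        while (a->last_index > 0 && a->items[a->last_index] == NULL)
            a->last_index--;
        if (a->items[a->last_index] == NULL)
            a->last_index = -1;            // the array is now empty

        // shrink if at most half of the capacity is still in use
        if (a->capacity > 1 && a->last_index + 1 <= a->capacity / 2) {
            int new_cap = a->capacity / 2;
            void **p = realloc(a->items, new_cap * sizeof *p);
            if (p) {                       // on failure, keep the old block
                a->items = p;
                a->capacity = new_cap;
            }
        }
    }
}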

Related

Map in order range loop

I'm looking for a definitive way to range over a Go map in-order.
Golang spec states the following:
The iteration order over maps is not specified and is not guaranteed to be the same from one iteration to the next. If map entries that have not yet been reached are removed during iteration, the corresponding iteration values will not be produced. If map entries are created during iteration, that entry may be produced during the iteration or may be skipped. The choice may vary for each entry created and from one iteration to the next. If the map is nil, the number of iterations is 0.
All I've found here on StackOverflow and Googling are (imho) workarounds that I don't like.
Is there a solid way to iterate through a map and retrieve items in the order they've been inserted?
The solutions I've found are:
Keep track of keys and values in two separate slices: which sounds like "Do not use a map", losing all the advantages of using maps.
Use a map but keep track of keys in a different slice: this means data duplication which might lead to data misalignment and eventually may bring loads of bugs and painful debugging.
What do you suggest?
Edit in response to the possible duplicate flag.
There's a slight difference between my question and the one provided (this question, but also this one): both of those asked about looping through the map following the keys' lexicographic order. I, instead, have specifically asked:
Is there a solid way to iterate through a map and retrieve items in the order they've been inserted?
which is not lexicographic and thus different from #gramme.ninja question:
How can I get the keys to be in order / sort the map so that the keys are in order and the values correspond?
If you need a map and keys in order, those are 2 different things, you need 2 different (data) types to provide that functionality.
With a keys slice
The easiest way to achieve this is to maintain key order in a different slice. Whenever you put a new pair into the map, first check if the key is already in it. If not, add the new key to the separate slice. When you need elements in order, you may use the keys slice. Of course, when you remove a pair, you also have to remove the key from the slice.
The keys slice only has to contain the keys (and not the values), so the overhead is little.
Wrap this new functionality (map+keys slice) into a new type and provide methods for it, and hide the map and slice. Then data misalignment cannot occur.
Example implementation:
type Key int   // Key type
type Value int // Value type

type Map struct {
    m    map[Key]Value
    keys []Key
}

func New() *Map {
    return &Map{m: make(map[Key]Value)}
}

func (m *Map) Set(k Key, v Value) {
    if _, ok := m.m[k]; !ok {
        m.keys = append(m.keys, k)
    }
    m.m[k] = v
}

func (m *Map) Range() {
    for _, k := range m.keys {
        fmt.Println(m.m[k])
    }
}
Using it:
m := New()
m.Set(1, 11)
m.Set(2, 22)
m.Range()
Try it on the Go Playground.
With a value-wrapper implementing a linked-list
Another approach would be to wrap the values and, along with the real value, also store the next/previous key.
For example, assuming you want a map like map[Key]Value:
type valueWrapper struct {
    value Value
    next  *Key // Next key
}
Whenever you add a pair to the map, you set a valueWrapper as the value, and you have to link it to the previous (last) pair. To link, you set the next field of the last wrapper to point to this new key. To implement this easily, it's recommended to also store the last key (to avoid having to search for it).
When you want to iterate over the elements in insertion order, you start from the first (you have to store this), and its associated valueWrapper will tell you the next key (in insertion order).
Example implementation:
type Key int   // Key type
type Value int // Value type

type valueWrapper struct {
    v    Value
    next *Key // key of the next element, in insertion order
}

type Map struct {
    m           map[Key]valueWrapper
    first, last *Key
}

func New() *Map {
    return &Map{m: make(map[Key]valueWrapper)}
}

func (m *Map) Set(k Key, v Value) {
    if w, ok := m.m[k]; ok {
        // Key already present: just update its value, keep its position.
        w.v = v
        m.m[k] = w
        return
    }
    if m.last != nil {
        w2 := m.m[*m.last]
        m.m[*m.last] = valueWrapper{w2.v, &k}
    }
    m.m[k] = valueWrapper{v: v}
    if m.first == nil {
        m.first = &k
    }
    m.last = &k
}

func (m *Map) Range() {
    for k := m.first; k != nil; {
        w := m.m[*k]
        fmt.Println(w.v)
        k = w.next
    }
}
Using it is the same. Try it on the Go Playground.
Notes: You may vary a couple of things to your liking:
You may declare the internal map like m map[Key]*valueWrapper and so in Set() you can change the next field without having to assign a new valueWrapper.
You may choose first and last fields to be of type *valueWrapper
You may choose next to be of type *valueWrapper
Comparison
The approach with an additional slice is easier and cleaner. But removing an element from it may become slow if the map grows big, as we also have to find the key in the slice, which is "unsorted", so that is O(n) complexity.
The approach with a linked list in the value-wrapper can easily be extended to support fast element removal even if the map is big, if you also add a prev field to the valueWrapper struct. So if you need to remove an element, you can find its wrapper very fast (O(1)), update the prev and next wrappers (to point to each other), and perform a simple delete() operation; the whole removal is O(1).
Note that deletion in the first solution (with slice) could still be sped up by using 1 additional map, which would map from key to index of the key in the slice (map[Key]int), so delete operation could still be implemented in O(1), in exchange for greater complexity. Another option for speeding up could be to change the value in the map to be a wrapper, which could hold the actual value and the index of the key in the slice.
See related question: Why can't Go iterate maps in insertion order?

Limit input data to achieve a better Big O complexity

You are given an unsorted array of n integers, and you would like to find if there are any duplicates in the array (i.e. any integer appearing more than once).
Describe an algorithm (implemented with two nested loops) to do this.
The question that I am stuck on is:
How can you limit the input data to achieve a better Big O complexity? Describe an algorithm for handling this limited data to find if there are any duplicates. What is the Big O complexity?
Your help will be greatly appreciated. This is not related to my coursework, assignments and such; it's from a previous year's exam paper and I am doing some self-study, but I seem to be stuck on this question. The only possible solution that I could come up with is:
If we limit the data and use nested loops to find whether there are duplicates, the complexity would be O(n), simply because the time the operations take is proportional to the data size.
If my answer makes no sense, then please ignore it and, if you can, please suggest possible solutions or workings for this answer.
If someone could help me solve this, I would be grateful, as I have attempted countless possible solutions, none of which seems to be the correct one.
Edited part, again. Another possible solution (if effective!):
We could implement a loop that sorts the array (from lowest integer to highest), so that the duplicates end up right next to each other, making them easier and faster to identify.
The big O complexity would still be O(n^2).
Since this is a linear pass, the first loop would simply iterate n-1 times, reading the value at the current index of the array (in the first iteration it could be, for instance, index 1) and storing it in a variable named 'current'. The loop updates 'current' by +1 each iteration; within it, we write another loop that compares the current number to the next number, and if they are equal we can report it with a printf statement; otherwise we go back to the outer loop to advance 'current' by +1 (the next value in the array) and update the 'next' variable to hold the value that comes after the one in 'current'.
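A minimal C sketch of that sort-then-scan idea is below (it uses the standard library's qsort, which makes this particular version O(n log n); with a simple hand-written O(n^2) sort, as described above, the overall bound would indeed stay O(n^2)):

#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

// Sort first; afterwards any duplicates are adjacent.
int has_duplicates(int *arr, size_t n)
{
    qsort(arr, n, sizeof arr[0], cmp_int);
    for (size_t i = 1; i < n; i++)
        if (arr[i] == arr[i - 1])
            return 1;
    return 0;
}

int main(void)
{
    int a[] = { 3, 1, 4, 1, 5 };
    printf("%d\n", has_duplicates(a, sizeof a / sizeof a[0])); // prints 1
    return 0;
}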
You can do it linearly (O(n)) for any input if you use hash tables (which have constant look-up time).
However, this is not what you are being asked about.
By limiting the possible values in the array, you can achieve linear performance.
E.g., if your integers have range 1..L, you can allocate a bit array of length L, initialize it to 0, and iterate over your input array, checking and flipping the appropriate bit for each input.
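A minimal sketch of that bit-array idea, assuming values in the range 1..L (the function name and the error handling are just illustrative):

#include <stdbool.h>
#include <stdlib.h>
#include <limits.h>

// Returns true if arr (values in 1..L) contains a duplicate. O(n) time, O(L) bits of space.
bool has_duplicate(const int *arr, size_t n, int L)
{
    unsigned char *bits = calloc(L / CHAR_BIT + 1, 1); // all bits start at 0
    if (!bits)
        return false; // allocation failed; a real implementation should report this

    bool dup = false;
    for (size_t i = 0; i < n && !dup; i++) {
        int v = arr[i] - 1;                              // map 1..L to 0..L-1
        if (bits[v / CHAR_BIT] & (1u << (v % CHAR_BIT)))
            dup = true;                                  // bit already set: value seen before
        else
            bits[v / CHAR_BIT] |= 1u << (v % CHAR_BIT);
    }
    free(bits);
    return dup;
}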
A variant of Bucket Sort will do. This will give you a complexity of O(n), where 'n' is the number of input elements.
But there is one restriction: the max value. You should know the max value your integer array can take. Let's call it m.
The idea is to create a bool array of size m+1 (all initialized to false). Then iterate over your array. As you visit an element, set bucket[element] to true. If it is already true, then you've encountered a duplicate.
Some Java code:
// alternatively, you can iterate over the array to find the maxVal, which again is O(n).
public boolean findDup(int[] arr, int maxVal)
{
    // Java by default assigns false to all the values.
    // size maxVal + 1 so that maxVal itself is a valid index
    boolean bucket[] = new boolean[maxVal + 1];
    for (int elem : arr)
    {
        if (bucket[elem])
        {
            return true; // a duplicate found
        }
        bucket[elem] = true;
    }
    return false;
}
But the constraint here is the space. You need O(maxVal) space.
Nested loops get you O(N*M) or O(N*log(M)); for O(N) you cannot use nested loops!
I would do it by using a histogram instead:
DWORD in[N] = { ... };   // input data ... values are from the range <0, M)
DWORD his[M] = { ... };  // histogram of in[]
int i, j;

// compute histogram O(N)
for (i = 0; i < M; i++) his[i] = 0;    // this can be done also by memset ...
for (i = 0; i < N; i++) his[in[i]]++;  // if the range of values is not from 0 then shift it ...

// remove duplicates O(N)
for (i = 0, j = 0; i < N; i++)
{
    his[in[i]]--;                 // count down duplicates
    in[j] = in[i];                // copy item
    if (his[in[i]] <= 0) j++;     // if not duplicate then do not delete it
}
// now j holds the new in[] array size
[Notes]
If the value range is too big, with sparse areas, then you need to convert his[] into a dynamic list with two values per item: one is the value from in[] and the second is its occurrence count. But then you need a nested loop -> O(N*M), or with binary search -> O(N*log(M)).
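A rough sketch of that sparse variant, as a sorted dynamic list of value/count pairs searched with binary search (all names are assumptions for illustration, and error handling is omitted):

#include <stdlib.h>

typedef struct { unsigned value; unsigned count; } entry;

typedef struct {
    entry *items;
    size_t size, capacity;
} sparse_hist;

// Binary search for value; returns its index, or the insertion point with *found = 0.
static size_t find_slot(const sparse_hist *h, unsigned value, int *found)
{
    size_t lo = 0, hi = h->size;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (h->items[mid].value < value)
            lo = mid + 1;
        else
            hi = mid;
    }
    *found = (lo < h->size && h->items[lo].value == value);
    return lo;
}

// Count one occurrence of value; a return value > 1 means a duplicate.
// The lookup is O(log M); the insertion shift below is the O(M) part.
unsigned hist_add(sparse_hist *h, unsigned value)
{
    int found;
    size_t pos = find_slot(h, value, &found);

    if (found)
        return ++h->items[pos].count;

    if (h->size == h->capacity) {
        h->capacity = h->capacity ? h->capacity * 2 : 16;
        h->items = realloc(h->items, h->capacity * sizeof *h->items);
    }
    // shift the tail up by one slot to keep the list sorted
    for (size_t i = h->size; i > pos; i--)
        h->items[i] = h->items[i - 1];
    h->items[pos] = (entry){ value, 1 };
    h->size++;
    return 1;
}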

Efficiency of an unsorted vs sorted linked list in C

For a programming project, I created two linked list programs: an unsorted linked list and a sorted linked list. The unsorted linked list program adds values to the end of the list as long as the value is not found in the list. If the value is found in the list, the node containing the value is removed. The only difference in the sorted linked list program is that if a value is not found in the list, instead of just adding the value to the end, the program looks for the proper place to insert the value so that the repository is consistently maintained in sorted order. I have a "stepcounter" variable that basically increments each time a pointer in my program is reassigned to point to a different pointer, even during traversal of the linked list. I output this variable to the screen to give me an idea of the efficiency of my program. What's strange is that if I run the same operation on the sorted list and on the unsorted list, the number of steps, or effort, of the unsorted list is MORE than that of the sorted list. This seems very counter-intuitive to me, but I looked through my code and I'm pretty sure I incremented in all the same places, so I can't come up with an explanation as to why the unsorted linked list operations would have more steps than the sorted ones. Is there something I'm missing?
If you are really keeping track of pointer assignments, then walking like
while (p && (p->value != input) && (p->next != NULL)) p = updatePointer(p->next);
(assuming updatePointer takes care of your counting) performs one of those for each node you examine.
To know if an item is in the unsorted list you have to look at every node in the list. (That is, you have to use the code I had above.)
To do the same thing on the sorted list you only have to keep looking until you pass the place where the item in question would have been. This implies code like
while (p && (p->value < input) && (p->next != NULL)) {
    p = updatePointer(p->next);
}
if (p->value == input) // ...
Assuming randomly distributed (i.e. unordered) input, you expect the second case to require about 1/2 as many comparisons.
Suppose you have 1000 data items to insert into both lists, in purely random order, with values from 1 up to 1000. Additionally, suppose both lists are already filled with 500 data items: in purely random order for the unsorted list, and in sorted order for the sorted list.
For the unsorted list you have to check every item to find possible doubles, which leads to a pointer stepping forward for each visited node. For the sorted list you only have to search forward this way until the first element with a greater value appears in the list.
The chance of such a hit is 50%, with 1000 elements being inserted into a list already filled with 500 items over a total value range of 1 to 1000. This means about 50% of all operations are inserts rather than removals, which leaves the unsorted list checking additional items compared to the sorted list.
Insertion itself is cheaper with the unsorted list (1 step instead of 4 steps).
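To make the step counts concrete, here is a rough sketch of the two insert paths, using the question's convention of counting pointer (re)assignments. The node type, the step counter and the exact counts are illustrative assumptions, not the original program:

#include <stddef.h>

typedef struct node {
    int value;
    struct node *next;
} node;

static long steps = 0;   // bumped on every pointer reassignment, as in the question

// Unsorted list: append at the tail; a single relink of the existing list.
void insert_unsorted(node **tail_next, node *n)
{
    n->next = NULL;
    *tail_next = n;       // 1 counted step
    steps++;
}

// Sorted list: walk to the insertion point, then relink predecessor and successor.
void insert_sorted(node **head, node *n)
{
    node **pp = head;
    while (*pp && (*pp)->value < n->value) {
        pp = &(*pp)->next;    // each hop is a counted step
        steps++;
    }
    n->next = *pp;            // counted step
    *pp = n;                  // counted step
    steps += 2;
}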

Deleting rows from a structure

My project was to create a program with a menu that extracts data from a file and arranges it in a structure. Via the menu you are able to insert rows to the end of the structure or delete rows from the structure. The program then copies the structure and saves it to the same file.
So far I have everything working fine except for deleting rows.
Right now my code will just delete the last row of the structure no matter what.
else if (input == 3)
{
    int delete;
    printf("Enter row number to be deleted:\n");
    scanf("%i", &delete);
    if (delete > i)
    {
        printf("ERROR: Please enter a valid integer\n");
    }
    strcpy(data[delete].First, data[delete+1].First);
    strcpy(data[delete].Last, data[delete+1].Last);
    data[delete].Gender = data[delete+1].Gender;
    data[delete].Age = data[delete+1].Age;
    data[delete].Weight = data[delete+1].Weight;
    data[delete].Height = data[delete+1].Height;
    i = i - 1;
}
The variable i is a counter that keeps track of the number of rows in the structure. The code seems to me like it should replace the data at whatever row the input delete specifies with the data of the row above it in the structure; however, it is not working.
Am I going about this the wrong way?
The variable i is a counter that keeps track of the number of rows …
Since you delete data[delete], the rows are apparently numbered from 0 to i-1. The if error condition delete > i should then rather be delete >= i.
if order isn't important, swap the last element with the one being removed (unless you're removing the last element, obviously), then just decrement i by one. No loop required. – WhozCraig
That's right, of course, albeit there is no point in the part of the swap which copies the element being removed to the position of the last element.
If the order is to be preserved, we can also do without an explicit loop:
memmove(data+delete, data+delete+1, (--i-delete) * sizeof *data);
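Putting the pieces together, the whole delete branch could then look roughly like this (a sketch based on the suggestions above; the surrounding variables are as in the question, and memmove needs <string.h>):

else if (input == 3)
{
    int delete;
    printf("Enter row number to be deleted:\n");
    scanf("%i", &delete);
    if (delete < 0 || delete >= i)
    {
        printf("ERROR: Please enter a valid integer\n");
    }
    else
    {
        // shift every row after 'delete' down by one and drop the last row
        memmove(data + delete, data + delete + 1, (--i - delete) * sizeof *data);
    }
}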

Data structure for playing notes in MIDI synthesizer

I'm working on a hardware virtual analog synthesizer using C, and I'm trying to come up with an efficient data structure to manage the dynamic assignment of synthesizer voices in response to incoming MIDI messages.
I have a structure type which holds the data for a single synthesizer voice (pitch, low frequency oscillator, ADSR settings, etc.) and I have a "NoteOn" callback function which is executed when a MIDI "Note On" message is decoded. This function needs to take an "idle" voice from an "idle" pool, modify some settings in the structure, and assign it to a "playing" pool that the main synth engine works with to generate audio samples. Then, when a "Note Off" message is received, the voice with a note value corresponding to the one in the "Note Off" message needs to be selected from the "playing" pool, have its data structure modified again, and eventually returned to the "idle" pool (depending on envelope/ADSR settings.)
I tried an implementation using linked lists for both pools, but my implementation seemed rather cumbersome and slow. I'd like this process to be as quick as possible, to maintain playing responsiveness. Any suggestions?
If a linked list is too slow, the usual answer is to implement a hash table. There are many, many possible variations of the data structure and algorithm. I'll just describe open, "single"-hashing, because that's the variation I'm most familiar with.
So with an open hash table, the table is just an array ("closed" hashing has an array, too, but each element is a linked list). We want the array to be, at most, about half-full for performance reasons. And at maximum-capacity, the filled table will actually have one empty slot still, because this simplifies the algorithm.
We also need a hash function which accepts the type of the key values, and returns integers. It's very difficult to predict how the hash function will behave with respect to clustered keys and overall performance. So just make sure it's an isolated function that can easily be changed later. It can be as simple as shifting-around all the bytes and adding them together.
int hash(char *key, int key_length, int table_size)
{
    int ret, i;
    for (i = 0, ret = 0; i < key_length; i++)
    {
        ret += key[i] << i;
    }
    return abs(ret) % table_size;
}
The table-lookup function uses the hash function to decide where to start looking in the array. If the key isn't found there (determined by doing a memcmp() on the actual search key and the key stored at that position in the table), it looks at each successive key, wrapping from the end of the array back to the beginning, and declares failure if it finds an empty table element.
// assumes the key is stored at the start of each key_value_pair,
// which is what the memcmp comparison relies on
#define RETURN_TABLE_I_IF_EQUAL_KEY_OR_EMPTY \
    if (memcmp(table + i, &key, sizeof key) == 0 || *(key_type *)(table + i) == 0) \
        return table + i;

key_value_pair *hash_lookup(key_value_pair *table, int table_size, key_type key)
{
    int h, i;

    h = hash((char *)&key, sizeof key, table_size);
    i = h;
    RETURN_TABLE_I_IF_EQUAL_KEY_OR_EMPTY
    for ( ; i < table_size; i++)
        RETURN_TABLE_I_IF_EQUAL_KEY_OR_EMPTY
    for (i = 0; i < h; i++)
        RETURN_TABLE_I_IF_EQUAL_KEY_OR_EMPTY
    return NULL;
}
We'll need one more function in front of this to handle a few quirks. It can return a NULL pointer, which indicates that not only has the key not been found, but the table itself is overfull. An overfull table really means "completely full", but we decided earlier that a "full" table should still have one empty element. This means that both for loops should not run to completion; when the lookup finds an empty table position, that's a failure. With an overfull table, it has to scan the entire table before discovering that the key is not present, thus losing much of the performance advantage of using a hash at all.
The lookup function can also return a valid pointer to an empty slot. This is also a failure to find the value, but not an error. If you're adding the key/value pair for the first time, this will be the slot to store it.
Or it could return a pointer to the desired table element. And this will be faster than a linear search, be it an array or linked list.
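For example, an insert built on top of hash_lookup might look roughly like this. It assumes key_value_pair has the key as its first member (which the memcmp in the lookup already relies on) plus a value pointer, and that the caller tracks how many slots are used; all of those are assumptions, not part of the original code:

// Hypothetical layout: key first (so the lookup's memcmp works), then the payload.
typedef struct {
    key_type key;    // 0 means "empty slot"
    void    *value;  // e.g. a pointer to the synth voice structure
} key_value_pair;

// Returns 0 on success, -1 if the table would become overfull.
int hash_insert(key_value_pair *table, int table_size, int *used,
                key_type key, void *value)
{
    key_value_pair *slot;

    if (*used >= table_size - 1)   // always keep one empty slot, as described above
        return -1;

    slot = hash_lookup(table, table_size, key);
    if (slot == NULL)              // can't happen while a free slot remains
        return -1;

    if (slot->key == 0)            // empty slot: a brand-new key
        (*used)++;
    slot->key = key;
    slot->value = value;
    return 0;
}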
Deleting a key from the table requires us to fill-in the vacated position in the sequence. There are a couple of options.
If you're not worried about the table running out of space (it's set really large, and the lifetime and usage can be controlled), you can overwrite the entry with a deleted special key, distinct from an empty key.
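A minimal sketch of that first, tombstone-style option (EMPTY_KEY and DELETED_KEY are assumed reserved values, and it reuses the hypothetical key/value layout from the sketch above; the lookup would also have to treat DELETED_KEY as "occupied but not a match" rather than as an empty slot):

// Reserved key values; both are assumptions for this sketch.
#define EMPTY_KEY   ((key_type)0)
#define DELETED_KEY ((key_type)-1)

// Tombstone delete: mark the slot as deleted instead of emptying it,
// so later lookups keep probing past it instead of stopping early.
int hash_delete(key_value_pair *table, int table_size, key_type key)
{
    key_value_pair *slot = hash_lookup(table, table_size, key);

    if (slot == NULL || slot->key != key)
        return -1;                 // key not present
    slot->key = DELETED_KEY;
    return 0;
}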
Or, if you want to reclaim the space, too, you'll need to look up the key, and then scan the rest of the "chain" (the sequence of keys up to the next empty slot, including wrap-around) and move the last key with a matching hash into the key-to-delete's position. Then write over this moved key/value's position with the empty key. .... oops! This process must then be repeated for that last matching key, until we're actually clearing the very last key in the chain. (I need to go fix this in my implementation right now!....)
