I'm currently trying to find the best data structure / algorithm for my case:
I receive single unique random ids (uint32_t), and I create an element associated to each new id.
I need to retrieve elements from the id.
I also need to access the next and the previous element from any element (or even id) in the order of creation. The order of creation mainly depends on the current element, which is always accessible aside, so the new element should be its next.
Here is an example:
(12) <-> (5) <-> (8) <-> (1)
 ^                        ^
 '------------------------'
If I suppose the current element to be (8) and a new element (3) is created, it should look like:
(12) <-> (5) <-> (8) <-> (3) <-> (1)
 ^                                ^
 '--------------------------------'
An important thing to consider is that insertion, deletion and search happen with almost the same (high) frequency. Not completely sure about how many elements will live at the same time, but I would say max ~1000.
Knowing all of this, I think about using an AVL with ids as the sorted keys, keeping the previous and the next element too.
In C language, something like this:
struct element {
    uint32_t id;
    /* some other fields */
    struct element *prev;
    struct element *next;
};
struct node {
    struct element *elt;
    struct node *left;
    struct node *right;
};
static struct element* current;
Another idea may be to use a hash map, but then I would need to find the right hash function. Not completely sure it always beats the AVL in practice for this amount of elements though. It depends on the hash function anyway.
Is the AVL a good idea or should I consider something else for this case?
Thanks !
PS: I'm not a student trying to make you do my homework, I'm just trying to develop a simple window manager (just for fun).
You are looking for some variation of what is called a LinkedHashMap in Java.
This is basically a combination of a hash-table and a (bi-directional) linked list.
The linked-list has elements in the desired order. Inserting an element in a known location (assuming you have the pointer to the correct location) is done in O(1). Same goes for deletion. The linked list contains all the elements in their desired order.
The second data structure is the hash map (or tree map). This data structure maps from a key (your unique id) to a POINTER into the linked list. This way, given an id, you can quickly find its location in the linked list, and from there you can easily access the next and previous elements.
High-level pseudocode for insertion:
insert(x, v, y): //insert key=x value=v, after element with key=y
    if x is in hash-table:
        abort
    p = find(hash-table,y) //p is a pointer to y's list node
    q = insert_to_list_after(x,v,p) //insert key=x,value=v right after p; q points to the new node
    add(hash-table,x,q) //add x to the hash-table, and make it point to its own node q
high level pseudo code for search:
search(x):
if x is not in hash-table:
abort
p = find(hash-table,x)
return p->value;
deletion should be very similar to insertion (and in same time complexity).
Note that it is also fairly easy to find element that is after x:
p = find(hash-table,x)
if (p != NULL && p->next != NULL):
return p->next->value
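The pseudocode above could be sketched in C roughly as follows. This is only a minimal sketch under some assumptions: the names (`NBUCKETS`, `bucket_of`, `find`, `insert_after`, `remove_element`) and the multiplicative hash are illustrative choices, the table is a fixed-size array with chaining sized for about 1000 elements, and the creation-order list is NULL-terminated rather than circular as in the question's diagram.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 2048u   /* assumption: 2^11 buckets, about 2x the ~1000 expected elements */

struct element {
    uint32_t id;
    struct element *prev, *next;   /* neighbours in creation order */
    struct element *chain;         /* next element in the same hash bucket */
};

static struct element *buckets[NBUCKETS];   /* static storage: all NULL at start */

/* Multiplicative hash (illustrative choice): keep the top 11 bits, giving 0..2047. */
static unsigned bucket_of(uint32_t id)
{
    uint32_t h = id * UINT32_C(2654435769);
    return h >> 21;
}

/* Returns the element with this id, or NULL if there is none. */
struct element *find(uint32_t id)
{
    struct element *e = buckets[bucket_of(id)];
    while (e != NULL && e->id != id)
        e = e->chain;
    return e;
}

/* Creates element `id` right after `pos` (pos == NULL for the very first element).
   Returns NULL if the id already exists or allocation fails. */
struct element *insert_after(struct element *pos, uint32_t id)
{
    if (find(id) != NULL)
        return NULL;
    struct element *e = calloc(1, sizeof *e);
    if (e == NULL)
        return NULL;
    e->id = id;
    if (pos != NULL) {             /* splice into the doubly linked list */
        e->prev = pos;
        e->next = pos->next;
        if (pos->next != NULL)
            pos->next->prev = e;
        pos->next = e;
    }
    unsigned b = bucket_of(id);    /* the table entry points at the new node itself */
    e->chain = buckets[b];
    buckets[b] = e;
    return e;
}

/* Unlinks the element from both structures and frees it. */
void remove_element(struct element *e)
{
    assert(e != NULL);
    if (e->prev != NULL) e->prev->next = e->next;
    if (e->next != NULL) e->next->prev = e->prev;
    struct element **link = &buckets[bucket_of(e->id)];
    while (*link != e)
        link = &(*link)->chain;
    *link = e->chain;
    free(e);
}
```

With the question's example, calling `insert_after(find(8), 3)` would place (3) between (8) and (1). Storing the bucket chain pointer inside the element itself avoids a second allocation per id.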
My suggestion is that you use a combination of two data structures - a list to store the elements in the order they are inserted and a hash map or binary search tree to implement an associative array(map) between the id and list node. You will perform the search using the associative array and will be able to access neighboring elements using the list. Deletion is also relatively easy, but you need to delete from both structures.
The complexity of find/insert/delete will be O(log n) if you use a binary search tree, and the expected complexity is constant if you use a hash table.
You should definitely consider the Skip List data structure.
It seems perfect for your case, because it has an expected O(log(n)) insert / search / delete and if you have a pointer to a node, you can find the previous and the next element in O(1) by just moving that pointer.
The conclusion is that if you've just created a node, you have a pointer to it, and you can find the prev/next element in O(1) time.
I'm trying to create a program that reads a file filled with dictionary words and stores every word in a hash table. I already have a hash function. If, for example, the hash function returns index 123, how can I determine whether that index has no value yet? And if that index already has a value, should I make the word the new head of the list or add it to the end of the list? Should I first initialize the whole array to something like NULL, because an uninitialized variable contains a garbage value? Does that work the same way for arrays of a struct type?
typedef struct node
{
char word[LENGTH + 1];
struct node *next;
}
node;
// Number of buckets in hash table
// N = 2 ^ 13
const unsigned int N = 8192;
// Hash table
node *table[N];
This is part of my code. LENGTH is defined above with the value 45.
how will I be able to determine if that index right there has no value yet
The "slots" in your table are linked lists. The table stores pointers to the head nodes of these linked lists. If that pointer is NULL, the list is empty, but you don't need to make it a special case: When you look up a word, just walk the list while the pointer to the next node is not null. If the pointer to the head node is null, your walk is stopped short early, that's all.
should I just make the word the new head of the list or should I add it to the end of the list?
It shouldn't really matter. The individual lists at the nodes are supposed to be short. The idea of the hash table is to turn a linear search on all W words into a faster linear search on W/N words on average. If you see that your table has only a few long lists, your hash function isn't good.
You must walk the list once to ensure that you don't insert duplicates anyway, so you can insert at the end. Or you could try to keep each linked list alphabetically sorted. Pick one method and stick with it.
Should I initialize the whole array first to something like "NULL" because if a variable wasn't initialized it contains garbage value, does that work the same with arrays from a struct.
Yes, please initialize your array of head node pointers to NULL, so that the hash table is in a defined state. (If your array is at file scope or static, the table should be initialized to null pointers already, but it doesn't hurt to make the initialization explicit.)
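To make the points above concrete, here is a minimal sketch of lookup and insertion with chaining. It reuses the question's `node`, `table`, `N` and `LENGTH`, but `N` is a macro here so the array size is a constant expression. The `hash` function is only a placeholder (djb2) standing in for the asker's own, and `contains` and `insert` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define LENGTH 45
#define N 8192             /* number of buckets, 2^13 */

typedef struct node {
    char word[LENGTH + 1];
    struct node *next;
} node;

static node *table[N];     /* static storage: every head pointer starts out NULL */

/* Placeholder hash (djb2); an assumption, not the asker's function. */
static unsigned int hash(const char *word)
{
    assert(word != NULL);
    unsigned long h = 5381;
    while (*word != '\0')
        h = h * 33 + (unsigned char)*word++;
    return (unsigned int)(h % N);
}

/* Walks the bucket's list; an empty bucket (NULL head) simply ends the walk at once. */
bool contains(const char *word)
{
    for (node *n = table[hash(word)]; n != NULL; n = n->next)
        if (strcmp(n->word, word) == 0)
            return true;
    return false;
}

/* Returns false if the word is too long, already stored, or allocation fails. */
bool insert(const char *word)
{
    if (strlen(word) > LENGTH || contains(word))
        return false;
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return false;
    strcpy(n->word, word);
    unsigned int i = hash(word);
    n->next = table[i];    /* also correct for an empty bucket: next becomes NULL */
    table[i] = n;          /* the new node becomes the head of the list */
    return true;
}
```

Inserting at the head is chosen here only because it needs no second walk once the duplicate check is done; inserting at the end would work just as well.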
So I have the following structure:
typedef struct listElement
{
    element value;
    struct listElement *next;
} listElement, *List;
element is not a known type, meaning I don't know exactly what data type I'm dealing with, whether they're integers, floats or strings.
The goal is to make a function that deletes the listElements that are repeated more than twice (meaning a value can only appear 0 times, once or twice, not more).
I've already made a function that uses brute force, with a nested loop, but that's a cluster**** as I'm dealing with a large number of elements in my list. (Going through every element and comparing it to the rest of the elements in the list)
I was wondering if there was a better solution that uses fewer instructions and has a lower complexity.
You can use a hash table and map elements to their count.
if hashTable[element] (count for this particular element) returns 2, then delete the current element.
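A minimal sketch of that idea, under the loud assumption that `element` is `int` (the real type is unknown, so the hash and the comparison would have to be adapted). The counting table with linear probing, and the names `slot`, `lookup` and `limit_to_two`, are illustrative and not from the question.

```c
#include <assert.h>
#include <stdlib.h>

typedef int element;               /* assumption: the real element type is unknown */

typedef struct listElement {
    element value;
    struct listElement *next;
} listElement, *List;

struct slot { element key; int count; };   /* count == 0 marks an empty slot */

/* Finds the slot for `key`, claiming an empty one if needed (linear probing). */
static struct slot *lookup(struct slot *slots, size_t cap, element key)
{
    size_t i = ((size_t)(unsigned)key * 2654435761u) % cap;
    while (slots[i].count != 0 && slots[i].key != key)
        i = (i + 1) % cap;
    slots[i].key = key;
    return &slots[i];
}

/* Keeps at most two nodes per value, in one pass over the list.
   Returns the number of nodes removed, or -1 if allocation fails. */
int limit_to_two(List *head)
{
    assert(head != NULL);
    size_t n = 0;
    for (listElement *e = *head; e != NULL; e = e->next)
        n++;
    size_t cap = 2 * n + 1;        /* more slots than values, so probing terminates */
    struct slot *slots = calloc(cap, sizeof *slots);
    if (slots == NULL)
        return -1;
    int removed = 0;
    listElement **link = head;     /* points at the pointer that leads to the current node */
    while (*link != NULL) {
        struct slot *s = lookup(slots, cap, (*link)->value);
        if (s->count == 2) {       /* third or later occurrence: unlink and free */
            listElement *dead = *link;
            *link = dead->next;
            free(dead);
            removed++;
        } else {
            s->count++;
            link = &(*link)->next;
        }
    }
    free(slots);
    return removed;
}
```

This makes the expected running time linear in the list length instead of quadratic, at the cost of one temporary array.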
To understand the data structure, assume a linked list of nodes where each node has an array called bucket which can store some strings.
struct NODE
{
    char *bucket[BUCKET_SIZE]; // an array of strings
    int count; // number of items in the array
    struct NODE *next;
};
and to insert in this list the prototype of function is:
insert( List *list, char *new_string );
I am trying to understand what this means. I have a bucket (an array) of some size (e.g.: 20) which is inside a struct (node) of a linked list.
"As you add values to the list, they can be inserted into an existing bucket if there is room, using a regular ordered array insertion (shuffling items down)."
These are two ways which I think would work. Please let me know which one to implement.
bucket[8]
Type a:
1 insert "o".
bucket[0]="o";
2 insert "one"
bucket[0]="o"
bucket[1]="one"
3 insert "two"
bucket[0]= "o"
bucket[1]="one"
bucket[2]="two"
bucket[8]
Type b:
1 insert "o"
bucket[0]="o";
2 insert "one"
bucket[0]="one"
bucket[1]="o"
3 insert "two"
bucket[0]= "two"
bucket[1]="one"
bucket[2]="o"
Or am I getting it completely wrong, and it is trying to tell me something else?
This really depends on how you interpret "regular ordered array insertion" and "shuffling items down".
By "ordered array insertion", does it mean "sorted array insertion"? If it does, neither of your suggestions would work. Rather, you'd want an insertion that inserts the value correctly within the array at its sorted position. If it does not, then either of your suggestions would work (but first, let's consider the next phrase).
By "shuffling items down", does it mean "keep pushing items down"? If it does, you'd want Type b. If it doesn't, you'd want Type a.
Since this doesn't seem like a standard library requirement or anything, you probably would have better luck asking the source of your requirements.
Also, as moeCake pointed out, you'll have to handle the case when the bucket is already full (count == BUCKET_SIZE) as well.
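If "ordered" does mean "sorted", an insertion into a single bucket might look like the sketch below. This is one reading of the requirement, not the assignment's reference solution: `bucket_insert` is a hypothetical helper that a full `insert( List *list, char *new_string )` would call, and `BUCKET_SIZE` is set to 8 only to match the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BUCKET_SIZE 8

struct NODE {
    char *bucket[BUCKET_SIZE];   /* strings, kept in ascending strcmp order */
    int count;                   /* number of slots in use */
    struct NODE *next;
};

/* Inserts new_string into one bucket, keeping the bucket sorted.
   Returns false when the bucket is full, so the caller must use another node. */
bool bucket_insert(struct NODE *node, char *new_string)
{
    assert(node != NULL && new_string != NULL);
    if (node->count == BUCKET_SIZE)
        return false;
    int i = node->count;
    /* Shuffle larger items one slot down (towards the end) to open a gap. */
    while (i > 0 && strcmp(node->bucket[i - 1], new_string) > 0) {
        node->bucket[i] = node->bucket[i - 1];
        i--;
    }
    node->bucket[i] = new_string;   /* stores the pointer, does not copy the string */
    node->count++;
    return true;
}
```

Under this reading, inserting "one", "two" and then "o" leaves the bucket holding "o", "one", "two", regardless of the order of insertion.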
So I'm still trying to wrap my head around linked lists in C. They are.. mind-boggling to me right now because I have yet to fully understand pointers, let alone pointers to pointers, and dynamic memory allocation that linked lists require.
I'm trying to create a two dimensional array with independent height, and width values. At most they would be 30x30. I have a two dimensional array let's call it arr[x][y]. arr[x][y] is filled with values of integers ranging from -2 to 1, how would I transfer this two dimensional array into a linked list? How would I then access values from this linked list on whim? I'm very confused, and any help would be appreciated. I'm looking through tutorials as we speak.
Additionally this is supposed to be a sort of stack linked list where I could call functions such as push(pushes a new value to the top of the linked list), pop(pops a value from the top of the linked list), top(returns the value most recently pushed onto the stack), isEmpty(checks if the stack is empty).
I don't need any full code, but code would be helpful here. I just need an understanding though of Linked Lists, and how to implement these sort of functions.
Additionally here is the assignment that this is related to: Assignment
It's a maze solver. I've already written code that converts an ASCII picture into integer values for the two-dimensional array. As stated above, the linked list is what I need help with.
Hint : from your assignment, the stack is not supposed to fully represent the array, but to represent a path you dynamically build to find a way from the starting position of the maze to the target position of the maze.
Basically you need to create a linked list in which each node is the head of another list contained as a member (which conceptually grows downwards), along with the usual next pointer in the list.
To access an element as in a 2D array, such as arr[3][4], you need to walk the first list while keeping a count of y and then move downward counting x, or you could do it the other way round.
This is a common data structure assignment which goes by the name "multi stack or multi queue" which if implemented by lists gives what you are looking for.
struct Node
{
int data;
struct Node *next;
struct Node *head; // This head can be null initially as well as for the last node in a direction
};
First of all you need to define the proper structure. At first it will be easier for you to create a list that terminates when the pointer to the next node is NULL. Afterwards you will discover lists with sentinels, bidirectional lists and things that may now seem too complicated.
For example that's a structure:
typedef struct __node
{
int info;
struct __node* next;
}node;
typedef node* list;
This time let's assume that list and node are the same thing. You will find it more precise to separate the concept of a list from the concept of a node; for example, you may store the list's length in the list (which avoids counting all the nodes every time). But for now let's do it this way.
You initialize the list:
list l=NULL;
So the list contains zero nodes. To test whether it is empty, you just check whether the pointer is NULL.
Add a new element:
if(NULL==l)
{
l=(node*)malloc(sizeof(node));
l->next=NULL;
l->info=0;
}
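Now the list contains one node. Create a function to add a new node:
void pushBack(list* listPointer, int info)
{
    /* build the new last node first */
    node* newNode=(node*)malloc(sizeof(node));
    if(NULL==newNode)
        return; /* out of memory: leave the list unchanged */
    newNode->info=info;
    newNode->next=NULL;
    if(NULL==*listPointer)
    {
        *listPointer=newNode; /* empty list: the new node becomes the head */
    }
    else
    {
        node* ptr=*listPointer;
        while(ptr->next!=NULL) /* walk to the current last node */
            ptr=ptr->next;
        ptr->next=newNode;
    }
}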
You could also gain efficiency by adding the elements at the front, or optimize the code by returning the added element so that you don't have to find the last element every time. I leave this to you. Now let's call the pushBack function for every element of the array:
for(int i=0; i<N; i++)
{
    pushBack(&l,arr[i]);
}
That's all, learn your way to implement linked lists.
You're not supposed to convert the whole array into a linked list, you're only supposed to convert the best path into a linked list. You'd do this by brute force, trying directions and backtracking when you ran into dead ends.
Your path, the linked list, would need to look something like this:
typedef struct PathNode
{
    int coordX, coordY;
    struct PathNode * next, * prev;
} PathNode;
If I remember later, I'll draw a picture or something of this structure and add it to the post. comment on this post in a few hours to attract my attention.
The list would always contain a starting point, which would be the first node in the list. As you moved to other positions, one after the other, you'd push them onto the end of the list. This way, you could follow your path from your current position to the beginning of the maze by simply popping elements off of the list, one by one, in order.
This particular linked list is special in that it's two way: it has a pointer to both the next element and the previous one. Lists with only one of the two are called singly linked lists, this one with both is called a doubly linked list. Singly linked lists are one way only, and can only be traversed in one direction.
Think of your linked list as giant pile of strings, each with a starting end and a finishing end. As you walk through the maze, you tie a string at every node you visit and bring an end with you to the next square. If you have to backtrack, you bring the string back with you so it no longer points to the wrong square. Once you find your way to the end of the maze, you will be able to trace your steps by following the string.
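The stack operations the question asks about (push, pop, top, isEmpty) could be sketched on top of such a path list as follows. This is a minimal sketch under assumptions: the `Path` wrapper holding both ends of the list is a hypothetical addition, and `top` returns the node itself rather than a copy of the coordinates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct PathNode {
    int coordX, coordY;
    struct PathNode *next, *prev;   /* next: square visited after this one; prev: before it */
} PathNode;

typedef struct {
    PathNode *start;   /* first square of the path */
    PathNode *top;     /* current square, i.e. the most recently pushed one */
} Path;

bool isEmpty(const Path *p)
{
    return p->top == NULL;
}

/* Appends a square to the end of the path; returns false if allocation fails. */
bool push(Path *p, int x, int y)
{
    PathNode *n = malloc(sizeof *n);
    if (n == NULL)
        return false;
    n->coordX = x;
    n->coordY = y;
    n->prev = p->top;
    n->next = NULL;
    if (p->top != NULL)
        p->top->next = n;
    else
        p->start = n;  /* first node of the path */
    p->top = n;
    return true;
}

/* Returns the most recently pushed node, or NULL if the path is empty. */
PathNode *top(const Path *p)
{
    return p->top;
}

/* Backtracks one step: removes the most recently pushed square. */
void pop(Path *p)
{
    assert(!isEmpty(p));
    PathNode *dead = p->top;
    p->top = dead->prev;
    if (p->top != NULL)
        p->top->next = NULL;
    else
        p->start = NULL;
    free(dead);
}
```

When the solver reaches the target, following `next` from `start` gives the path from the beginning of the maze, and following `prev` from `top` gives it in reverse.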
Could you just explain what -> means exactly?
-> is an all-in-one pointer dereference and member access operator. Say we have:
PathNode * p = malloc(sizeof(*p));
PathNode q;
We can access p's and q's members in any of the following ways:
(*p).coordX;
q.coordX;
p->coordX;
(&q)->coordX;
I'm trying to implement a space efficient trie in C. This is my struct:
struct node {
char val; //character stored in node
int key; //key value if this character is an end of word
struct node* children[256];
};
When I add a node, its index is the unsigned char cast of the character. For example, if I want to add "c", then
children[(unsigned char)'c']
is the pointer to the newly added node. However, this implementation requires me to declare a node* array of 256 elements. What I want to do is:
struct node** children;
and then when adding a node, just malloc space for the node and have
children[(unsigned char)'c']
point to the new node. The issue is that if I don't malloc space for children first, then I obviously can't reference any index or else that's a big error.
So my question is: how do I implement a trie such that it only stores the non-null pointers to its children?
You could try using a de la Briandais trie, where you only have one child pointer for each node, and every node also has a pointer to a "sibling", so that all siblings are effectively stored as a linked list rather than directly pointed to by the parent.
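A minimal sketch of such a child/sibling layout, keeping the question's `val` and `key` fields. The names `child`, `sibling`, `step`, `trie_insert` and `trie_find` are illustrative, and `NO_KEY` assumes that real keys are never -1.

```c
#include <assert.h>
#include <stdlib.h>

#define NO_KEY (-1)          /* assumption: real keys are never -1 */

struct node {
    char val;                /* character stored in this node */
    int key;                 /* key value if a word ends here, otherwise NO_KEY */
    struct node *child;      /* first child: candidates for the next character */
    struct node *sibling;    /* next alternative character at the same position */
};

/* Finds the node for character c in a sibling list, creating it if `create` is set. */
static struct node *step(struct node **list, char c, int create)
{
    for (struct node *n = *list; n != NULL; n = n->sibling)
        if (n->val == c)
            return n;
    if (!create)
        return NULL;
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->val = c;
    n->key = NO_KEY;
    n->child = NULL;
    n->sibling = *list;      /* push onto the front of the sibling list */
    *list = n;
    return n;
}

/* Stores `word` with `key`; returns 0 on success, -1 on failure or empty word. */
int trie_insert(struct node **root, const char *word, int key)
{
    assert(key != NO_KEY);
    struct node *n = NULL;
    struct node **list = root;
    for (; *word != '\0'; word++) {
        n = step(list, *word, 1);
        if (n == NULL)
            return -1;
        list = &n->child;
    }
    if (n == NULL)
        return -1;
    n->key = key;
    return 0;
}

/* Returns the key stored for `word`, or NO_KEY if the word is absent. */
int trie_find(struct node *root, const char *word)
{
    struct node *n = NULL;
    struct node **list = &root;
    for (; *word != '\0'; word++) {
        n = step(list, *word, 0);
        if (n == NULL)
            return NO_KEY;
        list = &n->child;
    }
    return n != NULL ? n->key : NO_KEY;
}
```

Each node costs two pointers instead of 256, while each step of a lookup becomes a walk over the siblings rather than a single array index.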
You can't really have it both ways and be both space efficient and have O(1) lookup in the children nodes.
When you only allocate space for the entries that are actually added, and not for the null pointers, you can no longer do
children[(unsigned char)'c']
because you can no longer index directly into the array.
One alternative is to simply do a linear search through the children and store an additional count of how many entries the children array has, i.e.
children[(unsigned char)'c'] = ...;
has to become
for(i = 0; i < len; i++) {
    if(children[i]->val == 'c')
        break;
}
if(i == len) {
    //...reallocate and add space for one item in children, then increment len
}
children[i] = ...;
If your tree ends up with a lot of non-empty entries at one level, you might insert the children in sorted order and do a binary search. Or you might store the children as a linked list instead of an array.
If you just want to do an English keyword search, I think you can minimize the size of your children, from 256 to just 26 - just enough to cover the 26 letters a-z.
Furthermore, you can use a linked list to keep the number of children even smaller so we can have more efficient iteration.
I haven't gone through the libraries yet but I think trie implementation will help.
You can be both space efficient and keep the constant lookup time by making child nodes of every node a hash table of nodes. Especially when Unicode characters are involved and the set of characters you can have in your dictionary is not limited to 52 + some, this becomes more of a requirement than a nicety. This way you can keep the advantages of using a trie and be time and space efficient at the same time.
I must also add that if the character set you are using is approaching unbounded, chances are having a linked list of nodes may just do fine. If you like an unmanageable nightmare, you can opt for a hybrid approach where first few levels keep their children in hash tables while the lower levels have a linked list of them. For a true bug farm, opt for a dynamic one where as each linked list passes a threshold, you convert it to a hash table on the fly. You could easily amortize the cost.
Possibilities are endless!