Managing duplicates in a binary tree with memory efficiency - c

I have a self-balancing key-value binary tree (similar to Tarjan's zip tree) in which keys can be duplicated. To keep O(log N) performance, the only approach I can come up with is to maintain three pointers per node: a less-than, a greater-than, and an "equals" pointer. The equals pointer points to a linked list of members sharing the same key.
This seems memory-inefficient, because every node in the tree carries an extra 8 bytes just to handle the infrequent duplicates. Is there a better way that doesn't involve "cheats" like bit-banging the left or right pointers for use as a flag?
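For reference, a quick sketch of the layout described above (the field names are illustrative, not from an actual implementation):

struct dup {
    void *value;
    struct dup *next;
};

struct node {
    struct node *less;       /* keys smaller than this node's key */
    struct node *greater;    /* keys larger than this node's key */
    struct dup *equals;      /* linked list of entries sharing this key; NULL in the common case */
    unsigned long key;
    void *value;
};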

When an insertion collides with an existing key, allocate a new buffer and copy the new data into it.
Hash the new data pointer down to one or two bytes. You'll need a hash that returns zero only for zero input!
Store the hash value in your node. This field is zero when there is no collision data, so you stay at O(log KeyCount) for all keys without extra data elements. Your worst case is log KeyCount plus whatever your hashing algorithm yields on lookups, which might be a constant close to one additional step until your table has to be resized.
Obviously, the choice of hashing algorithm is critical here. Look for one that is good with pointer values on whatever architecture you are targeting. You may need different hashes for different architectures.
You can carry this even further by using only one-byte hash values that get you to a hash table, which you then search with the key hash (which can be a larger integer) to find the pointer to the additional data. When a hash table fills up, insert a new one into the parent table. I'll leave the math to you.
Regarding data locality: since the node data are large, you already don't have good locality between the node record and the actual data. This scheme doesn't change that, except in the case where you have multiple data nodes for a particular key, where you'd likely take a cache miss getting to the correct index of a variable-length array embedded in the node. This scheme avoids having to reallocate nodes on collisions, and probably won't have a severe impact on your cache-miss rate.
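One possible reading of that scheme, sketched below; the field and type names (dup_tag, dup_slot) are illustrative guesses, not from the answer above. The node keeps only a small tag that is non-zero when overflow data exist, and a separate side table maps the node to its duplicate list, so the common duplicate-free case pays 2 bytes instead of a full 8-byte pointer:

#include <stdint.h>

struct dup_list;                  /* linked list of values sharing one key */

struct tree_node {
    struct tree_node *less;
    struct tree_node *greater;
    uint64_t key;
    void *value;
    uint16_t dup_tag;             /* 0 = no duplicates; otherwise a small hash of the
                                     overflow pointer, used to probe the side table */
};

/* Side table, consulted only when dup_tag != 0; maps a node to its duplicate list. */
struct dup_slot {
    struct tree_node *owner;      /* which node this slot belongs to */
    struct dup_list *dups;        /* the actual overflow data */
};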

I usually use this setup when I build a binary search tree; it skips duplicate values found in the input array:
#include <stdio.h>
#include <stdlib.h>

#define SIZE 13

typedef struct Node
{
    struct Node * right;
    struct Node * left;
    int value;
} TNode;

typedef TNode * Nodo;

void bst(int data, Nodo * p)
{
    Nodo pp = *p;
    if(pp == NULL)
    {
        pp = (Nodo)malloc(sizeof(struct Node));
        pp->right = NULL;
        pp->left = NULL;
        pp->value = data;
        *p = pp;
    }
    else if(data == pp->value)
    {
        return; /* duplicate value: skip it */
    }
    else if(data > pp->value)
    {
        bst(data, &pp->right);
    }
    else
    {
        bst(data, &pp->left);
    }
}

void displayDesc(Nodo p)
{
    if(p != NULL)
    {
        displayDesc(p->right);
        printf("%d\n", p->value);
        displayDesc(p->left);
    }
}

void displayAsc(Nodo p)
{
    if(p != NULL)
    {
        displayAsc(p->left);
        printf("%d\n", p->value);
        displayAsc(p->right);
    }
}

int main()
{
    int arr[SIZE] = {4,1,0,7,5,88,8,9,55,42,0,5,6};
    Nodo head = NULL;
    for(int i = 0; i < SIZE; i++)
    {
        bst(arr[i], &head);
    }
    displayAsc(head);
    exit(0);
}

Related

How to improve recursive function in C?

I have a program with various recursive functions.
I now need to optimize the code so the program runs faster: I checked with a profiler and, apart from the biggest function with lots of checks, I have two functions that take a lot of time on every run.
One (Unmarked_Nodes) looks like this:
typedef struct node* tree;

struct node {
    char* data;
    tree left;
    tree right;
    int marker;
};

static int remaining = 0;

int main(){
    ...
}

int Unmarked_Nodes(tree root) {
    if (root != NULL) {
        Unmarked_Nodes(root->left);
        if (root->marker == 0)
            remaining++;
        Unmarked_Nodes(root->right);
    }
    return remaining;
}
The other function is similar, but instead of the if statement it has a printf of data.
That one, however, is faster than this one... why? Or rather: how can I improve the code to make it run faster?
Thanks in advance
Candidate improvements: these might help a little, although the answer remains O(n).
Recurse less often
Loop inside the function for one of the children.
Avoid the global
It is simply not needed.
Use const
Not so much a speed improvement, yet it allows use with constant data.
Avoid hiding pointers
int Unmarked_Nodes(const struct node *root) {
    int remaining = 0;
    while (root != NULL) {
        remaining += Unmarked_Nodes(root->left);
        if (root->marker == 0) {
            remaining++;
        }
        root = root->right;
    }
    return remaining;
}
Perhaps only recurse when both children are non-NULL. Test for NULL at the end of the loop, since the pointer is known to be non-NULL on every recursive entry.
static int Unmarked_Nodes2r(const struct node *root) {
    int remaining = 0;
    do {
        if (root->marker == 0) {
            remaining++;
        }
        if (root->left) {
            if (root->right) {
                remaining += Unmarked_Nodes2r(root->right);
            }
            root = root->left;
            // continue; // Could skip the loop test.
        } else {
            root = root->right;
        }
    } while (root);
    return remaining;
}

int Unmarked_Nodes2(const struct node *root) {
    return root ? Unmarked_Nodes2r(root) : 0;
}
In the absence of more information, it would seem that you likely "visit" the tree three times: once to 'mark' nodes (for whatever purpose), once to 'print' marked (or unmarked) nodes, and once more to reset those marks.
Presuming that 'marked' nodes are the interesting ones, consider using a dynamic array of pointers (malloc/realloc in suitable increments) to build a list of only those nodes, print from that list (no 2nd tree traversal), then free() the list (no 3rd tree traversal); see the sketch after this answer.
You wouldn't need to 'mark/unmark' anything. Interesting nodes added to the suggested list mean that those nodes are 'marked', and they become 'unmarked' when the list is erased.
You may need to consider whether 'marking' may encounter unwanted duplicates.
Another suggestion is to consider transforming the tree into a list once it is filled. Then, use conventional binary search of that list to mark 'nodes', and a sweep through to erase marks (presuming the same list is to be reused multiple times).
Another suggestion relates to whether you are marking to include or to exclude from the print traversal. If marked nodes are included, then simply 'unmark' them as you print them. If marked nodes are excluded, then mark all the other unmarked nodes being printed that haven't previously been 'excluded', and remember whether '0' or '1' means 'marked' the next time it comes to searching/marking.
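A minimal sketch of that dynamic-array idea, assuming the struct node from the question (the marked_list name and growth policy are mine):

#include <stdlib.h>

struct node;    /* as defined in the question */

struct marked_list {
    const struct node **items;    /* pointers to the interesting nodes */
    size_t count, capacity;
};

/* Append a node pointer, growing the array in suitable increments. */
static int marked_list_add(struct marked_list *l, const struct node *n) {
    if (l->count == l->capacity) {
        size_t new_cap = l->capacity ? 2 * l->capacity : 64;
        const struct node **p = realloc(l->items, new_cap * sizeof *p);
        if (!p) return -1;
        l->items = p;
        l->capacity = new_cap;
    }
    l->items[l->count++] = n;
    return 0;
}

/* "Unmark" everything at once: just throw the list away. */
static void marked_list_clear(struct marked_list *l) {
    free(l->items);
    l->items = NULL;
    l->count = l->capacity = 0;
}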

compact multiple-array implementation of doubly linked list with O(1) insertion and deletion

I am confused about my solution to an exercise (10.3-4) in CLRS (Cormen Intro to Algorithms 3ed). My implementation seems to be able to perform deletion + de-allocation in O(1) time, while two solutions I have found online both require O(n) time for these operations, and I want to know who is correct.
Here's the text of the exercise:
It is often desirable to keep all elements of a doubly linked list compact in storage, using, for example, the first m index locations in the multiple-array representation. (This is the case in a paged, virtual-memory computing environment.) Explain how to implement the procedures ALLOCATE OBJECT and FREE OBJECT so that the representation is compact. Assume that there are no pointers to elements of the linked list outside the list itself. (Hint: Use the array implementation of a stack.)
By "multiple-array representation", they are referring to an implementation of a linked list using next, prev, and key arrays, with indices acting as pointers stored in the arrays rather than objects with members pointing to next and prev. That particular implementation was discussed in the text of Section 10.3 of CLRS, while this particular exercise seems to be simply imposing the addition condition of having the elements be "compact", or, as I understand it, packed into the beginning of the arrays, without any gaps or holes with "inactive" elements.
There was a previous thread on the same exercise here, but that I couldn't figure out what I want to know from that thread.
The two solutions I found online are first one here and second one here, on page 6 of the pdf. Both solutions say to shift all elements after a gap down by one in order to fill the gap, taking O(n) time. My own implementation instead simply takes the last "valid" element in the array and uses it to fill any gap that is created, which happens only when elements are deleted. This maintains the "compactness" property. Of course, the appropriate prev and next "pointers" are updated, and this is O(1) time. Additionally, the ordinary implementation from Sec. 10.3 in the book, which does not require compactness, had a variable named "free" which pointed to the beginning of a second linked list, which has all the "non-valid" elements, which are available to be written over. For my implementation, since any insertion must be done at the earliest available, e.g. non-valid array slot, I simply had my variable "free" act more like the variable "top" in a stack. This seemed so obvious that I'm not sure why both of those solutions called for an O(n) "shift down everything after the gap" method. So which one is it?
Here is my C implementation. As far as I know, everything works and takes O(1) time.
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int *key, *prev, *next, head, free, size;
} List;

const int nil = -1;

List *new_list(size_t size){
    List *l = malloc(sizeof(List));
    l->key = malloc(size*sizeof(int));
    l->prev = malloc(size*sizeof(int));
    l->next = malloc(size*sizeof(int));
    l->head = nil;
    l->free = 0;
    l->size = size;
    return l;
}

void destroy_list(List *l){
    free(l->key);
    free(l->prev);
    free(l->next);
    free(l);
}

int allocate_object(List *l){
    if(l->free == l->size){
        printf("list overflow\n");
        exit(1);
    }
    int i = l->free;
    l->free++;
    return i;
}

void insert(List *l, int k){
    int i = allocate_object(l);
    l->key[i] = k;
    l->next[i] = l->head;
    if(l->head != nil){
        l->prev[l->head] = i;
    }
    l->prev[i] = nil;
    l->head = i;
}

void free_object(List *l, int i){
    /* move the last allocated slot into the gap to stay compact */
    if(i != l->free-1){
        l->next[i] = l->next[l->free-1];
        l->prev[i] = l->prev[l->free-1];
        l->key[i] = l->key[l->free-1];
        if(l->head == l->free-1){
            l->head = i;
        }else{
            l->next[l->prev[l->free-1]] = i;
        }
        if(l->next[l->free-1] != nil){
            l->prev[l->next[l->free-1]] = i;
        }
    }
    l->free--;
}

void delete(List *l, int i){
    if(l->prev[i] != nil){
        l->next[l->prev[i]] = l->next[i];
    }else{
        l->head = l->next[i];
    }
    if(l->next[i] != nil){
        l->prev[l->next[i]] = l->prev[i];
    }
    free_object(l, i);
}
Your approach is correct.
The O(n) "shift-everything-down" solution is also correct in the sense that it meets the specification of the problem, but clearly your approach is preferable from a runtime perspective.

Any faster methods to find data?

This is an interview question.
We are developing a k/v system; part of it has been developed, and we need you to finish it.
Things already done -
1) A function that returns a hash of any string. You can assume the return value is always unique (no collisions); it's up to you whether to use it:
int hash(char *string);
Things you have to finish -
int set(char *key, char *value);
char *get(char *key);
And my answer was
#include <stdlib.h>

int hash(char *string); /* provided */

struct kv {
    int key;
    char *value;
    struct kv *next;
};

struct kv *top;
struct kv *end;

void set(char *key, char *value) {
    int k = hash(key);
    struct kv *i = top;
    /* update in place if the key already exists */
    while (i != NULL) {
        if (i->key == k) {
            i->value = value;
            return;
        }
        i = i->next;
    }
    /* otherwise append a new node */
    i = malloc(sizeof(struct kv));
    i->key = k;
    i->value = value;
    i->next = NULL;
    if (top == NULL) {
        top = i;
    } else {
        end->next = i;
    }
    end = i;
}

char *get(char *key) {
    if (top == NULL) {
        return NULL;
    }
    int k = hash(key);
    for (struct kv *i = top; i != NULL; i = i->next) {
        if (i->key == k) {
            return i->value;
        }
    }
    return NULL;
}
Q: - Is there any faster way to do it? What do you think is the fastest way?
What you have done is made a linked list to store the key value pairs. But as you can see, the search complexity is O(n). You can make it faster by creating a hash table. You already have a hash function with guaranteed 0 collisions.
What you can do is
char* hash_table[RANGE_OF_HASH] = {NULL}; // Your interviewer should provide RANGE_OF_HASH
Then your set and get become -
void set(char* key, char* value) {
    hash_table[hash(key)] = value; // Can do this because no collisions are guaranteed.
}

char* get(char* key) {
    return hash_table[hash(key)];
}
In this case, since you don't have to iterate over all the inserted keys, get is O(1) (and so is set).
But you need to be aware that this usually occupies more space than your approach.
Your method occupies O(n) space, but this occupies O(RANGE_OF_HASH), which might not be acceptable in situations where memory is a constraint.
If RANGE_OF_HASH is very large (like INT_MAX) and you don't have enough memory for hash_table, you can create a multi-level hash table.
For instance, your main hash_table will have only 256 slots. Each entry points to another hash table of 256 entries, and so on. You will have to do some bit masking to get the hash value for each level. You can allocate each level on demand; this way you minimize memory usage. A rough sketch follows.
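A possible sketch of that multi-level idea (the names and the 32-bit/4-level split are assumptions for illustration, not part of the interview setup):

#include <stdlib.h>

#define FANOUT 256            /* 256 slots per level, one byte of the hash each */
#define LEVELS 4              /* 4 levels x 8 bits covers a 32-bit hash         */

struct level {
    void *slot[FANOUT];       /* intermediate levels: struct level*; last level: char* value */
};

static char *ml_get(struct level *root, unsigned int h) {
    struct level *cur = root;
    for (int i = 0; i < LEVELS - 1; i++) {
        unsigned int idx = (h >> (8 * i)) & 0xFF;   /* mask out this level's byte */
        if (cur->slot[idx] == NULL) return NULL;
        cur = cur->slot[idx];
    }
    return cur->slot[(h >> (8 * (LEVELS - 1))) & 0xFF];
}

static void ml_set(struct level *root, unsigned int h, char *value) {
    struct level *cur = root;
    for (int i = 0; i < LEVELS - 1; i++) {
        unsigned int idx = (h >> (8 * i)) & 0xFF;
        if (cur->slot[idx] == NULL)
            cur->slot[idx] = calloc(1, sizeof(struct level));   /* allocate on demand */
        cur = cur->slot[idx];
    }
    cur->slot[(h >> (8 * (LEVELS - 1))) & 0xFF] = value;
}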
There are lots of great ways of doing this. Here's a small reading list; go through it. There's definitely more out there that I'm not aware of.
Sorted list with binary search - Depending on the usage patterns, can be fast or slow to build, but lookups are guaranteed to be O(log(N)).
Hash table - fast, close to O(1) on average, O(N) in worst case for all operations.
Binary tree - best case O(log(N)), worst case O(N).
AVL tree - guaranteed O(log(N)) for all operations.
Red-black tree - similar to AVL but trades off lookup speed for more inserting speed.
Trie - operations cost O(k) in the key length, effectively independent of the number of stored elements, at the expense of more memory usage.
After this, take a break, brace yourself, and delve into this article about computer memory. This is already advanced stuff and will show you that sometimes a worse big-O measure can actually perform better in real world scenarios. It's all down to what kind of data will there be and what the usage patterns are.

Pushing to a stack containing ONLY unique values in C

I've implemented a stack with pointers, and it works as it's supposed to. Now I need it to push to the stack without pushing duplicates. For example, if I push '2' onto the stack, pushing another '2' will still result in only one '2' in the stack, because it already exists.
Below is how I went about trying to create the new push function. I know that I'm supposed to traverse the stack and check it for the element I'm adding, but I guess I'm doing that wrong? Can anyone help me out?
typedef struct Node {
    void *content;
    struct Node *next;
} Node;

typedef struct Stack {
    Node *head;
    int count;
} Stack;

void push(Stack *stack, void *newElem) {
    Node *newNode = (Node*) malloc(sizeof(Node));
    if (stack->count > 0) {
        int i;
        for (i = 0, newNode = stack->head; i < stack->count; i++, newNode = newNode->next) {
            if (newNode->content == newElem) return;
        }
    } else {
        newNode->next = stack->head;
        newNode->content = newElem;
        stack->head = newNode;
        stack->count++;
    }
}
if (newNode->content == newElem)
You are comparing two pointers. I guess you want to check whether their contents are equal:
#include <string.h>
if (memcmp(newNode->content, newElem, size) == 0)
The value size may be indicated by the caller. In your case, it should be sizeof(int).
Moreover, once you have traversed the stack, you don't add the element to your data structure.
The problem is that if your stack is non-empty and you don't find the element already in the stack, you don't do anything. You need to get rid of the else keyword and make that code unconditional. Also, you allocate space for the new Node before you know whether you need it, and, even worse, you overwrite the newly allocated pointer with your iteration over the stack to see if you need to push it or not. So move the malloc down after the } that ends the if.
You already have a working
void push(Stack *stack, void *newElem);
right?
So, why not write a new function
int push_unique(Stack *stack, void *newElem) {
if (find_value(stack, newElem) != NULL) {
return 1; // indicate a collision
}
push(stack, newElem); // re-use old function
return 0; // indicate success
}
Now you've reduced the problem to writing
Node *find_value(Stack *stack, void *value);
... can you do that?
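For completeness, a minimal sketch of such a helper, comparing by pointer identity just as the original push() does (swap in memcmp, as the other answer suggests, if you want to compare contents):

Node *find_value(Stack *stack, void *value) {
    for (Node *cur = stack->head; cur != NULL; cur = cur->next) {
        if (cur->content == value) {   /* pointer identity; use memcmp for content equality */
            return cur;
        }
    }
    return NULL;
}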
I'm not sure you realized it, but your proposed implementation performs a linear search over a linked list. If you're pushing 2,000 elements onto a stack with an average of 2 duplicates of each element value, that's 2,000 searches of a linked list averaging 500-750 links each (it depends on when, i.e. in what order, the duplicates are presented to the search function). That requires over a million compares. Not pretty.
A MUCH more efficient duplicate detection in find_value() above could use a hash table, with O(1) search time, or a tree, with O(log N) search time: the former if you know how many values you may push onto the stack, the latter if the number is unknown, as when receiving data from a socket in real time. (In the former case you could also implement your stack in an array instead of a much slower, more verbose linked list.)
In either case, to properly maintain the hash table, your pop() function would need to be paired with a hashpop() function that removes the matching value from the hash table.
With a hash table, your stack could just point to the element's value sitting in its hash location, as returned from find_value(). With a self-balancing tree, however, the location of the node, and thus of the element value, would be changing all the time, so you'd need to store the element's value in both the stack and the tree. Unless you're writing for a very tight memory environment, the performance the second data structure affords would be well worth the modest cost in memory.

Hashing with large data sets and C implementation

I have a large number of values ranging from 0 to 5463458053. To each value, I wish to map a set of strings, so that the lookup operation, i.e. finding whether a string is present in that set, takes the least amount of time. Note that this set of values may not contain every value from 0 to 5463458053, but it does contain a large number of them.
My current solution is to hash those values (between 0 and 5463458053) and, for each value, keep a linked list of the strings corresponding to that value. Every time I want to check for a string in a given set, I hash the value, get the linked list, and traverse it to find out whether it contains the string in question.
While this might seem easy enough, it's a little time consuming. Can you think of a faster solution? Also, collisions will be dreadful: they'll lead to wrong results.
The other part is about implementing this in C. How would I go about doing this?
NOTE: Someone suggested using a database instead. I wonder if that'll be useful.
I'm a little worried about running out of RAM naturally. :-)
You could have a hash table of hash sets. The outer hash table has your integers as keys; its values are hash sets, i.e. hash tables whose keys are strings.
You could also have a single hashed set whose keys are pairs of an integer and a string.
There are many libraries implementing such data structures (in C++, the standard library provides them, as std::map & std::set). For C, I was thinking of GLib from GTK; a rough sketch follows below.
With hashing techniques, memory use is proportional to the size of the considered sets (or relations). For instance, you could accept a 30% emptiness rate.
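A rough sketch of the nested-hash-table idea using GLib (the ownership choices and the example value are mine; check the GLib documentation for your version):

#include <glib.h>
#include <stdio.h>

int main(void) {
    /* outer table: gint64* -> GHashTable* (a string set) */
    GHashTable *outer = g_hash_table_new_full(g_int64_hash, g_int64_equal,
                                              g_free, (GDestroyNotify)g_hash_table_destroy);

    gint64 *key = g_new(gint64, 1);
    *key = 5463458053;

    /* inner set: the strings belonging to this value, owned by the set */
    GHashTable *set = g_hash_table_new_full(g_str_hash, g_str_equal, g_free, NULL);
    g_hash_table_add(set, g_strdup("hello"));
    g_hash_table_insert(outer, key, set);

    /* membership test: expected O(1) */
    GHashTable *found = g_hash_table_lookup(outer, key);
    if (found && g_hash_table_contains(found, "hello")) {
        printf("present\n");
    }

    g_hash_table_destroy(outer);
    return 0;
}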
Large number of strings + fast lookup + limited memory ----> you want a prefix trie, crit-bit tree, or anything of that family (many different names for very similar things, e.g. PATRICIA... Judy is one such thing too). See for example this.
These data structures allow for prefix compression, so they are able to store a lot of strings (which will necessarily share common prefixes) very efficiently. Also, lookup is very fast. Due to caching and paging effects that the common big-O notation does not account for, they can be as fast as, or even faster than, a hash, at a fraction of the memory (even though, according to big-O, nothing except maybe an array can beat a hash).
A Judy Array, with the C library that implements it, might be exactly the base of what you need. Here's a quote that describes it:
Judy is a C library that provides a state-of-the-art core technology that implements a sparse dynamic array. Judy arrays are declared simply with a null pointer. A Judy array consumes memory only when it is populated, yet can grow to take advantage of all available memory if desired. Judy's key benefits are scalability, high performance, and memory efficiency. A Judy array is extensible and can scale up to a very large number of elements, bounded only by machine memory. Since Judy is designed as an unbounded array, the size of a Judy array is not pre-allocated but grows and shrinks dynamically with the array population. Judy combines scalability with ease of use. The Judy API is accessed with simple insert, retrieve, and delete calls that do not require extensive programming. Tuning and configuring are not required (in fact not even possible). In addition, sort, search, count, and sequential access capabilities are built into Judy's design.
Judy can be used whenever a developer needs dynamically sized arrays, associative arrays or a simple-to-use interface that requires no rework for expansion or contraction.
Judy can replace many common data structures, such as arrays, sparse arrays, hash tables, B-trees, binary trees, linear lists, skiplists, other sort and search algorithms, and counting functions.
If the entries are from 0 to N and consecutive: use an array. (Is indexing fast enough for you?)
EDIT: the numbers do not seem to be consecutive. There is a large number of {key,value} pairs, where the key is a big number (>32 bits but <64 bits) and the value is a bunch of strings.
If memory is available, a hash table is easy. If the bunch of strings is not too large, you can inspect them sequentially. If the same strings occur (much) more than once, you could enumerate the strings (put pointers to them in a char *array[] and use the index into that array instead; finding the index given a string probably involves another hash table).
For the "master" hash table, an entry would probably be:
struct entry {
    struct entry *next;          /* for the overflow chain */
    unsigned long long key;      /* the 33-bit number */
    struct list *payload;
} entries[big_enough_for_all];   /* if the size is known in advance, preallocation avoids a lot of malloc overhead */
If you have enough memory to store a heads array, you could certainly do that:
struct entry *heads[SOME_SIZE] = {NULL, };
Otherwise, you can combine the heads array with the array of entries (like I did in Lookups on known set of integer keys here).
Handling collisions is easy: as you walk the overflow chain, just compare your key with the key in the entry. If they are unequal, walk on. If they are equal, you've found it; now go walking the strings.
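Putting the pieces together, a hedged sketch of that lookup (HASH() is a placeholder for whatever hash you pick; heads[] and struct entry are from the fragments above):

struct list *lookup(unsigned long long key) {
    unsigned idx = HASH(key) % SOME_SIZE;
    for (struct entry *e = heads[idx]; e != NULL; e = e->next) {
        if (e->key == key) {
            return e->payload;    /* found: now go walking the strings */
        }
    }
    return NULL;                  /* key not present */
}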
You can use a single binary search tree (AVL/Red-black/...) to contain all the strings, from all sets, by keying them lexicographically as (set_number, string). You don't need to store sets explicitly anywhere. For example, the comparator defining the order of nodes for the tree could look like:
function compare_nodes (node1, node2) {
if (node1.set_number < node2.set_number) return LESS;
if (node1.set_number > node2.set_number) return GREATER;
if (node1.string < node2.string) return LESS;
if (node1.string > node2.string) return GREATER;
return EQUAL;
}
With such a structure, some common operations are possible (but maybe not straightforward).
To find whether a string s exists in the set set_number, simply lookup (set_number, s) in the tree, for an exact match.
To find all strings in the set set_number:
function iterate_all_strings_in_set (set_number) {
// Traverse the tree from root downwards, looking for the given key. Return
// wherever the search ends up, whether it found the value or not.
node = lookup_tree_weak(set_number, "");
// tree empty?
if (node == null) {
return;
}
// We may have gotten the greatest node from the previous set,
// instead of the first node from the set we're interested in.
if (node.set_number != set_number) {
node = successor(node);
}
while (node != null && node.set_number == set_number) {
do_something_with(node.string);
node = successor(node);
}
}
The above requires O((k+1)*log(n)) time, where k is the number of strings in set_number, and n is the number of all strings.
To find all set numbers with at least one string associated:
function iterate_all_sets ()
{
node = first_node_in_tree();
while (node != null) {
current_set = node.set_number;
do_something_with(current_set);
if (cannot increment current_set) {
return;
}
node = lookup_tree_weak(current_set + 1, "");
if (node.set_number == current_set) {
node = successor(node);
}
}
}
The above requires O((k+1)*log(n)) time, where k is the number of sets with at least one string, and n is the number of all strings.
Note that the above code assumes that the tree is not modified in the "do_something" calls; it may crash if nodes are removed.
Additionally, here's some real C code which demonstrates this, using my own generic AVL tree implementation. To compile it, it's enough to copy the misc/ and structure/ folders from the BadVPN source somewhere and add an include path there.
Note how my AVL tree does not contain any "data" in its nodes, and how it doesn't do any of its own memory allocation. This comes in handy when you have a lot of data to work with. To make it clear: the program below does only a single malloc(), which is the one that allocates the nodes array.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <inttypes.h>
#include <assert.h>
#include <structure/BAVL.h>
#include <misc/offset.h>

struct value {
    uint32_t set_no;
    char str[3];
};

struct node {
    uint8_t is_used;
    struct value val;
    BAVLNode tree_node;
};

BAVL tree;

static int value_comparator (void *unused, void *vv1, void *vv2)
{
    struct value *v1 = vv1;
    struct value *v2 = vv2;
    if (v1->set_no < v2->set_no) {
        return -1;
    }
    if (v1->set_no > v2->set_no) {
        return 1;
    }
    int c = strcmp(v1->str, v2->str);
    if (c < 0) {
        return -1;
    }
    if (c > 0) {
        return 1;
    }
    return 0;
}
static void random_bytes (unsigned char *out, size_t n)
{
    while (n > 0) {
        *out = rand();
        out++;
        n--;
    }
}

static void random_value (struct value *out)
{
    random_bytes((unsigned char *)&out->set_no, sizeof(out->set_no));
    for (size_t i = 0; i < sizeof(out->str) - 1; i++) {
        out->str[i] = (uint8_t)32 + (rand() % 94);
    }
    out->str[sizeof(out->str) - 1] = '\0';
}

static struct node * find_node (const struct value *val)
{
    // find AVL tree node with an equal value
    BAVLNode *tn = BAVL_LookupExact(&tree, (void *)val);
    if (!tn) {
        return NULL;
    }
    // get node pointer from pointer to its value (same as container_of() in Linux kernel)
    struct node *n = UPPER_OBJECT(tn, struct node, tree_node);
    assert(n->val.set_no == val->set_no);
    assert(!strcmp(n->val.str, val->str));
    return n;
}

static struct node * lookup_weak (const struct value *v)
{
    BAVLNode *tn = BAVL_Lookup(&tree, (void *)v);
    if (!tn) {
        return NULL;
    }
    return UPPER_OBJECT(tn, struct node, tree_node);
}

static struct node * first_node (void)
{
    BAVLNode *tn = BAVL_GetFirst(&tree);
    if (!tn) {
        return NULL;
    }
    return UPPER_OBJECT(tn, struct node, tree_node);
}

static struct node * next_node (struct node *node)
{
    BAVLNode *tn = BAVL_GetNext(&tree, &node->tree_node);
    if (!tn) {
        return NULL;
    }
    return UPPER_OBJECT(tn, struct node, tree_node);
}
size_t num_found;

static void iterate_all_strings_in_set (uint32_t set_no)
{
    struct value v;
    v.set_no = set_no;
    v.str[0] = '\0';
    struct node *n = lookup_weak(&v);
    if (!n) {
        return;
    }
    if (n->val.set_no != set_no) {
        n = next_node(n);
    }
    while (n && n->val.set_no == set_no) {
        num_found++; // "do_something_with_string"
        n = next_node(n);
    }
}

static void iterate_all_sets (void)
{
    struct node *node = first_node();
    while (node) {
        uint32_t current_set = node->val.set_no;
        iterate_all_strings_in_set(current_set); // "do_something_with_set"
        if (current_set == UINT32_MAX) {
            return;
        }
        struct value v;
        v.set_no = current_set + 1;
        v.str[0] = '\0';
        node = lookup_weak(&v);
        if (node->val.set_no == current_set) {
            node = next_node(node);
        }
    }
}
int main (int argc, char *argv[])
{
    size_t num_nodes = 10000000;
    // init AVL tree, using:
    //   key=(struct node).val,
    //   comparator=value_comparator
    BAVL_Init(&tree, OFFSET_DIFF(struct node, val, tree_node), value_comparator, NULL);
    printf("Allocating...\n");
    // allocate nodes (missing overflow check...)
    struct node *nodes = malloc(num_nodes * sizeof(nodes[0]));
    if (!nodes) {
        printf("malloc failed!\n");
        return 1;
    }
    printf("Inserting %zu nodes...\n", num_nodes);
    size_t num_inserted = 0;
    // insert nodes, giving them random values
    for (size_t i = 0; i < num_nodes; i++) {
        struct node *n = &nodes[i];
        // choose random set number and string
        random_value(&n->val);
        // try inserting into AVL tree
        if (!BAVL_Insert(&tree, &n->tree_node, NULL)) {
            printf("Insert collision: (%"PRIu32", '%s') already exists!\n", n->val.set_no, n->val.str);
            n->is_used = 0;
            continue;
        }
        n->is_used = 1;
        num_inserted++;
    }
    printf("Looking up...\n");
    // lookup all those values
    for (size_t i = 0; i < num_nodes; i++) {
        struct node *n = &nodes[i];
        struct node *lookup_n = find_node(&n->val);
        if (n->is_used) { // this node is the only one with this value
            ASSERT(lookup_n == n)
        } else { // this node was an insert collision; some other
                 // node must have this value
            ASSERT(lookup_n != NULL)
            ASSERT(lookup_n != n)
        }
    }
    printf("Iterating by sets...\n");
    num_found = 0;
    iterate_all_sets();
    ASSERT(num_found == num_inserted)
    printf("Removing all strings...\n");
    for (size_t i = 0; i < num_nodes; i++) {
        struct node *n = &nodes[i];
        if (!n->is_used) { // must not remove it if it wasn't inserted
            continue;
        }
        BAVL_Remove(&tree, &n->tree_node);
    }
    return 0;
}
