C - Hashing a void type?

I've made my own implementation of a HashMap/HashTable (I know they are different, but it's irrelevant for this question).
In this implementation, I'd like it to be very flexible. I want to be able to store ints, structs, chars, strings, etc all as keys or values without having to change the code of my algorithms. For example, in Java I can just do:
HashMap<Integer, MyPersonalClass> and it will just work. In C, I know there is no direct equivalent except void*. The issue is, if I have:
/* Node structure. */
struct hm_Node
{
void *key, *value;
struct hm_Node *next;
};
As the Node(s) that make up my HashMap/HashTable, then my hash() method needs to somehow parse the key correctly. So far I've only looked up an algorithm for char*.
Is there something like:
// This may not be valid code, just using it as an example
unsigned int hash(void *ptr)
{
switch(typeof(ptr)) // I know ptr is of type void*
{
case char*: ... break;
case char: ... break;
case int: ... break;
}
}
How does that work exactly? I'm just trying to avoid having a whole different implementation for a HashMap of X, Y, and Z types. Thanks.

Look at the implementation of, for example, qsort: it lets the user provide the comparison function so that it can implement arbitrary sorts.
You can go the same way by letting the user provide a suitable hash function through a function pointer. If you want, you can supply some pre-built hash functions for standard types that they can reuse.
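A minimal sketch of that idea follows. The names hm_Map, hm_hash_fn, hm_equal_fn, HM_BUCKETS and the hm_* helpers are assumptions made for illustration and are not part of the question's code; only hm_Node comes from it. The map stores the two callbacks and never looks at the key bytes itself.

```c
#include <stddef.h>
#include <string.h>

/* Caller-supplied callbacks; the map itself never inspects key bytes. */
typedef unsigned int (*hm_hash_fn)(const void *key);
typedef int (*hm_equal_fn)(const void *a, const void *b);

/* Node structure, as in the question. */
struct hm_Node {
    void *key, *value;
    struct hm_Node *next;
};

/* Hypothetical map header; fixed bucket count keeps the sketch short. */
#define HM_BUCKETS 16
struct hm_Map {
    struct hm_Node *buckets[HM_BUCKETS];
    hm_hash_fn hash;
    hm_equal_fn equal;
};

/* Pre-built callbacks for int keys. */
unsigned int hm_hash_int(const void *key)
{
    return (unsigned int)*(const int *)key;
}

int hm_equal_int(const void *a, const void *b)
{
    return *(const int *)a == *(const int *)b;
}

/* Pre-built callbacks for NUL-terminated string keys (djb2 hash). */
unsigned int hm_hash_str(const void *key)
{
    const unsigned char *s = key;
    unsigned int h = 5381;
    while (*s)
        h = h * 33u + *s++;
    return h;
}

int hm_equal_str(const void *a, const void *b)
{
    return strcmp(a, b) == 0;
}

/* Inserts at the head of the bucket; the caller owns the node storage. */
void hm_put_node(struct hm_Map *map, struct hm_Node *node)
{
    unsigned int b = map->hash(node->key) % HM_BUCKETS;
    node->next = map->buckets[b];
    map->buckets[b] = node;
}

/* Lookup relies only on the callbacks stored in the map. */
void *hm_get(const struct hm_Map *map, const void *key)
{
    const struct hm_Node *n = map->buckets[map->hash(key) % HM_BUCKETS];
    for (; n != NULL; n = n->next)
        if (map->equal(n->key, key))
            return n->value;
    return NULL;
}
```

A map of string keys would be set up with hm_hash_str and hm_equal_str, a map of int keys with hm_hash_int and hm_equal_int; the algorithms stay the same in both cases.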

Related

How to write C function accepting (one) argument of any type

I am implementing simple library for lists in C, and I have a problem with writing find function.
I would like my function to accept any type of argument to find, both:
find(my_list, 3) and find(my_list, my_int_var_to_find).
I already have information what is type of list's elements.
For now I've found couple of ways dealing with this:
different function with suffix for different types: int findi(void* list, int i), int findd(void* list, double d) - but I don't like this approach, it seems like redundancy for me and an API is confusing.
using union:
typedef union {
int i;
double d;
char c;
...
} any_type;
but this way I force user to both know about any_type union, and to create it before invocation of find. I would like to avoid that.
using variadic function: int find(void* list, ...). I like this approach. However, I am concerned about no restrictions on number of arguments. User is free to write int x = find(list, 1, 2.0, 'c') although I don't know what it should mean.
I have seen also answer to this question: C : send different structures for one function argument but it's irrelevant, because I want to accept non-pointer arguments.
What is the proper way of handling this function?
You could instead try implementing your function similar to a generic function like bsearch, which can perform a binary search on an array of any data type:
void *bsearch(const void *key, const void *base, size_t nmemb, size_t size,
int (*compar)(const void *, const void *))
Rather than hard-coding the different implementations for different data types inside your function, you instead pass a pointer to a function which will do the type-dependent operation, and only it knows the underlying implementation. In your case, that could be some sort of traversal/iteration function.
The other thing bsearch needs to know (apart from the obvious - search key and array length) is the size of each element in the array, so that it can calculate the address of each element in the array and pass it to the comparison function.
If you had a finite list of types that were to be operated on, there's nothing wrong with having a family of findX() functions. The above method requires a function for each data type to be passed to the bsearch function, however one of the main differences is that common functionality doesn't need to be repeated and the generic function can be used for any data type.
I wouldn't really say there's any proper way to do this, it's up to you and really depends on the problem you're trying to solve.
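As a rough sketch of the bsearch-style approach applied to a list: the struct list_node layout, list_find and cmp_int below are hypothetical names chosen for illustration, since the question does not show its list type.

```c
#include <stddef.h>

/* Hypothetical singly linked list node holding a pointer to each element. */
struct list_node {
    void *data;
    struct list_node *next;
};

/* Returns the zero-based index of the first element for which
 * cmp(element, key) == 0, or -1 if no element matches. */
int list_find(const struct list_node *head, const void *key,
              int (*cmp)(const void *, const void *))
{
    int i = 0;
    for (; head != NULL; head = head->next, i++)
        if (cmp(head->data, key) == 0)
            return i;
    return -1;
}

/* Comparison callback for int elements, in the style qsort/bsearch expect. */
int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}
```

With C99 compound literals a caller can still pass a literal without declaring a variable first, for example list_find(my_list, &(int){3}, cmp_int).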
I am not sure whether answering my own question is polite, but I want your opinion.
I tried to solve this problem using va_list. Why so? Because this way I can write only one function. Please, mind that I know what type the argument should be. This way I can do this:
int find(void* list, ...) {
any_type object = {0};
int i = -1;
va_list args;
va_start(args, list);
switch(type_of_elem(list)) {
case INT: object.i = va_arg(args, int); break;
...
}
/* now &object is a pointer to memory ready for comparison,
* e.g. using memcmp */
va_end(args);
return i;
}
The advantage of this solution is that I can wrap presented switch-case and reuse it with other functions.
After researching my concern about the unlimited number of arguments a little more, I realized that printf has no such limit either. You can write printf("%d", 1, 2, 3).
But I tweaked my solution with additional macro:
#define find_(list, object) find((list), (object))
Which produces error message at compile time, saying that find_ macro expects 2 arguments not 3.
What do you think about it? Do you think this is better solution than previously suggested?

How to check if a void* pointer can be safely cast to something else?

Let's say I have this function, which is part of some gui toolkit:
typedef struct _My_Struct My_Struct;
/* struct ... */
void paint_handler( void* data )
{
if ( IS_MY_STRUCT(data) ) /* <-- can I do something like this? */
{
My_Struct* str = (My_Struct*) data;
}
}
/* in main() */
My_Struct s;
signal_connect( SIGNAL_PAINT, &paint_handler, (void*) &s ); /* sent s as a void* */
Since the paint_handler will also be called by the GUI toolkit's main loop with other arguments, I cannot always be sure that the parameter I am receiving will always be a pointer to s.
Can I do something like IS_MY_STRUCT in the paint_handler function to check that the parameter I am receiving can be safely cast back to My_Struct* ?
Your void pointer loses all its type information, so from the pointer alone you cannot check whether it can be cast safely. It's up to the programmer to know whether a void* can be cast safely to a type.
Unfortunately there is no function to check what the pointer was before it appears in that context (void).
The one solution I can think of is if you place an int _struct_id as the first member of all of your structs. This id member can then be safely checked regardless of the type but this will fail if you pass pointers that don't implement this member (or int, char, ... pointers).
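A minimal sketch of that id-as-first-member idea, assuming every pointer handed to the handler really does point to a struct that starts with the int id. The enum values, the struct fields and the helper as_my_struct are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical ids; each struct type gets its own value. */
enum struct_id { ID_MY_STRUCT = 1, ID_OTHER_STRUCT = 2 };

typedef struct {
    int struct_id;      /* must be the first member */
    int width, height;
} My_Struct;

typedef struct {
    int struct_id;      /* must be the first member */
    double ratio;
} Other_Struct;

/* Returns data as My_Struct* if its id matches, otherwise NULL.
 * Only valid when data points to a struct whose first member is the id;
 * for any other pointer the read of the id is itself unsafe. */
My_Struct *as_my_struct(void *data)
{
    if (data != NULL && *(int *)data == ID_MY_STRUCT)
        return (My_Struct *)data;
    return NULL;
}
```

This works because a pointer to a struct, suitably converted, points to its first member. It does not protect against pointers to unrelated objects such as a plain char buffer.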
The best you could do would be to look at what data points to to see if it has telltale signs of being what you want, although a) it wouldn't be anywhere close to a guarantee and b) might be dangerous, as you don't know how big the thing data actually points to is. I suppose it isn't any more dangerous than just casting it and using it, but (as has been suggested) a redesign would be better.
If you are creating the type that is being used, you could include as part of the type some kind of identifying information that would help you rule out some void pointers as not being of the type you are looking for. While you would run the chance that some random area of memory would contain the same data or signature as what you are looking for, at least you would know when something was not the type you were looking for.
This approach requires that the struct be initialized so that the signature members, which are used to decide whether the memory area is valid, are set to the signature values.
An example:
typedef struct {
ULONG ulSignature1;
// .. data elements that you want to have
ULONG ulSignature2;
} MySignedStruct;
#define MYSIGNEDSTRUCT_01 0x1F2E3D4C
#define MYSIGNEDSTRUCT_02 0xF1E2D3C4
#define IS_MY_STRUCT(sAdr) ( (((MySignedStruct *)(sAdr))->ulSignature1 == MYSIGNEDSTRUCT_01 ) && (((MySignedStruct *)(sAdr))->ulSignature2 == MYSIGNEDSTRUCT_02))
This is kind of a rough approach however it can help. Naturally using a macro like IS_MY_STRUCT() where the argument is used twice can be problematic if the argument has a side effect so you would have to be careful of something like IS_MY_STRUCT(xStruct++) where xStruct is a pointer to a MySignedStruct.
There really isn't a way in C. Void pointers are typeless, and should only ever be cast when you truly know what they point to.
Perhaps you should instead reconsider your design; rewrite your code so that no inspection is necessary. This is the same reason google disallows RTTI in its style guide.
I know the question is 3 years old but here I go,
How about using a simple global enum to distinguish where the function is called from? Then you can switch on it to decide which type to cast the void pointer to.

Access struct members as if they are a single array?

I have two structures, with values that should compute a pondered average, like this simplified version:
typedef struct
{
int v_move, v_read, v_suck, v_flush, v_nop, v_call;
} values;
typedef struct
{
int qtt_move, qtt_read, qtt_suck, qtd_flush, qtd_nop, qtt_call;
} quantities;
And then I use them to calculate:
average = v_move*qtt_move + v_read*qtt_read + v_suck*qtt_suck + v_flush*qtd_flush + v_nop*qtd_nop + v_call*qtt_call;
Every now and then I need to include another variable. Now, for instance, I need to include v_clean and qtt_clean. I can't change the structures to arrays:
typedef struct
{
int v[6];
} values;
typedef struct
{
int qtt[6];
} quantities;
That would simplify my work a lot, but they are part of an API that needs the variable names to be clear.
So, I'm looking for a way to access the members of that structures, maybe using sizeof(), so I can treat them as an array, but still keep the API unchangeable. It is guaranteed that all values are int, but I can't guarantee the size of an int.
Writing the question came to my mind... Can a union do the job? Is there another clever way to automatize the task of adding another member?
Thanks,
Beco
What you are trying to do is not possible to do in any elegant way. It is not possible to reliably access consecutive struct members as an array. The currently accepted answer is a hack, not a solution.
The proper solution would be to switch to an array, regardless of how much work it is going to require. If you use enum constants for array indexing (as @digEmAll suggested in his now-deleted answer), the names and the code will be as clear as what you have now.
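A small sketch of the array-plus-enum layout, assuming the API could be changed. The OP_* names and weighted_sum are invented here for illustration.

```c
/* One enum constant per field; OP_COUNT is always the number of fields. */
enum op { OP_MOVE, OP_READ, OP_SUCK, OP_FLUSH, OP_NOP, OP_CALL, OP_COUNT };

typedef struct { int v[OP_COUNT]; } values;
typedef struct { int qtt[OP_COUNT]; } quantities;

/* Sum of value * quantity over all fields. */
int weighted_sum(const values *v, const quantities *q)
{
    int sum = 0;
    for (int i = 0; i < OP_COUNT; i++)
        sum += v->v[i] * q->qtt[i];
    return sum;
}
```

Individual fields stay readable, e.g. q.qtt[OP_READ], and adding OP_CLEAN before OP_COUNT extends both arrays and the loop without further changes.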
If you still don't want to or can't switch to an array, the only more-or-less acceptable way to do what you are trying to do is to create an "index-array" or "map-array" (see below). C++ has a dedicated language feature that helps one to implement it elegantly: pointers-to-members. In C you are forced to emulate that C++ feature using the offsetof macro from <stddef.h>:
static const size_t values_offsets[] = {
offsetof(values, v_move),
offsetof(values, v_read),
offsetof(values, v_suck),
/* and so on */
};
static const size_t quantities_offsets[] = {
offsetof(quantities, qtt_move),
offsetof(quantities, qtt_read),
offsetof(quantities, qtt_suck),
/* and so on */
};
And if now you are given
values v;
quantities q;
and index
int i;
you can generate the pointers to individual fields as
int *pvalue = (int *) ((char *) &v + values_offsets[i]);
int *pquantity = (int *) ((char *) &q + quantities_offsets[i]);
*pvalue += *pquantity;
Of course, you can now iterate over i in any way you want. This is also far from being elegant, but at least it bears some degree of reliability and validity, as opposed to any ugly hack. The whole thing can be made to look more elegant by wrapping the repetitive pieces into appropriately named functions/macros.
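Put together, the offset tables can drive the whole computation from the question. This is only a sketch: the struct definitions are copied from the question, while FIELD_COUNT and weighted_sum are names assumed here.

```c
#include <stddef.h>

typedef struct {
    int v_move, v_read, v_suck, v_flush, v_nop, v_call;
} values;

typedef struct {
    int qtt_move, qtt_read, qtt_suck, qtd_flush, qtd_nop, qtt_call;
} quantities;

/* Offsets of corresponding fields, listed in the same order in both tables. */
static const size_t values_offsets[] = {
    offsetof(values, v_move),  offsetof(values, v_read),
    offsetof(values, v_suck),  offsetof(values, v_flush),
    offsetof(values, v_nop),   offsetof(values, v_call),
};
static const size_t quantities_offsets[] = {
    offsetof(quantities, qtt_move),  offsetof(quantities, qtt_read),
    offsetof(quantities, qtt_suck),  offsetof(quantities, qtd_flush),
    offsetof(quantities, qtd_nop),   offsetof(quantities, qtt_call),
};

#define FIELD_COUNT (sizeof values_offsets / sizeof values_offsets[0])

/* Sum of value * quantity over every field listed in the tables. */
int weighted_sum(const values *v, const quantities *q)
{
    int sum = 0;
    for (size_t i = 0; i < FIELD_COUNT; i++) {
        const int *pv = (const int *)((const char *)v + values_offsets[i]);
        const int *pq = (const int *)((const char *)q + quantities_offsets[i]);
        sum += *pv * *pq;
    }
    return sum;
}
```

Adding v_clean and qtt_clean then means adding one struct member and one table entry on each side; the loop itself does not change.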
If all members are guaranteed to be of type int, you can use a pointer to int and index from it:
values v; /* instances of the two structs, assumed to be filled in elsewhere */
quantities q;
int *value = &v.v_move;
int *quantity = &q.qtt_move;
size_t i;
average = 0;
// although it should work, a good practice many times IMHO is to add a sentinel value as the last member of the struct and loop until quantity[i] equals it.
for (i = 0; i < sizeof(quantities) / sizeof(*quantity); i++)
average += value[i] * quantity[i];
(Since the order of members in a struct is guaranteed to be as declared)
Writing the question came to my mind... Can a union do the job? Is there another clever way to automatize the task of adding another member?
Yes, a union can certainly do the job:
union
{
values v; /* As defined by OP */
int array[6];
} u;
You can use a pointer to u.v in your API, and work with u.array in your code.
Personally, I think that all the other answers break the rule of least surprise. When I see a plain struct definition, I assume that the structure will be accessed using normal access methods. With a union, it's clear that the application will access it in special ways, which prompts me to pay extra attention to the code.
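A sketch of the union with a compile-time guard added. The guard is an addition, not part of the answer above: it assumes a C11 compiler for _Static_assert and simply refuses to compile if the struct has padding, in which case the array view would not line up with the members. The names values_view and sum_values are hypothetical.

```c
typedef struct {
    int v_move, v_read, v_suck, v_flush, v_nop, v_call;
} values;

/* Same storage seen either as named members or as an array. */
union values_view {
    values v;
    int array[6];
};

/* Compilation fails if the compiler inserted padding into the struct. */
_Static_assert(sizeof(values) == 6 * sizeof(int), "values contains padding");

/* Sums all fields through the array view. */
int sum_values(const union values_view *u)
{
    int sum = 0;
    for (int i = 0; i < 6; i++)
        sum += u->array[i];
    return sum;
}
```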
It really sounds as if this should have been an array from the beginning, with accessor methods or macros enabling you to still use pretty names like move, read, etc. However, as you mentioned, this isn't feasible due to API breakage.
The two solutions that come to my mind are:
Use a compiler specific directive to ensure that your struct is packed (and thus, that casting it to an array is safe)
Evil macro black magic.
How about using __attribute__((packed)) if you are using gcc?
So you could declare your structures as:
typedef struct
{
int v_move, v_read, v_suck, v_flush, v_nop, v_call;
} __attribute__((packed)) values;
typedef struct
{
int qtt_move, qtt_read, qtt_suck, qtd_flush, qtd_nop, qtt_call;
} __attribute__((packed)) quantities;
According to the gcc manual, your structures will then use the minimum amount of memory possible for storing the structure, omitting any padding that might have normally been there. The only issue would then be to determine the sizeof(int) on your platform which could be done through either some compiler macros or using <stdint.h>.
One more thing is that there will be a performance penalty for unpacking and re-packing the structure when it needs to be accessed and then stored back into memory. But at least you can be assured then that the layout is consistent, and it could be accessed like an array using a cast to a pointer type like you were wanting (i.e., you won't have to worry about padding messing up the pointer offsets).
Thanks,
Jason
This problem is common, and has been solved in many ways in the past. None of them is completely safe or clean. It depends on your particular application. Here's a list of possible solutions:
1) You can redefine your structures so fields become array elements, and use macros to map each particular element as if it was a structure field. E.g:
struct values { int varray[6]; };
#define v_read varray[1]
The disadvantage of this approach is that most debuggers don't understand macros. Another problem is that in theory a compiler could choose a different alignment for the original structure and the redefined one, so binary compatibility is not guaranteed.
2) Count on the compiler's behaviour and treat all the fields as if they were array elements (oops, while I was writing this, someone else wrote the same - +1 for him)
3) create a static array of element offsets (initialized at startup) and use them to "map" the elements. It's quite tricky, and not so fast, but has the advantage that it's independent of the actual disposition of the field in the structure. Example (incomplete, just for clarification):
int positions[10];
positions[0] = ((char *)(&((values*)NULL)->v_move)-(char *)NULL);
positions[1] = ((char *)(&((values*)NULL)->v_read)-(char *)NULL);
//...
values *v = ...;
int vread;
vread = *(int *)(((char *)v)+positions[1]);
Ok, not at all simple. Macros like "offsetof" may help in this case.

How to let a user to create a function? [ library ? ]

Is there any free library to let a user easily build a C mathematical expression, which can be used like any other function? I mean c expression/function which could be as quick as 'inline' mathematical expression and could be used many times in program.
I think it can be done in C somehow, but can anybody tell me whether it would also be feasible if it has to be a CUDA device function?
There are a few options. I assume you want something that the user can "call" several times, like this:
void *s = make_func("2 * x + 7");
...
printf("%lf\n", call_func(s, 3.0)); // prints 13
...
printf("%lf\n", call_func(s, 5.0)); // prints 17
...
free_func(s);
One option is to implement this as a recursive structure holding function pointers and constants. Something like:
enum item_type { VAR, CONST, FUNC };
struct var {
enum item_type type;
int id;
};
struct constant {
enum item_type type;
double value;
};
struct func {
enum item_type type;
double (*func)(double, double);
enum item_type *a, *b; /* each points to the type field of a child node */
};
Then make_func would parse the above string into something like:
(struct func *){ FUNC, &plus,
(struct func *){ FUNC, &times,
(struct constant *){ CONST, 2 },
(struct var *){ VAR, 'x' } },
(struct constant *){ CONST, 7 } }
If you can understand that - the enum item_type pointers in the struct func are used to point to the next node in the tree (or rather, to the first element of that node, which is the enum), and the enum is what our code uses to find out what the item type is. Then, when we use the call_func(void *, ...) function, it counts how many variables there are - this is how many extra arguments call_func should have been passed - then replaces the variables with the values we've called it with, then does the calculations.
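The evaluation step can be sketched as below. This is not the answer's exact layout: it assumes a small common header struct (struct item) as the first member of every node instead of a bare enum, takes the variable values as an array rather than varargs, and builds the tree for "2 * x + 7" by hand, since parsing is out of scope. The names eval, op_plus, op_times and demo are hypothetical.

```c
#include <stddef.h>

enum item_type { VAR, CONST, FUNC };

/* Common header; every node type starts with it. */
struct item { enum item_type type; };

struct var      { struct item base; int id; };        /* id indexes vars[] */
struct constant { struct item base; double value; };
struct func {
    struct item base;
    double (*fn)(double, double);
    const struct item *a, *b;                         /* child nodes */
};

static double op_plus(double x, double y)  { return x + y; }
static double op_times(double x, double y) { return x * y; }

/* Recursively evaluates the tree; vars[id] supplies each variable's value. */
double eval(const struct item *it, const double *vars)
{
    switch (it->type) {
    case VAR:
        return vars[((const struct var *)it)->id];
    case CONST:
        return ((const struct constant *)it)->value;
    case FUNC: {
        const struct func *f = (const struct func *)it;
        return f->fn(eval(f->a, vars), eval(f->b, vars));
    }
    }
    return 0.0; /* unreachable for well-formed trees */
}

/* Hand-built tree for "2 * x + 7"; a real make_func would build it by parsing. */
double demo(double x)
{
    struct constant two   = { { CONST }, 2.0 };
    struct constant seven = { { CONST }, 7.0 };
    struct var vx         = { { VAR }, 0 };
    struct func mul = { { FUNC }, op_times, &two.base, &vx.base };
    struct func add = { { FUNC }, op_plus,  &mul.base, &seven.base };
    return eval(&add.base, &x);
}
```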
The other option (which will probably be considerably faster and easier to extend) is to use something like libjit to do most of that work for you. I've never used it, but a JIT compiler gives you some basic building blocks (like add, multiply, etc. "instructions") that you can string together as you need them, and it compiles them down to actual assembly code (so no going through a constructed syntax tree calling function pointers like we had to before) so that when you call it it's as fast and dynamic as possible.
I don't know libjit's API, but it looks easily capable of doing what you seem to need. The make_func and free_func could all be pretty much the same as they are above (you might have to alter your calls to call_func) and would basically construct, use, and destroy a JIT object based on how it parses the user's string. The same as above, really, but you wouldn't need to define the syntax tree, data types, etc. yourself.
Hope that is somewhat helpful.
libtcc (from TCC) can be used as a very small and fast JIT; see libtcc_test.c for sample usage.

union versus void pointer

What would be the differences between using simply a void* as opposed to a union? Example:
struct my_struct {
short datatype;
void *data;
};
struct my_struct {
short datatype;
union {
char* c;
int* i;
long* l;
};
};
Both of those can be used to accomplish the exact same thing, is it better to use the union or the void* though?
I had exactly this case in our library. We had a generic string mapping module that could use different sizes for the index, 8, 16 or 32 bit (for historic reasons). So the code was full of code like this:
if(map->idxSiz == 1)
return ((BYTE *)map->idx)[Pos] = ...whatever
else
if(map->idxSiz == 2)
return ((WORD *)map->idx)[Pos] = ...whatever
else
return ((LONG *)map->idx)[Pos] = ...whatever
There were 100 lines like that. As a first step, I changed it to a union and I found it to be more readable.
switch(map->idxSiz) {
case 1: return map->idx.u8[Pos] = ...whatever
case 2: return map->idx.u16[Pos] = ...whatever
case 3: return map->idx.u32[Pos] = ...whatever
}
This allowed me to see more clearly what was going on. I could then decide to completely remove the idxSiz variants using only 32-bit indexes. But this was only possible once the code got more readable.
PS: That was only a minor part of our project which is about several 100’000 lines of code written by people who do not exist any more. The changes to the code have to be gradual, in order not to break the applications.
Conclusion: Even if people are less used to the union variant, I prefer it because it can make the code much lighter to read. On big projects, readability is extremely important, even if it is just you yourself, who will read the code later.
Edit: Added the comment, as comments do not format code:
The change to switch came before (this is now the real code as it was)
switch(this->IdxSiz) {
case 2: ((uint16_t*)this->iSort)[Pos-1] = (uint16_t)this->header.nUz; break;
case 4: ((uint32_t*)this->iSort)[Pos-1] = this->header.nUz; break;
}
was changed to
switch(this->IdxSiz) {
case 2: this->iSort.u16[Pos-1] = this->header.nUz; break;
case 4: this->iSort.u32[Pos-1] = this->header.nUz; break;
}
I shouldn't have combined all the beautification I did in the code and only show that step. But I posted my answer from home where I had no access to the code.
In my opinion, the void pointer and explicit casting is the better way, because it is obvious for every seasoned C programmer what the intent is.
Edit to clarify: If I see the said union in a program, I would ask myself if the author wanted to restrict the types of the stored data. Perhaps some sanity checks are performed which make sense only on integral number types.
But if I see a void pointer, I directly know that the author designed the data structure to hold arbitrary data. Thus I can use it for newly introduced structure types, too.
Note that it could be that I cannot change the original code, e.g. if it is part of a 3rd party library.
It's more common to use a union to hold actual objects rather than pointers.
I think most C developers that I respect would not bother to union different pointers together; if a general-purpose pointer is needed, just using void * certainly is "the C way". The language sacrifices a lot of safety in order to allow you to deliberately alias the types of things; considering what we have paid for this feature we might as well use it when it simplifies the code. That's why the escapes from strict typing have always been there.
The union approach requires that you know a priori all the types that might be used. The void * approach allows storing data types that might not even exist when the code in question is written (though doing much with such an unknown data type can be tricky, such as requiring passing a pointer to a function to be invoked on that data instead of being able to process it directly).
Edit: Since there seems to be some misunderstanding about how to use an unknown data type: in most cases, you provide some sort of "registration" function. In a typical case, you pass in pointers to functions that can carry out all the operations you need on an item being stored. It generates and returns a new index to be used for the value that identifies the type. Then when you want to store an object of that type, you set its identifier to the value you got back from the registration, and when the code that works with the objects needs to do something with that object, it invokes the appropriate function via the pointer you passed in. In a typical case, those pointers to functions will be in a struct, and it'll simply store (pointers to) those structs in an array. The identifier value it returns from registration is just the index into the array of those structs where it has stored this particular one.
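A rough sketch of such a registration scheme. All names here (type_ops, register_type, registry, item, item_compare, compare_int, int_ops) and the choice of operations are assumptions made for illustration.

```c
#include <stddef.h>

/* Operations the container needs for one stored type (hypothetical set). */
struct type_ops {
    int  (*compare)(const void *a, const void *b);
    void (*destroy)(void *obj);      /* may be NULL if nothing to release */
};

#define MAX_TYPES 32
static const struct type_ops *registry[MAX_TYPES];
static int registry_count;

/* Stores the ops table and returns its index as the new type identifier,
 * or -1 if the registry is full. */
int register_type(const struct type_ops *ops)
{
    if (registry_count == MAX_TYPES)
        return -1;
    registry[registry_count] = ops;
    return registry_count++;
}

/* A stored object: the identifier from registration plus the data pointer. */
struct item {
    int type_id;
    void *data;
};

/* Dispatches through the registered table; assumes both items share a type. */
int item_compare(const struct item *a, const struct item *b)
{
    return registry[a->type_id]->compare(a->data, b->data);
}

/* Example ops table for int, registered by the user of the container. */
int compare_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

const struct type_ops int_ops = { compare_int, NULL };
```

The container code only ever sees type_id and void *data; a type written years later can be added by calling register_type with its own table.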
Although unions are not commonly used nowadays, a union suits your scenario well because it is more explicit about what is stored. In the first code sample, it is not clear what data contains.
My preference would be to go the union route. The cast from void* is a blunt instrument and accessing the datum through a properly typed pointer gives a bit of extra safety.
Toss a coin. Union is more commonly used with non-pointer types, so it looks a bit odd here. However the explicit type specification it provides is decent implicit documentation. void* would be fine so long as you always know you're only going to access pointers. Don't start putting integers in there and relying on sizeof(void*) == sizeof (int).
I don't feel like either way has any advantage over the other in the end.
It's a bit obscured in your example, because you're using pointers and hence indirection. But union certainly does have its advantages.
Imagine:
struct my_struct {
short datatype;
union {
char c;
int i;
long l;
};
};
Now you don't have to worry about where the allocation for the value part comes from. No separate malloc() or anything like that. And you might find that accesses to ->c, ->i, and ->l are a bit faster. (Though this might only make a difference if there are lots of these accesses.)
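A short sketch of using such a value-holding union with the datatype field as the tag. The DT_* constants and my_struct_format are hypothetical, and the union is given a member name (u) so the code also compiles as C99, where anonymous unions are not standard.

```c
#include <stdio.h>

/* Hypothetical tag values for the datatype field. */
enum datatype { DT_CHAR, DT_INT, DT_LONG };

struct my_struct {
    short datatype;
    union {
        char c;
        int i;
        long l;
    } u;                 /* named member for C99 portability */
};

/* Writes the stored value as text into buf; returns snprintf's result,
 * or -1 for an unknown tag. */
int my_struct_format(const struct my_struct *s, char *buf, size_t n)
{
    switch (s->datatype) {
    case DT_CHAR: return snprintf(buf, n, "%c", s->u.c);
    case DT_INT:  return snprintf(buf, n, "%d", s->u.i);
    case DT_LONG: return snprintf(buf, n, "%ld", s->u.l);
    }
    return -1;
}
```

The value lives inside the struct itself, so copying a struct my_struct copies the value too and there is nothing to free.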
It really depends on the problem you're trying to solve. Without that context it's really impossible to evaluate which would be better.
For example, if you're trying to build a generic container like a list or a queue that can handle arbitrary data types, then the void pointer approach is preferable. OTOH, if you're limiting yourself to a small set of primitive data types, then the union approach can save you some time and effort.
If you build your code with -fstrict-aliasing (gcc) or similar options on other compilers, then you have to be very careful with how you do your casting. You can cast a pointer as much as you want, but when you dereference it, the pointer type that you use for the dereference must match the original type (with some exceptions). You can't for example do something like:
void foo(void * p)
{
short * pSubSetOfInt = (short *)p ;
*pSubSetOfInt = 0xFFFF ;
}
void goo()
{
int intValue = 0 ;
foo( &intValue ) ;
printf( "0x%X\n", intValue ) ;
}
Don't be surprised if this prints 0 (say) instead of 0xFFFF or 0xFFFF0000 as you may expect when building with optimization. One way to make this code work is to do the same thing using a union, and the code will probably be easier to understand too.
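A sketch of the union variant, using unsigned types to keep the conversions well defined. The union name int_view and its member names are assumptions; which half of the int ends up set still depends on the platform's byte order.

```c
/* The same bytes viewed as one unsigned int or as an array of unsigned shorts. */
union int_view {
    unsigned int whole;
    unsigned short halves[sizeof(unsigned int) / sizeof(unsigned short)];
};

/* Writes through the union member, so the compiler knows the int may change. */
void foo(union int_view *p)
{
    p->halves[0] = 0xFFFF;
}

/* Returns the int after the partial write; never 0, value depends on endianness. */
unsigned int goo(void)
{
    union int_view v = { 0 };
    foo(&v);
    return v.whole;
}
```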
The union reserves enough space for its largest member, and the members don't have to be the same size. A void* has a fixed size, whereas a union can be of arbitrary size.
#include <stdio.h>
#include <stdlib.h>
struct m1 {
union {
char c[100];
};
};
struct m2 {
void * c;
};
int
main()
{
printf("sizeof m1 is %zu ",sizeof(struct m1));
printf("sizeof m2 is %zu",sizeof(struct m2));
exit(EXIT_SUCCESS);
}
Output (on a platform with 4-byte pointers):
sizeof m1 is 100 sizeof m2 is 4
EDIT: assuming you only use pointers of the same size as void*, I think the union is better, as you will gain a bit of error detection when trying to set .c with an integer pointer, etc.
void*, unless you're creating your own allocator, is definitely quick and dirty, for better or for worse.
