C - Hashing Struct with n amount of unsigned int properties - c

I have a struct:
struct A
{
unsigned int a, b, c, d, ...
};
I want to make a function:
unsigned int A_hash(const struct A* const var)
{
return ...
}
The number returned needs to be very very large as modulus for HashTable insertion will not work properly if A_hash(var) < myHashTable.capacity.
I've seen questions like this before, such as "Hash function that takes in two integers", "hash function that takes in five integers", etc., but what about n integers? I'm looking for a more general algorithm for decent hashing. It doesn't need to be enterprise-level.
I was thinking perhaps start with a massive number like
return (0x7FFFFFFFF & a) + (0x7FFFFFFFF & b) + ...
but I don't think this will be good enough. I also don't know how to stop the A_hash function from overflowing, but that may be another problem altogether.

I think you are implicitly asking how to treat the entire object as one long byte stream, as @bruceg explained. If I'm wrong, you might as well ignore this answer, because that is what I will address. Note that this solution applies not only to hashing, but to anything that requires you to treat data as bytes (such as copying from/writing to memory or files).
I think what you are looking for is simply reading the object byte by byte. For this you can take inspiration from std::ostream::write (which is a C++ method, though). For example, you could write A_hash in such a way that you could invoke it like this:
int hash = A_hash((char*)&a, sizeof(a)); // where 'a' is of type 'struct A'.
You could write A_hash, for example, like this:
unsigned int A_hash(const char* data, unsigned int dataSize)
{
unsigned int hash = someValue;
for (unsigned int i = 0; i < dataSize; ++i)
{
char byte = data[i];
hash = doSomethingWith(hash, byte); // mix the current byte into the running hash
}
return hash;
}
The great advantage of this method is that you don't need to rewrite the function if you add/remove fields to your struct ; sizeof(A) will expand/reduce at compile-time. The other great advantage is that it works for any value, so you can reuse that function with any type you want, including int, another struct, an enum, a pointer, ...
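As a hedged illustration of what someValue and doSomethingWith could be, here is a minimal sketch that uses the 32-bit FNV-1a hash. FNV-1a is a choice made for this example, not something the answer prescribes. The names hash_bytes and the four-field struct A are assumptions. Unsigned arithmetic wraps around, so overflow is not a problem. The sketch also assumes the struct has no padding bytes, which holds in practice for a struct made only of unsigned int fields; padding bytes in mixed-type structs have unspecified values and would make the hash unreliable.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
   Unsigned arithmetic wraps modulo 2^32, so overflow is well defined. */
uint32_t hash_bytes(const void *data, size_t size)
{
    const unsigned char *bytes = data;
    uint32_t hash = 2166136261u;        /* FNV offset basis */
    for (size_t i = 0; i < size; ++i) {
        hash ^= bytes[i];
        hash *= 16777619u;              /* FNV prime */
    }
    return hash;
}

/* Hypothetical four-field version of the struct from the question. */
struct A { unsigned int a, b, c, d; };

uint32_t A_hash(const struct A *var)
{
    /* Assumes struct A contains no padding bytes. */
    return hash_bytes(var, sizeof *var);
}
```

A bucket index would then be A_hash(&x) % capacity. The modulo yields a valid index whether or not the hash value is larger than the capacity.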


In C program, how to dynamically select data type

I have encountered a problem, and I haven't found an answer in the internet and forums. I hope you can help.
There is an interface requirement to update the LOG parameter. The interface passes in the parameter index and the value to be added, and the interface adds the value to the corresponding parameter. The interface requires simple implementation and no complicated judgments.
My idea is to create a mapping table that records the starting address and data type of each parameter. When the interface is called, the parameter address is obtained according to the parameter index and forced to be converted to the corresponding type, and then the addition operation is performed.
The problem with this solution is that the increase_log_info function is too complicated. How can the increase_log_info function be simplified in C? How can parameter types be mapped directly, rather than through an if...else condition?
Thanks in advance.
Note:
T_LOG_INFO is data structure definition and cannot be changed.
LOG_INFO_INCREASE is an update parameter interface provided
externally and cannot be changed.
Other codes can be changed.
#pragma once
/********************************can not be change, begin**********************************/
/* T_LOG_INFO is data structure definition and cannot be changed. */
typedef struct
{
unsigned short tx_num;
unsigned int tx_bytes;
unsigned short rx_num;
unsigned int rx_bytes;
unsigned char discard_num;
unsigned int discard_bytes;
// There are many parameters behind, not listed
}T_LOG_INFO;
T_LOG_INFO g_log_info;
/* This macro is called very frequently, and efficiency needs to be considered.
** LOG_INFO_INCREASE is an update parameter interface provided externally and cannot be changed. */
//#define LOG_INFO_INCREASE(para_idx, inc_val)
/********************************can not be change, end**********************************/
/********************************an alternative, begin**********************************/
enum
{
LOG_PARA_IDX_TX_NUM,
LOG_PARA_IDX_TX_BYTES,
LOG_PARA_IDX_RX_NUM,
LOG_PARA_IDX_RX_BYTES,
LOG_PARA_IDX_DISCARD_NUM,
LOG_PARA_IDX_DISCARD_BYTES,
LOG_PARA_IDX_MAX
};
enum
{
DATA_TYPE_U8,
DATA_TYPE_U16,
DATA_TYPE_U32
};
typedef struct
{
/* Indicates the offset of this parameter in the structure. */
unsigned char offset;
/* Indicates the data type of the parameter. */
unsigned char data_type;
}T_PARA_MAPPING;
/* This table can also be calculated during system initialization. */
T_PARA_MAPPING g_para_mapping_table[LOG_PARA_IDX_MAX] =
{
{0, DATA_TYPE_U16}, // LOG_PARA_IDX_TX_NUM
{4, DATA_TYPE_U32}, // LOG_PARA_IDX_TX_BYTES
{8, DATA_TYPE_U16}, // LOG_PARA_IDX_RX_NUM
{12, DATA_TYPE_U32}, // LOG_PARA_IDX_RX_BYTES
{16, DATA_TYPE_U8}, // LOG_PARA_IDX_DISCARD_NUM
{20, DATA_TYPE_U32} // LOG_PARA_IDX_DISCARD_BYTES
};
/* How to simplify the function??? especially to remove the judgment. */
static inline void increase_log_info(unsigned int para_idx, unsigned inc_val)
{
unsigned int data_type = g_para_mapping_table[para_idx].data_type;
/* Get the parameter address and cast it to the corresponding type pointer before adding. */
if (data_type == DATA_TYPE_U8)
{
*((unsigned char*)(((unsigned char*)&g_log_info) + g_para_mapping_table[para_idx].offset)) += inc_val;
}
else if (data_type == DATA_TYPE_U16)
{
*((unsigned short*)(((unsigned char*)&g_log_info) + g_para_mapping_table[para_idx].offset)) += inc_val;
}
else
{
*((unsigned int*)(((unsigned char*)&g_log_info) + g_para_mapping_table[para_idx].offset)) += inc_val;
}
}
/* This macro is called very frequently, and efficiency needs to be considered. */
#define LOG_INFO_INCREASE(para_idx, inc_val) increase_log_info(para_idx, inc_val)
/********************************an alternative, end**********************************/
/********************************test case, begin**********************************/
void increase_log_info_test()
{
LOG_INFO_INCREASE(LOG_PARA_IDX_TX_NUM, 1);
LOG_INFO_INCREASE(LOG_PARA_IDX_TX_NUM, 2);
LOG_INFO_INCREASE(LOG_PARA_IDX_TX_NUM, 3);
LOG_INFO_INCREASE(LOG_PARA_IDX_RX_BYTES, 10);
LOG_INFO_INCREASE(LOG_PARA_IDX_RX_BYTES, 20);
LOG_INFO_INCREASE(LOG_PARA_IDX_RX_BYTES, 30);
}
/********************************test case, end**********************************/
Quick answer with, maybe, syntax errors. But I hope the idea can be grasped.
I would prepare an array with a "datatype" for every member of the T_LOG_INFO struct:
{
unsigned short tx_num;
unsigned int tx_bytes;
unsigned short rx_num;
...
}
Copy/paste the above struct and, with a lot of editing, the array would be declared like this:
const char datatypes[LOG_PARA_IDX_MAX] = {
/* unsigned short tx_num; */ 2,
/* unsigned int tx_bytes; */ 4,
/* unsigned short rx_num; */ 2,
...
};
Out of laziness, I used numbers like 2, 4 and so on. They mainly indicate the length, but they can carry other information (20 = array of 20 chars, ...).
Then I would declare another data structure (the final one):
struct datadesc {
unsigned char *addr; /* address of the member inside g_log_info */
char kind; /* entry copied from datatypes[] */
} datadesc_array[LOG_PARA_IDX_MAX];
Prepare the table in code (the program itself) with:
unsigned char *address = (unsigned char *)&g_log_info;
for (int i=0; i<LOG_PARA_IDX_MAX; i++) {
datadesc_array[i].addr = address;
datadesc_array[i].kind = datatypes[i];
address += datatypes[i]; // only this simple if datatypes[] holds plain lengths and the struct has no padding between members
}
At this point, when you receive a command, you do:
unsigned char *param_addr = datadesc_array[para_idx].addr;
char param_kind = datadesc_array[para_idx].kind;
switch (param_kind) {
case 2: // unsigned short
*(unsigned short*) param_addr = ...
break;
case 4: // unsigned int
*(unsigned int*) param_addr = ...
break;
}
This way you have a reduced set of cases, just one for every data type you cope with. The only long work is done while preparing the datatypes[] array.
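Adding lengths together to find each member's address ignores any alignment padding the compiler inserts. A hedged alternative is to let the compiler supply both offset and size through offsetof and sizeof. The sketch below assumes only the six members shown in the question; para_desc, PARA_DESC and g_para_desc are hypothetical names. It also assumes unsigned char, unsigned short and unsigned int have three distinct sizes, as on common platforms.

```c
#include <stddef.h>

/* First six members of T_LOG_INFO, as listed in the question. */
typedef struct {
    unsigned short tx_num;
    unsigned int   tx_bytes;
    unsigned short rx_num;
    unsigned int   rx_bytes;
    unsigned char  discard_num;
    unsigned int   discard_bytes;
} T_LOG_INFO;

T_LOG_INFO g_log_info;

enum {
    LOG_PARA_IDX_TX_NUM, LOG_PARA_IDX_TX_BYTES,
    LOG_PARA_IDX_RX_NUM, LOG_PARA_IDX_RX_BYTES,
    LOG_PARA_IDX_DISCARD_NUM, LOG_PARA_IDX_DISCARD_BYTES,
    LOG_PARA_IDX_MAX
};

/* Hypothetical descriptor: where a member lives and how wide it is. */
typedef struct {
    size_t offset;
    size_t size;
} para_desc;

/* The compiler computes offset and size, so padding is handled for us. */
#define PARA_DESC(member) \
    { offsetof(T_LOG_INFO, member), sizeof(((T_LOG_INFO *)0)->member) }

static const para_desc g_para_desc[LOG_PARA_IDX_MAX] = {
    PARA_DESC(tx_num),  PARA_DESC(tx_bytes),
    PARA_DESC(rx_num),  PARA_DESC(rx_bytes),
    PARA_DESC(discard_num), PARA_DESC(discard_bytes)
};

void increase_log_info(unsigned int para_idx, unsigned int inc_val)
{
    unsigned char *addr =
        (unsigned char *)&g_log_info + g_para_desc[para_idx].offset;

    /* One case per width; assumes the three widths differ. */
    switch (g_para_desc[para_idx].size) {
    case sizeof(unsigned char):  *addr += inc_val;                   break;
    case sizeof(unsigned short): *(unsigned short *)addr += inc_val; break;
    case sizeof(unsigned int):   *(unsigned int *)addr += inc_val;   break;
    }
}
```

The switch on width remains, but the table no longer needs manual updating when a member changes type.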
Basically, you can't. Data types in C are a purely static, compile-time construct. There's no such thing as a variable that holds a type or anything like that.
So you fundamentally can't avoid a chain of ifs, or else a switch, with the code for each different type written out separately. In principle you can avoid some of the repetition using macros, but that may actually end up being harder to read and understand.
The efficiency isn't so bad, though. A modern compiler is likely to handle an if chain in an efficient way, and switch might be even better.
Given this, your array of offsets and types may be unnecessary complexity. I would start with something much simpler, albeit longer:
void increase_log_info(unsigned int para_idx, unsigned inc_val) {
switch (para_idx) {
case LOG_PARA_IDX_TX_NUM:
g_log_info.tx_num += inc_val;
break;
case LOG_PARA_IDX_TX_BYTES:
g_log_info.tx_bytes += inc_val;
break;
// ...
}
}
It'll probably compile into a jump table. That's probably more efficient than what you have, as we don't have to keep accessing the mapping table somewhere else in memory and doing the corresponding address calculations. If it really can be proved to be too slow, you could consider some alternatives, but don't optimize prematurely!
This also has the advantage of being robust if the offset or types of any of the g_log_info members changes. In your code, you have to remember to manually update your mapping table, or else face very confusing bugs which the compiler will give you no help in detecting.
If you have an extremely large number of members, consider generating this function's C code with a script instead of by hand.
If this is inlined and called with a constant para_idx value, you can expect the compiler to propagate the constant and emit only the code to update the specific member in question.
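If writing the switch by hand is too repetitive and an external script feels heavy, one possible middle ground is an X-macro, which generates both the index enum and the switch cases from a single list. This is a sketch under assumptions: it covers only the six members shown in the question, and LOG_PARA_LIST, LOG_PARA_ENUM and LOG_PARA_CASE are hypothetical names.

```c
/* First six members of T_LOG_INFO, as listed in the question. */
typedef struct {
    unsigned short tx_num;
    unsigned int   tx_bytes;
    unsigned short rx_num;
    unsigned int   rx_bytes;
    unsigned char  discard_num;
    unsigned int   discard_bytes;
} T_LOG_INFO;

T_LOG_INFO g_log_info;

/* Single source of truth: X(index suffix, member name). */
#define LOG_PARA_LIST(X)            \
    X(TX_NUM,        tx_num)        \
    X(TX_BYTES,      tx_bytes)      \
    X(RX_NUM,        rx_num)        \
    X(RX_BYTES,      rx_bytes)      \
    X(DISCARD_NUM,   discard_num)   \
    X(DISCARD_BYTES, discard_bytes)

/* Expand the list into LOG_PARA_IDX_TX_NUM, LOG_PARA_IDX_TX_BYTES, ... */
#define LOG_PARA_ENUM(idx, member) LOG_PARA_IDX_##idx,
enum { LOG_PARA_LIST(LOG_PARA_ENUM) LOG_PARA_IDX_MAX };

/* Expand the list into one switch case per member. */
#define LOG_PARA_CASE(idx, member) \
    case LOG_PARA_IDX_##idx: g_log_info.member += inc_val; break;

static inline void increase_log_info(unsigned int para_idx, unsigned int inc_val)
{
    switch (para_idx) {
    LOG_PARA_LIST(LOG_PARA_CASE)
    default: break;   /* unknown index: ignore */
    }
}

#define LOG_INFO_INCREASE(para_idx, inc_val) increase_log_info(para_idx, inc_val)
```

Each member is accessed by name, so the compiler picks the right type for every += and a change to a member's type needs no table update.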

Casting Structs With Void Pointers into Structs With Typed Pointers

Short version:
Suppose I have two structs:
struct charPtrWithLen
{
size_t len;
char * charPtr;
};
struct voidPtrWithLen
{
size_t len;
void * voidPtr;
};
Is there a way to cast voidPtrWithLen into charPtrWithLen and vice-versa, or even better, implicitly convert one into the other, much the same way that a char * and a void * can be readily cast and implicitly converted between each other?
Put another way:
I am trying to write all my C so that all pointers to arrays bring their size information with them. I am also trying to write generic functions using void pointers where applicable to keep operations which are essentially identical, well, identical. I am looking for a way to pass the typed-pointer-containing 'sized-array' structs into the generic functions taking void-pointer-containing 'sized-array' arguments.
Long version, with involved example:
So, void pointers are wonderfully flexible, so I can do this:
int foo(void * ptr, size_t dataLen);
/* ... */
char * c;
size_t c_n;
/* ... */
foo(c, c_n);
/* ... */
int * i;
size_t i_n;
/* ... */
foo(i, i_n);
But since the pattern of "pointer to arbitrary length array, plus size there-of" is so common, suppose at some point I get tired of specifying my various functions in terms of pairs of arguments, pointer and length, and instead I start to code with such pairs encapsulated in a struct instead:
typedef struct
{
size_t v_n;
void * v;
}
pointerWithSize;
/* ... */
int foo(pointerWithSize);
So far so good. I can always assign my "char * c" or "int * i" into the pointerWithSize's "void * v" with minimal difficulty. But when you do this long enough, using the same pattern, you run into the following problem: Soon enough you have a bunch of general functions which work with the data agnostically, and are thus happy to take void pointers, for example things like:
pointerWithSize combinePointersWithSize(pointerWithSize p1, pointerWithSize p2);
int readFromStream(FILE * readFromHere, pointerWithSize * readIntoHere);
But you also end up with functions which are inherently intended for specific data types:
size_t countOccurancesOfChar(pointerWithSize str, char c);
int summate(pointerWithSize integers);
And then you end up with the annoyance of having to do casts inside the latter category of functions. E.g. you end up with stuff like this:
/* This inside countOccurancesOfChar */
if(((char * )str.v)[i] == c) {
/* ..or this inside summate: */
sum += ((int * )integers.v)[i];
So you get to a point where you have a lot of functions which operate specifically on "strings with size", and in all of those cases, you don't want to have to muck around with void pointers. So instead, in those cases you start doing stuff like this:
typedef struct
{
size_t v_n;
char * v;
}
stringWithSize;
/* ... */
size_t countOccurancesOfChar(stringWithSize str, char c);
int parseFormatting(stringWithSize str, struct someFormat_t foo);
Which is great, because now all the string related code doesn't need to be cluttered with casts. BUT, now I can't use my wonderful generic function combinePointersWithSize to concatenate my strings contained within the stringWithSize, in a way that's as syntactically clean, as I could if I was still writing my functions in terms of two separate arguments for each pointer-and-size pair.
To finish up the illustration:
pointerWithSize combinePointersWithSize(pointerWithSize p1, pointerWithSize p2);
void * combineAlternative(void * p1, size_t p_n1, void * p2, size_t p_n2);
/* ... */
stringWithSize a, b, c;
/* ... */
/* This doesn't work, incompatible types: */
c = combinePointersWithSize(a, b);
/* But this works, because char * can be passed into void * parameter. */
c.v_n = a.v_n + b.v_n;
c.v = combineAlternative(a.v, a.v_n, b.v, b.v_n); /* Works fine. */
Possible Solutions I've Considered:
1: Don't write my functions with those structs as arguments, instead write them with individual pair arguments. But this is a big part of what I want to avoid in the first place - I like the 'cleanness' and clarity of intent that having a size_t and a pointer bundled in one struct represents.
2: Do something like this:
stringWithSize a, b, c;
/* ... */
pointerWithSize d;
d = combinePointersWithSize((pointerWithSize){.v=a.v, .v_n=a.v_n}, (pointerWithSize){.v=b.v, .v_n=b.v_n});
/* and then do either this: */
c.v = d.v;
c.v_n = d.v_n;
foo(c);
/* ..or this: */
foo((stringWithSize){.v=d.v, .v_n=d.v_n});
..but I think most would agree, this is also as bad or worse as the original problem of casting within the library functions. On the surface it looks worse, because it offloads the casting burden to the client code instead of library code which can hopefully be fairly stable after being implemented/completed (incl. testing/etc). On the other hand, if you did keep every function defined in terms of the void * containing pointerWithSize, you could end up forcing similar casts to the kind you're doing inside your own functions, elsewhere in their code, and worse, you're losing the advantage of the compiler yelling at you, because now the code is carrying everything within the same pointerWithSize struct.
I'm also concerned about how many compilers out there have the ability to optimize the first of the two variants of this solution away (where 'd' serves as merely a temporary result holder).
3: Union-of-pointers. Instead of my prior pointerWithSize example, I would do:
typedef union
{
void * asVoid; /* member names cannot be keywords such as void, char or int */
char * asChar;
int * asInt;
/* ...and so on... */
}
rainbowPointer;
typedef struct
{
size_t v_n;
rainbowPointer v;
}
pointerWithSize;
At first glance this is almost good enough. However, I very frequently end up wanting to store arrays of some struct which is specific to the program I'm working on inside this "pointer with size" construct, and in those cases, a predefined union of pointer types would be useless to me, I'd still be right back at this problem.
4: I could write wrapper functions for each permuted pointer type. I could EVEN write function-like macros to define each of these pointer-with-size struct types, which would in the same swoop generate the wrapper functions. For example:
#define pointerWithSizeDef(T, name) \
typedef struct \
{ \
size_t v_n; \
T * v; \
} \
name; \
int foo_ ## name (name p1) \
{ \
/* generic function code defined in macro */ \
/* Or something like this: */ \
return foo((pointerWithSize){.v=p1.v, .v_n=p1.v_n}); \
}
/* Then, stuff like this: */
pointerWithSizeDef(char, stringWithSize)
My intuition is that sooner or later this method would become unwieldy.
5: If there is a mechanism with no performance impact, but which is unappealing otherwise, I could write my generic functions as function-like macros, which in turn invoke the underlying actual function:
int foo_actual(void * v, size_t v_n);
#define foo(p) \
foo_actual((p).v, (p).v_n)
..or even something like this, to replace casting syntax:
#define castToPointerWithSize(p) \
((pointerWithSize){.v=p.v, .v_n=p.v_n})
/* ... */
stringWithSize a;
foo(castToPointerWithSize(a));
But as these examples for possible-solution-#5 show, I can't actually think of a way to do this that wouldn't quickly become a possible problem (e.g. if someone wanted to place a function call which returned a pointerWithSize in place of 'p' in the above examples, you'd be running the function twice, and it wouldn't be at all obvious from the code).
So I don't think any of the solutions I've thought of are really sufficient for my usecase, so I'm hoping some of you know of some C syntax or mechanism I could take advantage of here to make it easy to cast/"cast" between two structs which are identical save for the pointer type of one of their members.
Firstly, any kind of "actual" casting isn't going to be allowed per the letter of the standard, because C makes no guarantee at all that all pointers have the same format. A cast from some arbitrary pointer type to a void pointer is allowed to involve a conversion of representation (that gets reversed when you cast it back in order to access the data), including possibly to a different size of pointer or a pointer existing in a separate address space. So a simple reinterpretation of a bit pattern to change pointer type is not safe; void*'s bit pattern isn't guaranteed to mean anything in particular, and the bit patterns of other types aren't guaranteed to be related in any particular way. (How many systems actually take advantage of this, I have no idea.)
Since the explicit conversion between void* and other types has to exist somewhere, using whole-value conversion is probably the safest idea. What you could do is define a macro to quickly and easily generate "cast functions" for you, e.g.:
#define GEN_CAST(NAME, FROM_TYPE, TO_TYPE) \
static inline TO_TYPE NAME(FROM_TYPE from) { \
return (TO_TYPE){ .v=from.v, .v_n=from.v_n }; \
}
GEN_CAST(s_to_v, stringWithSize, pointerWithSize)
GEN_CAST(v_to_s, pointerWithSize, stringWithSize)
...that you can then use in place of the cast operator in expressions:
stringWithSize a, b, c;
pointerWithSize d;
d = combinePointersWithSize(s_to_v(a), s_to_v(b));
foo(v_to_s(d));
A good compiler should recognise that on common platforms the conversion function is an identity operation, and remove it entirely.
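To make the idea concrete, here is a small self-contained sketch of the generated conversion functions in use. The struct definitions repeat the ones from the question; totalSize is a hypothetical stand-in for a generic function such as combinePointersWithSize, kept trivial so the example stays short.

```c
#include <stddef.h>

typedef struct { size_t v_n; void *v; } pointerWithSize;
typedef struct { size_t v_n; char *v; } stringWithSize;

/* Generates a function that converts member by member, letting the
   compiler perform the ordinary char * <-> void * conversion. */
#define GEN_CAST(NAME, FROM_TYPE, TO_TYPE)                     \
    static inline TO_TYPE NAME(FROM_TYPE from) {               \
        return (TO_TYPE){ .v = from.v, .v_n = from.v_n };      \
    }

GEN_CAST(s_to_v, stringWithSize, pointerWithSize)
GEN_CAST(v_to_s, pointerWithSize, stringWithSize)

/* Hypothetical generic function: total byte count of two buffers. */
size_t totalSize(pointerWithSize p1, pointerWithSize p2)
{
    return p1.v_n + p2.v_n;
}
```

Because the conversion is a real function, its argument is evaluated exactly once, which avoids the double-evaluation problem of the macro-only approach.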
You should be able to cast one to another by converting one to a pointer, casting it to a pointer of the other type, and dereferencing it. This will work in reverse too.
struct charPtrWithLen
{
size_t len;
char * charPtr;
};
struct voidPtrWithLen
{
size_t len;
void * voidPtr;
};
int main() {
struct charPtrWithLen cpwl = {.len = 6, .charPtr = "Hello"};
struct voidPtrWithLen vpwl = *(struct voidPtrWithLen *)&cpwl;
return 0;
}
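Note this will only work as long as the struct layout is the same for both structs. Strictly speaking, accessing an object through a pointer to an incompatible struct type also breaks C's aliasing rules, so the standard does not guarantee the result even when the layouts match.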

Why are function pointers useful?

So, I was looking over function pointers, and in the examples I have seen, particularly in this answer here, they seem rather redundant.
For example, if I have this code:
int addInt(int n, int m) {
return n+m;
}
int (*functionPtr)(int,int);
functionPtr = &addInt;
int sum = (*functionPtr)(2, 3); // sum == 5
It seems here that creating the function pointer has no purpose. Wouldn't it be easier just to do this?
int sum = addInt(2, 3); // sum == 5
If so, then why would you need to use them, so what purpose would they serve? (and why would you need to pass function pointers to other functions)
Simple examples of pointers seem similarly useless. It's when you start doing more complicated things that it helps. For example:
// Elsewhere in the code, there's a sum_without_safety function that blindly
// adds the two numbers, and a sum_with_safety function that validates the
// numbers before adding them.
int (*sum_function)(int, int);
if(needs_safety) {
sum_function = sum_with_safety;
}
else {
sum_function = sum_without_safety;
}
int sum = sum_function(2, 3);
Or:
// This is an array of functions. We'll choose which one to call based on
// the value of index.
int (*sum_functions[])(int, int) = { ...a bunch of different sum functions... };
int (*sum_function)(int, int) = sum_functions[index];
int sum = sum_function(2, 3);
Or:
// This is a poor man's object system. Each number struct carries a table of
// function pointers for various operations; you can look up the appropriate
// function and call it, allowing you to sum a number without worrying about
// exactly how that number is stored in memory.
struct number {
struct {
int (*sum)(struct number *, int);
int (*product)(struct number *, int);
...
} * methods;
void * data;
};
struct number * num = get_number();
int sum = num->methods->sum(num, 3);
The last example is basically how C++ does virtual member functions. Replace the methods struct with a hash table and you have Objective-C's method dispatch. Like variable pointers, function pointers let you abstract things in valuable ways that can make code much more compact and flexible. That power, though, isn't really apparent from the simplest examples.
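The array-of-functions case can be sketched as a small runnable dispatch table. All names here (add, subtract, multiply, binary_op, operations, apply_operation) are made up for the illustration.

```c
#include <stddef.h>

static int add(int a, int b)      { return a + b; }
static int subtract(int a, int b) { return a - b; }
static int multiply(int a, int b) { return a * b; }

/* A typedef keeps the array declaration readable. */
typedef int (*binary_op)(int, int);

static const binary_op operations[] = { add, subtract, multiply };

/* Picks an operation by index at run time.
   Returns 0 and stores the result in *out, or -1 if index is out of range. */
int apply_operation(size_t index, int a, int b, int *out)
{
    if (index >= sizeof operations / sizeof operations[0])
        return -1;
    *out = operations[index](a, b);
    return 0;
}
```

The caller decides which function runs purely through data (the index), which a direct call such as add(2, 3) cannot express.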
They are one of the most useful things in C! They allow you to make much more modular software.
Callbacks
e.g.:
typedef void (*serial_data_callback)(int length, unsigned char* data);
static serial_data_callback on_data_received = NULL; /* currently registered callback */
void serial_port_data_received(serial_data_callback callback)
{
on_data_received = callback;
}
void data_received(int length, unsigned char* data)
{
if(on_data_received != NULL) on_data_received(length, data);
}
This means your code can use the general serial routines. You might then have two things that use serial, Modbus and a terminal:
serial_port_data_received(modbus_handle_data);
serial_port_data_received(terminal_handle_data);
and they can implement the callback function and do what's appropriate.
They allow for object-oriented C code. It's a simple way to create "interfaces", where each concrete type might implement things differently. For this, you will generally have a struct that holds function pointers, functions that implement each function pointer, and a creation function that sets up the function pointers with the right functions.
typedef struct
{
void (*send)(int length, unsigned char* data);
} connection_t;
void connection_send(connection_t* self, int length, unsigned char* data)
{
if(self->send != NULL) self->send(length, data);
}
void serial_send(int length, unsigned char* data)
{
// send
}
void tcp_send(int length, unsigned char* data)
{
// send
}
void create_serial_connection(connection_t* connection)
{
connection->send = serial_send;
}
Then other code can use a connection_t without caring whether it's via serial, TCP, or anything else that you can come up with.
They reduce dependencies between modules. Sometimes a library must query the calling code for things (are these objects equal? Are they in a certain order?). But you can't hardcode a call to the proper function without making the library (a) depend on the calling code and (b) non-generic.
Function pointers provide the missing pieces of information while keeping the library module independent of any code that might use it.
They're indispensable when an API needs a callback back to the application.
Another use is for the implementation of event-emitters or signal handlers: callback functions.
What if you're writing a library in which the user inputs a function? Like qsort that can work on any type, but the user must write and supply a compare function.
Its signature is
void qsort (void* base, size_t num, size_t size,
int (*compar)(const void*,const void*));
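As a short illustration of supplying that compare function, here is a sketch that sorts an array of int with the standard qsort. The names compare_ints and sort_ints are made up for the example.

```c
#include <stdlib.h>

/* Comparison callback: must return a negative, zero, or positive value. */
static int compare_ints(const void *lhs, const void *rhs)
{
    int a = *(const int *)lhs;
    int b = *(const int *)rhs;
    return (a > b) - (a < b);   /* avoids the overflow that a - b could cause */
}

/* Sorts count ints in ascending order by handing qsort our callback. */
void sort_ints(int *values, size_t count)
{
    qsort(values, count, sizeof values[0], compare_ints);
}
```

qsort itself never learns that the elements are ints; only the callback knows how to interpret the two pointers it receives.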

How to make generic function using void * in c?

I have an incr function to increment a value by 1.
I want to make it generic, because I don't want to write a different function for each type.
Suppose I want to increment an int, a float and a char by 1:
void incr(void *vp)
{
(*vp)++;
}
But the problem, as far as I know, is that dereferencing a void pointer is undefined behaviour. Sometimes it gives the error: Invalid use of void expression.
My main function is:
int main()
{
int i=5;
float f=5.6f;
char c='a';
incr(&i);
incr(&f);
incr(&c);
return 0;
}
The problem is how to solve this. Is there a way to solve it in C only,
or
will I have to define incr() for each data type? If so, then what's the use of void *?
I have the same problem with swap() and sort(). I want to swap and sort all kinds of data types with the same function.
You can implement the first as a macro:
#define incr(x) (++(x))
Of course, this can have unpleasant side effects if you're not careful. It's about the only method C provides for applying the same operation to any of a variety of types though. In particular, since the macro is implemented using text substitution, by the time the compiler sees it, you just have the literal code ++whatever;, and it can apply ++ properly for the type of item you've provided. With a pointer to void, you don't know much (if anything) about the actual type, so you can't do much direct manipulation on that data.
void * is normally used when the function in question doesn't really need to know the exact type of the data involved. In some cases (e.g., qsort) it uses a callback function to avoid having to know any details of the data.
Since it does both sort and swap, let's look at qsort in a little more detail. Its signature is:
void qsort(void *base, size_t nmemb, size_t size,
int(*cmp)(void const *, void const *));
So, the first is the void * you asked about -- a pointer to the data to be sorted. The second tells qsort the number of elements in the array. The third, the size of each element in the array. The last is a pointer to a function that can compare individual items, so qsort doesn't need to know how to do that. For example, somewhere inside qsort will be some code something like:
// if (base[j] < base[i]) ...
if (cmp((char *)base + j*size, (char *)base + i*size) < 0)
Likewise, to swap two items, it'll normally have a local array for temporary storage. It'll then copy bytes from array[i] to its temp, then from array[j] to array[i] and finally from temp to array[j]:
char temp[size];
memcpy(temp, (char *)base + i*size, size); // temp = base[i]
memcpy((char *)base + i*size, (char *)base + j*size, size); // base[i] = base[j]
memcpy((char *)base + j*size, temp, size); // base[j] = temp
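Putting the two pieces together, here is a deliberately simple sketch of a qsort-like function. It is an insertion sort, not what any real qsort implementation does; it only shows the technique of scaling each index by size to reach an element and comparing through the callback. generic_sort, swap_bytes and compare_doubles are hypothetical names.

```c
#include <stddef.h>

/* Swap two elements of `size` bytes each, one byte at a time. */
static void swap_bytes(char *a, char *b, size_t size)
{
    for (size_t k = 0; k < size; ++k) {
        char tmp = a[k];
        a[k] = b[k];
        b[k] = tmp;
    }
}

/* Insertion sort over untyped elements; element i lives at base + i*size. */
void generic_sort(void *base, size_t nmemb, size_t size,
                  int (*cmp)(const void *, const void *))
{
    char *bytes = base;
    for (size_t i = 1; i < nmemb; ++i) {
        /* Move element i left until its left neighbour is not greater. */
        for (size_t j = i;
             j > 0 && cmp(bytes + (j - 1) * size, bytes + j * size) > 0;
             --j) {
            swap_bytes(bytes + (j - 1) * size, bytes + j * size, size);
        }
    }
}

/* Example callback for arrays of double. */
int compare_doubles(const void *lhs, const void *rhs)
{
    double a = *(const double *)lhs;
    double b = *(const double *)rhs;
    return (a > b) - (a < b);
}
```

The sort routine only ever touches bytes and sizes; everything type-specific stays inside the comparison callback.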
Using void * will not give you polymorphic behavior, which is what I think you're looking for. void * simply allows you to bypass the compiler's type checking. To achieve actual polymorphic behavior, you have to either pass in the type information as another variable and check it in your incr function, casting the pointer to the desired type, or pass in the operations on your data as function pointers (others have mentioned qsort as an example). C does not have automatic polymorphism built into the language, so it is up to you to simulate it. Languages with built-in polymorphism do something just like this behind the scenes.
To elaborate, void * is a pointer to a generic block of memory, which could be anything: an int, float, string, etc. The length of the block of memory isn't even stored in the pointer, let alone the type of the data. Remember that internally, all data are bits and bytes, and types are really just markers for how the logical data are physically encoded, because intrinsically, bits and bytes are typeless. In C, this information is not stored with variables, so you have to provide it to the compiler yourself, so that it knows whether to apply operations to treat the bit sequences as 2's complement integers, IEEE 754 double-precision floating point, ASCII character data, functions, etc.; these are all specific standards of formats and operations for different types of data. When you cast a void * to a pointer to a specific type, you as the programmer are asserting that the data pointed to actually is of the type you're casting it to. Otherwise, you're probably in for weird behavior.
So what is void * good for? It's good for dealing with blocks of data without regards to type. This is necessary for things like memory allocation, copying, file operations, and passing pointers-to-functions. In almost all cases though, a C programmer abstracts from this low-level representation as much as possible by structuring their data with types, which have built-in operations; or using structs, with operations on these structs defined by the programmer as functions.
You may want to check out the Wikipedia explanation for more info.
You can't do exactly what you're asking - operators like increment need to work with a specific type. So, you could do something like this:
enum type {
TYPE_CHAR,
TYPE_INT,
TYPE_FLOAT
};
void incr(enum type t, void *vp)
{
switch (t) {
case TYPE_CHAR:
(*(char *)vp)++;
break;
case TYPE_INT:
(*(int *)vp)++;
break;
case TYPE_FLOAT:
(*(float *)vp)++;
break;
}
}
Then you'd call it like:
int i=5;
float f=5.6f;
char c='a';
incr(TYPE_INT, &i);
incr(TYPE_FLOAT, &f);
incr(TYPE_CHAR, &c);
Of course, this doesn't really give you anything over just defining separate incr_int(), incr_float() and incr_char() functions - this isn't the purpose of void *.
The purpose of void * is realised when the algorithm you're writing doesn't care about the real type of the objects. A good example is the standard sorting function qsort(), which is declared as:
void qsort(void *base, size_t nmemb, size_t size, int(*compar)(const void *, const void *));
This can be used to sort arrays of any type of object - the caller just needs to supply a comparison function that can compare two objects.
Both your swap() and sort() functions fall into this category. swap() is even easier - the algorithm doesn't need to know anything other than the size of the objects to swap them:
void swap(void *a, void *b, size_t size)
{
unsigned char *ap = a;
unsigned char *bp = b;
size_t i;
for (i = 0; i < size; i++) {
unsigned char tmp = ap[i];
ap[i] = bp[i];
bp[i] = tmp;
}
}
Now given any array you can swap two items in that array:
int ai[10];
double ad[10];
swap(&ai[x], &ai[y], sizeof(int));
swap(&ad[x], &ad[y], sizeof(double));
Example for using "Generic" swap.
This code swaps two blocks of memory.
void memswap_arr(void* p1, void* p2, size_t size)
{
size_t i;
char* pc1= (char*)p1;
char* pc2= (char*)p2;
char ch;
for (i= 0; i<size; ++i) {
ch= pc1[i];
pc1[i]= pc2[i];
pc2[i]= ch;
}
}
And you call it like this:
#include <stdio.h>
int main() {
int i1,i2;
double d1,d2;
i1= 10; i2= 20;
d1= 1.12; d2= 2.23;
memswap_arr(&i1,&i2,sizeof(int)); //I use memswap_arr to swap two integers
printf("i1==%d i2==%d \n",i1,i2);
memswap_arr(&d1,&d2,sizeof(double)); //I use the SAME function to swap two doubles
printf("d1==%f d2==%f \n",d1,d2);
return 0;
}
I think that this should give you an idea of how to use one function for different data types.
Sorry if this comes off as a non-answer to the broad question "How to make generic function using void * in c?", but the problems you seem to have (incrementing a variable of an arbitrary type, and swapping two variables of unknown types) can be solved much more easily with macros than with functions and pointers to void.
Incrementing's simple enough:
#define increment(x) ((x)++)
For swapping, I'd do something like this:
#define swap(x, y) \
({ \
typeof(x) tmp = (x); \
(x) = (y); \
(y) = tmp; \
})
...which works for ints, doubles and char pointers (strings), based on my testing.
Whilst the incrementing macro should be pretty safe, the swap macro relies on the typeof() operator and on the ({ ... }) statement-expression syntax, which are GCC/clang extensions and not part of standard C before C23 (C23 standardizes typeof, but not statement expressions). If you only ever compile with gcc or clang, this shouldn't be much of a problem.
I know that kind of dodged the original question; but hopefully it still solves your original problems.
You can use the type-generic facilities of standard C. If you intend to use more advanced math functions (more advanced than the ++ operator), you can use <tgmath.h> (C99), which provides type-generic versions of the functions in <math.h> and <complex.h>.
You can also use the _Generic keyword (C11) to define a type-generic function as a macro. Here is an example:
#include <stdio.h>
#define add1(x) _Generic((x), int: ++(x), float: ++(x), char: ++(x), default: ++(x))
int main(){
int i = 0;
float f = 0;
char c = 0;
add1(i);
add1(f);
add1(c);
printf("i = %d\tf = %g\tc = %d", i, f, c);
}
You can find more information on the language standard and more sophisticated examples in this post from Rob's programming blog.
As for the void *, swap and sort questions, better refer to Jerry Coffin's answer.
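_Generic becomes more interesting when each type maps to a different function rather than to the same expression. A small sketch of that dispatch pattern; the helper names name_of_int, name_of_long, name_of_double and the type_name macro are invented for this example:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-type helpers; each reports the type it was chosen for. */
const char *name_of_int(int v)       { (void)v; return "int"; }
const char *name_of_long(long v)     { (void)v; return "long"; }
const char *name_of_double(double v) { (void)v; return "double"; }

/* _Generic picks one helper at compile time from the static type of x,
   and the trailing (x) then calls the selected helper. */
#define type_name(x) _Generic((x),  \
    int:    name_of_int,            \
    long:   name_of_long,           \
    double: name_of_double)(x)

/* Hypothetical demo of the dispatch. */
void type_name_demo(void)
{
    long n = 7;
    assert(strcmp(type_name(1), "int") == 0);
    assert(strcmp(type_name(n), "long") == 0);
    assert(strcmp(type_name(1.0), "double") == 0);
}
```

A type that is not listed (and has no default: branch) is a compile-time error, which is usually what you want from a "generic" function.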
You should cast your pointer to a concrete type before dereferencing it. So you should also pass some indication of what type the pointer actually points to.
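One way to pass the type along with the pointer is an enum tag, roughly like this. The enum value_type and the function increment_value are hypothetical names, and only int and double are covered:

```c
#include <assert.h>

/* Hypothetical tag telling the callee what the void pointer really points at. */
enum value_type { VALUE_INT, VALUE_DOUBLE };

/* Casts p to the tagged type, then increments the pointed-to object.
   Returns 0 on success, -1 for a tag this function does not know. */
int increment_value(void *p, enum value_type type)
{
    switch (type) {
    case VALUE_INT:
        ++*(int *)p;
        return 0;
    case VALUE_DOUBLE:
        ++*(double *)p;
        return 0;
    }
    return -1;
}

/* Hypothetical demo: the caller is responsible for passing the right tag. */
void increment_demo(void)
{
    int i = 41;
    double d = 0.5;

    assert(increment_value(&i, VALUE_INT) == 0);
    assert(increment_value(&d, VALUE_DOUBLE) == 0);
    assert(i == 42 && d == 1.5);
}
```

The compiler cannot verify that the tag matches the pointer, so a wrong tag is undefined behavior rather than a diagnostic.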

Solution for "dereferencing `void *' pointer" warning in struct in C?

I was trying to create a pseudo super struct to print array of structs. My basic
structures are as follows.
/* Type 10 Count */
typedef struct _T10CNT
{
int _cnt[20];
} T10CNT;
...
/* Type 20 Count */
typedef struct _T20CNT
{
long _cnt[20];
} T20CNT;
...
I created the below struct to print the array of above mentioned structures. I got dereferencing void pointer error while compiling the below code snippet.
typedef struct _CMNCNT
{
long _cnt[3];
} CMNCNT;
static int printCommonStatistics(void *cmncntin, int cmncnt_nelem, int cmncnt_elmsize)
{
int ii;
for(ii=0; ii<cmncnt_nelem; ii++)
{
CMNCNT *cmncnt = (CMNCNT *)&cmncntin[ii*cmncnt_elmsize];
fprintf(stout,"STATISTICS_INP: %d\n",cmncnt->_cnt[0]);
fprintf(stout,"STATISTICS_OUT: %d\n",cmncnt->_cnt[1]);
fprintf(stout,"STATISTICS_ERR: %d\n",cmncnt->_cnt[2]);
}
return SUCCESS;
}
T10CNT struct_array[10];
...
printCommonStatistics(struct_array, NELEM(struct_array), sizeof(struct_array[0]);
...
My intention is to have a common function to print all the arrays. Please let me know the correct way of using it.
Appreciate the help in advance.
Edit: The parameter name is changed to cmncntin from cmncnt. Sorry it was typo error.
Thanks,
Mathew Liju
I think your design is going to fail, but I am also unconvinced that the other answers I see fully deal with the deeper reasons why.
It appears that you are trying to use C to deal with generic types, something that always gets to be hairy. You can do it, if you are careful, but it isn't easy, and in this case, I doubt if it would be worthwhile.
Deeper Reason: Let's assume we get past the mere syntactic (or barely more than syntactic) issues. Your code shows that T10CNT contains 20 int and T20CNT contains 20 long. On modern 64-bit machines - other than under Win64 - sizeof(long) != sizeof(int). Therefore, the code inside your printing function should be distinguishing between dereferencing int arrays and long arrays. In C++, there's a rule that you should not try to treat arrays polymorphically, and this sort of thing is why. The CMNCNT type contains 3 long values; different from both the T10CNT and T20CNT structures in number, though the base type of the array matches T20CNT.
Style Recommendation: I strongly recommend avoiding leading underscores on names. In general, names beginning with underscore are reserved for the implementation to use, and to use as macros. Macros have no respect for scope; if the implementation defines a macro _cnt it would wreck your code. There are nuances to what names are reserved; I'm not about to go into those nuances. It is much simpler to think 'names starting with underscore are reserved', and it will steer you clear of trouble.
Style Suggestion: Your print function returns success unconditionally. That is not sensible; your function should return nothing, so that the caller does not have to test for success or failure (since it can never fail). A careful coder who observes that the function returns a status will always test the return status, and have error handling code. That code will never be executed, so it is dead, but it is hard for anyone (or the compiler) to determine that.
Surface Fix: Temporarily, we can assume that you can treat int and long as synonyms, but you must get out of the habit of thinking that they are. The void * argument is the correct way to say "this function takes a pointer of indeterminate type". However, inside the function, you need to convert from a void * to a specific type before you do indexing.
typedef struct _CMNCNT
{
long count[3];
} CMNCNT;
static void printCommonStatistics(const void *data, size_t nelem, size_t elemsize)
{
size_t i; /* same type as nelem, avoids a signed/unsigned comparison */
for (i = 0; i < nelem; i++)
{
const CMNCNT *cmncnt = (const CMNCNT *)((const char *)data + (i * elemsize));
fprintf(stdout,"STATISTICS_INP: %ld\n", cmncnt->count[0]);
fprintf(stdout,"STATISTICS_OUT: %ld\n", cmncnt->count[1]);
fprintf(stdout,"STATISTICS_ERR: %ld\n", cmncnt->count[2]);
}
}
(I like the idea of a file stream called stout too. Suggestion: use cut'n'paste on real source code--it is safer! I generally use "sed 's/^/ /' file.c" to prepare code for cut'n'paste into an SO answer.)
What does that cast line do? I'm glad you asked...
The first operation is to convert the const void * into a const char *; this allows you to do byte-size operations on the address. In the days before Standard C, char * was used in place of void * as the universal addressing mechanism.
The next operation adds the correct number of bytes to get to the start of the ith element of the array of objects of size elemsize.
The second cast then tells the compiler "trust me - I know what I'm doing" and "treat this address as the address of a CMNCNT structure".
From there, the code is easy enough. Note that since the CMNCNT structure contains long values, I used %ld to tell the truth to fprintf().
Since you aren't about to modify the data in this function, it is not a bad idea to use the const qualifier as I did.
Note that if you are going to be faithful to sizeof(long) != sizeof(int), then you need two separate blocks of code (I'd suggest separate functions) to deal with the 'array of int' and 'array of long' structure types.
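A rough sketch of what the two separate functions could look like. The function names, the FILE * parameter and the int status return (which is meaningful here because fprintf can fail) are my own choices, and the struct member is renamed to count to avoid the leading underscore:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Mirrors of the question's types, with the member renamed to count. */
typedef struct { int  count[20]; } T10CNT;
typedef struct { long count[20]; } T20CNT;

/* int flavour: %d matches the int counters.
   Returns 0 on success, -1 if writing to out fails. */
int print_t10_statistics(FILE *out, const T10CNT *data, size_t nelem)
{
    size_t i;
    for (i = 0; i < nelem; i++) {
        if (fprintf(out, "STATISTICS_INP: %d\n", data[i].count[0]) < 0 ||
            fprintf(out, "STATISTICS_OUT: %d\n", data[i].count[1]) < 0 ||
            fprintf(out, "STATISTICS_ERR: %d\n", data[i].count[2]) < 0)
            return -1;
    }
    return 0;
}

/* long flavour: identical shape, but %ld matches the long counters. */
int print_t20_statistics(FILE *out, const T20CNT *data, size_t nelem)
{
    size_t i;
    for (i = 0; i < nelem; i++) {
        if (fprintf(out, "STATISTICS_INP: %ld\n", data[i].count[0]) < 0 ||
            fprintf(out, "STATISTICS_OUT: %ld\n", data[i].count[1]) < 0 ||
            fprintf(out, "STATISTICS_ERR: %ld\n", data[i].count[2]) < 0)
            return -1;
    }
    return 0;
}
```

Because each function takes a pointer to its real element type, ordinary indexing works and no casts or byte arithmetic are needed.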
The type void is deliberately left incomplete. It follows that you cannot dereference void pointers, nor can you take the sizeof of void. This means you cannot use the subscript operator on a void pointer as if it were an array.
The moment you assign something to a void pointer, any type information of the original pointed to type is lost, so you can only dereference if you first cast it back to the original pointer type.
First and most important: you pass a T10CNT* to the function, but you try to cast (and dereference) it as a CMNCNT* inside the function. This is not valid and results in undefined behavior.
You need a function printCommonStatistics for each type of array element. So, have a printCommonStatisticsInt, printCommonStatisticsLong, and printCommonStatisticsChar, which all differ by their first argument (one taking int*, another taking long*, and so on). You might create them using macros, to avoid redundant code.
Passing the struct itself is not a good idea, since then you have to define a new function for each different size of the contained array within the struct (since they are all different types). So better pass the contained array directly (struct_array[0]._cnt, call the function for each index)
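The macro idea could look roughly like this. DEFINE_PRINT_STATISTICS is a hypothetical generator macro, and the output format and int status return are assumptions made for the sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical generator macro: expands to one print function per element
 * type. SUFFIX completes the function name, TYPE is the element type, and
 * FMT is the printf conversion that matches TYPE.
 * Each generated function returns 0 on success, -1 if printf fails.
 */
#define DEFINE_PRINT_STATISTICS(SUFFIX, TYPE, FMT)                       \
    int printCommonStatistics##SUFFIX(const TYPE *counts, size_t n)      \
    {                                                                    \
        size_t i;                                                        \
        for (i = 0; i < n; i++) {                                        \
            if (printf("STATISTICS[%zu]: " FMT "\n", i, counts[i]) < 0)  \
                return -1;                                               \
        }                                                                \
        return 0;                                                        \
    }

/* One line per supported element type. */
DEFINE_PRINT_STATISTICS(Int,  int,  "%d")
DEFINE_PRINT_STATISTICS(Long, long, "%ld")
DEFINE_PRINT_STATISTICS(Char, char, "%c")
```

With the question's structs this would be called as printCommonStatisticsInt(struct_array[i]._cnt, 20) for each index i.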
Change the function declaration to char * like so:
static int printCommonStatistics(char *cmncnt, int cmncnt_nelem, int cmncnt_elmsize)
The void type does not assume any particular size, whereas a char has a size of one byte.
You can't do this:
cmncnt->_cnt[0]
if cmncnt is a void pointer.
You have to specify the type. You may need to re-think your implementation.
The function
static int printCommonStatistics(void *cmncntin, int cmncnt_nelem, int cmncnt_elmsize)
{
char *cmncntinBytes;
int ii;
cmncntinBytes = (char *) cmncntin;
for(ii=0; ii<cmncnt_nelem; ii++)
{
CMNCNT *cmncnt = (CMNCNT *)(cmncntinBytes + ii*cmncnt_elmsize); /* Ptr Line */
fprintf(stdout,"STATISTICS_INP: %ld\n",cmncnt->_cnt[0]); /* _cnt is long, so %ld */
fprintf(stdout,"STATISTICS_OUT: %ld\n",cmncnt->_cnt[1]);
fprintf(stdout,"STATISTICS_ERR: %ld\n",cmncnt->_cnt[2]);
}
return SUCCESS;
}
Works for me.
The issue is that on the line commented "Ptr Line" the code adds an integer to a pointer. Since our pointer is a char *, we move forward in memory by sizeof(char) * ii * cmncnt_elmsize bytes, which is what we want since a char is one byte. Your code tried to do the equivalent thing, moving forward by sizeof(void) * ii * cmncnt_elmsize, but void doesn't have a size, so the compiler gave you the error.
I'd change T10CNT and T20CNT to both use int or long instead of one with each. You're depending on sizeof(int) == sizeof(long).
On this line:
CMNCNT *cmncnt = (CMNCNT *)&cmncnt[ii*cmncnt_elmsize];
You are trying to declare a new variable called cmncnt, but a variable with this name already exists as a parameter to the function. You might want to use a different variable name to solve this.
Also you may want to pass a pointer to a CMNCNT to the function instead of a void pointer, because then the compiler will do the pointer arithmetic for you and you don't have to cast it. I don't see the point of passing a void pointer when all you do with it is cast it to a CMNCNT. (Which is not a very descriptive name for a data type, by the way.)
Your expression
(CMNCNT *)&cmncntin[ii*cmncnt_elmsize]
tries to take the address of cmncntin[ii*cmncnt_elmsize] and then cast that pointer to type (CMNCNT *). It can't get the address of cmncntin[ii*cmncnt_elmsize] because cmncntin has type void*.
Study C's operator precedences and insert parentheses where necessary.
Point of Information: Internal Padding can really screw this up.
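Consider struct { int i; char c; }; -- its members add up to 5 bytes (assuming a 4-byte int), but sizeof() is typically 8, because the compiler adds trailing padding so that every element of an array stays aligned. Step through such arrays by sizeof(element), never by the sum of the member sizes!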
Certain assembly operations don't handle mis-aligned data gracefully. (For example, if an int spans two memory words.) (YES, I have been bitten by this before.)
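Padding is easy to inspect directly. The sketch below uses a made-up struct padded and prints the numbers for whatever platform it is compiled on; the exact values are implementation-specific, but the array stride always equals sizeof of the element:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical struct whose members do not fill a whole aligned slot. */
struct padded { int i; char c; };

/* Distance in bytes between two consecutive elements of an array. */
size_t padded_stride(void)
{
    struct padded arr[2];
    return (size_t)((const char *)&arr[1] - (const char *)&arr[0]);
}

/* Prints the layout numbers for the current platform. */
void report_padding(void)
{
    printf("sum of members: %zu\n", sizeof(int) + sizeof(char));
    printf("sizeof(struct): %zu\n", sizeof(struct padded));
    printf("offset of c:    %zu\n", offsetof(struct padded, c));
    printf("array stride:   %zu\n", padded_stride());

    /* Guaranteed by the language: elements are sizeof bytes apart. */
    assert(padded_stride() == sizeof(struct padded));
}
```

This is why a generic function should receive the element size from the caller's sizeof rather than compute it from the members.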
.
Second: In the past, I've used variably sized arrays. (I was dumb back then...) It works if you are not changing type. (Or if you have a union of the types.)
E.g.:
struct T { int sizeOfArray; int data[1]; };
Allocated as
struct T * t = (struct T *) malloc( sizeof(struct T) + sizeof(int)*(NUMBER-1) );
t->sizeOfArray = NUMBER;
(Though padding/alignment can still screw you up.)
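Since C99 the same trick has a standard spelling, the flexible array member, which avoids the data[1] and NUMBER-1 bookkeeping. A minimal sketch; struct int_array and int_array_new are hypothetical names, and the size computation is not checked for overflow:

```c
#include <assert.h>
#include <stdlib.h>

/* C99 flexible array member: data[] has no declared length and must be
   the last member of the struct. */
struct int_array {
    size_t size;
    int data[];
};

/* Hypothetical helper: allocates the header plus room for n ints and
   zero-fills them. Returns NULL if allocation fails.
   Assumes n is small enough that the size computation cannot overflow. */
struct int_array *int_array_new(size_t n)
{
    struct int_array *t = malloc(sizeof *t + n * sizeof t->data[0]);
    size_t i;

    if (t == NULL)
        return NULL;
    t->size = n;
    for (i = 0; i < n; i++)
        t->data[i] = 0;
    return t;
}
```

The caller releases the whole object with a single free(t).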
.
Third: Consider:
struct T {
int sizeOfArray;
enum FOO arrayType;
union U { short s; int i; long l; float f; double d; } data [1];
};
It solves problems with knowing how to print out the data.
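To make the "knowing how to print" part concrete, here is a sketch of a print routine driven by such a tag. The names enum elem_type, union elem and print_elems are invented, and only three member types are handled:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical tag describing which union member is in use. */
enum elem_type { ELEM_INT, ELEM_LONG, ELEM_DOUBLE };

union elem { int i; long l; double d; };

/* Prints count elements, choosing the union member and format from the tag.
   Returns 0 on success, -1 on an unknown tag or a failed printf. */
int print_elems(enum elem_type type, const union elem *data, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) {
        int rc;
        switch (type) {
        case ELEM_INT:    rc = printf("%d\n", data[i].i);  break;
        case ELEM_LONG:   rc = printf("%ld\n", data[i].l); break;
        case ELEM_DOUBLE: rc = printf("%g\n", data[i].d);  break;
        default:          return -1;
        }
        if (rc < 0)
            return -1;
    }
    return 0;
}
```

Every element occupies sizeof(union elem) bytes regardless of the tag, so indexing stays uniform at the cost of some wasted space for the smaller types.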
.
Fourth: You could just pass in the int/long array to your function rather than the structure. E.g:
void printCommonStatistics( int * data, int count )
{
for( int i=0; i<count; i++ )
printf( "FOO: %d\n", data[i] ); /* printf rather than cout, since this is C */
}
Invoked via:
T10CNT foo;
printCommonStatistics( foo._cnt, 20 );
Or:
int a[10], b[20], c[30];
printCommonStatistics( a, 10 );
printCommonStatistics( b, 20 );
printCommonStatistics( c, 30 );
This works much better than hiding data in structs. As you add members to one of your struct's, the layout may change between your struct's and no longer be consistent. (Meaning the address of _cnt relative to the start of the struct may change for _T10CNT and not for _T20CNT. Fun debugging times there. A single struct with a union'ed _cnt payload would avoid this.)
E.g.:
struct FOO {
union {
int bar [10];
long biff [20];
} u;
};
.
Fifth:
If you must use structs... C++, iostreams, and templating would be a lot cleaner to implement.
E.g.:
template<class TYPE> void printCommonStatistics( TYPE & mystruct, int count )
{
for( int i=0; i<count; i++ )
cout << "FOO: " << mystruct._cnt[i] << endl;
} /* Assumes all mystruct's have a "_cnt" member. */
But that's probably not what you are looking for...
C isn't my cup o'java, but I think your problem is that "void *cmncnt" should be CMNCNT *cmncnt.
Feel free to correct me now, C programmers, and tell me this is why java programmers can't have nice things.
This line is kind of tortured, don'tcha think?
CMNCNT *cmncnt = (CMNCNT *)&cmncntin[ii*cmncnt_elmsize];
How about something more like
CMNCNT *cmncnt = (CMNCNT *)((char *)cmncntin + (ii * cmncnt_elmsize));
Or better yet, if cmncnt_elmsize = sizeof(CMNCNT)
CMNCNT *cmncnt = ((CMNCNT *)cmncntin) + ii;
That should also get rid of the warning, since you are no longer dereferencing a void *.
BTW: I'm not real sure why you are doing it this way, but if cmncnt_elmsize is sometimes not sizeof(CMNCNT), and can in fact vary from call to call, I'd suggest rethinking this design. I suppose there could be a good reason for it, but it looks really shaky to me. I can almost guarantee there is a better way to design things.

Resources