How to safely use a padded struct as a hashmap key - c

I'm using libuv to write a UDP server. To tell clients apart I need to look at the source IP and source port. This is provided in the on_read callback as const struct sockaddr*. I need to use this information as the key for looking up the user's context somehow.
Ideally I would use a hashmap and use this struct as the key. However it's not clear if libuv zero initialises that structure and so there could be random data in the padding making it unsuitable as a raw hashmap key (a memcmp on the struct).
Assuming that libuv doesn't zero out the padding first, what would be the most efficient way to build a key out of this information? I am thinking I can simply use assignment or memcpy to copy the two fields I want into a clean struct, but I would have to do this for every packet.
I know that in the grand scheme of things this is not a huge amount of overhead, but have I missed a more elegant or efficient solution?
Edit: I've updated the title to reflect that even though my challenge is with libuv right now, this isn't really just a libuv specific problem as a struct like this could come from a number of places. When you get passed a struct from somewhere and you need to use that (or its contents) as a key, what's the correct / safe way to do that?

EDIT: Adding "generic" response, moving back the bad libuv TCP response.
If you don't want to copy the struct (it is very small in this case, but consider the generic problem), the straightforward solution is to hash member by member.
Let's assume that you need to extract a lot of sparse fields from a large struct. For example, if the hash is just a sum of the bytes:
#define KEY_INITIAL_STATUS 0
/* Fold len bytes of buf into the running hash *status. */
void hash(char *status, const char *buf, size_t len) {
    size_t i;
    for (i = 0; i < len; ++i)
        *status += buf[i];
}
void receive_buf(struct addr_t addr, ...) {
    char key = KEY_INITIAL_STATUS;
    hash(&key, addr.field1, addr.field1_len);
    hash(&key, addr.field2, addr.field2_len);
    void *value = hashtable_search(hashtable, key, ...);
    // Do things with the value
}
Most hashes can be calculated incrementally like this, and then optimized (there is no need to go byte by byte).
A benchmark is needed to check whether this is better than copying everything into a zeroed struct.
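For comparison, here is a minimal sketch of the "copy into a zeroed struct" alternative, assuming an IPv4 peer and POSIX socket headers. The names peer_key and peer_key_from_sockaddr are made up for illustration and are not part of libuv. The key is zeroed first, so a memcmp over it sees only deterministic bytes.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical key type: only the fields that identify a peer. */
struct peer_key {
    uint32_t addr;  /* IPv4 address, network byte order */
    uint16_t port;  /* source port, network byte order */
    uint16_t pad;   /* explicit padding, always zero */
};

/* Builds a clean key from a sockaddr.
 * Returns 0 on success, -1 if the address is not IPv4 (assumption:
 * IPv6 would need a larger key and is left out of this sketch). */
static int peer_key_from_sockaddr(struct peer_key *key,
                                  const struct sockaddr *sa)
{
    struct sockaddr_in in4;

    if (sa->sa_family != AF_INET)
        return -1;

    /* Copy out instead of casting, to avoid alignment concerns. */
    memcpy(&in4, sa, sizeof in4);

    /* Zero every byte of the key, including any padding. */
    memset(key, 0, sizeof *key);
    key->addr = in4.sin_addr.s_addr;
    key->port = in4.sin_port;
    return 0;
}
```

Two keys built this way from the same address and port compare equal with memcmp regardless of what the padding of the incoming sockaddr contained.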
I see that the libuv read callback uses this signature:
void read_cb(uv_stream_t * stream, ssize_t nread, const uv_buf_t *buf)
The client data is linked to the connection/stream, and libuv has already done this lookup for you. The library expects you to pass the data somehow.
If I look for the doc here:
http://docs.libuv.org/en/v1.x/stream.html
"See also: The uv_handle_t members also apply."
So if I check the uv_handle_t members in http://docs.libuv.org/en/v1.x/handle.html#c.uv_handle_t:
void* uv_handle_t.data
Space for user-defined arbitrary data. libuv does not use this field.
So you should save and use your client information here; there is no need for you to do a search at all.
In other libraries, it is common to return this kind of data in the connection struct, as a void * parameter of the "on_read" (or similar) callback, or even by allocating extra memory after the library_stream_t structure, like malloc(sizeof(uv_stream_t) + sizeof(my_opaque_data)).

I wouldn't recommend using the struct directly as a key. Instead, choose a Set or Hashtable library that lets you pass in a comparator when you initialize it. The comparator, of course, must know how to compare the struct.
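As a rough sketch of what such callbacks could look like for an IPv4 sockaddr_in: the hash and the equality test below touch only the address and port fields, so padding never matters. The names peer_hash and peer_equal are hypothetical; the exact callback signature depends on the hashtable library you pick. The hash is 32-bit FNV-1a, chosen here only because it is short.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a over a byte range, continuing from running value h. */
static uint32_t fnv1a(uint32_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    size_t i;

    for (i = 0; i < len; ++i) {
        h ^= p[i];
        h *= 16777619u;         /* FNV prime */
    }
    return h;
}

/* Hash only the fields that identify the peer. */
uint32_t peer_hash(const struct sockaddr_in *sa)
{
    uint32_t h = 2166136261u;   /* FNV offset basis */

    h = fnv1a(h, &sa->sin_addr.s_addr, sizeof sa->sin_addr.s_addr);
    h = fnv1a(h, &sa->sin_port, sizeof sa->sin_port);
    return h;
}

/* Field-by-field equality; never looks at padding bytes. */
int peer_equal(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
    return a->sin_addr.s_addr == b->sin_addr.s_addr &&
           a->sin_port == b->sin_port;
}
```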

Related

How to make a typedef as private as possible without using Malloc?

I am looking for a way to make private style typedefs that can only be accessed or manipulated from a specific set of function calls (setBit(bit_typ *const t), getBit(bit_typ *const t)). I am looking for a way to do this without using malloc, does anyone have any ideas?
EDIT:// this question is different than this one because it is looking for ways to get as close to a "private" structure whereas the other question (TL;DR is there a way to define an opaque type which can nonetheless be allocated on stack, and without breaking strict aliasing rule ?) looks for a solution to a problem related to one possible solution to my question.
One way to do it is to expose the total size of the opaque type and make the user declare objects of your opaque type as unsigned char [N] buffers. For example, let's say you have some type OpaqueType, internals of which you want to hide from the user.
In the header file (exposed to the user) you do this
typedef unsigned char OpaqueType[16];
where 16 is the exact byte-size of the type you want to hide. In the header file you write the whole interface in terms of that type, e.g.
void set_data(OpaqueType *dst, int data);
In the implementation file you declare the actual type
typedef struct OpaqueTypeImpl
{
int data1;
double data2;
} OpaqueTypeImpl;
and implement the functions as follows
void set_data(OpaqueType *dst, int data)
{
OpaqueTypeImpl *actual_dst = (OpaqueTypeImpl *) dst;
actual_dst->data1 = data;
}
You can also add a static assertion that will make sure that sizeof(OpaqueType) is the same as sizeof(OpaqueTypeImpl).
Of course, as it has been noted in the comments below, extra steps have to be taken to ensure the proper alignment of such objects, like _Alignas in C11 or some union-based technique in "classic" C.
That way you give the user opportunity to declare non-dynamic object of OpaqueType, i.e. you don't force the user to call your function that will malloc such objects internally. And at the same time you don't expose to user anything about the inner structure of your type (besides its total size and its alignment requirement).
Note also that OpaqueType declared in that way is an array, meaning that it is not copyable (unless you use memcpy). That might be a good thing, if you want to actively prevent unrestrained user-level copying. But if you want to enable copying, you can wrap the array into a struct.
This approach is not terribly elegant, but that's probably the only way to hide implementation when you want to keep objects of your type freely user-definable.
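A minimal C11 sketch of the whole idea, with the array wrapped in a struct so that the alignment specifier has somewhere to live. It makes a few assumptions: get_data is a hypothetical accessor added for illustration, the size check uses >= rather than strict equality so it also holds on ABIs where the implementation struct is smaller, and the accessors copy through memcpy instead of casting, which sidesteps the strict-aliasing question at the cost of a small copy.

```c
#include <stddef.h>
#include <string.h>

/* ---- what the public header would contain ---- */
#define OPAQUE_SIZE 16

typedef struct {
    /* Aligned so that the hidden struct could live here. */
    _Alignas(double) unsigned char bytes[OPAQUE_SIZE];
} OpaqueType;

void set_data(OpaqueType *dst, int data);
int get_data(const OpaqueType *src);    /* hypothetical accessor */

/* ---- what the implementation file would contain ---- */
typedef struct OpaqueTypeImpl {
    int data1;
    double data2;
} OpaqueTypeImpl;

/* Compile-time checks that the public buffer can hold the real type. */
_Static_assert(sizeof(OpaqueType) >= sizeof(OpaqueTypeImpl),
               "OpaqueType is too small");
_Static_assert(_Alignof(OpaqueType) >= _Alignof(OpaqueTypeImpl),
               "OpaqueType is under-aligned");

void set_data(OpaqueType *dst, int data)
{
    OpaqueTypeImpl impl;

    memcpy(&impl, dst->bytes, sizeof impl);   /* load current state */
    impl.data1 = data;
    memcpy(dst->bytes, &impl, sizeof impl);   /* store it back */
}

int get_data(const OpaqueType *src)
{
    OpaqueTypeImpl impl;

    memcpy(&impl, src->bytes, sizeof impl);
    return impl.data1;
}
```

The user can then write `OpaqueType t = {{0}}; set_data(&t, 42);` on the stack without ever seeing OpaqueTypeImpl. The alignment specifier only becomes essential if the implementation casts the buffer pointer instead of copying.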

Get struct's size passed as void to function

I'm changing some code in a database library. The way it works, I pass a void pointer; to get its size I run a query and use the query to calculate the size of the structure. The problem is that I receive the struct as a parameter, but the function fails before or in the middle of the first fetch. After that I need to clear the structure, but I don't even have the size.
I know the best way is to send the size of the structure as a parameter, but I have thousands and thousands of programs already compiled, and the library is from 1996, so I need to find a way to calculate the structure size even if the type is void.
One idea I had was to calculate the position of the next element that is not in the structure
0x000010 0x000042
[int|char[30]|int|int][int]
So the size is 0x32, because 0x000042 - 0x000010 is 0x32.
Is there a way to know when I have gone past the end of the structure?
the prototype of the function is
int getData(char* fields, void* myStruct)
I need to find out the structure size.
Sorry if I missed some information, the code is HUGE and unfortunately I cannot post it here.
No, in general there's no way, given a void *, to figure out what you're after. The only thing you can do is compare it against NULL, which of course doesn't help here.
Note that there's nothing in the void * that even says it points at a struct, it could just as well be pointing into the middle of an array.
If you have some global means of recording the pointers before they're passed to getData(), you might be able to implement a look-up function that simply compares the pointer value against those previously recorded, but that's just using the pointer value as a key.
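A minimal sketch of such a look-up, assuming the calling code can be changed to register each struct before it reaches getData(). The names track_struct and tracked_size and the fixed-size table are made up for illustration; they are not part of the library in question.

```c
#include <stddef.h>

#define MAX_TRACKED 128

/* Global table mapping a pointer value to the size recorded for it. */
static struct {
    const void *ptr;
    size_t size;
} tracked[MAX_TRACKED];
static size_t tracked_count;

/* Record a pointer and its size. Returns 0, or -1 if the table is full. */
int track_struct(const void *ptr, size_t size)
{
    if (tracked_count == MAX_TRACKED)
        return -1;
    tracked[tracked_count].ptr = ptr;
    tracked[tracked_count].size = size;
    ++tracked_count;
    return 0;
}

/* Look up a previously recorded pointer. Returns 0 if it is unknown. */
size_t tracked_size(const void *ptr)
{
    size_t i;

    for (i = 0; i < tracked_count; ++i)
        if (tracked[i].ptr == ptr)
            return tracked[i].size;
    return 0;
}
```

Inside getData(), `tracked_size(myStruct)` would then give the size to clear, but only for pointers that were registered; for programs that are already compiled and never register anything, it returns 0 and nothing is gained.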

When passing a matrix to a function, is it worth packing the matrix and its dimensions in a struct, or is it OK to use additional parameters?

Here is a matrix declared as pointer to an array of pointers to rows.
(source: Numerical Recipes in C)
What is the better way to pass this matrix to a function along with its dimensions?
void printMatrix(float **matrix, int rows, int cols);
Or pack it in a struct
struct Matrix {
int rows, cols;
float **data;
};
and pass a pointer to the struct?
void printMatrix(struct Matrix *m);
Both ways work; however, the approach using a struct is a bit "easier" to use. You (or whoever will use this) won't have to worry about passing the correct size as well, and nothing has to be organized by hand. You just handle one struct, one logical object. If you split everything up, you'll have to handle the data as well as the metadata yourself (i.e. storing/passing data and dimensions).
Is there a downside using the struct? Not that I know of (other than having to handle one more pointer). However there is one huge advantage: Using the struct you could use a function wanting data and meta data separated as well (by passing the struct elements rather than a pointer to the struct). This isn't that easy the other way around.
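To illustrate that advantage, here is a small sketch in which a struct-based function simply forwards to a function that takes the data and dimensions separately. The function names sumMatrixRaw and sumMatrix are invented for this example, and the struct is assumed to hold float rows to match the float ** prototype from the question.

```c
#include <stddef.h>

struct Matrix {
    int rows, cols;
    float **data;   /* array of pointers to rows */
};

/* Takes data and dimensions as separate parameters. */
float sumMatrixRaw(float **matrix, int rows, int cols)
{
    float sum = 0.0f;
    int r, c;

    for (r = 0; r < rows; ++r)
        for (c = 0; c < cols; ++c)
            sum += matrix[r][c];
    return sum;
}

/* Takes one logical object; unpacking it for the raw version is trivial. */
float sumMatrix(const struct Matrix *m)
{
    return sumMatrixRaw(m->data, m->rows, m->cols);
}
```

Going the other way, from loose parameters to a function that wants a struct, requires the caller to build a struct first.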
As for "is it worth it?" considering "should I do it for organisation?": Do it, if the grouping is logical. Lots of Windows APIs work with structs that way, but I'm not a real fan of them if the grouping isn't logical or it creates additional "pains". In other words: Don't group your parameters into a struct if they're not related or if the user most likely wouldn't have them in that form (i.e. they're grouped for this call only).
Edit:
As an example:
I'd group your example data, as width and height belong to the matrix data and they're related (plus they might be used in other functions the same way).
However, I wouldn't group parameters such as this: write_log(LOG_INFO, "All data has been processed"); Adding a struct here would add complexity that isn't required. It's very likely that this group of data won't be used elsewhere and makes calling the function more complicated (as you'll have to create the struct first).
For the sake of optimization, I would consider simply passing the struct by value. i.e.
void printMatrix(struct Matrix m);
without the pointer. It's a very small data structure and the processor might just store this top-level data in the cache. The compiler and processor may be able to optimize access to this top-level data.
Then again, it might do nothing or even make it worse. Optimization can be a black art.
(And don't forget that if you make changes to the top-level Matrix struct, then you'll need to return it somehow). So maybe this should only be considered in place of const struct Matrix *m.
There is no single perfect method. In the appropriate chapter of c-faq you can see 5 methods and their comparison.

Writing and reading (fwrite - fread) structures with pointers

I'm working on a mailbox project, and I have these two structures:
struct mmbox_mail
struct mmbox_mail {
char *sender, *recipient;
char *obj, *date;
char flags;
size_t size;
};
and
mail_t
typedef struct{
struct mmbox_mail info;
void *body;
void *next;
} mail_t;
I cannot modify the structures' fields, because I need variable data (for this purpose I used char* instead of char[]).
Each mail_t structure is a mail. I need to save every mail of a user in a file, which could be a binary or a text file (but I think a binary file is better, because I have the void* body, which is difficult to save in plain text).
I tried to do this, but it seems like it doesn't work:
while(mailtmp != NULL){ /* I have a list of mails and I use a mailtmp pointer to save each mail */
    fwrite(mailtmp, sizeof(mail_t), 1, fp);
    /* next mail */
    mailtmp=mailtmp->next;
}
Could you help me? I tried to search everywhere but I never found anyone asking how to save two structures, one inside the other.
Of course, that will not work: for the strings it will write only the pointer itself (usually 4 or 8 bytes), not the characters it points to. I see 3 options here:
Serializing data, binary file (http://en.wikipedia.org/wiki/Serialization).
Creating a format to store data in a text file.
Use markup language like XML/JSON etc.
In any case you need to go through every field of the structure to write it to the data file. As for reading, in the first two cases you have to read in exactly the order you wrote the data; in the third case you can read fields independently, in any order.
If you choose the first method, also write the terminating zero byte for every string (char *) field, so that you always know where it ends when reading it back.
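A minimal sketch of the first option (binary serialization), field by field, with each string written together with its zero terminator. It assumes info.size is the length of body in bytes, which the question does not state. The function names are invented for illustration, and error paths leak memory for brevity.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct mmbox_mail {
    char *sender, *recipient;
    char *obj, *date;
    char flags;
    size_t size;    /* assumed: length of body in bytes */
};

typedef struct {
    struct mmbox_mail info;
    void *body;
    void *next;
} mail_t;

/* Write a string including its terminating zero byte. */
static int write_str(FILE *fp, const char *s)
{
    size_t n = strlen(s) + 1;
    return fwrite(s, 1, n, fp) == n ? 0 : -1;
}

/* Read up to and including a zero byte; the caller frees the result. */
static char *read_str(FILE *fp)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = fgetc(fp)) != EOF) {
        if (len == cap) {
            char *tmp = realloc(buf, cap *= 2);
            if (tmp == NULL) { free(buf); return NULL; }
            buf = tmp;
        }
        buf[len++] = (char)c;
        if (c == '\0')
            return buf;
    }
    free(buf);      /* hit EOF before the terminator */
    return NULL;
}

/* Write one mail: four strings, flags, size, then the body bytes. */
int write_mail(FILE *fp, const mail_t *m)
{
    if (write_str(fp, m->info.sender) || write_str(fp, m->info.recipient) ||
        write_str(fp, m->info.obj) || write_str(fp, m->info.date))
        return -1;
    if (fwrite(&m->info.flags, sizeof m->info.flags, 1, fp) != 1 ||
        fwrite(&m->info.size, sizeof m->info.size, 1, fp) != 1)
        return -1;
    if (m->info.size > 0 &&
        fwrite(m->body, 1, m->info.size, fp) != m->info.size)
        return -1;
    return 0;
}

/* Read one mail back, in exactly the order it was written. */
int read_mail(FILE *fp, mail_t *m)
{
    memset(m, 0, sizeof *m);
    m->info.sender = read_str(fp);
    m->info.recipient = read_str(fp);
    m->info.obj = read_str(fp);
    m->info.date = read_str(fp);
    if (!m->info.sender || !m->info.recipient || !m->info.obj || !m->info.date)
        return -1;
    if (fread(&m->info.flags, sizeof m->info.flags, 1, fp) != 1 ||
        fread(&m->info.size, sizeof m->info.size, 1, fp) != 1)
        return -1;
    m->body = malloc(m->info.size ? m->info.size : 1);
    if (m->body == NULL)
        return -1;
    if (m->info.size > 0 &&
        fread(m->body, 1, m->info.size, fp) != m->info.size)
        return -1;
    return 0;
}
```

The next pointer is deliberately not written: the list order is implied by the order of records in the file, and the reader relinks the nodes as it loads them. The file written this way is only readable on a machine with the same size_t width and byte order.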
What you're doing is saving the literal binary representation of mail_t into the text file, which is just a bunch of pointers. What you want to do is something to the effect of:
fprintf( fp, "To: %s\nFrom: %s\n....\nContents: %.*s\n\n", mailtmp->info.recipient, mailtmp->info.sender, (int)mailtmp->info.size, (const char *)mailtmp->body );
That will render the values pointed to as a string and save it to the file. A pointer to a location in memory held by your application is a bit useless to most people after said application closes ;)
EDIT: "Could you help me? I tried to search everywhere but i never found someone that ask to save two structures, one inside one other."
If the struct held only plain value types, such as ints or floats, your method would work perfectly. However, since it holds pointers (your char * and void * members), you have to specify how the data they point to should be saved.
Well, you are storing the struct's pointers in the file, not the data they point to. Even if you store the struct as you want, it is hard to get it back from the file. I think you need a serialization component like Google Protocol Buffers. Then you can write an adaptor that translates the struct to a protobuf object and stores it in the file, and retrieve it when you want. Hope this helps :)

Serialize Data Structures in C

I'd like a C library that can serialize my data structures to disk, and then load them again later. It should accept arbitrarily nested structures, possibly with circular references.
I presume that this tool would need a configuration file describing my data structures. The library is allowed to use code generation, although I'm fairly sure it's possible to do this without it.
Note I'm not interested in data portability. I'd like to use it as a cache, so I can rely on the environment not changing.
Thanks.
Results
Someone suggested Tpl which is an awesome library, but I believe that it does not do arbitrary object graphs, such as a tree of Nodes that each contain two other Nodes.
Another candidate is Eet, which is a project of the Enlightenment window manager. Looks interesting but, again, seems not to have the ability to serialize nested structures.
Check out tpl. From the overview:
Tpl is a library for serializing C data. The data is stored in its natural binary form. The API is small and tries to stay "out of the way". Compared to using XML, tpl is faster and easier to use in C programs. Tpl can serialize many C data types, including structures.
I know you're asking for a library. If you can't find one (::boggle::, you'd think this was a solved problem!), here is an outline for a solution:
You should be able to write a code generator[1] to serialize trees/graphs without (run-time) pre-processing fairly simply.
You'll need to parse the node structure (typedef handling?), and write the included data values in a straight ahead fashion, but treat the pointers with some care.
For pointer to other objects (i.e. char *name;) which you know are singly referenced, you can serialize the target data directly.
For objects that might be multiply referenced, and for other nodes of your tree, you'll have to represent the pointer structure. Each object gets assigned a serialization number, which is what is written out in place of the pointer. Maintain a translation structure between current memory position and serialization number. On encountering a pointer, see if it is already assigned a number; if not, give it one and queue that object up for serialization.
Reading back also requires a node-#/memory-location translation step, and might be easier to do in two passes: regenerate the nodes with the node numbers in the pointer slots (bad pointer, be warned) to find out where each node gets put, then walk the structure again fixing the pointers.
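The numbering scheme can be sketched for a hypothetical node type with two pointers. Everything here (struct node, struct record, MAX_NODES, the function names) is invented for illustration, and the "file" is just an array of records. Because this sketch allocates all nodes in one array on load, every node's final address is known up front and a single pass is enough; with individually allocated nodes you would need the two passes described above.

```c
#include <stdlib.h>

#define MAX_NODES 64

struct node   { int value; struct node *left, *right; };
struct record { int value; int left, right; };  /* -1 stands for NULL */

/* Return the serialization number of p, assigning one on first sight.
 * Returns -1 for NULL and -2 if the table is full. */
static int id_of(const struct node *p, const struct node **seen, int *count)
{
    int i;

    if (p == NULL)
        return -1;
    for (i = 0; i < *count; ++i)
        if (seen[i] == p)
            return i;
    if (*count == MAX_NODES)
        return -2;
    seen[*count] = p;           /* newly numbered: queued for output */
    return (*count)++;
}

/* Writes one record per reachable node into out (room for MAX_NODES).
 * Returns the record count, or -1 if there are too many nodes. */
int serialize(const struct node *root, struct record *out)
{
    const struct node *seen[MAX_NODES];
    int count = 0, next = 0;

    if (root == NULL)
        return 0;
    seen[count++] = root;
    while (next < count) {      /* seen[] doubles as the work queue */
        const struct node *n = seen[next];
        out[next].value = n->value;
        out[next].left = id_of(n->left, seen, &count);
        out[next].right = id_of(n->right, seen, &count);
        if (out[next].left == -2 || out[next].right == -2)
            return -1;
        ++next;
    }
    return count;
}

/* Translate a serialization number back into an address. */
static struct node *resolve(struct node *nodes, int id)
{
    return id < 0 ? NULL : &nodes[id];
}

/* Rebuilds the graph; element 0 is the root. Free with free().
 * Assumes the records are well formed (ids below count). */
struct node *deserialize(const struct record *in, int count)
{
    struct node *nodes;
    int i;

    if (count <= 0)
        return NULL;
    nodes = malloc((size_t)count * sizeof *nodes);
    if (nodes == NULL)
        return NULL;
    for (i = 0; i < count; ++i) {
        nodes[i].value = in[i].value;
        nodes[i].left = resolve(nodes, in[i].left);
        nodes[i].right = resolve(nodes, in[i].right);
    }
    return nodes;
}
```

Circular references need no special handling: a node that has already been numbered is written as its number and is not queued again.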
I don't know anything about tpl, but you might be able to piggy-back on it.
The on-disk/network format should probably be framed with some type information. You'll need a name-mangling scheme.
[1] ROOT uses this mechanism to provide very flexible serialization support in C++.
Late addition: It occurs to me that this is not always as easy as I implied above. Consider the following (contrived and badly designed) declaration:
enum {
mask_none = 0x00,
mask_something = 0x01,
mask_another = 0x02,
/* ... */
mask_all = 0xff
};
typedef struct mask_map {
int mask_val;
char *mask_name;
} mask_map_t;
mask_map_t mask_list[] = {
{mask_something, "mask_something"},
{mask_another, "mask_another"},
/* ... */
};
struct saved_setup {
char* name;
/* various configuration data */
char* mask_name;
/* ... */
};
and assume that we initialize our struct saved_setup items so that mask_name points at mask_list[foo].mask_name.
When we go to serialize the data, what do we do with struct saved_setup.mask_name?
You will need to take care in designing your data structures and/or bring some case-specific intelligence to the serialization process.
This is my solution. It uses my own implementation of malloc, free and mmap, munmap system calls. Follow the given example codes. Ref: http://amscata.blogspot.com/2013/02/serialize-your-memory.html
In my approach I create a char array as my own RAM space. Then there are functions to allocate memory from it and to free it. After creating the data structure, I write the char array to a file using mmap.
Whenever you want to load it back into memory, there is a function which uses munmap to put the data structure into the char array again. Since it has virtual addresses for your pointers, you can reuse your data structure. That means you can create a data structure, save it, load it, edit it again, and save it again.
You can take a look at eet, a library of the Enlightenment project for storing C data types (including nested structures). Although nearly all libs of the Enlightenment project are in pre-alpha state, eet is already released. I'm not sure, however, if it can handle circular references. Probably not.
http://s11n.net/c11n/
HTH
You should check out gwlib. The serializer/deserializer is extensive, and there are extensive tests available to look at. http://gwlib.com/
I'm assuming you are talking about storing a graph structure, if not then disregard...
If you're storing a graph, I personally think the best idea would be to implement a function that converts your graph into an adjacency matrix. You can then write a function that converts an adjacency matrix back to your graph data structure.
This has three benefits (that may or may not matter in your application):
Adjacency matrices are a very natural way to create and store a graph.
You can create adjacency matrices elsewhere and import them into your application.
You can store and read your data in a meaningful way.
I used this method during a CS project and it is definitely how I would do it again.
You can read more about adjacency matrices here: http://en.wikipedia.org/wiki/Modified_adjacency_matrix
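A small sketch of the two conversion functions, assuming a hypothetical graph stored as a fixed-size array of nodes where each node holds pointers to its successors. The type gnode, the constant N_NODES and both function names are invented for illustration. Parallel edges collapse into a single matrix entry.

```c
#include <stddef.h>
#include <string.h>

#define N_NODES 4

struct gnode {
    int n_out;                      /* number of outgoing edges */
    struct gnode *out[N_NODES];     /* successors, all inside the same array */
};

/* Fill m so that m[i][j] is 1 when node i has an edge to node j. */
void graph_to_matrix(const struct gnode nodes[N_NODES],
                     unsigned char m[N_NODES][N_NODES])
{
    int i, k;

    memset(m, 0, N_NODES * N_NODES);
    for (i = 0; i < N_NODES; ++i)
        for (k = 0; k < nodes[i].n_out; ++k)
            m[i][nodes[i].out[k] - nodes] = 1;  /* pointer -> index */
}

/* Rebuild the pointer-based graph from the matrix. */
void matrix_to_graph(unsigned char m[N_NODES][N_NODES],
                     struct gnode nodes[N_NODES])
{
    int i, j;

    for (i = 0; i < N_NODES; ++i) {
        nodes[i].n_out = 0;
        for (j = 0; j < N_NODES; ++j)
            if (m[i][j])
                nodes[i].out[nodes[i].n_out++] = &nodes[j];  /* index -> pointer */
    }
}
```

The matrix contains no pointers, so it can be written to disk with a single fwrite and read back with a single fread.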
Another option is Avro C, an implementation of Apache Avro in C.
Here is an example using the Binn library (my creation):
binn *obj;
// create a new object
obj = binn_object();
// add values to it
binn_object_set_int32(obj, "id", 123);
binn_object_set_str(obj, "name", "Samsung Galaxy Charger");
binn_object_set_double(obj, "price", 12.50);
binn_object_set_blob(obj, "picture", picptr, piclen);
// send over the network
send(sock, binn_ptr(obj), binn_size(obj), 0);
// release the buffer
binn_free(obj);
If you don't want to use strings as keys you can use a binn_map which uses integers as keys.
There is also support for lists, and all these structures can be nested:
binn *list;
// create a new list
list = binn_list();
// add values to it
binn_list_add_int32(list, 123);
binn_list_add_double(list, 2.50);
// add the list to the object
binn_object_set_list(obj, "items", list);
// or add the object to the list
binn_list_add_object(list, obj);
In theory YAML should do what you want http://code.google.com/p/yaml-cpp/
Please let me know if it works for you.

Resources