I've been reading Stack Overflow for a long time and I've learned a lot.
But now I have a problem that I couldn't find an answer to on Stack Overflow, even though it should be a fairly "standard" question. So please forgive me if this topic has already been answered.
Problem:
I'm writing a module with defined interfaces for its input and output structures.
It should be some kind of a "multiplexer" with maybe three inputs and one output.
The module should switch one of the inputs to the output (depending on some logic).
A working example is shown here:
#include <stdio.h>
typedef struct{
short myVariable1;
short myVariable2;
} myType;
struct input_type{
myType Inp1;
myType Inp2;
myType Inp3;
};
struct output_type{
myType Out1;
};
struct input_type input;
struct output_type output;
int main(void){
for (int i=0; i<10; i++){ // this for loop simulates a cyclic call of a function where all the inputs are written
input.Inp1.myVariable1 = i;
input.Inp2.myVariable1 = i*2;
input.Inp3.myVariable1 = i*3;
printf("Inp1: %d | Inp2: %d | Inp3: %d \n",input.Inp1.myVariable1,input.Inp2.myVariable1,input.Inp3.myVariable1);
output.Out1 = input.Inp2; // Actual routing is done here, but I want to avoid this copy by working on the same dataset (e.g. input.InpX)
printf("Out: %d\n",output.Out1.myVariable1);
}
}
In this snippet, the structure is simply copied every cycle.
To avoid this step, I could do the following:
#include <stdio.h>
typedef struct{
short myVariable1;
short myVariable2;
} myType;
struct input_type{
myType Inp1;
myType Inp2;
myType Inp3;
};
struct output_type{
myType * Out1;
};
struct input_type input;
struct output_type output;
int main(void){
output.Out1 = &input.Inp2; // Actual routing is done here; But in this case, the output structure includes a pointer, therefore all other modules need to dereference Out1 with "->" or "*"
for (int i=0; i<10; i++){ // this for loop simulates a cyclic call of a function where all the inputs are written
input.Inp1.myVariable1 = i;
input.Inp2.myVariable1 = i*2;
input.Inp3.myVariable1 = i*3;
printf("Inp1: %d | Inp2: %d | Inp3: %d \n",input.Inp1.myVariable1,input.Inp2.myVariable1,input.Inp3.myVariable1);
printf("Out: %d\n",output.Out1->myVariable1);
}
}
But in this case, the output structure is no longer compatible with the existing interface.
Access to Out1 would need dereferencing.
Is it possible to avoid copying the structures from one to another without changing my interface?
Thanks in advance for your answers!
Rees.
Is it possible to avoid copying the structures from one to another without changing my interface?
By "the existing interface", I take you to mean that you have code that consumes objects of this type ...
struct output_type{
myType Out1;
};
... and you would like to avoid modifying that code.
In that case, no, you cannot substitute
struct output_type{
myType * Out1;
};
for it. Moreover, it is inherent in the design of the former structure that populating its direct myType member involves copying all the data you care about, whether member by member or as a whole structure.
At this point, I would recommend that you just stick with making those copies, until and unless you discover that doing so causes performance or memory usage to be unsatisfactory. Changing at this point would involve more than just syntactic changes: it would require careful review of all uses of struct output_type to find and mitigate any situations where the code relies on the properties of the original structure (such as to be confident of non-aliasing).
I'm trying to create a generic hash table in C. I've read a few different implementations, and came across a couple of different approaches.
The first is to use macros like this: http://attractivechaos.awardspace.com/khash.h.html
And the second is to use a struct with 2 void pointers like this:
struct hashmap_entry
{
void *key;
void *value;
};
From what I can tell, this approach isn't great because it means that each entry in the map requires at least 2 allocations, one for the key and one for the value, regardless of the data types being stored. (Is that right?)
I haven't been able to find a decent way of keeping it generic without going the macro route. Does anyone have any tips or examples that might help me out?
C does not provide what you need directly; nevertheless, you may want to do something like this:
Imagine that your hash table is a fixed-size array of doubly linked lists, and that it is OK for items to always be allocated and destroyed by the application layer. These conditions will not work for every case, but in many cases they will. Then you will have these data structures, function sketches and prototypes:
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
typedef struct HashItemCore
{
struct HashItemCore *m_prev;
struct HashItemCore *m_next;
} HashItemCore;
typedef struct HashTable
{
HashItemCore m_data[256]; // This is actually an array of circular
// doubly linked lists.
int (*GetHashValue)(void *item);
bool (*CompareItems)(void *item1, void *item2);
void (*ReleaseItem)(void *item);
} HashTable;
void InitHash(HashTable *table)
{
// Ensure that the user provided the callbacks.
assert(table->GetHashValue != NULL && table->CompareItems != NULL && table->ReleaseItem != NULL);
// Init all doubly linked lists. The pointers of an empty list point to the list head itself.
for (int i=0; i<256; ++i)
table->m_data[i].m_prev = table->m_data[i].m_next = &table->m_data[i];
}
void AddToHash(HashTable *table, void *item);
void *GetFromHash(HashTable *table, void *item);
// ....
void ClearHash(HashTable *table);
In these functions you need to implement the logic of the hash table. While working, they call the user-defined callbacks to find out the slot index and whether two items are identical.
The users of this table should define their own structures and callback functions for every pair of types that they want to use:
typedef struct HashItemK1V1
{
HashItemCore m_core; // must be the first member
K1 key;
V1 value;
} HashItemK1V1;
int CalcHashK1V1(void *p)
{
HashItemK1V1 *param = (HashItemK1V1*)p;
// App code.
}
bool CompareK1V1(void *p1, void *p2)
{
HashItemK1V1 *param1 = (HashItemK1V1*)p1;
HashItemK1V1 *param2 = (HashItemK1V1*)p2;
// App code.
}
void FreeK1V1(void *p)
{
HashItemK1V1 *param = (HashItemK1V1*)p;
// App code if needed.
free(p);
}
This approach does not provide type safety, because items are passed around as void pointers on the assumption that every application structure starts with a HashItemCore member. It is a sort of hand-made polymorphism. This may not be perfect, but it works.
I implemented this approach in C++ using templates. But if you strip away all the C++ niceties, in a nutshell it is exactly what I described above. I have used my table in multiple projects and it worked like a charm.
A generic hashtable in C is a bad idea.
A neat implementation will require function pointers, which are slow because calls through them generally cannot be inlined (the general case needs at least two function calls per hop: one to compute the hash value and one for the final comparison).
To allow inlining of these functions, you'll have to either
write the code manually,
use a code generator,
or use macros, which can get messy.
IIRC, the Linux kernel uses macros to create and maintain (some of?) its hash tables.
C does not have generic data types, so what you want to do (no extra allocations and no void* casting) is not really possible. You can use macros to generate the right data functions/structs on the fly, but you're trying to avoid macros as well.
So you need to give up at least one of your ideas.
You could have a generic data structure without extra allocations by allocating something like:
size_t key_len;
size_t val_len;
char data[]; /* key bytes followed by value bytes (C allows only one flexible array member) */
in one go and then handing out either void pointers, or adding an API for each specific type.
Alternatively, if you have a limited number of types you need to handle, you could also tag the value with the right one so now each entry contains:
size_t key_len;
size_t val_len;
int val_type;
char data[]; /* key bytes followed by value bytes */
but at least the API can verify that the requested type is the right one.
Otherwise, to make everything generic, you're left with either macros, or changing the language.
I have done far more C++ programming than "plain old C" programming. One thing I sorely miss when programming in plain C is type-safe generic data structures, which are provided in C++ via templates.
For sake of concreteness, consider a generic singly linked list. In C++, it is a simple matter to define your own template class, and then instantiate it for the types you need.
In C, I can think of a few ways of implementing a generic singly linked list:
Write the linked list type(s) and supporting procedures once, using void pointers to go around the type system.
Write preprocessor macros taking the necessary type names, etc, to generate a type-specific version of the data structure and supporting procedures.
Use a more sophisticated, stand-alone tool to generate the code for the types you need.
I don't like option 1, as it subverts the type system, and would likely have worse performance than a specialized type-specific implementation. Using a uniform representation of the data structure for all types, and casting to/from void pointers, so far as I can see, necessitates an indirection that would be avoided by an implementation specialized for the element type.
Option 2 doesn't require any extra tools, but it feels somewhat clunky, and could give bad compiler errors when used improperly.
Option 3 could give better compiler error messages than option 2, as the specialized data structure code would reside in expanded form that could be opened in an editor and inspected by the programmer (as opposed to code generated by preprocessor macros). However, this option is the most heavyweight, a sort of "poor-man's templates". I have used this approach before, using a simple sed script to specialize a "templated" version of some C code.
I would like to program my future "low-level" projects in C rather than C++, but have been frightened by the thought of rewriting common data structures for each specific type.
What experience do people have with this issue? Are there good libraries of generic data structures and algorithms in C that do not go with Option 1 (i.e. casting to and from void pointers, which sacrifices type safety and adds a level of indirection)?
Option 1 is the approach taken by most C implementations of generic containers that I see. The Windows driver kit and the Linux kernel use a macro to allow links for the containers to be embedded anywhere in a structure, with the macro used to obtain the structure pointer from a pointer to the link field:
list_entry() macro in Linux
CONTAINING_RECORD() macro in Windows
Option 2 is the tack taken by BSD's tree.h and queue.h container implementation:
http://openbsd.su/src/sys/sys/queue.h
http://openbsd.su/src/sys/sys/tree.h
I don't think I'd consider either of these approaches type safe. Useful, but not type safe.
C has a different kind of beauty than C++, and type safety, or always being able to see what everything is when tracing through code in a debugger without involving casts, is typically not part of it.
C's beauty comes a lot from its lack of type safety, of working around the type system and at the raw level of bits and bytes. Because of that, there's certain things it can do more easily without fighting against the language like, say, variable-length structs, using the stack even for arrays whose sizes are determined at runtime, etc. It also tends to be a lot simpler to preserve ABI when you're working at this lower level.
So there's a different kind of aesthetic involved here as well as different challenges, and I'd recommend a shift in mindset when you work in C. To really appreciate it, I'd suggest doing things many people take for granted these days, like implementing your own memory allocator or device driver. When you're working at such a low level, you can't help but look at everything as memory layouts of bits and bytes as opposed to 'objects' with behaviors attached. Furthermore, there can come a point in such low-level bit/byte manipulation code where C becomes easier to comprehend than, e.g., C++ code littered with reinterpret_casts.
As for your linked list example, I would suggest a non-intrusive version of a linked node (one that does not require storing list pointers into the element type, T, itself, allowing the linked list logic and representation to be decoupled from T itself), like so:
struct ListNode
{
struct ListNode* prev;
struct ListNode* next;
MAX_ALIGN char element[1]; // Watch out for alignment here.
// see your compiler's specific info on
// aligning data members.
};
Now we can create a list node like so:
struct ListNode* list_new_node(int element_size)
{
// Watch out for alignment here.
return malloc_max_aligned(sizeof(struct ListNode) + element_size - 1);
}
// create a list node for 'struct Foo'
void foo_init(struct Foo*);
struct ListNode* foo_node = list_new_node(sizeof(struct Foo));
foo_init((struct Foo*)foo_node->element); // C needs an explicit cast from char*
To retrieve the element from the list as T*:
T* element = (T*)list_node->element;
Since it's C, there's no type checking whatsoever when casting pointers in this way, and that will probably also give you an uneasy feeling if you're coming from a C++ background.
The tricky part here is to make sure that this member, element, is properly aligned for whatever type you want to store. When you can solve that problem as portably as you need it to be, you'll have a powerful solution for creating efficient memory layouts and allocators. Often this will have you just using max alignment for everything which might seem wasteful, but typically isn't if you are using appropriate data structures and allocators which aren't paying this overhead for numerous small elements on an individual basis.
Now this solution still involves the type casting. There's little you can do about that short of having a separate version of code of this list node and the corresponding logic to work with it for every type, T, that you want to support (short of dynamic polymorphism). However, it does not involve an additional level of indirection as you might have thought was needed, and still allocates the entire list node and element in a single allocation.
And I would recommend this simple way to achieve genericity in C in many cases. Simply replace T with a buffer that has a length matching sizeof(T) and aligned properly. If you have a reasonably portable and safe way you can generalize to ensure proper alignment, you'll have a very powerful way of working with memory in a way that often improves cache hits, reduces the frequency of heap allocations/deallocations, the amount of indirection required, build times, etc.
If you need more automation, like having list_new_node automatically initialize struct Foo, I would recommend creating a general type table struct that you can pass around, which contains information like how big T is, a function pointer to a function that creates a default instance of T, others to copy T, clone T and destroy T, a comparator, etc. In C++, you can generate this table automatically using templates and built-in language concepts like copy constructors and destructors. C requires a bit more manual effort, but you can still reduce the boilerplate a bit with macros.
Another trick that can be useful if you go with a more macro-oriented code generation route is to cash in on a prefix- or suffix-based naming convention for identifiers. For example, CLONE(Type, ptr) could be defined to return Type##Clone(ptr), so CLONE(Foo, foo) could invoke FooClone(foo). This is kind of a cheat to get something akin to function overloading in C, and is useful when generating code in bulk (when CLONE is used to implement another macro) or even a bit of copying and pasting of boilerplate-type code to at least improve the uniformity of the boilerplate.
Option 1, either using void * or some union based variant is what most C programs use, and it may give you BETTER performance than the C++/macro style of having multiple implementations for different types, as it has less code duplication, and thus less icache pressure and fewer icache misses.
GLib has a bunch of generic data structures: http://www.gtk.org/
CCAN has a bunch of useful snippets and such http://ccan.ozlabs.org/
Your option 1 is what most old-time C programmers would go for, possibly salted with a little of option 2 to cut down on the repetitive typing, and just maybe employing a few function pointers for a flavor of polymorphism.
There's a common variation of option 1 that is more efficient because it uses unions to store the values in the list nodes, i.e., there's no additional indirection. The downside is that the list only accepts values of certain types and potentially wastes some memory if the types are of different sizes.
However, it's possible to get rid of the union by using flexible array member instead if you're willing to break strict aliasing. C99 example code:
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct ll_node
{
struct ll_node *next;
long long data[]; // use `long long` for alignment
};
extern struct ll_node *ll_unshift(
struct ll_node *head, size_t size, void *value);
extern void *ll_get(struct ll_node *head, size_t index);
#define ll_unshift_value(LIST, TYPE, ...) \
ll_unshift((LIST), sizeof (TYPE), &(TYPE){ __VA_ARGS__ })
#define ll_get_value(LIST, INDEX, TYPE) \
(*(TYPE *)ll_get((LIST), (INDEX)))
struct ll_node *ll_unshift(struct ll_node *head, size_t size, void *value)
{
struct ll_node *node = malloc(sizeof *node + size);
if(!node) assert(!"PANIC");
memcpy(node->data, value, size);
node->next = head;
return node;
}
void *ll_get(struct ll_node *head, size_t index)
{
struct ll_node *current = head;
while(current && index--)
current = current->next;
return current ? current->data : NULL;
}
int main(void)
{
struct ll_node *head = NULL;
head = ll_unshift_value(head, int, 1);
head = ll_unshift_value(head, int, 2);
head = ll_unshift_value(head, int, 3);
printf("%i\n", ll_get_value(head, 0, int));
printf("%i\n", ll_get_value(head, 1, int));
printf("%i\n", ll_get_value(head, 2, int));
return 0;
}
An old question, I know, but in case it is still of interest: I was experimenting with option 2 (preprocessor macros) today, and came up with the example I will paste below. Slightly clunky indeed, but not terrible. The code is not fully type-safe, but it contains sanity checks that provide a reasonable level of safety. And the compiler error messages I dealt with while writing it were mild compared to what I have seen when C++ templates came into play. You are probably best off starting with the example usage code in the "main" function.
#include <stdio.h>
#define LIST_ELEMENT(type) \
struct \
{ \
void *pvNext; \
type value; \
}
#define ASSERT_POINTER_TO_LIST_ELEMENT(type, pElement) \
do { \
(void)(&(pElement)->value == (type *)&(pElement)->value); \
(void)(sizeof(*(pElement)) == sizeof(LIST_ELEMENT(type))); \
} while(0)
#define SET_POINTER_TO_LIST_ELEMENT(type, pDest, pSource) \
do { \
ASSERT_POINTER_TO_LIST_ELEMENT(type, pSource); \
ASSERT_POINTER_TO_LIST_ELEMENT(type, pDest); \
void **pvDest = (void **)&(pDest); \
*pvDest = ((void *)(pSource)); \
} while(0)
#define LINK_LIST_ELEMENT(type, pDest, pSource) \
do { \
ASSERT_POINTER_TO_LIST_ELEMENT(type, pSource); \
ASSERT_POINTER_TO_LIST_ELEMENT(type, pDest); \
(pDest)->pvNext = ((void *)(pSource)); \
} while(0)
#define TERMINATE_LIST_AT_ELEMENT(type, pDest) \
do { \
ASSERT_POINTER_TO_LIST_ELEMENT(type, pDest); \
(pDest)->pvNext = NULL; \
} while(0)
#define ADVANCE_POINTER_TO_LIST_ELEMENT(type, pElement) \
do { \
ASSERT_POINTER_TO_LIST_ELEMENT(type, pElement); \
void **pvElement = (void **)&(pElement); \
*pvElement = (pElement)->pvNext; \
} while(0)
typedef struct { int a; int b; } mytype;
int main(int argc, char **argv)
{
LIST_ELEMENT(mytype) el1;
LIST_ELEMENT(mytype) el2;
LIST_ELEMENT(mytype) *pEl;
el1.value.a = 1;
el1.value.b = 2;
el2.value.a = 3;
el2.value.b = 4;
LINK_LIST_ELEMENT(mytype, &el1, &el2);
TERMINATE_LIST_AT_ELEMENT(mytype, &el2);
printf("Testing.\n");
SET_POINTER_TO_LIST_ELEMENT(mytype, pEl, &el1);
if (pEl->value.a != 1)
printf("pEl->value.a != 1: %d.\n", pEl->value.a);
ADVANCE_POINTER_TO_LIST_ELEMENT(mytype, pEl);
if (pEl->value.a != 3)
printf("pEl->value.a != 3: %d.\n", pEl->value.a);
ADVANCE_POINTER_TO_LIST_ELEMENT(mytype, pEl);
if (pEl != NULL)
printf("pEl != NULL.\n");
printf("Done.\n");
return 0;
}
I use void pointers (void*) to represent generic data structures defined with structs and typedefs. Below I share my implementation of a lib which I'm working on.
With this kind of implementation, you can think of each new type, defined with typedef, like a pseudo-class. Here, this pseudo-class is the set of the source code (some_type_implementation.c) and its header file (some_type_implementation.h).
In the source code, you have to define the struct that will represent the new type. Note the struct in the "node.c" source file. There I made the "info" attribute a void pointer. This pointer can carry a pointer to any object type (though not a function pointer), but the price you have to pay is a type identifier inside the struct (int type), plus all the switches needed to handle each defined type properly. So, in the "node.h" header file, I defined the type "Node" (just to avoid having to type struct node every time), and I also had to define the constants "EMPTY_NODE", "COMPLEX_NODE", and "MATRIX_NODE".
You can perform the compilation, by hand, with "gcc *.c -lm".
main.c Source File
#include <stdio.h>
#include <math.h>
#define PI M_PI
#include "complex.h"
#include "matrix.h"
#include "node.h"
int main()
{
//testCpx();
//testMtx();
testNode();
return 0;
}
node.c Source File
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "node.h"
#include "complex.h"
#include "matrix.h"
#define PI M_PI
struct node
{
int type;
void* info;
};
Node* newNode(int type,void* info)
{
Node* newNode = (Node*) malloc(sizeof(Node));
newNode->type = type;
// A void* can hold any object pointer, so no per-type cast is needed here.
// Storing it unconditionally also avoids leaving info uninitialized when
// type matches none of the known node types.
newNode->info = info;
return newNode;
}
int emptyInfoNode(Node* node)
{
return (node->info == NULL);
}
void printNode(Node* node)
{
if(emptyInfoNode(node))
{
printf("Type:%d\n",node->type);
printf("Empty info\n");
}
else
{
switch(node->type)
{
case COMPLEX_NODE:
printCpx(node->info);
break;
case MATRIX_NODE:
printMtx(node->info);
break;
}
}
}
void testNode()
{
Node *node1,*node2, *node3;
Complex *Z;
Matrix *M;
Z = mkCpx(POLAR,5,3*PI/4);
M = newMtx(3,4,PI);
node1 = newNode(COMPLEX_NODE,Z);
node2 = newNode(MATRIX_NODE,M);
node3 = newNode(EMPTY_NODE,NULL);
printNode(node1);
printNode(node2);
printNode(node3);
}
node.h Header File
#define EMPTY_NODE 0
#define COMPLEX_NODE 1
#define MATRIX_NODE 2
typedef struct node Node;
Node* newNode(int type,void* info);
int emptyInfoNode(Node* node);
void printNode(Node* node);
void testNode();
matrix.c Source File
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "matrix.h"
struct matrix
{
// Meta-information about the matrix
int rows;
int cols;
// The elements of the matrix, stored as an array of row pointers
double** MTX;
};
Matrix* newMtx(int rows,int cols,double value)
{
register int row , col;
Matrix* M = (Matrix*)malloc(sizeof(Matrix));
M->rows = rows;
M->cols = cols;
M->MTX = (double**) malloc(rows*sizeof(double*));
for(row = 0; row < rows ; row++)
{
M->MTX[row] = (double*) malloc(cols*sizeof(double));
for(col = 0; col < cols ; col++)
M->MTX[row][col] = value;
}
return M;
}
Matrix* mkMtx(int rows,int cols,double** MTX)
{
Matrix* M;
if(MTX == NULL)
{
M = newMtx(rows,cols,0);
}
else
{
M = (Matrix*)malloc(sizeof(Matrix));
M->rows = rows;
M->cols = cols;
M->MTX = MTX;
}
return M;
}
double getElemMtx(Matrix* M , int row , int col)
{
return M->MTX[row][col];
}
void printRowMtx(double* row,int cols)
{
register int j;
for(j = 0 ; j < cols ; j++)
printf("%g ",row[j]);
}
void printMtx(Matrix* M)
{
register int row = 0, col = 0;
printf("\vSize\n");
printf("\tRows:%d\n",M->rows);
printf("\tCols:%d\n",M->cols);
printf("\n");
for(; row < M->rows ; row++)
{
printRowMtx(M->MTX[row],M->cols);
printf("\n");
}
printf("\n");
}
void testMtx()
{
Matrix* M = mkMtx(10,10,NULL);
printMtx(M);
}
matrix.h Header File
typedef struct matrix Matrix;
Matrix* newMtx(int rows,int cols,double value);
Matrix* mkMtx(int rows,int cols,double** MTX);
double getElemMtx(Matrix* M , int row , int col);
void printRowMtx(double* row,int cols);
void printMtx(Matrix* M);
void testMtx();
complex.c Source File
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "complex.h"
struct complex
{
int type;
double a;
double b;
};
Complex* mkCpx(int type,double a,double b)
{
/** Doc - {{{
* This function makes a new Complex number.
*
* #params:
* |-->type: Is an integer that denotes whether the number is in
* | the analytic (rectangular) or the polar form.
* | ANALITIC:0
* | POLAR :1
* |
* |-->a: Is the real part if type = 0 and is the radius if
* | type = 1
* |
* `-->b: Is the imaginary part if type = 0 and is the argument
* if type = 1
*
* #return:
* Returns the new Complex number initialized with the values
* passed
*}}} */
Complex* number = (Complex*)malloc(sizeof(Complex));
number->type = type;
number->a = a;
number->b = b;
return number;
}
void printCpx(Complex* number)
{
switch(number->type)
{
case ANALITIC:
printf("Re:%g | Im:%g\n",number->a,number->b);
break;
case POLAR:
printf("Radius:%g | Arg:%g\n",number->a,number->b);
break;
}
}
void testCpx()
{
Complex* Z = mkCpx(ANALITIC,3,2);
printCpx(Z);
}
complex.h Header File
#define ANALITIC 0
#define POLAR 1
typedef struct complex Complex;
Complex* mkCpx(int type,double a,double b);
void printCpx(Complex* number);
void testCpx();
I hope I haven't missed anything.
I am using option 2 for a couple of high performance collections, and it is extremely time-consuming working through the amount of macro logic needed to do anything truly compile-time generic and worth using. I am doing this purely for raw performance (games). An X-macros approach is used.
A painful question that constantly comes up with Option 2 is: "Given a finite set of options, such as 8/16/32/64-bit keys, do I make that value a constant and define several functions, one for each value the constant can take, or do I just make it a member variable?" The former means more instruction-cache pressure, since you have many near-duplicate functions that differ in only one or two numbers, while the latter means you have to reference allocated variables, which in the worst case means a data cache miss. Since Option 1 is purely dynamic, you will make such values member variables without even thinking about it. This truly is micro-optimisation, though.
Also bear in mind the trade-off between returning pointers vs. values: the latter is most performant when the size of the data item is less than or equal to pointer size; whereas if the data item is larger, it is most likely better to return pointers than to force a copy of a large object by returning value.
I would strongly suggest going for Option 1 in any scenario where you are not 100% certain that collection performance will be your bottleneck. Even with my use of Option 2, my collections library supplies a "quick setup" which is like Option 1, i.e. use of void * values in my list and map. This is sufficient for 90+% of circumstances.
You could check out https://github.com/clehner/ll.c
It's easy to use:
#include <stdio.h>
#include <string.h>
#include "ll.h"
int main()
{
int *numbers = NULL;
*( numbers = ll_new(numbers) ) = 100;
*( numbers = ll_new(numbers) ) = 200;
printf("num is %d\n", *numbers);
numbers = ll_next(numbers);
printf("num is %d\n", *numbers);
typedef struct _s {
char *word;
} s;
s *string = NULL;
*( string = ll_new(string) ) = (s) {"a string"};
*( string = ll_new(string) ) = (s) {"another string"};
printf("string is %s\n", string->word);
string = ll_next( string );
printf("string is %s\n", string->word);
return 0;
}
Output:
num is 200
num is 100
string is another string
string is a string
I'm having a very big struct in an existing program. This struct includes a great number of bitfields.
I wish to save a part of it (say, 10 fields out of 150).
An example code I would use to save the subclass is:
typedef struct {int a;int b;char c;} bigstruct;
typedef struct {int a;char c;} smallstruct;
void substruct(smallstruct *s,bigstruct *b) {
s->a = b->a;
s->c = b->c;
}
int save_struct(bigstruct *bs) {
smallstruct s;
substruct(&s,bs);
return save_smallstruct(&s); /* hypothetical function that writes s out */
}
I also want selecting which part to save to be low-hassle, since I expect to change it every now and then. The naive approach I presented above is very fragile and unmaintainable: when scaling up to 20 different fields, you have to change fields both in smallstruct and in the substruct function.
I thought of two better approaches. Unfortunately, both require me to use some external CIL-like tool to parse my structs.
The first approach is automatically generating the substruct function. I'll just set the struct of smallstruct, and have a program that would parse it and generate the substruct function according to the fields in smallstruct.
The second approach is building (with C parser) a meta-information about bigstruct, and then write a library that would allow me to access a specific field in the struct. It would be like ad-hoc implementation of Java's class reflection.
For example, assuming no struct-alignment, for struct
struct st {
int a;
char c1:5;
char c2:3;
long d;
};
I'll generate the following meta information:
int field2distance[] = {0,sizeof(int),sizeof(int),sizeof(int)+sizeof(char)};
int field2size[] = {sizeof(int),1,1,sizeof(long)}
int field2bitmask[] = {0,0x1F,0xE0,0};
char *fieldNames[] = {"a","c1","c2","d"};
I'll get the ith field with this function:
long getFieldData(void *strct,int i) {
int distance = field2distance[i];
int size = field2size[i];
int bitmask = field2bitmask[i];
void *ptr = ((char *)strct + distance);
long result;
switch (size) {
case 1: //char
result = *(char*)ptr;
break;
case 2: //short
result = *(short*)ptr;
...
}
if (bitmask == 0) return result;
return (result & bitmask) >> num_of_trailing_zeros(bitmask);
}
Both methods require extra work, but once the parser is in your makefile, changing the substruct is a breeze.
However, I'd rather do this without any external dependencies.
Does anyone have a better idea? Were my ideas any good, and is there an available implementation of them on the internet?
From your description, it looks like you have access to and can modify your original structure. I suggest you refactor your substructure into a complete type (as you did in your example), and then make that structure a field on your big structure, encapsulating all of those fields in the original structure into the smaller structure.
Expanding on your small example:
typedef struct
{
int a;
char c;
} smallstruct;
typedef struct
{
int b;
smallstruct mysub;
} bigstruct;
Accessing the smallstruct info would be done like so:
/* stack-based allocation */
bigstruct mybig;
mybig.mysub.a = 1;
mybig.mysub.c = '1';
mybig.b = 2;
/* heap-based allocation */
bigstruct * mybig = (bigstruct *)malloc(sizeof(bigstruct));
mybig->mysub.a = 1;
mybig->mysub.c = '1';
mybig->b = 2;
But you could also pass around pointers to the small struct:
void dosomething(smallstruct * small)
{
small->a = 3;
small->c = '3';
}
/* stack based */
dosomething(&(mybig.mysub));
/* heap based */
dosomething(&mybig->mysub);
Benefits:
No Macros
No external dependencies
No memory-order casting hacks
Cleaner, easier-to-read and use code.
If changing the order of the fields isn't out of the question, you can rearrange the bigstruct fields so that the smallstruct fields are together and in the same order, and then it's simply a matter of casting from one to the other (possibly adding an offset).
Something like:
typedef struct {int a;char c;int b;} bigstruct;
typedef struct {int a;char c;} smallstruct;
int save_struct(bigstruct *bs) {
return save_smallstruct((smallstruct *)bs); /* hypothetical saver for smallstruct */
}
Macros are your friend.
One solution would be to move the big struct out into its own include file and then have a macro party.
Instead of defining the structure normally, come up with a selection of macros, such as BEGIN_STRUCTURE, END_STRUCTURE, NORMAL_FIELD, SUBSET_FIELD
You can then include the file a few times, redefining those macros for each pass. The first pass turns the definitions into a normal structure, with both types of field output as normal. The second defines NORMAL_FIELD as nothing and creates your subset. The third creates the appropriate code to copy the subset fields over.
You'll end up with a single definition of the structure, that lets you control which fields are in the subset and automatically creates suitable code for you.
Just to help you in getting your metadata, you can refer to the offsetof() macro, which also has the benefit of taking care of any padding you may have
I suggest to take this approach:
Curse the guy who wrote the big structure. Get a voodoo doll and have some fun.
Mark each field of the big structure that you need somehow (macro or comment or whatever)
Write a small tool which reads the header file and extracts the marked fields. If you use comments, you can give each field a priority or something to sort them.
Write a new header file for the substructure (using a fixed header and footer).
Write a new C file which contains a function createSubStruct which takes a pointer to the big struct and returns a pointer to the substruct
In the function, loop over the fields collected and emit ss.field = bs.field (i.e. copy the fields one by one).
Add the small tool to your makefile and add the new header and C source file to your build
I suggest to use gawk, or any scripting language you're comfortable with, as the tool; that should take half an hour to build.
[EDIT] If you really want to try reflection (which I advise against; it'll be a whole lot of work to get that working in C), then the offsetof() macro is your friend. This macro returns the offset of a field in a structure (which is often not the sum of the sizes of the fields before it, because of padding). See this article.
[EDIT2] Don't write your own parser. To get your own parser right will take months; I know since I've written lots of parsers in my life. Instead mark the parts of the original header file which need to be copied and then rely on the one parser which you know works: The one of your C compiler. Here are a couple of ideas how to make this work:
struct big_struct {
/**BEGIN_COPY*/
int i;
int j : 3;
int k : 2;
char * str;
/**END_COPY*/
...
struct x y; /**COPY_STRUCT*/
};
Just have your tool copy anything between /**BEGIN_COPY*/ and /**END_COPY*/.
Use special comments like /**COPY_STRUCT*/ to instruct your tool to generate a memcpy() instead of an assignment, etc.
This can be written and debugged in a few hours. It would take as long to set up a parser for C without any functionality; that is you'd just have something which can read valid C but you'd still have to write the part of the parser which understands C, and the part which does something useful with the data.