Is this hack valid according to standard? - c

This is just like struct hack.
Is it valid according to standard C?
// error check omitted!
typedef struct foo {
void *data;
char *comment;
size_t num_foo;
}foo;
foo *new_Foo(size_t num, blah blah)
{
foo *f;
f = malloc(num + sizeof(foo) + MAX_COMMENT_SIZE );
f->data = f + 1; // is this OK?
f->comment = f + 1 + num;
f->num_foo = num;
...
return f;
}

Yes, it's completely valid. And I would strongly encourage doing this when it allows you to avoid unnecessary additional allocations (and the error handling and memory fragmentation they entail). Others may have different opinions.
By the way, if your data isn't void * but something you can access directly, it's even easier (and more efficient because it saves space and avoids the extra indirection) to declare your structure as:
struct foo {
size_t num_foo;
type data[];
};
and allocate space for the amount of data you need. The [] syntax (a flexible array member) requires C99 or later, so for C89 compatibility you should use [1] instead, though this may waste a few bytes.
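A minimal sketch of the flexible array member form, assuming the element type is int. The names int_vec and int_vec_new are made up for illustration and are not from the question:

```c
#include <stdlib.h>

/* Hypothetical example type: a counted array stored inline after the header. */
struct int_vec {
    size_t num_foo;
    int data[];            /* flexible array member (C99 and later) */
};

/* Allocates the header and n zeroed elements in a single block.
   Returns NULL if the allocation fails. */
struct int_vec *int_vec_new(size_t n)
{
    struct int_vec *v = malloc(sizeof *v + n * sizeof v->data[0]);
    if (v == NULL)
        return NULL;
    v->num_foo = n;
    for (size_t i = 0; i < n; i++)
        v->data[i] = 0;
    return v;
}
```

A single free(v) releases both the header and the elements, which is the main attraction of the single-allocation layout.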

The line you question is valid - as others have said.
Interestingly, the next line, which you did not query, is syntactically valid but is not giving you the answer you want (except in the case where num == 0).
typedef struct foo
{
void *data;
char *comment;
size_t num_foo;
} foo;
foo *new_Foo(size_t num, blah blah)
{
foo *f;
f = malloc(num + sizeof(foo) + MAX_COMMENT_SIZE );
f->data = f + 1; // This is OK
f->comment = f + 1 + num; // This is !!BAD!!
f->num_foo = num;
...
return f;
}
The value of f + 1 is a foo * (implicitly coerced into a void * by the assignment).
The value of f + 1 + num is also a foo *; it points to the num+1th foo.
What you probably had in mind was:
f->comment = (char *)f->data + num;
Or:
f->comment = (char *)(f + 1) + num;
Note that while GCC will allow you to add num to a void pointer, and it will treat it as if sizeof(void) == 1, the C Standard does not give you that permission.
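Putting the correction together, here is a minimal sketch of the whole allocator. It assumes MAX_COMMENT_SIZE is 64 and drops the elided "blah blah" parameters; both choices are mine for illustration, since the question specifies neither:

```c
#include <stdlib.h>

#define MAX_COMMENT_SIZE 64   /* assumed value; the question does not give one */

typedef struct foo {
    void *data;
    char *comment;
    size_t num_foo;
} foo;

/* Allocates the struct, num data bytes and the comment area in one block.
   Returns NULL if the allocation fails. */
foo *new_Foo(size_t num)
{
    foo *f = malloc(sizeof(foo) + num + MAX_COMMENT_SIZE);
    if (f == NULL)
        return NULL;
    f->data = f + 1;                      /* first byte after the struct */
    f->comment = (char *)(f + 1) + num;   /* byte arithmetic: num bytes past data */
    f->comment[0] = '\0';
    f->num_foo = num;
    return f;
}
```

The cast to char * is what makes + num count bytes rather than whole foo objects.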

That is an old game, though the usual form is like
struct foo {
size_t size;
char data[1];
};
and then allocate as much space as you want and use the array as if it had the desired size.
It is valid, but I would encourage you to find another way if possible: there are plenty of chances to get this wrong.

Yes, the general idea of the hack is valid, but at least as I read it, you haven't implemented it quite correctly. This much you've done right:
f = malloc(num + sizeof(foo) + MAX_COMMENT_SIZE );
f->data = f + 1; // is this OK?
But this is wrong:
f->comment = f + 1 + num;
Since f is foo *, the f+1+num is computed in terms of sizeof(foo) -- i.e., it's equivalent to saying f[1+num] -- it (attempts to) index to the 1+numth foo in an array. I'm pretty sure that's not what you want. When you allocate the data, you're passing sizeof(foo)+num+MAX_COMMENT_SIZE, so what you're allocating space for is num chars, and what you (presumably) want is to point f->comment to a spot in memory that's num chars after f->data, which would be more like this:
f->comment = (char *)f + sizeof(foo) + num;
Casting f to a char * forces the math to be done in terms of chars instead of foos.
OTOH, since you're always allocating MAX_COMMENT_SIZE for comment, I'd probably simplify things (quite) a bit, and use something like this:
typedef struct foo {
char comment[MAX_COMMENT_SIZE];
size_t num_foo;
char data[1];
}foo;
And then allocate it like:
foo *f = malloc(sizeof(foo) + num-1);
f->num_foo = num;
and it'll work without any pointer manipulation at all. If you have a C99 compiler, you can modify this slightly:
typedef struct foo {
char comment[MAX_COMMENT_SIZE];
size_t num_foo;
char data[];
}foo;
and allocate:
foo *f = malloc(sizeof(foo) + num);
f->num_foo = num;
This has the additional advantage that the standard actually blesses it, though in this case the advantage is pretty minor (I believe the version with data[1] will work with every C89/90 compiler in existence).

Another possible problem might be alignment.
If you simply malloc your f->data, then you can safely e.g. convert your void* to double* and use it to read/write a double (provided that num is sufficiently large). However, in your example you can no longer do that, as f->data might not be properly aligned. For example, to store a double in f->data, you will need to use something like memcpy instead of a simple typecast.
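A small sketch of the memcpy approach mentioned above. The helper names store_double and load_double are hypothetical; the point is only that memcpy works on bytes and so does not care whether the address is suitably aligned for a double:

```c
#include <string.h>

/* Store a double at an arbitrary, possibly misaligned, byte address. */
void store_double(void *dst, double value)
{
    memcpy(dst, &value, sizeof value);
}

/* Load a double from an arbitrary, possibly misaligned, byte address. */
double load_double(const void *src)
{
    double value;
    memcpy(&value, src, sizeof value);
    return value;
}
```

With these, store_double(f->data, 3.5) followed by load_double(f->data) is fine even if f->data is not aligned for double, provided at least sizeof(double) bytes are available there.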

I'd rather use some function to allocate the data dynamically and free it correctly instead.
Using this trick only saves you the trouble of initializing the data structure, and can lead to very bad problems (see Jerry's comment).
I'd do something like this:
typedef struct foo {
void *data;
char *comment;
size_t num_foo;
}foo;
foo *alloc_foo( void * data, size_t data_size, const char *comment)
{
foo *elem = calloc(1,sizeof(foo));
void *elem_data = calloc(data_size, sizeof(char));
char *elem_comment = calloc(strlen(comment)+1, sizeof(char));
elem->data = elem_data;
elem->comment = elem_comment;
memcpy(elem_data, data, data_size);
memcpy(elem_comment, comment, strlen(comment)+1);
elem->num_foo = data_size + strlen(comment) + 1;
return elem;
}
void free_foo(foo *f)
{
if(f->data)
free(f->data);
if(f->comment)
free(f->comment);
free(f);
}
Note that I did not check the data for validity, and my alloc can be optimized (by replacing the repeated strlen() calls with a stored length value).
It seems to me that this behavior is more secure, perhaps at the price of scattering the data across several allocations.

Related

C vs C++ placing structs in unsigned char buffer

Does C have anything similar to C++, where one can place structs in an unsigned char buffer, as shown in the C++ standard, sec. 6.7.2?
template<typename ...T>
struct AlignedUnion {
alignas(T...) unsigned char data[max(sizeof(T)...)];
};
int f() {
AlignedUnion<int, char> au;
int *p = new (au.data) int; // OK, au.data provides storage
char *c = new (au.data) char(); // OK, ends lifetime of *p
char *d = new (au.data + 1) char();
return *c + *d; // OK
}
In C I can certainly memcpy a struct of things(or int as shown above) into an unsigned char buffer, but then using a pointer to this struct one runs into strict aliasing violations; the buffer has different declared type.
So suppose one wanted to replicate the second line of the C++ function f above in C. One would do something like this
#include<string.h>
#include<stdio.h>
struct Buffer {
unsigned char data[sizeof(int)];
};
int main()
{
struct Buffer b;
int n = 5;
int* p = memcpy(&b.data,&n,sizeof(int));
printf("%d",*p); // aliasing violation here as unsigned char is accessed as int
return 0;
}
Unions are often suggested, i.e. union Buffer {int i; unsigned char b[sizeof(int)];}; but this is not quite as nice if the aim of the buffer is to act as storage (i.e. placing different sized types in there, by advancing a pointer into the buffer to the free part, plus potentially some more for proper alignment).
Have you tried using a union?
#include <string.h>
#include <stdio.h>
union Buffer {
int int_;
double double_;
long double long_double_;
unsigned char data[1];
};
int main() {
union Buffer b;
int n = 5;
int *p = memcpy(&b.data, &n, sizeof(int));
printf("%d", *p); // aliasing violation here as unsigned char is accessed as int
return 0;
}
The Buffer union aligns its data member according to the member type with the greatest alignment requirement.
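For comparison, here is a sketch of how the union can be read without going through an int * at all: the value is read back through the union's own int member. As far as I understand the C rules, reading a union member other than the one last written reinterprets the stored bytes, so this avoids the aliasing problem. The function name round_trip_int and the data array size are my choices for illustration:

```c
#include <string.h>

union Buffer {
    int int_;
    double double_;
    long double long_double_;
    unsigned char data[sizeof(long double)];
};

/* Copies n into the buffer's bytes, then reads it back via the int member. */
int round_trip_int(int n)
{
    union Buffer b;
    memcpy(b.data, &n, sizeof n);
    return b.int_;
}
```

This only helps when the set of types is known up front, which is the limitation the question already points out for general-purpose storage.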
Yes, because of the strict aliasing rule this is simply not possible in standard C, just as it is not possible to write a standard-compliant malloc().
Your buffer is also not aligned: alignas(int) from stdalign.h needs to be added.
If you want to protect against compiler optimizations, either:
cast the pointer, access the object through it, and compile with -fno-strict-aliasing (or use volatile), or
move the accessor for the buffer into another file that is compiled without LTO, so that the compiler simply cannot see enough to optimize the access.
// mybuffer.c
#include <stdalign.h>
alignas(int) unsigned char buffer[sizeof(int)];
void *getbuffer() { return buffer; }
// main.c
#include <string.h>
#include <stdio.h>
#include "mybuffer.h"
int main() {
void *data = getbuffer();
// int *p = new (au.data) int; // OK, au.data provides storage
int *p = data;
// char *c = new (au.data) char(); // OK, ends lifetime of *p
char *c = data;
*c = 0;
// char *d = new (au.data + 1) char();
char *d = (char*)data + 1;
*d = 0;
return *c + *d;
}
The way the definition of Effective Type in 6.5p6 is written, it's unclear what it's supposed to mean in all corner cases--likely because there was never a consensus among Committee Members as to how all corner cases should be handled. Defect reports often add more confusion than clarity, since they use terms like the "active member" of a union when neither the Standard nor the defect reports specify what actions would set or change it.
If one wants to use an object of static or automatic duration as though it were a buffer without a declared type, a safe way of doing that should be to do something like the following:
void volatile *volatile dummy_vp;
void test(void)
{
union {
char dat[1000];
unsigned long force_alignment;
} buffer;
void *volatile launder = buffer.dat;
dummy_vp = &launder;
void *storage_blob = launder;
...
}
Unless an implementation goes out of its way to test whether the read of launder happened to yield an address matching buffer.dat, it would have no way of knowing whether the object at that address had a declared type. Nothing in the Standard would forbid an implementation from behaving nonsensically if the address happened to match that of buffer.dat, but situations where performance improvements would justify the cost of the check aren't likely to be common enough for compilers to attempt such "optimization".

How to construct a C function with void pointer parameters and conditionally cast them to other types at runtime?

I'm trying to create a function where parameters are passed as void pointers, and including a parameter setting the data type the void pointers will be cast to, so that the function may be used on different types. Something like the following, which does not work:
void test_function(int use_type, void * value, void * array) {
// Set types to the parameters based on 'use_type'
if (use_type == 0) { // Int type
int * valueT = (int *) value;
int * arrayT = (int *) array;
} else if (use_type == 1) { // Double type
double * valueT = (double *) value;
double * arrayT = (double *) array;
}
// Main code of the program, setting an array item, regardless of type
arrayT[0] = *valueT;
}
There are two problems with the above code: the properly typed valueT and arrayT are scoped in the conditional blocks and not visible to the main part of the code. Moving their declarations out of the blocks isn't viable in the given structure of the code though, as they would then need different names for int and double versions, defeating the whole idea of what I'm trying to achieve. The other problem is that valueT and arrayT are local to the function. What I really want is to set the parameter array: array[0] = *value.
It appears that what I'm trying to do isn't possible in C... Is there a way that this could be done?
EDIT:
The assignment to array line is there to demonstrate what I want to do, there is a lot more code in that part. There will also be a number of other types besides int and double. Moving the assignment line into the blocks would mean too much code duplication.
You're trying to implement polymorphism in C. Down this path lies madness, unmaintainable code, and new programming languages.
Instead, I strongly recommend refactoring your code to use a better method of working with mixed data. union or struct or pointers or any of the solutions here. This will be less work in the long run and result in faster and more maintainable code.
Or you can switch to C++ and use templates.
Or you can use somebody else's implementation like GLib's GArray. This is a system of clever macros and functions to allow easy access to any type of data in an array. It's Open Source so you can examine its implementation, a mix of macros and clever functions. It has many features like automatic resizing and garbage collection. And it is very mature and well tested.
A GArray remembers its type, so it isn't necessary to keep telling it.
GArray *ints = g_array_new(FALSE, FALSE, sizeof(int));
GArray *doubles = g_array_new(FALSE, FALSE, sizeof(double));
int val1 = 23;
double val2 = 42.23;
g_array_append_val(ints, val1);
g_array_append_val(doubles, val2);
The underlying plain C array can be accessed as the data field of the GArray struct. It's typed gchar * so it must be recast.
double *doubles_array = (double *)doubles->data;
printf("%f", doubles_array[0]);
If we continue down your path, the uncertainty about the type infects every "generic" function and you wind up writing parallel implementations anyway.
For example, let's write a function that adds together the array elements at two indexes. Something which should be simple.
First, let's do it conventionally.
int add_int(int *array, size_t idx1, size_t idx2) {
return array[idx1] + array[idx2];
}
double add_double(double *array, size_t idx1, size_t idx2) {
return array[idx1] + array[idx2];
}
int main() {
int ints[] = {5, 10, 15, 20};
int value = add_int(ints, 1, 2);
printf("%d\n", value);
}
Taking advantage of token concatenation, we can put a clever macro in front of that to choose the correct function for us.
#define add(a, t, i1, i2) (add_ ## t(a, i1, i2))
int main() {
int ints[] = {5, 10, 15, 20};
int value = add(ints, int, 1, 2);
printf("%d\n", value);
}
The macro is clever, but probably not worth the extra complexity. So long as you're consistent about the naming the programmer can choose between the _int and _double form themselves. But it's there if you like.
Now let's see it with "one" function.
// Using an enum gives us some type safety and code clarity.
enum Types { _int, _double };
void *add(void * array, enum Types type, size_t idx1, size_t idx2) {
// Using an enum on a switch, with -Wswitch, will warn us if we miss a type.
switch(type) {
case _int : {
int *sum = malloc(sizeof(int));
*sum = (int *){array}[idx1] + (int *){array}[idx2];
return sum;
};
case _double : {
double *sum = malloc(sizeof(double));
*sum = (double *){array}[idx1] + (double *){array}[idx2];
return sum;
};
};
}
int main() {
int ints[] = {5, 10, 15, 20};
int value = *(int *)add((void *)ints, _int, 1, 2);
printf("%d\n", value);
}
Here we see the infection. We need a return value, but we don't know the type, so we have to return a void pointer. That means we need to allocate memory of the correct type. And we need to access the array with the correct type, more redundancy, more typecasting. And then the caller has to mess with a bunch of typecasting.
What a mess.
We can clean up some of the redundancy with macros.
#define get_idx(a,t,i) ((t *){a}[i])
#define make_var(t) ((t *)malloc(sizeof(t)))
void *add(void * array, enum Types type, size_t idx1, size_t idx2) {
switch(type) {
case _int : {
int *sum = make_var(int);
*sum = get_idx(array, int, idx1) + get_idx(array, int, idx2);
return sum;
};
case _double : {
double *sum = make_var(double);
*sum = get_idx(array, double, idx1) + get_idx(array, double, idx2);
return sum;
};
};
}
You can probably reduce the redundancy with even more macros, like Patrick's answer, but boy is this rapidly turning into macro hell. At a certain point you're no longer coding in C so much as in a rapidly expanding custom language implemented with stacks of macros.
Clifford's very clever idea of using sizes rather than types will not work here. In order to actually do anything with the values we need to know their types.
Once again, I cannot express strongly enough how big of a tar pit polymorphism in C is.
Instead of passing a type identifier, it is sufficient and simpler to pass the size of the object:
void test_function( size_t sizeof_type, void* value, void* array )
{
size_t element_index = 0 ; // for example
memcpy( (char*)array + element_index * sizeof_type, value, sizeof_type ) ;
}
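A slightly generalized sketch of the same idea, with the element index passed in instead of fixed at 0. The name set_element and its parameter order are made up for illustration:

```c
#include <stddef.h>
#include <string.h>

/* Copies one element of elem_size bytes from value into array[index].
   The caller is responsible for passing the size that matches the array. */
void set_element(void *array, size_t index, const void *value, size_t elem_size)
{
    memcpy((char *)array + index * elem_size, value, elem_size);
}
```

A call looks like set_element(ints, 1, &iv, sizeof iv) for an int array and set_element(ds, 0, &dv, sizeof dv) for a double array. This works for copying and moving elements, but not for anything that has to interpret the value, such as arithmetic.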
In order to remain type-agnostic and maintain the flexibility of usage you appear to want, you'll need to move your "main code" into a macro and invoke it for each case:
typedef enum {
USE_TYPE_INT = 0,
USE_TYPE_DOUBLE = 1,
// ...
} USE_TYPE;
void test_function(USE_TYPE use_type, void * value, void * array) {
#define TEST_FUNCTION_T(type) do { \
type * valueT = value; \
type * arrayT = array; \
/* Main code of the program */ \
arrayT[0] = *valueT; \
/* ... */ \
} while(0)
// Set types to the parameters based on 'use_type'
switch (use_type) {
case USE_TYPE_INT:
TEST_FUNCTION_T(int);
break;
case USE_TYPE_DOUBLE:
TEST_FUNCTION_T(double);
break;
// ...
}
#undef TEST_FUNCTION_T
}
Note that, while you only define the TEST_FUNCTION_T macro once, each usage will result in a duplicate code block differing only by the type pasted into the macro call when the program is compiled.
The direct answer to your question is to do the dereferencing and assignment inside the block in which the typed pointers are valid:
void test_function(int use_type, void * value, void * array) {
// Set types to the parameters based on 'use_type'
if (use_type == 0) { // Int type
int * valueT = value, *arrayT = array; //the casts in C are unnecessary
arrayT[0] = *valueT;
} else if (use_type == 1) { // Double type
double * valueT = value, *arrayT = array;
arrayT[0] = *valueT;
}
}
but you should probably be doing this inline, without any type<->int translation:
(type*){array}[0] = *(type*){value} //could make it DRY with a macro

using #define for defining struct objects

I came across this simple program somewhere
#include<stdio.h>
#include<stdlib.h>
char buffer[2];
struct globals {
int value;
char type;
long tup;
};
#define G (*(struct globals*)&buffer)
int main ()
{
G.value = 233;
G.type = '*';
G.tup = 1234123;
printf("\nValue = %d\n",G.value);
printf("\ntype = %c\n",G.type);
printf("\ntup = %ld\n",G.tup);
return 0;
}
It's compiling (using gcc) and executing well and I get the following output:
Value = 233
type = *
tup = 1234123
I am not sure how the #define G statement is working.
How is G defined as an object of type struct globals?
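First, this code has undefined behavior, because it reinterprets a two-byte array as a much larger struct and therefore writes past the end of the allocated space. You can fix the overflow by using the size of the struct to declare the buffer array (strictly speaking, accessing a char array through a struct pointer still depends on the array being suitably aligned and is not sanctioned by the aliasing rules), like this: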
struct globals {
int value;
char type;
long tup;
};
char buffer[sizeof(struct globals)];
The #define works in its usual way: by textual substitution of the token G, as if you ran a search-and-replace in your favorite text editor. The preprocessor, the first stage of the C compiler, finds every occurrence of the token G and replaces it with (*(struct globals*)&buffer).
Once the preprocessor is done, the compiler sees this code:
int main ()
{
(*(struct globals*)&buffer).value = 233;
(*(struct globals*)&buffer).type = '*';
(*(struct globals*)&buffer).tup = 1234123;
printf("\nValue = %d\n",(*(struct globals*)&buffer).value);
printf("\ntype = %c\n",(*(struct globals*)&buffer).type);
printf("\ntup = %ld\n",(*(struct globals*)&buffer).tup);
return 0;
}
The macro simply casts the address of the 2-character array buffer into a pointer to the appropriate structure type, then dereferences that to produce a struct-typed lvalue. That's why the dot (.) struct-access operator works on G.
No idea why anyone would do this. I would think it much cleaner to convert to/from the character array when that is needed (which is "never" in the example code, but presumably it's used somewhere in the larger original code base), or use a union to get rid of the macro.
union {
struct {
int value;
/* ... */
} s;
char c[2];
} G;
G.s.value = 233; /* and so on */
is both cleaner and clearer. Note that the char array is too small.
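The other option mentioned above, converting to and from a character array only when that is needed, could look roughly like this. The function names are hypothetical, and the byte layout is whatever the compiler uses for struct globals, padding included, so it is not a portable wire format:

```c
#include <string.h>

struct globals {
    int value;
    char type;
    long tup;
};

/* Copies the struct into a byte buffer of at least sizeof(struct globals). */
void globals_to_bytes(unsigned char *out, const struct globals *g)
{
    memcpy(out, g, sizeof *g);
}

/* Rebuilds the struct from bytes previously written by globals_to_bytes. */
void globals_from_bytes(struct globals *g, const unsigned char *in)
{
    memcpy(g, in, sizeof *g);
}
```

The struct is then used as an ordinary variable everywhere else, with no macro and no pointer cast.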

type casting void pointer and allocate memory

I have two structures:
typedef struct abc {
unsigned int pref;
unsigned int port;
char *aRecordIp;
int index;
int count;
}abc_t;
typedef struct xyz {
abc_t *ab;
int index;
int count;
}xyz_t;
and I would like to achieve the following
int Lookup (char *lookup,void *handle) {
*handle = (xyz_t *)malloc(sizeof(xyz_t *));
handle->ab = (abc_t *) malloc(sizeof(abc_t *));
//
}
I am trying to typecast void pointer to xyz_t basically.
Is this correct?
You are doing it wrong on multiple counts:
You're trying to set a variable handle->ab, but handle is a void *, not a structure type pointer.
You need to show your call, but there's likely to be problems — why do you think a void * argument is a good idea?
You want to allocate structures, so the sizeof() operands should be xyz_t and not xyz_t *; repeat for abc_t.
You should probably use:
int Lookup(const char *lookup, xyz_t **handle)
{
...
*handle = (xyz_t *)malloc(sizeof(xyz_t));
(*handle)->ab = (abc_t *)malloc(sizeof(abc_t));
...
}
Don't forget to check the result of malloc().
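A sketch of what those checks might look like. The return convention (0 on success, -1 on allocation failure) is an assumption of mine, the actual lookup logic is elided as in the question, and the casts on malloc() are omitted here only for brevity:

```c
#include <stdlib.h>

typedef struct abc {
    unsigned int pref;
    unsigned int port;
    char *aRecordIp;
    int index;
    int count;
} abc_t;

typedef struct xyz {
    abc_t *ab;
    int index;
    int count;
} xyz_t;

/* Allocates an xyz_t and its abc_t; stores the result in *handle.
   Returns 0 on success, -1 if either allocation fails (assumed convention). */
int Lookup(const char *lookup, xyz_t **handle)
{
    (void)lookup;                 /* lookup logic elided, as in the question */
    xyz_t *h = malloc(sizeof *h);
    if (h == NULL)
        return -1;
    h->ab = malloc(sizeof *h->ab);
    if (h->ab == NULL) {
        free(h);                  /* do not leak the outer struct */
        return -1;
    }
    h->index = 0;
    h->count = 0;
    *handle = h;
    return 0;
}
```

On success the caller owns both allocations and should release them with free(h->ab) followed by free(h).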
There are those who will castigate you for using casts on malloc(). I won't. When I learned C (a long time ago, years before there was a C standard), on a machine where the int * value for an address was not the same bit pattern as the char * address for the same memory location, where malloc() had to be declared char *malloc() or all hell broke loose, the casts were necessary. But — and this is the major issue that people are concerned about — it is crucial that you compile with compiler options such that if you invoke a function without a prototype in scope, you will get a compilation error, or a warning that you will pay attention to. The concern is that if you do not have a declaration for malloc() in scope, you will get incorrect results from using the cast which the compiler would diagnose if you don't.
On the whole, though, I think you should separate your lookup code from your 'create xyz_t' code — your function is doing two jobs and it complicates the interface to your function.
xyz_t *Create_xyz(void);
int Lookup(const char *lookup, const xyz_t *handle);
While casting of a void* to any pointer type is correct, it is not necessary in C, and it is not recommended for malloc (e.g. see Do I cast the result of malloc? ).
Also, you should specify sizeof(xyz_t), not sizeof(xyz_t*); otherwise you allocate only enough memory for a pointer, not for the whole structure.
And of course you should assign a pointer to handle, not to *handle. And handle should be of proper pointer type (xyz_t*).
Oh, and if the question is about casting handle to xyz_t*, then you can do it like ((xyz_t*)handle)->ab.
I'd recommend reading a book before playing with pointers like that.
I believe you want handle to hold a valid address of a xyz_t struct after the function call. Then you need to change the function signature and contents like so:
int Lookup (char *lookup, xyz_t **handle) { // double indirection here
*handle = (xyz_t *)malloc(sizeof(xyz_t));
(*handle)->ab = (abc_t *) malloc(sizeof(abc_t));
return 0; // the function is declared to return int
}
And call it like this:
xyz_t *myhandle;
char lookup;
Lookup(&lookup, &myhandle);
// now you can use it
myhandle->index ...
You will need to free the memory as well...
free(myhandle->ab);
free(myhandle);
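If you want to keep the void * parameter, pass the address of your pointer through it; assigning to handle itself would only change the function's local copy and the caller would never see the allocation:
int Lookup (char *lookup, void *handle) {
xyz_t **out = handle; // the caller passes &myhandle, i.e. an xyz_t **
*out = malloc(sizeof(xyz_t));
(*out)->ab = (abc_t *) malloc(sizeof(abc_t));
//
return 0;
}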

Re-typecasting a variable, possible?

Is it possible to recast a variable permanently, or have a wrapper function such that the variable would behave like another type?
I would want to achieve something I posted in the other question:
Typecasting variable with another typedef
Update: Added GCC as the compiler. It may have an extension that would help?
Yes, you can cast a variable from one type to another:
int x = 5;
double y = (double) x; // <== this is what a cast looks like
However, you cannot modify the type of the identifier 'x' in-place, if that is what you are asking. Close to that, though, you can introduce another scope with that identifier redeclared with some new type:
int x = 5;
double y = (double) x;
{
double x = y; // NOTE: this isn't the same as the 'x' identifier above
// ...
}
// NOTE: the symbol 'x' reverts to its previous meaning here.
Another thing you could do, though it is really a horrible, horrible idea is:
int x = 5;
double new_version_of_x = (double) x; // Let's make 'x' mean this
#define x new_version_of_x
// The line above is pure evil, don't actually do it, but yes,
// all lines after this one will think 'x' has type double instead
// of int, because the text 'x' has been rewritten to refer to
// 'new_version_of_x'. This will likely lead to all sorts of havoc
You accomplish that by casting then assigning.
int f(void * p) {
int * i;
i = (int *)p;
//lots of code here with the i pointer, and every line
//really thinks that it is an int pointer and will treat it as such
}
EDIT From the other question you linked:
typedef struct {
unsigned char a;
unsigned char b;
unsigned char c;
} type_a;
typedef struct {
unsigned char e;
unsigned char f[2];
} type_b;
//initialize type a
type_a sample;
sample.a = 1;
sample.b = 2;
sample.c = 3;
Now sample is initialized, but you want to access it differently, you want to pretend that in fact that variable has another type, so you declare a pointer to the type you want to "disguise" sample as:
type_b * not_really_b;
not_really_b = (type_b*)&sample;
See, that is the whole magic.
not_really_b->e is equal 1
not_really_b->f[0] is equal 2
not_really_b->f[1] is equal 3
Does this answer your question?
The other answers are better (declare a variable of the type you want, and do an assignment). If that's not what you're asking for, you could use a macro:
long i;
#define i_as_int ((int)i)
printf( "i = %ld\n", i);
printf( "i = %d\n", i_as_int);
But wouldn't it be clearer to just say (int) i if that's what you mean?
As long as you realize that in C pointers are nothing but addresses of memory locations of certain types, you should have your answer. For example, the following program prints the name of the program (argv[0]) twice:
#include <stdio.h>
int main(int argc, char *argv[]) {
int *i;
i = (int *) argv[0];
printf("%s\n", argv[0]);
printf("%s\n", ((char *) i));
}

Resources