I saw the following weird type of macro in C (Linux) code here:
#define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f))
What does ((t*)0)->f do?
How does it work?
The thing does exactly what the name suggests – deliver the size of a field of a struct.
It first casts 0 to a pointer to the struct type t. The address is arbitrary, because it is never actually accessed.
Then the -> operator (member access through a pointer) selects the field f, and sizeof is applied to it.
Pretty straightforward!
It does what it says on the tin, like Marcus Müller explains. If you are wondering why bother with it, why can't we just do sizeof(type_of_field) instead, then consider this:
struct foo {
struct {
int a;
float b;
} bar;
};
We cannot name type_of_bar, since for the programmer it has no name. But the macro allows us to obtain the field size via a workaround, regardless.
Because sizeof is computed at compile time (except in the case of variable length arrays, which is not the case here) its argument is not evaluated at runtime. It is therefore OK to cast a NULL pointer, as it's only used to indicate the field for which size is being computed.
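To make this concrete, here is a minimal sketch that applies the macro to the unnamed member from the example above. The label member and the helper function names are my own additions for illustration; they are not part of the original code.

```c
#include <assert.h>
#include <stddef.h>

#define FIELD_SIZEOF(t, f) (sizeof(((t *)0)->f))

struct foo {
    struct {
        int a;
        float b;
    } bar;              /* the type of bar has no name we could pass to sizeof */
    char label[16];     /* hypothetical extra member, added for illustration */
};

/* Both sizes are compile-time constants; no pointer is dereferenced at run time. */
static size_t bar_size(void)   { return FIELD_SIZEOF(struct foo, bar); }
static size_t label_size(void) { return FIELD_SIZEOF(struct foo, label); }

/* Sanity checks that hold on any conforming implementation. */
static void check_field_sizes(void)
{
    assert(label_size() == 16);
    assert(bar_size() >= sizeof(int) + sizeof(float));
}
```

Note that FIELD_SIZEOF(struct foo, label) gives the size of the whole array (16), not the size of a pointer, because the operand of sizeof is the array member itself.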
I have a question about some code in Eric Roberts' Programming Abstractions in C. He uses several libraries of his own, both to simplify things for readers and to teach how to write libraries. (All of the library code for the book can be found on this site.)
One library, genlib provides a macro for generic allocation of a pointer to a struct type. I don't understand part of the macro. I'll copy the code below, plus an example of how it is meant to be used, then I'll explain my question in more detail.
/*
* Macro: New
* Usage: p = New(pointer-type);
* -----------------------------
* The New pseudofunction allocates enough space to hold an
* object of the type to which pointer-type points and returns
* a pointer to the newly allocated pointer. Note that
* "New" is different from the "new" operator used in C++;
* the former takes a pointer type and the latter takes the
* target type.
*/
#define New(type) ((type) GetBlock(sizeof *((type) NULL)))
/* GetBlock is a wrapper for malloc. It encapsulates the
* common sequence of malloc, check for NULL, return or
* error out, depending on the NULL check. I'm not going
* to copy that code since I'm pretty sure it isn't
* relevant to my question. It can be found here though:
* ftp://ftp.awl.com/cseng/authors/roberts/cs1-c/standard/genlib.c
*/
Roberts intends for the code to be used as follows:
typedef struct {
string name;
/* etc. */
} *employeeT;
employeeT emp;
emp = New(employeeT);
He prefers to use a pointer to the record as the type name, rather than the record itself. So New provides a generic way to allocate such struct records.
In the macro New, what I don't understand is this: sizeof *((type) NULL). If I'm reading that correctly, it says "take the size of the dereferenced cast of NULL to whatever struct type type represents in a given call". I think I understand the dereferencing: we want to allocate enough space for the struct; the size of the pointer is not what we need, so we dereference to get at the size of the underlying record type. But I don't understand the idea of casting NULL to a type.
My questions:
You can cast NULL? What does that even mean?
Why is the cast necessary? When I tried removing it, the compiler says error: expected expression. So, sizeof *(type) is not an expression? That confused me since I can do the following to get the sizes of arbitrary pointers-to-structs:
#define struct_size(s_ptr) do { \
printf("sizeof dereferenced pointer to struct %s: %lu\n", \
#s_ptr, sizeof *(s_ptr)); \
} while(0)
Edit: As many people point out below, the two examples aren't the same:
/* How genlib uses the macro. */
New(struct MyStruct*)
/* How I was using my macro. */
struct MyStruct *ptr; New(ptr)
For the record, this isn't homework. I'm an amateur trying to improve at C. Also, there's no problem with the code, as far as I can tell. That is, I'm not asking how I can do something different with it. I'm just trying to better understand (1) how it works and (2) why it must be written the way it is. Thanks.
The issue is that the macro needs to get the size of the type pointed at by the pointer type.
As an example, suppose that you have the pointer type struct MyStruct*. Without removing the star from this expression, how would you get the size of struct MyStruct? You couldn't write
sizeof(*(struct MyStruct*))
since that's not legal C code.
On the other hand, if you had a variable of type struct MyStruct*, you could do something like this:
struct MyStruct* uselessPointer;
sizeof(*uselessPointer);
Since sizeof doesn't actually evaluate its argument (it just determines the static size of the type of the expression), this is safe.
Of course, in a macro, you can't define a new variable. However, you could make up a random pointer to a struct MyStruct* by casting an existing pointer. Here, NULL is a good candidate - it's an existing pointer that you can legally cast to a struct MyStruct*. Therefore, if you were to write
sizeof(* ((struct MyStruct*)NULL))
the code would
Cast NULL to a struct MyStruct*, yielding a pointer of static type struct MyStruct*.
Determine the size of the object that would be formed by dereferencing the pointer. Since the pointer has type struct MyStruct*, it points at an object of type struct MyStruct, so this yields the size of struct MyStruct.
In other words, it's a simple way to get an object of the pointer type so that you can dereference it and obtain an object of the underlying type.
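Here is a minimal, self-contained sketch of how the pieces fit together. The GetBlock below is my own stand-in (malloc plus a NULL check), not the genlib implementation, and the employee fields and make_employee are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for genlib's GetBlock: malloc plus a NULL check. */
static void *GetBlock(size_t nbytes)
{
    void *p = malloc(nbytes);
    if (p == NULL) {
        fprintf(stderr, "GetBlock: out of memory\n");
        exit(EXIT_FAILURE);
    }
    return p;
}

/* sizeof only looks at the type of *((type) NULL); nothing is dereferenced. */
#define New(type) ((type) GetBlock(sizeof *((type) NULL)))

typedef struct {
    const char *name;   /* made-up fields for the example */
    double salary;
} *employeeT;

static employeeT make_employee(const char *name, double salary)
{
    employeeT emp;

    assert(name != NULL);
    emp = New(employeeT);   /* allocates sizeof(*emp) bytes, not sizeof(emp) */
    emp->name = name;
    emp->salary = salary;
    return emp;
}
```

The caller owns the returned record and releases it with free(emp) when done.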
I've worked with Eric on some other macros and he is a real pro with the preprocessor. I'm not surprised that this works, and I'm not surprised that it's tricky, but it certainly is clever!
As a note: in C++, this sort of trick was common until C++11 introduced the std::declval utility function, which is a less hacky way to do the same thing.
Hope this helps!
It's a hack. It relies on the fact that the argument to the sizeof operator isn't actually evaluated.
To answer your specific questions:
Yes. NULL is a null pointer constant, and like any other pointer value it may be cast to another pointer type.
sizeof operates on either a type or an expression. *(type) would be neither (after macro substitution has occurred), it would be a syntax error.
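As a small sketch of the difference, the three functions below all yield the same number. The struct members and function names are hypothetical, chosen only for the example.

```c
#include <assert.h>
#include <stddef.h>

struct MyStruct { int id; double weight; };    /* hypothetical members */

static size_t size_from_type(void)
{
    return sizeof(struct MyStruct);            /* operand is a type name */
}

static size_t size_from_variable(void)
{
    struct MyStruct *p = NULL;
    return sizeof *p;                          /* operand is an expression; p is never read */
}

static size_t size_from_cast(void)
{
    return sizeof *((struct MyStruct *) NULL); /* same kind of expression, no variable needed */
}

static void check_sizes_agree(void)
{
    assert(size_from_type() == size_from_variable());
    assert(size_from_type() == size_from_cast());
}
```

By contrast, sizeof *(struct MyStruct *) has no operand for the cast to apply to, so the compiler rejects it as a syntax error.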
Can someone please help me understand what this is doing:
alt_up_sd_card_dev *dev = (alt_up_sd_card_dev *) alt_find_dev(name, &alt_dev_list);
if (dev != NULL)
{
aux_status_register = ((short int *) SD_CARD_AUX_STATUS(dev->base));
}
I understand that the (short int *) is "type-casting" (as explained to me by some other helpful people on this forum) what SD_CARD_AUX_STATUS should be when the contents are called, but I've never seen the dev->base syntax before....
1. Here dev is a pointer to a structure. It is assigned the value returned by alt_find_dev(name, &alt_dev_list), cast to alt_up_sd_card_dev *.
2. The structure alt_up_sd_card_dev presumably has a member called base.
3. SD_CARD_AUX_STATUS is probably a macro that does some manipulation on dev->base.
For more information, read up on parameterized (function-like) macros in C.
We can't give you a proper answer without knowing all the include files this references, but by general convention:
1) By standard C naming conventions, the all-uppercase SD_CARD_AUX_STATUS() is a macro rather than a function. The macro is set up by a #define either earlier in this file or in one of the #included .h files. Look for that definition to find out what it's actually doing.
2) -> is like . but for pointers-to-structures rather than structures. That is, if you have a struct { int foo, bar; } baz, then baz.foo is the same thing as (&baz)->foo. Or, as Wikipedia puts it:
Structure dereference ("member b of object pointed to by a") a->b
Structure reference ("member b of object a") a.b
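A short sketch of the two forms side by side, using a made-up struct point:

```c
#include <assert.h>
#include <stddef.h>

struct point { int x, y; };    /* hypothetical type for illustration */

/* Takes the struct itself, so members are reached with '.' */
static int sum_via_dot(struct point p)
{
    return p.x + p.y;
}

/* Takes a pointer, so members are reached with '->'; p->y is shorthand for (*p).y */
static int sum_via_arrow(const struct point *p)
{
    assert(p != NULL);
    return p->x + (*p).y;
}
```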
This is not related to C syntax in general. This piece of code is very specific.
I can only guess what it does.
alt_up_sd_card_dev *dev = (alt_up_sd_card_dev *) alt_find_dev(name, &alt_dev_list);
This calls a function alt_find_dev which probably looks for a device. The device is apparently an sd card reader... The result of the function is cast to a specific type of pointer. Probably the result is of generic pointer type and it is cast to a pointer to a structure that describes specifically an sd card device. It is then stored in the dev variable.
if (dev != NULL)
if the device is found....
aux_status_register = ((short int *) SD_CARD_AUX_STATUS(dev->base));
The macro SD_CARD_AUX_STATUS is invoked with the argument dev->base (where base is a field in the structure describing the SD card device). The -> operator dereferences the pointer and accesses a member; it is similar to the . operator, but works on a pointer to a struct rather than on the struct itself. The macro presumably yields some kind of status location for the device. It's hard to tell why the result is cast to a pointer to short int, but it is then stored in the variable aux_status_register.
Without additional information it's impossible to tell anything more about the code.
dev is a pointer to a data structure in memory - called a struct in C. A struct has members - a set of variables within it. dev->base means access the member called base within the struct of type alt_up_sd_card_dev which dev is pointing to.
Look for the definition of struct alt_up_sd_card_dev which you will find in one of the header files included from the one you are looking at.
In general the -> operator is said to de-reference the pointer.
SD_CARD_AUX_STATUS is probably a macro - traditionally these are named all upper case. It might perform some kind of conversion or even call a function. Search for its definition in the headers.
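Since the real definitions aren't shown, here is only a guess at the general shape, with every name invented for the purpose: demo_dev stands in for alt_up_sd_card_dev, and DEMO_AUX_STATUS for SD_CARD_AUX_STATUS. The 4-byte offset is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device record; the real alt_up_sd_card_dev layout is not shown. */
struct demo_dev {
    unsigned char *base;    /* start of the device's register block */
};

/* Hypothetical function-like macro: address of a register 4 bytes past base. */
#define DEMO_AUX_STATUS(base) ((base) + 4)

/* Mirrors the snippet in the question: look up the member, apply the macro, cast. */
static short *aux_status_register_of(struct demo_dev *dev)
{
    if (dev == NULL)
        return NULL;
    assert(dev->base != NULL);
    return (short *) DEMO_AUX_STATUS(dev->base);
}
```

The cast to short * then says "treat that address as the location of a 16-bit value", which would be a plausible way to read a hardware status register.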
Let's say I have this function, which is part of some gui toolkit:
typedef struct _My_Struct My_Struct;
/* struct ... */
void paint_handler( void* data )
{
if ( IS_MY_STRUCT(data) ) /* <-- can I do something like this? */
{
My_Struct* str = (My_Struct*) data;
}
}
/* in main() */
My_Struct s;
signal_connect( SIGNAL_PAINT, &paint_handler, (void*) &s ); /* sent s as a void* */
Since the paint_handler will also be called by the GUI toolkit's main loop with other arguments, I cannot always be sure that the parameter I am receiving will always be a pointer to s.
Can I do something like IS_MY_STRUCT in the paint_handler function to check that the parameter I am receiving can be safely cast back to My_Struct* ?
Your void pointer loses all its type information, so by that alone, you cannot check if it can be cast safely. It's up to the programmer to know if a void* can be cast safely to a type.
Unfortunately there is no function to check what the pointer was before it appears in that context (void).
The one solution I can think of is if you place an int _struct_id as the first member of all of your structs. This id member can then be safely checked regardless of the type but this will fail if you pass pointers that don't implement this member (or int, char, ... pointers).
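A rough sketch of that idea, assuming you control every struct type that can reach the handler. The enum values, the Other_Struct type, and the int return value of paint_handler are all made up for the example.

```c
#include <assert.h>
#include <stddef.h>

enum struct_id { ID_MY_STRUCT = 1, ID_OTHER_STRUCT = 2 };   /* hypothetical ids */

typedef struct { int struct_id; int width; }    My_Struct;
typedef struct { int struct_id; double ratio; } Other_Struct;

/* A pointer to a struct may be converted to a pointer to its first member,
 * so this read is valid for every type that starts with int struct_id. */
static int struct_id_of(const void *data)
{
    assert(data != NULL);
    return *(const int *)data;
}

/* Returns the width for a My_Struct, or -1 for anything else. */
static int paint_handler(void *data)
{
    if (struct_id_of(data) == ID_MY_STRUCT) {
        My_Struct *str = data;
        return str->width;
    }
    return -1;
}
```

This only helps for pointers to structs that follow the convention; a pointer to anything else is still misread.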
The best you could do would be to look at what data points to to see if it has telltale signs of being what you want, although a) it wouldn't be anywhere close to a guarantee and b) might be dangerous, as you don't know how big the thing data actually points to is. I suppose it isn't any more dangerous than just casting it and using it, but (as has been suggested) a redesign would be better.
If you are creating the type that is being used, you could include as part of the type some kind of identifying information that would help you rule out some void pointers as not being of the type you are looking for. While you would run the chance that some random area of memory would contain the same data or signature as what you are looking for, at least you would know when something was not the type you were looking for.
This approach would require that the struct was initialized in such a way that the signature members, used to determine if the memory area is not valid, is initialized to the signature value.
An example:
typedef struct {
ULONG ulSignature1;
// .. data elements that you want to have
ULONG ulSignature2;
} MySignedStruct;
#define MYSIGNEDSTRUCT_01 0x1F2E3D4C
#define MYSIGNEDSTRUCT_02 0xF1E2D3C4
#define IS_MY_STRUCT(sAdr) ( (((MySignedStruct *)(sAdr))->ulSignature1 == MYSIGNEDSTRUCT_01) && (((MySignedStruct *)(sAdr))->ulSignature2 == MYSIGNEDSTRUCT_02) )
This is kind of a rough approach however it can help. Naturally using a macro like IS_MY_STRUCT() where the argument is used twice can be problematic if the argument has a side effect so you would have to be careful of something like IS_MY_STRUCT(xStruct++) where xStruct is a pointer to a MySignedStruct.
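One way to sidestep the double-evaluation problem is to use a function instead of a macro. The sketch below assumes ULONG is unsigned long and adds a made-up payload member and init helper; it checks both signatures and evaluates its argument once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MYSIGNEDSTRUCT_01 0x1F2E3D4CUL
#define MYSIGNEDSTRUCT_02 0xF1E2D3C4UL

typedef struct {
    unsigned long ulSignature1;   /* assumes ULONG is unsigned long */
    int payload;                  /* hypothetical data member */
    unsigned long ulSignature2;
} MySignedStruct;

/* The signatures only work if every instance is initialized like this. */
static void my_signed_struct_init(MySignedStruct *s, int payload)
{
    assert(s != NULL);
    s->ulSignature1 = MYSIGNEDSTRUCT_01;
    s->payload = payload;
    s->ulSignature2 = MYSIGNEDSTRUCT_02;
}

/* A true result is strong evidence, not proof; the argument is evaluated once. */
static bool is_my_struct(const void *p)
{
    const MySignedStruct *s = p;
    return s != NULL
        && s->ulSignature1 == MYSIGNEDSTRUCT_01
        && s->ulSignature2 == MYSIGNEDSTRUCT_02;
}
```

The same caveat applies as with the macro: if p points at something smaller than a MySignedStruct, reading the second signature runs past the end of that object.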
There really isn't a way to do this in C. void pointers carry no type information and should only ever be cast when you truly know what they point to.
Perhaps you should instead reconsider your design; rewrite your code so that no inspection is necessary. This is the same reason Google disallows RTTI in its style guide.
I know the question is 3 years old but here I go,
How about using a simple global enum to distinguish where the function is called from? Then you can switch on it to decide what type to cast the void pointer to.
How do you use offsetof() on a struct?
I read about this offsetof macro on the Internet, but it doesn't explain what it is used for.
#define offsetof(a,b) ((int)(&(((a*)(0))->b)))
What is it trying to do and what is the advantage of using it?
R.. is correct in his answer to the second part of your question: this code is not advised when using a modern C compiler.
But to answer the first part of your question, what this is actually doing is:
(
(int)( // 4.
&( ( // 3.
(a*)(0) // 1.
)->b ) // 2.
)
)
Working from the inside out, this is ...
Casting the value zero to the struct pointer type a*
Getting the struct field b of this (illegally placed) struct object
Getting the address of this b field
Casting the address to an int
Conceptually, this places a struct object at memory address zero and then finds out what the address of a particular field is. This could allow you to figure out the offsets in memory of each field in a struct so you could write your own serializers and deserializers to convert structs to and from byte arrays.
Of course if you would actually dereference a zero pointer your program would crash, but actually everything happens in the compiler and no actual zero pointer is dereferenced at runtime.
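On many older platforms an int was the same size as a pointer (for example, 32 bits each on typical 32-bit systems), so this actually worked. Where pointers are wider than int, as on most 64-bit platforms, the cast can lose information.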
It has no advantages and should not be used, since it invokes undefined behavior (and uses the wrong type - int instead of size_t).
The C standard defines an offsetof macro in stddef.h which actually works, for cases where you need the offset of an element in a structure, such as:
#include <stddef.h>
struct foo {
int a;
int b;
char *c;
};
struct struct_desc {
const char *name;
int type;
size_t off;
};
static const struct struct_desc foo_desc[] = {
{ "a", INT, offsetof(struct foo, a) },
{ "b", INT, offsetof(struct foo, b) },
{ "c", CHARPTR, offsetof(struct foo, c) },
};
which would let you programmatically fill the fields of a struct foo by name, e.g. when reading a JSON file.
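For instance, a by-name setter could look roughly like this. It repeats the table so that it compiles on its own; the field_type enum and foo_set_int are names I made up, since the snippet above leaves INT and CHARPTR undefined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum field_type { INT, CHARPTR };   /* assumed definition of the type tags */

struct foo {
    int a;
    int b;
    char *c;
};

struct struct_desc {
    const char *name;
    enum field_type type;
    size_t off;
};

static const struct struct_desc foo_desc[] = {
    { "a", INT,     offsetof(struct foo, a) },
    { "b", INT,     offsetof(struct foo, b) },
    { "c", CHARPTR, offsetof(struct foo, c) },
};

/* Set an int field by name; returns 0 on success, -1 if no int field has that name. */
static int foo_set_int(struct foo *obj, const char *name, int value)
{
    size_t i;

    assert(obj != NULL && name != NULL);
    for (i = 0; i < sizeof foo_desc / sizeof foo_desc[0]; i++) {
        if (foo_desc[i].type == INT && strcmp(foo_desc[i].name, name) == 0) {
            /* Step off bytes from the start of the object to reach the member. */
            int *field = (int *)((char *)obj + foo_desc[i].off);
            *field = value;
            return 0;
        }
    }
    return -1;
}
```

A JSON reader would then call foo_set_int(&obj, key, value) for each integer key it encounters, without a hand-written if/else chain per field.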
It's finding the byte offset of a particular member of a struct. For example, if you had the following structure:
struct MyStruct
{
double d;
int i;
void *p;
};
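Then, on a typical 32-bit platform, you'd have offsetof(struct MyStruct, d) == 0, offsetof(struct MyStruct, i) == 8, and offsetof(struct MyStruct, p) == 12 (that is, the member named d is 0 bytes from the start of the structure, etc.). The exact values depend on the platform's type sizes and alignment; with 8-byte pointers, p would typically sit at offset 16.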
The way that it works is it pretends that an instance of your structure exists at address 0 (the ((a*)(0)) part), and then it takes the address of the intended structure member and casts it to an integer. Although dereferencing an object at address 0 would ordinarily be an error, it's ok to take the address because the address-of operator & and the member dereference -> cancel each other out.
It's typically used for generalized serialization frameworks. If you have code for converting between some kind of wire data (e.g. bytes in a file or from the network) and in-memory data structures, it's often convenient to create a mapping from member name to member offset, so that you can serialize or deserialize values in a generic manner.
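As a tiny illustration of the deserialization side, this sketch pulls one member out of a raw byte image of the struct, using the standard offsetof from stddef.h. The function name is made up, and it assumes the bytes were produced on the same platform (same layout and byte order).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct MyStruct
{
    double d;
    int i;
    void *p;
};

/* Copy the int member out of a raw byte image of a struct MyStruct. */
static int read_i_from_bytes(const unsigned char *bytes)
{
    int value;

    assert(bytes != NULL);
    /* memcpy avoids alignment problems when bytes is an arbitrary buffer. */
    memcpy(&value, bytes + offsetof(struct MyStruct, i), sizeof value);
    return value;
}
```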
The implementation of the offsetof macro is really irrelevant.
The C standard (C99, section 7.17, paragraph 3) defines it as:
offsetof(type, member-designator)
which expands to an integer constant expression that has type size_t, the value of which is the offset in bytes, to the structure member (designated by member-designator), from the beginning of its structure (designated by type). The type and member designator shall be such that given static type t; then the expression &(t.member-designator) evaluates to an address constant. (If the specified member is a bit-field, the behavior is undefined.)
Trust Adam Rosenfield's answer.
R is completely wrong, and it has many uses - especially being able to tell when code is non-portable among platforms.
(OK, it's C++, but we use it in static template compile time assertions to make sure our data structures do not change size between platforms/versions.)
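The same kind of check can be written in plain C11 with static_assert from assert.h. The wire_header struct and the expected numbers below are made up for the example; the point is that a layout change turns into a compile error rather than a silent incompatibility.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk / on-wire header whose layout must not drift. */
struct wire_header {
    uint32_t magic;
    uint16_t version;
    uint16_t flags;
};

/* Compilation fails if a platform or a later edit changes the layout. */
static_assert(sizeof(struct wire_header) == 8,
              "wire_header must be exactly 8 bytes");
static_assert(offsetof(struct wire_header, version) == 4,
              "version must start at byte 4");
static_assert(offsetof(struct wire_header, flags) == 6,
              "flags must start at byte 6");
```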
So, recently I had the unfortunate need to write a C extension for Ruby (for performance reasons). Since I was having problems understanding VALUE (and still am), I looked into the Ruby source and found: typedef unsigned long VALUE; (there are a few other ways it is defined there, but I think it's essentially an unsigned long; correct me if I'm wrong). While investigating this further I found an interesting blog post, which says:
"...in some cases the VALUE object could BE the data instead of POINTING TO the data."
What confuses me is that, when I attempt to pass a string to C from Ruby, and use RSTRING_PTR(); on the VALUE (passed to the C-function from Ruby), and try to 'debug' it with strlen(); it returns 4. Always 4.
example code:
VALUE test(VALUE inp) {
unsigned char* c = RSTRING_PTR(inp);
//return rb_str_new2(c); //this returns some random gibberish
return INT2FIX(strlen(c));
}
This example always returns 1 as the string length:
VALUE test(VALUE inp) {
unsigned char* c = (unsigned char*) inp;
//return rb_str_new2(c); // Always "\x03" in Ruby.
return INT2FIX(strlen(c));
}
Sometimes in Ruby I see an exception saying "Can't convert Module to String" (or something along those lines). I was changing the code so much while trying to figure this out that I can no longer reproduce it, but the error happened when I called StringValuePtr(); on inp. (I'm a bit unclear on what exactly this does; the documentation says it changes the passed parameter to char*.)
VALUE test(VALUE inp) {
StringValuePtr(inp);
return rb_str_new2((char*)inp); //Without the cast, I would get compiler warnings
}
So, the Ruby code in question is: MyMod::test("blahblablah")
EDIT: Fixed a few typos and updated the post a little.
The questions
What exactly does VALUE inp hold? A pointer to the object/value?
The value itself?
If it holds the value itself: when does it do that, and is there a way to check for it?
How do I actually access the value (since I seem to be accessing almost everything but the value)?
P.S: My understanding of C isn't really the best, but it's a work in progress; also, read the comments in the code snippets for some additional description (if it helps).
Thanks!
Ruby Strings vs. C strings
Let's start with strings. Before trying to retrieve a string in C, it is a good habit to call StringValue(obj) on your VALUE. This ensures that you are really dealing with a Ruby string in the end: if the object is not already a string, it is coerced into one by calling its to_str method. This makes things safer and prevents the occasional segfault you might get otherwise.
The next thing to watch out for is that Ruby strings are not \0-terminated in the way your C code expects for things like strlen to work. Ruby's strings carry their length information with them instead. That's why, in addition to RSTRING_PTR(str), there is also the RSTRING_LEN(str) macro to determine the actual length.
So what StringValuePtr does is return the non-zero-terminated char * to you. This is great for buffers where you have a separate length, but not what you want for e.g. strlen. Use StringValueCStr instead; it will modify the string to be zero-terminated so that it is safe to use with C functions that expect zero termination. But try to avoid this wherever possible, because this modification is much less performant than retrieving the non-zero-terminated string, which does not have to be modified at all. If you keep an eye on this, it's surprising how rarely you will actually need "real" C strings.
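To illustrate the pointer-plus-length idea without pulling in the Ruby headers, here is a plain C sketch. dup_as_c_string is a made-up helper, not part of the Ruby C API and not how Ruby does it internally; it just shows what it takes to turn a length-delimited buffer into something strlen can handle.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy len bytes and append the '\0' that C string
 * functions expect. The caller frees the result. Returns NULL if malloc fails. */
static char *dup_as_c_string(const char *ptr, size_t len)
{
    char *copy;

    assert(ptr != NULL);
    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, ptr, len);   /* the source may lack a terminator */
    copy[len] = '\0';
    return copy;
}
```

Calling strlen directly on a buffer that has no terminator keeps reading past the end until it happens to hit a zero byte, which is why the length has to travel with the pointer.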
self as an implicit VALUE argument
Another reason why your current code doesn't work as expected is that every C function to be called by Ruby gets passed self as an implicit VALUE.
No arguments in Ruby ( e.g. obj.doit ) translates to
VALUE doit(VALUE self)
Fixed number of arguments (>0, e.g. obj.doit(a, b)) translates to
VALUE doit(VALUE self, VALUE a, VALUE b)
Var args in Ruby ( e.g. obj.doit(a, b=nil)) translates to
VALUE doit(int argc, VALUE *argv, VALUE self)
in C. So what you were working on in your example is not the string passed to you by Ruby but actually the current value of self, that is, the object that was the receiver when you called that function. A correct definition for your example would be
static VALUE test(VALUE self, VALUE input)
I made it static to point out another rule that you should follow in your C extensions. Make your C functions public only if you intend to share them among several source files. Since that's almost never the case for functions that you attach to a Ruby class, you should declare them as static by default and only make them public if there is a good reason to do so.
What is VALUE and where does it come from?
Now to the harder part. If you dig deep into Ruby's internals, you will find the function rb_newobj in gc.c. There you can see that any newly created Ruby object becomes a VALUE by being cast as one from something called the freelist. It's defined as:
#define freelist objspace->heap.freelist
You can imagine the objspace as a huge map that stores each and every object that is currently alive at a given point in time in your code. This is also where the garbage collector fulfills its duty, and the heap struct in particular is the place where new objects are born. The "freelist" of the heap is again declared as being an RVALUE *. This is the C-internal representation of the Ruby built-in types. An RVALUE is actually defined as follows:
typedef struct RVALUE {
union {
struct {
VALUE flags; /* always 0 for freed obj */
struct RVALUE *next;
} free;
struct RBasic basic;
struct RObject object;
struct RClass klass;
struct RFloat flonum;
struct RString string;
struct RArray array;
struct RRegexp regexp;
struct RHash hash;
struct RData data;
struct RTypedData typeddata;
struct RStruct rstruct;
struct RBignum bignum;
struct RFile file;
struct RNode node;
struct RMatch match;
struct RRational rational;
struct RComplex complex;
} as;
#ifdef GC_DEBUG
const char *file;
int line;
#endif
} RVALUE;
That is, basically a union of core data types that Ruby knows about. Missing something? Yes, Fixnums, Symbols, nil and boolean values are not included there. It's because these kinds of objects are directly represented using the unsigned long that a VALUE boils down to in the end. I think the design decision there was (besides being a cool idea) that dereferencing a pointer might be slightly less performant than the bit shifts that are currently needed when transforming the VALUE to what it actually represents. Essentially
obj = (VALUE)freelist;
says give me whatever freelist points to currently and treat it as unsigned long. This is safe because freelist is a pointer to an RVALUE, and a pointer can also be safely interpreted as unsigned long. This implies that every VALUE except those carrying Fixnums, symbols, nil or Booleans is essentially a pointer to an RVALUE; the others are directly represented within the VALUE.
As for your last question: how can you check what a VALUE stands for? You can use the TYPE(x) macro to check whether a VALUE's type would be one of the "primitive" ones.
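The "either a pointer or the data itself" idea can be sketched in plain C. This is only a simplified model with invented names (value_t, DEMO_FIXNUM_FLAG and the helpers); Ruby's actual encoding has more cases and is not reproduced here. The trick relies on heap objects being aligned, so the lowest bit of a real pointer is always 0 and can serve as a tag.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t value_t;                /* hypothetical stand-in for VALUE */
#define DEMO_FIXNUM_FLAG ((value_t)1)     /* low bit set: the value IS the integer */

/* Store a small non-negative integer directly in the value. */
static value_t value_from_int(intptr_t n)
{
    return ((value_t)n << 1) | DEMO_FIXNUM_FLAG;
}

static int value_is_int(value_t v)
{
    return (v & DEMO_FIXNUM_FLAG) != 0;
}

/* Undo the shift; in this sketch only non-negative integers round-trip. */
static intptr_t int_from_value(value_t v)
{
    assert(value_is_int(v));
    return (intptr_t)(v >> 1);
}

/* Store a pointer as-is; it must be at least 2-byte aligned. */
static value_t value_from_pointer(void *p)
{
    value_t v = (value_t)p;
    assert((v & DEMO_FIXNUM_FLAG) == 0);
    return v;
}
```

Checking the tag bit first tells you whether it is safe to treat the value as a pointer at all, which is the kind of question TYPE(x) answers for a real VALUE.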
VALUE test(VALUE inp)
The first issue is here: inp is self (so, in your case, the module). If you want to refer to the first argument, you need to add a self argument before it (which leads me to add -Wno-unused-parameter to my CFLAGS, as self is never used in module functions):
VALUE test(VALUE self, VALUE inp)
Your first example uses a module as a string, which certainly won't result in anything good. RSTRING_PTR lacks type checks, which is a good reason not to use it.
A VALUE is a reference to the Ruby object, but not directly a pointer to what it may contain (like a char* in the case of a string). You need to get that pointer using some macros or functions depending on each object. For a string, you want StringValuePtr (or StringValueCStr to ensure that the string is null-terminated) which returns the pointer (it doesn't change the content of your VALUE in any way).
strlen(StringValuePtr(thing));
RSTRING_LEN(thing); /* I assume strlen was just an example ;) */
The actual content of the VALUE is, in MRI and YARV at least, the object_id of the object (or at least, it is after a bitshift).
For your own objects, the VALUE will most likely contain a pointer to a C object which you can get using Data_Get_Struct:
my_type *thing = NULL;
Data_Get_Struct(rb_thing, my_type, thing);